The following is a model of an ideal software structure. Whether Socrates will use it, or a variation of it, is by no means decided.
This is a basis for a development discussion, where the focus should be on community, testability and expandability.
# A picture says more than 1000 words
The archives contain numerous mails on this theme, often with misunderstandings about terminology.
+---------- Consumers ---------+
|        +--- convert -----+   |
|      +--- editor ------+ |   |
|    +--- dfutil ------+ |-+   |
|  +--- html view ---+ |-+     |
|  |                 |-+       |
|  +-----------------+         |
+------------------------------+
               /\
               ||
               \/
+--------- DocFormat ----------+
|       +--- Helper --------+  |
|     +--- Management ----+ |  |
|   +--- Portability ---+ |-+  |
| +--- DataCapsule ---+ |-+    |
| |                   |-+      |
| +-------------------+        |
+------------------------------+
               /\
               ||
               \/
+---------- Filters -----------+
|            +--- XML -----+   |
|          +--- ODF -----+ |   |
|        +--- PDF -----+ |-+   |
|      +--- HTML ----+ |-+     |
|    +--- OXML ----+ |-+       |
|  +--- LATEX ---+ |-+         |
|  |             |-+           |
|  +-------------+             |
+------------------------------+
# Description
The following describes the modules that are foreseen; not all of them exist yet. The description is not meant as a developer's blueprint for programming, but merely as an aid to understanding the titles in the picture.
## Consumers
Most of the consumers will hopefully be created outside the Socrates project. The Socrates project will supply a couple of examples.
### convert
A command line utility that can convert between all formats.
### editor
A Qt-based editor that can edit the DataCapsule.
### dfutil
The black-box and white-box unit test module. It has all white-box unit tests compiled in, and dumps result files for black-box unit tests.
Note: dfutil already exists and needs only minor modifications.
### html view
An HTML viewer, basically Firefox or Internet Explorer.
We simply provide documentation on how to use it.
## DocFormat
DocFormat is the kernel of Socrates, the part around which everything else
revolves. DocFormat is already available and only needs some cleaning up.
### Helper
Helper is a library within DocFormat that offers specialty functions to all
other parts of Socrates. We do not want to use 3rd party libraries all over
the source, so the functions we make available from a 3rd party library are
always covered by functions in Helper.
Helper guarantees that we can freely exchange libraries (e.g. glib instead of
zlib or xalan-c instead of libxml). This freedom is important since these
libraries might change to a stricter license, which would limit our
distribution possibilities.
Helper also guarantees that the rest of Socrates does not need to care about
library versions and differences.
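A minimal sketch of how Helper could cover a 3rd party library. All names here (`helper::Compressor`, `RunLengthBackend`, `helper::makeCompressor`) are hypothetical and not taken from the Socrates source. The backend is a trivial run-length encoder used as a stand-in so the example stays self-contained; a real backend would call into whichever library is chosen.

```cpp
#include <cstddef>
#include <memory>
#include <string>

namespace helper {

// Interface the rest of the code would program against (hypothetical).
// No 3rd party header is visible through it.
class Compressor {
public:
    virtual ~Compressor() = default;
    virtual std::string compress(const std::string& input) const = 0;
    virtual std::string decompress(const std::string& input) const = 0;
};

// Stand-in backend: run-length encoding as (count, byte) pairs.
// A real backend would wrap a 3rd party library here instead.
class RunLengthBackend : public Compressor {
public:
    std::string compress(const std::string& input) const override {
        std::string out;
        for (std::size_t i = 0; i < input.size();) {
            std::size_t run = 1;
            while (i + run < input.size() && input[i + run] == input[i] && run < 255) {
                ++run;
            }
            out.push_back(static_cast<char>(run));
            out.push_back(input[i]);
            i += run;
        }
        return out;
    }

    std::string decompress(const std::string& input) const override {
        std::string out;
        for (std::size_t i = 0; i + 1 < input.size(); i += 2) {
            out.append(static_cast<unsigned char>(input[i]), input[i + 1]);
        }
        return out;
    }
};

// The only place that names the concrete backend. Exchanging the library
// would mean changing this one function.
inline std::unique_ptr<Compressor> makeCompressor() {
    return std::make_unique<RunLengthBackend>();
}

}  // namespace helper
```

Callers would hold only a `helper::Compressor`, so swapping the backend would not touch filters or consumers.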
### Management
Management is merely a set of functions that tie the system together.
Some examples:
- registration of file suffix to filter type
- open/close files
- activate/deactivate a filter
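The registration and activation examples above could be sketched as follows. The class name `Management`, its member functions, and the use of plain strings for filter types are assumptions made for illustration only.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical sketch; names are illustrative, not from the Socrates source.
class Management {
public:
    // Register a file suffix (e.g. "odt") for a filter type (e.g. "ODF").
    void registerSuffix(const std::string& suffix, const std::string& filterType) {
        suffixToFilter_[suffix] = filterType;
    }

    void activate(const std::string& filterType) { active_.insert(filterType); }
    void deactivate(const std::string& filterType) { active_.erase(filterType); }

    // Return the active filter type for a file name, or an empty string
    // if the suffix is unknown or the filter is deactivated.
    std::string filterFor(const std::string& fileName) const {
        const std::size_t dot = fileName.rfind('.');
        if (dot == std::string::npos) {
            return std::string();
        }
        const auto it = suffixToFilter_.find(fileName.substr(dot + 1));
        if (it == suffixToFilter_.end() || active_.count(it->second) == 0) {
            return std::string();
        }
        return it->second;
    }

private:
    std::map<std::string, std::string> suffixToFilter_;
    std::set<std::string> active_;
};
```

Keeping the suffix table and the active set separate means a filter can be switched off without losing its registrations.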
### Portability
Portability is the place where all platform differences are hidden. Socrates
source code is only allowed to use ANSI standard functions; if an OS-specific
function like dirSearch is needed, Portability provides a cover function.
Portability does for platforms what Helper does for 3rd party libraries.
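As an illustration, a cover function for dirSearch might look like the sketch below. Only the name dirSearch comes from the text; the signature, the namespace `portability`, and the choice to skip "." and ".." are assumptions. The OS-specific calls (`FindFirstFileA` on Windows, `opendir`/`readdir` on POSIX) stay inside this one function.

```cpp
#include <string>
#include <vector>

#ifdef _WIN32
#include <windows.h>
#else
#include <dirent.h>
#endif

namespace portability {

// List the entry names of a directory, hiding the OS-specific API.
// Returns an empty list if the directory cannot be opened.
inline std::vector<std::string> dirSearch(const std::string& path) {
    std::vector<std::string> names;
    // Skip the "." and ".." pseudo entries on every platform.
    auto keep = [](const std::string& name) { return name != "." && name != ".."; };
#ifdef _WIN32
    WIN32_FIND_DATAA data;
    HANDLE handle = FindFirstFileA((path + "\\*").c_str(), &data);
    if (handle == INVALID_HANDLE_VALUE) {
        return names;
    }
    do {
        if (keep(data.cFileName)) {
            names.push_back(data.cFileName);
        }
    } while (FindNextFileA(handle, &data));
    FindClose(handle);
#else
    DIR* dir = opendir(path.c_str());
    if (dir == nullptr) {
        return names;
    }
    while (const dirent* entry = readdir(dir)) {
        if (keep(entry->d_name)) {
            names.push_back(entry->d_name);
        }
    }
    closedir(dir);
#endif
    return names;
}

}  // namespace portability
```

Code outside Portability would call `portability::dirSearch(path)` and never include `windows.h` or `dirent.h` itself.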
### DataCapsule
DataCapsule is the in-memory storage of documents. It contains functions to
traverse the document and to copy/move/add/delete atoms in the document.
When a filter reads a document, it stores the content as atoms in the
DataCapsule. Likewise, when a filter wants to write a document, it traverses
the document atom by atom and writes the file.
In theory the way atoms store data is unknown to the filters, which only have
access functions, but for practical purposes (to avoid making tons of new
header files) we need to define some "standard" format to use, e.g. CSS for
styles.
The exact definition of atoms is not yet settled and will only be so after
some long discussions.
However, it is important that the DataCapsule gives true format independence,
because it is never used as a target file format. We might choose to continue
using CSS/HTML internally, but that will most likely be expanded with our own
tags and will therefore differ from what the HTML filter writes.
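Since the atom format is still undefined, the following is only a sketch of the access-function idea, under the assumption that an atom is a tagged node with text and child atoms. The `Atom` layout and the `add`, `remove` and `traverse` functions are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical atom: the real definition is still open for discussion.
struct Atom {
    std::string tag;
    std::string text;
    std::vector<std::unique_ptr<Atom>> children;

    Atom(std::string t, std::string x) : tag(std::move(t)), text(std::move(x)) {}
};

class DataCapsule {
public:
    using Visitor = std::function<void(const Atom&, int depth)>;

    Atom& root() { return root_; }

    // Add a new atom below `parent` and return it (what a reading filter does).
    Atom& add(Atom& parent, const std::string& tag, const std::string& text) {
        parent.children.push_back(std::make_unique<Atom>(tag, text));
        return *parent.children.back();
    }

    // Delete the child at `index` together with its subtree.
    // Returns false if the index is out of range.
    bool remove(Atom& parent, std::size_t index) {
        if (index >= parent.children.size()) {
            return false;
        }
        parent.children.erase(parent.children.begin() + static_cast<std::ptrdiff_t>(index));
        return true;
    }

    // Visit every atom depth-first (what a writing filter does).
    void traverse(const Visitor& visit) const { walk(root_, 0, visit); }

private:
    static void walk(const Atom& atom, int depth, const Visitor& visit) {
        visit(atom, depth);
        for (const auto& child : atom.children) {
            walk(*child, depth + 1, visit);
        }
    }

    Atom root_{"document", ""};
};
```

A writing filter would pass a visitor to `traverse` and emit its own file format per atom, without knowing how the capsule stores the tree.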
## Filters
Filters are format converters; a filter converts between a specific file format
and the DataCapsule.
A filter contains a set of predefined functions:
- read file
- write file
- register file extensions
- get statistics
A filter lives independently and can therefore be black-box tested independently.
This is important since it simplifies testing heavily, because we do not need to
test all combinations of filters.
### XML
XML is a test filter that basically dumps the DataCapsule. This allows dfutil to
compare an old XML file with a new XML file, to check whether a test case caused
differences.
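The old-versus-new comparison could be as simple as a line-by-line check of two dumps. This is a sketch only; `splitLines` and `differingLines` are hypothetical helpers, not functions known to exist in dfutil.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a dump into lines.
inline std::vector<std::string> splitLines(const std::string& text) {
    std::vector<std::string> lines;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        lines.push_back(line);
    }
    return lines;
}

// Return the 1-based numbers of the lines that differ between two dumps.
// A line that exists in only one of the dumps counts as a difference.
inline std::vector<std::size_t> differingLines(const std::string& oldDump,
                                               const std::string& newDump) {
    const std::vector<std::string> a = splitLines(oldDump);
    const std::vector<std::string> b = splitLines(newDump);
    std::vector<std::size_t> diffs;
    const std::size_t n = std::max(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
        const bool same = i < a.size() && i < b.size() && a[i] == b[i];
        if (!same) {
            diffs.push_back(i + 1);
        }
    }
    return diffs;
}
```

An empty result would mean the test case left the dumped DataCapsule unchanged.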
### ODF
This filter exists, but needs to be extended for
spreadsheets/presentations/drawings.
### PDF
There are programs that can generate PDF files, so the filter is not high
priority, just nice to have.
### HTML
Note that even if Socrates ends up using HTML internally, we should still have
an HTML filter.
Think of the following situation: a user reports a rendering bug, we change
the DataCapsule so the HTML is correct, and the user is happy, but by doing
that we broke the conversion between ODF and OXML.
If we have a filter, we make the change in the filter, keep the internal
representation, and have no side effects.
### OXML
This filter exists, but needs to be extended for
spreadsheets/presentations/drawings.
### LATEX
This filter exists, but needs to be finished.
# How to
This is a prioritized list of how to get from where we are to where we would
like to be.
## Rearrange source
We will make new main directories (consumers, filters) and move the code into
these directories.
This helps new people understand which part to attack. Currently DocFormat
contains multiple libraries for multiple purposes. It is, for example, not
evident that "word" contains the OXML filter.
## Remove cross references
Currently any file potentially uses functions in any other file (it's not that
bad, but we don't have a call graph).
Functions which are used in multiple places should move to "helpers", so that
it's very clear which functions are common and which are not.
## Isolate 3rd party and OS
A special variant of cross references is the use of 3rd party header files.
Changing that will require new code in helpers.
OS functions do not seem to be isolated in the current source; isolating them
will make porting easier.
If we have a portability library, we can test it once on all platforms, and
knowing that it works means we don't need to test e.g. filters on all platforms.
## Add interfaces
Having a clean code structure allows us to add interfaces.
Filters especially call for a C++ base class from which each individual filter
inherits. Doing so makes Management a lot easier to program, because the
concrete type is known only when the filter is allocated.
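A sketch of what such an interface could look like, assuming the four predefined filter functions become virtual member functions. The names `Filter`, `TextFilter` and `makeFilter`, and the one-paragraph-per-line stand-in for the DataCapsule, are hypothetical and chosen only to keep the example self-contained.

```cpp
#include <cstddef>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the real DataCapsule, whose atom format is not yet defined.
struct DataCapsule {
    std::vector<std::string> paragraphs;
};

// Common interface; Management would only ever see this type.
class Filter {
public:
    virtual ~Filter() = default;
    virtual void read(std::istream& in, DataCapsule& capsule) = 0;
    virtual void write(std::ostream& out, const DataCapsule& capsule) = 0;
    virtual std::vector<std::string> extensions() const = 0;
    virtual std::size_t atomsProcessed() const = 0;  // "get statistics"
};

// Toy filter: one paragraph per line of plain text.
class TextFilter : public Filter {
public:
    void read(std::istream& in, DataCapsule& capsule) override {
        std::string line;
        while (std::getline(in, line)) {
            capsule.paragraphs.push_back(line);
            ++processed_;
        }
    }

    void write(std::ostream& out, const DataCapsule& capsule) override {
        for (const std::string& paragraph : capsule.paragraphs) {
            out << paragraph << '\n';
            ++processed_;
        }
    }

    std::vector<std::string> extensions() const override { return {"txt"}; }
    std::size_t atomsProcessed() const override { return processed_; }

private:
    std::size_t processed_ = 0;
};

// The concrete type is named only here, at allocation time.
// Returns nullptr for an unknown suffix.
inline std::unique_ptr<Filter> makeFilter(const std::string& suffix) {
    if (suffix == "txt") {
        return std::make_unique<TextFilter>();
    }
    return nullptr;
}
```

After `makeFilter` returns, the calling code works purely through `Filter`, so adding a new filter would only require a new subclass and one more branch (or registry entry) at the allocation point.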
## Define data capsule
This is the last part and for sure the biggest discussion. For the moment this
is left for a future discussion.