aboutsummaryrefslogtreecommitdiffhomepage
path: root/data/doc/sisu/v2/sisu_markup_samples/sisu_manual/sisu_description.sst
diff options
context:
space:
mode:
Diffstat (limited to 'data/doc/sisu/v2/sisu_markup_samples/sisu_manual/sisu_description.sst')
-rw-r--r--data/doc/sisu/v2/sisu_markup_samples/sisu_manual/sisu_description.sst34
1 files changed, 19 insertions, 15 deletions
diff --git a/data/doc/sisu/v2/sisu_markup_samples/sisu_manual/sisu_description.sst b/data/doc/sisu/v2/sisu_markup_samples/sisu_manual/sisu_description.sst
index 5cd8c602..fe3b5c46 100644
--- a/data/doc/sisu/v2/sisu_markup_samples/sisu_manual/sisu_description.sst
+++ b/data/doc/sisu/v2/sisu_markup_samples/sisu_manual/sisu_description.sst
@@ -58,15 +58,15 @@ SiSU is a flexible document preparation, generation publishing and search system
SiSU ("SiSU information Structuring Universe" or "Structured information, Serialized Units"),~{ also chosen for the meaning of the Finnish term "sisu". }~ is a Unix command line oriented framework for document structuring, publishing and search. Featuring minimalistic markup, multiple standard outputs, a common citation system, and granular search.
-Using markup applied to a document, SiSU can produce plain text, HTML, XHTML, XML, OpenDocument, LaTeX or PDF files, and populate an SQL database with objects~{ objects include: headings, paragraphs, verse, tables, images, but not footnotes/endnotes which are numbered separately and tied to the object from which they are referenced. }~ (equating generally to paragraph-sized chunks) so searches may be performed and matches returned with that degree of granularity (e.g. your search criteria is met by these documents and at these locations within each document). Document output formats share a common object numbering system for locating content. This is particularly suitable for "published" works (finalized texts as opposed to works that are frequently changed or updated) for which it provides a fixed means of reference of content.
+Using markup applied to a document, SiSU can produce plain text, HTML, XHTML, XML, OpenDocument, EPUB, LaTeX or PDF files, and populate an SQL database with objects~{ objects include: headings, paragraphs, verse, tables, images, but not footnotes/endnotes which are numbered separately and tied to the object from which they are referenced. }~ (equating generally to paragraph-sized chunks) so searches may be performed and matches returned with that degree of granularity (e.g. your search criteria is met by these documents and at these locations within each document). Document output formats share a common object numbering system for locating content. This is particularly suitable for "published" works (finalized texts as opposed to works that are frequently changed or updated) for which it provides a fixed means of reference of content.
-SiSU is the data/information structuring and transforming tool, that has resulted from work on one of the oldest law web projects. It makes possible the one time, simple human readable markup of documents, that SiSU can then publish in various forms, suitable for paper~{ pdf via LaTeX or lout }~, web~{ currently html (two forms of html presentation one based on css the other on tables), and /PHP/; potentially structured XML }~ and relational database~{ any SQL - currently PostgreSQL and /sqlite/ (for portability, testing and development) }~ presentations, retaining common data-structure and meta-information across the output/presentation formats. Several requirements of legal and scholarly publication on the web have been addressed, including the age old need to be able to reliably cite/pinpoint text within a document, to easily make footnotes/endnotes, to allow for semantic document meta-tagging, and to keep required markup to a minimum. These and other features of interest are listed and described below. A few points are worth making early (and will be repeated a number of times):
+SiSU is the data/information structuring and transforming tool, that has resulted from work on one of the oldest law web projects. It makes possible the one time, simple human readable markup of documents, that SiSU can then publish in various forms, suitable for paper~{ pdf via LaTeX }~, web~{ currently html (two forms of html presentation one based on css the other on tables), and /PHP/; potentially structured XML }~ and relational database~{ any SQL - currently PostgreSQL and /sqlite/ (for portability, testing and development) }~ presentations, retaining common data-structure and meta-information across the output/presentation formats. Several requirements of legal and scholarly publication on the web have been addressed, including the age old need to be able to reliably cite/pinpoint text within a document, to easily make footnotes/endnotes, to allow for semantic document meta-tagging, and to keep required markup to a minimum. These and other features of interest are listed and described below. A few points are worth making early (and will be repeated a number of times):
_1 (i) The SiSU document generator was the first to place material on the web with a system that makes possible citation across different document types, with paragraph, or rather object citation numbering~{ previously called "text object numbering" }~ a text positioning system, available for the pinpointing of text, 1997, a simple idea from which much benefit, and SiSU remains today, to the best of my knowledge, the only multiple format e-book/ electronic-document system on the web that gives you this possibility (including for relational databases).
_1 (ii) Markup is done once for the multiple formats produced.
-_1 (iii) Markup is simple, and human readable (with a little practice), in almost all cases there is less and simpler markup required than basic html. In any event the markup required is very much simpler than the html, LaTeX, [lout], structured XML, ODF (OpenDocument), PostgreSQL or SQLite feed etc. that you can have SiSU generate for you.
+_1 (iii) Markup is simple, and human readable (with a little practice), in almost all cases there is less and simpler markup required than basic html. In any event the markup required is very much simpler than the html, EPUB, LaTeX, [lout], structured XML, ODF (OpenDocument), PostgreSQL or SQLite feed etc. that you can have SiSU generate for you.
_1 (iv) SiSU is a batch processor, dealing with as many files as you need to generate at a time.
@@ -76,11 +76,11 @@ SiSU Sabaki~{ SiSU Sabaki, release version. Pre-release version SiSU Scribe, and
SiSU was born of the need to find a way, with minimal effort, and for as wide a range of document types as possible, to produce high quality publishing output in a variety of document formats. As such it was necessary to find a simple document representation that would work across a large number of document types, and the most convenient way(s) to produce acceptable output formats. The project leading to this program was started in 1993 (together with the trade law project now known as Lex Mercatoria) as an investigation of how to effectively/efficiently place documents on the web. The unified document handling, together with features such as paragraph numbering, endnote handling and tables... appeared in 1996/97. SiSU was originally written in Perl,~{ http://www.perl.org/ }~ and converted to Ruby,~{ http://www.ruby-lang.org/en/ }~ in 2000, one of the most impressive programming languages in existence! In its current form it has been written to run on the Gnu/Linux platform, and in particular on Debian,~{ http://www.debian.org/ }~ taking advantage of many of the wonderful projects that are available there.
-SiSU markup is based on requiring the minimum markup needed to determine the structure of a document. (This can be as little as saying in a header to look for the word Book at a specified level and the word Chapter at another level). SiSU then breaks a document into its smallest parts (at a heading, and paragraph level) while retaining all structural information. This break up of the document and information on its structure is taken advantage of in the transformations made in generating the very different output types that can be created, and in providing as much as can be for what each output type is best at doing, e.g. LaTeX (professional document typesetting, easy conversion to pdf or Postscript), XML (in this case, structural representation), ODF (OpenDocument [experimental]), SQL (e.g. document search; representing constituent parts of documents based on their structure, headings, chapters, paragraphs as required; user control).~{ where explicit structure is provided through the use of tagging headings, it could be reduced (still) further, for example by reducing the number of characters used to identify heading levels; but in many cases even that information is not required as regular expressions can be used to extract the implicit structure. }~
+SiSU markup is based on requiring the minimum markup needed to determine the structure of a document. (This can be as little as saying in a header to look for the word Book at a specified level and the word Chapter at another level). SiSU then breaks a document into its smallest parts (at a heading, and paragraph level) while retaining all structural information. This break up of the document and information on its structure is taken advantage of in the transformations made in generating the very different output types that can be created, and in providing as much as can be for what each output type is best at doing, e.g. LaTeX (professional document typesetting, easy conversion to pdf or Postscript), EPUB, XML (in this case, structural representation), ODF (OpenDocument [experimental]), SQL (e.g. document search; representing constituent parts of documents based on their structure, headings, chapters, paragraphs as required; user control).~{ where explicit structure is provided through the use of tagging headings, it could be reduced (still) further, for example by reducing the number of characters used to identify heading levels; but in many cases even that information is not required as regular expressions can be used to extract the implicit structure. }~
From markup that is simpler and more sparse than html you get:
-_* far greater output possibilities, including html, XML, ODF (OpenDocument), LaTeX (pdf), and SQL;
+_* far greater output possibilities, including html, EPUB, XML, ODF (OpenDocument), LaTeX (pdf), and SQL;
_* the advantages implicit in the very different output possibilities;
@@ -88,7 +88,7 @@ _* a common citation system (for all outputs - including the relational database
For more see the short summary of features provided below.
-SiSU processes files with minimal tagging to produce various document outputs including html, LaTeX or lout (which is converted to pdf) and if required loads the structured information into an SQL database (PostgreSQL and SQLite have been used for this). SiSU produces an intermediate processing format.~{ This proved to be the easiest way to develop syntax, changes could be made, or alternatives provided for the markup syntax whilst the intermediate markup syntax was largely held constant. There is actually an optional second intermediate markup format in YAML http://www.yaml.org/ }~
+SiSU processes files with minimal tagging to produce various document outputs including html, EPUB, ODF, LaTeX (which is converted to pdf) and if required loads the structured information into an SQL database (PostgreSQL and SQLite have been used for this). SiSU produces an intermediate processing format.~{ This proved to be the easiest way to develop syntax, changes could be made, or alternatives provided for the markup syntax whilst the intermediate markup syntax was largely held constant. There is actually an optional second intermediate markup format in YAML http://www.yaml.org/ }~
SiSU is used in constructing Lex Mercatoria http://lexmercatoria.org/ or http://www.jus.uio.no/lm/ (one of the oldest law web sites), and considerable thought went into producing output that would be suitable for legal and academic writings (that do not have formulae) given the limitations of html, and publication in a wide variety of "formats", in particular in relation to the convenient and accurate citation of text. However, the construction of Lex Mercatoria uses only a fraction of the features available from SiSU today, /vis/ generation of flat file structures, rather than in addition the building of ("granular") SQL database content, (at an object level with relevant relational tables, and other outputs also available).
@@ -109,10 +109,10 @@ notes:
* markup defines document structure (this may be done once in a header pattern-match description, or for heading levels individually); basic text attributes (bold, italics, underscore, strike-through etc.) as required; and semantic information related to the document (header information, extended beyond the Dublin core and easily further extended as required); the headers may also contain processing instructions.
!_ (iii)
-(a) multiple outputs primarily industry established and institutionally accepted open standard formats, include amongst others: plaintext (UTF-8); html; (structured) XML; ODF (Open Document text)l; LaTeX; PDF (via LaTeX); SQL type databases (currently PostgreSQL and SQLite). Also produces: concordance files; document content certificates (md5 or sha256 digests of headings, paragraphs, images etc.) and html manifests (and sitemaps of content). (b) takes advantage of the strengths implicit in these very different output types, (e.g. PDFs produced using typesetting of LaTeX, databases populated with documents at an individual object/paragraph level, making possible granular search (and related possibilities))
+(a) multiple outputs primarily industry established and institutionally accepted open standard formats, include amongst others: plaintext (UTF-8); html; EPUB; (structured) XML; ODF (Open Document text)l; LaTeX; PDF (via LaTeX); SQL type databases (currently PostgreSQL and SQLite). Also produces: concordance files; document content certificates (md5 or sha256 digests of headings, paragraphs, images etc.) and html manifests (and sitemaps of content). (b) takes advantage of the strengths implicit in these very different output types, (e.g. PDFs produced using typesetting of LaTeX, databases populated with documents at an individual object/paragraph level, making possible granular search (and related possibilities))
!_ (iv)
-outputs share a common numbering system (dubbed "object citation numbering" (ocn)) that is meaningful (to man and machine) across various digital outputs whether paper, screen, or database oriented, (PDF, html, XML, sqlite, postgresql), this numbering system can be used to reference content.
+outputs share a common numbering system (dubbed "object citation numbering" (ocn)) that is meaningful (to man and machine) across various digital outputs whether paper, screen, or database oriented, (PDF, html, EPUB, XML, Opendocument, sqlite, postgresql), this numbering system can be used to reference content.
!_ (v)
SQL databases are populated at an object level (roughly headings, paragraphs, verse, tables) and become searchable with that degree of granularity, the output information provides the object/paragraph numbers which are relevant across all generated outputs; it is also possible to look at just the matching paragraphs of the documents in the database; [output indexing also work well with search indexing tools like hyperesteier].
@@ -147,7 +147,7 @@ possible to pre-process, which permits: the easy creation of standard form docum
there is a considerable degree of future-proofing, output representations are "upgradeable", and new document formats may be added.
!_ (xv)
-there is a considerable degree of future-proofing, output representations are "upgradeable", and new document formats may be added: (a) modular, (thanks in no small part to Ruby) another output format required, write another module.... (b) easy to update output formats (eg html, XHTML, LaTeX/PDF produced can be updated in program and run against whole document set), (c) easy to add, modify, or have alternative syntax rules for input, should you need to,
+there is a considerable degree of future-proofing, output representations are "upgradeable", and new document formats may be added: (a) modular, (thanks in no small part to Ruby) another output format required, write another module.... (b) easy to update output formats (eg html, XHTML, EPUB, LaTeX/PDF produced can be updated in program and run against whole document set), (c) easy to add, modify, or have alternative syntax rules for input, should you need to,
!_ (xvi)
scalability, dependent on your file-system (ext3, Reiserfs, XFS, whatever) and on the relational database used (currently Postgresql and SQLite), and your hardware,
@@ -168,7 +168,7 @@ remote operations: (a) run SiSU on a remote server, (having prepared sisu markup
document source may be bundled together (automatically) with associated documents (multiple language versions or master document with inclusions) and images and sent as a zip file called a sisupod, if shared on the net these too may be processed locally to produce the desired document outputs, these may be downloaded, shared as email attachments, or processed by running sisu against them, either using a url or the filename.
!_ (xxii)
-for basic document generation, the only software dependency is Ruby, and a few standard Unix tools (this covers plaintext, html, XML, ODF, LaTeX). To use a database you of course need that, and to convert the LaTeX generated to PDF, a LaTeX processor like tetex or texlive.
+for basic document generation, the only software dependency is Ruby, and a few standard Unix tools (this covers plaintext, html, EPUB, XML, ODF, LaTeX). To use a database you of course need that, and to convert the LaTeX generated to PDF, a LaTeX processor like tetex or texlive.
as a developers tool it is flexible and extensible
@@ -176,7 +176,7 @@ SiSU was developed in relation to legal documents, and is strong across a wide v
SiSU has been developed and has been in use for several years. Requirements to cover a wide range of documents within its use domain have been explored.
-Some modules are more mature than others, the most mature being Html and LaTeX / pdf. PostgreSQL and search functions are useable and together with /ocn/ unique (to the best of my knowledge). The XML output document set is "well formed" but largely proof of concept.
+Some modules are more mature than others, the most mature being html and LaTeX / pdf. PostgreSQL and search functions are useable and together with /ocn/ unique (to the best of my knowledge). The XML output document set is "well formed" but largely proof of concept.
2~ How it works
@@ -184,7 +184,7 @@ SiSU markup is fairly minimalistic, it consists of: a (largely optional) documen
2~ Simple markup
-SiSU markup is based on requiring the minimum markup needed to determine the structure of a document. (This can be as little as saying in a header to look for the word Book at a specified level and the word Chapter at another level). SiSU then breaks a document into its smallest parts (at a heading, and paragraph level) while retaining all structural information. This break up of the document and information on its structure is taken advantage of in the transformations made in generating the very different output types that can be created, and in providing as much as can be for what each output type is best at doing, e.g. LaTeX (professional document typesetting, easy conversion to pdf or Postscript), XML (in this case, structural representation), ODF (OpenDocument), SQL (e.g. document search; representing constituent parts of documents based on their structure, headings, chapters, paragraphs as required; user control).~{ where explicit structure is provided through the use of tagging headings, it could be reduced (still) further, for example by reducing the number of characters used to identify heading levels; but in many cases even that information is not required as regular expressions can be used to extract the implicit structure. }~
+SiSU markup is based on requiring the minimum markup needed to determine the structure of a document. (This can be as little as saying in a header to look for the word Book at a specified level and the word Chapter at another level). SiSU then breaks a document into its smallest parts (at a heading, and paragraph level) while retaining all structural information. This break up of the document and information on its structure is taken advantage of in the transformations made in generating the very different output types that can be created, and in providing as much as can be for what each output type is best at doing, e.g. LaTeX (professional document typesetting, easy conversion to pdf or Postscript), EPUB, XML (in this case, structural representation), ODF (OpenDocument), SQL (e.g. document search; representing constituent parts of documents based on their structure, headings, chapters, paragraphs as required; user control).~{ where explicit structure is provided through the use of tagging headings, it could be reduced (still) further, for example by reducing the number of characters used to identify heading levels; but in many cases even that information is not required as regular expressions can be used to extract the implicit structure. }~
3~ Sparse markup requirement, try to get the most out of markup
@@ -270,7 +270,7 @@ The object citation number markers contain additional numbering information with
An advantage is that the numbering remains the same regardless of document structure.
-Text object ("paragraph") numbering is the same for all output versions of the same document, vis html, pdf, pgsql, yaml etc.
+Text object ("paragraph") numbering is the same for all output versions of the same document, vis html, epub, pdf, pgsql, etc.
In the relational database, as individual text objects of a document stored (and indexed) together with object numbers, and all versions of the document have the same numbering, the results of searches may be tailored just to provide the location of the search result in all available document formats.
@@ -431,6 +431,10 @@ _* {~^ *w3m* }http://w3m.sourceforge.net/
The html tables output is rendered more accurately across a wider variety set and older versions of browsers (than the html css output).
+3~ EPUB
+
+SiSU generates EPUB documents.
+
3~ XML
SiSU generates well formed XML, and multiple versions. An XML SAX version with a flat/shallow structure, and XML DOM version with a deeper (embedded) structure. There is also a released working xhtml module. Examples of SAX and DOM versions are provided within this document.
@@ -478,7 +482,7 @@ This is a larger scale project, (with little development on the front end largel
{~^ Sample search frontend }http://search.sisudoc.org
A small database and sample query front-end (search from) that makes use of the citation system, _{object citation numbering}_ to demonstrates functionality.~{ (which could be extended further with current back-end). As regards scaling of the database, it is as scalable as the database (here Postgresql) and hardware allow. }~
-SiSU can provide information on which documents are matched and at what locations within each document the matches are found. These results are relevant across all outputs using object citation numbering, which includes html, XML, LaTeX, PDF and indeed the SQL database. You can then refer to one of the other outputs or in the SQL database expand the text within the matched objects (paragraphs) in the documents matched.
+SiSU can provide information on which documents are matched and at what locations within each document the matches are found. These results are relevant across all outputs using object citation numbering, which includes html, EPUB, XML, LaTeX, PDF and indeed the SQL database. You can then refer to one of the other outputs or in the SQL database expand the text within the matched objects (paragraphs) in the documents matched.
(further work needs to be done on the sample search form, which is rudimentary and only passes simple booleans correctly at present to the SQL engine)
@@ -507,7 +511,7 @@ Expand those same searches, showing the matching text in each document:
Note you may set results either for documents matched and object number locations within each matched document meeting the search criteria; or display the names of the documents matched along with the objects (paragraphs) that meet the search criteria.~{ of this feature when demonstrated to an IBM software innovations evaluator in 2004 he said to paraphrase: this could be of interest to us. We have large document management systems, you can search hundreds of thousands of documents and we can tell you which documents meet your search criteria, but there is no way we can tell you without opening each document where within each your matches are found. }~
!_ OCN index mode,
-(object citation number) the numbers displayed are relevant (and may be used to reference the match) in any sisu generated rendition of the text~{ OCN are provided for HTML, XML, pdf ... though currently omitted in plain-text and opendocument format output }~ the links provided are to the locations of matches within the html generated by SiSU.
+(object citation number) the numbers displayed are relevant (and may be used to reference the match) in any sisu generated rendition of the text~{ OCN are provided for HTML, XML, EPUB, pdf ... though currently omitted in plain-text and opendocument format output }~ the links provided are to the locations of matches within the html generated by SiSU.
!_ Paragraph mode,
you may alternatively display the text of each paragraph in which the match was made, again the object/paragraph numbers are relevant to any SiSU generated/published text.