path: root/markup/pod/sisu-manual/media/text/en/sisu_markup.sst
author: Ralph Amissah <ralph.amissah@gmail.com> 2023-07-04 23:52:33 -0400
committer: Ralph Amissah <ralph.amissah@gmail.com> 2023-07-08 15:57:51 -0400
commit: 30b5b153afb272e3a7c5c6e76775cffecaea0a78
tree: a5fe5e6b0857af06051da6ae08d708fddd678151 /markup/pod/sisu-manual/media/text/en/sisu_markup.sst
parent: homepage updates, re-read
add sisu spine description to sisu_markup, review
Diffstat (limited to 'markup/pod/sisu-manual/media/text/en/sisu_markup.sst')
-rw-r--r-- markup/pod/sisu-manual/media/text/en/sisu_markup.sst | 241
1 files changed, 241 insertions, 0 deletions
diff --git a/markup/pod/sisu-manual/media/text/en/sisu_markup.sst b/markup/pod/sisu-manual/media/text/en/sisu_markup.sst
index ae87b76..f9a27d3 100644
--- a/markup/pod/sisu-manual/media/text/en/sisu_markup.sst
+++ b/markup/pod/sisu-manual/media/text/en/sisu_markup.sst
@@ -50,6 +50,247 @@ make:
:A~ @title-author-date
+:B~ SiSU Description
+
+1~ SiSU Description
+
+SiSU is an object-centric, lightweight markup based, document structuring,
+parser, publishing and search tool for document collections. It is command line
+oriented and generates static content that is currently made searchable at an
+object level through an SQL database. Markup helps define (delineate) objects
+(primarily various types of text block) which are tracked in sequence,
+substantive objects being numbered sequentially by the program for object
+citation.
+
+!_ Summary.
+An object is a unit of text within a document, the most common being a paragraph.
+Objects include individual headings, paragraphs, tables, grouped text of various
+types such as code blocks and, within poems, verse. Objects have properties and
+attributes; of particular significance are headings and their levels, which
+provide document structure. A heading is an object with a hierarchical value,
+that conceptually contains other objects (such as paragraphs and possibly
+sub-headings etc.). Objects are tracked sequentially as they relate to each
+other within a document, and substantive objects are numbered
+sequentially, for citation purposes. Notably footnotes are not objects in
+themselves, rather belonging to the object from which they are referenced, and
+following their own numbering sequence. From heading objects (linked) tables of
+content may be generated, and if additional metadata is provided book type
+indexes can be generated that link back to the objects to which they relate.
+
+!_ Unpacking this a bit further.
+SiSU as a concept, independent of its markup language and the parsers that have
+been implemented, is based on the following ideas:
+
+!_ Object-Centricity. On objects:
+In SiSU, objects are the fundamental units from which larger constructs within a
+document, and the document itself, are built. Breaking the document into objects
+provides interesting possibilities.
+
+!_ Objects are fundamental building blocks:
+Conceptually within SiSU, objects are the building blocks or individual units of
+construction of a document. Objects are usually blocks of text, the most common
+of which is the paragraph; other examples include individual headings, tables,
+grouped text of various types which include code blocks and verse within poems,
+... and as mentioned an object could also, for example, be an image. Objects can
+be formatted and placed as needed, providing flexibility and enabling multiple
+types of representation across disparate formats and text receptacles, examples
+including html, epub, latex (in the past mind-maps) and sql (populated at an
+object level, and thereby providing search with that degree of granularity).
+
+!_ Sequential. Objects have sequence:
+That objects have sequence goes largely without saying; this follows
+authorship, and is part of the definition of a document and how a document is
+written to convey meaning.
+
+!_ Object Numbers & Citation. Substantive objects are numbered for citation purposes:
+Most objects within a document are meant by the author to be a substantive part
+of the document. All such objects are numbered sequentially and can be
+referenced thereby for citation purposes. Object numbers provide the possibility
+of citing/locating text precisely across different document formats and
+different languages (assuming the document has been translated). For search it
+also makes it possible to identify precisely where search criteria are met within
+each document in the form of an index, or to view those precise text objects
+before deciding which documents are of interest. Additionally the use of objects
+(and that objects are numbered) frees the possibility to represent the document
+in the manner considered most suitable to a specific document format whilst
+retaining its structural (and citation) integrity.
+
+!_ Characteristics. Objects have properties and attributes:
+Objects have properties (and may have attributes). By properties I here refer to
+the fundamental type of object, be it a heading, a paragraph, table, verse etc.
+Attributes extend further and may include other things that one might wish to
+associate with the object (examples, not necessarily currently available or
+implemented in SiSU, might include formatting, e.g. whether it is indented, or
+metadata, e.g. the associated language or programming language of a code block).
+
+!_ Document structure. Heading objects hold document structure:
+Heading objects hold document structure through their heading level property.
+The types of document of interest to SiSU have structure that is captured by the
+heading level property. Headings are individual objects like any other with the
+additional properties that (i) they may be regarded as containing the other
+objects following them sequentially (until the next heading of the same or
+higher level), so heading objects may include other headings (sub-headings), and
+(ii) they have a hierarchy, the root "heading" being the document title. \\
+A complication was introduced to provide greater flexibility across document
+output formats. Headings have two sets of levels: the level under which
+substantive text occurs, which would be a chapter or segment level, and above
+that in the hierarchy, if needed, document section separators: book, section,
+part.
+
+!_ Non-objects
+Most but not all parts of a document are treated as objects. Notably footnotes
+are not objects in themselves, rather belonging to the object from which they
+are referenced, and following their own numbering sequence. From heading objects
+(linked) tables of content may be generated, and if additional metadata is
+provided book type indexes can be generated that link back to the objects to
+which they relate.
+
+!_ The Document Header.
+SiSU documents have headers which contain document metadata, at a minimum the
+document title and author. In addition the document header may contain markup
+instructions (e.g. how to identify headings within the document, in which case
+those headings need not be marked up, but are found and treated accordingly).
+
+SiSU parsers have now been implemented in different programming paradigms and
+languages a couple of times; the chosen markup has been left unchanged, though
+the document headers have been modified.
+
+This is the core of sisu, beyond which there is more, but largely in the form of
+choices based on ... existing output formats and of implementation detail:
+deciding what attributes of objects, or within objects, should be supported, and
+extending markup to allow for the generation of book indexes if tagging is
+provided.
+
+2~ Older Descriptions
+
+Here is a description that has been used for the original sisu (scribe):
+
+With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax
+in your text editor of choice, SiSU can generate various document formats, most
+of which share a common object numbering system for locating content, including
+plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX, PDF
+files, and populate an SQL database with objects (roughly paragraph-sized
+chunks) so searches may be performed and matches returned with that degree of
+granularity. Think of being able to finely match text in documents, using common
+object numbers, across different output formats (same object identifier for pdf,
+epub or html) and across languages if you have translations of the same document
+(same object identifier across languages). For search, your criteria are met by
+these documents at these locations within each document (equally relevant across
+different output formats and languages). To be clear (if obvious) page numbers
+provide none of this functionality. Object numbering is particularly suitable
+for "published" works (finalized texts as opposed to works that are frequently
+changed or updated) for which it provides a fixed means of reference of content.
+Document outputs can also share provided semantic meta-data.
+
+2~ ...
+
+SiSU is less about document layout than it is about finding a way using little
+markup to construct an abstract representation of a document that makes it
+possible to produce multiple representations of it which may be rather different
+from each other and used for different purposes, whether layout and publishing,
+scrollworthy online viewing/reading, or content search. The aim is to take
+advantage, from a minimal-preparation starting point, of some of the strengths
+of rather different established ways of representing documents for different
+purposes, whether for search (relational database, or indexed flat files
+generated for that purpose whether of complete documents, or say of files made
+up of objects), online or other electronic viewing (e.g. html, xml, epub), or
+paper publication (e.g. pdf via latex)...
+
+The solution arrived at is to extract structural information about the document
+(document sections and headings within the document, available through pattern
+matching or markup) and to track objects (which primarily are defined units of
+text such as paragraphs, headings, tables, verse, etc. but also images) which
+can be reconstituted as the same documents with relevant object identification
+numbers so text (objects) can be referenced across different output formats and
+presentations.
+
+SiSU generates tables of content, and its markup provides the means for metadata
+to be provided for the generation of book style indexes for a document (that
+again due to document object numbers are the same and equally relevant across
+all document formats). Per document classifying/organizing metadata can also be
+provided for automated document curation.
+
+... there have also been working experiments with sisu markup source, two way
+conversion/representation of sisu document markup source in mind-mapping
+(software kdissert was used for its strong focus on producing documents (now
+apparently called semantik)); also po4a software for translators has been used
+successfully in its regular text mode for sisu markup in translation (which is
+more an attribute of po4a than of sisu, but) which is of interest due to
+sisu/spine's object citation numbering being available across translations. Open
+Document Format text (odf:odt), has been an output, but much more interesting
+(and requested by potential users of sisu/spine) would be the ability of a word
+processor to save text/a document in sisu markup, making alternative document
+processing and presentations with sisu possible.
+
+Also worth mention: in the relatively long history of this project, there has
+been work done on extracting hash representations of each object, that could
+hypothetically be shared to prove the content of a document without sharing its
+content, or of identifying which objects change; these hashes can also be used
+as unique identifiers in a database or as identifying filenames if individual
+objects are saved.
+
+SiSU has evolved; the current implementation focuses on one primary use-case,
+books and literary writings. However the concept on which it is based has wider
+application. Here is a previously posted souvenir from my encounter with an IBM
+software evaluator in London in June 2004 that came about through a chance
+encounter with an IBM manager at a Linux Expo, who was curious about my interest
+in GNU/Linux with my legal background... on hearing that I also wrote software,
+he suggested, maybe IBM should have a look at it. I was interested, the meeting
+was set up... with an IBM, Software Innovations evaluator<br>His response after
+the meeting:
+
+"Ralph \\ Good to meet with you today, I was very impressed with your
+software. \\ /{ [colleague's name (also posted to an IBM colleague)] }/ - in
+summary - Ralph has built an application that runs on linux and takes ASCII
+documents and pulls them apart in to the smallest constituent parts, storing
+them as XML, PDF and HTML, the HTML are hyperlinked up so the document can be
+browsed in its full form. the format and text data created is stored in a
+database.<br>This has potential in any place that needs the power of full text
+search whilst holding the structural concepts of the document i.e. legal,
+pharma, education, research.. which ones we need to figure out, ..."
+
+Special interest was expressed in the search implications of SiSU. To
+paraphrase, the company has document management systems dealing with hundreds of
+thousands of texts; these tell you which documents match your search criteria,
+but cannot inform you where within a text these matches were found without
+opening the documents. This is achieved through defining document objects and
+making them the building block of the document, trackable document objects (that
+can be placed back in the context of the document or corpus of documents if part
+of a collection). SiSU's early design was to abstract documents to their
+structure and identified objects, numbered in a citable way (as pointed out,
+document object hashes can be of use for the purpose).
+
+2~ SiSU Spine
+
+SiSU Spine is the new generator for documents prepared in sisu markup, written
+in D as opposed to the original sisu which was first shared in Ruby.
+
+Spine code has not as yet been made publicly available.
+
+As compared with the original sisu generator, sisu spine:
+
+- Spine uses the same document markup for the document body, but uses yaml for
+document headers (which contain document metadata and configuration details);
+the original sisu has a bespoke markup for headers.
+
+- Spine (written in D) is considerably faster at generating native output than
+sisu (written in Ruby), on last test at least 60 times faster (what took 1
+minute takes 1 second; 1 hour a minute :-) (admittedly some time ago, ruby has
+been getting faster, hopefully this is not over-promising).
+
+- Spine produces fewer document output types than sisu (html, epub, (odt,
+latex) and populates sql db for search).
+
+- As regards non-native output, so far Spine has greater separation of what it
+does and largely leaves calling the external program to the user, e.g.: latex
+output is a native output in the sense that it is generated directly by spine,
+but the pdfs that can be produced from these are produced through use of an
+external program xelatex, which produces fine output but is a very much slower
+process.
+
+- Where both produce the same output type, Spine generally produces
+more up to date output format representations.
+
:B~ SiSU Markup
={ SiSU markup:test }