author Ralph Amissah <ralph.amissah@gmail.com> 2026-05-22 15:59:17 -0400
committer Ralph Amissah <ralph.amissah@gmail.com> 2026-05-22 16:01:11 -0400
commit 424763ed06ca8820b287e92c56a1d6aaf97d42a4
tree 4086943278fd5fb24603438f123d44e1c932848b
parent correction: a name spelling & a markup error
homepage reorganize/update - revisit
(assisted by Claude-Code)
-rw-r--r-- markup/sisudoc-spine-bespoke-output/html/homepage.index.html 1045
-rw-r--r-- org/spine-bespoke-output-homepage-html.org 1045
2 files changed, 712 insertions, 1378 deletions
diff --git a/markup/sisudoc-spine-bespoke-output/html/homepage.index.html b/markup/sisudoc-spine-bespoke-output/html/homepage.index.html
index adf34f1..601cba8 100644
--- a/markup/sisudoc-spine-bespoke-output/html/homepage.index.html
+++ b/markup/sisudoc-spine-bespoke-output/html/homepage.index.html
@@ -1,759 +1,426 @@
<!DOCTYPE html>
-<html>
+<html lang="en">
<head>
- <meta http-equiv="Content-Type" content="text/plain; charset=UTF-8" />
+ <meta charset="UTF-8" />
<title>≅ SiSU project sisudoc.org</title>
<link href="./css/html_seg.css" rel="stylesheet" />
</head>
<body>
-<h1>≅ - SiSU for documents - structuring, publishing in multiple
-formats &amp; search</h1>
-
-<h2>ℹ - A short description</h2>
+<h1>≅ SiSU - lightweight markup, object-centric documents,
+multiple outputs &amp; search</h1>
<p>
-SiSU is an object-centric, lightweight markup based, document structuring,
-parser, publishing and search tool for document collections. It is command line
-oriented and generates static content that is made searchable at an object level
-through an SQL database.
+SiSU parses a lightweight-markup source into an abstract document object
+model. Every substantive element (paragraph, heading, table, verse, image)
+becomes a typed object carrying its position in the document's sequence
+and hierarchy, and a stable citation number. From that single abstraction
+it emits multiple output formats - HTML (segmented and scroll), EPUB3,
+LaTeX (then PDF via xelatex), ODT, plain text, and an SQLite full-text
+search database. Each object's number stays stable across every output
+format and across translations of the same document.
</p>
<p>
-SiSU markup helps define (delineate) objects (primarily various types of text
-block) which are tracked in sequence, substantive objects being numbered
-sequentially by the program for object citation. Breaking document into numbered
-objects provides interesting possibilities. These object numbers provide the
-possibility of citing/locating text precisely across different document formats
-and different languages (assuming the document has been translated). For search
-it also makes it possible to identify precisely where within in each document
-search criteria is met in the form of an index. Additionally the use of objects
-(and that objects are numbered) frees the possibility to represent the document
-in the manner considered most suitable to a specific document format (whilst
-retaining its structural (and citation) integrity).
+The processing pipeline is <b>markup &#8594; abstraction &#8594;
+output</b>.
</p>
<p>
-Objects which include their inherent associated properties (which vary by type
-of object), constitute building blocks of a document from which alternative
-representations of a document can be (imagined and) built.
+<b>Object-Centric Document Abstraction</b>. The abstraction stage builds an
+in-memory object model: every paragraph, heading, table, footnote and so on is a
+numbered object that carries its own parent / sibling / type metadata, known as
+OCN (Object Citation Numbering). Every output format is generated from that
+single abstraction, so all formats share the same object identifiers. The
+abstraction can also be written out as a human-readable, PEG-parsable text
+format (<code>.ssp</code>) that other tools can consume directly.
</p>
-<h2>Δ - SiSU project source</h2>
+<h2>ℹ - How this differs from a typical "markup &#8594; HTML"
+pipeline</h2>
<p>
- <a href="./projects">
- Δ SiSU projects repo (git)
- </a><br>
- - <a href="https://git.sisudoc.org">
- https://git.sisudoc.org
- </a><br>
-</p>
+<ul>
+ <li><b>Citation that survives format conversion.</b> Quote object 412 and the
+ reference is meaningful in the HTML, the EPUB, the PDF, the plain text and the
+ SQLite search results - and in any translation, because OCN is a property of
+ the abstraction, not of pagination or layout.</li>
-<h3>Δ - sisudoc-spine project source (programmed in D)</h3>
+ <li><b>Object-granular search.</b> The SQLite database is populated at object
+ granularity. A query reports not just "this document matches" but "object 412
+ in this document matches" - and links straight back to that object in the
+ published HTML.</li>
-<p>
- <a href="./projects/sisudoc-spine">
- Δ SiSU (sisudoc-spine): document publishing (multiple formats + search) [D]
- </a><br>
- - <a href="https://git.sisudoc.org/sisudoc-spine">
- https://git.sisudoc.org/sisudoc-spine
- </a><br>
- git clone git://git.sisudoc.org/software/sisudoc-spine
- <br>
-</p>
+ <li><b>Inspectable intermediate form.</b> The document abstraction has a
+ human-readable, PEG-parsable text serialisation (<code>.ssp</code>). Other
+ tools - in any language - can consume the abstraction without re-implementing
+ the parser. This is also what lets the abstraction stage be reasoned about,
+ diffed, fed to embedding pipelines, or used as the input to custom
+ renderers.</li>
-<p>
- <a href="./projects/sisudoc-spine-search-cgi">
- Δ SiSU (sisudoc-spine search): a sample cgi sqlite search for sisudoc-spine [D]
- </a><br>
- - <a href="https://git.sisudoc.org/sisudoc-spine-search-cgi">
- https://git.sisudoc.org/sisudoc-spine-search-cgi
- </a><br>
- git clone git://git.sisudoc.org/software/sisudoc-spine-search-cgi
- <br>
-</p>
+ <li><b>Deterministic and reproducible.</b> The same markup source produces the
+ same OCN sequence and the same outputs every time. Per-object content hashes
+ can be exposed for content identification or verification without disclosing
+ the content itself.</li>
-<p>
- <a href="./projects/sisudoc-spine-samples">
- Δ SiSU (sisudoc-spine markup): markup samples in document pods for sisudoc-spine
- </a><br>
- - <a href="https://git.sisudoc.org/sisudoc-spine-samples">
- https://git.sisudoc.org/sisudoc-spine-samples
- </a><br>
- git clone git://git.sisudoc.org/markup/sisudoc-spine-samples
- <br>
-</p>
-
-<h3>Δ - sisu scribe project source (programmed in Ruby)</h3>
-
-<p>
- <a href="./projects/sisu">
- Δ SiSU (scribe): document publishing (multiple formats + search) [Ruby]
- </a><br>
- - <a href="https://git.sisudoc.org/sisu">
- https://git.sisudoc.org/sisu
- </a><br>
- git clone git://git.sisudoc.org/software/sisu
- <br>
-</p>
+ <li><b>Designed for finished, "published" works.</b> SiSU is aimed at writings
+ that are published as a stable artefact (books, essays, articles, legal and
+ regulatory texts), where a fixed citable reference of object-level granularity
+ is more valuable than the flexibility of fluid text.</li>
-<p>
- <a href="./projects/sisu-markup">
- Δ SiSU markup samples in document pods for sisu (scribe)
- </a><br>
- - <a href="https://git.sisudoc.org/sisu-markup">
- https://git.sisudoc.org/sisu-markup
- </a><br>
- git clone git://git.sisudoc.org/markup/sisu-markup-samples
- <br>
+ <li><b>Static output, optional search.</b> Generated content is static HTML /
+ EPUB / PDF / text - trivial to host and to archive. The SQLite + CGI search is
+ an opt-in component that adds object-granular full-text query without changing
+ the publishing model.</li>
+</ul>
</p>
-<h2>⌘ - SiSU Spine markup sample output</h2>
+<h2>⌘ - See it in action</h2>
<p>
-To give an idea of how this works here is a small collection of documents marked
-up for and generated by the software. The curation of topics for a collection of
-specialized related documents would benefit from a consistently applied bespoke
-ontology or thesaurus.
-<br>
-The documents presented are documents that have been released under various
-creative commons licences, in the public domain, or the author's work, with the
-exception of one that is under GPL and the old abandoned Debian live-manual
+A single document - <i>The Wealth of Networks</i>, Yochai Benkler -
+shown in every output format SiSU Spine produces. The same OCN
+identifies the same object in each:
</p>
-<p>
- <a href="./authors.html">
- ⌘ Authors
- </a>
- (software curated from provided document header metadata)<br>
- - <a href="./authors.html">
- https://sisudoc.org/spine/authors.html
- </a>
-</p>
+<ul>
+ <li><a href="./en/html/the_wealth_of_networks.yochai_benkler/toc.html">
+ HTML (segmented, one page per chapter)</a></li>
+ <li><a href="./en/html/the_wealth_of_networks.yochai_benkler.html">
+ HTML (single scroll)</a></li>
+ <li><a href="./en/epub/the_wealth_of_networks.yochai_benkler.en.epub">
+ EPUB</a></li>
+ <li><a href="./pdf/the_wealth_of_networks.yochai_benkler.en.a4.portrait.pdf">
+ PDF (LaTeX &#8594; xelatex, A4)</a></li>
+ <li><a href="./en/html/metadata.the_wealth_of_networks.yochai_benkler.html">
+ Metadata page</a></li>
+ <li><a href="./spine_search?fn=the_wealth_of_networks.yochai_benkler&amp;rt=txt&amp;ec=on&amp;url=on&amp;sml=1000">
+ Search within this document (object-granular)</a></li>
+ <li><a href="./pod/the_wealth_of_networks.yochai_benkler/">
+ Source pod (markup + assets + manifest)</a></li>
+</ul>
-<p>
- <a href="./topics.html">
- ⌘ Topics
- </a>
- (software curated from provided document header metadata)<br>
- - <a href="./topics.html">
- https://sisudoc.org/spine/topics.html
- </a>
-</p>
+<h2>⌘ - Browse and search the sample collection</h2>
-<h2>፨ - SiSU Spine search</h2>
<p>
- <a href="./spine_search">
- ፨ Search
- </a>
- (granular search of text objects)<br>
- - <a href="https://sisudoc.org/spine_search">
- https://sisudoc.org/spine_search
- </a>
+<a href="./authors.html">⌘ Authors</a>
+&nbsp;-&nbsp;
+<a href="./topics.html">⌘ Topics</a>
+&nbsp;-&nbsp;
+<a href="./spine_search">፨ Search</a>
+<br>
+(Authors and Topics are software-curated from each document's
+header metadata. Search is object-granular.)
</p>
<div class="p">
- <!-- SiSU Spine Search -->
- <form action="https://sisudoc.org/spine_search" target="_top" method="POST" accept-charset="UTF-8" id="search">
- <input type="text" name="sf" size="24" maxlength="255">
- <input type="hidden" name="db" value="spine.search.db">
+ <!-- SiSU Spine Search -->
+ <form action="https://sisudoc.org/spine_search" target="_top"
+ method="POST" accept-charset="UTF-8" id="search">
+ <input type="text" name="sf" size="32" maxlength="255"
+ placeholder="search the collection...">
+ <input type="hidden" name="db" value="spine.search.db">
<input type="hidden" name="sml" value="1000">
- <input type="hidden" name="ec" value="on">
+ <input type="hidden" name="ec" value="on">
<input type="hidden" name="url" value="on">
<button type="submit" form="search">&nbsp;㏈&nbsp;፨&nbsp;</button>
- </form>
- <!-- SiSU Spine Search -->
+ </form>
+ <!-- SiSU Spine Search -->
</div>
-<h2>ℹ - SiSU description</h2>
-
-<p>
-SiSU is an object-centric, lightweight markup based, document structuring,
-parser, publishing and search tool for document collections. It is command line
-oriented and generates static content that is currently made searchable at an
-object level through an SQL database.
-Markup helps define (delineate) objects (primarily various types of text block)
-which are tracked in sequence, substantive objects being numbered sequentially
-by the program for object citation.
-</p>
-
-<p>
-<b>Summary.</b> An object is a unit of text within a document the most common
-being a paragraph. Objects include individual headings, paragraphs, tables,
-grouped text of various types such as code blocks and within poems, verse.
-Objects have properties and attributes, of particular significance are headings
-and their levels which provide document structure. A heading is an object with a
-heirarchical value, that conceptually contains other objects (such as paragraphs
-and possibly sub-headings etc.). Objects are tracked sequentially as they relate
-to each other object within a document and substantive objects are numbered
-sequentially, for citation purposes. Notably footnotes are not objects in
-themselves, rather belonging to the object from which they are referenced, and
-following their own numbering sequence. From heading objects (linked) tables of
-content may be generated, and if additional metadata is provided book type
-indexes can be generated that link back to the objects to which they relate.
-</p>
-
-<p>
-<b>Unpacking this a bit further.</b> SiSU as a concept independent of its markup
-language and the parsers that have been implemented, is based on the following
-ideas:
-</p>
-
-<p>
-<b>Object-Centricity. On objects:</b> In SiSU objects are the fundamental unit
-from which larger constructs within a document and the document itself is built.
-Breaking the document into objects provides interesting possibilities.
-</p>
-
-<p>
-<b>Objects are fundamental building blocks:</b> Conceptually within SiSU,
-objects are the building blocks or individual units of construction of a
-document. Objects are usually blocks of text, the most common of which is the
-paragraph, other examples include: individual headings, tables, grouped text of
-various types which include code blocks and verse within poems, ... and as
-mentioned an object could also, for example, be an image. Objects can be
-formatted and placed as needed, providing flexibility and enabling multiple
-types of representation across disperate formats and text recepticle, examples
-including html, epub, latex (in the past mind-maps) and sql (populated at an
-object level, and thereby providing search with that degree of granularity).
-</p>
-
<p>
-<b>Sequential. Objects have sequence:</b> That objects have sequence, goes
-largely without saying, this follows authorship, it is part of the definition of
-a document and how a document is written to convey meaning.
+The collection contains 25+ documents released under various Creative Commons
+licences, in the public domain, or as the author's own work (with one
+GPL-licensed exception and the abandoned Debian live-manual). A specialised
+collection would benefit from a consistently applied bespoke ontology or
+thesaurus.
</p>
-<p>
-<b>Object Numbers & Citation. Substantive objects are numbered for citation
-purposes:</b> Most objects within a document are meant by the author to be a
-substantive part of the document. All such objects are numbered sequentially and
-can be referenced thereby for citation purposes.
-<br>
-Object numbers provide the possibility of citing/locating text precisely across
-different document formats and different languages (assuming the document has
-been translated). For search it also makes it possible to identify precisely
-where search criteria is met within in each document in the form of an index or
-to view those precise text objects before deciding which documents are of
-interest. Additionally the use of objects (and that objects are numbered) frees
-the possibility to represent the document in the manner considered most suitable
-to a specific document format wilst retaining its structural (and citation)
-integrity).
-</p>
+<h2>Δ - Source repositories</h2>
<p>
-<b>Characteristics. Objects have properties and attributes:</b> Objects have
-properties (and may have attributes). By properties I here refer to the
-fundamental type of object, be it a heading, a paragraph, table, verse etc.
-Attributes extend further and may include other things that one might wish to
-associate with the object (examples not necessarily currently available/
-implemented in SiSU might include, formatting whether it is indented, or
-metadata e.g. the associated language, or programming language for a code block)
+All project repositories are at
+<a href="https://git.sisudoc.org">https://git.sisudoc.org</a>:
</p>
-<p>
-<b>Document structure. Heading objects hold documents structure:</b> Heading
-objects hold documents structure through their heading level property. The types
-of document of interest to SiSU have structure that is captured by the heading
-level property. Headings are individual objects like any other with the
-additional properties that (i) they may be regarded as containing the other
-objects following them sequentially (until the next heading of a similar or
-higher level), heading objects may include other headings (sub-headings), and
-(ii) that they have a heirarchy, the root "heading" being the document
-title.
-<br>
-A complication was intruduced to provide greater flexibility across document
-output formats. Headings have two sets of levels, the level under which
-substantive text occurs, this would be a chapter or segment level, and above
-that in the heirarchy if needed are document section separators, book, section,
-part.
-</p>
-
-<p>
-<b>Non-objects</b> Most but not all parts of a document are treated as objects.
-Notably footnotes are not objects in themselves, rather belonging to the object
-from which they are referenced, and following their own numbering sequence. From
-heading objects (linked) tables of content may be generated, and if additional
-metadata is provided book type indexes can be generated that link back to the
-objects to which they relate.
-</p>
+<ul>
+ <li><b>sisudoc-spine</b> (D) - the current generator
+ <br><code>git clone git://git.sisudoc.org/software/sisudoc-spine</code></li>
+ <li><b>sisudoc-spine-search-cgi</b> (D) - object-granular CGI search
+ <br><code>git clone git://git.sisudoc.org/software/sisudoc-spine-search-cgi</code></li>
+ <li><b>sisudoc-spine-samples</b> - 25+ marked-up sample documents
+ <br><code>git clone git://git.sisudoc.org/markup/sisudoc-spine-samples</code></li>
+ <li><b>sisu</b> (Ruby, original/antecedent) - the original generator
+ <br><code>git clone git://git.sisudoc.org/software/sisu</code></li>
+ <li><b>sisu-markup-samples</b> - samples for the original sisu
+ <br><code>git clone git://git.sisudoc.org/markup/sisu-markup-samples</code></li>
+ <li><b>tree-sitter-sisu</b> - tree sitter for sisu markup
+ <br><code>git clone git://git.sisudoc.org/tools/tree-sitter-sisu</code></li>
+</ul>
-<p>
-<b>The Document Header.</b> SiSU document have headers which contain document
-metadata, at a minimum the document title and author. In addition the document
-header may contain markup instruction (e.g. how to identify headings within the
-document, in which case those headings need not be found and treated
-accordingly)
-</p>
-
-<p>
-SiSU parsers have now been implemented in different programming paradigms and
-languages a couple of times, the chosen markup has been left unchanged though
-the document headers have been modified.
-<br>
-This is the core of sisu, beyond which there is more but largely in the form of
-choices based on ... existing output formats and of implementation detail,
-deciding what attributes of objects, or within objects should be supported,
-extending markup to allow for the generation of book indexes from if tagging
-provided.
-</p>
-
-<h2>ℹ - SiSU Historical Descriptions</h2>
-
-<p>
-Here is a description that has been used for the original sisu (scribe):
-</p>
-
-<p>
-With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax
-in your text editor of choice, SiSU can generate various document formats, most
-of which share a common object numbering system for locating content, including
-plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX, PDF
-files, and populate an SQL database with objects (roughly paragraph-sized
-chunks) so searches may be performed and matches returned with that degree of
-granularity. Think of being able to finely match text in documents, using common
-object numbers, across different output formats (same object identifier for pdf,
-epub or html) and across languages if you have translations of the same document
-(same object identifier across languages). For search, your criteria is met by
-these documents at these locations within each document (equally relevant across
-different output formats and languages). To be clear (if obvious) page numbers
-provide none of this functionality. Object numbering is particularly suitable
-for "published" works (finalized texts as opposed to works that are frequently
-changed or updated) for which it provides a fixed means of reference of content.
-Document outputs can also share provided semantic meta-data.
-</p>
-
-<h3>...</h3>
-
-<p>
-SiSU is less about document layout than it is about finding a way using little
-markup to construct an abstract representation of a document that makes it
-possible to produce multiple representations of it which may be rather different
-from each other and used for different purposes, whether layout and publishing,
-scrollworthy online viewing/ reading, or content search. To be able to take
-advantage from its minimal preparation starting point of some of the strengths
-of rather different established ways of representing documents for different
-purposes, whether for search (relational database, or indexed flat files
-generated for that purpose whether of complete documents, or say of files made
-up of objects), online or other electronic viewing (e.g. html, xml, epub), or
-paper publication (e.g. pdf via latex)...
-</p>
-
-<p>
-The solution arrived at is to extract structural information about the document
-(document sections and headings within the document, available through pattern
-matching or markup) and tracking objects (which primarily are defined units of
-text such as paragraphs, headings, tables, verse, etc. but also images) which
-can be reconstituted as the same documents with relevant object identification
-numbers so text (objects) can be referenced across different output formats and
-presentations.
-</p>
-
-<p>
-SiSU generates tables of content, and through its markup the means for metadata
-to be provided for the generation of book style indexes for a document (that
-again due to document object numbers are the same and equally relevant across
-all document formats). Per document classifying/organizing metadata can also be
-provided for automated document curation.
-</p>
-
-<p>
-... there have also been working experiments with sisu markup source, two way
-conversion/representation of sisu document markup source in mind-mapping
-(software kdissert was used for its strong focus on producing documents (now
-apparently called semantik)); also po4a software for translators has been used
-successfuly in its regular text mode for sisu markup in translation, (which is
-more an attribute of po4a than of sisu, but) which is of interest due to
-sisu/spine's object citation numbering being available across translations. Open
-Document Format text (odf:odt), has been an output, but much more interesting
-(and requested by potential users of sisu/spine) would be the ability of a word
-processor to save text/a document in sisu markup, making alternative document
-processing and presentations with sisu possible.
-</p>
-
-<p>
-also worth mention, in the relatively long history of this project, there has
-been work done on extracting hash representations of each object, that could
-hypothetically be shared to prove the content of a document without sharing its
-content, or of identifying which objects change; these hashes can also be used
-as unique identifiers in a database or as identifying filenames if individual
-objects are saved.
-</p>
-
-<p>
-SiSU has evolved, the current implementation focuses on one primary use-case,
-books and literary writings. However the concept on which it is based has wider
-application. Here is a prevously posted souvenir from my encounter with an IBM
-software evaluator in London June 2004 that came about through a chance
-encounter with an IBM manager at a Linux Expo, who was curious about my interest
-in Gnu/Linux with my legal background... on hearing that I also wrote software,
-he suggested, maybe IBM should have a look at it. I was interested, the meeting
-was set up... with an IBM, Software Innovations evaluator
-<br>
-His response after the meeting:
-</p>
-
-<p>
-"Ralph<br>Good to meet with you today, I was very impressed with your
-software.<br><i>[colleague's name (also posted to an IBM colleague)]</i> - in
-summary - Ralph has built an application that runs on linux and takes ASCII
-documents and pulls them apart in to the smallest constituent parts, storing
-them as XML, PDF and HTML, the HTML are hyperlinked up so the document can be
-browsed in its full form. the format and text data created is stored in a
-database.<br>This has potential in any place that needs the power of full text
-search whilst holding the structural concepts of the document i.e. legal,
-pharma, education, research.. which ones we need to figure out, ..."
-</p>
-
-<p>
-Special interest was expressed in the search implications of SiSU. To
-paraphrase, the company has document management systems dealing with hundreds of
-thousands of texts, these tell you which documents match your search criteria,
-but cannot inform you where within a text these matches were found without
-opening the documents. This is achieved through defining document objects and
-making them the building block of the document, trackable document objects (that
-can be placed back in the context of the document or corpus of documents if part
-of a collection). SiSU's early design was to - abstract documents to their
-structure, and identified objects, numbered in a citable way (as pointed out
-document object hashes can be of use for the purpose).
-</p>
-
-<h2>ℹ - SiSU Spine (sisudoc-spine)</h2>
-
-<p>
-SiSU Spine is the new generator for documents prepared in sisu markup, written
-in D as opposed to the original sisu which was first shared in Ruby.
-</p>
+<h2>ℹ - Spine vs. the original sisu</h2>
<p>
-sisudoc spine code was shared publicly under the AGPLv3 2024-05-01 (after
-considerable procrastination). (It should be fairly straightforward to have this
-work on other OS platforms, I have only used linux since 1999.)
-</p>
-
-<p>
-As compared with the original sisu generator sisu spine:
-</p>
-
-<p>
-- Spine uses the same document markup for the document body, but uses yaml for
-document headers (which contains document metadata and configuration details),
-the original sisu has a bespoke markup for headers.
-</p>
-
-<p>
-- Spine (written in D) is considerably faster at generating native output than
-sisu (written in Ruby), on last test at least 60 times faster (what took 1
-minute takes 1 second; 1 hour a minute :-) (admittedly some time ago, ruby has
-been getting faster, hopefully this is not over over promising).
-</p>
-
-<p>
-- Spine produces fewer document outputs types than sisu (html, epub, (odt,
-latex) and populates sql db for search)
-</p>
-
-<p>
-- As regards non-native output, so far Spine has greater separation of what it
-does and largely leaves calling the external program to the user, e.g.: latex
-output is a native output in the sense that it is generated directly by spine,
-but the pdfs that can be produced from these are produced through use of an
-external program xelatex, which produces fine output but is a very much slower
-process.
-</p>
-
-<p>
-- (where both produce the same output type, generally) Spine generally produces
-more up to date output format representations.
-</p>
-
-<h2>ℹ - Some Observations</h2>
-
-<p>
-SiSU is more suited to finalized/stratified/published writings (writings,
-articles, books), that are to remain and be referenced as published,
-representing a work or ideas, set at a given time. (As opposed to the
-increasingly prevalent and important forms of fluid text).
-</p>
-
-<p>
-Trained AI likely could assist in the preparation of documents (with SiSU
-markup), with resulting deterministic and reproducible outputs (for substantive
-document objects). Caveats: Where text objects may be in blocks (or not) there
-is some room for discretion and ambiguity in the markup with resulting
-possibility of differences in the resulting presentation of a document. Book
-indexes are another area that if desired is markup intensive and unless
-following an already published index, can be prepared differently and possibly
-improved over time, and for specialised collections on a subject area could
-potentially be prepared against a thesaurus.
-</p>
-
-<h2>ℹ - Thank You</h2>
-
-<p>
-Thanks to all who help produce and maintain the software and libraries I am able
-to use and have come to rely on. Reliable infrastructure so far.
+Spine (D) and the original sisu (Ruby) share the same lightweight body markup;
+spine moves the document header to YAML where the original uses a bespoke header
+dialect. Spine is roughly 60x faster on equivalent inputs (a one-minute Ruby run
+is about a one-second D run). Spine emits HTML, EPUB, LaTeX, ODT, plain text and
+the SQLite search database; PDF is delegated to an external xelatex pass (slower
+but produces excellent output). For output formats both produce, spine's
+representations are generally more up to date. Spine was released publicly under
+AGPLv3 on 2024-05-01.
</p>
<hr>
-<p class="tiny"><i>
-ralph.amissah www since 1993 ;-)
-</i></p>
-<hr>
-<h2>Some external links of interest</h2>
+<!-- Below the fold: long-form material wrapped in <details> so the
+ homepage does not have to render or scroll past it on first
+ paint. The content is unchanged, just moved. -->
+
+<details>
+ <summary><b>ℹ - A longer description (design and intent)</b></summary>
+
+ <p>
+ <b>Summary.</b> An object is a unit of text within a document, the most common
+ being a paragraph. Objects include individual headings, paragraphs, tables,
+ and grouped text of various types such as code blocks and (within poems)
+ verse. Objects have properties and attributes; of particular significance are
+ headings and their levels, which provide document structure. A heading is an
+ object with a hierarchical value that conceptually contains other objects
+ (such as paragraphs and possibly sub-headings). Objects are tracked
+ sequentially as they relate to each other within a document, and substantive
+ objects are numbered sequentially for citation purposes. Notably, footnotes
+ are not objects in themselves - they belong to the object from which they are
+ referenced, and follow their own numbering sequence. From heading objects,
+ linked tables of content may be generated; and if additional metadata is
+ provided, book-style indexes can be generated that link back to the objects to
+ which they relate.
+ </p>
+
+ <p>
+ <b>Object-centricity.</b> In SiSU, objects are the fundamental unit from which
+ larger constructs and the document itself are built. Breaking the document
+ into objects provides interesting possibilities.
+ </p>
+
+ <p>
+ <b>Objects are fundamental building blocks.</b> Objects are usually blocks of
+ text - paragraphs, headings, tables, grouped text of various types including
+ code blocks and verse - and may also be, for example, images. Objects can be
+ formatted and placed as needed, enabling multiple types of representation
+ across disparate formats and text receptacles: HTML, EPUB, LaTeX, (in the
+ past, mind-maps) and SQL (populated at object level, so that search has that
+ degree of granularity).
+ </p>
+
+ <p>
+ <b>Sequence.</b> Objects have sequence - this follows authorship and is part
+ of how a document conveys meaning.
+ </p>
+
+ <p>
+ <b>Object numbers and citation.</b> Substantive objects are numbered
+ sequentially and can be referenced for citation purposes. Object numbers
+ locate text precisely across different document formats and different
+ languages (assuming the document has been translated). For search, they
+ identify precisely where within each document the search criteria are met - in
+ the form of an index, or by surfacing the matching text objects so a reader
+ can decide which documents are of interest before opening them. Object
+ numbering also frees the representation of each format to be whatever is most
+ suitable to that format, while structural and citation integrity are retained.
+ </p>
+
+ <p>
+ <b>Characteristics.</b> Objects have properties (the fundamental type:
+ heading, paragraph, table, verse, etc.) and may carry attributes (e.g.
+ indentation, language, programming language for a code block).
+ </p>
+
+ <p>
+ <b>Document structure.</b> Headings hold the document's structure through
+ their heading-level property. Headings are individual objects like any other,
+ with the additional properties that (i) they may be regarded as containing the
+ other objects following them sequentially (until the next heading of the same
+ or a higher level), and (ii) they have a hierarchy, the root being the document
+ title. To give greater flexibility across output formats, headings have two
+ sets of levels: the level under which substantive text occurs (chapter or
+ segment), and above that, optional document section separators (book, section,
+ part).
+ </p>
+
+ <p>
+ <b>Non-objects.</b> Footnotes are not objects in themselves; they belong to
+ the referencing object and follow their own numbering sequence. Tables of
+ content may be generated from heading objects; book-style indexes may be
+ generated when the required metadata is provided.
+ </p>
+
+ <p>
+ <b>The document header.</b> A SiSU document has a header carrying document
+ metadata - at a minimum, title and author. The header may also carry markup
+ instructions (e.g. how to identify headings within the document, so that those
+ headings do not need to be inferred).
+ </p>
+
+</details>
+
+<details>
+ <summary><b>ℹ - Historical description (original sisu)</b></summary>
+
+ <p>
+ With minimal preparation of a plain-text (UTF-8) file using SiSU markup syntax
+ in your text editor of choice, SiSU can generate various document formats,
+ most of which share a common object numbering system for locating content -
+ plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODT), LaTeX, PDF - and
+ populate an SQL database with objects (roughly paragraph-sized chunks) so
+ searches may be performed and matches returned with that degree of
+ granularity. Think of being able to finely match text across different output
+ formats (same object identifier for PDF, EPUB or HTML) and across languages
+ where translations exist (same object identifier across languages). For
+ search, results report which documents meet your criteria and at which
+ locations within each document (equally relevant across different output
+ formats and languages). Page numbers provide none of this functionality.
+ Object numbering
+ is particularly suitable for "published" works (finalised texts as opposed to
+ works that are frequently changed or updated), for which it provides a fixed
+ means of referencing content. Document outputs can also share any semantic
+ metadata provided.
+ </p>
+
+ <p>
+ SiSU is less about document layout than about finding a way, using little
+ markup, to construct an abstract representation of a document that makes it
+ possible to produce multiple representations - which may be rather different
+ from each other and used for different purposes - whether layout and
+ publishing, scrollworthy online viewing, or content search. The aim is to take
+ advantage, from a minimal-preparation starting point, of some of the strengths
+ of rather different established ways of representing documents for different
+ purposes: search (relational database, or indexed flat files of complete
+ documents or files made up of objects), online or electronic viewing (HTML,
+ XML, EPUB), or paper publication (PDF via LaTeX).
+ </p>
+
+ <p>
+ The solution arrived at is to extract structural information about the
+ document (sections and headings, available through pattern matching or markup)
+ and to track objects (defined units of text such as paragraphs, headings,
+ tables, verse, etc., but also images), which can then be reconstituted as the
+ same document with relevant object identification numbers - so text (objects)
+ can be referenced across different output formats and presentations.
+ </p>
+
+ <p>
+ SiSU generates tables of content and, through its markup, the means for
+ metadata to be provided for the generation of book-style indexes for a
+ document (that, again, due to document object numbers, are the same and
+ equally relevant across all output formats). Per-document
+ classifying/organizing metadata can also be provided for automated document
+ curation.
+ </p>
+
+ <p>
+ There have also been working experiments with SiSU-markup source: two-way
+ conversion/representation in mind-mapping software (kdissert / semantik, for
+ its strong focus on producing documents); and po4a (for translators) has been
+ used successfully in its regular text mode for SiSU markup in translation -
+ which is more an attribute of po4a than of SiSU, but of interest due to
+ SiSU/spine's object citation numbering being available across translations.
+ ODT has been an output, but much more interesting (and requested by potential
+ users) would be the ability of a word processor to save text in SiSU markup,
+ making alternative document processing and presentations with SiSU possible.
+ </p>
+
+ <p>
+ Also worth mention: in the relatively long history of this project there has
+ been work on extracting hash representations of each object that could
+ hypothetically be shared to prove the content of a document without sharing
+ its content, or to identify which objects change. These hashes can also be
+ used as unique identifiers in a database, or as filenames if individual
+ objects are saved.
+ </p>
+
+</details>
+
+<details>
+ <summary><b>ℹ - From a 2004 evaluation (IBM Software
+ Innovations)</b></summary>
+
+ <p>
+ SiSU has evolved; the current implementation focuses on one primary use-case,
+ books and literary writings. The concept, however, has wider application. The
+ following is a souvenir from an encounter with an IBM software evaluator in
+ London in June 2004, set up after a chance meeting with an IBM manager at a
+ Linux Expo who was curious about my interest in GNU/Linux given my legal
+ background - on hearing that I also wrote software, he suggested IBM should
+ have a look. The evaluator's response after the meeting:
+ </p>
+
+ <p>
+ "Ralph<br>
+ Good to meet with you today, I was very impressed with your software.<br>
+ <i>[colleague's name (also posted to an IBM colleague)]</i> - in summary -
+ Ralph has built an application that runs on linux and takes ASCII documents
+ and pulls them apart in to the smallest constituent parts, storing them as
+ XML, PDF and HTML; the HTML are hyperlinked up so the document can be browsed
+ in its full form. The format and text data created is stored in a
+ database. <br>This has potential in any place that needs the power of full
+ text search whilst holding the structural concepts of the document i.e. legal,
+ pharma, education, research.. which ones we need to figure out, ..."
+ </p>
+
+ <p>
+ Special interest was expressed in the search implications of SiSU. To
+ paraphrase: the company has document management systems dealing with hundreds
+ of thousands of texts; these tell you which documents match your search
+ criteria, but cannot inform you where within a text these matches were found
+ without opening the documents. SiSU addresses this by defining document
+ objects and making them the building block of the document - trackable objects
+ that can be placed back in the context of the document or corpus of documents
+ if part of a collection. SiSU's early design was to abstract documents to
+ their structure and identified objects, numbered in a citable way (as the
+ evaluator pointed out, document-object hashes can be of use for the purpose).
+ </p>
+
+</details>
+
+<details>
+ <summary><b>ℹ - Some observations</b></summary>
+
+ <p>
+ SiSU is more suited to finalised / stratified / published writings (articles,
+ books) that are to remain and be referenced as published - works set at a
+ given time. (As opposed to the increasingly prevalent and important forms of
+ fluid text.)
+ </p>
+
+ <p>
+ Trained AI could likely assist in the preparation of documents with SiSU
+ markup, with resulting deterministic and reproducible outputs (for substantive
+ document objects). Caveats: where text objects may be in blocks (or not),
+ there is some room for discretion and ambiguity in the markup, with resulting
+ possibility of differences in presentation. Book indexes are another
+ markup-intensive area; unless following an already published index, they can
+ be prepared differently and possibly improved over time, and for specialised
+ subject collections could potentially be prepared against a thesaurus.
+ </p>
+
+</details>
-<h3>Development</h3>
-<h4>Programming</h4>
-<p>
- [ <a href="https://dlang.org/">
- D - (dlang) general purpose, multi-paradigm, fast C like programming language
- </a> ]
- [ <a href="https://code.dlang.org/">
- dub - package registry
- </a> ]
- [ <a href="https://forum.dlang.org/group/general">
- community discussion (mail list frontend)
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://www.ruby-lang.org/en/">
- Ruby
- </a> ]
- [ <a href="https://rubygems.org/">
- Gems
- </a> ]<br>
- [ <a href="https://crystal-lang.org/">
- Crystal
- </a> ]<br>
-</p>
-<h4>SQL DB</h4>
-<p>
- [ <a href="https://sqlite.org/index.html">
- Sqlite - an sql database engine
- </a> ]<br>
- [ <a href="https://www.postgresql.org/">
- PostgreSQL
- </a> ]<br>
-</p>
-<h4>Markup</h4>
-<p>
- [ <a href="https://www.w3.org/html/">
- HTML
- </a> ]
- [ <a href="https://html.spec.whatwg.org/multipage/">
- multipage current spec
- </a> ]
- [ <a href="https://dom.spec.whatwg.org/">
- dom current spec
- </a> ]<br>
- [ <a href="https://www.w3.org/publishing/epub32/">
- Epub
- </a> ]<br>
- [ <a href="https://www.w3.org/Style/CSS/">
- css - cascading style sheets
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://opendocumentformat.org/">
- OpenDocument Format
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://www.latex-project.org/get/">
- LaTeX
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://po4a.org/index.php.en">
- po4a - maintain translations
- </a> ]<br>
-</p>
-<h4>Operating System Distributions</h4>
-<p>
- [ <a href="https://nixos.org/">
- NixOS - linux based operating system built on the Nix declarative, reproducible and reliable, build system
- </a> ]
- [ <a href="https://github.com/NixOS/nixpkgs">
- nixpkgs (packages @ github)
- </a> ]
- [ <a href="https://search.nixos.org/packages?channel=unstable&from=0&size=100&sort=relevance&query=">
- package search
- </a> ]
- [ <a href="https://discourse.nixos.org/">
- community discussion (discourse)
- </a> ]
- [ <a href="https://discourse.nixos.org/t/nixos-foundation-board-giving-power-to-the-community/44552/">
- NixOS Foundation board: Giving power to the community
- </a> ]<br>
<!--
- [ <a href="https://aux.computer/">
- Aux - aux.computer - a community fork of nix (under deliberation), billed as "An alternative to the Nix ecosystem"
- </a> ]
- [ <a href="https://forum.aux.computer/">
- community discussion (discourse)
- </a> ]<br>
--->
- Gnu [ <a href="https://guix.gnu.org/">
- Guix
- </a> ]
- [ <a href="https://guix.gnu.org/en/packages/">
- packages
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://debian.org/">
- Debian - the universal operating system distribution
- </a> ]<br>
- [ <a href="https://www.devuan.org/">
- Devuan
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://archlinux.org/">
- Arch Linux
- </a> ]
- [ <a href="https://wiki.archlinux.org/">
- Arch Wiki
- </a> ]<br>
-</p>
<hr>
-<h2>Extraneous (external) links of personal interest</h2>
-
-<h4>Workspace</h4>
-
-<h5>Shell</h5>
<p>
- [ <a href="https://www.zsh.org/">
- zsh
- </a> ]<br>
- [ <a href="https://starship.rs/">
- starship - customizable cross-shell prompt
- </a> ]<br>
-</p>
-<h5>Terminal</h5>
-<p>
- [ <a href="https://gnunn1.github.io/tilix-web/">
- tilix
- </a> ]
- [ <a href="https://alacritty.org/">
- alacritty
- </a> ]<br>
-</p>
-<h5>Terminal Multiplexer</h5>
-<p>
- [ <a href="https://github.com/tmux/tmux">
- tmux (github)
- </a> ]
- [ <a href="https://www.gnu.org/software/screen/">
- screen
- </a> ]<br>
-</p>
-<h5>Window Manager</h5>
-<p>
- [ <a href="https://i3wm.org/">
- i3wm
- </a> ]
- [ <a href="https://swaywm.org/">
- sway
- </a> ]<br>
-</p>
-<h5>Text Editors</h5>
-<p>
- Gnu Emacs
- [ <a href="https://github.com/hlissner/doom-emacs">
- Doom Emacs (github)
- </a> ]
- [ <a href="https://orgmode.org/">
- Org-Mode - your life in plain text & literate programming
- </a> ]
- [ <a href="https://github.com/emacs-evil/evil">
- Evil-Mode
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://www.vim.org/">
- Vim
- </a> ]
- [ <a href="https://neovim.io/">
- NeoVim
- </a> ]<br>
-</p>
-<h5>Source Control Manager</h5>
-<p>
- [ <a href="https://git-scm.com/">
- Git
- </a> ]<br>
-</p>
-<h5>Browsers</h5>
-<p>
- [ <a href="https://vieb.dev/">
- vieb
- </a> ]
- [ <a href="https://fanglingsu.github.io/vimb/">
- vimb
- </a> ]<br>
- [ <a href="https://brave.com/">
- brave
- </a> ]<br>
+<a href="./links.html">Personal-interest external links</a> (toolchain,
+distributions, editors, forges).
</p>
-<h3>Search</h3>
-<p>
- [ <a href="https://duckduckgo.com/">
- DuckDuckGo
- </a> ]
- [ <a href="https://yubnub.org/">
- YubNub
- </a> ]<br>
-</p>
-
-<h3>eMail</h3>
-<p>
- [ <a href="https://www.migadu.com/">
- Migadu
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://notmuchmail.org/">
- NotmuchMail
- </a> ]<br>
-</p>
-
-<h3>Forges</h3>
-<p>
- [ <a href="https://sourcehut.org/">
- Sourcehut
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://codeberg.org/">
- CodeBerg
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://github.com">
- GitHub
- </a> ]
- [ <a href="https://gitlab.com">
- GitLab
- </a> ]<br>
-</p>
-
-<h3>Software Archives</h3>
-<p>
- [ <a href="https://www.softwareheritage.org/">
- Software Heritage - the universal software archive
- </a> ]<br>
-</p>
+-->
<hr>
+
<p class="tiny"><i>
-ralph.amissah www since 1993 ;-)
+ralph.amissah - www since 1993 ;-)
</i></p>
</body>
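
The object model described in the new homepage text (sequentially numbered objects, headings that contain the objects following them, footnotes that belong to their referencing object and follow their own sequence) can be sketched in a few lines of Python. This is an illustrative sketch only, not sisudoc-spine's implementation (which is written in D): `DocObject`, `number_objects` and the convention that heading level 1 is the document title are hypothetical names and assumptions, and every object in the list is assumed to be substantive.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DocObject:
    kind: str                         # property: "heading", "para", "table", "verse", ...
    text: str
    level: Optional[int] = None       # headings only; 1 = document title (assumed convention)
    footnotes: List[str] = field(default_factory=list)  # footnotes belong to their object
    ocn: Optional[int] = None         # object citation number, assigned below
    parent_ocn: Optional[int] = None  # OCN of the containing heading, if any


def number_objects(objs: List[DocObject]) -> List[Tuple[int, int, str]]:
    """Assign sequential OCNs, link objects to their containing heading,
    and number footnotes in their own sequence.

    Returns (footnote_number, owning_ocn, footnote_text) tuples.
    """
    open_headings: List[DocObject] = []  # outermost heading first
    notes: List[Tuple[int, int, str]] = []
    for ocn, obj in enumerate(objs, start=1):
        obj.ocn = ocn
        if obj.kind == "heading":
            # A heading closes every open heading of the same or a deeper level.
            while open_headings and open_headings[-1].level >= obj.level:
                open_headings.pop()
        obj.parent_ocn = open_headings[-1].ocn if open_headings else None
        if obj.kind == "heading":
            open_headings.append(obj)
        for note in obj.footnotes:
            # Footnotes get no OCN; they are numbered separately.
            notes.append((len(notes) + 1, obj.ocn, note))
    return notes


doc = [
    DocObject("heading", "A Sample Title", level=1),
    DocObject("heading", "Chapter 1", level=2),
    DocObject("para", "First paragraph.", footnotes=["A note."]),
    DocObject("heading", "Chapter 2", level=2),
    DocObject("para", "Second paragraph.", footnotes=["Another note.", "A third note."]),
]
notes = number_objects(doc)

# A linked table of contents falls out of the heading objects alone.
toc = [(o.level, o.text, o.ocn) for o in doc if o.kind == "heading"]
```

Because the numbers are assigned from the object sequence rather than from any layout, the same `ocn` would identify the same paragraph in whichever output format is rendered from `doc`.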
diff --git a/org/spine-bespoke-output-homepage-html.org b/org/spine-bespoke-output-homepage-html.org
index fe64e09..7baa474 100644
--- a/org/spine-bespoke-output-homepage-html.org
+++ b/org/spine-bespoke-output-homepage-html.org
@@ -20,761 +20,428 @@
#+HEADER: :tangle "../markup/sisudoc-spine-bespoke-output/html/homepage.index.html"
#+BEGIN_SRC html
<!DOCTYPE html>
-<html>
+<html lang="en">
<head>
- <meta http-equiv="Content-Type" content="text/plain; charset=UTF-8" />
+ <meta charset="UTF-8" />
<title>≅ SiSU project sisudoc.org</title>
<link href="./css/html_seg.css" rel="stylesheet" />
</head>
<body>
-<h1>≅ - SiSU for documents - structuring, publishing in multiple
-formats &amp; search</h1>
-
-<h2>ℹ - A short description</h2>
+<h1>≅ SiSU - lightweight markup, object-centric documents,
+multiple outputs &amp; search</h1>
<p>
-SiSU is an object-centric, lightweight markup based, document structuring,
-parser, publishing and search tool for document collections. It is command line
-oriented and generates static content that is made searchable at an object level
-through an SQL database.
+SiSU parses a lightweight-markup source into an abstract document object
+model. Every substantive element (paragraph, heading, table, verse, image)
+becomes a typed object carrying its position in the document's sequence
+and hierarchy, and a stable citation number. From that single abstraction
+it emits multiple output formats - HTML (segmented and scroll), EPUB3,
+LaTeX (then PDF via xelatex), ODT, plain text, and an SQLite full-text
+search database. Each object's number stays stable across every output
+format and across translations of the same document.
</p>
<p>
-SiSU markup helps define (delineate) objects (primarily various types of text
-block) which are tracked in sequence, substantive objects being numbered
-sequentially by the program for object citation. Breaking document into numbered
-objects provides interesting possibilities. These object numbers provide the
-possibility of citing/locating text precisely across different document formats
-and different languages (assuming the document has been translated). For search
-it also makes it possible to identify precisely where within in each document
-search criteria is met in the form of an index. Additionally the use of objects
-(and that objects are numbered) frees the possibility to represent the document
-in the manner considered most suitable to a specific document format (whilst
-retaining its structural (and citation) integrity).
+The processing pipeline is <b>markup &#8594; abstraction &#8594;
+output</b>.
</p>
<p>
-Objects which include their inherent associated properties (which vary by type
-of object), constitute building blocks of a document from which alternative
-representations of a document can be (imagined and) built.
+<b>Object-Centric Document Abstraction</b>. The abstraction stage builds an
+in-memory object model: every paragraph, heading, table, verse and so on is a
+numbered object carrying its own parent / sibling / type metadata (footnotes
+belong to the object that references them). The numbering is known as
+single abstraction, so all formats share the same object identifiers. The
+abstraction can also be written out as a human-readable, PEG-parsable text
+format (<code>.ssp</code>) that other tools can consume directly.
</p>
-<h2>Δ - SiSU project source</h2>
+<h2>ℹ - How this differs from a typical "markup &#8594; HTML"
+pipeline</h2>
<p>
- <a href="./projects">
- Δ SiSU projects repo (git)
- </a><br>
- - <a href="https://git.sisudoc.org">
- https://git.sisudoc.org
- </a><br>
-</p>
+<ul>
+ <li><b>Citation that survives format conversion.</b> Quote object 412 and the
+ reference is meaningful in the HTML, the EPUB, the PDF, the plain text and the
+ SQLite search results - and in any translation, because OCN is a property of
+ the abstraction, not of pagination or layout.</li>
-<h3>Δ - sisudoc-spine project source (programmed in D)</h3>
+ <li><b>Object-granular search.</b> The SQLite database is populated at object
+ granularity. A query reports not just "this document matches" but "object 412
+ in this document matches" - and links straight back to that object in the
+ published HTML.</li>
-<p>
- <a href="./projects/sisudoc-spine">
- Δ SiSU (sisudoc-spine): document publishing (multiple formats + search) [D]
- </a><br>
- - <a href="https://git.sisudoc.org/sisudoc-spine">
- https://git.sisudoc.org/sisudoc-spine
- </a><br>
- git clone git://git.sisudoc.org/software/sisudoc-spine
- <br>
-</p>
+ <li><b>Inspectable intermediate form.</b> The document abstraction has a
+ human-readable, PEG-parsable text serialisation (<code>.ssp</code>). Other
+ tools - in any language - can consume the abstraction without re-implementing
+ the parser. This is also what lets the abstraction stage be reasoned about,
+ diffed, fed to embedding pipelines, or used as the input to custom
+ renderers.</li>
-<p>
- <a href="./projects/sisudoc-spine-search-cgi">
- Δ SiSU (sisudoc-spine search): a sample cgi sqlite search for sisudoc-spine [D]
- </a><br>
- - <a href="https://git.sisudoc.org/sisudoc-spine-search-cgi">
- https://git.sisudoc.org/sisudoc-spine-search-cgi
- </a><br>
- git clone git://git.sisudoc.org/software/sisudoc-spine-search-cgi
- <br>
-</p>
+ <li><b>Deterministic and reproducible.</b> The same markup source produces the
+ same OCN sequence and the same outputs every time. Per-object content hashes
+ can be exposed for content identification or verification without disclosing
+ the content itself.</li>
-<p>
- <a href="./projects/sisudoc-spine-samples">
- Δ SiSU (sisudoc-spine markup): markup samples in document pods for sisudoc-spine
- </a><br>
- - <a href="https://git.sisudoc.org/sisudoc-spine-samples">
- https://git.sisudoc.org/sisudoc-spine-samples
- </a><br>
- git clone git://git.sisudoc.org/markup/sisudoc-spine-samples
- <br>
-</p>
-
-<h3>Δ - sisu scribe project source (programmed in Ruby)</h3>
-
-<p>
- <a href="./projects/sisu">
- Δ SiSU (scribe): document publishing (multiple formats + search) [Ruby]
- </a><br>
- - <a href="https://git.sisudoc.org/sisu">
- https://git.sisudoc.org/sisu
- </a><br>
- git clone git://git.sisudoc.org/software/sisu
- <br>
-</p>
+ <li><b>Designed for finished, "published" works.</b> SiSU is aimed at writings
+ that are published as a stable artefact (books, essays, articles, legal and
+ regulatory texts), where a fixed citable reference of object-level granularity
+ is more valuable than the flexibility of fluid text.</li>
-<p>
- <a href="./projects/sisu-markup">
- Δ SiSU markup samples in document pods for sisu (scribe)
- </a><br>
- - <a href="https://git.sisudoc.org/sisu-markup">
- https://git.sisudoc.org/sisu-markup
- </a><br>
- git clone git://git.sisudoc.org/markup/sisu-markup-samples
- <br>
+ <li><b>Static output, optional search.</b> Generated content is static HTML /
+ EPUB / PDF / text - trivial to host and to archive. The SQLite + CGI search is
+ an opt-in component that adds object-granular full-text query without changing
+ the publishing model.</li>
+</ul>
</p>
-<h2>⌘ - SiSU Spine markup sample output</h2>
+<h2>⌘ - See it in action</h2>
<p>
-To give an idea of how this works here is a small collection of documents marked
-up for and generated by the software. The curation of topics for a collection of
-specialized related documents would benefit from a consistently applied bespoke
-ontology or thesaurus.
-<br>
-The documents presented are documents that have been released under various
-creative commons licences, in the public domain, or the author's work, with the
-exception of one that is under GPL and the old abandoned Debian live-manual
+A single document - <i>The Wealth of Networks</i>, Yochai Benkler -
+shown in every output format SiSU Spine produces. The same OCN
+identifies the same object in each:
</p>
-<p>
- <a href="./authors.html">
- ⌘ Authors
- </a>
- (software curated from provided document header metadata)<br>
- - <a href="./authors.html">
- https://sisudoc.org/spine/authors.html
- </a>
-</p>
+<ul>
+ <li><a href="./en/html/the_wealth_of_networks.yochai_benkler/toc.html">
+ HTML (segmented, one page per chapter)</a></li>
+ <li><a href="./en/html/the_wealth_of_networks.yochai_benkler.html">
+ HTML (single scroll)</a></li>
+ <li><a href="./en/epub/the_wealth_of_networks.yochai_benkler.en.epub">
+ EPUB</a></li>
+ <li><a href="./pdf/the_wealth_of_networks.yochai_benkler.en.a4.portrait.pdf">
+ PDF (LaTeX &#8594; xelatex, A4)</a></li>
+ <li><a href="./en/html/metadata.the_wealth_of_networks.yochai_benkler.html">
+ Metadata page</a></li>
+ <li><a href="./spine_search?fn=the_wealth_of_networks.yochai_benkler&amp;rt=txt&amp;ec=on&amp;url=on&amp;sml=1000">
+ Search within this document (object-granular)</a></li>
+ <li><a href="./pod/the_wealth_of_networks.yochai_benkler/">
+ Source pod (markup + assets + manifest)</a></li>
+</ul>
-<p>
- <a href="./topics.html">
- ⌘ Topics
- </a>
- (software curated from provided document header metadata)<br>
- - <a href="./topics.html">
- https://sisudoc.org/spine/topics.html
- </a>
-</p>
+<h2>⌘ - Browse and search the sample collection</h2>
-<h2>፨ - SiSU Spine search</h2>
<p>
- <a href="./spine_search">
- ፨ Search
- </a>
- (granular search of text objects)<br>
- - <a href="https://sisudoc.org/spine_search">
- https://sisudoc.org/spine_search
- </a>
+<a href="./authors.html">⌘ Authors</a>
+&nbsp;-&nbsp;
+<a href="./topics.html">⌘ Topics</a>
+&nbsp;-&nbsp;
+<a href="./spine_search">፨ Search</a>
+<br>
+(Authors and Topics are software-curated from each document's
+header metadata. Search is object-granular.)
</p>
<div class="p">
- <!-- SiSU Spine Search -->
- <form action="https://sisudoc.org/spine_search" target="_top" method="POST" accept-charset="UTF-8" id="search">
- <input type="text" name="sf" size="24" maxlength="255">
- <input type="hidden" name="db" value="spine.search.db">
+ <!-- SiSU Spine Search -->
+ <form action="https://sisudoc.org/spine_search" target="_top"
+ method="POST" accept-charset="UTF-8" id="search">
+ <input type="text" name="sf" size="32" maxlength="255"
+ placeholder="search the collection...">
+ <input type="hidden" name="db" value="spine.search.db">
<input type="hidden" name="sml" value="1000">
- <input type="hidden" name="ec" value="on">
+ <input type="hidden" name="ec" value="on">
<input type="hidden" name="url" value="on">
<button type="submit" form="search">&nbsp;㏈&nbsp;፨&nbsp;</button>
- </form>
- <!-- SiSU Spine Search -->
+ </form>
+ <!-- SiSU Spine Search -->
</div>
-<h2>ℹ - SiSU description</h2>
-
-<p>
-SiSU is an object-centric, lightweight markup based, document structuring,
-parser, publishing and search tool for document collections. It is command line
-oriented and generates static content that is currently made searchable at an
-object level through an SQL database.
-Markup helps define (delineate) objects (primarily various types of text block)
-which are tracked in sequence, substantive objects being numbered sequentially
-by the program for object citation.
-</p>
-
-<p>
-<b>Summary.</b> An object is a unit of text within a document the most common
-being a paragraph. Objects include individual headings, paragraphs, tables,
-grouped text of various types such as code blocks and within poems, verse.
-Objects have properties and attributes, of particular significance are headings
-and their levels which provide document structure. A heading is an object with a
-heirarchical value, that conceptually contains other objects (such as paragraphs
-and possibly sub-headings etc.). Objects are tracked sequentially as they relate
-to each other object within a document and substantive objects are numbered
-sequentially, for citation purposes. Notably footnotes are not objects in
-themselves, rather belonging to the object from which they are referenced, and
-following their own numbering sequence. From heading objects (linked) tables of
-content may be generated, and if additional metadata is provided book type
-indexes can be generated that link back to the objects to which they relate.
-</p>
-
-<p>
-<b>Unpacking this a bit further.</b> SiSU as a concept independent of its markup
-language and the parsers that have been implemented, is based on the following
-ideas:
-</p>
-
-<p>
-<b>Object-Centricity. On objects:</b> In SiSU objects are the fundamental unit
-from which larger constructs within a document and the document itself is built.
-Breaking the document into objects provides interesting possibilities.
-</p>
-
-<p>
-<b>Objects are fundamental building blocks:</b> Conceptually within SiSU,
-objects are the building blocks or individual units of construction of a
-document. Objects are usually blocks of text, the most common of which is the
-paragraph, other examples include: individual headings, tables, grouped text of
-various types which include code blocks and verse within poems, ... and as
-mentioned an object could also, for example, be an image. Objects can be
-formatted and placed as needed, providing flexibility and enabling multiple
-types of representation across disperate formats and text recepticle, examples
-including html, epub, latex (in the past mind-maps) and sql (populated at an
-object level, and thereby providing search with that degree of granularity).
-</p>
-
<p>
-<b>Sequential. Objects have sequence:</b> That objects have sequence, goes
-largely without saying, this follows authorship, it is part of the definition of
-a document and how a document is written to convey meaning.
+The collection contains 25+ documents released under various Creative Commons
+licences, in the public domain, or as the author's own work (with one
+GPL-licensed exception and the abandoned Debian live-manual). A specialised
+collection would benefit from a consistently applied bespoke ontology or
+thesaurus.
</p>
-<p>
-<b>Object Numbers & Citation. Substantive objects are numbered for citation
-purposes:</b> Most objects within a document are meant by the author to be a
-substantive part of the document. All such objects are numbered sequentially and
-can be referenced thereby for citation purposes.
-<br>
-Object numbers provide the possibility of citing/locating text precisely across
-different document formats and different languages (assuming the document has
-been translated). For search it also makes it possible to identify precisely
-where search criteria is met within in each document in the form of an index or
-to view those precise text objects before deciding which documents are of
-interest. Additionally the use of objects (and that objects are numbered) frees
-the possibility to represent the document in the manner considered most suitable
-to a specific document format wilst retaining its structural (and citation)
-integrity).
-</p>
+<h2>Δ - Source repositories</h2>
<p>
-<b>Characteristics. Objects have properties and attributes:</b> Objects have
-properties (and may have attributes). By properties I here refer to the
-fundamental type of object, be it a heading, a paragraph, table, verse etc.
-Attributes extend further and may include other things that one might wish to
-associate with the object (examples not necessarily currently available/
-implemented in SiSU might include, formatting whether it is indented, or
-metadata e.g. the associated language, or programming language for a code block)
+All project repositories are at
+<a href="https://git.sisudoc.org">https://git.sisudoc.org</a>:
</p>
-<p>
-<b>Document structure. Heading objects hold documents structure:</b> Heading
-objects hold documents structure through their heading level property. The types
-of document of interest to SiSU have structure that is captured by the heading
-level property. Headings are individual objects like any other with the
-additional properties that (i) they may be regarded as containing the other
-objects following them sequentially (until the next heading of a similar or
-higher level), heading objects may include other headings (sub-headings), and
-(ii) that they have a heirarchy, the root "heading" being the document
-title.
-<br>
-A complication was intruduced to provide greater flexibility across document
-output formats. Headings have two sets of levels, the level under which
-substantive text occurs, this would be a chapter or segment level, and above
-that in the heirarchy if needed are document section separators, book, section,
-part.
-</p>
-
-<p>
-<b>Non-objects</b> Most but not all parts of a document are treated as objects.
-Notably footnotes are not objects in themselves, rather belonging to the object
-from which they are referenced, and following their own numbering sequence. From
-heading objects (linked) tables of content may be generated, and if additional
-metadata is provided book type indexes can be generated that link back to the
-objects to which they relate.
-</p>
+<ul>
+ <li><b>sisudoc-spine</b> (D) - the current generator
+ <br><code>git clone git://git.sisudoc.org/software/sisudoc-spine</code></li>
+ <li><b>sisudoc-spine-search-cgi</b> (D) - object-granular CGI search
+ <br><code>git clone git://git.sisudoc.org/software/sisudoc-spine-search-cgi</code></li>
+ <li><b>sisudoc-spine-samples</b> - 25+ marked-up sample documents
+ <br><code>git clone git://git.sisudoc.org/markup/sisudoc-spine-samples</code></li>
+ <li><b>sisu</b> (Ruby, original/antecedent) - the original generator
+ <br><code>git clone git://git.sisudoc.org/software/sisu</code></li>
+ <li><b>sisu-markup-samples</b> - samples for the original sisu
+ <br><code>git clone git://git.sisudoc.org/markup/sisu-markup-samples</code></li>
+ <li><b>tree-sitter-sisu</b> - tree-sitter grammar for SiSU markup
+ <br><code>git clone git://git.sisudoc.org/tools/tree-sitter-sisu</code></li>
+</ul>
-<p>
-<b>The Document Header.</b> SiSU document have headers which contain document
-metadata, at a minimum the document title and author. In addition the document
-header may contain markup instruction (e.g. how to identify headings within the
-document, in which case those headings need not be found and treated
-accordingly)
-</p>
-
-<p>
-SiSU parsers have now been implemented in different programming paradigms and
-languages a couple of times, the chosen markup has been left unchanged though
-the document headers have been modified.
-<br>
-This is the core of sisu, beyond which there is more but largely in the form of
-choices based on ... existing output formats and of implementation detail,
-deciding what attributes of objects, or within objects should be supported,
-extending markup to allow for the generation of book indexes from if tagging
-provided.
-</p>
-
-<h2>ℹ - SiSU Historical Descriptions</h2>
-
-<p>
-Here is a description that has been used for the original sisu (scribe):
-</p>
-
-<p>
-With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax
-in your text editor of choice, SiSU can generate various document formats, most
-of which share a common object numbering system for locating content, including
-plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX, PDF
-files, and populate an SQL database with objects (roughly paragraph-sized
-chunks) so searches may be performed and matches returned with that degree of
-granularity. Think of being able to finely match text in documents, using common
-object numbers, across different output formats (same object identifier for pdf,
-epub or html) and across languages if you have translations of the same document
-(same object identifier across languages). For search, your criteria is met by
-these documents at these locations within each document (equally relevant across
-different output formats and languages). To be clear (if obvious) page numbers
-provide none of this functionality. Object numbering is particularly suitable
-for "published" works (finalized texts as opposed to works that are frequently
-changed or updated) for which it provides a fixed means of reference of content.
-Document outputs can also share provided semantic meta-data.
-</p>
-
-<h3>...</h3>
-
-<p>
-SiSU is less about document layout than it is about finding a way using little
-markup to construct an abstract representation of a document that makes it
-possible to produce multiple representations of it which may be rather different
-from each other and used for different purposes, whether layout and publishing,
-scrollworthy online viewing/ reading, or content search. To be able to take
-advantage from its minimal preparation starting point of some of the strengths
-of rather different established ways of representing documents for different
-purposes, whether for search (relational database, or indexed flat files
-generated for that purpose whether of complete documents, or say of files made
-up of objects), online or other electronic viewing (e.g. html, xml, epub), or
-paper publication (e.g. pdf via latex)...
-</p>
-
-<p>
-The solution arrived at is to extract structural information about the document
-(document sections and headings within the document, available through pattern
-matching or markup) and tracking objects (which primarily are defined units of
-text such as paragraphs, headings, tables, verse, etc. but also images) which
-can be reconstituted as the same documents with relevant object identification
-numbers so text (objects) can be referenced across different output formats and
-presentations.
-</p>
-
-<p>
-SiSU generates tables of content, and through its markup the means for metadata
-to be provided for the generation of book style indexes for a document (that
-again due to document object numbers are the same and equally relevant across
-all document formats). Per document classifying/organizing metadata can also be
-provided for automated document curation.
-</p>
-
-<p>
-... there have also been working experiments with sisu markup source, two way
-conversion/representation of sisu document markup source in mind-mapping
-(software kdissert was used for its strong focus on producing documents (now
-apparently called semantik)); also po4a software for translators has been used
-successfuly in its regular text mode for sisu markup in translation, (which is
-more an attribute of po4a than of sisu, but) which is of interest due to
-sisu/spine's object citation numbering being available across translations. Open
-Document Format text (odf:odt), has been an output, but much more interesting
-(and requested by potential users of sisu/spine) would be the ability of a word
-processor to save text/a document in sisu markup, making alternative document
-processing and presentations with sisu possible.
-</p>
-
-<p>
-also worth mention, in the relatively long history of this project, there has
-been work done on extracting hash representations of each object, that could
-hypothetically be shared to prove the content of a document without sharing its
-content, or of identifying which objects change; these hashes can also be used
-as unique identifiers in a database or as identifying filenames if individual
-objects are saved.
-</p>
-
-<p>
-SiSU has evolved, the current implementation focuses on one primary use-case,
-books and literary writings. However the concept on which it is based has wider
-application. Here is a prevously posted souvenir from my encounter with an IBM
-software evaluator in London June 2004 that came about through a chance
-encounter with an IBM manager at a Linux Expo, who was curious about my interest
-in Gnu/Linux with my legal background... on hearing that I also wrote software,
-he suggested, maybe IBM should have a look at it. I was interested, the meeting
-was set up... with an IBM, Software Innovations evaluator
-<br>
-His response after the meeting:
-</p>
-
-<p>
-"Ralph<br>Good to meet with you today, I was very impressed with your
-software.<br><i>[colleague's name (also posted to an IBM colleague)]</i> - in
-summary - Ralph has built an application that runs on linux and takes ASCII
-documents and pulls them apart in to the smallest constituent parts, storing
-them as XML, PDF and HTML, the HTML are hyperlinked up so the document can be
-browsed in its full form. the format and text data created is stored in a
-database.<br>This has potential in any place that needs the power of full text
-search whilst holding the structural concepts of the document i.e. legal,
-pharma, education, research.. which ones we need to figure out, ..."
-</p>
-
-<p>
-Special interest was expressed in the search implications of SiSU. To
-paraphrase, the company has document management systems dealing with hundreds of
-thousands of texts, these tell you which documents match your search criteria,
-but cannot inform you where within a text these matches were found without
-opening the documents. This is achieved through defining document objects and
-making them the building block of the document, trackable document objects (that
-can be placed back in the context of the document or corpus of documents if part
-of a collection). SiSU's early design was to - abstract documents to their
-structure, and identified objects, numbered in a citable way (as pointed out
-document object hashes can be of use for the purpose).
-</p>
-
-<h2>ℹ - SiSU Spine (sisudoc-spine)</h2>
-
-<p>
-SiSU Spine is the new generator for documents prepared in sisu markup, written
-in D as opposed to the original sisu which was first shared in Ruby.
-</p>
+<h2>ℹ - Spine vs. the original sisu</h2>
<p>
-sisudoc spine code was shared publicly under the AGPLv3 2024-05-01 (after
-considerable procrastination). (It should be fairly straightforward to have this
-work on other OS platforms, I have only used linux since 1999.)
-</p>
-
-<p>
-As compared with the original sisu generator sisu spine:
-</p>
-
-<p>
-- Spine uses the same document markup for the document body, but uses yaml for
-document headers (which contains document metadata and configuration details),
-the original sisu has a bespoke markup for headers.
-</p>
-
-<p>
-- Spine (written in D) is considerably faster at generating native output than
-sisu (written in Ruby), on last test at least 60 times faster (what took 1
-minute takes 1 second; 1 hour a minute :-) (admittedly some time ago, ruby has
-been getting faster, hopefully this is not over over promising).
-</p>
-
-<p>
-- Spine produces fewer document outputs types than sisu (html, epub, (odt,
-latex) and populates sql db for search)
-</p>
-
-<p>
-- As regards non-native output, so far Spine has greater separation of what it
-does and largely leaves calling the external program to the user, e.g.: latex
-output is a native output in the sense that it is generated directly by spine,
-but the pdfs that can be produced from these are produced through use of an
-external program xelatex, which produces fine output but is a very much slower
-process.
-</p>
-
-<p>
-- (where both produce the same output type, generally) Spine generally produces
-more up to date output format representations.
-</p>
-
-<h2>ℹ - Some Observations</h2>
-
-<p>
-SiSU is more suited to finalized/stratified/published writings (writings,
-articles, books), that are to remain and be referenced as published,
-representing a work or ideas, set at a given time. (As opposed to the
-increasingly prevalent and important forms of fluid text).
-</p>
-
-<p>
-Trained AI likely could assist in the preparation of documents (with SiSU
-markup), with resulting deterministic and reproducible outputs (for substantive
-document objects). Caveats: Where text objects may be in blocks (or not) there
-is some room for discretion and ambiguity in the markup with resulting
-possibility of differences in the resulting presentation of a document. Book
-indexes are another area that if desired is markup intensive and unless
-following an already published index, can be prepared differently and possibly
-improved over time, and for specialised collections on a subject area could
-potentially be prepared against a thesaurus.
-</p>
-
-<h2>ℹ - Thank You</h2>
-
-<p>
-Thanks to all who help produce and maintain the software and libraries I am able
-to use and have come to rely on. Reliable infrastructure so far.
+Spine (D) and the original sisu (Ruby) share the same lightweight body markup;
+spine moves the document header to YAML where the original uses a bespoke header
+dialect. When last compared, spine was at least 60x faster (a one-minute Ruby run
+took about one second in D). Spine emits HTML, EPUB, LaTeX, ODT, plain text and
+the SQLite search database; PDF is delegated to an external xelatex pass (slower,
+but the output is excellent). For output formats both produce, spine's
+representations are generally more up to date. Spine was released publicly under
+AGPLv3 on 2024-05-01.
</p>
<hr>
-<p class="tiny"><i>
-ralph.amissah www since 1993 ;-)
-</i></p>
-<hr>
-<h2>Some external links of interest</h2>
+<!-- Below the fold: long-form material wrapped in <details> so it is
+ collapsed by default and readers do not have to scroll past it.
+ The content is lightly edited and moved. -->
+
+<details>
+ <summary><b>ℹ - A longer description (design and intent)</b></summary>
+
+ <p>
+ <b>Summary.</b> An object is a unit of text within a document, the most common
+ being a paragraph. Objects include individual headings, paragraphs, tables,
+ and grouped text of various types such as code blocks and (within poems)
+ verse. Objects have properties and attributes; of particular significance are
+ headings and their levels, which provide document structure. A heading is an
+ object with a hierarchical value that conceptually contains other objects
+ (such as paragraphs and possibly sub-headings). Objects are tracked
+ sequentially as they relate to each other within a document, and substantive
+ objects are numbered sequentially for citation purposes. Notably, footnotes
+ are not objects in themselves - they belong to the object from which they are
+ referenced, and follow their own numbering sequence. From heading objects,
+ linked tables of content may be generated; and if additional metadata is
+ provided, book-style indexes can be generated that link back to the objects to
+ which they relate.
+ </p>
+
+ <p>
+ <b>Object-centricity.</b> In SiSU, objects are the fundamental unit from which
+ larger constructs and the document itself are built. Breaking the document
+ into objects provides interesting possibilities.
+ </p>
+
+ <p>
+ <b>Objects are fundamental building blocks.</b> Objects are usually blocks of
+ text - paragraphs, headings, tables, grouped text of various types including
+ code blocks and verse - and may also be, for example, images. Objects can be
+ formatted and placed as needed, enabling multiple types of representation
+ across disparate formats and text receptacles: HTML, EPUB, LaTeX, mind-maps
+ (in the past) and SQL (populated at object level, so that search has that
+ degree of granularity).
+ </p>
+
+ <p>
+ <b>Sequence.</b> Objects have sequence - this follows authorship and is part
+ of how a document conveys meaning.
+ </p>
+
+ <p>
+ <b>Object numbers and citation.</b> Substantive objects are numbered
+ sequentially and can be referenced for citation purposes. Object numbers
+ locate text precisely across different document formats and different
+ languages (assuming the document has been translated). For search, they
+ identify precisely where within each document the search criteria are met - in
+ the form of an index, or by surfacing the matching text objects so a reader
+ can decide which documents are of interest before opening them. Object
+ numbering also frees the representation of each format to be whatever is most
+ suitable to that format, while structural and citation integrity are retained.
+ </p>
+
+ <p>
+ <b>Characteristics.</b> Objects have properties (the fundamental type:
+ heading, paragraph, table, verse, etc.) and may carry attributes (e.g.
+ indentation, language, programming language for a code block).
+ </p>
+
+ <p>
+ <b>Document structure.</b> Headings hold the document's structure through
+ their heading-level property. Headings are individual objects like any other,
+ with the additional properties that (i) they may be regarded as containing the
+ other objects following them sequentially (until the next heading of similar
+ or higher level), and (ii) they have a hierarchy, the root being the document
+ title. To give greater flexibility across output formats, headings have two
+ sets of levels: the level under which substantive text occurs (chapter or
+ segment), and above that, optional document section separators (book, section,
+ part).
+ </p>
+
+ <p>
+ <b>Non-objects.</b> Footnotes are not objects in themselves; they belong to
+ the referencing object and follow their own numbering sequence. Tables of
+ content may be generated from heading objects; book-style indexes may be
+ generated when the required metadata is provided.
+ </p>
+
+ <p>
+ <b>The document header.</b> A SiSU document has a header carrying document
+ metadata - at a minimum, title and author. The header may also carry markup
+ instructions (e.g. how to identify headings within the document, so that those
+ headings do not need to be inferred).
+ </p>
+
+</details>
+
+<details>
+ <summary><b>ℹ - Historical description (original sisu)</b></summary>
+
+ <p>
+ With minimal preparation of a plain-text (UTF-8) file using SiSU markup syntax
+ in your text editor of choice, SiSU can generate various document formats,
+ most of which share a common object numbering system for locating content -
+ plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODT), LaTeX, PDF - and
+ populate an SQL database with objects (roughly paragraph-sized chunks) so
+ searches may be performed and matches returned with that degree of
+ granularity. Think of being able to finely match text across different output
+ formats (same object identifier for PDF, EPUB or HTML) and across languages
+ where translations exist (same object identifier across languages). For
+ search, your criteria are met by these documents at these locations within
+ each document (equally relevant across different output formats and
+ languages). Page numbers provide none of this functionality. Object numbering
+ is particularly suitable for "published" works (finalised texts as opposed to
+ works that are frequently changed or updated), for which it provides a fixed
+ means of reference of content. Document outputs can also share provided
+ semantic metadata.
+ </p>
+
+ <p>
+ SiSU is less about document layout than about finding a way, using little
+ markup, to construct an abstract representation of a document that makes it
+ possible to produce multiple representations - which may be rather different
+ from each other and used for different purposes - whether layout and
+ publishing, scrollworthy online viewing, or content search. The aim is to take
+ advantage, from a minimal-preparation starting point, of some of the strengths
+ of rather different established ways of representing documents for different
+ purposes: search (relational database, or indexed flat files of complete
+ documents or files made up of objects), online or electronic viewing (HTML,
+ XML, EPUB), or paper publication (PDF via LaTeX).
+ </p>
+
+ <p>
+ The solution arrived at is to extract structural information about the
+ document (sections and headings, available through pattern matching or markup)
+ and to track objects (defined units of text such as paragraphs, headings,
+ tables, verse, etc., but also images), which can then be reconstituted as the
+ same document with relevant object identification numbers - so text (objects)
+ can be referenced across different output formats and presentations.
+ </p>
+
+ <p>
+ SiSU generates tables of content and, through its markup, the means for
+ metadata to be provided for the generation of book-style indexes for a
+ document (that, again, due to document object numbers, are the same and
+ equally relevant across all output formats). Per-document
+ classifying/organizing metadata can also be provided for automated document
+ curation.
+ </p>
+
+ <p>
+ There have also been working experiments with SiSU-markup source: two-way
+ conversion/representation in mind-mapping software (kdissert / semantik, for
+ its strong focus on producing documents); and po4a (for translators) has been
+ used successfully in its regular text mode for SiSU markup in translation -
+ which is more an attribute of po4a than of SiSU, but of interest due to
+ SiSU/spine's object citation numbering being available across translations.
+ ODT has been an output, but much more interesting (and requested by potential
+ users) would be the ability of a word processor to save text in SiSU markup,
+ making alternative document processing and presentations with SiSU possible.
+ </p>
+
+ <p>
+ Also worth mention: in the relatively long history of this project there has
+ been work on extracting hash representations of each object that could
+ hypothetically be shared to prove the content of a document without sharing
+ its content, or to identify which objects change. These hashes can also be
+ used as unique identifiers in a database, or as filenames if individual
+ objects are saved.
+ </p>
+
+</details>
+
+<details>
+ <summary><b>ℹ - From a 2004 evaluation (IBM Software
+ Innovations)</b></summary>
+
+ <p>
+ SiSU has evolved; the current implementation focuses on one primary use-case,
+ books and literary writings. The concept, however, has wider application. The
+ following is a souvenir from an encounter with an IBM software evaluator in
+ London in June 2004, set up after a chance meeting with an IBM manager at a
+ Linux Expo who was curious about my interest in GNU/Linux given my legal
+ background - on hearing that I also wrote software, he suggested IBM should
+ have a look. The evaluator's response after the meeting:
+ </p>
+
+ <p>
+ "Ralph<br>
+ Good to meet with you today, I was very impressed with your software.<br>
+ <i>[colleague's name (also posted to an IBM colleague)]</i> - in summary -
+ Ralph has built an application that runs on linux and takes ASCII documents
+ and pulls them apart in to the smallest constituent parts, storing them as
+ XML, PDF and HTML; the HTML are hyperlinked up so the document can be browsed
+ in its full form. The format and text data created is stored in a
+ database.<br>This has potential in any place that needs the power of full
+ text search whilst holding the structural concepts of the document i.e. legal,
+ pharma, education, research.. which ones we need to figure out, ..."
+ </p>
+
+ <p>
+ Special interest was expressed in the search implications of SiSU. To
+ paraphrase: the company has document management systems dealing with hundreds
+ of thousands of texts; these tell you which documents match your search
+ criteria, but cannot inform you where within a text these matches were found
+ without opening the documents. SiSU addresses this by defining document
+ objects and making them the building block of the document - trackable objects
+ that can be placed back in the context of the document or corpus of documents
+ if part of a collection. SiSU's early design was to abstract documents to
+ their structure and identified objects, numbered in a citable way (as noted
+ above, document-object hashes can be of use for the purpose).
+ </p>
+
+</details>
+
+<details>
+ <summary><b>ℹ - Some observations</b></summary>
+
+ <p>
+ SiSU is more suited to finalised / stratified / published writings (articles,
+ books) that are to remain and be referenced as published - works set at a
+ given time. (As opposed to the increasingly prevalent and important forms of
+ fluid text.)
+ </p>
+
+ <p>
+ Trained AI could likely assist in the preparation of documents with SiSU
+ markup, with resulting deterministic and reproducible outputs (for substantive
+ document objects). Caveats: where text objects may be in blocks (or not),
+ there is some room for discretion and ambiguity in the markup, with resulting
+ possibility of differences in presentation. Book indexes are another
+ markup-intensive area; unless following an already published index, they can
+ be prepared differently and possibly improved over time, and for specialised
+ subject collections could potentially be prepared against a thesaurus.
+ </p>
+
+</details>
-<h3>Development</h3>
-<h4>Programming</h4>
-<p>
- [ <a href="https://dlang.org/">
- D - (dlang) general purpose, multi-paradigm, fast C like programming language
- </a> ]
- [ <a href="https://code.dlang.org/">
- dub - package registry
- </a> ]
- [ <a href="https://forum.dlang.org/group/general">
- community discussion (mail list frontend)
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://www.ruby-lang.org/en/">
- Ruby
- </a> ]
- [ <a href="https://rubygems.org/">
- Gems
- </a> ]<br>
- [ <a href="https://crystal-lang.org/">
- Crystal
- </a> ]<br>
-</p>
-<h4>SQL DB</h4>
-<p>
- [ <a href="https://sqlite.org/index.html">
- Sqlite - an sql database engine
- </a> ]<br>
- [ <a href="https://www.postgresql.org/">
- PostgreSQL
- </a> ]<br>
-</p>
-<h4>Markup</h4>
-<p>
- [ <a href="https://www.w3.org/html/">
- HTML
- </a> ]
- [ <a href="https://html.spec.whatwg.org/multipage/">
- multipage current spec
- </a> ]
- [ <a href="https://dom.spec.whatwg.org/">
- dom current spec
- </a> ]<br>
- [ <a href="https://www.w3.org/publishing/epub32/">
- Epub
- </a> ]<br>
- [ <a href="https://www.w3.org/Style/CSS/">
- css - cascading style sheets
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://opendocumentformat.org/">
- OpenDocument Format
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://www.latex-project.org/get/">
- LaTeX
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://po4a.org/index.php.en">
- po4a - maintain translations
- </a> ]<br>
-</p>
-<h4>Operating System Distributions</h4>
-<p>
- [ <a href="https://nixos.org/">
- NixOS - linux based operating system built on the Nix declarative, reproducible and reliable, build system
- </a> ]
- [ <a href="https://github.com/NixOS/nixpkgs">
- nixpkgs (packages @ github)
- </a> ]
- [ <a href="https://search.nixos.org/packages?channel=unstable&from=0&size=100&sort=relevance&query=">
- package search
- </a> ]
- [ <a href="https://discourse.nixos.org/">
- community discussion (discourse)
- </a> ]
- [ <a href="https://discourse.nixos.org/t/nixos-foundation-board-giving-power-to-the-community/44552/">
- NixOS Foundation board: Giving power to the community
- </a> ]<br>
<!--
- [ <a href="https://aux.computer/">
- Aux - aux.computer - a community fork of nix (under deliberation), billed as "An alternative to the Nix ecosystem"
- </a> ]
- [ <a href="https://forum.aux.computer/">
- community discussion (discourse)
- </a> ]<br>
--->
- Gnu [ <a href="https://guix.gnu.org/">
- Guix
- </a> ]
- [ <a href="https://guix.gnu.org/en/packages/">
- packages
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://debian.org/">
- Debian - the universal operating system distribution
- </a> ]<br>
- [ <a href="https://www.devuan.org/">
- Devuan
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://archlinux.org/">
- Arch Linux
- </a> ]
- [ <a href="https://wiki.archlinux.org/">
- Arch Wiki
- </a> ]<br>
-</p>
<hr>
-<h2>Extraneous (external) links of personal interest</h2>
-
-<h4>Workspace</h4>
-
-<h5>Shell</h5>
<p>
- [ <a href="https://www.zsh.org/">
- zsh
- </a> ]<br>
- [ <a href="https://starship.rs/">
- starship - customizable cross-shell prompt
- </a> ]<br>
-</p>
-<h5>Terminal</h5>
-<p>
- [ <a href="https://gnunn1.github.io/tilix-web/">
- tilix
- </a> ]
- [ <a href="https://alacritty.org/">
- alacritty
- </a> ]<br>
-</p>
-<h5>Terminal Multiplexer</h5>
-<p>
- [ <a href="https://github.com/tmux/tmux">
- tmux (github)
- </a> ]
- [ <a href="https://www.gnu.org/software/screen/">
- screen
- </a> ]<br>
-</p>
-<h5>Window Manager</h5>
-<p>
- [ <a href="https://i3wm.org/">
- i3wm
- </a> ]
- [ <a href="https://swaywm.org/">
- sway
- </a> ]<br>
-</p>
-<h5>Text Editors</h5>
-<p>
- Gnu Emacs
- [ <a href="https://github.com/hlissner/doom-emacs">
- Doom Emacs (github)
- </a> ]
- [ <a href="https://orgmode.org/">
- Org-Mode - your life in plain text & literate programming
- </a> ]
- [ <a href="https://github.com/emacs-evil/evil">
- Evil-Mode
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://www.vim.org/">
- Vim
- </a> ]
- [ <a href="https://neovim.io/">
- NeoVim
- </a> ]<br>
-</p>
-<h5>Source Control Manager</h5>
-<p>
- [ <a href="https://git-scm.com/">
- Git
- </a> ]<br>
-</p>
-<h5>Browsers</h5>
-<p>
- [ <a href="https://vieb.dev/">
- vieb
- </a> ]
- [ <a href="https://fanglingsu.github.io/vimb/">
- vimb
- </a> ]<br>
- [ <a href="https://brave.com/">
- brave
- </a> ]<br>
+<a href="./links.html">Personal-interest external links</a> (toolchain,
+distributions, editors, forges).
</p>
-<h3>Search</h3>
-<p>
- [ <a href="https://duckduckgo.com/">
- DuckDuckGo
- </a> ]
- [ <a href="https://yubnub.org/">
- YubNub
- </a> ]<br>
-</p>
-
-<h3>eMail</h3>
-<p>
- [ <a href="https://www.migadu.com/">
- Migadu
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://notmuchmail.org/">
- NotmuchMail
- </a> ]<br>
-</p>
-
-<h3>Forges</h3>
-<p>
- [ <a href="https://sourcehut.org/">
- Sourcehut
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://codeberg.org/">
- CodeBerg
- </a> ]<br>
-</p>
-<p>
- [ <a href="https://github.com">
- GitHub
- </a> ]
- [ <a href="https://gitlab.com">
- GitLab
- </a> ]<br>
-</p>
-
-<h3>Software Archives</h3>
-<p>
- [ <a href="https://www.softwareheritage.org/">
- Software Heritage - the universal software archive
- </a> ]<br>
-</p>
+-->
<hr>
+
<p class="tiny"><i>
-ralph.amissah www since 1993 ;-)
+ralph.amissah - www since 1993 ;-)
</i></p>
</body>