author Ralph Amissah <ralph.amissah@gmail.com> 2023-06-25 13:26:05 -0400
committer Ralph Amissah <ralph.amissah@gmail.com> 2023-06-25 13:26:05 -0400
commit 2397b53655f3b91bf85aa56d2a8cb149d98f4b6b
tree cc10d348bc7dabdd1e9f9c10b1c5c255103b6851 /spine-bespoke-output
parent nix flake, overlays from spine
bespoke html, homepage index.html
Diffstat (limited to 'spine-bespoke-output')
-rw-r--r-- spine-bespoke-output/html/homepage.index.html 539
1 files changed, 539 insertions, 0 deletions
diff --git a/spine-bespoke-output/html/homepage.index.html b/spine-bespoke-output/html/homepage.index.html
new file mode 100644
index 0000000..abf0a68
--- /dev/null
+++ b/spine-bespoke-output/html/homepage.index.html
@@ -0,0 +1,539 @@
+<!DOCTYPE html>
+<html>
+<head>
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
+ <title>≅ SiSU project sisudoc.org</title>
+ <link href="./css/html_seg.css" rel="stylesheet" />
+</head>
+
+<body>
+
+<h1>≅ - SiSU for documents - structuring, publishing in multiple
+formats &amp; search</h1>
+
+<h2>ℹ - A short description</h2>
+
+<p>
+SiSU is an object-centric, lightweight markup based, document structuring,
+parser, publishing and search tool for document collections. It is command line
+oriented and generates static content that is also made searchable at an object
+level through an SQL database.
+</p>
+
+<p>
+
+SiSU markup helps define (delineate) text objects, which the program numbers
+sequentially for object citation. Breaking the document into objects opens up
+interesting possibilities. Object numbers make it possible to cite and locate
+text precisely across different document formats and different languages
+(assuming the document has been translated). For search, they also make it
+possible to identify, in the form of an index, precisely where within each
+document the search criteria are met. Additionally, the use of numbered objects
+makes it possible to represent the document in the manner considered most
+suitable to a specific document format (whilst retaining its structural and
+citation integrity).
+
+</p>
+
+<h2>Δ - SiSU project source</h2>
+
+<p>
+ <a href="./projects">
+ Δ SiSU projects repo (git)
+ </a><br>
+ - <a href="https://git.sisudoc.org">
+ https://git.sisudoc.org
+ </a><br>
+</p>
+
+<p>
+ <a href="./projects/sisu">
+ Δ SiSU (scribe): document publishing (multiple formats + search)
+ </a><br>
+ - <a href="https://git.sisudoc.org/sisu">
+ https://git.sisudoc.org/sisu
+ </a><br>
+</p>
+
+<p>
+ <a href="./projects/sisu-markup">
+ Δ SiSU markup samples in document pods for sisu (scribe)
+ </a><br>
+ - <a href="https://git.sisudoc.org/sisu-markup">
+ https://git.sisudoc.org/sisu-markup
+ </a><br>
+</p>
+
+<h2>⌘ - SiSU Spine markup sample output</h2>
+
+<p>
+To give an idea of how this works, here is a small collection of documents marked
+up for and generated by the software. The curation of topics for a collection of
+specialized related documents would benefit from a consistently applied bespoke
+ontology or thesaurus.<br> The documents presented have been released under
+various Creative Commons licences, are in the public domain, or are the
+author's own work, with the exception of one that is under the GPL and the old
+abandoned Debian live-manual.
+</p>
+
+<p>
+ <a href="./authors.html">
+ ⌘ Authors
+ </a>
+ (software curated from provided document header metadata)<br>
+ - <a href="./authors.html">
+ https://sisudoc.org/spine/authors.html
+ </a>
+</p>
+
+<p>
+ <a href="./topics.html">
+ ⌘ Topics
+ </a>
+ (software curated from provided document header metadata)<br>
+ - <a href="./topics.html">
+ https://sisudoc.org/spine/topics.html
+ </a>
+</p>
+
+<h2>፨ - SiSU Spine search</h2>
+<p>
+ <a href="./spine_search">
+ ፨ Search
+ </a>
+ (granular search of text objects)<br>
+ - <a href="https://sisudoc.org/spine_search">
+ https://sisudoc.org/spine_search
+ </a>
+</p>
+
+<div class="p">
+ <!-- SiSU Spine Search -->
+ <form action="https://sisudoc.org/spine_search" target="_top" method="POST" accept-charset="UTF-8" id="search">
+ <input type="text" name="sf" size="24" maxlength="255">
+ <input type="hidden" name="db" value="spine.search.db">
+ <input type="hidden" name="sml" value="1000">
+ <input type="hidden" name="ec" value="on">
+ <input type="hidden" name="url" value="on">
+ <button type="submit" form="search">&nbsp;㏈&nbsp;፨&nbsp;</button>
+ </form>
+ <!-- SiSU Spine Search -->
+</div>
+
+<h2>ℹ - SiSU description</h2>
+
+<p>
+Here is a description that has been used for the original sisu (scribe):
+</p>
+
+<p>
+With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax
+in your text editor of choice, SiSU can generate various document formats, most
+of which share a common object numbering system for locating content, including
+plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX, PDF
+files, and populate an SQL database with objects (roughly paragraph-sized
+chunks) so searches may be performed and matches returned with that degree of
+granularity. Think of being able to finely match text in documents, using common
+object numbers, across different output formats (same object identifier for pdf,
+epub or html) and across languages if you have translations of the same document
+(same object identifier across languages). For search: your criteria are met by
+these documents, at these locations within each document (equally relevant across
+different output formats and languages). To be clear (if obvious), page numbers
+provide none of this functionality. Object numbering is particularly suitable
+for "published" works (finalized texts as opposed to works that are frequently
+changed or updated) for which it provides a fixed means of reference of content.
+Document outputs can also share provided semantic meta-data.
+</p>
+
+<h3>...</h3>
+
+<p>
+SiSU is less about document layout than it is about finding a way, using little
+markup, to construct an abstract representation of a document that makes it
+possible to produce multiple representations of it, which may be rather
+different from each other and used for different purposes, whether layout and
+publishing, scrollworthy online viewing/reading, or content search. The aim is
+to take advantage, from a minimal-preparation starting point, of some of the
+strengths of rather different established ways of representing documents for
+different purposes, whether for search (relational database, or indexed flat
+files generated for that purpose, whether of complete documents or, say, of
+files made up of objects), online or other electronic viewing (e.g. html, xml,
+epub), or paper publication (e.g. pdf via latex)...
+</p>
+
+<p>
+The solution arrived at is to extract structural information about the document
+(document sections and headings within the document, available through pattern
+matching or markup) and to track objects (primarily defined units of text such
+as paragraphs, headings, tables, verse, etc., but also images), which can be
+reconstituted as the same documents with relevant object identification
+numbers, so that text (objects) can be referenced across different output
+formats and presentations.
+</p>
+
+<p>
+SiSU generates tables of contents and, through its markup, provides the means
+for metadata to be supplied for the generation of book-style indexes for a
+document (which, again due to document object numbers, are the same and equally
+relevant across all document formats). Per-document classifying/organizing
+metadata can also be provided for automated document curation.
+</p>
+
+<p>
+... there have also been working experiments with sisu markup source: two-way
+conversion/representation of sisu document markup source in mind-mapping
+software (kdissert, now apparently called semantik, was used for its strong
+focus on producing documents); also, po4a software for translators has been used
+successfully in its regular text mode for sisu markup in translation (which is
+more an attribute of po4a than of sisu, but is of interest because sisu/spine's
+object citation numbering is available across translations). Open
+Document Format text (odf:odt) has been an output, but much more interesting
+(and requested by potential users of sisu/spine) would be the ability of a word
+processor to save text/a document in sisu markup, making alternative document
+processing and presentations with sisu possible.
+</p>
+
+<p>
+Also worth mentioning: in the relatively long history of this project, there has
+been work done on extracting hash representations of each object, which could
+hypothetically be shared to prove the content of a document without sharing its
+content, or used to identify which objects change; these hashes can also be used
+as unique identifiers in a database or as identifying filenames if individual
+objects are saved.
+</p>
+
+<p>
+SiSU has evolved; the current implementation focuses on one primary use-case,
+books and literary writings. However, the concept on which it is based has wider
+application. Here is a previously posted souvenir from my meeting with an IBM
+software evaluator in London in June 2004, which came about through a chance
+encounter with an IBM manager at a Linux Expo, who was curious about my interest
+in GNU/Linux given my legal background... on hearing that I also wrote software,
+he suggested that maybe IBM should have a look at it. I was interested, and the
+meeting was set up... with an IBM Software Innovations evaluator.<br>His
+response after the meeting:
+</p>
+
+<p>
+"Ralph<br>Good to meet with you today, I was very impressed with your
+software.<br><i>[colleague's name (also posted to an IBM colleague)]</i> - in
+summary - Ralph has built an application that runs on linux and takes ASCII
+documents and pulls them apart in to the smallest constituent parts, storing
+them as XML, PDF and HTML, the HTML are hyperlinked up so the document can be
+browsed in its full form. the format and text data created is stored in a
+database.<br>This has potential in any place that needs the power of full text
+search whilst holding the structural concepts of the document i.e. legal,
+pharma, education, research.. which ones we need to figure out, ..."
+</p>
+
+<p>
+Special interest was expressed in the search implications of SiSU. To
+paraphrase, the company has document management systems dealing with hundreds of
+thousands of texts; these tell you which documents match your search criteria,
+but cannot tell you where within a text the matches were found without
+opening the documents. SiSU achieves this by defining document objects and
+making them the building blocks of the document: trackable document objects
+(that can be placed back in the context of the document, or of the corpus of
+documents if part of a collection). SiSU's early design was to abstract
+documents to their structure and identified objects, numbered in a citable way
+(as pointed out, document object hashes can also be of use for the purpose).
+</p>
+
+<h2>ℹ - SiSU Spine</h2>
+
+<p>
+SiSU Spine is the new generator for documents prepared in sisu markup, written
+in D, as opposed to the original sisu, which was first shared in Ruby.
+</p>
+
+<p>
+Spine code has not as yet been made publicly available.
+</p>
+
+<p>
+Compared with the original sisu generator, sisu spine differs as follows:
+</p>
+
+<p>
+- Spine uses the same document markup for the document body, but uses yaml for
+document headers (which contain document metadata and configuration details),
+whereas the original sisu has a bespoke markup for headers.
+</p>
+
+<p>
+- Spine (written in D) is considerably faster at generating native output than
+sisu (written in Ruby): on last test, at least 60 times faster (what took 1
+minute takes 1 second; 1 hour, a minute :-). Admittedly that was some time ago
+and ruby has been getting faster, so hopefully this is not over-promising.
+</p>
+
+<p>
+- Spine produces fewer document output types than sisu (html, epub, (odt,
+latex), and it populates an sql db for search).
+</p>
+
+<p>
+- As regards non-native output, so far Spine has greater separation of what it
+does and largely leaves calling the external program to the user, e.g.: latex
+output is a native output in the sense that it is generated directly by spine,
+but the pdfs that can be produced from these are produced through use of an
+external program xelatex, which produces fine output but is a very much slower
+process.
+</p>
+
+<p>
+- Where both produce the same output type, Spine generally produces more
+up-to-date output format representations.
+</p>
+
+<hr>
+<p class="tiny"><i>
+ralph.amissah www since 1993 ;-)
+</i></p>
+
+<hr>
+<h2>Some external links of interest</h2>
+
+<h3>Development</h3>
+<h4>Programming</h4>
+<p>
+ [ <a href="https://dlang.org/">
+ D - (dlang) general purpose, multi-paradigm, fast C like programming language
+ </a> ]
+ [ <a href="https://code.dlang.org/">
+ dub - package registry
+ </a> ]
+ [ <a href="https://forum.dlang.org/group/general">
+ community discussion (mail list frontend)
+ </a> ]<br>
+</p>
+<p>
+ [ <a href="https://www.ruby-lang.org/en/">
+ Ruby
+ </a> ]
+ [ <a href="https://rubygems.org/">
+ Gems
+ </a> ]<br>
+ [ <a href="https://crystal-lang.org/">
+ Crystal
+ </a> ]<br>
+</p>
+<h4>SQL DB</h4>
+<p>
+ [ <a href="https://sqlite.org/index.html">
+ Sqlite - an sql database engine
+ </a> ]<br>
+ [ <a href="https://www.postgresql.org/">
+ PostgreSQL
+ </a> ]<br>
+</p>
+<h4>Markup</h4>
+<p>
+ [ <a href="https://www.w3.org/html/">
+ HTML
+ </a> ]
+ [ <a href="https://html.spec.whatwg.org/multipage/">
+ multipage current spec
+ </a> ]
+ [ <a href="https://dom.spec.whatwg.org/">
+ dom current spec
+ </a> ]<br>
+ [ <a href="https://www.w3.org/publishing/epub32/">
+ Epub
+ </a> ]<br>
+ [ <a href="https://www.w3.org/Style/CSS/">
+ css - cascading style sheets
+ </a> ]<br>
+</p>
+<p>
+ [ <a href="https://opendocumentformat.org/">
+ OpenDocument Format
+ </a> ]<br>
+</p>
+<p>
+ [ <a href="https://www.latex-project.org/get/">
+ LaTeX
+ </a> ]<br>
+</p>
+<p>
+ [ <a href="https://po4a.org/index.php.en">
+ po4a - maintain translations
+ </a> ]<br>
+</p>
+<h4>Operating System Distributions</h4>
+<p>
+ [ <a href="https://nixos.org/">
+ NixOS - linux based operating system built on the Nix declarative, reproducible and reliable, build system
+ </a> ]
+ [ <a href="https://github.com/NixOS/nixpkgs">
+ nixpkgs (packages @ github)
+ </a> ]
+ [ <a href="https://search.nixos.org/packages?channel=unstable&amp;from=0&amp;size=100&amp;sort=relevance&amp;query=">
+ package search
+ </a> ]
+ [ <a href="https://discourse.nixos.org/">
+ community discussion (discourse)
+ </a> ]<br>
+ Gnu [ <a href="https://guix.gnu.org/">
+ Guix
+ </a> ]
+ [ <a href="https://guix.gnu.org/en/packages/">
+ packages
+ </a> ]
+ <br>
+</p>
+<p>
+ [ <a href="https://debian.org/">
+ Debian - the universal operating system distribution
+ </a> ]<br>
+ [ <a href="https://www.devuan.org/">
+ Devuan
+ </a> ]<br>
+</p>
+<p>
+ [ <a href="https://archlinux.org/">
+ Arch Linux
+ </a> ]
+ [ <a href="https://wiki.archlinux.org/">
+ Arch Wiki
+ </a> ]<br>
+</p>
+
+<hr>
+
+<h2>Extraneous (external) links of personal interest</h2>
+
+<h4>Workspace</h4>
+
+<h5>Shell</h5>
+<p>
+ [ <a href="https://www.zsh.org/">
+ zsh
+ </a> ]<br>
+ [ <a href="https://starship.rs/">
+ starship - customizable cross-shell prompt
+ </a> ]<br>
+</p>
+<h5>Terminal</h5>
+<p>
+ [ <a href="https://gnunn1.github.io/tilix-web/">
+ tilix
+ </a> ]
+ [ <a href="https://alacritty.org/">
+ alacritty
+ </a> ]<br>
+</p>
+<h5>Terminal Multiplexer</h5>
+<p>
+ [ <a href="https://github.com/tmux/tmux">
+ tmux (github)
+ </a> ]
+ [ <a href="https://www.gnu.org/software/screen/">
+ screen
+ </a> ]<br>
+</p>
+<h5>Window Manager</h5>
+<p>
+ [ <a href="https://i3wm.org/">
+ i3wm
+ </a> ]
+ [ <a href="https://swaywm.org/">
+ sway
+ </a> ]<br>
+</p>
+<h5>Text Editors</h5>
+<p>
+ Gnu Emacs
+ [ <a href="https://github.com/hlissner/doom-emacs">
+ Doom Emacs (github)
+ </a> ]
+ [ <a href="https://orgmode.org/">
+ Org-Mode - your life in plain text &amp; literate programming
+ </a> ]
+ [ <a href="https://github.com/emacs-evil/evil">
+ Evil-Mode
+ </a> ]<br>
+</p>
+<p>
+ [ <a href="https://www.vim.org/">
+ Vim
+ </a> ]
+ [ <a href="https://neovim.io/">
+ NeoVim
+ </a> ]<br>
+</p>
+<h5>Source Control Manager</h5>
+<p>
+ [ <a href="https://git-scm.com/">
+ Git
+ </a> ]<br>
+</p>
+<h5>Browsers</h5>
+<p>
+ [ <a href="https://vieb.dev/">
+ vieb
+ </a> ]
+ [ <a href="https://fanglingsu.github.io/vimb/">
+ vimb
+ </a> ]<br>
+ [ <a href="https://brave.com/">
+ brave
+ </a> ]<br>
+</p>
+
+<h3>Search</h3>
+<p>
+ [ <a href="https://duckduckgo.com/">
+ DuckDuckGo
+ </a> ]
+ [ <a href="https://yubnub.org/">
+ YubNub
+ </a> ]<br>
+</p>
+
+<h3>eMail</h3>
+<p>
+ [ <a href="https://www.migadu.com/">
+ Migadu
+ </a> ]<br>
+</p>
+<p>
+ [ <a href="https://notmuchmail.org/">
+ NotmuchMail
+ </a> ]<br>
+</p>
+
+<h3>Forges</h3>
+<p>
+ [ <a href="https://sourcehut.org/">
+ Sourcehut
+ </a> ]<br>
+</p>
+<p>
+ [ <a href="https://codeberg.org/">
+ CodeBerg
+ </a> ]<br>
+</p>
+<p>
+ [ <a href="https://github.com">
+ GitHub
+ </a> ]
+ [ <a href="https://gitlab.com">
+ GitLab
+ </a> ]<br>
+</p>
+
+<h3>Software Archives</h3>
+<p>
+ [ <a href="https://www.softwareheritage.org/">
+ Software Heritage - the universal software archive
+ </a> ]<br>
+</p>
+
+<hr>
+<p class="tiny"><i>
+ralph.amissah www since 1993 ;-)
+</i></p>
+
+</body>
+</html>