Diffstat (limited to 'spine-bespoke-output/html/homepage.index.html')
-rw-r--r-- | spine-bespoke-output/html/homepage.index.html | 539 |
1 files changed, 539 insertions, 0 deletions
diff --git a/spine-bespoke-output/html/homepage.index.html b/spine-bespoke-output/html/homepage.index.html new file mode 100644 index 0000000..abf0a68 --- /dev/null +++ b/spine-bespoke-output/html/homepage.index.html @@ -0,0 +1,539 @@ +<!DOCTYPE html> +<html> +<head> + <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> + <title>≅ SiSU project sisudoc.org</title> + <link href="./css/html_seg.css" rel="stylesheet" /> +</head> + +<body> + +<h1>≅ - SiSU for documents - structuring, publishing in multiple +formats & search</h1> + +<h2>ℹ - A short description</h2> + +<p> +SiSU is an object-centric, lightweight markup based, document structuring, +parser, publishing and search tool for document collections. It is command line +oriented and generates static content that is also made searchable at an object +level through an SQL database. +</p> + +<p> + +SiSU markup helps define (delineate) text objects which are numbered +sequentially by the program for object citation. Breaking the document into +objects provides interesting possibilities. These object numbers provide the +possibility of citing/locating text precisely across different document formats +and different languages (assuming the document has been translated). For search +it also makes it possible to identify precisely where within each document +search criteria are met, in the form of an index. Additionally, the use of objects +(and the fact that objects are numbered) makes it possible to represent the document +in the manner considered most suitable to a specific document format (whilst +retaining its structural (and citation) integrity). 
+ +</p> + +<h2>Δ - SiSU project source</h2> + +<p> + <a href="./projects"> + Δ SiSU projects repo (git) + </a><br> + - <a href="https://git.sisudoc.org"> + https://git.sisudoc.org + </a><br> +</p> + +<p> + <a href="./projects/sisu"> + Δ SiSU (scribe): document publishing (multiple formats + search) + </a><br> + - <a href="https://git.sisudoc.org/sisu"> + https://git.sisudoc.org/sisu + </a><br> +</p> + +<p> + <a href="./projects/sisu-markup"> + Δ SiSU markup samples in document pods for sisu (scribe) + </a><br> + - <a href="https://git.sisudoc.org/sisu-markup"> + https://git.sisudoc.org/sisu-markup + </a><br> +</p> + +<h2>⌘ - SiSU Spine markup sample output</h2> + +<p> +To give an idea of how this works here is a small collection of documents marked +up for and generated by the software. The curation of topics for a collection of +specialized related documents would benefit from a consistently applied bespoke +ontology or thesaurus.<br> The documents presented are documents that have been +released under various creative commons licences, in the public domain, or the +author's work, with the exception of one that is under GPL and the old abandoned +Debian live-manual +</p> + +<p> + <a href="./authors.html"> + ⌘ Authors + </a> + (software curated from provided document header metadata)<br> + - <a href="./authors.html"> + https://sisudoc.org/spine/authors.html + </a> +</p> + +<p> + <a href="./topics.html"> + ⌘ Topics + </a> + (software curated from provided document header metadata)<br> + - <a href="./topics.html"> + https://sisudoc.org/spine/topics.html + </a> +</p> + +<h2>፨ - SiSU Spine search</h2> +<p> + <a href="./spine_search"> + ፨ Search + </a> + (granular search of text objects)<br> + - <a href="https://sisudoc.org/spine_search"> + https://sisudoc.org/spine_search + </a> +</p> + +<div class="p"> + <!-- SiSU Spine Search --> + <form action="https://sisudoc.org/spine_search" target="_top" method="POST" accept-charset="UTF-8" id="search"> + <input type="text" 
name="sf" size="24" maxlength="255"> + <input type="hidden" name="db" value="spine.search.db"> + <input type="hidden" name="sml" value="1000"> + <input type="hidden" name="ec" value="on"> + <input type="hidden" name="url" value="on"> + <button type="submit" form="search"> ㏈ ፨ </button> + </form> + <!-- SiSU Spine Search --> +</div> + +<h2>ℹ - SiSU description</h2> + +<p> +Here is a description that has been used for the original sisu (scribe): +</p> + +<p> +With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax +in your text editor of choice, SiSU can generate various document formats, most +of which share a common object numbering system for locating content, including +plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX, PDF +files, and populate an SQL database with objects (roughly paragraph-sized +chunks) so searches may be performed and matches returned with that degree of +granularity. Think of being able to finely match text in documents, using common +object numbers, across different output formats (same object identifier for pdf, +epub or html) and across languages if you have translations of the same document +(same object identifier across languages). For search, your criteria is met by +these documents at these locations within each document (equally relevant across +different output formats and languages). To be clear (if obvious) page numbers +provide none of this functionality. Object numbering is particularly suitable +for "published" works (finalized texts as opposed to works that are frequently +changed or updated) for which it provides a fixed means of reference of content. +Document outputs can also share provided semantic meta-data. 
+</p> + +<h3>...</h3> + +<p> +SiSU is less about document layout than it is about finding a way using little +markup to construct an abstract representation of a document that makes it +possible to produce multiple representations of it which may be rather different +from each other and used for different purposes, whether layout and publishing, +scrollworthy online viewing/ reading, or content search. To be able to take +advantage from its minimal preparation starting point of some of the strengths +of rather different established ways of representing documents for different +purposes, whether for search (relational database, or indexed flat files +generated for that purpose whether of complete documents, or say of files made +up of objects), online or other electronic viewing (e.g. html, xml, epub), or +paper publication (e.g. pdf via latex)... +</p> + +<p> +The solution arrived at is to extract structural information about the document +(document sections and headings within the document, available through pattern +matching or markup) and tracking objects (which primarily are defined units of +text such as paragraphs, headings, tables, verse, etc. but also images) which +can be reconstituted as the same documents with relevant object identification +numbers so text (objects) can be referenced across different output formats and +presentations. +</p> + +<p> +SiSU generates tables of content, and through its markup the means for metadata +to be provided for the generation of book style indexes for a document (that +again due to document object numbers are the same and equally relevant across +all document formats). Per document classifying/organizing metadata can also be +provided for automated document curation. +</p> + +<p> +... 
there have also been working experiments with sisu markup source, two way +conversion/representation of sisu document markup source in mind-mapping +(software kdissert was used for its strong focus on producing documents (now +apparently called semantik)); also po4a software for translators has been used +successfully in its regular text mode for sisu markup in translation, (which is +more an attribute of po4a than of sisu, but) which is of interest due to +sisu/spine's object citation numbering being available across translations. Open +Document Format text (odf:odt) has been an output, but much more interesting +(and requested by potential users of sisu/spine) would be the ability of a word +processor to save text/a document in sisu markup, making alternative document +processing and presentations with sisu possible. +</p> + +<p> +Also worth mentioning, in the relatively long history of this project, there has +been work done on extracting hash representations of each object, that could +hypothetically be shared to prove the content of a document without sharing its +content, or used to identify which objects change; these hashes can also be used +as unique identifiers in a database or as identifying filenames if individual +objects are saved. +</p> + +<p> +SiSU has evolved; the current implementation focuses on one primary use-case, +books and literary writings. However the concept on which it is based has wider +application. Here is a previously posted souvenir from my encounter with an IBM +software evaluator in London June 2004 that came about through a chance +encounter with an IBM manager at a Linux Expo, who was curious about my interest +in GNU/Linux with my legal background... on hearing that I also wrote software, +he suggested, maybe IBM should have a look at it. I was interested, the meeting +was set up... 
with an IBM Software Innovations evaluator<br>His response after +the meeting: +</p> + +<p> +"Ralph<br>Good to meet with you today, I was very impressed with your +software.<br><i>[colleague's name (also posted to an IBM colleague)]</i> - in +summary - Ralph has built an application that runs on linux and takes ASCII +documents and pulls them apart in to the smallest constituent parts, storing +them as XML, PDF and HTML, the HTML are hyperlinked up so the document can be +browsed in its full form. the format and text data created is stored in a +database.<br>This has potential in any place that needs the power of full text +search whilst holding the structural concepts of the document i.e. legal, +pharma, education, research.. which ones we need to figure out, ..." +</p> + +<p> +Special interest was expressed in the search implications of SiSU. To +paraphrase, the company has document management systems dealing with hundreds of +thousands of texts; these tell you which documents match your search criteria, +but cannot inform you where within a text these matches were found without +opening the documents. SiSU locates matches within a text by defining document objects and +making them the building block of the document, trackable document objects (that +can be placed back in the context of the document or corpus of documents if part +of a collection). SiSU's early design was to abstract documents to their +structure and to identified objects, numbered in a citable way (as pointed out, +document object hashes can also be of use for this purpose). +</p> + +<h2>ℹ - SiSU Spine</h2> + +<p> +SiSU Spine is the new generator for documents prepared in sisu markup, written +in D as opposed to the original sisu which was first shared in Ruby. +</p> + +<p> +Spine code has not as yet been made publicly available. 
+</p> + +<p> +Compared with the original sisu generator, sisu spine: +</p> + +<p> +- Spine uses the same document markup for the document body, but uses yaml for +document headers (which contain document metadata and configuration details); +the original sisu has a bespoke markup for headers. +</p> + +<p> +- Spine (written in D) is considerably faster at generating native output than +sisu (written in Ruby), on last test at least 60 times faster (what took 1 +minute takes 1 second; 1 hour a minute :-) (admittedly some time ago, ruby has +been getting faster, hopefully this is not over-promising). +</p> + +<p> +- Spine produces fewer document output types than sisu (html, epub, (odt, +latex) and populates sql db for search) +</p> + +<p> +- As regards non-native output, so far Spine has greater separation of what it +does and largely leaves calling the external program to the user, e.g.: latex +output is a native output in the sense that it is generated directly by spine, +but the pdfs that can be produced from these are produced through use of an +external program xelatex, which produces fine output but is a very much slower +process. +</p> + +<p> +- (where both produce the same output type) Spine generally produces +more up to date output format representations. 
+</p> + +<hr> +<p class="tiny"><i> +ralph.amissah www since 1993 ;-) +</i></p> + +<hr> +<h2>Some external links of interest</h2> + +<h3>Development</h3> +<h4>Programming</h4> +<p> + [ <a href="https://dlang.org/"> + D - (dlang) general purpose, multi-paradigm, fast C like programming language + </a> ] + [ <a href="https://code.dlang.org/"> + dub - package registry + </a> ] + [ <a href="https://forum.dlang.org/group/general"> + community discussion (mail list frontend) + </a> ]<br> +</p> +<p> + [ <a href="https://www.ruby-lang.org/en/"> + Ruby + </a> ] + [ <a href="https://rubygems.org/"> + Gems + </a> ]<br> + [ <a href="https://crystal-lang.org/"> + Crystal + </a> ]<br> +</p> +<h4>SQL DB</h4> +<p> + [ <a href="https://sqlite.org/index.html"> + Sqlite - an sql database engine + </a> ]<br> + [ <a href="https://www.postgresql.org/"> + PostgreSQL + </a> ]<br> +</p> +<h4>Markup</h4> +<p> + [ <a href="https://www.w3.org/html/"> + HTML + </a> ] + [ <a href="https://html.spec.whatwg.org/multipage/"> + multipage current spec + </a> ] + [ <a href="https://dom.spec.whatwg.org/"> + dom current spec + </a> ]<br> + [ <a href="https://www.w3.org/publishing/epub32/"> + Epub + </a> ]<br> + [ <a href="https://www.w3.org/Style/CSS/"> + css - cascading style sheets + </a> ]<br> +</p> +<p> + [ <a href="https://opendocumentformat.org/"> + OpenDocument Format + </a> ]<br> +</p> +<p> + [ <a href="https://www.latex-project.org/get/"> + LaTeX + </a> ]<br> +</p> +<p> + [ <a href="https://po4a.org/index.php.en"> + po4a - maintain translations + </a> ]<br> +</p> +<h4>Operating System Distributions</h4> +<p> + [ <a href="https://nixos.org/"> + NixOS - linux based operating system built on the Nix declarative, reproducible and reliable, build system + </a> ] + [ <a href="https://github.com/NixOS/nixpkgs"> + nixpkgs (packages @ github) + </a> ] + [ <a href="https://search.nixos.org/packages?channel=unstable&from=0&size=100&sort=relevance&query="> + package search + </a> ] + [ <a 
href="https://discourse.nixos.org/"> + community discussion (discourse) + </a> ]<br> + Gnu [ <a href="https://guix.gnu.org/"> + Guix + </a> ] + [ <a href="https://guix.gnu.org/en/packages/"> + packages + </a> ] + <br> +</p> +<p> + [ <a href="https://debian.org/"> + Debian - the universal operating system distribution + </a> ]<br> + [ <a href="https://www.devuan.org/"> + Devuan + </a> ]<br> +</p> +<p> + [ <a href="https://archlinux.org/"> + Arch Linux + </a> ] + [ <a href="https://wiki.archlinux.org/"> + Arch Wiki + </a> ]<br> +</p> + +<hr> + +<h2>Extraneous (external) links of personal interest</h2> + +<h4>Workspace</h4> + +<h5>Shell</h5> +<p> + [ <a href="https://www.zsh.org/"> + zsh + </a> ]<br> + [ <a href="https://starship.rs/"> + starship - customizable cross-shell prompt + </a> ]<br> +</p> +<h5>Terminal</h5> +<p> + [ <a href="https://gnunn1.github.io/tilix-web/"> + tilix + </a> ] + [ <a href="https://alacritty.org/"> + alacritty + </a> ]<br> +</p> +<h5>Terminal Multiplexer</h5> +<p> + [ <a href="https://github.com/tmux/tmux"> + tmux (github) + </a> ] + [ <a href="https://www.gnu.org/software/screen/"> + screen + </a> ]<br> +</p> +<h5>Window Manager</h5> +<p> + [ <a href="https://i3wm.org/"> + i3wm + </a> ] + [ <a href="https://swaywm.org/"> + sway + </a> ]<br> +</p> +<h5>Text Editors</h5> +<p> + Gnu Emacs + [ <a href="https://github.com/hlissner/doom-emacs"> + Doom Emacs (github) + </a> ] + [ <a href="https://orgmode.org/"> + Org-Mode - your life in plain text & literate programming + </a> ] + [ <a href="https://github.com/emacs-evil/evil"> + Evil-Mode + </a> ]<br> +</p> +<p> + [ <a href="https://www.vim.org/"> + Vim + </a> ] + [ <a href="https://neovim.io/"> + NeoVim + </a> ]<br> +</p> +<h5>Source Control Manager</h5> +<p> + [ <a href="https://git-scm.com/"> + Git + </a> ]<br> +</p> +<h5>Browsers</h5> +<p> + [ <a href="https://vieb.dev/"> + vieb + </a> ] + [ <a href="https://fanglingsu.github.io/vimb/"> + vimb + </a> ]<br> + [ <a href="https://brave.com/"> + 
brave + </a> ]<br> +</p> + +<h3>Search</h3> +<p> + [ <a href="https://duckduckgo.com/"> + DuckDuckGo + </a> ] + [ <a href="https://yubnub.org/"> + YubNub + </a> ]<br> +</p> + +<h3>eMail</h3> +<p> + [ <a href="https://www.migadu.com/"> + Migadu + </a> ]<br> +</p> +<p> + [ <a href="https://notmuchmail.org/"> + NotmuchMail + </a> ]<br> +</p> + +<h3>Forges</h3> +<p> + [ <a href="https://sourcehut.org/"> + Sourcehut + </a> ]<br> +</p> +<p> + [ <a href="https://codeberg.org/"> + CodeBerg + </a> ]<br> +</p> +<p> + [ <a href="https://github.com"> + GitHub + </a> ] + [ <a href="https://gitlab.com"> + GitLab + </a> ]<br> +</p> + +<h3>Software Archives</h3> +<p> + [ <a href="https://www.softwareheritage.org/"> + Software Heritage - the universal software archive + </a> ]<br> +</p> + +<hr> +<p class="tiny"><i> +ralph.amissah www since 1993 ;-) +</i></p> + +</body> +</html> |