Diffstat (limited to 'data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1')
-rw-r--r-- data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1 504
1 files changed, 504 insertions, 0 deletions
diff --git a/data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1 b/data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1
new file mode 100644
index 00000000..22e04ea0
--- /dev/null
+++ b/data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1
@@ -0,0 +1,504 @@
+.TH "sisu_introduction" "1" "2007-09-16" "0.59.0" "SiSU"
+.SH
+SISU \- COMMANDS \ [0.58],
+RALPH AMISSAH
+.BR
+
+.SH
+WHAT IS SISU?
+.BR
+
+.SH
+DESCRIPTION
+.BR
+
+.SH
+1. INTRODUCTION \- WHAT IS SISU?
+.BR
+
+.BR
+.B SiSU
+is a system for document markup, publishing (in multiple open standard
+formats) and search.
+
+.BR
+.B SiSU
+[^1] is a[^2] framework for document structuring, publishing and search,
+comprising (a) a lightweight document structure and presentation markup
+syntax and (b) an accompanying engine for generating standard document format
+outputs from documents prepared in sisu markup syntax, which is able to produce
+multiple standard outputs that (can) share a common numbering system for the
+citation of text within a document.
+
+.BR
+.B SiSU
+is developed under an open source, software libre license (GPL3). It has been
+developed in the context of coping with large document sets with evolving
+markup related technologies, for which you want multiple output formats, a
+common mechanism for cross\-output\-format citation, and search.
+
+.BR
+.B SiSU
+both defines a markup syntax and provides an engine that produces open
+standards format outputs from documents prepared with
+.B SiSU
+markup. From a single lightly prepared document sisu custom builds several
+standard output formats which share a common (text object) numbering system for
+citation of content within a document (that also has implications for search).
+The sisu engine works with an abstraction of the document\(aqs structure and
+content from which it is possible to generate different forms of representation
+of the document. Significantly,
+.B SiSU
+markup is more sparse than html, and the outputs, which include html, LaTeX,
+landscape and portrait pdfs and Open Document Format (ODF), can all be
+added to and updated.
+.B SiSU
+is also able to populate SQL type databases at an object level, which means
+that searches can be made with that degree of granularity. Matching objects
+(primarily paragraphs and headings) can be viewed directly in the database, or
+just the object numbers shown \- your search criteria are met in these documents
+and at these locations within each document.
+
+.BR
+Source document preparation and output generation is a two step process: (i)
+document source is prepared, that is, marked up in sisu markup syntax and (ii)
+the desired output subsequently generated by running the sisu engine against
+document source. Output representations if updated (in the sisu engine) can be
+generated by re\-running the engine against the prepared source. Using
+.B SiSU
+markup applied to a document,
+.B SiSU
+custom builds various standard open output formats including plain text,
+HTML, XHTML, XML, OpenDocument, LaTeX or PDF files, and populates an SQL
+database with objects[^3] (equating generally to paragraph\-sized chunks) so
+searches may be performed and matches returned with that degree of granularity
+(e.g. your search criteria are met by these documents and at these locations
+within each document). Document output formats share a common object numbering
+system for locating content. This is particularly suitable for \(dqpublished\(dq
+works (finalized texts as opposed to works that are frequently changed or
+updated) for which it provides a fixed means of reference of content.
+
+.BR
+In preparing a
+.B SiSU
+document you optionally provide semantic information related to the document
+in a document header, and in marking up the substantive text provide
+information on the structure of the document, primarily indicating heading
+levels and footnotes. You also provide information on basic text attributes
+where used. The rest is automatic: from this information sisu custom builds[^4]
+the different forms of output requested.
+
+.BR
+.B SiSU
+works with an abstraction of the document based on its structure which is
+comprised of its frame[^5] and the objects[^6] it contains, which enables
+.B SiSU
+to represent the document in many different ways, and to take advantage of
+the strengths of different ways of presenting documents. The objects are
+numbered, and these numbers can be used to provide a common base for citing
+material within a document across the different output format types. This is
+significant as page numbers are not suited to the digital age: in web
+publishing, changing a browser\(aqs default font or using a different browser
+means that text appears on different pages; and in publishing in different
+formats (html, landscape and portrait pdf etc.) page numbers are again of no
+use for citing text in a manner that is relevant across the output types.
+Dealing with documents at an object level together with object numbering also
+has implications for search.
+
+.BR
+One of the challenges of maintaining documents is to keep them in a format that
+would allow users to use them without depending on proprietary software
+popular at the time. Consider the ease of dealing with legacy proprietary
+formats today and what guarantee you have that old proprietary formats will
+remain (or can be read without proprietary software/equipment) in 15 years\(aq
+time, or the way in which html has evolved over its relatively short
+span of existence.
+.B SiSU
+provides the flexibility of outputting documents in multiple non\-proprietary
+open formats including html, pdf[^7] and the ISO standard ODF.[^8] Whilst
+.B SiSU
+relies on software, the markup is uncomplicated and minimalistic which
+guarantees that future engines can be written to run against it. It is also
+easily converted to other formats, which means documents prepared in
+.B SiSU
+can be migrated to other document formats. Further security is provided by
+the fact that the software itself,
+.B SiSU
+is available under GPL3, a license that guarantees that the source code will
+always be open, and free as in libre, which means that the code base can be
+used, updated and further developed as required under the terms of its license.
+Another challenge is to keep up with a moving target.
+.B SiSU
+permits new forms of output to be added as they become important (Open
+Document Format text was added in 2006), and existing output to be updated
+(html has evolved and the related module has been updated repeatedly over the
+years; presumably when the World Wide Web Consortium (w3c) finalises html 5,
+which is currently under development, the html module will again be updated,
+allowing all existing documents to be regenerated as html 5).
+
+.BR
+The document formats are written to the file\-system and available for indexing
+by independent indexing tools, whether off the web like Google and Yahoo or on
+the site like Lucene and Hyperestraier.
+
+.BR
+.B SiSU
+also provides other features such as concordance files and document content
+certificates. Working against an abstraction of document structure opens
+further possibilities for the research and development of other document
+representations: the availability of objects is useful, for example, for topic
+maps and for the commercial law thesaurus by Vikki Rogers and Al Kritzer, which
+together with the flexibility of
+.B SiSU
+offers great possibilities.
+
+.BR
+.B SiSU
+is primarily for published works, which can take advantage of the citation
+system to reliably reference its documents.
+.B SiSU
+works well in a complementary manner with such collaborative technologies as
+Wikis, which can take advantage of and be used to discuss the substance of
+content prepared in
+.B SiSU\c
+\&.
+
+.BR
+<http://www.jus.uio.no/sisu>
+
+.SH
+2. HOW DOES SISU WORK?
+.BR
+
+.BR
+.B SiSU
+markup is fairly minimalistic; it consists of: a (largely optional) document
+header, made up of information about the document (such as when it was
+published, who authored it, and granting what rights) and any processing
+instructions; and markup within the substantive text of the document, which is
+related to document structure and typeface.
+.B SiSU
+must be able to discern the structure of a document (text headings and their
+levels in relation to each other), either from information provided in the
+document header or from markup within the text (or from a combination of both).
+Processing is done against an abstraction of the document comprising
+information on the document\(aqs structure and its objects,[2] which the program
+serializes (providing the object numbers) and which are assigned hash sum
+values based on their content. This abstraction of information about document
+structure, objects (and hash sums) provides considerable flexibility in
+representing documents in different ways and for different purposes (e.g. search,
+document layout, publishing, content certification, concordance etc.), and
+makes it possible to take advantage of some of the strengths of established
+ways of representing documents (or indeed to create new ones).
+
+.SH
+3. SUMMARY OF FEATURES
+.BR
+
+.BR
+* sparse/minimal markup (clean utf\-8 source texts). Documents are prepared in
+a single UTF\-8 file using a minimalistic mnemonic syntax. Typical literature
+(documents like \(dqWar and Peace\(dq) requires almost no markup, and most of the
+headers are optional.
+
+.BR
+* markup is easily readable/parsable by the human eye (basic markup is simpler
+and more sparse than the most basic HTML) [this may also be
+converted to XML representations of the same input/source
+document].
+
+.BR
+* markup defines document structure (this may be done once in a header
+pattern\-match description, or for heading levels individually); basic text
+attributes (bold, italics, underscore, strike\-through etc.) as required; and
+semantic information related to the document (header information, extended
+beyond the Dublin core and easily further extended as required); the headers
+may also contain processing instructions.
+.B SiSU
+markup is primarily an abstraction of document structure and document
+metadata to permit taking advantage of the basic strengths of existing
+alternative practical standard ways of representing documents [be that
+browser viewing, paper publication, sql search etc.] (html, xml,
+odf, latex, pdf, sql).
+
+.BR
+* for output, produces reasonably elegant output in established industry and
+institutionally accepted open standard formats,[3] taking advantage of the
+different strengths of various standard formats for representing documents;
+amongst the output formats currently supported are:
+
+.BR
+ * html \- both as a single scrollable text and a segmented document
+
+.BR
+ * xhtml
+
+.BR
+ * XML \- both in sax and dom style xml structures for further development as
+ required
+
+.BR
+ * ODF \- open document format, the ISO standard for document storage
+
+.BR
+ * LaTeX \- used to generate pdf
+
+.BR
+ * pdf (via LaTeX)
+
+.BR
+ * sql \- population of an sql database (at the same object level that is
+ used to cite text within a document)
+
+.BR
+Also produces: concordance files; document content certificates (md5 or sha256
+digests of headings, paragraphs, images etc.) and html manifests (and sitemaps
+of content). This takes advantage of the strengths implicit in these very
+different output types (e.g. PDFs produced using the typesetting of LaTeX, and
+databases populated with documents at an individual object/paragraph level,
+making possible granular search and related possibilities).
+
+.BR
+* ensuring content can be cited in a meaningful way regardless of selected
+output format. Online publishing (and publishing in multiple document formats)
+lacks a useful way of citing text internally within documents (important to
+academics generally and to lawyers) as page numbers are meaningless across
+browsers and formats. sisu seeks to provide a common way of pinpointing text
+within a document (which can be utilized for citation and by search engines).
+The outputs share a common numbering system that is meaningful (to man and
+machine) across all digital outputs, whether paper, screen, or database
+oriented (pdf, HTML, xml, sqlite, postgresql); this numbering system can be
+used to reference content.
+
+.BR
+* Granular search within documents. SQL databases are populated at an object
+level (roughly headings, paragraphs, verse, tables) and become searchable with
+that degree of granularity; the output information provides the
+object/paragraph numbers which are relevant across all generated outputs; it is
+also possible to look at just the matching paragraphs of the documents in the
+database [output indexing also works well with search indexing
+tools like hyperestraier].
+
+.BR
+* long term maintainability of document collections in a world of changing
+formats, having a very sparsely marked\-up source document base. There is a
+considerable degree of future\-proofing: output representations are
+\(dqupgradeable\(dq, and new document formats may be added, e.g. the addition of
+the odf (open document text) module in 2006 and of html5 output at some point
+in future, without modification of existing prepared texts.
+
+.BR
+* SQL search aside, documents are generated as required and static once
+generated.
+
+.BR
+* documents produced are static files, and may be batch processed; this needs
+to be done only once but may be repeated for various reasons as desired
+(updated content, addition of new output formats, updated technology document
+presentations/representations)
+
+.BR
+* document source (plaintext utf\-8) if shared on the net may be used as input
+and processed locally to produce the different document outputs
+
+.BR
+* document source may be bundled together (automatically) with associated
+documents (multiple language versions or master document with inclusions) and
+images and sent as a zip file called a sisupod; if shared on the net these too
+may be processed locally to produce the desired document outputs
+
+.BR
+* generated document outputs may automatically be posted to remote sites.
+
+.BR
+* for basic document generation, the only software dependency is
+.BR Ruby ,
+and a few standard Unix tools (this covers plaintext, HTML, XML, ODF,
+LaTeX). To use a database you of course need that, and to convert the LaTeX
+generated to pdf, a latex processor like tetex or texlive.
+
+.BR
+* as a developer\(aqs tool it is flexible and extensible
+
+.BR
+Syntax highlighting for
+.B SiSU
+markup is available for a number of text editors.
+
+.BR
+.B SiSU
+is less about document layout than about finding a way with little markup to
+be able to construct an abstract representation of a document that makes it
+possible to produce multiple representations of it which may be rather
+different from each other and used for different purposes, whether layout and
+publishing, or search of content.
+
+.BR
+i.e. to be able to take advantage from this minimal preparation starting point
+of some of the strengths of rather different established ways of representing
+documents for different purposes, whether for search (relational database, or
+indexed flat files generated for that purpose whether of complete documents, or
+say of files made up of objects), online viewing (e.g. html, xml, pdf), or
+paper publication (e.g. pdf)...
+
+.BR
+The solution arrived at is to extract structural information about the
+document (about headings within the document) and to track objects (which
+are serialized and also given hash values) in the manner described. It makes
+possible representations that are quite different from those offered at
+present. For example objects could be saved individually and identified by
+their hashes, with an index of how the objects relate to each other to form a
+document.
+
+.SH
+DOCUMENT INFORMATION (METADATA)
+.BR
+
+.SH
+METADATA
+.BR
+
+.BR
+Document Manifest @
+<http://www.jus.uio.no/sisu/sisu_manual/sisu_introduction/sisu_manifest.html>
+
+.BR
+.B Dublin Core
+(DC)
+
+.BR
+.I DC tags included with this document are provided here.
+
+.BR
+DC Title:
+.I SiSU \- Commands \ [0.58]
+
+.BR
+DC Creator:
+.I Ralph Amissah
+
+.BR
+DC Rights:
+.I Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL
+3
+
+.BR
+DC Type:
+.I information
+
+.BR
+DC Date created:
+.I 2002\-08\-28
+
+.BR
+DC Date issued:
+.I 2002\-08\-28
+
+.BR
+DC Date available:
+.I 2002\-08\-28
+
+.BR
+DC Date modified:
+.I 2007\-09\-16
+
+.BR
+DC Date:
+.I 2007\-09\-16
+
+.BR
+.B Version Information
+
+.BR
+Sourcefile:
+.I sisu_introduction.sst
+
+.BR
+Filetype:
+.I SiSU text 0.58
+
+.BR
+Sourcefile Digest, MD5(sisu_introduction.sst)=
+.I b2a6da5bd22fa1eaa92a08d81f11d1c7
+
+.BR
+Skin_Digest:
+MD5(/home/ralph/grotto/theatre/dbld/sisu\-dev/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
+.I 20fc43cf3eb6590bc3399a1aef65c5a9
+
+.BR
+.B Generated
+
+.BR
+Document (metaverse) last generated:
+.I Sun Sep 23 04:13:42 +0100 2007
+
+.BR
+Generated by:
+.I SiSU
+.I 0.59.0
+of 2007w38/0 (2007\-09\-23)
+
+.BR
+Ruby version:
+.I ruby 1.8.6 (2007\-06\-07 patchlevel 36) \ [i486\-linux]
+
+.TP
+.BI 1.
+\(dq\c
+.B SiSU
+information Structuring Universe\(dq or \(dqStructured information, Serialized
+Units\(dq.
+ Also chosen for the meaning of the Finnish term "sisu".
+.TP
+.BI 2.
+Unix command line oriented
+.TP
+.BI 3.
+objects include: headings, paragraphs, verse, tables, images, but not
+footnotes/endnotes which are numbered separately and tied to the object from
+which they are referenced.
+.TP
+.BI 4.
+i.e. the html, pdf, odf outputs are each built individually and optimised for
+that form of presentation, rather than for example the html being a saved
+version of the odf, or the pdf being a saved version of the html.
+.TP
+.BI 5.
+the different heading levels
+.TP
+.BI 6.
+units of text, primarily paragraphs and headings, also any tables, poems,
+code-blocks
+.TP
+.BI 7.
+Specification submitted by Adobe to ISO to become a full open ISO
+specification
+ <http://www.linux-watch.com/news/NS7542722606.html>
+.TP
+.BI 8.
+ISO/IEC 26300:2006
+
+.TP
+Other versions of this document:
+.TP
+manifest: <http://www.jus.uio.no/sisu/sisu_introduction/sisu_manifest.html>
+.TP
+html: <http://www.jus.uio.no/sisu/sisu_introduction/toc.html>
+.TP
+pdf: <http://www.jus.uio.no/sisu/sisu_introduction/portrait.pdf>
+.TP
+pdf: <http://www.jus.uio.no/sisu/sisu_introduction/landscape.pdf>
+.\" .TP
+.\" manpage: http://www.jus.uio.no/sisu/sisu_introduction/sisu_introduction.1
+.TP
+at: <http://www.jus.uio.no/sisu>
+.TP
+.TP
+* Generated by: SiSU 0.59.0 of 2007w38/0 (2007-09-23)
+.TP
+* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
+.TP
+* Last Generated on: Sun Sep 23 04:13:49 +0100 2007
+.TP
+* SiSU http://www.jus.uio.no/sisu
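
Section 2 of the man page above describes processing against an abstraction in which objects are serialized (numbered) and assigned hash sums based on their content. A minimal Python sketch of that idea follows. It is not SiSU's implementation (SiSU is written in Ruby and parses its own markup): it assumes that objects are simply blank-line-separated chunks of text, it ignores headers and footnotes, and `TextObject`, `build_objects` and `search` are hypothetical names introduced only for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextObject:
    """One numbered unit of text (hypothetical structure, not SiSU's own)."""
    number: int  # serial object number, usable as a citation anchor
    text: str    # the object's content with whitespace normalised
    digest: str  # SHA-256 hex digest of the normalised content


def build_objects(source: str) -> List[TextObject]:
    """Split source on blank lines, then number and hash each chunk."""
    objects: List[TextObject] = []
    for chunk in source.split("\n\n"):
        text = " ".join(chunk.split())  # collapse line breaks and runs of spaces
        if not text:
            continue  # skip empty chunks produced by extra blank lines
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        objects.append(TextObject(len(objects) + 1, text, digest))
    return objects


def search(objects: List[TextObject], term: str) -> List[int]:
    """Return the numbers of the objects whose text contains the term."""
    needle = term.lower()
    return [obj.number for obj in objects if needle in obj.text.lower()]


# Assumed sample input: a heading followed by two paragraphs.
sample = """A heading

First paragraph of the
document.

Second paragraph, mentioning search."""

objects = build_objects(sample)
for obj in objects:
    print(obj.number, obj.digest[:12], obj.text)
print("objects matching 'paragraph':", search(objects, "paragraph"))
```

Because the numbers come from the document's structure rather than from page layout, the same reference (for example, object 2) would identify the same text in any output generated from this list, and a changed digest would indicate that an object's content has changed.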