path: root/data/doc/manuals_generated/sisu_manual/sisu_introduction/dom.xml
Diffstat (limited to 'data/doc/manuals_generated/sisu_manual/sisu_introduction/dom.xml')
-rw-r--r-- data/doc/manuals_generated/sisu_manual/sisu_introduction/dom.xml 617
1 files changed, 617 insertions, 0 deletions
diff --git a/data/doc/manuals_generated/sisu_manual/sisu_introduction/dom.xml b/data/doc/manuals_generated/sisu_manual/sisu_introduction/dom.xml
new file mode 100644
index 00000000..644f61e1
--- /dev/null
+++ b/data/doc/manuals_generated/sisu_manual/sisu_introduction/dom.xml
@@ -0,0 +1,617 @@
+<?xml version="1.0" encoding="UTF-8" standalone="no"?>
+<?xml-stylesheet type="text/css" href="../_sisu/css/dom.css"?>
+<!-- Document processing information:
+ * Generated by: SiSU 0.59.0 of 2007w38/0 (2007-09-23)
+ * Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
+ *
+ * Last Generated on: Sun Sep 23 04:12:08 +0100 2007
+ * SiSU http://www.jus.uio.no/sisu
+-->
+
+<document>
+
+<head>
+
+ <header>
+ <meta>Title:</meta>
+ <title>
+ SiSU - Commands [0.58]
+ </title>
+ </header>
+
+ <header>
+ <meta>Creator:</meta>
+ <creator>
+ Ralph Amissah
+ </creator>
+ </header>
+
+ <header>
+ <meta>Rights:</meta>
+ <rights>
+ Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL 3
+ </rights>
+ </header>
+
+ <header>
+ <meta>Type:</meta>
+ <type>
+ information
+ </type>
+ </header>
+
+ <header>
+ <meta>Subject:</meta>
+ <subject>
+ ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search
+ </subject>
+ </header>
+
+ <header>
+ <meta>Date created:</meta>
+ <date_created>
+ 2002-08-28
+ </date_created>
+ </header>
+
+ <header>
+ <meta>Date issued:</meta>
+ <date_issued>
+ 2002-08-28
+ </date_issued>
+ </header>
+
+ <header>
+ <meta>Date available:</meta>
+ <date_available>
+ 2002-08-28
+ </date_available>
+ </header>
+
+ <header>
+ <meta>Date modified:</meta>
+ <date_modified>
+ 2007-09-16
+ </date_modified>
+ </header>
+
+ <header>
+ <meta>Date:</meta>
+ <date>
+ 2007-09-16
+ </date>
+ </header>
+
+
+
+
+
+
+</head>
+
+<body>
+
+<heading1>
+ <heading>
+ <object id="1">
+ <ocn>1</ocn>
+ <text>SiSU - Commands [0.58],<br /> Ralph Amissah</text>
+ </object>
+ </heading>
+
+ <heading2>
+ <heading>
+ <object id="2">
+ <ocn>2</ocn>
+ <text>What is SiSU?</text>
+ </object>
+ </heading>
+
+ <heading3>
+ <heading>
+ <object id="3">
+ <ocn>3</ocn>
+ <nametag>?</nametag>
+
+ <text>Description</text>
+ </object>
+ </heading>
+
+ <contents1>
+ <heading>
+ <object id="4">
+ <ocn>4</ocn>
+ <nametag>sisu_intro</nametag>
+
+ <text>1. Introduction - What is SiSU? </text>
+ </object>
+ </heading>
+ <content>
+
+ <object id="5">
+
+
+ <ocn>5</ocn>
+
+
+ <text class="norm"><b>SiSU</b> is a system for document markup, publishing (in multiple open standard formats) and search.</text>
+
+ </object>
+
+
+ <object id="6">
+
+
+ <ocn>6</ocn>
+
+
+ <text class="norm"><b>SiSU</b><endnote><number>1</number><note>"<b>SiSU</b> information Structuring Universe" or "Structured information, Serialized Units".<br /> also chosen for the meaning of the Finnish term "sisu".</note></endnote> is a<endnote><number>2</number><note>Unix command line oriented</note></endnote> framework for document structuring, publishing and search, comprising (a) a lightweight document structure and presentation markup syntax and (b) an accompanying engine for generating standard document format outputs from documents prepared in sisu markup syntax. The engine is able to produce multiple standard outputs that (can) share a common numbering system for the citation of text within a document.</text>
+
+ </object>
+
+
+ <object id="7">
+
+
+ <ocn>7</ocn>
+
+
+ <text class="norm"><b>SiSU</b> is developed under an open source, software libre license (GPL3). It has been developed in the context of coping with large document sets with evolving markup related technologies, for which you want multiple output formats, a common mechanism for cross-output-format citation, and search.</text>
+
+ </object>
+
+
+ <object id="8">
+
+
+ <ocn>8</ocn>
+
+
+ <text class="norm"><b>SiSU</b> both defines a markup syntax and provides an engine that produces open standards format outputs from documents prepared with <b>SiSU</b> markup. From a single lightly prepared document, sisu custom builds several standard output formats which share a common (text object) numbering system for citation of content within a document (that also has implications for search). The sisu engine works with an abstraction of the document's structure and content from which it is possible to generate different forms of representation of the document. Significantly, <b>SiSU</b> markup is more sparse than html, and the outputs, which include html, LaTeX, landscape and portrait pdfs and Open Document Format (ODF), can all be added to and updated. <b>SiSU</b> is also able to populate SQL type databases at an object level, which means that searches can be made with that degree of granularity. Matching objects (primarily paragraphs and headings) can be viewed directly in the database, or just their object numbers shown, i.e. your search criteria are met in these documents and at these locations within each document.</text>
+
+ </object>
+
+
+ <object id="9">
+
+
+ <ocn>9</ocn>
+
+
+ <text class="norm">Source document preparation and output generation is a two step process: (i) document source is prepared, that is, marked up in sisu markup syntax and (ii) the desired output is subsequently generated by running the sisu engine against document source. Output representations, if updated (in the sisu engine), can be regenerated by re-running the engine against the prepared source. Using <b>SiSU</b> markup applied to a document, <b>SiSU</b> custom builds various standard open output formats including plain text, HTML, XHTML, XML, OpenDocument, LaTeX or PDF files, and populates an SQL database with objects<endnote><number>3</number><note>objects include: headings, paragraphs, verse, tables, images, but not footnotes/endnotes which are numbered separately and tied to the object from which they are referenced.</note></endnote> (equating generally to paragraph-sized chunks) so searches may be performed and matches returned with that degree of granularity (e.g. your search criteria are met by these documents and at these locations within each document). Document output formats share a common object numbering system for locating content. This is particularly suitable for "published" works (finalized texts as opposed to works that are frequently changed or updated) for which it provides a fixed means of reference of content.</text>
+
+ </object>
+
+
+ <object id="10">
+
+
+ <ocn>10</ocn>
+
+
+ <text class="norm">In preparing a <b>SiSU</b> document you optionally provide semantic information related to the document in a document header, and in marking up the substantive text provide information on the structure of the document, primarily indicating heading levels and footnotes. You also provide information on basic text attributes where used. The rest is automatic: from this information sisu custom builds<endnote><number>4</number><note>i.e. the html, pdf, odf outputs are each built individually and optimised for that form of presentation, rather than for example the html being a saved version of the odf, or the pdf being a saved version of the html.</note></endnote> the different forms of output requested.</text>
+
+ </object>
+
+
+ <object id="11">
+
+
+ <ocn>11</ocn>
+
+
+ <text class="norm"><b>SiSU</b> works with an abstraction of the document based on its structure, which comprises its frame<endnote><number>5</number><note>the different heading levels</note></endnote> and the objects<endnote><number>6</number><note>units of text, primarily paragraphs and headings, also any tables, poems, code-blocks</note></endnote> it contains. This enables <b>SiSU</b> to represent the document in many different ways, and to take advantage of the strengths of different ways of presenting documents. The objects are numbered, and these numbers can be used to provide a common base for citing material within a document across the different output format types. This is significant as page numbers are not suited to the digital age: in web publishing, changing a browser's default font or using a different browser means that text appears on different pages; and in publishing in different formats (html, landscape and portrait pdf etc.) page numbers are again of no use for citing text in a manner that is relevant across the different output types. Dealing with documents at an object level together with object numbering also has implications for search.</text>
+
+ </object>
+
+
+ <object id="12">
+
+
+ <ocn>12</ocn>
+
+
+ <text class="norm">One of the challenges of maintaining documents is to keep them in a format that allows users to use them without depending on proprietary software popular at the time. Consider the ease of dealing with legacy proprietary formats today and what guarantee you have that old proprietary formats will remain (or can be read without proprietary software/equipment) in 15 years' time, or the way in which html has evolved over its relatively short span of existence. <b>SiSU</b> provides the flexibility of outputting documents in multiple non-proprietary open formats including html, pdf<endnote><number>7</number><note>Specification submitted by Adobe to ISO to become a full open ISO specification <br /> &lt;<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.linux-watch.com/news/NS7542722606.html">http://www.linux-watch.com/news/NS7542722606.html</link>&gt;</note></endnote> and the ISO standard ODF.<endnote><number>8</number><note>ISO/IEC 26300:2006</note></endnote> Whilst <b>SiSU</b> relies on software, the markup is uncomplicated and minimalistic which guarantees that future engines can be written to run against it. It is also easily converted to other formats, which means documents prepared in <b>SiSU</b> can be migrated to other document formats. Further security is provided by the fact that the software itself, <b>SiSU</b>, is available under GPL3, a licence that guarantees that the source code will always be open, and free as in libre, which means that the code base can be used, updated and further developed as required under the terms of its license. Another challenge is to keep up with a moving target. <b>SiSU</b> permits new forms of output to be added as they become important (Open Document Format text was added in 2006), and existing output to be updated (html has evolved and the related module has been updated repeatedly over the years; presumably when the World Wide Web Consortium (W3C) finalises html 5, which is currently under development, the html module will again be updated, allowing all existing documents to be regenerated as html 5).</text>
+
+ </object>
+
+
+ <object id="13">
+
+
+ <ocn>13</ocn>
+
+
+ <text class="norm">The document formats are written to the file-system and available for indexing by independent indexing tools, whether off the web like Google and Yahoo or on the site like Lucene and Hyperestraier.</text>
+
+ </object>
+
+
+ <object id="14">
+
+
+ <ocn>14</ocn>
+
+
+ <text class="norm"><b>SiSU</b> also provides other features such as concordance files and document content certificates, and working against an abstraction of document structure has further possibilities for the research and development of other document representations. The availability of objects is useful, for example, for topic maps and for the commercial law thesaurus by Vikki Rogers and Al Kritzer, which together with the flexibility of <b>SiSU</b> offers great possibilities.</text>
+
+ </object>
+
+
+ <object id="15">
+
+
+ <ocn>15</ocn>
+
+
+ <text class="norm"><b>SiSU</b> is primarily for published works, which can take advantage of the citation system to reliably reference its documents. <b>SiSU</b> works well in a complementary manner with such collaborative technologies as Wikis, which can take advantage of and be used to discuss the substance of content prepared in <b>SiSU</b>.</text>
+
+ </object>
+
+
+ <object id="16">
+
+
+ <ocn>16</ocn>
+
+
+ <text class="norm">&lt;<link xmlns:xlink="http://www.w3.org/1999/xlink" xlink:type="simple" xlink:href="http://www.jus.uio.no/sisu">http://www.jus.uio.no/sisu</link>&gt;</text>
+
+ </object>
+
+
+ </content>
+
+ </contents1>
+
+ <contents1>
+ <heading>
+ <object id="17">
+ <ocn>17</ocn>
+ <nametag>sisu_how</nametag>
+
+ <text>2. How does sisu work? </text>
+ </object>
+ </heading>
+ <content>
+
+ <object id="18">
+
+
+ <ocn>18</ocn>
+
+
+ <text class="norm"><b>SiSU</b> markup is fairly minimalistic. It consists of: a (largely optional) document header, made up of information about the document (such as when it was published, who authored it, and granting what rights) and any processing instructions; and markup within the substantive text of the document, which is related to document structure and typeface. <b>SiSU</b> must be able to discern the structure of a document (text headings and their levels in relation to each other), either from information provided in the document header or from markup within the text (or from a combination of both). Processing is done against an abstraction of the document comprising information on the document's structure and its objects,[2] which the program serializes (providing the object numbers) and which are assigned hash sum values based on their content. This abstraction of information about document structure, objects (and hash sums) provides considerable flexibility in representing documents in different ways and for different purposes (e.g. search, document layout, publishing, content certification, concordance etc.), and makes it possible to take advantage of some of the strengths of established ways of representing documents (or indeed to create new ones).</text>
+
+ </object>
+
+
+ </content>
+
+ </contents1>
+
+ <contents1>
+ <heading>
+ <object id="19">
+ <ocn>19</ocn>
+ <nametag>sisu_feature_summary</nametag>
+
+ <text>3. Summary of features </text>
+ </object>
+ </heading>
+ <content>
+
+ <object id="20">
+
+
+ <ocn>20</ocn>
+
+
+ <text class="indent_bullet"> sparse/minimal markup (clean utf-8 source texts). Documents are prepared in a single UTF-8 file using a minimalistic mnemonic syntax. Typical literature, such as "War and Peace", requires almost no markup, and most of the headers are optional.</text>
+
+ </object>
+
+
+ <object id="21">
+
+
+ <ocn>21</ocn>
+
+
+ <text class="indent_bullet"> markup is easily readable/parsable by the human eye, (basic markup is simpler and more sparse than the most basic HTML), [this may also be converted to XML representations of the same input/source document].</text>
+
+ </object>
+
+
+ <object id="22">
+
+
+ <ocn>22</ocn>
+
+
+ <text class="indent_bullet"> markup defines document structure (this may be done once in a header pattern-match description, or for heading levels individually); basic text attributes (bold, italics, underscore, strike-through etc.) as required; and semantic information related to the document (header information, extended beyond the Dublin core and easily further extended as required); the headers may also contain processing instructions. <b>SiSU</b> markup is primarily an abstraction of document structure and document metadata to permit taking advantage of the basic strengths of existing alternative practical standard ways of representing documents [be that browser viewing, paper publication, sql search etc.] (html, xml, odf, latex, pdf, sql)</text>
+
+ </object>
+
+
+ <object id="23">
+
+
+ <ocn>23</ocn>
+
+
+ <text class="indent_bullet"> for output, produces reasonably elegant output in established industry and institutionally accepted open standard formats,[3] and takes advantage of the different strengths of various standard formats for representing documents. Amongst the output formats currently supported are:</text>
+
+ </object>
+
+
+ <object id="24">
+
+
+ <ocn>24</ocn>
+
+
+ <text class="indent_bullet1"> html - both as a single scrollable text and a segmented document</text>
+
+ </object>
+
+
+ <object id="25">
+
+
+ <ocn>25</ocn>
+
+
+ <text class="indent_bullet1"> xhtml</text>
+
+ </object>
+
+
+ <object id="26">
+
+
+ <ocn>26</ocn>
+
+
+ <text class="indent_bullet1"> XML - both in sax and dom style xml structures for further development as required</text>
+
+ </object>
+
+
+ <object id="27">
+
+
+ <ocn>27</ocn>
+
+
+ <text class="indent_bullet1"> ODF - open document format, the iso standard for document storage</text>
+
+ </object>
+
+
+ <object id="28">
+
+
+ <ocn>28</ocn>
+
+
+ <text class="indent_bullet1"> LaTeX - used to generate pdf</text>
+
+ </object>
+
+
+ <object id="29">
+
+
+ <ocn>29</ocn>
+
+
+ <text class="indent_bullet1"> pdf (via LaTeX)</text>
+
+ </object>
+
+
+ <object id="30">
+
+
+ <ocn>30</ocn>
+
+
+ <text class="indent_bullet1"> sql - population of an sql database, (at the same object level that is used to cite text within a document)</text>
+
+ </object>
+
+
+ <object id="31">
+
+
+ <ocn>31</ocn>
+
+
+ <text class="norm">Also produces: concordance files; document content certificates (md5 or sha256 digests of headings, paragraphs, images etc.) and html manifests (and sitemaps of content). It takes advantage of the strengths implicit in these very different output types (e.g. PDFs produced using the typesetting of LaTeX, and databases populated with documents at an individual object/paragraph level, making possible granular search and related possibilities).</text>
+
+ </object>
+
+
+ <object id="32">
+
+
+ <ocn>32</ocn>
+
+
+ <text class="indent_bullet"> ensuring content can be cited in a meaningful way regardless of selected output format. Online publishing (and publishing in multiple document formats) lacks a useful way of citing text internally within documents (important to academics generally and to lawyers) as page numbers are meaningless across browsers and formats. sisu seeks to provide a common way of pinpointing the text within a document (which can be utilized for citation and by search engines). The outputs share a common numbering system that is meaningful (to man and machine) across all digital outputs whether paper, screen, or database oriented (pdf, HTML, xml, sqlite, postgresql); this numbering system can be used to reference content.</text>
+
+ </object>
+
+
+ <object id="33">
+
+
+ <ocn>33</ocn>
+
+
+ <text class="indent_bullet"> Granular search within documents. SQL databases are populated at an object level (roughly headings, paragraphs, verse, tables) and become searchable with that degree of granularity. The output information provides the object/paragraph numbers which are relevant across all generated outputs; it is also possible to look at just the matching paragraphs of the documents in the database; [output indexing also works well with search indexing tools like hyperestraier].</text>
+
+ </object>
+
+
+ <object id="34">
+
+
+ <ocn>34</ocn>
+
+
+ <text class="indent_bullet"> long term maintainability of document collections in a world of changing formats. With a very sparsely marked-up source document base there is a considerable degree of future-proofing: output representations are "upgradeable", and new document formats may be added (e.g. the addition of the odf (open document text) module in 2006, and html5 output at some point in future) without modification of existing prepared texts.</text>
+
+ </object>
+
+
+ <object id="35">
+
+
+ <ocn>35</ocn>
+
+
+ <text class="indent_bullet"> SQL search aside, documents are generated as required and static once generated.</text>
+
+ </object>
+
+
+ <object id="36">
+
+
+ <ocn>36</ocn>
+
+
+ <text class="indent_bullet"> documents produced are static files, and may be batch processed; this needs to be done only once but may be repeated for various reasons as desired (updated content, addition of new output formats, updated technology document presentations/representations)</text>
+
+ </object>
+
+
+ <object id="37">
+
+
+ <ocn>37</ocn>
+
+
+ <text class="indent_bullet"> document source (plaintext utf-8) if shared on the net may be used as input and processed locally to produce the different document outputs</text>
+
+ </object>
+
+
+ <object id="38">
+
+
+ <ocn>38</ocn>
+
+
+ <text class="indent_bullet"> document source may be bundled together (automatically) with associated documents (multiple language versions or master document with inclusions) and images and sent as a zip file called a sisupod, if shared on the net these too may be processed locally to produce the desired document outputs</text>
+
+ </object>
+
+
+ <object id="39">
+
+
+ <ocn>39</ocn>
+
+
+ <text class="indent_bullet"> generated document outputs may automatically be posted to remote sites.</text>
+
+ </object>
+
+
+ <object id="40">
+
+
+ <ocn>40</ocn>
+
+
+ <text class="indent_bullet"> for basic document generation, the only software dependency is <b>Ruby</b>, and a few standard Unix tools (this covers plaintext, HTML, XML, ODF, LaTeX). To use a database you of course need that, and to convert the LaTeX generated to pdf, a latex processor like tetex or texlive.</text>
+
+ </object>
+
+
+ <object id="41">
+
+
+ <ocn>41</ocn>
+
+
+ <text class="indent_bullet"> as a developer's tool it is flexible and extensible</text>
+
+ </object>
+
+
+ <object id="42">
+
+
+ <ocn>42</ocn>
+
+
+ <text class="norm">Syntax highlighting for <b>SiSU</b> markup is available for a number of text editors.</text>
+
+ </object>
+
+
+ <object id="43">
+
+
+ <ocn>43</ocn>
+
+
+ <text class="norm"><b>SiSU</b> is less about document layout than about finding a way, with little markup, to construct an abstract representation of a document that makes it possible to produce multiple representations of it, which may be rather different from each other and used for different purposes, whether layout and publishing, or search of content.</text>
+
+ </object>
+
+
+ <object id="44">
+
+
+ <ocn>44</ocn>
+
+
+ <text class="norm">i.e. to be able to take advantage from this minimal preparation starting point of some of the strengths of rather different established ways of representing documents for different purposes, whether for search (relational database, or indexed flat files generated for that purpose whether of complete documents, or say of files made up of objects), online viewing (e.g. html, xml, pdf), or paper publication (e.g. pdf)...</text>
+
+ </object>
+
+
+ <object id="45">
+
+
+ <ocn>45</ocn>
+
+
+ <text class="norm">The solution arrived at is to extract structural information about the document (about headings within the document) and to track objects (which are serialized and also given hash values) in the manner described. It makes possible representations that are quite different from those offered at present. For example, objects could be saved individually and identified by their hashes, with an index of how the objects relate to each other to form a document.</text>
+
+ </object>
+
+
+ </content>
+
+ </contents1>
+
+ <contents1>
+ <heading>
+ <object id="0">
+ <ocn>0</ocn>
+ <nametag>endnotes</nametag>
+
+ <text>Endnotes</text>
+ </object>
+ </heading>
+ <content>
+
+ </content>
+ </contents1>
+
+ </heading3>
+
+ </heading2>
+
+</heading1>
+
+</body>
+
+</document>
+
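The object model described in the added manual text (objects serialized with object numbers and assigned hash sums based on their content) can be illustrated with a minimal sketch. This is not SiSU's own code, which is written in Ruby: the names `TextObject`, `serialize_objects` and `find` are hypothetical, splitting on blank lines is a simplifying assumption about how objects are delimited, and SHA-256 is picked only because the text mentions md5 or sha256 digests.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextObject:
    ocn: int      # object number, assigned in document order (hypothetical field name)
    text: str     # content of the object (a heading or a paragraph)
    digest: str   # SHA-256 hex digest of the object's content


def serialize_objects(source: str) -> List[TextObject]:
    """Split source text on blank lines, number each chunk, and hash its content.

    Assumption: one blank line separates objects. Real markup has richer rules.
    """
    objects: List[TextObject] = []
    for chunk in source.split("\n\n"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty chunks produced by runs of blank lines
        digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        objects.append(TextObject(len(objects) + 1, chunk, digest))
    return objects


def find(objects: List[TextObject], term: str) -> List[int]:
    """Return the object numbers whose text contains the search term."""
    return [o.ocn for o in objects if term.lower() in o.text.lower()]


if __name__ == "__main__":
    doc = "What is SiSU?\n\nSiSU is a system for document markup.\n\nIt also supports search."
    for obj in serialize_objects(doc):
        print(obj.ocn, obj.digest[:12], obj.text)
```

Because the numbers depend only on the order of objects in the source, any output format generated from the same source can carry the same numbers, which is the property the text relies on for citation and granular search.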