Diffstat (limited to 'data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1')
-rw-r--r-- | data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1 | 503 |
1 files changed, 0 insertions, 503 deletions
diff --git a/data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1 b/data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1
deleted file mode 100644
index 082a6eae..00000000
--- a/data/doc/manuals_generated/sisu_manual/man/sisu_introduction.1
+++ /dev/null
@@ -1,503 +0,0 @@
-.TH "sisu_introduction" "1" "2007-09-16" "0.59.1" "SiSU"
-.SH
-SISU \- COMMANDS,
-RALPH AMISSAH
-.BR
-
-.SH
-WHAT IS SISU?
-.BR
-
-.SH
-DESCRIPTION
-.BR
-
-.SH
-1. INTRODUCTION \- WHAT IS SISU?
-.BR
-
-.BR
-.B SiSU
-is a system for document markup, publishing (in multiple open standard
-formats) and search
-
-.BR
-.B SiSU
-[^1] is a[^2] framework for document structuring, publishing and search,
-comprising of (a) a lightweight document structure and presentation markup
-syntax and (b) an accompanying engine for generating standard document format
-outputs from documents prepared in sisu markup syntax, which is able to produce
-multiple standard outputs that (can) share a common numbering system for the
-citation of text within a document.
-
-.BR
-.B SiSU
-is developed under an open source, software libre license (GPL3). It has been
-developed in the context of coping with large document sets with evolving
-markup related technologies, for which you want multiple output formats, a
-common mechanism for cross\-output\-format citation, and search.
-
-.BR
-.B SiSU
-both defines a markup syntax and provides an engine that produces open
-standards format outputs from documents prepared with
-.B SiSU
-markup. From a single lightly prepared document sisu custom builds several
-standard output formats which share a common (text object) numbering system for
-citation of content within a document (that also has implications for search).
-The sisu engine works with an abstraction of the document\'s structure and
-content from which it is possible to generate different forms of representation
-of the document. Significantly
-.B SiSU
-markup is more sparse than html and outputs which include html, LaTeX,
-landscape and portrait pdfs, Open Document Format (ODF), all of which can be
-added to and updated.
-.B SiSU
-is also able to populate SQL type databases at an object level, which means
-that searches can be made with that degree of granularity. Results of objects
-(primarily paragraphs and headings) can be viewed directly in the database, or
-just the object numbers shown \- your search criteria is met in these documents
-and at these locations within each document.
-
-.BR
-Source document preparation and output generation is a two step process: (i)
-document source is prepared, that is, marked up in sisu markup syntax and (ii)
-the desired output subsequently generated by running the sisu engine against
-document source. Output representations if updated (in the sisu engine) can be
-generated by re\-running the engine against the prepared source. Using
-.B SiSU
-markup applied to a document,
-.B SiSU
-custom builds various standard open output formats including plain text,
-HTML, XHTML, XML, OpenDocument, LaTeX or PDF files, and populate an SQL
-database with objects[^3] (equating generally to paragraph\-sized chunks) so
-searches may be performed and matches returned with that degree of granularity
-( e.g. your search criteria is met by these documents and at these locations
-within each document). Document output formats share a common object numbering
-system for locating content. This is particularly suitable for \"published\"
-works (finalized texts as opposed to works that are frequently changed or
-updated) for which it provides a fixed means of reference of content.
-
-.BR
-In preparing a
-.B SiSU
-document you optionally provide semantic information related to the document
-in a document header, and in marking up the substantive text provide
-information on the structure of the document, primarily indicating heading
-levels and footnotes. You also provide information on basic text attributes
-where used. The rest is automatic, sisu from this information custom builds[^4]
-the different forms of output requested.
-
-.BR
-.B SiSU
-works with an abstraction of the document based on its structure which is
-comprised of its frame[^5] and the objects[^6] it contains, which enables
-.B SiSU
-to represent the document in many different ways, and to take advantage of
-the strengths of different ways of presenting documents. The objects are
-numbered, and these numbers can be used to provide a common base for citing
-material within a document across the different output format types. This is
-significant as page numbers are not suited to the digital age, in web
-publishing, changing a browser\'s default font or using a different browser
-means that text appears on different pages; and in publishing in different
-formats, html, landscape and portrait pdf etc. again page numbers are of no use
-to cite text in a manner that is relevant against the different output types.
-Dealing with documents at an object level together with object numbering also
-has implications for search.
-
-.BR
-One of the challenges of maintaining documents is to keep them in a format that
-would allow users to use them without depending on a proprietary software
-popular at the time. Consider the ease of dealing with legacy proprietary
-formats today and what guarantee you have that old proprietary formats will
-remain (or can be read without proprietary software/equipment) in 15 years
-time, or the way the way in which html has evolved over its relatively short
-span of existence.
-.B SiSU
-provides the flexibility of outputing documents in multiple non\-proprietary
-open formats including html, pdf[^7] and the ISO standard ODF.[^8] Whilst
-.B SiSU
-relies on software, the markup is uncomplicated and minimalistic which
-guarantees that future engines can be written to run against it. It is also
-easily converted to other formats, which means documents prepared in
-.B SiSU
-can be migrated to other document formats. Further security is provided by
-the fact that the software itself,
-.B SiSU
-is available under GPL3 a licence that guarantees that the source code will
-always be open, and free as in libre which means that that code base can be
-used updated and further developed as required under the terms of its license.
-Another challenge is to keep up with a moving target.
-.B SiSU
-permits new forms of output to be added as they become important, (Open
-Document Format text was added in 2006), and existing output to be updated
-(html has evolved and the related module has been updated repeatedly over the
-years, presumably when the World Wide Web Consortium (w3c) finalises html 5
-which is currently under development, the html module will again be updated
-allowing all existing documents to be regenerated as html 5).
-
-.BR
-The document formats are written to the file\-system and available for indexing
-by independent indexing tools, whether off the web like Google and Yahoo or on
-the site like Lucene and Hyperestraier.
-
-.BR
-.B SiSU
-also provides other features such as concordance files and document content
-certificates, and the working against an abstraction of document structure has
-further possibilities for the research and development of other document
-representations, the availability of objects is useful for example for topic
-maps and the commercial law thesaurus by Vikki Rogers and Al Krtizer, together
-with the flexibility of
-.B SiSU
-offers great possibilities.
-
-.BR
-.B SiSU
-is primarily for published works, which can take advantage of the citation
-system to reliably reference its documents.
-.B SiSU
-works well in a complementary manner with such collaborative technologies as
-Wikis, which can take advantage of and be used to discuss the substance of
-content prepared in
-.B SiSU
-.
-
-.BR
-<http://www.jus.uio.no/sisu>
-
-.SH
-2. HOW DOES SISU WORK?
-.BR
-
-.BR
-.B SiSU
-markup is fairly minimalistic, it consists of: a (largely optional) document
-header, made up of information about the document (such as when it was
-published, who authored it, and granting what rights) and any processing
-instructions; and markup within the substantive text of the document, which is
-related to document structure and typeface.
-.B SiSU
-must be able to discern the structure of a document, (text headings and their
-levels in relation to each other), either from information provided in the
-document header or from markup within the text (or from a combination of both).
-Processing is done against an abstraction of the document comprising of
-information on the document\'s structure and its objects,[2] which the program
-serializes (providing the object numbers) and which are assigned hash sum
-values based on their content. This abstraction of information about document
-structure, objects, (and hash sums), provides considerable flexibility in
-representing documents different ways and for different purposes (e.g. search,
-document layout, publishing, content certification, concordance etc.), and
-makes it possible to take advantage of some of the strengths of established
-ways of representing documents, (or indeed to create new ones).
-
-.SH
-3. SUMMARY OF FEATURES
-.BR
-
-.BR
-* sparse/minimal markup (clean utf\-8 source texts). Documents are prepared in
-a single UTF\-8 file using a minimalistic mnemonic syntax. Typical literature,
-documents like \"War and Peace\" require almost no markup, and most of the
-headers are optional.
-
-.BR
-* markup is easily readable/parsable by the human eye, (basic markup is simpler
-and more sparse than the most basic HTML), \ [this \ may \ also \ be \
-converted \ to \ XML \ representations \ of \ the \ same \ input/source \
-document].
-
-.BR
-* markup defines document structure (this may be done once in a header
-pattern\-match description, or for heading levels individually); basic text
-attributes (bold, italics, underscore, strike\-through etc.) as required; and
-semantic information related to the document (header information, extended
-beyond the Dublin core and easily further extended as required); the headers
-may also contain processing instructions.
-.B SiSU
-markup is primarily an abstraction of document structure and document
-metadata to permit taking advantage of the basic strengths of existing
-alternative practical standard ways of representing documents \ [be \ that \
-browser \ viewing, \ paper \ publication, \ sql \ search \ etc.] (html, xml,
-odf, latex, pdf, sql)
-
-.BR
-* for output produces reasonably elegant output of established industry and
-institutionally accepted open standard formats.[3] takes advantage of the
-different strengths of various standard formats for representing documents,
-amongst the output formats currently supported are:
-
-.BR
- * html \- both as a single scrollable text and a segmented document
-
-.BR
- * xhtml
-
-.BR
- * XML \- both in sax and dom style xml structures for further development as
- required
-
-.BR
- * ODF \- open document format, the iso standard for document storage
-
-.BR
- * LaTeX \- used to generate pdf
-
-.BR
- * pdf (via LaTeX)
-
-.BR
- * sql \- population of an sql database, (at the same object level that is
- used to cite text within a document)
-
-.BR
-Also produces: concordance files; document content certificates (md5 or sha256
-digests of headings, paragraphs, images etc.) and html manifests (and sitemaps
-of content). (b) takes advantage of the strengths implicit in these very
-different output types, (e.g. PDFs produced using typesetting of LaTeX,
-databases populated with documents at an individual object/paragraph level,
-making possible granular search (and related possibilities))
-
-.BR
-* ensuring content can be cited in a meaningful way regardless of selected
-output format. Online publishing (and publishing in multiple document formats)
-lacks a useful way of citing text internally within documents (important to
-academics generally and to lawyers) as page numbers are meaningless across
-browsers and formats. sisu seeks to provide a common way of pinpoint the text
-within a document, (which can be utilized for citation and by search engines).
-The outputs share a common numbering system that is meaningful (to man and
-machine) across all digital outputs whether paper, screen, or database
-oriented, (pdf, HTML, xml, sqlite, postgresql), this numbering system can be
-used to reference content.
-
-.BR
-* Granular search within documents. SQL databases are populated at an object
-level (roughly headings, paragraphs, verse, tables) and become searchable with
-that degree of granularity, the output information provides the
-object/paragraph numbers which are relevant across all generated outputs; it is
-also possible to look at just the matching paragraphs of the documents in the
-database; \ [output \ indexing \ also \ work \ well \ with \ search \ indexing
-\ tools \ like \ hyperestraier].
-
-.BR
-* long term maintainability of document collections in a world of changing
-formats, having a very sparsely marked\-up source document base. there is a
-considerable degree of future\-proofing, output representations are
-\"upgradeable\", and new document formats may be added. e.g. addition of odf
-(open document text) module in 2006 and in future html5 output sometime in
-future, without modification of existing prepared texts
-
-.BR
-* SQL search aside, documents are generated as required and static once
-generated.
-
-.BR
-* documents produced are static files, and may be batch processed, this needs
-to be done only once but may be repeated for various reasons as desired
-(updated content, addition of new output formats, updated technology document
-presentations/representations)
-
-.BR
-* document source (plaintext utf\-8) if shared on the net may be used as input
-and processed locally to produce the different document outputs
-
-.BR
-* document source may be bundled together (automatically) with associated
-documents (multiple language versions or master document with inclusions) and
-images and sent as a zip file called a sisupod, if shared on the net these too
-may be processed locally to produce the desired document outputs
-
-.BR
-* generated document outputs may automatically be posted to remote sites.
-
-.BR
-* for basic document generation, the only software dependency is
-.B Ruby
-, and a few standard Unix tools (this covers plaintext, HTML, XML, ODF,
-LaTeX). To use a database you of course need that, and to convert the LaTeX
-generated to pdf, a latex processor like tetex or texlive.
-
-.BR
-* as a developers tool it is flexible and extensible
-
-.BR
-Syntax highlighting for
-.B SiSU
-markup is available for a number of text editors.
-
-.BR
-.B SiSU
-is less about document layout than about finding a way with little markup to
-be able to construct an abstract representation of a document that makes it
-possible to produce multiple representations of it which may be rather
-different from each other and used for different purposes, whether layout and
-publishing, or search of content
-
-.BR
-i.e. to be able to take advantage from this minimal preparation starting point
-of some of the strengths of rather different established ways of representing
-documents for different purposes, whether for search (relational database, or
-indexed flat files generated for that purpose whether of complete documents, or
-say of files made up of objects), online viewing (e.g. html, xml, pdf), or
-paper publication (e.g. pdf)...
-
-.BR
-the solution arrived at is by extracting structural information about the
-document (about headings within the document) and by tracking objects (which
-are serialized and also given hash values) in the manner described. It makes
-possible representations that are quite different from those offered at
-present. For example objects could be saved individually and identified by
-their hashes, with an index of how the objects relate to each other to form a
-document.
-
-.SH
-DOCUMENT INFORMATION (METADATA)
-.BR
-
-.SH
-METADATA
-.BR
-
-.BR
-Document Manifest @
-<http://www.jus.uio.no/sisu/sisu_manual/sisu_introduction/sisu_manifest.html>
-
-.BR
-.B Dublin Core
-(DC)
-
-.BR
-.I DC tags included with this document are provided here.
-
-.BR
-DC Title:
-.I SiSU \- Commands
-
-.BR
-DC Creator:
-.I Ralph Amissah
-
-.BR
-DC Rights:
-.I Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL
-3
-
-.BR
-DC Type:
-.I information
-
-.BR
-DC Date created:
-.I 2002\-08\-28
-
-.BR
-DC Date issued:
-.I 2002\-08\-28
-
-.BR
-DC Date available:
-.I 2002\-08\-28
-
-.BR
-DC Date modified:
-.I 2007\-09\-16
-
-.BR
-DC Date:
-.I 2007\-09\-16
-
-.BR
-.B Version Information
-
-.BR
-Sourcefile:
-.I sisu_introduction.sst
-
-.BR
-Filetype:
-.I SiSU text 0.58
-
-.BR
-Sourcefile Digest, MD5(sisu_introduction.sst)=
-.I 877333106803c1fc864bccdbd0c667e2
-
-.BR
-Skin_Digest:
-MD5(/home/ralph/grotto/theatre/dbld/builds/sisu/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
-.I 20fc43cf3eb6590bc3399a1aef65c5a9
-
-.BR
-.B Generated
-
-.BR
-Document (metaverse) last generated:
-.I Tue Sep 25 02:54:41 +0100 2007
-
-.BR
-Generated by:
-.I SiSU
-.I 0.59.1
-of 2007w39/2 (2007\-09\-25)
-
-.BR
-Ruby version:
-.I ruby 1.8.6 (2007\-06\-07 patchlevel 36) \ [i486\-linux]
-
-.TP
-.BI 1.
-\" \.B SiSU
-information Structuring Universe\" or \"Structured information, Serialized
-Units\".
- also chosen for the meaning of the Finnish term "sisu".
-.TP
-.BI 2.
-Unix command line oriented
-.TP
-.BI 3.
-objects include: headings, paragraphs, verse, tables, images, but not
-footnotes/endnotes which are numbered separately and tied to the object from
-which they are referenced.
-.TP
-.BI 4.
-i.e. the html, pdf, odf outputs are each built individually and optimised for
-that form of presentation, rather than for example the html being a saved
-version of the odf, or the pdf being a saved version of the html.
-.TP
-.BI 5.
-the different heading levels
-.TP
-.BI 6.
-units of text, primarily paragraphs and headings, also any tables, poems,
-code-blocks
-.TP
-.BI 7.
-Specification submitted by Adobe to ISO to become a full open ISO
-specification
- <http://www.linux-watch.com/news/NS7542722606.html>
-.TP
-.BI 8.
-ISO/IEC 26300:2006
-
-.TP
-Other versions of this document:
-.TP
-manifest: <http://www.jus.uio.no/sisu/sisu_introduction/sisu_manifest.html>
-.TP
-html: <http://www.jus.uio.no/sisu/sisu_introduction/toc.html>
-.TP
-pdf: <http://www.jus.uio.no/sisu/sisu_introduction/portrait.pdf>
-.TP
-pdf: <http://www.jus.uio.no/sisu/sisu_introduction/landscape.pdf>
-." .TP
-." manpage: http://www.jus.uio.no/sisu/sisu_introduction/sisu_introduction.1
-.TP
-at: <http://www.jus.uio.no/sisu>
-.TP
-.TP
-* Generated by: SiSU 0.59.1 of 2007w39/2 (2007-09-25)
-.TP
-* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
-.TP
-* Last Generated on: Tue Sep 25 02:54:49 +0100 2007
-.TP
-* SiSU http://www.jus.uio.no/sisu
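
Note on the removed text: the "HOW DOES SISU WORK?" section of the deleted man page says that document objects are serialized (given object numbers) and assigned hash sums based on their content. The following is only a rough sketch of that idea, not SiSU's actual implementation (SiSU is written in Ruby). The blank-line object splitting, the `# ` heading marker, and the names `TextObject`, `number_objects`, and `cite` are all assumptions made for illustration; sha256 is used because the man page names md5 and sha256 digests.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TextObject:
    number: int   # sequential object number, usable as a citation anchor
    kind: str     # "heading" or "paragraph" (hypothetical labels)
    text: str     # the object's content
    digest: str   # sha256 hex digest of the content


def number_objects(source: str) -> list[TextObject]:
    """Split a toy source text into numbered, hashed objects.

    Assumption: objects are separated by blank lines, and a block that
    starts with "# " is a heading. This is NOT SiSU markup syntax.
    """
    blocks = [b.strip() for b in source.split("\n\n") if b.strip()]
    objects = []
    for number, block in enumerate(blocks, start=1):
        kind = "heading" if block.startswith("# ") else "paragraph"
        digest = hashlib.sha256(block.encode("utf-8")).hexdigest()
        objects.append(TextObject(number, kind, block, digest))
    return objects


def cite(objects: list[TextObject], number: int) -> str:
    """Return the text of the object with the given object number."""
    for obj in objects:
        if obj.number == number:
            return obj.text
    raise KeyError(f"no object numbered {number}")


if __name__ == "__main__":
    doc = "# Title\n\nFirst paragraph.\n\nSecond paragraph."
    for obj in number_objects(doc):
        print(obj.number, obj.kind, obj.digest[:12])
```

Because the number is attached to the object rather than to a page, the same number would identify the same paragraph in any output format generated from the source, which is the citation property the man page describes.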