% SiSU 4.0

@title: SiSU
 :subtitle: Introduction

@creator:
 :author: Amissah, Ralph

@date:
 :published: 2007-09-16
 :created: 2002-08-28
 :issued: 2002-08-28
 :available: 2002-08-28
 :modified: 2012-10-03

@rights:
 :copyright: Copyright (C) Ralph Amissah 2011
 :license: GPL 3 (part of SiSU documentation)

@classify:
 :subject: ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search

:A~? @title @creator

:B~? What is SiSU?

:C~? Description

1~sisu_intro Introduction - What is SiSU?

SiSU is a framework for document structuring, publishing (in multiple open standard formats) and search, comprising: (a) a lightweight document structure and presentation markup syntax; and (b) an accompanying engine that generates standard document format outputs from documents prepared in sisu markup syntax. The engine is able to produce multiple standard outputs (including the population of SQL databases) that (can) share a common numbering system for the citation of text within a document.

SiSU is developed under an open source, software libre license (GPLv3). It is developed to work with medium to large document sets and to cope with evolving document formats and representation technologies. Documents are prepared once, and regenerated as needed to update the technical presentation or to add output formats. The various output formats (including search related output) share a common mechanism for cross-output-format citation.

SiSU both defines a markup syntax and provides an engine that produces open standards format outputs from documents prepared with SiSU markup. From a single lightly prepared document, sisu custom builds several standard output formats which share a common (text object) numbering system for citation of content within a document (which also has implications for search). The sisu engine works with an abstraction of the document's structure and content, from which it is possible to generate different forms of representation of the document. Significantly, SiSU markup is sparser than HTML, and the outputs, which include HTML, EPUB, ODT (Open Document Format text), LaTeX, and landscape and portrait PDF, can all be added to and updated. SiSU is also able to populate SQL type databases at an object level, which means that searches can be made with that degree of granularity.

Source document preparation and output generation is a two-step process: (i) the document source is prepared, that is, marked up in sisu markup syntax; and (ii) the desired output is subsequently generated by running the sisu engine against the document source. If an output representation is updated (in the sisu engine), it can be regenerated by re-running the engine against the prepared source. Using SiSU markup applied to a document, SiSU custom builds (to take advantage of the strengths of different ways of representing documents) various standard open output formats including plain text, HTML, XHTML, XML, EPUB, ODT, LaTeX or PDF files, and populates an SQL database with objects~{ objects include: headings, paragraphs, verse, tables, images, but not footnotes/endnotes which are numbered separately and tied to the object from which they are referenced. }~ (equating generally to paragraph-sized chunks) so searches may be performed and matches returned with that degree of granularity (e.g. your search criteria are met by these documents and at these locations within each document). Document output formats share a common object numbering system for locating content. This is particularly suitable for "published" works (finalized texts as opposed to works that are frequently changed or updated) for which it provides a fixed means of reference of content.
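
The object granularity described above can be sketched in a few lines of Python. This is an illustration only and not the sisu engine: it assumes that objects are separated by blank lines and that footnotes use the inline tilde-and-brace markers seen in this document, and the names number_objects and search are hypothetical.

```python
import re

# Assumption: inline footnotes look like  ~{ note text }~
FOOTNOTE = re.compile(r"~\{\s*(.*?)\s*\}~", re.DOTALL)


def number_objects(source):
    """Split text into blank-line separated objects and number them from 1.

    Footnotes are pulled out of each object and numbered in their own
    sequence, each one recording the object it was referenced from.
    """
    objects, footnotes = [], []
    chunks = [c.strip() for c in re.split(r"\n\s*\n", source) if c.strip()]
    for number, chunk in enumerate(chunks, start=1):
        for note in FOOTNOTE.findall(chunk):
            footnotes.append(
                {"note": len(footnotes) + 1, "object": number, "text": note}
            )
        objects.append({"number": number, "text": FOOTNOTE.sub("", chunk).strip()})
    return objects, footnotes


def search(library, term):
    """Return (document name, object number) pairs whose text contains term."""
    term = term.lower()
    return [
        (name, obj["number"])
        for name, objects in library.items()
        for obj in objects
        if term in obj["text"].lower()
    ]
```

Given a dictionary that maps document names to their numbered objects, search returns the documents that match and the object numbers at which they match, which is the kind of result the paragraph above describes.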

In preparing a SiSU document you optionally provide semantic information related to the document in a document header, and in marking up the substantive text provide information on the structure of the document, primarily indicating heading levels and footnotes. You also provide information on basic text attributes where used. The rest is automatic: from this information sisu custom builds~{ i.e. the HTML, PDF, EPUB, ODT outputs are each built individually and optimised for that form of presentation, rather than for example the HTML being a saved version of the ODF, or the PDF being a saved version of the HTML. }~ the different forms of output requested.

SiSU works with an abstraction of the document based on its structure, which comprises its headings~{ the different heading levels }~ and objects~{ units of text, primarily paragraphs and headings, also any tables, poems, code-blocks }~, which enables SiSU to represent the document in many different ways, and to take advantage of the strengths of different ways of presenting documents. The objects are numbered, and these numbers can be used to provide a common basis for citing material within a document across the different output format types. This is significant as page numbers are not well suited to the digital age: in web publishing, changing a browser's default font or using a different browser can mean that text will appear on a different page; and when publishing in different formats (HTML, landscape and portrait PDF etc.) page numbers are again not useful for citing text. Dealing with documents at an object level together with object numbering also has implications for search that SiSU is able to take advantage of.
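
To make the idea of a citation number shared across outputs concrete, here is a minimal Python sketch. It is not how the sisu engine is written: the function names, the bracketed number in the plain text output and the "o" prefixed id in the HTML output are all assumptions chosen for illustration.

```python
from html import escape


def to_plaintext(objects):
    """Render objects as plain text, each followed by its number in brackets."""
    return "\n\n".join(f"{text} [{n}]" for n, text in objects)


def to_html(objects):
    """Render objects as HTML paragraphs whose id carries the same number."""
    return "\n".join(f'<p id="o{n}">{escape(text)}</p>' for n, text in objects)


def cite(objects, n):
    """Look up the text that an object number refers to."""
    return dict(objects)[n]


# Hypothetical document abstraction: a list of (number, text) pairs.
objects = list(
    enumerate(
        ["Introduction", "Page numbers vary by output.", "Object numbers do not."],
        start=1,
    )
)
```

Both renderings are built from the same list of numbered objects, so a reference to object 2 points at the same text whether the reader has the plain text or the HTML version in front of them.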

One of the challenges of maintaining documents is to keep them in a format that allows use of them independently of proprietary platforms. Consider issues related to dealing with legacy proprietary formats today and what guarantee you have that old proprietary formats will remain (or can be read without proprietary software/equipment) in 15 years' time, or the way in which HTML has evolved over its relatively short span of existence. SiSU provides the flexibility of producing documents in multiple non-proprietary open formats including HTML, EPUB,~{ An open standard format for e-books }~ ODT,~{ Open Document Format (ODF) text }~ PDF,~{ Specification submitted by Adobe to ISO to become a full open ISO specification <br> http://www.linux-watch.com/news/NS7542722606.html }~ and ODF.~{ ISO standard ISO/IEC 26300:2006 }~ Whilst SiSU relies on software, the markup is uncomplicated and minimalistic, which means that future engines can be written to run against it. It is also easily converted to other formats, which means documents prepared in SiSU can be migrated to other document formats. Further security is provided by the fact that the software itself, SiSU, is available under GPLv3, a license that guarantees that the source code will always be open, and free as in libre, which means that the code base can be used, updated and further developed as required under the terms of its license. Another challenge is to keep up with a moving target.
SiSU permits new forms of output to be added as they become important (Open Document Format text was added in 2006 when it became an ISO standard for office applications and the archival of documents, and EPUB was introduced in 2009), and allows the technical representation of existing outputs to be updated (HTML has evolved and the related module has been updated repeatedly over the years; presumably when the World Wide Web Consortium (W3C) finalises HTML 5, which is currently under development, the HTML module will again be updated, allowing all existing documents to be regenerated as HTML 5).

The document formats are written to the file-system and are available for indexing by independent indexing tools, whether web-wide search engines such as Google and Yahoo, or on-site tools such as Lucene and Hyperestraier.

SiSU also provides other features such as concordance files and document content certificates. Working against an abstraction of document structure opens further possibilities for the research and development of other document representations; the availability of objects is useful, for example, for topic maps and thesauri, and together with the flexibility of SiSU offers great possibilities.

SiSU is primarily for published works, which can take advantage of the citation system to reference their content reliably. SiSU works well in a complementary manner with such collaborative technologies as Wikis, which can take advantage of and be used to discuss the substance of content prepared in SiSU.

http://www.sisudoc.org/

http://www.jus.uio.no/sisu

% SiSU is a way of preparing, publishing, managing and searching documents.