<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>SiSU information Structuring Universe - Structured
information, Serialized Units - software for electronic texts,
documents, books, digital libraries in plaintext, html, XHTML, XML,
ODF (OpenDocument), LaTeX, PDF, SQL (PostgreSQL and SQLite), and
for search</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="dc.title" content="SiSU information Structuring Universe, Structured information Serialized Units, 2007" />
<meta name="dc.creator" content="Ralph Amissah" />
<meta name="dc.subject" content=
"document structuring, ebook, publishing, PDF, LaTeX, XML, ODF, SQL, postgresql, sqlite, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, granular search, digital library" />
<meta name="dc.publisher" content=
"SiSU http://www.jus.uio.no/sisu" />
<meta name="dc.language" content="en" />
<meta name="dc.rights" content="Copyright Ralph Amissah" />
<meta name="generator" content="SiSU 0.58.3 of 2007w36/4 (2007-09-06) (n*x and Ruby!)" />
<link rel="generator" href="http://www.jus.uio.no/sisu/SiSU" />
<link rel="stylesheet" href="./_sisu/css/html.css" type="text/css" />
<link rel="shortcut icon" href="./_sisu/image/rb7.ico" />
</head>
<body>
<div id="top_band">
<p class="top_band_image">
  <a href="http://www.jus.uio.no/sisu/SiSU" target="_top" >
    <img border="0" src="./_sisu/image/sisu.png" alt="SiSU &gt;&gt;">
  </a>
</p>
<h1 class="top_band">
  SiSU information Structuring Universe
</h1>
<h2 class="top_band_tiny">
  Structured information, Serialized Units
</h2>
<h2 class="top_band_tiny">
software for electronic texts, document collections, books, digital libraries, and search,
</h2>
<h2 class="top_band_tiny">
 with "atomic search" and text positioning system (shared text citation numbering: "<i>ocn</i>")
</h2>
<h2 class="top_band_tiny">
outputs include: plaintext, html, XHTML, XML, ODF (OpenDocument), LaTeX, PDF, SQL (PostgreSQL and SQLite)
</h2>
</div>
<div id="top_band_search">
<!-- SiSU Search -->
<a name="search"></a><form method="get" action="http://search.sisudoc.org" target="_top">
<input type="text" name="s1" size="24" maxlength="255" />
<br />
<input type="submit" name="ignore" value="search" />
<input type="hidden" name="db" value="SiSU_sisu_manual" />
<input type="radio" name="view" value="index" checked="checked" /> idx
<input type="radio" name="view" value="text" /> txt
</form>
<!-- SiSU Search -->
</div>
<div id="column_left">
<p class="small">
  <a href="./sisu_manual/index.html" target="_top" >
    SiSU manual (composite document)
  </a>
</p>
<p class="small">
  <a href="./sisu_introduction/index.html" target="_top" >
    SiSU introduction
  </a>
</p>
<p class="small">
  <a href="./sisu_markup/index.html" target="_top" >
    SiSU markup
  </a>
</p>
<p class="small">
  <a href="./sisu_commands/index.html" target="_top" >
    SiSU commands
  </a>
</p>
<p class="small">
  <a href="./sisu_configuration/index.html" target="_top" >
    SiSU configuration
  </a>
</p>
<p class="tiny">
  ---
</p>
<p class="small">
  <a href="./sisu_help/index.html" target="_top" >
    SiSU help
  </a>
</p>
<p class="small">
  <a href="./sisu_help_sources/index.html" target="_top" >
    SiSU help sources
  </a>
</p>
<p class="tiny">
  ---
</p>
<p class="tiny">
  online
</p>
<p class="bold">
  <a href="http://www.jus.uio.no/sisu/SiSU" target="_top" >
    SiSU
  </a>
</p>
<p class="small">
  <a href="http://www.jus.uio.no/sisu/SiSU/download.html" target="_top" >
    SiSU download
  </a>
</p>
</div>
<div id="column_center">
<p class="bold">
  For less markup than the most elementary HTML you can have more.
</p>
<p><a href="http://www.jus.uio.no/sisu/SiSU" target="_top" ><b>SiSU</b> - Structured information, Serialized Units</a> for electronic documents, is an information structuring, transforming, publishing and search framework with the following features:</p>
<p>
<b>(i)</b> markup syntax:
<b>(a)</b>
simpler than html,
<b>(b)</b>
mnemonic, influenced by mail/messaging/wiki markup practices,
<b>(c)</b> human readable, and easily writable,</p>
<p><b>(ii)</b>
<b>(a)</b>
minimal markup requirement,
<b>(b)</b>
single file marked up for multiple outputs,</p>
<p><b>
notes
</b></p>
<p class="small">
<b>*</b>
documents are prepared in a single UTF-8 file using a minimalistic mnemonic syntax. Typical literary documents, such as "War and Peace", require almost no markup, and most of the headers are optional.
</p>
<p class="small">
<b>*</b>
markup is easily read/parsed by the human eye (basic markup is simpler and more sparse than the most basic html); it may also be converted to XML representations of the same input/source document.
</p>
<p class="small">
<b>*</b>
markup defines document structure (this may be done once in a header pattern-match description, or for heading levels individually); basic text attributes (bold, italics, underscore, strike-through etc.) as required; and semantic information related to the document (header information, extended beyond the Dublin core and easily further extended as required); the headers may also contain processing instructions.
</p>
<p><b>(iii)</b>
<b>(a)</b>
multiple outputs, primarily industry-established and institutionally accepted open standard formats, including amongst others: plaintext (UTF-8); html; (structured) XML; ODF (OpenDocument text); LaTeX; PDF (via LaTeX); SQL type databases (currently PostgreSQL and SQLite). Also produces: concordance files; document content certificates (md5 or sha256 digests of headings, paragraphs, images etc.); and html manifests (and sitemaps of content).

<b>(b)
</b>
takes advantage of the strengths implicit in these very different output types (e.g. PDFs produced using LaTeX typesetting; databases populated with documents at an individual object/paragraph level, making possible granular search and related possibilities),</p>
<p><b>(iv)</b>
outputs share a common numbering system (dubbed "object citation numbering" (ocn)) that is meaningful (to man and machine) across various digital outputs, whether paper, screen, or database oriented (PDF, html, XML, SQLite, PostgreSQL); this numbering system can be used to reference content,</p>
<p>
<b>(v)</b>
SQL databases are populated at an object level (roughly headings, paragraphs, verse, tables) and become searchable with that degree of granularity; the output information provides the object/paragraph numbers, which are relevant across all generated outputs; it is also possible to look at just the matching paragraphs of the documents in the database; [output indexing also works well with search indexing tools like Hyper Estraier],</p>
<p>
<b>(vi)</b>
 use of semantic meta-tags in headers permits the addition of semantic information on documents (the available fields are easily extended),</p>
<p>
<b>(vii)</b>
creates an organised directory/file structure for (file-system) output, easily mapped thanks to its clearly defined structure; with all text objects numbered, you know in advance where in each document output type a given piece of text will be found (e.g. from an SQL search, you know where to go to find the prepared html output or PDF etc.). There is more: easy directory management and document associations; the document preparation (sub-)directory may be used to determine the output (sub-)directory, the skin used, and the SQL database used,</p>
<p>
<b>(viii)</b>
"Concordance file" wordmap, consisting of all the words in a document and their (text/object) locations within the text (and the possibility of adding vocabularies),</p>
<p>
<b>(ix)</b>
document content certification and comparison considerations:
<b>(a)</b>
the document and each object within it is stamped with an md5 hash, making it possible to easily check or guarantee that the substantive content of a document is unchanged,
<b>(b)</b>
version control: documents are integrated with a time-based source control system, by default RCS or CVS with use of the $Id$ tag, which SiSU checks,</p>
<p>
<b>(x)</b>
SiSU's minimalist markup makes for meaningful "diffing" of the substantive content of markup-files,</p>
<p>
<b>(xi)</b>
easily skinnable: document appearance is easily controlled/changed at a project/site-wide, directory-wide, or document instance level,</p>
<p>
<b>(xii)</b>
in many cases a regular expression may be used (once in the document header) to define all or part of a document's structure, obviating or reducing the need to provide structural markup within the document,</p>
<p>
<b>(xiii)</b>
prepared files may be batch processed; documents produced are static files, so this needs to be done only once, but may be repeated for various reasons as desired (updated content, addition of new output formats, updated technology document presentations/representations),</p>
<p>
<b>(xiv)</b>
possible to pre-process, which permits: the easy creation of standard form documents and templates/term-sheets; or the building of composite documents (master documents) from other SiSU marked up documents, or marked up parts, i.e. importing documents or parts of text into a main document should this be desired,</p>
<p>
<b>(xv)</b>
there is a considerable degree of future-proofing, output representations are "upgradeable", and new document formats may be added:
<b>(a)</b>
modular (thanks in no small part to Ruby): if another output format is required, write another module,
<b>(b)</b> easy to update output formats (e.g. html, XHTML, LaTeX/PDF produced can be updated in the program and run against the whole document set),
<b>(c)</b> easy to add, modify, or have alternative syntax rules for input, should you need to,</p>
<p>
<b>(xvi)</b>
scalability, dependent on your file-system (ext3, ReiserFS, XFS, whatever) and on the relational database used (currently PostgreSQL and SQLite), and your hardware,</p>
<p>
<b>(xvii)</b>
only marked up files need be backed up, to secure the larger document set produced,</p>
<p>
<b>(xviii)</b>
document management,</p>
<p>
<b>(xix)</b>
Syntax highlighting for SiSU markup is available for a number of text editors.</p>
<p><b>(xx)</b> remote operations:
<b>(a)</b>
run SiSU on a remote server (having prepared SiSU markup documents locally or on that server; i.e. with SiSU installed on the remote server, this solution works whatever type of machine you choose to prepare your markup documents on),
<b>(b)</b>
generated document outputs may be posted by SiSU to remote sites (using rsync/scp),
<b>(c)</b>
document source (plaintext UTF-8), if shared on the net, may be identified by its URL and processed locally to produce the different document outputs.</p>
<p>
<b>(xxi)</b>
document source may be bundled together (automatically) with associated documents (multiple language versions or master document with inclusions) and images, and sent as a zip file called a sisupod; if shared on the net, these too may be processed locally to produce the desired document outputs. Sisupods may be downloaded, shared as email attachments, or processed by running SiSU against them, using either a URL or the filename.
</p>
<p>
<b>(xxii)</b>
for basic document generation, the only software dependencies are Ruby and a few standard Unix tools (this covers plaintext, html, XML, ODF, LaTeX). To use a database you of course need that database, and to convert the generated LaTeX to PDF you need a LaTeX processor like teTeX or TeX Live.
</p>
<p>
as a developer's tool it is flexible and extensible
</p>
<br />
<p class="small">
More information on <a href="http://www.jus.uio.no/sisu/SiSU/"><b>SiSU</b></a> provided at <a href="http://www.jus.uio.no/sisu/SiSU/">www.jus.uio.no/sisu/SiSU</a></p>
</div>
<div id="column_right">
<p class="tiny">
SiSU ("SiSU information Structuring Universe" or "Structured information, Serialized Units"),<sup>1</sup> is a Unix command line oriented framework for document structuring, publishing and search, featuring minimalistic markup, multiple standard outputs, a common citation system, and granular search.
</p>
<p class="tiny">
 Using markup applied to a document, SiSU can produce plain text, HTML, XHTML, XML, OpenDocument, LaTeX or PDF files, and populate an SQL database with objects<sup>2</sup> (equating generally to paragraph-sized chunks) so searches may be performed and matches returned with that degree of granularity (e.g. your search criteria are met by these documents and at these locations within each document). Document output formats share a common object numbering system for locating content. This is particularly suitable for "published" works (finalized texts as opposed to works that are frequently changed or updated), for which it provides a fixed means of referencing content.
</p>

<p class="small">
How it works
</p>
<p class="tiny">
SiSU markup is fairly minimalistic. It consists of: a (largely optional) document header, made up of information about the document (such as when it was published, who authored it, and granting what rights) and any processing instructions; and markup within the text, which is related to document structure and typeface. SiSU must be able to discern the structure of a document (text headings and their levels in relation to each other), either from information provided in the instruction header or from markup within the text (or from a combination of both). Processing is done against an abstraction of the document comprising information on the document's structure and its objects,<sup>2</sup> which the program serializes (providing the object numbers) and which are assigned hash sum values based on their content. This abstraction of information about document structure, objects (and hash sums) provides considerable flexibility in representing documents in different ways and for different purposes (e.g. search, document layout, publishing, content certification, concordance etc.), and makes it possible to take advantage of some of the strengths of established ways of representing documents (or indeed to create new ones).</p>
<p class="tiny">
<sup>1</sup> also chosen for the meaning of the Finnish term "sisu".
</p>
<p class="tiny">
<sup>2</sup> objects include: headings, paragraphs, verse, tables, images, but not footnotes/endnotes which are numbered separately and tied to the object from which they are referenced.</p>
<p class="small">
  More information on <a href="http://www.jus.uio.no/sisu/SiSU/"><b>SiSU</b></a> provided at:
  <a href="http://www.jus.uio.no/sisu/SiSU/">
    www.jus.uio.no/sisu/SiSU
  </a>
</p>
<p class="tiny">
SiSU was developed in relation to legal documents, and is strong across a wide variety of texts (law, literature, the humanities, and part of the social sciences). SiSU handles images but is not suitable for formulae/statistics, or for technical writing at this time.</p>
<p class="tiny">
SiSU has been developed and has been in use for several years. Requirements to cover a wide range of documents within its use domain have been explored.</p>
<p class="small">
<a href="mailto:ralph@amissah.com">
ralph@amissah.com
</a>
</p>
<p class="small">
<a href="mailto:ralph.amissah@gmail.com">
ralph.amissah@gmail.com
</a>
</p>
<p class="small">
2007
</p>
<p class="tiny">
w3 since October 3 1993
</p>
</div>
</body>
</html>