1 #+TITLE: SiSU
2 #+AUTHOR: Ralph Amissah
3 #+EMAIL: ralph.amissah@gmail.com
4 #+STARTUP: indent content
5 #+LANGUAGE: en
6 #+OPTIONS: H:3 num:nil toc:t \n:nil @:t ::t |:t ^:nil _:nil -:t f:t *:t <:t
7 #+OPTIONS: TeX:t LaTeX:t skip:nil d:nil todo:t pri:nil tags:not-in-toc
8 #+OPTIONS: author:nil email:nil creator:nil timestamp:nil
9 #+PRIORITIES: A F E
10 #+EXPORT_SELECT_TAGS: export
11 #+EXPORT_EXCLUDE_TAGS: noexport
12 #+FILETAGS: :sisu:notes:
13 (emacs:evil mode gifts a "vim" of enticing "alternative" powers! ;)
14 (vim, my _editor_ of choice also in the emacs environment :)
15
16 * What is SiSU?
17
18 Multiple output formats with a nod to the strengths of each output format and
19 the ability to cite text easily across output formats.
20
21 ** debian/control desc
22
23 documents - structuring, publishing in multiple formats and search
24 SiSU is a lightweight markup based, command line oriented, document
25 structuring, publishing and search, static content tool for document
26 collections.
27 .
28 With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax
29 in your text editor of choice, SiSU can generate various document formats, most
30 of which share a common object numbering system for locating content, including
31 plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX, PDF
32 files, and populate an SQL database with objects (roughly paragraph-sized
33 chunks) so searches may be performed and matches returned with that degree of
34 granularity. Think of being able to finely match text in documents, using
35 common object numbers, across different output formats and across languages if
you have translations of the same document. For search, your criteria are met
37 by these documents at these locations within each document (equally relevant
38 across different output formats and languages). To be clear (if obvious) page
39 numbers provide none of this functionality. Object numbering is particularly
40 suitable for "published" works (finalized texts as opposed to works that are
41 frequently changed or updated) for which it provides a fixed means of reference
42 of content. Document outputs can also share provided semantic meta-data.
43 .
44 SiSU also provides concordance files, document content certificates and
45 manifests of generated output and the means to make book indexes that make use
46 of its object numbering.
47 .
48 Syntax highlighting and folding (outlining) files are provided for the Vim and
49 Emacs editors.
50 .
51 Dependencies for various features are taken care of in sisu related packages.
52 The package sisu-complete installs the whole of SiSU.
53 .
54 Additional document markup samples are provided in the package
55 sisu-markup-samples which is found in the non-free archive. The licenses for
56 the substantive content of the marked up documents provided is that provided
57 by the author or original publisher.
58 .
59 SiSU uses utf-8 & parses left to right. Currently supported languages:
60 am bg bn br ca cs cy da de el en eo es et eu fi fr ga gl he hi hr hy ia is it
61 ja ko la lo lt lv ml mr nl nn no oc pl pt pt_BR ro ru sa se sk sl sq sr sv ta
62 te th tk tr uk ur us vi zh (see XeTeX polyglossia & cjk)
63 .
64 SiSU works well under po4a translation management, for which an administrative
65 sample Rakefile is provided with sisu_manual under markup-samples.
66
67 ** take two
68
69 SiSU may be regarded as an open access document publishing platform, applicable
70 to a modest but substantial domain of documents (typically law and literature,
71 but also some forms of technical writing), that is tasked to address certain
72 challenges I identified as being of interest to me over the years in open
73 publishing.
74
75 The idea and implementation may be of interest to consider as some of the
76 issues encountered and that it seeks to address are known and common to such
77 endeavors. Amongst them:
78
79 * how do you ensure what you do now can be read in decades?
  * how do you keep up with new and changing technologies?
81 * do you select a canonical format to represent your documents, if so
82 what?
83 * how do you reliably cite (locate) material in different document
84 representations?
85 * how do you deal with multilingual texts?
86 * what of search?
87 * how are documents contributed to the collection?
88
(These questions are selected to help describe the direction of efforts with
regard to sisu.)
91
92 My Dabblings in the Domain of Open Publishing
93 ---------------------------------------------
94
The system is called SiSU. It is an offshoot of my early efforts at finding out
what to make of the web, which started at the University of Tromsø in 1993 (an
early law website Ananse / International Trade Law Project / Lex Mercatoria). I
have worked on SiSU continually since 1997 and it has been open source since
2005 (under a license called GPL3+), though I remain its developer.
100
101 In working in this field I have had to address some of the common issues.
102
So how do you ensure what you do now can be read in decades to come? There are
alternative solutions: (i) stick with a widely used and not overly complicated,
well-documented open standard, and for that the likes of ODF is an excellent
choice; (ii) alternatively go for the most basic representation of a document
that meets your needs, in my case based on UTF-8 text and some markup tags,
fairly easily parsable by the human eye, and as long as UTF-8 is in use it will
always be possible to extract the information.
110
How do you keep up with new and changing technologies? Here my solution has
112 been to generate new versions of the substantive content so as to always have
113 the latest document representations available e.g. HTML has changed a lot over
114 the years, different specifications come out for various formats including ODF,
115 electronic readers have become an important viewing alternative, introducing
116 the open reader format EPUB. Output representations are generated from source
117 documents. Different open document file formats can be produced and databases
118 and search engines populated. (The source documents and interpreter are all
119 that are required to re-create site content. Source documents can be made
120 public or retained privately). The strict separation of a simple source
121 document from the output produced, means that with updates to SiSU (the
122 interpreter/processor/generator), outputs can be updated technically as
123 necessary, and new output formats added when needed. Amongst the output formats
124 currently supported are HTML, LaTeX generated Pdfs (A4, letter, other;
125 landscape, portrait), Epub, Open Document Format text. Returning to HTML as an
126 example, it has changed a lot over the years I have worked with it, this way of
127 working has meant it is possible to keep producing current versions of HTML,
128 retaining the original substantive document... and new formats have been added
129 as thought desired. There is no attempt to make output in different document
130 formats/ representations look alike let alone identical. Rather the attempt is
131 to optimize output for the particular document filetype, (there is no reason
132 why an epub document would look or behave like an open document text or that a
133 Pdf would look like HTML output; rather PDF is optimized for paper viewing,
134 HTML for screen etc.) Wherever possible features associated with the
135 particular output type are taken advantage of. This freedom is made possible to
136 a large extent by the answer to the question that follows.
137
138 How do you reliably cite (locate) material in different document
139 representations? The traditional answer has been to have a canonical
140 publication, and resulting fixed page numbers. This was not a viable solution
141 for HTML (which changes from one viewer to another and with selectable font
142 faces & size etc.); nor is it otherwise ideal in an electronic age with the
143 possibility of presenting/interacting with material/documents in so many
144 different ways. Why be so restricted? Here my solution has been "object
145 citation numbering". What the various generated document formats have in
146 common is a shared object numbering system that identifies the location of text
147 and that is available for citation purposes. Object numbers are: sequential
148 numbers assigned to each identified object in a document. Objects are logical
149 units of text (or equivalent parts of a document), usually paragraphs, but also
150 document headings, tables, images, in a poem a verse etc. [In an electronic
151 publishing age are page numbers the best we can come up with? Change font
152 type, font size, page orientation, paper size (sometimes even the viewer) and
153 where are you with them? And paper though a favorite medium of mine is no
154 longer the sole (or sometimes primary) means of interacting with documents/text
155 or of sharing knowledge]
156
157 What object numbers mean (unlike page numbers) is e.g.
158
159 * if you cite text in any format, the resulting output can be reliably located
160 in any other document format type. Cite HTML and the reader can choose to
161 view in Epub or Pdf (the PDFs being an independent output, generated by
162 book publishing software XeTeX/LaTeX).
163
164 * if you do a search, you can be given a result "index" indicating that your
    search criteria are met by these documents, and at these specific locations
166 within each document, and the "index" is relevant not only for content
167 within the database, but for all document formats.
168
169 * if you have a translated text prepared for sisu, then your citations are
170 relevant across languages e.g. you can specify exactly where in a Chinese
171 document text is to be found.
172
173 * generated document index references & concordance list references etc. are
174 relevant across all output formats.
175
176 What of search? For search, see the implications of object numbers for search
177 mentioned above. The system currently loads an SQL server (Postgresql) with
178 object sized text chunks. It could just as well populate an analytical engine
179 with larger sections or chapters of text for analytical purposes (such as the
180 currently popular Elasticsearch), whilst availing itself also of the concept of
181 objects and object numbers in search results.
182
183 How do you deal with multilingual texts? If you have translated text prepared
184 for sisu, then your citations are relevant across languages. Object numbers
185 also provide an easy way to compare, discuss text (translations) across
186 languages. Text found/cited in one language has the same object number in its
187 translations, a given paragraph will be the same in another language, just
change the language code. (Documents are prepared in UTF-8; current language
restrictions arise from the use of LaTeX tools, Polyglossia & CJK (Chinese,
Japanese & Korean), and from the fact that sisu parses left to right.)
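
A minimal sketch of the "just change the language code" idea, assuming
translations have been prepared so that corresponding objects carry the same
number. The data, the dictionary layout and the function name cite are
hypothetical and only for illustration; they are not how sisu stores documents.

```python
# Hypothetical sample: one document in two languages, keyed by object number.
document = {
    "en": {1: "Chapter One", 2: "The contract is formed on acceptance."},
    "fr": {1: "Chapitre un", 2: "Le contrat est formé dès l'acceptation."},
}


def cite(lang, ocn):
    """Return the object with the given number in the given language."""
    return document[lang][ocn]


# The same citation (object 2) resolves in either language.
print(cite("en", 2))
print(cite("fr", 2))
```

The citation itself (object 2) does not change; only the language code does.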
191
192 How are materials prepared for contribution to the collection? (a) The easiest
193 solution if the system allows is for submission in the format in which work is
194 authored, usually a word processor, for which odf may be a decent selection.
195 (b) I have stuck with enhanced plaintext, UTF-8 with minimal markup. Source
196 documents are prepared in UTF-8 text, with a minimalist native markup to
197 indicate the document structure (headings and their relative levels),
198 footnotes, and other document "features". This markup is easily parsable to the
199 human eye, and plays well with version control systems. Documents are prepared
in a text editor. Front ends such as markup assistants in a word processor that
can save to sisu text format, or other such tools, whilst possible, do not
exist. [(c) Yet another form of submission for collaborative work is wikis,
which have shown their strength in efforts such as Wikipedia.]
204
205 The system has proven to be a good testing ground for ideas and is flexible and
extensible. (Things that could usefully be done, apart from a front end for
simpler user interaction: feed text to an analytical search engine, like
Elasticsearch/Lucene; it still needs a bibliography parser (auto-generation of
a bibliography from footnotes); and it might be useful to allow rough
auto-translation of documents on the fly by passing text through a translator
(such as Google Translate).)
212
213 In any event, my resulting technical opinions (in my modest domain of
214 action) may be regarded as encapsulated within SiSU
215 [http://www.sisudoc.org/]
216
217 http://www.sisudoc.org/
218 http://www.jus.uio.no/sisu/
219
220 git clone git://git.sisudoc.org/git/code/sisu.git --branch upstream
221 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
222 (there may be additional commits in the upstream branch)
223 git clone --depth 1 git://git.sisudoc.org/git/code/sisu.git --branch upstream
224
225 git clone git://git.sisudoc.org/git/doc/sisu-markup-samples.git --branch upstream
226 git clone --depth 1 git://git.sisudoc.org/git/doc/sisu-markup-samples.git --branch upstream
227 Development work is on Linux and the easiest way to install it is through the
228 Debian Linux package as this takes care of optional external dependencies such
229 as XeTeX for PDF output and Postgresql or Sqlite for search.
230
231 ** multiple document formats
232
233 Text can be represented in multiple output formats with different
234 characteristics that are (or may be) regarded as strengths/advantages and
235 therefore preferred in different contexts.
236
237 Given the different strengths and characteristics of various output formats, it
238 makes little sense to try too hard to make different representations of a
document look the same. More interesting is to have document representations
that take advantage of each given output's strengths. As valuable if not more so is
241 the ability to cite, find, discuss text with ease, across the different output
242 formats.
243
244 For citation across output formats, SiSU uses object citation numbers.
245
246 ** document structure and document objects
247
SiSU breaks marked up text into document structure and objects.

Document structure is the document heading hierarchy (having separated out
the document header).
252
253 *** What are document objects?
254 An object is an identified meaningful unit of a document, most commonly a
255 paragraph of text, but also for example a table, code block, verse or image.
256
257 SiSU tracks these substantive document units as document objects (and their
258 relationship to the document structure).
259
260 ** object citation numbers
261
262 *** What are object citation numbers?
263
264 An object citation number is a sequential number assigned to a document object.
265
In sisu, output documents share this common object numbering system (dubbed
267 "object citation numbering" (ocn)) that is meaningful (machine & human readable)
268 across various digital outputs whether paper, screen, or database oriented,
269 (PDF, html, XML, EPUB, sqlite, postgresql), and across multilingual content if
270 prepared appropriately. This numbering system can be used to reference content
271 across output types.
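
As a rough illustration (not SiSU's actual implementation), object citation
numbering can be sketched as splitting a text into objects, numbering them
sequentially, and carrying the same number into every output. The blank-line
splitting rule, the function names and the id="oN" convention below are
assumptions made for this example; SiSU identifies objects from its own markup.

```python
import html


def number_objects(text):
    """Split text on blank lines and number each non-empty block from 1."""
    blocks = [block.strip() for block in text.split("\n\n")]
    objects = [block for block in blocks if block]
    return {ocn: obj for ocn, obj in enumerate(objects, start=1)}


def to_plaintext(numbered):
    """Render objects as plain text, appending the object number in brackets."""
    return "\n\n".join(f"{obj} [{ocn}]" for ocn, obj in numbered.items())


def to_html(numbered):
    """Render objects as HTML paragraphs carrying the same object number."""
    return "\n".join(
        f'<p id="o{ocn}">{html.escape(obj)}</p>' for ocn, obj in numbered.items()
    )


source = """A heading

First paragraph of text.

Second paragraph of text."""

numbered = number_objects(source)
print(to_plaintext(numbered))
print(to_html(numbered))
```

Whichever rendering a reader has in hand, "object 3" points at the same text.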
272
273 *** Why might I want object citation numbering?
274
The ability to cite and quickly locate text can be invaluable, if not essential
(whether for instruction or discussion).
277
278 In this digital & Internet age we have multiple ways to represent documents and
279 multiple document output formats as options with different characteristics,
280 strengths/advantages etc. We need a way to cite text that works and is relevant
281 independent of the document format used.
282
283 I want to discuss (cite) html text how do I do this?
284 how do I refer to / cite / discuss text in html?
285 Issue: html may be viewed online or printed, it is not tied to paper (as
286 e.g. pdf) and prints differently depending on selected font face and font size.
287
288 I want to discuss (cite) text that is available in multiple formats (e.g. pdf,
289 epub, html) without having to worry about the output format that is referred
290 to.
291 How do I refer to / discuss text that is available in more than one format,
292 uncertain of what format is preferred, used or available to my colleagues?
293 e.g. html and epub or pdf have rather different text representations, how do I
294 discuss ...
295
296 I would like to have a book index that is relevant (can be used) across multiple
297 output formats (e.g. pdf, epub, html)
298
299 How do I make a book index (or a concordance file) that works across multiple
300 output formats?
301
302 I would like to have search results indicating where in a document matches are
303 found and I would like it to be relevant across available output formats (e.g.
304 pdf, epub, html)
305 How do I get search results for locations of text within each relevant document
306
307 I would like to be able to discuss a text that has been translated ...
308 how do I find text across languages?
309 Where I have a nicely translated document, how do I point to or discuss with my
310 foreign language counterpart some detail of the text, or, how do I point my
311 foreign language counterpart to the text I would like to bring to his
312 attention.
313
314 ** "Granular" Search
315
316 Of interest is the ease of streaming documents to a relational database, at an
317 object (roughly paragraph) level and the potential for increased precision in
318 the presentation of matches that results thereby. The ability to serialize
319 html, LaTeX, XML, SQL, (whatever) is also inherent in / incidental to the
320 design.
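
The idea of object-level search can be sketched with an in-memory SQLite
database. The table name, columns and sample rows are hypothetical and far
simpler than the schema sisu itself populates; the point is only that each
match comes back as a document plus an object number.

```python
import sqlite3

# Hypothetical, minimal schema: one row per document object.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE objects (doc TEXT, ocn INTEGER, body TEXT)")
rows = [
    ("doc_a", 1, "Introduction"),
    ("doc_a", 2, "Contracts are formed by offer and acceptance."),
    ("doc_b", 1, "Preface"),
    ("doc_b", 2, "An offer may be withdrawn before acceptance."),
    ("doc_b", 3, "Unrelated closing remarks."),
]
conn.executemany("INSERT INTO objects VALUES (?, ?, ?)", rows)


def search(term):
    """Return (doc, ocn) pairs for objects whose text contains term."""
    cur = conn.execute(
        "SELECT doc, ocn FROM objects WHERE body LIKE ? ORDER BY doc, ocn",
        (f"%{term}%",),
    )
    return cur.fetchall()


print(search("offer"))
```

Each (document, object number) pair in the result can then be looked up in any
generated output format of that document.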
321
322 ** Summary
323 SiSU information Structuring Universe
324 Structured information, Serialized Units <www.sisudoc.org> or
325 <www.jus.uio.no/sisu/> software for electronic texts, document collections,
326 books, digital libraries, and search, with "atomic search" and text positioning
327 system (shared text citation numbering: "ocn")
328 outputs include: plaintext, html, XHTML, XML, ODF (OpenDocument), EPUB, LaTeX,
329 PDF, SQL (PostgreSQL and SQLite)
330
331 ** SiSU Short Description
332
333 SiSU is a comprehensive future-resilient electronic document management system.
334 Built-in search capabilities allow you to search across multiple documents and
335 highlight matches in an easy-to-follow format. Paragraph numbering system
336 allows you to cite your electronic documents in a consistent manner across
337 multiple file formats. Multiple format outputs allow you to display your
documents in plain text, PDF (portrait and landscape), OpenDocument format,
339 HTML, or e-book reading format (EPUB). Word mapping allows you to easily create
340 word indexes for your documents. Future-resilient flexibility allows you to
341 quickly adapt your documents to newer output formats as needed. All these and
342 many other features are achieved with little or no additional work on your
343 documents - by marking up the documents with a super simplistic markup
344 language, leaving the SiSU engine to handle the heavy-lifting processing.
345
346 Potential users of SiSU include individual authors who want to publish their
347 books or articles electronically to reach a broad audience, web publishers who
348 want to provide multiple channels of access to their electronic documents, or
349 any organizations which centrally manage a medium or large set of electronic
350 documents, especially governmental organizations which may prefer to keep their
351 documents in easily accessible yet non-proprietary formats.
352
SiSU is an Open Source project initiated and led by Ralph Amissah
<ralph.amissah@gmail.com>, who can be contacted via the mailing list
<http://lists.sisudoc.org/listinfo/sisu> at <sisu@lists.sisudoc.org>. SiSU is
licensed under the GNU General Public License.
357
358 *** notes
359
360 For less markup than the most elementary HTML you can have more. SiSU -
361 Structured information, Serialized Units for electronic documents, is an
362 information structuring, transforming, publishing and search framework with the
363 following features:
364
365 (i) markup syntax: (a) simpler than html, (b) mnemonic, influenced by
366 mail/messaging/wiki markup practices, (c) human readable, and easily writable,
367
368 (ii) (a) minimal markup requirement, (b) single file marked up for multiple outputs,
369
370 * documents are prepared in a single UTF-8 file using a minimalistic mnemonic
371 syntax. Typical literature, documents like "War and Peace" require almost no
372 markup, and most of the headers are optional.
373
374 * markup is easily readable/parsed by the human eye, (basic markup is simpler
375 and more sparse than the most basic html), [this may also be converted to XML
376 representations of the same input/source document].
377
378 * markup defines document structure (this may be done once in a header
379 pattern-match description, or for heading levels individually); basic text
380 attributes (bold, italics, underscore, strike-through etc.) as required; and
381 semantic information related to the document (header information, extended
382 beyond the Dublin core and easily further extended as required); the headers
383 may also contain processing instructions.
384
385 (iii) (a) multiple output formats, including amongst others: plaintext (UTF-8);
386 html; (structured) XML; ODF (Open Document text); EPUB; LaTeX; PDF (via LaTeX);
387 SQL type databases (currently PostgreSQL and SQLite). SiSU produces:
388 concordance files; document content certificates (md5 or sha256 digests of
389 headings, paragraphs, images etc.) and html manifests (and sitemaps of
390 content). (b) takes advantage of the strengths implicit in these very different
391 output types, (e.g. PDFs produced using typesetting of LaTeX, databases
392 populated with documents at an individual object/paragraph level, making
393 possible granular search (and related possibilities))
394
395 (iv) outputs share a common numbering system (dubbed "object citation
396 numbering" (ocn)) that is meaningful (to man and machine) across various
397 digital outputs whether paper, screen, or database oriented, (PDF, html, XML,
398 EPUB, sqlite, postgresql), this numbering system can be used to reference
399 content.
400
401 (v) SQL databases are populated at an object level (roughly headings,
402 paragraphs, verse, tables) and become searchable with that degree of
403 granularity, the output information provides the object/paragraph numbers which
404 are relevant across all generated outputs; it is also possible to look at just
the matching paragraphs of the documents in the database; [output indexing also
works well with search indexing tools like Hyper Estraier].
407
(vi) use of semantic meta-tags in headers permits the addition of semantic
409 information on documents, (the available fields are easily extended)
410
411 (vii) creates organised directory/file structure for (file-system) output,
412 easily mapped with its clearly defined structure, with all text objects
413 numbered, you know in advance where in each document output type, a bit of text
414 will be found (e.g. from an SQL search, you know where to go to find the
415 prepared html output or PDF etc.)... there is more; easy directory management
416 and document associations, the document preparation (sub-)directory may be used
417 to determine output (sub-)directory, the skin used, and the SQL database used,
418
419 (viii) "Concordance file" wordmap, consisting of all the words in a document
420 and their (text/ object) locations within the text, (and the possibility of
421 adding vocabularies),
422
423 (ix) document content certification and comparison considerations: (a) the
424 document and each object within it stamped with an sha256 hash making it
425 possible to easily check or guarantee that the substantive content of a document
426 is unchanged, (b) version control, documents integrated with time based source
427 control system, default RCS or CVS with use of $Id$ tag, which SiSU checks
428
429 (x) SiSU's minimalist markup makes for meaningful "diffing" of the substantive
430 content of markup-files,
431
432 (xi) easily skinnable, document appearance on a project/site wide, directory
433 wide, or document instance level easily controlled/changed,
434
435 (xii) in many cases a regular expression may be used (once in the document
header) to define all or part of a document's structure obviating or reducing
437 the need to provide structural markup within the document,
438
(xiii) prepared files may be batch processed; documents produced are static files
440 so this needs to be done only once but may be repeated for various reasons as
441 desired (updated content, addition of new output formats, updated technology
442 document presentations/representations)
443
444 (xiv) possible to pre-process, which permits: the easy creation of standard
445 form documents, and templates/term-sheets, or; building of composite documents
446 (master documents) from other sisu marked up documents, or marked up parts,
447 i.e. import documents or parts of text into a main document should this be
448 desired
449
453 (xv) there is a considerable degree of future-resilience, output representations
454 are "upgradeable", and new document formats may be added: (a) modular, (thanks
455 in no small part to Ruby) another output format required, write another
456 module.... (b) easy to update output formats (eg html, XHTML, LaTeX/PDF
457 produced can be updated in program and run against whole document set), (c)
458 easy to add, modify, or have alternative syntax rules for input, should you
459 need to,
460
461 (xvi) scalability, dependent on your file-system (ext3, Reiserfs, XFS,
462 whatever) and on the relational database used (currently Postgresql and
463 SQLite), and your hardware,
464
465 (xvii) only marked up files need be backed up, to secure the larger document
466 set produced,
467
468 (xviii) document management,
469
470 (xix) Syntax highlighting for SiSU markup is available for a number of text
471 editors.
472
473 (xx) remote operations: (a) run SiSU on a remote server, (having prepared sisu
474 markup documents locally or on that server, i.e. this solution where sisu is
475 installed on the remote server, would work whatever type of machine you chose
476 to prepare your markup documents on), (b) generated document outputs may be
477 posted by sisu to remote sites (using rsync/scp) (c) document source (plaintext
478 utf-8) if shared on the net may be identified by its url and processed locally
479 to produce the different document outputs.
480
481 (xxi) document source may be bundled together (automatically) with associated
482 documents (multiple language versions or master document with inclusions) and
483 images and sent as a zip file called a sisupod, if shared on the net these too
484 may be processed locally to produce the desired document outputs, these may be
485 downloaded, shared as email attachments, or processed by running sisu against
486 them, either using a url or the filename.
487
488 (xxii) for basic document generation, the only software dependency is Ruby, and
489 a few standard Unix tools (this covers plaintext, html, XML, ODF, EPUB, LaTeX).
490 To use a database you of course need that, and to convert the LaTeX generated
491 to PDF, a LaTeX processor like tetex or texlive.
492
As a developer's tool it is flexible and extensible.
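
To make item (viii) in the list above more concrete, here is a minimal sketch
of a concordance (wordmap) keyed to object numbers rather than pages. The
function name, the tokenizing rule and the sample objects are assumptions for
illustration only and do not reflect how sisu builds its concordance files.

```python
import re
from collections import defaultdict


def concordance(numbered):
    """Map each lower-cased word to the sorted object numbers containing it."""
    index = defaultdict(set)
    for ocn, text in numbered.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(ocn)
    return {word: sorted(ocns) for word, ocns in sorted(index.items())}


# Hypothetical sample objects, keyed by object number.
numbered = {1: "On Liberty", 2: "Liberty and law.", 3: "Law in practice."}
wordmap = concordance(numbered)
print(wordmap["liberty"])
print(wordmap["law"])
```

Because the locations are object numbers, the same wordmap applies to every
output format generated from the document.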
494
495 ** description
496
SiSU ("SiSU information Structuring Universe" or "Structured information,
Serialized Units") [1] is a Unix command line oriented framework for document
structuring, publishing and search, featuring minimalistic markup, multiple
standard outputs, a common citation system, and granular search. Using markup
applied to a document, SiSU can produce plain text, HTML, XHTML, XML,
OpenDocument, LaTeX or PDF files, and populate an SQL database with objects [2]
(equating generally to paragraph-sized chunks) so searches may be performed and
matches returned with that degree of granularity (e.g. your search criteria are
met by these documents and at these locations within each document). Document
output formats share a common object numbering system for locating content.
This is particularly suitable for "published" works (finalized texts as opposed
to works that are frequently changed or updated) for which it provides a fixed
means of reference of content.

*** How it works
510
511 SiSU markup is fairly minimalistic, it consists of: a (largely optional)
512 document header, made up of information about the document (such as when it was
513 published, who authored it, and granting what rights) and any processing
514 instructions; and markup within text which is related to document structure and
515 typeface. SiSU must be able to discern the structure of a document, (text
516 headings and their levels in relation to each other), either from information
517 provided in the instruction header or from markup within the text (or from a
518 combination of both). Processing is done against an abstraction of the document
comprising information on the document's structure and its objects [2], which
520 the program serializes (providing the object numbers) and which are assigned
521 hash sum values based on their content. This abstraction of information about
522 document structure, objects, (and hash sums), provides considerable flexibility
523 in representing documents different ways and for different purposes (e.g.
524 search, document layout, publishing, content certification, concordance etc.),
525 and makes it possible to take advantage of some of the strengths of established
526 ways of representing documents, (or indeed to create new ones).
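
The serialization and hashing described above can be sketched as follows,
assuming each object is simply a string. The function names and sample texts
are hypothetical; sisu's own digests and abstraction format will differ in
detail.

```python
import hashlib


def certify(objects):
    """Pair each object with its number and a SHA-256 digest of its text."""
    return [
        (ocn, hashlib.sha256(obj.encode("utf-8")).hexdigest(), obj)
        for ocn, obj in enumerate(objects, start=1)
    ]


def changed(old, new):
    """List object numbers whose digests differ (versions of equal length)."""
    return [o[0] for o, n in zip(certify(old), certify(new)) if o[1] != n[1]]


v1 = ["Title", "First paragraph.", "Second paragraph."]
v2 = ["Title", "First paragraph, revised.", "Second paragraph."]
print(changed(v1, v2))
```

Comparing digests object by object shows which parts of the substantive
content differ between two versions, without comparing any rendered output.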
527
[1] Also chosen for the meaning of the Finnish term "sisu".

[2] Objects include: headings, paragraphs, verse, tables, images, but not
footnotes/endnotes which are numbered separately and tied to the object from
which they are referenced.
533
534 More information on SiSU provided at: <www.sisudoc.org/sisu/SiSU>
535
536 SiSU was developed in relation to legal documents, and is strong across a wide
537 variety of texts (law, literature...(humanities, law and part of the social
538 sciences)). SiSU handles images but is not suitable for formulae/ statistics,
539 or for technical writing at this time.
540
541 SiSU has been developed and has been in use for several years. Requirements to
542 cover a wide range of documents within its use domain have been explored.
543
544 <ralph@amissah.com>
545 <ralph.amissah@gmail.com>
546 <sisu@lists.sisudoc.org>
547 <http://lists.sisudoc.org/listinfo/sisu>
548 2010
549 w3 since October 3 1993
550 * Finding SiSU
551 ** source
552 http://git.sisudoc.org/gitweb/
553
554 *** sisu
555 sisu git repo:
556 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
557
558 **** most recent source without repo history
559 git clone --depth 1 git://git.sisudoc.org/git/code/sisu.git --branch upstream
560 **** full clone
561 git clone git://git.sisudoc.org/git/code/sisu.git --branch upstream
562
563 *** sisu-markup-samples git repo:
564 http://git.sisudoc.org/gitweb/?p=doc/sisu-markup-samples.git;a=summary
565
566 ** mailing list
567 sisu at lists.sisudoc.org
568 http://lists.sisudoc.org/listinfo/sisu
569
570 ** irc oftc #sisu
571
572 ** home pages
573 <http://www.sisudoc.org/>
574 <http://search.sisudoc.org/>
575 <http://www.jus.uio.no/sisu>
576
577 * Installation
578
579 ** where you take responsibility for having the correct dependencies
580
581 Provided you have *Ruby*, *SiSU* can be run.
582
583 SiSU should be run from the directory containing your sisu marked up document
584 set.
585
586 This works fine so long as you already have sisu external dependencies in
587 place. For many operations such as html, epub, odt this is likely to be fine.
588 Note however, that additional external package dependencies, such as texlive
589 (for pdfs), sqlite3 or postgresql (for search) should you desire to use them
590 are not taken care of for you.
591
592 *** run off the source tarball without installation
593
597 **** 1. Obtain the latest sisu source
598
599 using git:
600
601 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
602 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=log
603
604 git clone git://git.sisudoc.org/git/code/sisu.git --branch upstream
605 git clone --depth 1 git://git.sisudoc.org/git/code/sisu.git --branch upstream
606
607 or, identify latest available source:
608
609 https://packages.debian.org/sid/sisu
610 http://packages.qa.debian.org/s/sisu.html
611 http://qa.debian.org/developer.php?login=sisu@lists.sisudoc.org
612
613 http://sisudoc.org/sisu/archive/pool/main/s/sisu/
614
615 and download the:
616
617 sisu_5.4.5.orig.tar.xz
618
619 using debian tool dget:
620
621 The dget tool is included within the devscripts package
622 https://packages.debian.org/search?keywords=devscripts
623 to get dget, install devscripts:
624
625 apt-get install devscripts
626
627 and then you can get it from Debian:
628 dget -xu http://ftp.fi.debian.org/debian/pool/main/s/sisu/sisu_5.4.5-1.dsc
629
630 or off sisu repos
631 dget -x http://www.jus.uio.no/sisu/archive/pool/main/s/sisu/sisu_5.4.5-1.dsc
632 or
633 dget -x http://sisudoc.org/sisu/archive/pool/main/s/sisu/sisu_5.4.5-1.dsc
634
635 **** 2. Unpack the source
636
637 Provided you have *Ruby*, *SiSU* can be run without installation straight from
638 the source package directory tree.
639
640 Run ruby against the full path to bin/sisu (in the unpacked source package
641 directory tree). SiSU should be run from the directory containing your sisu
642 marked up document set.
643
644 ruby ~/sisu-5.4.5/bin/sisu --html -v document_name.sst
645
646 This works fine so long as you already have sisu external dependencies in
647 place. For many operations such as html, epub, odt this is likely to be fine.
648 Note however, that additional external package dependencies, such as texlive
649 (for pdfs), sqlite3 or postgresql (for search) should you desire to use them
650 are not taken care of for you.
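
The run-from-source step above can be wrapped in a small script. This is a minimal sketch, not part of the SiSU distribution: the variable names SISU_SRC and RUN and the helper functions are hypothetical, and the only SiSU options used are the ones shown above (--html -v). By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: run SiSU from an unpacked source tree without installing it.
# SISU_SRC (hypothetical variable) points at the unpacked source tree.
SISU_SRC="${SISU_SRC:-$HOME/sisu-5.4.5}"

# Print the command line for one document (nothing is executed here).
sisu_cmd() {
  printf 'ruby %s/bin/sisu --html -v %s\n' "$SISU_SRC" "$1"
}

# Succeeds only if a ruby interpreter is on PATH.
have_ruby() {
  command -v ruby >/dev/null 2>&1
}

# Walk the .sst files in the current directory (SiSU is meant to be run
# from the directory holding the document set).
# Dry run by default; set RUN=1 (hypothetical switch) to really execute.
for doc in *.sst; do
  [ -e "$doc" ] || continue   # no .sst files: the glob stays literal
  if [ "${RUN:-0}" = 1 ] && have_ruby; then
    ruby "$SISU_SRC/bin/sisu" --html -v "$doc"
  else
    sisu_cmd "$doc"
  fi
done
```

Run it from the directory containing the marked up documents. External dependencies (texlive, sqlite3, postgresql) are still not checked or installed by this sketch.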
651
652 *** gem install (with rake)
653
654 (i) create the gemspec; (ii) build the gem (from the gemspec); (iii) install
655 the gem
656
657 Provided you have ruby & rake, this can be done with the single command:
658
659 rake gem_create_build_install
660
661 to build and install sisu v5 & sisu v6, alias gemcbi
662
663 separate gems are made/installed for sisu v5 & sisu v6 contained in source.
664
665 to build and install sisu v5, alias gem5cbi:
666
667 rake gem_create_build_install_stable
668
669 to build and install sisu v6, alias gem6cbi:
670
671 rake gem_create_build_install_unstable
672
673 for individual steps (create, build, install) see the rake options (rake -T). To
674 specify the sisu version to run, where sisu is installed via gem:
675
676 gem search sisu
677
678 sisu _5.4.5_ --version
679
680 sisu _6.0.11_ --version
681
682 to uninstall sisu installed via gem
683
684 sudo gem uninstall --verbose sisu
685
686 For a list of alternative actions you may type:
687
688 rake help
689
690 rake -T
691
692 Rake: <http://rake.rubyforge.org/> <http://rubyforge.org/frs/?group_id=50>
693
694 *** installation with setup.rb
695
696 this is a three step process; in the root directory of the unpacked *SiSU*
697 type (the final step as root):
698
699 ruby setup.rb config
700 ruby setup.rb setup
701 #[as root:]
702 ruby setup.rb install
703
704 further information:
705 <http://i.loveruby.net/en/projects/setup/>
706 <http://i.loveruby.net/en/projects/setup/doc/usage.html>
707
708 ruby setup.rb config && ruby setup.rb setup && sudo ruby setup.rb install
709
710 ** Debian install
711
712 *SiSU* is available off the *Debian* archives. It should be necessary only to
713 run, as root, using apt-get:
714
715 apt-get update
716
717 apt-get install sisu-complete
718
719 (all sisu dependencies should be taken care of)
720
721 If there are newer versions of *SiSU* upstream, they will be available by
722 adding the following to your sources list /etc/apt/sources.list
723
724 #/etc/apt/sources.list
725
726 deb http://www.jus.uio.no/sisu/archive unstable main non-free
727 deb-src http://www.jus.uio.no/sisu/archive unstable main non-free
728
729 The non-free section is for sisu markup samples provided, which contain
730 authored works the substantive text of which cannot be changed, and which as a
731 result do not meet the debian free software guidelines.
732
733 *SiSU* is developed on *Debian*, and packages are available for *Debian* that
734 take care of the dependencies encountered on installation.
735
736 The package is divided into the following components:
737
738 *sisu*, the base code (the main package on which the others depend), without
739 any dependencies other than ruby (and, for convenience, the ruby webrick web
740 server); this generates a number of types of output on its own. Other
741 packages provide additional functionality, and have their own dependencies
742
743 *sisu-complete*, a dummy package that installs the whole of greater sisu as
744 described below, apart from sisu-examples
745
746 *sisu-pdf*, dependencies used by sisu to produce pdf from generated /LaTeX/
747
748 *sisu-postgresql*, dependencies used by sisu to populate postgresql database
749 (further configuration is necessary)
750
751 *sisu-sqlite*, dependencies used by sisu to populate sqlite database
752
753 *sisu-markup-samples*, sisu markup samples and other miscellany (under
754 *Debian* Free Software Guidelines non-free)
755
756 *SiSU* is available off Debian Unstable and Testing [link:
757 <http://packages.debian.org/cgi-bin/search_packages.pl?searchon=names&subword=1&version=all&release=all&keywords=sisu>]
758 install it using apt-get, aptitude or alternative *Debian* install tools.
759
760 ** Arch Linux
761
762 * sisu markup :sisu:
763
764 ** markup :markup:
765
766 *** sisu document parts
767 - header
768 - metadata
769 - make instructions
770 - substantive (& other) content
771 (sisu markup)
772 - endnotes
773 (markup within substantive content)
774 - glossary
775 (section, special markup)
776 - bibliography
777 (section, special markup)
778 - book index
779 (markup attached to substantive content objects)
780
781 |---------------------+-----------------------------------------------------------------------+------------------------+--------|
782 | header | sisu /header markup/ | markup | |
783 | - metadata | | | |
784 | - make instructions | | | |
785 |---------------------+-----------------------------------------------------------------------+------------------------+--------|
786 | substantive content | sisu /content markup/ | markup | output |
787 | | headings (providing document structure), paragraphs, | (regular content) | |
788 | | blocks (code, poem, group, table) | | |
789 |---------------------+-----------------------------------------------------------------------+------------------------+--------|
790 | endnotes | markup within substantive content | markup | output |
791 | | (extracted from sisu /content markup/) | (from regular content) | |
792 |---------------------+-----------------------------------------------------------------------+------------------------+--------|
793 | glossary | identify special section, regular /content markup/ | markup | output |
794 |---------------------+-----------------------------------------------------------------------+------------------------+--------|
795 | bibliography | identify section, special /bibliography markup/ | markup | output |
796 |---------------------+-----------------------------------------------------------------------+------------------------+--------|
797 | book index | extracted from markup attached to related substantive content objects | markup | output |
798 | | (special tags in sisu /content markup/) | (from regular content) | |
799 |---------------------+-----------------------------------------------------------------------+------------------------+--------|
800 | metadata | | (from regular header) | output |
801 |---------------------+-----------------------------------------------------------------------+------------------------+--------|
802
803 *** structure - headings, levels
804 - headings (A-D, 1-3)
805
806 'A~ ' NOTE title level
807
808 'B~ ' NOTE optional
809 'C~ ' NOTE optional
810 'D~ ' NOTE optional
811
812 '1~ ' NOTE chapter level
813 '2~ ' NOTE optional
814 '3~ ' NOTE optional
815
816 * node
817 * parent
818 * children
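
The heading markers above can be tried in a scratch file. This is a minimal sketch, not taken from the SiSU sources: the file name headings.sst and the heading texts are hypothetical, the document header (metadata, make instructions) is left out, and the grep pattern only illustrates how the marker at the start of a line identifies the level.

```shell
# Write a markup fragment using the heading markers listed above.
# headings.sst is a hypothetical scratch file; the header is omitted.
cat > headings.sst <<'EOF'
A~ Sample Document Title

B~ Part One

1~ First Chapter

A paragraph under the first chapter.

2~ A Section

3~ A Subsection

1~ Second Chapter
EOF

# List the heading lines only: a level marker (A-D or 1-3), then '~ '.
grep -E '^[A-D1-3]~ ' headings.sst
```

Here the two '1~ ' headings are siblings under 'B~ ', and the '2~ ' and '3~ ' headings are nested within the first chapter.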
819
820 *** font face NOTE open & close marks, inline within paragraph
821 * emphasize '*{ ... }*' NOTE configure whether bold italics or underscore, default bold
822 * bold '!{ ... }!'
823 * italics '/{ ... }/'
824 * underscore '_{ ... }_'
825 * superscript '^{ ... }^'
826 * subscript ',{ ... },'
827 * strike '-{ ... }-'
828 * add '+{ ... }+'
829 * monospace '#{ ... }#'
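
A short fragment showing each open and close mark in use may help. This is a sketch under assumptions: fontface.sst is a hypothetical scratch file, the sample sentence is invented, and the grep pattern is only a rough illustration of the paired marks, not the matching SiSU itself performs.

```shell
# One paragraph using each inline font face mark listed above.
# fontface.sst is a hypothetical scratch file.
cat > fontface.sst <<'EOF'
Plain text with *{emphasis}*, !{bold}!, /{italics}/ and _{underscore}_;
E = mc^{2}^ and H,{2},O; -{struck out}- then +{added}+; a #{monospace}# word.
EOF

# Print each marked span on its own line:
# an opening mark, '{', the text, '}', a closing mark.
grep -oE '[*!/_^,+#-]\{[^}]*\}[*!/_^,+#-]' fontface.sst
```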
830
831 *** para
832 NOTE paragraph controls are at the start of a paragraph
833 * a para is a block of text separated from others by an empty line
834 * indent
835 * default, all '_1 ' up to '_9 '
836 * first line hang '_1_0 '
837 * first line indent further '_0_1 '
838 * bullet
839 [levels 1-6]
840 '_* '
841 '_1* '
842 '_2* '
843 * numbered list
844 [levels 1-3]
845 '# '
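
The paragraph controls above, placed at the start of paragraphs separated by empty lines, might look as follows. This is a sketch only: para.sst is a hypothetical scratch file, the paragraph texts are invented, and the grep pattern is an illustration of where the control markers sit rather than SiSU's own parsing.

```shell
# Paragraphs separated by empty lines, each starting with a control marker.
# para.sst is a hypothetical scratch file.
cat > para.sst <<'EOF'
An ordinary paragraph, ended by the empty line that follows.

_1 Indented one level.

_2 Indented two levels.

_1_0 A paragraph using the first line hang marker.

_0_1 A paragraph using the first line indent further marker.

_* A bullet point.

_1* A bullet point, indented one level.

# First numbered item.

# Second numbered item.
EOF

# Count the paragraphs that begin with an indent, bullet or number marker.
grep -cE '^(_[0-9](_[0-9])? |_[0-9]?\* |# )' para.sst
```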
846
847 *** blocks
848 NOTE text blocks that are not to be treated in the way that ordinary paragraphs would be
849 * code
850 * [type of markup if any]
851 * poem
852 * group
853 * alt
854 * tables
855
856 *** notes (footnotes/ endnotes)
857 NOTE inline within paragraph at the location where the note reference is to occur
858 * footnotes '~{ ... }~'
859 * [bibliography] [NB N/A not implemented]
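
As an illustration of the inline footnote marks, the fragment below places a note at the point where its reference is to occur. This is a sketch under assumptions: notes.sst is a hypothetical scratch file, the sentence is invented, and the grep and sed commands only show which part of the line is the note and which is the running text.

```shell
# A paragraph with one inline note; notes.sst is a hypothetical scratch file.
cat > notes.sst <<'EOF'
The claim is made inline~{ This text becomes the note. }~ and the sentence carries on.
EOF

# Show the note as written in the markup.
grep -oE '~\{ [^}]* \}~' notes.sst

# Show the running text with the note removed.
sed -E 's/~\{ [^}]* \}~//g' notes.sst
```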
860
861 *** links, linking
862 * links - external, web, url
863 * links - internal
864
865 *** images [multimedia?]
866 * images
867 * [base64 inline] [N/A not implemented]
868
869 *** object numbers
870 * ocn (object numbers)
871 automatically attributed to substantive objects, paragraphs, tables, blocks, verse (unless exclude marker provided)
872
873 *** contents
874 * toc (table of contents)
875 autogenerated from structure/headings information
876 * index (book index)
877 built from hints in a new line of text following a paragraph and starting with ={}; has identifying rules for main and subsidiary text
878
879 *** breaks
880 * line break ' \\ ' inline
881 * page break, column break ' -\\- ' start of line, breaks a column, starts a new column, if using columns, else breaks the page, starts a new page.
882 * page break, page new ' =\\= ' start of line, breaks the page, starts a new page.
883 * horizontal '-..-' start of line, rule page (break) line across page (dividing paragraphs)
884
885 *** book type index
886 built from hints in a new line of text following a paragraph and starting with ={}; has identifying rules for main and subsidiary text
887
888 #% comment
889 * comment
890
891 #% misc
892 * term & definition
893
894 ** syntax highlighting :syntax:highlighting:
895
896 *** vim
897 data/sisu/conf/editor-syntax-etc/vim/
898 data/sisu/conf/editor-syntax-etc/vim/syntax/sisu.vim
899
900 *** emacs
901 data/sisu/conf/editor-syntax-etc/emacs/
902 data/sisu/conf/editor-syntax-etc/emacs/sisu-mode.el
903
904 * todo
905 sisu_todo.org