1 #+PRIORITIES: A F E
2 (emacs:evil mode gifts a "vim" of enticing "alternative" powers! ;)
3 (vim, my _editor_ of choice also in the emacs environment :)
4
5 * General
6
7 ** what is sisu?
8
9 Multiple output formats with a nod to the strengths of each output format and
10 the ability to cite text easily across output formats.
11
12 *** debian/control desc
13
14 documents - structuring, publishing in multiple formats and search
15 SiSU is a lightweight markup based, command line oriented, document
16 structuring, publishing and search, static content framework for document
17 collections.
18 .
19 With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax
20 in your text editor of choice, SiSU can generate various document formats, most
21 of which share a common object numbering system for locating content, including
22 plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX, PDF
23 files, and populate an SQL database with objects (roughly paragraph-sized
24 chunks) so searches may be performed and matches returned with that degree of
25 granularity. Think of being able to finely match text in documents, using
26 common object numbers, across different output formats and across languages if
27 you have translations of the same document. For search, your criteria is met
28 by these documents at these locations within each document (equally relevant
29 across different output formats and languages). To be clear (if obvious) page
30 numbers provide none of this functionality. Object numbering is particularly
31 suitable for "published" works (finalized texts as opposed to works that are
32 frequently changed or updated) for which it provides a fixed means of reference
33 of content. Document outputs can also share provided semantic meta-data.
34 .
35 SiSU also provides concordance files, document content certificates and
36 manifests of generated output and the means to make book indexes that make use
37 of its object numbering.
38 .
39 Syntax highlighting and folding (outlining) files are provided for the Vim and
40 Emacs editors.
41 .
42 Dependencies for various features are taken care of in sisu related packages.
43 The package sisu-complete installs the whole of SiSU.
44 .
45 Additional document markup samples are provided in the package
46 sisu-markup-samples which is found in the non-free archive. The license for
47 the substantive content of each marked up document is that provided
48 by the author or original publisher.
49 .
50 SiSU uses utf-8 & parses left to right. Currently supported languages:
51 am bg bn br ca cs cy da de el en eo es et eu fi fr ga gl he hi hr hy ia is it
52 ja ko la lo lt lv ml mr nl nn no oc pl pt pt_BR ro ru sa se sk sl sq sr sv ta
53 te th tk tr uk ur us vi zh (see XeTeX polyglossia & cjk)
54 .
55 SiSU works well under po4a translation management, for which an administrative
56 sample Rakefile is provided with sisu_manual under markup-samples.
57
58 *** take two
59
60 SiSU may be regarded as an open access document publishing platform, applicable
61 to a modest but substantial domain of documents (typically law and literature,
62 but also some forms of technical writing), that is tasked to address certain
63 challenges I identified as being of interest to me over the years in open
64 publishing.
65
66 The idea and implementation may be of interest to consider as some of the
67 issues encountered and that it seeks to address are known and common to such
68 endeavors. Amongst them:
69
70 * how do you ensure what you do now can be read in decades?
71 * how do you keep up with new and changing technologies?
72 * do you select a canonical format to represent your documents, if so
73 what?
74 * how do you reliably cite (locate) material in different document
75 representations?
76 * how do you deal with multilingual texts?
77 * what of search?
78 * how are documents contributed to the collection?
79
80 (these questions are selected to help describe the direction of efforts with
81 regard to sisu).
82
83 My Dabblings in the Domain of Open Publishing
84 ---------------------------------------------
85
86 The system is called SiSU, it is an offshoot of my early efforts at finding out
87 what to make of the web, that started at the University of Tromsø in 1993 (an
88 early law website Ananse/ International Trade Law Project / Lex Mercatoria). I
89 have worked on SiSU continually since 1997 and it has been open source since 2005
90 (under a license called GPL3+), though I remain its developer.
91
92 In working in this field I have had to address some of the common issues.
93
94 So how do you ensure what you do now can be read in decades to come? There are
95 alternative solutions. (i) stick with a widely used and not overly complicated
96 well documented open standard, and for that the likes of odf is an excellent
97 choice (ii) alternatively go for the most basic representation of a document
98 that meets your needs, in my case based on UTF-8 text and some markup tags,
99 fairly easily parsable by the human eye and as long as utf8 is in use it will
100 always be possible to extract the information.
101
102 How do you keep up with new and changing technologies? Here my solution has
103 been to generate new versions of the substantive content so as to always have
104 the latest document representations available e.g. HTML has changed a lot over
105 the years, different specifications come out for various formats including ODF,
106 electronic readers have become an important viewing alternative, introducing
107 the open reader format EPUB. Output representations are generated from source
108 documents. Different open document file formats can be produced and databases
109 and search engines populated. (The source documents and interpreter are all
110 that are required to re-create site content. Source documents can be made
111 public or retained privately). The strict separation of a simple source
112 document from the output produced, means that with updates to SiSU (the
113 interpreter/processor/generator), outputs can be updated technically as
114 necessary, and new output formats added when needed. Amongst the output formats
115 currently supported are HTML, LaTeX generated Pdfs (A4, letter, other;
116 landscape, portrait), Epub, Open Document Format text. Returning to HTML as an
117 example, it has changed a lot over the years I have worked with it, this way of
118 working has meant it is possible to keep producing current versions of HTML,
119 retaining the original substantive document... and new formats have been added
120 as thought desired. There is no attempt to make output in different document
121 formats/ representations look alike let alone identical. Rather the attempt is
122 to optimize output for the particular document filetype, (there is no reason
123 why an epub document would look or behave like an open document text or that a
124 Pdf would look like HTML output; rather PDF is optimized for paper viewing,
125 HTML for screen etc.) Wherever possible features associated with the
126 particular output type are taken advantage of. This freedom is made possible to
127 a large extent by the answer to the question that follows.
128
129 How do you reliably cite (locate) material in different document
130 representations? The traditional answer has been to have a canonical
131 publication, and resulting fixed page numbers. This was not a viable solution
132 for HTML (which changes from one viewer to another and with selectable font
133 faces & size etc.); nor is it otherwise ideal in an electronic age with the
134 possibility of presenting/interacting with material/documents in so many
135 different ways. Why be so restricted? Here my solution has been "object
136 citation numbering". What the various generated document formats have in
137 common is a shared object numbering system that identifies the location of text
138 and that is available for citation purposes. Object numbers are: sequential
139 numbers assigned to each identified object in a document. Objects are logical
140 units of text (or equivalent parts of a document), usually paragraphs, but also
141 document headings, tables, images, in a poem a verse etc. [In an electronic
142 publishing age are page numbers the best we can come up with? Change font
143 type, font size, page orientation, paper size (sometimes even the viewer) and
144 where are you with them? And paper though a favorite medium of mine is no
145 longer the sole (or sometimes primary) means of interacting with documents/text
146 or of sharing knowledge]
147
148 What object numbers mean (unlike page numbers) is e.g.
149
150 * if you cite text in any format, the resulting output can be reliably located
151 in any other document format type. Cite HTML and the reader can choose to
152 view in Epub or Pdf (the PDFs being an independent output, generated by
153 book publishing software XeTeX/LaTeX).
154
155 * if you do a search, you can be given a result "index" indicating that your
156 search criteria is met by these documents, and at these specific locations
157 within each document, and the "index" is relevant not only for content
158 within the database, but for all document formats.
159
160 * if you have a translated text prepared for sisu, then your citations are
161 relevant across languages e.g. you can specify exactly where in a Chinese
162 document text is to be found.
163
164 * generated document index references & concordance list references etc. are
165 relevant across all output formats.
166
167 What of search? For search, see the implications of object numbers for search
168 mentioned above. The system currently loads an SQL server (Postgresql) with
169 object sized text chunks. It could just as well populate an analytical engine
170 with larger sections or chapters of text for analytical purposes (such as the
171 currently popular Elasticsearch), whilst availing itself also of the concept of
172 objects and object numbers in search results.
173
174 How do you deal with multilingual texts? If you have translated text prepared
175 for sisu, then your citations are relevant across languages. Object numbers
176 also provide an easy way to compare, discuss text (translations) across
177 languages. Text found/cited in one language has the same object number in its
178 translations, a given paragraph will be the same in another language, just
179 change the language code. (documents are prepared in UTF-8, current language
180 restrictions are: through use of LaTeX tools, Polyglossia & CJK (Chinese,
181 Japanese & Korean), and from the fact that sisu parses left to right)
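
The cross-language lookup described here can be sketched in Ruby. This is an illustration only, not SiSU's own code: the sample texts, the `DOCS` table and the `parallel_text` helper are all hypothetical, and the sketch simply assumes that each translation has been prepared so that object numbers line up.

```ruby
# Hypothetical data: one document in two languages, keyed by language code and
# then by object number. Real SiSU output is files or database rows, not a Hash.
DOCS = {
  'en' => { 1 => 'A title', 2 => 'The first paragraph.' },
  'fr' => { 1 => 'Un titre', 2 => 'Le premier paragraphe.' }
}.freeze

# Return the text of object number `ocn` in each of the requested languages.
def parallel_text(docs, ocn, langs)
  langs.map { |lang| [lang, docs.fetch(lang).fetch(ocn)] }.to_h
end

parallel_text(DOCS, 2, %w[en fr]).each { |lang, text| puts "#{lang}: #{text}" }
```

Only the language code changes between the two lookups; the object number stays the same, which is the property the paragraph above relies on.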
182
183 How are materials prepared for contribution to the collection? (a) The easiest
184 solution if the system allows is for submission in the format in which work is
185 authored, usually a word processor, for which odf may be a decent selection.
186 (b) I have stuck with enhanced plaintext, UTF-8 with minimal markup. Source
187 documents are prepared in UTF-8 text, with a minimalist native markup to
188 indicate the document structure (headings and their relative levels),
189 footnotes, and other document "features". This markup is easily parsable to the
190 human eye, and plays well with version control systems. Documents are prepared
191 in a text editor. Front ends such as markup assistants in a word processor that
192 can save to sisu text format or other tool whilst possible do not exist. [(c)
193 yet another form of submission for collaborative work are wikis which have
194 shown their strength in efforts such as Wikipedia.]
195
196 The system has proven to be a good testing ground for ideas and is flexible and
197 extensible. (things that could usefully be done: apart from a front end for
198 simpler user interaction; feed text to an analytical search engine, like
199 Elasticsearch/Lucene; it still needs a bibliography parser (auto-generation of
200 a bibliography from footnotes); and it might be useful to allow rough auto
201 translation of documents on the fly by passing text through a translator (such as
202 Google translate)).
203
204 In any event, my resulting technical opinions (in my modest domain of
205 action) may be regarded as encapsulated within SiSU
206 [http://www.sisudoc.org/]
207
208 http://www.sisudoc.org/
209 http://www.jus.uio.no/sisu/
210
211 git clone git://git.sisudoc.org/git/code/sisu.git
212 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
213 (there are additional commits in the upstream branch)
214 git clone git://git.sisudoc.org/git/doc/sisu-markup-samples.git
215 Development work is on Linux and the easiest way to install it is through the
216 Debian Linux package as this takes care of optional external dependencies such
217 as XeTeX for PDF output and Postgresql or Sqlite for search.
218
219 *** multiple document formats
220
221 Text can be represented in multiple output formats with different
222 characteristics that are (or may be) regarded as strengths/advantages and
223 therefore preferred in different contexts.
224
225 Given the different strengths and characteristics of various output formats, it
226 makes little sense to try too hard to make different representations of a
227 document look the same. More interesting is to have document representations that
228 take advantage of each given output's strengths. As valuable if not more so is
229 the ability to cite, find, discuss text with ease, across the different output
230 formats.
231
232 For citation across output formats, SiSU uses object citation numbers.
233
234 *** document structure and document objects
235
236 SiSU breaks marked up text into document structure and objects.
237
238 Document structure being the document heading hierarchy (having separated out
239 the document header).
240
241 **** What are document objects?
242 An object is an identified meaningful unit of a document, most commonly a
243 paragraph of text, but also for example a table, code block, verse or image.
244
245 SiSU tracks these substantive document units as document objects (and their
246 relationship to the document structure).
247
248 *** object citation numbers
249
250 **** What are object citation numbers?
251
252 An object citation number is a sequential number assigned to a document object.
253
254 In sisu, output documents share this common object numbering system (dubbed
255 "object citation numbering" (ocn)) that is meaningful (machine & human readable)
256 across various digital outputs whether paper, screen, or database oriented,
257 (PDF, html, XML, EPUB, sqlite, postgresql), and across multilingual content if
258 prepared appropriately. This numbering system can be used to reference content
259 across output types.
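
As a rough illustration of the idea (not SiSU's actual implementation), the following Ruby sketch splits plain text into objects and numbers them sequentially. It assumes that a blank line ends an object, which is a simplification; `DocObject` and `number_objects` are hypothetical names.

```ruby
# Illustrative sketch only: a document object is a piece of text plus its
# sequential object citation number (ocn).
DocObject = Struct.new(:ocn, :text)

# Assumption: objects are separated by one or more blank lines.
def number_objects(source)
  chunks = source.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
  chunks.each_with_index.map { |text, i| DocObject.new(i + 1, text) }
end

sample = "A heading\n\nFirst paragraph.\n\nSecond paragraph,\ncontinued."
number_objects(sample).each { |o| puts "#{o.ocn}: #{o.text.tr("\n", ' ')}" }
```

Because the numbers are derived from the source text rather than from any page layout, every output generated from the same source can carry the same numbers.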
260
261 **** Why might I want object citation numbering?
262
263 The ability to cite and quickly locate text can be invaluable if not essential.
264 (whether for instruction or discussion).
265
266 In this digital & Internet age we have multiple ways to represent documents and
267 multiple document output formats as options with different characteristics,
268 strengths/advantages etc. We need a way to cite text that works and is relevant
269 independent of the document format used.
270
271 I want to discuss (cite) html text, how do I do this?
272 how do I refer to / cite / discuss text in html?
273 Issue: html may be viewed online or printed, it is not tied to paper (as
274 e.g. pdf) and prints differently depending on selected font face and font size.
275
276 I want to discuss (cite) text that is available in multiple formats (e.g. pdf,
277 epub, html) without having to worry about the output format that is referred
278 to.
279 How do I refer to / discuss text that is available in more than one format,
280 uncertain of what format is preferred, used or available to my colleagues?
281 e.g. html and epub or pdf have rather different text representations, how do I
282 discuss ...
283
284 I would like to have a book index that is relevant (can be used) across multiple
285 output formats (e.g. pdf, epub, html)
286
287 How do I make a book index (or a concordance file) that works across multiple
288 output formats?
289
290 I would like to have search results indicating where in a document matches are
291 found and I would like it to be relevant across available output formats (e.g.
292 pdf, epub, html)
293 How do I get search results for locations of text within each relevant document?
294
295 I would like to be able to discuss a text that has been translated ...
296 how do I find text across languages?
297 Where I have a nicely translated document, how do I point to or discuss with my
298 foreign language counterpart some detail of the text, or, how do I point my
299 foreign language counterpart to the text I would like to bring to his
300 attention.
301
302 *** "Granular" Search
303
304 Of interest is the ease of streaming documents to a relational database, at an
305 object (roughly paragraph) level and the potential for increased precision in
306 the presentation of matches that results thereby. The ability to serialize
307 html, LaTeX, XML, SQL, (whatever) is also inherent in / incidental to the
308 design.
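
A minimal sketch of what object-level search results might look like, written in Ruby with an in-memory array standing in for the relational database. The row layout (document name, object number, text), the sample rows and the `search` helper are assumptions made for illustration; they are not SiSU's schema or API.

```ruby
# In-memory stand-in for a table holding one row per document object.
Row = Struct.new(:doc, :ocn, :text)

ROWS = [
  Row.new('doc_a', 1, 'An introduction'),
  Row.new('doc_a', 2, 'Copyright law and the commons'),
  Row.new('doc_a', 3, 'Limits of copyright'),
  Row.new('doc_b', 1, 'Notes on copyright terms'),
  Row.new('doc_b', 2, 'An unrelated paragraph')
].freeze

# Case-insensitive substring match, reported as document => [object numbers].
def search(rows, term)
  needle = term.downcase
  hits = rows.select { |r| r.text.downcase.include?(needle) }
  hits.group_by(&:doc).transform_values { |rs| rs.map(&:ocn) }
end

search(ROWS, 'Copyright').each { |doc, ocns| puts "#{doc}: #{ocns.join(', ')}" }
```

The result names both the matching documents and the matching objects within each, so the same "index" of hits can be followed into any of the generated output formats.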
309
310 *** Summary
311 SiSU information Structuring Universe
312 Structured information, Serialized Units <www.sisudoc.org> or
313 <www.jus.uio.no/sisu/> software for electronic texts, document collections,
314 books, digital libraries, and search, with "atomic search" and text positioning
315 system (shared text citation numbering: "ocn")
316 outputs include: plaintext, html, XHTML, XML, ODF (OpenDocument), EPUB, LaTeX,
317 PDF, SQL (PostgreSQL and SQLite)
318
319 *** SiSU Short Description
320
321 SiSU is a comprehensive future-resilient electronic document management system.
322 Built-in search capabilities allow you to search across multiple documents and
323 highlight matches in an easy-to-follow format. Paragraph numbering system
324 allows you to cite your electronic documents in a consistent manner across
325 multiple file formats. Multiple format outputs allow you to display your
326 documents in plain text, PDF (portrait and landscape), OpenDocument format,
327 HTML, or e-book reading format (EPUB). Word mapping allows you to easily create
328 word indexes for your documents. Future-resilient flexibility allows you to
329 quickly adapt your documents to newer output formats as needed. All these and
330 many other features are achieved with little or no additional work on your
331 documents - by marking up the documents with a super simplistic markup
332 language, leaving the SiSU engine to handle the heavy-lifting processing.
333
334 Potential users of SiSU include individual authors who want to publish their
335 books or articles electronically to reach a broad audience, web publishers who
336 want to provide multiple channels of access to their electronic documents, or
337 any organizations which centrally manage a medium or large set of electronic
338 documents, especially governmental organizations which may prefer to keep their
339 documents in easily accessible yet non-proprietary formats.
340
341 SiSU is an Open Source project initiated and led by Ralph Amissah
342 <ralph.amissah@gmail.com>, who can be contacted via the mailing list
343 <http://lists.sisudoc.org/listinfo/sisu> at <sisu@lists.sisudoc.org>. SiSU is
344 licensed under the GNU General Public License.
345
346 **** notes
347
348 For less markup than the most elementary HTML you can have more. SiSU -
349 Structured information, Serialized Units for electronic documents, is an
350 information structuring, transforming, publishing and search framework with the
351 following features:
352
353 (i) markup syntax: (a) simpler than html, (b) mnemonic, influenced by
354 mail/messaging/wiki markup practices, (c) human readable, and easily writable,
355
356 (ii) (a) minimal markup requirement, (b) single file marked up for multiple outputs,
357
358 * documents are prepared in a single UTF-8 file using a minimalistic mnemonic
359 syntax. Typical literature, documents like "War and Peace" require almost no
360 markup, and most of the headers are optional.
361
362 * markup is easily readable/parsed by the human eye, (basic markup is simpler
363 and more sparse than the most basic html), [this may also be converted to XML
364 representations of the same input/source document].
365
366 * markup defines document structure (this may be done once in a header
367 pattern-match description, or for heading levels individually); basic text
368 attributes (bold, italics, underscore, strike-through etc.) as required; and
369 semantic information related to the document (header information, extended
370 beyond the Dublin core and easily further extended as required); the headers
371 may also contain processing instructions.
372
373 (iii) (a) multiple output formats, including amongst others: plaintext (UTF-8);
374 html; (structured) XML; ODF (Open Document text); EPUB; LaTeX; PDF (via LaTeX);
375 SQL type databases (currently PostgreSQL and SQLite). SiSU produces:
376 concordance files; document content certificates (md5 or sha256 digests of
377 headings, paragraphs, images etc.) and html manifests (and sitemaps of
378 content). (b) takes advantage of the strengths implicit in these very different
379 output types, (e.g. PDFs produced using typesetting of LaTeX, databases
380 populated with documents at an individual object/paragraph level, making
381 possible granular search (and related possibilities))
382
383 (iv) outputs share a common numbering system (dubbed "object citation
384 numbering" (ocn)) that is meaningful (to man and machine) across various
385 digital outputs whether paper, screen, or database oriented, (PDF, html, XML,
386 EPUB, sqlite, postgresql), this numbering system can be used to reference
387 content.
388
389 (v) SQL databases are populated at an object level (roughly headings,
390 paragraphs, verse, tables) and become searchable with that degree of
391 granularity, the output information provides the object/paragraph numbers which
392 are relevant across all generated outputs; it is also possible to look at just
393 the matching paragraphs of the documents in the database; [output indexing also
394 works well with search indexing tools like hyperestraier].
395
396 (vi) use of semantic meta-tags in headers permit the addition of semantic
397 information on documents, (the available fields are easily extended)
398
399 (vii) creates organised directory/file structure for (file-system) output,
400 easily mapped with its clearly defined structure, with all text objects
401 numbered, you know in advance where in each document output type, a bit of text
402 will be found (e.g. from an SQL search, you know where to go to find the
403 prepared html output or PDF etc.)... there is more; easy directory management
404 and document associations, the document preparation (sub-)directory may be used
405 to determine output (sub-)directory, the skin used, and the SQL database used,
406
407 (viii) "Concordance file" wordmap, consisting of all the words in a document
408 and their (text/ object) locations within the text, (and the possibility of
409 adding vocabularies),
410
411 (ix) document content certification and comparison considerations: (a) the
412 document and each object within it stamped with an sha256 hash making it
413 possible to easily check or guarantee that the substantive content of a document
414 is unchanged, (b) version control, documents integrated with time based source
415 control system, default RCS or CVS with use of $Id$ tag, which SiSU checks
416
417 (x) SiSU's minimalist markup makes for meaningful "diffing" of the substantive
418 content of markup-files,
419
420 (xi) easily skinnable, document appearance on a project/site wide, directory
421 wide, or document instance level easily controlled/changed,
422
423 (xii) in many cases a regular expression may be used (once in the document
424 header) to define all or part of a document's structure obviating or reducing
425 the need to provide structural markup within the document,
426
427 (xiii) prepared files may be batch processed, documents produced are static files
428 so this needs to be done only once but may be repeated for various reasons as
429 desired (updated content, addition of new output formats, updated technology
430 document presentations/representations)
431
432 (xiv) possible to pre-process, which permits: the easy creation of standard
433 form documents, and templates/term-sheets, or; building of composite documents
434 (master documents) from other sisu marked up documents, or marked up parts,
435 i.e. import documents or parts of text into a main document should this be
436 desired
437
441 (xv) there is a considerable degree of future-resilience, output representations
442 are "upgradeable", and new document formats may be added: (a) modular, (thanks
443 in no small part to Ruby) another output format required, write another
444 module.... (b) easy to update output formats (e.g. html, XHTML, LaTeX/PDF
445 produced can be updated in program and run against whole document set), (c)
446 easy to add, modify, or have alternative syntax rules for input, should you
447 need to,
448
449 (xvi) scalability, dependent on your file-system (ext3, Reiserfs, XFS,
450 whatever) and on the relational database used (currently Postgresql and
451 SQLite), and your hardware,
452
453 (xvii) only marked up files need be backed up, to secure the larger document
454 set produced,
455
456 (xviii) document management,
457
458 (xix) Syntax highlighting for SiSU markup is available for a number of text
459 editors.
460
461 (xx) remote operations: (a) run SiSU on a remote server, (having prepared sisu
462 markup documents locally or on that server, i.e. this solution where sisu is
463 installed on the remote server, would work whatever type of machine you chose
464 to prepare your markup documents on), (b) generated document outputs may be
465 posted by sisu to remote sites (using rsync/scp) (c) document source (plaintext
466 utf-8) if shared on the net may be identified by its url and processed locally
467 to produce the different document outputs.
468
469 (xxi) document source may be bundled together (automatically) with associated
470 documents (multiple language versions or master document with inclusions) and
471 images and sent as a zip file called a sisupod, if shared on the net these too
472 may be processed locally to produce the desired document outputs, these may be
473 downloaded, shared as email attachments, or processed by running sisu against
474 them, either using a url or the filename.
475
476 (xxii) for basic document generation, the only software dependency is Ruby, and
477 a few standard Unix tools (this covers plaintext, html, XML, ODF, EPUB, LaTeX).
478 To use a database you of course need that, and to convert the LaTeX generated
479 to PDF, a LaTeX processor like tetex or texlive.
480
481 As a developer's tool it is flexible and extensible.
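
The "concordance file" wordmap of item (viii) can be illustrated with a short Ruby sketch. It is not SiSU's code: the `concordance` helper is hypothetical, objects are passed in as a plain Hash of object number to text, and words are taken to be runs of alphabetic characters.

```ruby
# Map each word to the list of object numbers in which it occurs.
def concordance(objects)
  index = Hash.new { |hash, word| hash[word] = [] }
  objects.each do |ocn, text|
    # Assumption: a "word" is a run of letters, compared case-insensitively.
    text.downcase.scan(/[[:alpha:]]+/).uniq.each { |word| index[word] << ocn }
  end
  index.sort.to_h # alphabetical order, returned as a plain Hash
end

objects = { 1 => 'Structured information', 2 => 'Serialized units of information' }
concordance(objects).each { |word, ocns| puts "#{word}: #{ocns.join(', ')}" }
```

Because the entries point at object numbers rather than pages, the same wordmap applies to every output format generated from the document.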
482
483 *** description
484
485 SiSU ("SiSU information Structuring Universe" or "Structured information,
486 Serialized Units"),1 is a Unix command line oriented framework for document
487 structuring, publishing and search. Featuring minimalistic markup, multiple
488 standard outputs, a common citation system, and granular search. Using markup
489 applied to a document, SiSU can produce plain text, HTML, XHTML, XML,
490 OpenDocument, LaTeX or PDF files, and populate an SQL database with objects2
491 (equating generally to paragraph-sized chunks) so searches may be performed and
492 matches returned with that degree of granularity (e.g. your search criteria is
493 met by these documents and at these locations within each document). Document
494 output formats share a common object numbering system for locating content.
495 This is particularly suitable for "published" works (finalized texts as opposed
496 to works that are frequently changed or updated) for which it provides a fixed
497 means of reference of content.
498
How it works

499 SiSU markup is fairly minimalistic, it consists of: a (largely optional)
500 document header, made up of information about the document (such as when it was
501 published, who authored it, and granting what rights) and any processing
502 instructions; and markup within text which is related to document structure and
503 typeface. SiSU must be able to discern the structure of a document, (text
504 headings and their levels in relation to each other), either from information
505 provided in the instruction header or from markup within the text (or from a
506 combination of both). Processing is done against an abstraction of the document
507 comprising information on the document's structure and its objects,2 which
508 the program serializes (providing the object numbers) and which are assigned
509 hash sum values based on their content. This abstraction of information about
510 document structure, objects, (and hash sums), provides considerable flexibility
511 in representing documents different ways and for different purposes (e.g.
512 search, document layout, publishing, content certification, concordance etc.),
513 and makes it possible to take advantage of some of the strengths of established
514 ways of representing documents, (or indeed to create new ones).
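
To make the role of the per-object hash sums concrete, here is a small Ruby sketch using the standard `digest` library. It only illustrates the principle; `certify` and `changed_objects` are hypothetical names, and hashing the raw object text is an assumption rather than a description of how SiSU computes its digests.

```ruby
require 'digest'

# Pair each object number with a SHA-256 digest of that object's text.
def certify(objects)
  objects.map { |ocn, text| [ocn, Digest::SHA256.hexdigest(text)] }.to_h
end

# Object numbers whose digest differs between two versions of a document.
def changed_objects(old_cert, new_cert)
  (old_cert.keys | new_cert.keys).select { |ocn| old_cert[ocn] != new_cert[ocn] }
end

v1 = { 1 => 'Title', 2 => 'First paragraph.', 3 => 'Second paragraph.' }
v2 = v1.merge({ 2 => 'First paragraph, revised.' })
p changed_objects(certify(v1), certify(v2)) # prints [2]
```

Comparing digests object by object shows not only that the substantive content changed but also where, again expressed in object numbers.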
515
516 1. also chosen for the meaning of the Finnish term "sisu".
517
518 2. objects include: headings, paragraphs, verse, tables, images, but not
519 footnotes/endnotes which are numbered separately and tied to the object from
520 which they are referenced.
521
522 More information on SiSU provided at: <www.sisudoc.org/sisu/SiSU>
523
524 SiSU was developed in relation to legal documents, and is strong across a wide
525 variety of texts (law, literature...(humanities, law and part of the social
526 sciences)). SiSU handles images but is not suitable for formulae/ statistics,
527 or for technical writing at this time.
528
529 SiSU has been developed and has been in use for several years. Requirements to
530 cover a wide range of documents within its use domain have been explored.
531
532 <ralph@amissah.com>
533 <ralph.amissah@gmail.com>
534 <sisu@lists.sisudoc.org>
535 <http://lists.sisudoc.org/listinfo/sisu>
536 2010
537 w3 since October 3 1993
538 ** Finding
539 *** source
540 http://git.sisudoc.org/gitweb/
541
542 sisu git repo:
543 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
544
545 sisu-markup-samples git repo:
546 http://git.sisudoc.org/gitweb/?p=doc/sisu-markup-samples.git;a=summary
547
548 *** mailing list
549 sisu at lists.sisudoc.org
550 http://lists.sisudoc.org/listinfo/sisu
551
552 ** irc oftc #sisu
553
554 ** home pages
555 <http://www.sisudoc.org/>
556 <http://search.sisudoc.org/>
557 <http://www.jus.uio.no/sisu>
558
559 ** Installing sisu
560
561 *** where you take responsibility for having the correct dependencies
562
563 Provided you have *Ruby*, *SiSU* can be run.
564
565 SiSU should be run from the directory containing your sisu marked up document
566 set.
567
568 This works fine so long as you already have sisu external dependencies in
569 place. For many operations such as html, epub, odt this is likely to be fine.
570 Note however, that additional external package dependencies, such as texlive
571 (for pdfs), sqlite3 or postgresql (for search) should you desire to use them
572 are not taken care of for you.
573
574 **** run off the source tarball without installation
575
576 RUN OFF SOURCE PACKAGE DIRECTORY TREE (WITHOUT INSTALLING)
577 ..........................................................
578
579 ***** 1. Obtain the latest sisu source
580
581 using git:
582
583 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
584 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=log
585
586 git clone git://git.sisudoc.org/git/code/sisu.git
587
588 or, identify latest available source:
589
590 https://packages.debian.org/sid/sisu
591 http://packages.qa.debian.org/s/sisu.html
592 http://qa.debian.org/developer.php?login=sisu@lists.sisudoc.org
593
594 http://sisudoc.org/sisu/archive/pool/main/s/sisu/
595
596 and download the:
597
598 sisu_5.4.5.orig.tar.xz
599
600 using debian tool dget:
601
The dget tool is included in the devscripts package
https://packages.debian.org/search?keywords=devscripts
so to get dget, install devscripts:
605
606 apt-get install devscripts
607
608 and then you can get it from Debian:
609 dget -xu http://ftp.fi.debian.org/debian/pool/main/s/sisu/sisu_5.4.5-1.dsc
610
611 or off sisu repos
612 dget -x http://www.jus.uio.no/sisu/archive/pool/main/s/sisu/sisu_5.4.5-1.dsc
613 or
614 dget -x http://sisudoc.org/sisu/archive/pool/main/s/sisu/sisu_5.4.5-1.dsc
615
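The .dsc locations above differ only in the version number, so the URL can be
built from it. A minimal sketch, assuming the sisudoc.org archive layout shown
above and a Debian revision of 1; sisu_dsc_url is a hypothetical helper name.

```shell
# sisu_dsc_url VERSION [REVISION]: print the .dsc URL on sisudoc.org.
# REVISION defaults to 1, matching the sisu_5.4.5-1.dsc example above.
sisu_dsc_url() {
  version="$1"
  revision="${2:-1}"
  printf 'http://sisudoc.org/sisu/archive/pool/main/s/sisu/sisu_%s-%s.dsc\n' \
    "$version" "$revision"
}
```

The result can be handed to dget, for example: dget -x "$(sisu_dsc_url 5.4.5)"
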
616 ***** 2. Unpack the source
617
618 Provided you have *Ruby*, *SiSU* can be run without installation straight from
619 the source package directory tree.
620
621 Run ruby against the full path to bin/sisu (in the unzipped source package
622 directory tree). SiSU should be run from the directory containing your sisu
623 marked up document set.
624
625 ruby ~/sisu-5.4.5/bin/sisu --html -v document_name.sst
626
627 This works fine so long as you already have sisu external dependencies in
628 place. For many operations such as html, epub, odt this is likely to be fine.
629 Note however, that additional external package dependencies, such as texlive
630 (for pdfs), sqlite3 or postgresql (for search) should you desire to use them
631 are not taken care of for you.
632
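To avoid typing the full path each time, the invocation above can be wrapped
in a small shell function. This is a sketch only: SISU_SRC and
sisu_from_source are hypothetical names, and the default path assumes the
source was unpacked to ~/sisu-5.4.5 as in the example above.

```shell
# Location of the unpacked source tree (assumed; override as needed).
SISU_SRC="${SISU_SRC:-$HOME/sisu-5.4.5}"

# Run bin/sisu from the source tree, passing all arguments through.
sisu_from_source() {
  if ! command -v ruby >/dev/null 2>&1; then
    echo "ruby not found on PATH" >&2
    return 127
  fi
  if [ ! -f "$SISU_SRC/bin/sisu" ]; then
    echo "no bin/sisu under $SISU_SRC" >&2
    return 1
  fi
  ruby "$SISU_SRC/bin/sisu" "$@"
}
```

From the directory holding your documents this would be called as:
sisu_from_source --html -v document_name.sst
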
633 **** gem install (with rake)
634
635 (i) create the gemspec; (ii) build the gem (from the gemspec); (iii) install
636 the gem
637
638 Provided you have ruby & rake, this can be done with the single command:
639
640 rake gem_create_build_install
641
642 to build and install sisu v5 & sisu v6, alias gemcbi
643
644 separate gems are made/installed for sisu v5 & sisu v6 contained in source.
645
646 to build and install sisu v5, alias gem5cbi:
647
648 rake gem_create_build_install_stable
649
650 to build and install sisu v6, alias gem6cbi:
651
652 rake gem_create_build_install_unstable
653
for individual steps (create, build, install) see rake options, rake -T

to specify sisu version for sisu installed via gem
656
657 gem search sisu
658
659 sisu _5.4.5_ --version
660
661 sisu _6.0.11_ --version
662
663 to uninstall sisu installed via gem
664
665 sudo gem uninstall --verbose sisu
666
667 For a list of alternative actions you may type:
668
669 rake help
670
671 rake -T
672
673 Rake: <http://rake.rubyforge.org/> <http://rubyforge.org/frs/?group_id=50>
674
675 **** installation with setup.rb
676
this is a three step process; in the root directory of the unpacked *SiSU*,
type (the install step as root):
679
680 ruby setup.rb config
681 ruby setup.rb setup
682 #[as root:]
683 ruby setup.rb install
684
685 further information:
686 <http://i.loveruby.net/en/projects/setup/>
687 <http://i.loveruby.net/en/projects/setup/doc/usage.html>
688
689 ruby setup.rb config && ruby setup.rb setup && sudo ruby setup.rb install
690
691 *** Debian install
692
*SiSU* is available off the *Debian* archives. It should be necessary only to
run, as root, using apt-get:
695
696 apt-get update
697
apt-get install sisu-complete
699
700 (all sisu dependencies should be taken care of)
701
702 If there are newer versions of *SiSU* upstream, they will be available by
703 adding the following to your sources list /etc/apt/sources.list
704
705 #/etc/apt/sources.list
706
707 deb http://www.jus.uio.no/sisu/archive unstable main non-free
708 deb-src http://www.jus.uio.no/sisu/archive unstable main non-free
709
710 The non-free section is for sisu markup samples provided, which contain
711 authored works the substantive text of which cannot be changed, and which as a
712 result do not meet the debian free software guidelines.
713
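Rather than editing /etc/apt/sources.list by hand, the two lines can be
written to a separate file under /etc/apt/sources.list.d/. A minimal sketch;
sisu_apt_lines and the file name sisu.list are hypothetical choices, and the
archive lines are the ones given above.

```shell
# Print the deb and deb-src lines for the sisu archive (unstable).
sisu_apt_lines() {
  for kind in deb deb-src; do
    printf '%s http://www.jus.uio.no/sisu/archive unstable main non-free\n' "$kind"
  done
}
```

As root this could be used as: sisu_apt_lines > /etc/apt/sources.list.d/sisu.list
followed by apt-get update.
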
714 *SiSU* is developed on *Debian*, and packages are available for *Debian* that
715 take care of the dependencies encountered on installation.
716
717 The package is divided into the following components:
718
*sisu*, the base code (the main package on which the others depend), without
any dependencies other than ruby (and for convenience the ruby webrick web
server); this generates a number of types of output on its own. Other
packages provide additional functionality, and have their own dependencies
723
*sisu-complete*, a dummy package that installs the whole of greater sisu as
described below, apart from sisu-examples
726
*sisu-pdf*, dependencies used by sisu to produce pdf from the /LaTeX/ it generates
728
729 *sisu-postgresql*, dependencies used by sisu to populate postgresql database
730 (further configuration is necessary)
731
732 *sisu-sqlite*, dependencies used by sisu to populate sqlite database
733
734 *sisu-markup-samples*, sisu markup samples and other miscellany (under
735 *Debian* Free Software Guidelines non-free)
736
*SiSU* is available off Debian Unstable and Testing [link:
<http://packages.debian.org/cgi-bin/search_packages.pl?searchon=names&subword=1&version=all&release=all&keywords=sisu>];
install it using apt-get, aptitude or alternative *Debian* install tools.
740
741 ** sisu markup :sisu:markup:
742
743 *** sisu markup
744
745 #% structure - headings, levels
746 * headings (A-D, 1-3)
747 * inline
748 'A~ ' NOTE title level
749 'B~ ' NOTE optional
750 'C~ ' NOTE optional
751 'D~ ' NOTE optional
752 '1~ ' NOTE chapter level
753 '2~ ' NOTE optional
754 '3~ ' NOTE optional
755 '4~ ' NOTE optional :consider:
756 * node
757 * parent
758 * children
759
760 #% font face NOTE open & close marks, inline within paragraph
761 * emphasize '*{ ... }*' NOTE configure whether bold italics or underscore, default bold
762 * bold '!{ ... }!'
763 * italics '/{ ... }/'
764 * underscore '_{ ... }_'
765 * superscript '^{ ... }^'
766 * subscript ',{ ... },'
767 * strike '-{ ... }-'
768 * add '+{ ... }+'
769 * monospace '#{ ... }#'
770 #% para NOTE paragraph controls are at the start of a paragraph
771 * a para is a block of text separated from others by an empty line
772 * indent
773 * default, all '_1 ' up to '_9 '
774 * first line hang '_1_0 '
775 * first line indent further '_0_1 '
776 * bullet
777 [levels 1-6]
778 '_* '
779 '_1* '
780 '_2* '
781 * numbered list
782 [levels 1-3]
783 '# '
784
785 #% blocks NOTE text blocks that are not to be treated in the way that ordinary paragraphs would be
786 * code
787 * [type of markup if any]
788 * poem
789 * group
790 * alt
791 * tables
792 #% boxes
793 NOTE grouped text with code block type color & possibly default image, warning, tip, red, blue etc. decide [NB N/A not implemented]
794
795 #% notes NOTE inline within paragraph at the location where the note reference is to occur
796 * footnotes '~{ ... }~'
797 * [bibliography] [NB N/A not implemented]
798
799 #% links, linking
800 * links - external, web, url
801 * links - internal
802
803 #% images [multimedia?]
804 * images
805 * [base64 inline] [N/A not implemented]
806
807 #% object numbers
808 * ocn (object numbers)
809 automatically attributed to substantive objects, paragraphs, tables, blocks, verse (unless exclude marker provided)
810
811 #% contents
812 * toc (table of contents)
813 autogenerated from structure/headings information
814 * index (book index)
built from hints in text on a new line following a paragraph and starting with ={}; has identifying rules for main and subsidiary text
816
817 #% breaks
818 * line break ' \\ ' inline
819 * page break, column break ' -\\- ' start of line, breaks a column, starts a new column, if using columns, else breaks the page, starts a new page.
820 * page break, page new ' =\\= ' start of line, breaks the page, starts a new page.
821 * horizontal '-..-' start of line, rule page (break) line across page (dividing paragraphs)
822
823 #% book type index
824
825 #% comment
826 * comment
827
828 #% misc
829 * term & definition
830
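Several of the markers listed above can be seen together in a small sample.
The sketch below prints one from a shell function (sisu_sample is a
hypothetical name) and uses only markers from this list, written as they
appear here. It omits document header metadata, which this section does not
describe, so a real sisu run may well need more than this, and current sisu
versions may expect slightly different heading markers; the
sisu-markup-samples are the place to check.

```shell
# Print a minimal sample using markers from the list above.
# The quoted 'EOF' keeps the shell from touching the markup.
sisu_sample() {
  cat <<'EOF'
A~ Sample Document Title

1~ First Chapter

A paragraph with *{emphasized}*, !{bold}! and /{italic}/ text, and a footnote.~{ The note text sits inline, where the reference occurs. }~

_1 A paragraph indented one level.

_* a bullet

_1* a bullet using an explicit level

# a numbered list item
EOF
}
```

To save the sample for experimenting: sisu_sample > sample.sst
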
*** syntax highlighting
832
833 **** vim
834 data/sisu/conf/editor-syntax-etc/vim/
835 data/sisu/conf/editor-syntax-etc/vim/syntax/sisu.vim
836
837 **** emacs
838 data/sisu/conf/editor-syntax-etc/emacs/
839 data/sisu/conf/editor-syntax-etc/emacs/sisu-mode.el
840 ** todo
841 sisu_todo.org