1 #+OPTIONS: ^:nil _:nil
2 #+PRIORITIES: A F E
3 (emacs:evil mode gifts a "vim" of enticing "alternative" powers! ;)
4 (vim, my _editor_ of choice also in the emacs environment :)
5
6 * What is SiSU?
7
8 SiSU provides multiple output formats, with a nod to the strengths of each, and
9 the ability to cite text easily across output formats.
10
11 ** debian/control desc
12
13 documents - structuring, publishing in multiple formats and search
14 SiSU is a lightweight markup based, command line oriented, document
15 structuring, publishing and search, static content tool for document
16 collections.
17 .
18 With minimal preparation of a plain-text (UTF-8) file, using sisu markup syntax
19 in your text editor of choice, SiSU can generate various document formats, most
20 of which share a common object numbering system for locating content, including
21 plain text, HTML, XHTML, XML, EPUB, OpenDocument text (ODF:ODT), LaTeX, PDF
22 files, and populate an SQL database with objects (roughly paragraph-sized
23 chunks) so searches may be performed and matches returned with that degree of
24 granularity. Think of being able to finely match text in documents, using
25 common object numbers, across different output formats and across languages if
26 you have translations of the same document. For search, your criteria are met
27 by these documents at these locations within each document (equally relevant
28 across different output formats and languages). To be clear (if obvious) page
29 numbers provide none of this functionality. Object numbering is particularly
30 suitable for "published" works (finalized texts as opposed to works that are
31 frequently changed or updated) for which it provides a fixed means of reference
32 of content. Document outputs can also share provided semantic meta-data.
33 .
34 SiSU also provides concordance files, document content certificates and
35 manifests of generated output and the means to make book indexes that make use
36 of its object numbering.
37 .
38 Syntax highlighting and folding (outlining) files are provided for the Vim and
39 Emacs editors.
40 .
41 Dependencies for various features are taken care of in sisu related packages.
42 The package sisu-complete installs the whole of SiSU.
43 .
44 Additional document markup samples are provided in the package
45 sisu-markup-samples which is found in the non-free archive. The license for
46 the substantive content of each marked up document is that provided
47 by the author or original publisher.
48 .
49 SiSU uses utf-8 & parses left to right. Currently supported languages:
50 am bg bn br ca cs cy da de el en eo es et eu fi fr ga gl he hi hr hy ia is it
51 ja ko la lo lt lv ml mr nl nn no oc pl pt pt_BR ro ru sa se sk sl sq sr sv ta
52 te th tk tr uk ur us vi zh (see XeTeX polyglossia & cjk)
53 .
54 SiSU works well under po4a translation management, for which an administrative
55 sample Rakefile is provided with sisu_manual under markup-samples.
56
57 ** take two
58
59 SiSU may be regarded as an open access document publishing platform, applicable
60 to a modest but substantial domain of documents (typically law and literature,
61 but also some forms of technical writing), that is tasked to address certain
62 challenges I identified as being of interest to me over the years in open
63 publishing.
64
65 The idea and implementation may be of interest, as some of the issues
66 encountered, and that it seeks to address, are known and common to such
67 endeavors. Amongst them:
68
69 * how do you ensure what you do now can be read in decades?
70 * how do you keep up with new and changing technologies?
71 * do you select a canonical format to represent your documents, if so
72 what?
73 * how do you reliably cite (locate) material in different document
74 representations?
75 * how do you deal with multilingual texts?
76 * what of search?
77 * how are documents contributed to the collection?
78
79 (these questions are selected to help describe the direction of efforts with
80 regard to sisu).
81
82 My Dabblings in the Domain of Open Publishing
83 ---------------------------------------------
84
85 The system is called SiSU; it is an offshoot of my early efforts at finding out
86 what to make of the web, which started at the University of Tromsø in 1993 (an
87 early law website Ananse/ International Trade Law Project / Lex Mercatoria). I
88 have worked on SiSU continually since 1997 and it has been open source since 2005
89 (under a license called GPL3+), though I remain its developer.
90
91 In working in this field I have had to address some of the common issues.
92
93 So how do you ensure what you do now can be read in decades to come? There are
94 alternative solutions: (i) stick with a widely used and not overly complicated,
95 well documented open standard, and for that the likes of odf is an excellent
96 choice; (ii) alternatively go for the most basic representation of a document
97 that meets your needs, in my case based on UTF-8 text and some markup tags,
98 fairly easily parsable by the human eye; as long as UTF-8 is in use it will
99 always be possible to extract the information.
100
101 How do you keep up with new and changing technologies? Here my solution has
102 been to generate new versions of the substantive content so as to always have
103 the latest document representations available e.g. HTML has changed a lot over
104 the years, different specifications come out for various formats including ODF,
105 electronic readers have become an important viewing alternative, introducing
106 the open reader format EPUB. Output representations are generated from source
107 documents. Different open document file formats can be produced and databases
108 and search engines populated. (The source documents and interpreter are all
109 that are required to re-create site content. Source documents can be made
110 public or retained privately). The strict separation of a simple source
111 document from the output produced, means that with updates to SiSU (the
112 interpreter/processor/generator), outputs can be updated technically as
113 necessary, and new output formats added when needed. Amongst the output formats
114 currently supported are HTML, LaTeX generated Pdfs (A4, letter, other;
115 landscape, portrait), Epub, Open Document Format text. Returning to HTML as an
116 example, it has changed a lot over the years I have worked with it, this way of
117 working has meant it is possible to keep producing current versions of HTML,
118 retaining the original substantive document... and new formats have been added
119 as desired. There is no attempt to make output in different document
120 formats/ representations look alike let alone identical. Rather the attempt is
121 to optimize output for the particular document filetype, (there is no reason
122 why an epub document would look or behave like an open document text or that a
123 Pdf would look like HTML output; rather PDF is optimized for paper viewing,
124 HTML for screen etc.) Wherever possible features associated with the
125 particular output type are taken advantage of. This freedom is made possible to
126 a large extent by the answer to the question that follows.
127
128 How do you reliably cite (locate) material in different document
129 representations? The traditional answer has been to have a canonical
130 publication, and resulting fixed page numbers. This was not a viable solution
131 for HTML (which changes from one viewer to another and with selectable font
132 faces & size etc.); nor is it otherwise ideal in an electronic age with the
133 possibility of presenting/interacting with material/documents in so many
134 different ways. Why be so restricted? Here my solution has been "object
135 citation numbering". What the various generated document formats have in
136 common is a shared object numbering system that identifies the location of text
137 and that is available for citation purposes. Object numbers are: sequential
138 numbers assigned to each identified object in a document. Objects are logical
139 units of text (or equivalent parts of a document), usually paragraphs, but also
140 document headings, tables, images, in a poem a verse etc. [In an electronic
141 publishing age are page numbers the best we can come up with? Change font
142 type, font size, page orientation, paper size (sometimes even the viewer) and
143 where are you with them? And paper though a favorite medium of mine is no
144 longer the sole (or sometimes primary) means of interacting with documents/text
145 or of sharing knowledge]
146
147 What object numbers mean (unlike page numbers) is e.g.
148
149 * if you cite text in any format, the resulting output can be reliably located
150 in any other document format type. Cite HTML and the reader can choose to
151 view in Epub or Pdf (the PDFs being an independent output, generated by
152 book publishing software XeTeX/LaTeX).
153
154 * if you do a search, you can be given a result "index" indicating that your
155 search criteria are met by these documents, and at these specific locations
156 within each document, and the "index" is relevant not only for content
157 within the database, but for all document formats.
158
159 * if you have a translated text prepared for sisu, then your citations are
160 relevant across languages e.g. you can specify exactly where in a Chinese
161 document text is to be found.
162
163 * generated document index references & concordance list references etc. are
164 relevant across all output formats.
165
166 What of search? For search, see the implications of object numbers for search
167 mentioned above. The system currently loads an SQL server (Postgresql) with
168 object sized text chunks. It could just as well populate an analytical engine
169 with larger sections or chapters of text for analytical purposes (such as the
170 currently popular Elasticsearch), whilst availing itself also of the concept of
171 objects and object numbers in search results.
172
173 How do you deal with multilingual texts? If you have translated text prepared
174 for sisu, then your citations are relevant across languages. Object numbers
175 also provide an easy way to compare, discuss text (translations) across
176 languages. Text found/cited in one language has the same object number in its
177 translations; a given paragraph will be the same in another language, just
178 change the language code. (Documents are prepared in UTF-8; current language
179 restrictions arise from the use of LaTeX tools, Polyglossia & CJK (Chinese,
180 Japanese & Korean), and from the fact that sisu parses left to right.)
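
The cross-language point can be illustrated with a small sketch. It is written
in Python for brevity (SiSU itself is written in Ruby) and is not SiSU's
implementation. It assumes each translation has been prepared so that its
objects line up one to one; the function name `cite` and the sample texts are
hypothetical.

```python
def cite(translations, ocn):
    """Return the object with the given object number from every language version.

    `translations` maps a language code to that version's list of objects,
    in document order. Object numbers start at 1.
    """
    return {lang: objects[ocn - 1] for lang, objects in translations.items()}


# Hypothetical two-language document with aligned objects.
translations = {
    "en": ["Greetings", "Good morning."],
    "fr": ["Salutations", "Bonjour."],
}

# Object 2 identifies the same passage in each language.
for lang, text in cite(translations, 2).items():
    print(lang, text)
```

Only the language code changes between versions; the object number stays the
same, which is what makes the citation portable.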
181
182 How are materials prepared for contribution to the collection? (a) The easiest
183 solution if the system allows is for submission in the format in which work is
184 authored, usually a word processor, for which odf may be a decent selection.
185 (b) I have stuck with enhanced plaintext, UTF-8 with minimal markup. Source
186 documents are prepared in UTF-8 text, with a minimalist native markup to
187 indicate the document structure (headings and their relative levels),
188 footnotes, and other document "features". This markup is easily parsable by the
189 human eye, and plays well with version control systems. Documents are prepared
190 in a text editor. Front ends, such as markup assistants in a word processor that
191 can save to sisu text format, or other tools, whilst possible, do not exist. [(c)
192 Yet another form of submission for collaborative work is the wiki, which has
193 shown its strength in efforts such as Wikipedia.]
194
195 The system has proven to be a good testing ground for ideas and is flexible and
196 extensible. (things that could usefully be done: apart from a front end for
197 simpler user interaction; feed text to an analytical search engine, like
198 Elasticsearch/Lucene; it still needs a bibliography parser (auto-generation of
199 a bibliography from footnotes); and it might be useful to allow rough auto
200 translation of documents on the fly by passing text through a translator (such as
201 Google translate)).
202
203 In any event, my resulting technical opinions (in my modest domain of
204 action) may be regarded as encapsulated within SiSU
205 [http://www.sisudoc.org/]
206
207 http://www.sisudoc.org/
208 http://www.jus.uio.no/sisu/
209
210 git clone git://git.sisudoc.org/git/code/sisu.git --branch upstream
211 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
212 (there may be additional commits in the upstream branch)
213 git clone --depth 1 git://git.sisudoc.org/git/code/sisu.git --branch upstream
214
215 git clone git://git.sisudoc.org/git/doc/sisu-markup-samples.git --branch upstream
216 git clone --depth 1 git://git.sisudoc.org/git/doc/sisu-markup-samples.git --branch upstream
217 Development work is on Linux and the easiest way to install it is through the
218 Debian Linux package as this takes care of optional external dependencies such
219 as XeTeX for PDF output and Postgresql or Sqlite for search.
220
221 ** multiple document formats
222
223 Text can be represented in multiple output formats with different
224 characteristics that are (or may be) regarded as strengths/advantages and
225 therefore preferred in different contexts.
226
227 Given the different strengths and characteristics of various output formats, it
228 makes little sense to try too hard to make different representations of a
229 document look the same. More interesting is to have document representations that
230 take advantage of each given output's strengths. As valuable if not more so is
231 the ability to cite, find, discuss text with ease, across the different output
232 formats.
233
234 For citation across output formats, SiSU uses object citation numbers.
235
236 ** document structure and document objects
237
238 SiSU breaks marked up text into document structure and objects.
239
240 Document structure is the document heading hierarchy (having separated out
241 the document header).
242
243 *** What are document objects?
244 An object is an identified meaningful unit of a document, most commonly a
245 paragraph of text, but also for example a table, code block, verse or image.
246
247 SiSU tracks these substantive document units as document objects (and their
248 relationship to the document structure).
249
250 ** object citation numbers
251
252 *** What are object citation numbers?
253
254 An object citation number is a sequential number assigned to a document object.
255
256 In sisu, output documents share this common object numbering system (dubbed
257 "object citation numbering" (ocn)) that is meaningful (machine & human readable)
258 across various digital outputs whether paper, screen, or database oriented,
259 (PDF, html, XML, EPUB, sqlite, postgresql), and across multilingual content if
260 prepared appropriately. This numbering system can be used to reference content
261 across output types.
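
As a rough illustration of sequential numbering (a sketch in Python, not SiSU's
parser): the example below assumes that objects are simply blocks of text
separated by blank lines, which ignores real sisu markup. The function name
`number_objects` is hypothetical.

```python
import re


def number_objects(text):
    """Split text into blank-line separated blocks and number them from 1."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    return list(enumerate(blocks, start=1))


sample = """A heading

First paragraph of text.

Second paragraph of text."""

# Each object (here a heading and two paragraphs) gets the next number.
for ocn, block in number_objects(sample):
    print(ocn, block)
```

Because the number depends only on the order of objects in the source, any
output generated from that source can carry the same numbers.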
262
263 *** Why might I want object citation numbering?
264
265 The ability to cite and quickly locate text can be invaluable if not essential
266 (whether for instruction or discussion).
267
268 In this digital & Internet age we have multiple ways to represent documents and
269 multiple document output formats as options with different characteristics,
270 strengths/advantages etc. We need a way to cite text that works and is relevant
271 independent of the document format used.
272
273 I want to discuss (cite) html text; how do I do this?
274 How do I refer to / cite / discuss text in html?
275 Issue: html may be viewed online or printed, it is not tied to paper (as
276 e.g. pdf) and prints differently depending on selected font face and font size.
277
278 I want to discuss (cite) text that is available in multiple formats (e.g. pdf,
279 epub, html) without having to worry about the output format that is referred
280 to.
281 How do I refer to / discuss text that is available in more than one format,
282 uncertain of what format is preferred, used or available to my colleagues?
283 e.g. html and epub or pdf have rather different text representations, how do I
284 discuss ...
285
286 I would like to have a book index that is relevant (can be used) across multiple
287 output formats (e.g. pdf, epub, html)
288
289 How do I make a book index (or a concordance file) that works across multiple
290 output formats?
291
292 I would like to have search results indicating where in a document matches are
293 found and I would like it to be relevant across available output formats (e.g.
294 pdf, epub, html)
295 How do I get search results for locations of text within each relevant document?
296
297 I would like to be able to discuss a text that has been translated ...
298 How do I find text across languages?
299 Where I have a nicely translated document, how do I point to or discuss with my
300 foreign language counterpart some detail of the text, or, how do I point my
301 foreign language counterpart to the text I would like to bring to his
302 attention?
303
304 ** "Granular" Search
305
306 Of interest is the ease of streaming documents to a relational database, at an
307 object (roughly paragraph) level and the potential for increased precision in
308 the presentation of matches that results thereby. The ability to serialize
309 html, LaTeX, XML, SQL, (whatever) is also inherent in / incidental to the
310 design.
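
To make the granularity concrete, here is a minimal sketch in Python using an
in-memory SQLite database. The table layout, the function names `load_objects`
and `search`, and the sample documents are assumptions for illustration only;
they are not SiSU's actual schema or code.

```python
import sqlite3


def load_objects(conn, doc_name, objects):
    """Store one row per (object number, text) pair of a document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects (doc TEXT, ocn INTEGER, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO objects (doc, ocn, body) VALUES (?, ?, ?)",
        [(doc_name, ocn, body) for ocn, body in objects],
    )


def search(conn, term):
    """Return (document, object number) for every object containing the term."""
    rows = conn.execute(
        "SELECT doc, ocn FROM objects WHERE body LIKE ? ORDER BY doc, ocn",
        (f"%{term}%",),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
load_objects(conn, "doc_a", [(1, "On contracts"), (2, "A contract binds the parties.")])
load_objects(conn, "doc_b", [(1, "On poetry"), (2, "No binding terms here.")])

# Matches are reported per object, not per document.
for doc, ocn in search(conn, "contract"):
    print(doc, ocn)
```

The result is a list of documents and object numbers, and those numbers point
to the same text in every other generated output.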
311
312 ** Summary
313 SiSU information Structuring Universe
314 Structured information, Serialized Units <www.sisudoc.org> or
315 <www.jus.uio.no/sisu/> software for electronic texts, document collections,
316 books, digital libraries, and search, with "atomic search" and text positioning
317 system (shared text citation numbering: "ocn")
318 outputs include: plaintext, html, XHTML, XML, ODF (OpenDocument), EPUB, LaTeX,
319 PDF, SQL (PostgreSQL and SQLite)
320
321 ** SiSU Short Description
322
323 SiSU is a comprehensive future-resilient electronic document management system.
324 Built-in search capabilities allow you to search across multiple documents and
325 highlight matches in an easy-to-follow format. Paragraph numbering system
326 allows you to cite your electronic documents in a consistent manner across
327 multiple file formats. Multiple format outputs allow you to display your
328 documents in plain text, PDF (portrait and landscape), OpenDocument format,
329 HTML, or e-book reading format (EPUB). Word mapping allows you to easily create
330 word indexes for your documents. Future-resilient flexibility allows you to
331 quickly adapt your documents to newer output formats as needed. All these and
332 many other features are achieved with little or no additional work on your
333 documents - by marking up the documents with a super simplistic markup
334 language, leaving the SiSU engine to handle the heavy-lifting processing.
335
336 Potential users of SiSU include individual authors who want to publish their
337 books or articles electronically to reach a broad audience, web publishers who
338 want to provide multiple channels of access to their electronic documents, or
339 any organizations which centrally manage a medium or large set of electronic
340 documents, especially governmental organizations which may prefer to keep their
341 documents in easily accessible yet non-proprietary formats.
342
343 SiSU is an Open Source project initiated and led by Ralph Amissah
344 <ralph.amissah@gmail.com>, who can be contacted via the mailing list
345 <http://lists.sisudoc.org/listinfo/sisu> at <sisu@lists.sisudoc.org>. SiSU is
346 licensed under the GNU General Public License.
347
348 *** notes
349
350 For less markup than the most elementary HTML you can have more. SiSU -
351 Structured information, Serialized Units for electronic documents, is an
352 information structuring, transforming, publishing and search framework with the
353 following features:
354
355 (i) markup syntax: (a) simpler than html, (b) mnemonic, influenced by
356 mail/messaging/wiki markup practices, (c) human readable, and easily writable,
357
358 (ii) (a) minimal markup requirement, (b) single file marked up for multiple outputs,
359
360 * documents are prepared in a single UTF-8 file using a minimalistic mnemonic
361 syntax. Typical literature, documents like "War and Peace" require almost no
362 markup, and most of the headers are optional.
363
364 * markup is easily readable/parsed by the human eye, (basic markup is simpler
365 and more sparse than the most basic html), [this may also be converted to XML
366 representations of the same input/source document].
367
368 * markup defines document structure (this may be done once in a header
369 pattern-match description, or for heading levels individually); basic text
370 attributes (bold, italics, underscore, strike-through etc.) as required; and
371 semantic information related to the document (header information, extended
372 beyond the Dublin core and easily further extended as required); the headers
373 may also contain processing instructions.
374
375 (iii) (a) multiple output formats, including amongst others: plaintext (UTF-8);
376 html; (structured) XML; ODF (Open Document text); EPUB; LaTeX; PDF (via LaTeX);
377 SQL type databases (currently PostgreSQL and SQLite). SiSU produces:
378 concordance files; document content certificates (md5 or sha256 digests of
379 headings, paragraphs, images etc.) and html manifests (and sitemaps of
380 content). (b) takes advantage of the strengths implicit in these very different
381 output types, (e.g. PDFs produced using typesetting of LaTeX, databases
382 populated with documents at an individual object/paragraph level, making
383 possible granular search (and related possibilities))
384
385 (iv) outputs share a common numbering system (dubbed "object citation
386 numbering" (ocn)) that is meaningful (to man and machine) across various
387 digital outputs whether paper, screen, or database oriented, (PDF, html, XML,
388 EPUB, sqlite, postgresql), this numbering system can be used to reference
389 content.
390
391 (v) SQL databases are populated at an object level (roughly headings,
392 paragraphs, verse, tables) and become searchable with that degree of
393 granularity, the output information provides the object/paragraph numbers which
394 are relevant across all generated outputs; it is also possible to look at just
395 the matching paragraphs of the documents in the database; [output indexing also
396 works well with search indexing tools like Hyper Estraier].
397
398 (vi) use of semantic meta-tags in headers permits the addition of semantic
399 information on documents, (the available fields are easily extended)
400
401 (vii) creates organised directory/file structure for (file-system) output,
402 easily mapped with its clearly defined structure, with all text objects
403 numbered, you know in advance where in each document output type, a bit of text
404 will be found (e.g. from an SQL search, you know where to go to find the
405 prepared html output or PDF etc.)... there is more; easy directory management
406 and document associations, the document preparation (sub-)directory may be used
407 to determine output (sub-)directory, the skin used, and the SQL database used,
408
409 (viii) "Concordance file" wordmap, consisting of all the words in a document
410 and their (text/ object) locations within the text, (and the possibility of
411 adding vocabularies),
412
413 (ix) document content certification and comparison considerations: (a) the
414 document and each object within it stamped with an sha256 hash making it
415 possible to easily check or guarantee that the substantive content of a document
416 is unchanged, (b) version control, documents integrated with time based source
417 control system, default RCS or CVS with use of $Id$ tag, which SiSU checks
418
419 (x) SiSU's minimalist markup makes for meaningful "diffing" of the substantive
420 content of markup-files,
421
422 (xi) easily skinnable, document appearance on a project/site wide, directory
423 wide, or document instance level easily controlled/changed,
424
425 (xii) in many cases a regular expression may be used (once in the document
426 header) to define all or part of a document's structure obviating or reducing
427 the need to provide structural markup within the document,
428
429 (xiii) prepared files may be batch processed, documents produced are static files
430 so this needs to be done only once but may be repeated for various reasons as
431 desired (updated content, addition of new output formats, updated technology
432 document presentations/representations)
433
434 (xiv) possible to pre-process, which permits: the easy creation of standard
435 form documents, and templates/term-sheets, or; building of composite documents
436 (master documents) from other sisu marked up documents, or marked up parts,
437 i.e. import documents or parts of text into a main document should this be
438 desired
439
443 (xv) there is a considerable degree of future-resilience, output representations
444 are "upgradeable", and new document formats may be added: (a) modular, (thanks
445 in no small part to Ruby) another output format required, write another
446 module.... (b) easy to update output formats (e.g. html, XHTML, LaTeX/PDF
447 produced can be updated in program and run against whole document set), (c)
448 easy to add, modify, or have alternative syntax rules for input, should you
449 need to,
450
451 (xvi) scalability, dependent on your file-system (ext3, Reiserfs, XFS,
452 whatever) and on the relational database used (currently Postgresql and
453 SQLite), and your hardware,
454
455 (xvii) only marked up files need be backed up, to secure the larger document
456 set produced,
457
458 (xviii) document management,
459
460 (xix) Syntax highlighting for SiSU markup is available for a number of text
461 editors.
462
463 (xx) remote operations: (a) run SiSU on a remote server, (having prepared sisu
464 markup documents locally or on that server, i.e. this solution where sisu is
465 installed on the remote server, would work whatever type of machine you chose
466 to prepare your markup documents on), (b) generated document outputs may be
467 posted by sisu to remote sites (using rsync/scp) (c) document source (plaintext
468 utf-8) if shared on the net may be identified by its url and processed locally
469 to produce the different document outputs.
470
471 (xxi) document source may be bundled together (automatically) with associated
472 documents (multiple language versions or master document with inclusions) and
473 images and sent as a zip file called a sisupod, if shared on the net these too
474 may be processed locally to produce the desired document outputs, these may be
475 downloaded, shared as email attachments, or processed by running sisu against
476 them, either using a url or the filename.
477
478 (xxii) for basic document generation, the only software dependency is Ruby, and
479 a few standard Unix tools (this covers plaintext, html, XML, ODF, EPUB, LaTeX).
480 To use a database you of course need that, and to convert the LaTeX generated
481 to PDF, a LaTeX processor like tetex or texlive.
482
483 As a developer's tool it is flexible and extensible.
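
The concordance wordmap of feature (viii) above can be sketched roughly as
follows. This is an illustration in Python, not SiSU's code: it assumes the
document has already been split into numbered objects, and it treats any run of
word characters as a word. The function name `concordance` and the sample
objects are hypothetical.

```python
import re
from collections import defaultdict


def concordance(objects):
    """Map each lower-cased word to the sorted object numbers containing it."""
    index = defaultdict(set)
    for ocn, body in enumerate(objects, start=1):
        for word in re.findall(r"\w+", body.lower()):
            index[word].add(ocn)
    return {word: sorted(ocns) for word, ocns in sorted(index.items())}


objects = ["The law of trade", "Trade and custom", "Custom is law"]
wordmap = concordance(objects)

# Each word lists the object numbers where it occurs.
for word, ocns in wordmap.items():
    print(word, ocns)
```

Since the locations are object numbers rather than page numbers, the same
wordmap serves every output format.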
484
485 ** description
486
487 SiSU ("SiSU information Structuring Universe" or "Structured information,
488 Serialized Units") [1] is a Unix command line oriented framework for document
489 structuring, publishing and search, featuring minimalistic markup, multiple
490 standard outputs, a common citation system, and granular search. Using markup
491 applied to a document, SiSU can produce plain text, HTML, XHTML, XML,
492 OpenDocument, LaTeX or PDF files, and populate an SQL database with objects [2]
493 (equating generally to paragraph-sized chunks) so searches may be performed and
494 matches returned with that degree of granularity (e.g. your search criteria are
495 met by these documents and at these locations within each document). Document
496 output formats share a common object numbering system for locating content.
497 This is particularly suitable for "published" works (finalized texts as opposed
498 to works that are frequently changed or updated) for which it provides a fixed
499 means of reference of content.
500
501 How it works: SiSU markup is fairly minimalistic, it consists of: a (largely optional)
502 document header, made up of information about the document (such as when it was
503 published, who authored it, and granting what rights) and any processing
504 instructions; and markup within text which is related to document structure and
505 typeface. SiSU must be able to discern the structure of a document, (text
506 headings and their levels in relation to each other), either from information
507 provided in the instruction header or from markup within the text (or from a
508 combination of both). Processing is done against an abstraction of the document
509 comprising information on the document's structure and its objects [2], which
510 the program serializes (providing the object numbers) and which are assigned
511 hash sum values based on their content. This abstraction of information about
512 document structure, objects, (and hash sums), provides considerable flexibility
513 in representing documents in different ways and for different purposes (e.g.
514 search, document layout, publishing, content certification, concordance etc.),
515 and makes it possible to take advantage of some of the strengths of established
516 ways of representing documents, (or indeed to create new ones).
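
The serializing and hashing step can be sketched as follows. This is a hedged
illustration in Python, not SiSU's implementation: it assumes objects are plain
strings, that the digest is sha256 over the UTF-8 bytes of each object, and
that nothing else (markup, numbering) feeds into the hash. The function name
`certify` is hypothetical.

```python
import hashlib


def certify(objects):
    """Return (object number, sha256 hex digest) for each object's content."""
    return [
        (ocn, hashlib.sha256(body.encode("utf-8")).hexdigest())
        for ocn, body in enumerate(objects, start=1)
    ]


original = ["A heading", "First paragraph.", "Second paragraph."]
revised = ["A heading", "First paragraph, amended.", "Second paragraph."]

# Compare digests object by object to find what changed.
changed = [
    ocn
    for (ocn, old), (_, new) in zip(certify(original), certify(revised))
    if old != new
]
print(changed)
```

Comparing per-object digests shows which objects differ between two versions
of a text without comparing the full content.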
517
518 [1] also chosen for the meaning of the Finnish term "sisu".
519
520 [2] objects include: headings, paragraphs, verse, tables, images, but not
521 footnotes/endnotes which are numbered separately and tied to the object from
522 which they are referenced.
523
524 More information on SiSU provided at: <www.sisudoc.org/sisu/SiSU>
525
526 SiSU was developed in relation to legal documents, and is strong across a wide
527 variety of texts (law, literature...(humanities, law and part of the social
528 sciences)). SiSU handles images but is not suitable for formulae/ statistics,
529 or for technical writing at this time.
530
531 SiSU has been developed and has been in use for several years. Requirements to
532 cover a wide range of documents within its use domain have been explored.
533
534 <ralph@amissah.com>
535 <ralph.amissah@gmail.com>
536 <sisu@lists.sisudoc.org>
537 <http://lists.sisudoc.org/listinfo/sisu>
538 2010
539 w3 since October 3 1993
540 * Finding SiSU
541 ** source
542 http://git.sisudoc.org/gitweb/
543
544 *** sisu
545 sisu git repo:
546 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
547
548 **** most recent source without repo history
549 git clone --depth 1 git://git.sisudoc.org/git/code/sisu.git --branch upstream
550 **** full clone
551 git clone git://git.sisudoc.org/git/code/sisu.git --branch upstream
552
553 *** sisu-markup-samples git repo:
554 http://git.sisudoc.org/gitweb/?p=doc/sisu-markup-samples.git;a=summary
555
556 ** mailing list
557 sisu at lists.sisudoc.org
558 http://lists.sisudoc.org/listinfo/sisu
559
560 ** irc oftc #sisu
561
562 ** home pages
563 <http://www.sisudoc.org/>
564 <http://search.sisudoc.org/>
565 <http://www.jus.uio.no/sisu>
566
567 * Installation
568
569 ** where you take responsibility for having the correct dependencies
570
571 Provided you have *Ruby*, *SiSU* can be run.
572
573 SiSU should be run from the directory containing your sisu marked up document
574 set.
575
576 This works fine so long as you already have sisu external dependencies in
577 place. For many operations such as html, epub, odt this is likely to be fine.
578 Note, however, that additional external package dependencies, such as texlive
579 (for pdfs), sqlite3 or postgresql (for search), should you desire to use them,
580 are not taken care of for you.
581
582 *** run off the source tarball without installation
583
586
587 **** 1. Obtain the latest sisu source
588
589 using git:
590
591 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary
592 http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=log
593
594 git clone git://git.sisudoc.org/git/code/sisu.git --branch upstream
595 git clone --depth 1 git://git.sisudoc.org/git/code/sisu.git --branch upstream
596
597 or, identify latest available source:
598
599 https://packages.debian.org/sid/sisu
600 http://packages.qa.debian.org/s/sisu.html
601 http://qa.debian.org/developer.php?login=sisu@lists.sisudoc.org
602
603 http://sisudoc.org/sisu/archive/pool/main/s/sisu/
604
605 and download the:
606
607 sisu_5.4.5.orig.tar.xz
608
609 using debian tool dget:
610
611 The dget tool is included in the devscripts package:
612 https://packages.debian.org/search?keywords=devscripts
613 To get dget, install devscripts:
614
615 apt-get install devscripts
616
617 and then you can get it from Debian:
618 dget -xu http://ftp.fi.debian.org/debian/pool/main/s/sisu/sisu_5.4.5-1.dsc
619
620 or off sisu repos
621 dget -x http://www.jus.uio.no/sisu/archive/pool/main/s/sisu/sisu_5.4.5-1.dsc
622 or
623 dget -x http://sisudoc.org/sisu/archive/pool/main/s/sisu/sisu_5.4.5-1.dsc
624
625 **** 2. Unpack the source
626
627 Provided you have *Ruby*, *SiSU* can be run without installation straight from
628 the source package directory tree.
629
630 Run ruby against the full path to bin/sisu (in the unzipped source package
631 directory tree). SiSU should be run from the directory containing your sisu
632 marked up document set.
633
634 ruby ~/sisu-5.4.5/bin/sisu --html -v document_name.sst
635
636 This works fine so long as you already have sisu external dependencies in
637 place. For many operations such as html, epub, odt this is likely to be fine.
638 Note, however, that additional external package dependencies, such as texlive
639 (for pdfs), sqlite3 or postgresql (for search), should you desire to use them,
640 are not taken care of for you.
641
642 *** gem install (with rake)
643
644 (i) create the gemspec; (ii) build the gem (from the gemspec); (iii) install
645 the gem
646
647 Provided you have ruby & rake, this can be done with the single command:
648
649 rake gem_create_build_install
650
651 to build and install sisu v5 & sisu v6, alias gemcbi
652
653 separate gems are made/installed for sisu v5 & sisu v6 contained in source.
654
655 to build and install sisu v5, alias gem5cbi:
656
657 rake gem_create_build_install_stable
658
659 to build and install sisu v6, alias gem6cbi:
660
661 rake gem_create_build_install_unstable
662
663 for individual steps (create, build, install) see rake options (rake -T);
664 to specify which sisu version to run, for sisu installed via gem:
665
666 gem search sisu
667
668 sisu _5.4.5_ --version
669
670 sisu _6.0.11_ --version
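
The leading _5.4.5_ style argument is the RubyGems convention for choosing
among installed versions of a gem: the executable wrapper that gem generates
inspects its first argument. A minimal sketch of that check follows; the helper
name split_version_request and the default requirement string are assumptions
made for illustration, and are not part of sisu or of the wrapper itself.

```ruby
require "rubygems"

# Hypothetical helper: separate a leading _x.y.z_ argument from the rest.
# Returns [version_requirement, remaining_arguments].
def split_version_request(argv, default = ">= 0")
  first = argv.first
  # Capture the text between the leading and trailing underscores, if any.
  candidate = first && first[/\A_(.*)_\z/, 1]
  if candidate && Gem::Version.correct?(candidate)
    [candidate, argv.drop(1)]
  else
    [default, argv.dup]
  end
end

p split_version_request(["_5.4.5_", "--version"])
# => ["5.4.5", ["--version"]]
```

An argument list without the underscore form is passed through untouched, so
an ordinary invocation such as sisu --html document_name.sst is unaffected.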
671
672 to uninstall sisu installed via gem
673
674 sudo gem uninstall --verbose sisu
675
676 For a list of alternative actions you may type:
677
678 rake help
679
680 rake -T
681
682 Rake: <http://rake.rubyforge.org/> <http://rubyforge.org/frs/?group_id=50>
683
684 *** installation with setup.rb
685
686 This is a three-step process. In the root directory of the unpacked *SiSU*,
687 type (the install step as root):
688
689 ruby setup.rb config
690 ruby setup.rb setup
691 #[as root:]
692 ruby setup.rb install
693
694 further information:
695 <http://i.loveruby.net/en/projects/setup/>
696 <http://i.loveruby.net/en/projects/setup/doc/usage.html>
697
698 ruby setup.rb config && ruby setup.rb setup && sudo ruby setup.rb install
699
700 ** Debian install
701
702 *SiSU* is available off the *Debian* archives. It should only be necessary to
703 run, as root, using apt-get:
704
705 apt-get update
706
707 apt-get install sisu-complete
708
709 (all sisu dependencies should be taken care of)
710
711 If there are newer versions of *SiSU* upstream, they will be available by
712 adding the following to your sources list /etc/apt/sources.list
713
714 #/etc/apt/sources.list
715
716 deb http://www.jus.uio.no/sisu/archive unstable main non-free
717 deb-src http://www.jus.uio.no/sisu/archive unstable main non-free
718
719 The non-free section is for sisu markup samples provided, which contain
720 authored works the substantive text of which cannot be changed, and which as a
721 result do not meet the debian free software guidelines.
722
723 *SiSU* is developed on *Debian*, and packages are available for *Debian* that
724 take care of the dependencies encountered on installation.
725
726 The package is divided into the following components:
727
728 *sisu*, the base code, (the main package on which the others depend), without
729 any dependencies other than ruby (and for convenience the ruby webrick web
730 server), this generates a number of types of output on its own, other
731 packages provide additional functionality, and have their dependencies
732
733 *sisu-complete*, a dummy package that installs the whole of greater sisu as
734 described below, apart from sisu-examples
735
736 *sisu-pdf*, dependencies used by sisu to produce pdf from generated /LaTeX/
737
738 *sisu-postgresql*, dependencies used by sisu to populate postgresql database
739 (further configuration is necessary)
740
741 *sisu-sqlite*, dependencies used by sisu to populate sqlite database
742
743 *sisu-markup-samples*, sisu markup samples and other miscellany (under
744 *Debian* Free Software Guidelines non-free)
745
746 *SiSU* is available off Debian Unstable and Testing [link:
747 <http://packages.debian.org/cgi-bin/search_packages.pl?searchon=names&subword=1&version=all&release=all&keywords=sisu>].
748 Install it using apt-get, aptitude or alternative *Debian* install tools.
749
750 ** Arch Linux
751
752 * sisu markup :sisu:markup:
753
754 ** sisu markup
755
756 #% structure - headings, levels
757 * headings (A-D, 1-3)
758 * inline
759 'A~ ' NOTE title level
760 'B~ ' NOTE optional
761 'C~ ' NOTE optional
762 'D~ ' NOTE optional
763 '1~ ' NOTE chapter level
764 '2~ ' NOTE optional
765 '3~ ' NOTE optional
766 '4~ ' NOTE optional :consider:
767 * node
768 * parent
769 * children
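
As an illustration of the heading markers listed above, the sketch below
recognises a heading line and reports how deeply it nests. It assumes only the
plain 'A~ ' to 'D~ ' and '1~ ' to '3~ ' forms from the outline; the names
HEADING_LEVELS and parse_heading are hypothetical, and this is not how sisu
itself parses documents.

```ruby
# Heading markers in nesting order: A is the title level, 1 the chapter level.
HEADING_LEVELS = %w[A B C D 1 2 3].freeze

# Hypothetical helper: return level, depth and title for a heading line,
# or nil when the line is not a heading in the plain 'X~ ' form.
def parse_heading(line)
  m = line.chomp.match(/\A([A-D1-3])~ (.+)\z/)
  return nil unless m

  { level: m[1], depth: HEADING_LEVELS.index(m[1]), title: m[2].strip }
end

p parse_heading("A~ My Book")
# => {:level=>"A", :depth=>0, :title=>"My Book"} (newer rubies print {level: "A", ...})
p parse_heading("ordinary paragraph text")
# => nil
```

The depth value is enough to work out parent and child nodes: a heading's
parent is the nearest preceding heading with a smaller depth.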
770
771 #% font face NOTE open & close marks, inline within paragraph
772 * emphasize '*{ ... }*' NOTE configure whether bold, italics or underscore; default bold
773 * bold '!{ ... }!'
774 * italics '/{ ... }/'
775 * underscore '_{ ... }_'
776 * superscript '^{ ... }^'
777 * subscript ',{ ... },'
778 * strike '-{ ... }-'
779 * add '+{ ... }+'
780 * monospace '#{ ... }#'
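
The open and close marks above pair up symmetrically, so a simple substitution
pass can illustrate them. The sketch below maps each mark to an HTML tag; the
tag choices, the INLINE_MARKS table and the inline_to_html name are assumptions
for illustration only, and sisu's own output code may differ considerably.

```ruby
# Mark character => HTML tag (the tag choices are illustrative assumptions).
INLINE_MARKS = {
  "*" => "b",   "!" => "b",   "/" => "i",
  "_" => "u",   "^" => "sup", "," => "sub",
  "-" => "del", "+" => "ins", "#" => "code"
}.freeze

# Hypothetical helper: replace each mark{ ... }mark pair with an HTML element.
# No HTML escaping is attempted; this only demonstrates the pairing of marks.
def inline_to_html(text)
  INLINE_MARKS.reduce(text) do |out, (mark, tag)|
    m = Regexp.escape(mark)
    out.gsub(/#{m}\{\s*(.+?)\s*\}#{m}/) { "<#{tag}>#{$1}</#{tag}>" }
  end
end

puts inline_to_html('!{ bold }! and /{ italic }/, H,{ 2 },O')
# prints: <b>bold</b> and <i>italic</i>, H<sub>2</sub>O
```

Note the single-quoted Ruby strings: the monospace mark '#{ ... }#' would
otherwise be read as string interpolation.
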
781 #% para NOTE paragraph controls are at the start of a paragraph
782 * a para is a block of text separated from others by an empty line
783 * indent
784 * default, all '_1 ' up to '_9 '
785 * first line hang '_1_0 '
786 * first line indent further '_0_1 '
787 * bullet
788 [levels 1-6]
789 '_* '
790 '_1* '
791 '_2* '
792 * numbered list
793 [levels 1-3]
794 '# '
795
796 #% blocks NOTE text blocks that are not to be treated in the way that ordinary paragraphs would be
797 * code
798 * [type of markup if any]
799 * poem
800 * group
801 * alt
802 * tables
803 #% boxes
804 NOTE grouped text with code block type color & possibly default image, warning, tip, red, blue etc. decide [NB N/A not implemented]
805
806 #% notes NOTE inline within paragraph at the location where the note reference is to occur
807 * footnotes '~{ ... }~'
808 * [bibliography] [NB N/A not implemented]
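
Because a footnote is written inline at the point of reference, producing
output means lifting the note text out of the paragraph and leaving a numbered
reference behind. A minimal sketch of that step, assuming notes do not nest and
sit on a single line; extract_footnotes and the [n] reference style are
hypothetical choices, not sisu's implementation:

```ruby
# Hypothetical helper: pull '~{ ... }~' notes out of a paragraph.
# Returns [paragraph_with_numbered_references, list_of_note_texts].
def extract_footnotes(paragraph)
  notes = []
  body = paragraph.gsub(/~\{\s*(.+?)\s*\}~/) do
    notes << $1
    "[#{notes.size}]"
  end
  [body, notes]
end

p extract_footnotes("Text~{ a note }~ continues.")
# => ["Text[1] continues.", ["a note"]]
```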
809
810 #% links, linking
811 * links - external, web, url
812 * links - internal
813
814 #% images [multimedia?]
815 * images
816 * [base64 inline] [N/A not implemented]
817
818 #% object numbers
819 * ocn (object numbers)
820 automatically attributed to substantive objects, paragraphs, tables, blocks, verse (unless exclude marker provided)
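
Object numbering can be pictured as a running counter over the blocks of a
document, where a block is text separated from its neighbours by an empty
line. The sketch below is a simplified illustration under that assumption: it
treats every block alike, and the exclude marker is left as a caller-supplied
test because its syntax is not given here. The names number_objects and
exclude are hypothetical.

```ruby
# Hypothetical helper: attribute sequential object numbers (ocn) to blocks.
# Blocks for which the exclude test returns true receive no number.
def number_objects(text, exclude: ->(_block) { false })
  ocn = 0
  blocks = text.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
  blocks.map do |block|
    if exclude.call(block)
      { ocn: nil, text: block }
    else
      { ocn: (ocn += 1), text: block }
    end
  end
end

numbered = number_objects("one\n\ntwo\n\nthree", exclude: ->(b) { b == "two" })
p numbered.map { |o| o[:ocn] }
# => [1, nil, 2]
```

Since the numbers derive from the source text rather than from page layout,
the same object carries the same number in every output format.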
821
822 #% contents
823 * toc (table of contents)
824 autogenerated from structure/headings information
825 * index (book index)
826 built from hints in newline text following a paragraph and starting with ={} has identifying rules for main and subsidiary text
827
828 #% breaks
829 * line break ' \\ ' inline
830 * page break, column break ' -\\- ' start of line, breaks a column, starts a new column, if using columns, else breaks the page, starts a new page.
831 * page break, page new ' =\\= ' start of line, breaks the page, starts a new page.
832 * horizontal '-..-' start of line, rule page (break) line across page (dividing paragraphs)
833
834 #% book type index
835
836 #% comment
837 * comment
838
839 #% misc
840 * term & definition
841
842 ** syntax highlighting
843
844 *** vim
845 data/sisu/conf/editor-syntax-etc/vim/
846 data/sisu/conf/editor-syntax-etc/vim/syntax/sisu.vim
847
848 *** emacs
849 data/sisu/conf/editor-syntax-etc/emacs/
850 data/sisu/conf/editor-syntax-etc/emacs/sisu-mode.el
851
852 * todo
853 sisu_todo.org