sisu [-CcFLSVvW]
sisu [operations]
sisu --v3 [operations]
sisu --v2 [operations]
SiSU is a framework for document structuring, publishing (in multiple open standard
formats) and search, comprising: (a) a lightweight document structure and
presentation markup syntax; and (b) an accompanying engine for generating
standard document format outputs from documents prepared in sisu markup syntax,
which is able to produce multiple standard outputs (including the population of
sql databases) that (can) share a common numbering system for the citation of
text within a document.
SiSU is developed under an open source, software libre license (GPL3).
It is developed for work with medium to large document sets and to cope with
evolving document formats and representation technologies. Documents are
prepared once, and regenerated as need be to update the technical presentation or
to add output formats. Various output formats (including search related
output) share a common mechanism for cross-output-format citation.
SiSU both defines a markup syntax and provides an engine that produces open
standards format outputs from documents prepared with SiSU markup. From a
single lightly prepared document sisu custom builds several
standard output formats which share a common (text object) numbering system for
citation of content within a document (that also has implications for search).
The sisu engine works with an abstraction of the document's structure and
content from which it is possible to generate different forms of representation
of the document. Significantly, SiSU markup is sparser than html, and the
outputs, which include html, EPUB, LaTeX, landscape and portrait pdfs and
Open Document Format (ODF), can all be added to and updated.
SiSU is also able to populate SQL type databases at an object level, which means
that searches can be made with that degree of granularity.
Source document preparation and output generation is a two step process: (i)
document source is prepared, that is, marked up in sisu markup syntax and (ii)
the desired output subsequently generated by running the sisu engine against
document source. Output representations if updated (in the sisu engine) can be
generated by re-running the engine against the prepared source. Using
SiSU markup applied to a document, SiSU custom builds (to take
advantage of the strengths of different ways of representing documents) various
standard open output formats including plain text, HTML, XHTML, XML, EPUB,
OpenDocument, LaTeX or PDF files, and populates an SQL database with objects[^1]
(equating generally to paragraph-sized chunks) so searches may be performed and
matches returned with that degree of granularity (e.g. your search criteria are
met by these documents and at these locations within each document). Document
output formats share a common object numbering system for locating content.
This is particularly suitable for "published" works (finalized texts
as opposed to works that are frequently changed or updated) for which it
provides a fixed means of reference of content.
In preparing a SiSU document you optionally provide semantic information
related to the document in a document header, and in marking up the substantive
text provide information on the structure of the document, primarily indicating
heading levels and footnotes. You also provide information on basic text
attributes where used. The rest is automatic: from this information sisu
custom builds[^2] the different forms of output requested.
SiSU works with an abstraction of the document based on its structure,
which comprises its headings[^3] and objects[^4]. This enables
SiSU to represent the document in many different ways, and to take
advantage of the strengths of different ways of presenting documents. The
objects are numbered, and these numbers can be used to provide a common basis
for citing material within a document across the different output format types.
This is significant as page numbers are not well suited to the digital age: in
web publishing, changing a browser's default font or using a different browser
can mean that text will appear on a different page; and when publishing in
different formats (html, landscape and portrait pdf etc.) page numbers are
again not useful for citing text. Dealing with documents at an object level together with
object numbering also has implications for search that SiSU is able to
take advantage of.
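As a rough illustration of object numbering, the following Python sketch splits a source text into blocks separated by empty lines and numbers them in order. It is not how the sisu engine is implemented: the function name number_objects is hypothetical, and the sketch assumes that every blank-line-separated block counts as one object, ignoring headers, comments and grouped text.

```python
import re


def number_objects(source: str) -> list[tuple[int, str]]:
    """Split text into blank-line-separated blocks and number them from 1."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", source) if b.strip()]
    return list(enumerate(blocks, start=1))


sample = (
    "1~ Introduction\n\n"
    "First paragraph.\n\n"
    "Second paragraph,\ncontinued on a second line.\n"
)

# Print each object number with the first line of its block.
for ocn, text in number_objects(sample):
    print(ocn, text.splitlines()[0])
```

Because the numbers depend only on the order of blocks in the source, the same number can point at the same text whichever output format is generated.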
One of the challenges of maintaining documents is to keep them in a format that
allows use of them independently of proprietary platforms. Consider issues
related to dealing with legacy proprietary formats today and what guarantee you
have that old proprietary formats will remain (or can be read without
proprietary software/equipment) in 15 years' time, or the way in which
html has evolved over its relatively short span of existence. SiSU
provides the flexibility of producing documents in multiple non-proprietary
open formats including html, pdf,[^5] ODF,[^6] and EPUB.[^7] Whilst SiSU
relies on software, the markup is uncomplicated and minimalistic which
guarantees that future engines can be written to run against it. It is also
easily converted to other formats, which means documents prepared in
SiSU can be migrated to other document formats. Further security is
provided by the fact that the software itself, SiSU, is available under
GPL3, a licence that guarantees that the source code will always be open, and
free as in libre, which means that that code base can be used, updated and
further developed as required under the terms of its license. Another
challenge is to keep up with a moving target. SiSU permits new forms of
output to be added as they become important (Open Document Format text was
added in 2006 when it became an ISO standard for office applications and the
archival of documents; EPUB was introduced in 2009), and allows the technical
representation of existing output to be updated (html has evolved and the related
module has been updated repeatedly over the years; presumably when the World
Wide Web Consortium (w3c) finalises html 5, which is currently under
development, the html module will again be updated, allowing all existing
documents to be regenerated as html 5).
The document formats are written to the file-system and available for indexing
by independent indexing tools, whether off the web like Google and Yahoo or on
the site like Lucene and Hyperestraier.
SiSU also provides other features such as concordance files and document
content certificates. Working against an abstraction of document
structure opens further possibilities for the research and development of other
document representations; the availability of objects is useful, for example, for
topic maps and thesauri, which together with the flexibility of SiSU offers
great possibilities.
SiSU is primarily for published works, which can take advantage of the
citation system to reliably reference its documents. SiSU works well in
a complementary manner with such collaborative technologies as Wikis, which can
take advantage of and be used to discuss the substance of content prepared in
SiSU.
SiSU is a document publishing system, that from a simple single
marked-up document, produces multiple output formats including: plaintext,
html, xhtml, XML, epub, odt (odf text), LaTeX, pdf, info, and SQL (PostgreSQL
and SQLite), which share text object numbers ("object citation
numbering") and the same document structure information. For more see:
<http://www.jus.uio.no/sisu>
dbi - database interface
-D or --pgsql set for postgresql
-d or --sqlite default set for sqlite
-d is modifiable with --db=[database type (pgsql or sqlite)]
The -v is for verbose output.
add -v for verbose mode and -c to toggle color state, e.g. sisu -2vc
[filename or wildcard]
consider -u for appended url info or -v for verbose output
In the data directory run sisu -mh filename or wildcard, e.g. "sisu -h
cisg.sst" or "sisu -h *.{sst,ssm}" to produce an html version of all documents.
Running sisu (alone without any flags, filenames or wildcards) brings up the
interactive help, as does any sisu command that is not recognised. Enter to
escape.
The most up to date information on sisu should be contained in the sisu_manual,
available at:
<http://sisudoc.org/sisu/sisu_manual/>
The manual can be generated from source, found respectively, either within the
SiSU
tarball or installed locally at:
./data/doc/sisu/markup-samples/sisu_manual
/usr/share/doc/sisu/markup-samples/sisu_manual
move to the respective directory and type e.g.:
sisu sisu_manual.ssm
If SiSU is installed on your system the usual man commands should be
available, try:
man sisu
Most SiSU man pages are generated directly from sisu documents that are used
to prepare the sisu manual, the source files for which are located within the
SiSU tarball at:
./data/doc/sisu/markup-samples/sisu_manual
Once installed, directory equivalent to:
/usr/share/doc/sisu/markup-samples/sisu_manual
Available man pages are converted back to html using man2html:
/usr/share/doc/sisu/html/
./data/doc/sisu/html
An online version of the sisu man page is available here:
* various sisu man pages <http://www.jus.uio.no/sisu/man/> [^8]
* sisu.1 <http://www.jus.uio.no/sisu/man/sisu.1.html> [^9]
This is particularly useful for getting the current sisu setup/environment
information:
sisu --help
sisu --help [subject]
sisu --help commands
sisu --help markup
sisu --help env [for feedback on the way your system is setup with regard to sisu]
sisu -V [environment information, same as above command]
sisu (on its own provides version and some help information)
Apart from real-time information on your current configuration, the SiSU
manual and man pages are likely to contain more up-to-date information than
the sisu interactive help (for example on commands and markup).
NOTE: Running the command sisu (alone without any flags, filenames or
wildcards) brings up the interactive help, as does any sisu command that is not
recognised. Enter to escape.
SiSU source documents are plaintext (UTF-8)[^11] files.
All paragraphs are separated by an empty line.
Markup is comprised of:
* at the top of a document, the document header made up of semantic meta-data
about the document and if desired additional processing instructions (such as an
instruction to automatically number headings from a particular level down)
* followed by the prepared substantive text of which the most important single
characteristic is the markup of different heading levels, which define the
primary outline of the document structure. Markup of substantive text includes:
* heading levels, which define document structure
* text basic attributes, italics, bold etc.
* grouped text (objects), which are to be treated differently, such as code
blocks or poems.
* footnotes/endnotes
* linked text and images
* paragraph actions, such as indent, bulleted, numbered-lists, etc.
Some interactive help on markup is available, by typing sisu and selecting
markup or sisu --help markup
To check the markup in a file:
sisu --identify [filename].sst
For a brief descriptive summary of markup history
sisu --query-history
or if for a particular version:
sisu --query-0.38
Online markup examples are available together with the respective outputs
produced from <http://www.jus.uio.no/sisu/SiSU/examples.html> or from
<http://www.jus.uio.no/sisu/sisu_examples/>
There is of course this document, which provides a cursory overview of sisu
markup and the respective output produced:
<http://www.jus.uio.no/sisu/sisu_markup/>
an alternative presentation of markup syntax:
/usr/share/doc/sisu/on_markup.txt.gz
With SiSU installed, sample skins may be found in:
/usr/share/doc/sisu/markup-samples (or
equivalent directory) and if sisu-markup-samples is installed also under:
/usr/share/doc/sisu/markup-samples-non-free
Headers contain either: semantic meta-data about a document, which can be used
by any output module of the program, or; processing instructions.
Note: the first line of a document may include information on the markup
version used in the form of a comment. Comments are a percentage mark at the
start of a paragraph (and as the first character in a line of text) followed by
a space and the comment:
% this would be a comment
This current document is loaded by a master document that has a header similar
to this one:
% SiSU master 2.0
@title: SiSU
 :subtitle: Manual
@creator:
 :author: Amissah, Ralph
@publisher: [publisher name]
@rights: Copyright (C) Ralph Amissah 2007, License GPL 3
@classify:
 :type: information
 :topic_register: SiSU:manual;electronic documents:SiSU:manual
 :subject: ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search
% used_by: manual
@date:
 :published: 2008-05-22
 :created: 2002-08-28
 :issued: 2002-08-28
 :available: 2002-08-28
 :modified: 2010-03-03
@make:
 :num_top: 1
 :breaks: new=C; break=1
 :skin: skin_sisu_manual
 :bold: /Gnu|Debian|Ruby|SiSU/
 :manpage: name=sisu - documents: markup, structuring, publishing in multiple standard formats, and search;
     synopsis=sisu [-abcDdeFhIiMmNnopqRrSsTtUuVvwXxYyZz0-9] [filename/wildcard ]
     . sisu [-Ddcv] [instruction]
     . sisu [-CcFLSVvW]
     . sisu --v2 [operations]
     . sisu --v3 [operations]
@links:
 { SiSU Homepage }http://www.sisudoc.org/
 { SiSU Manual }http://www.sisudoc.org/sisu/sisu_manual/
 { Book Samples & Markup Examples }http://www.jus.uio.no/sisu/SiSU/examples.html
 { SiSU Download }http://www.jus.uio.no/sisu/SiSU/download.html
 { SiSU Changelog }http://www.jus.uio.no/sisu/SiSU/changelog.html
 { SiSU Git repo }http://git.sisudoc.org/?p=code/sisu.git;a=summary
 { SiSU List Archives }http://lists.sisudoc.org/pipermail/sisu/
 { SiSU @ Debian }http://packages.qa.debian.org/s/sisu.html
 { SiSU Project @ Debian }http://qa.debian.org/developer.php?login=sisu@lists.sisudoc.org
 { SiSU @ Wikipedia }http://en.wikipedia.org/wiki/SiSU
Header tags appear at the beginning of a document and provide meta information
on the document (such as the Dublin Core), or information as to how the
document as a whole is to be processed. All header instructions take the form
@headername: or, on the next line and indented by one space, :subheadername: All
Dublin Core meta tags are available.
@identifier:
information or instructions
where the "identifier" is a tag recognised by the program, and the
"information" or "instructions" belong to the tag/identifier specified.
Note: a header where used should only be used once; all headers apart from
@title: are optional; the @structure: header is used to describe document
structure, and can be useful to know.
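The header form described above can be read with very little code. The following Python sketch is illustrative only and is not taken from the sisu source: the function name parse_header and the returned dictionary layout are assumptions. It collects each @headername: entry and the :subheadername: entries indented beneath it, skipping comment lines that start with a percentage mark.

```python
def parse_header(lines):
    """Collect @header: and indented :subheader: entries into a nested dict."""
    headers = {}
    current = None
    for line in lines:
        if line.startswith("%") or not line.strip():
            continue  # comment or empty line
        if line.startswith("@"):
            name, _, value = line[1:].partition(":")
            current = headers.setdefault(name.strip(), {})
            if value.strip():
                # text given directly after @headername: is stored under ""
                current[""] = value.strip()
        elif line.startswith(" :") and current is not None:
            name, _, value = line.strip()[1:].partition(":")
            current[name.strip()] = value.strip()
    return headers


sample = [
    "% SiSU 2.0",
    "@title: SiSU",
    " :subtitle: Manual",
    "@creator:",
    " :author: Amissah, Ralph",
]
print(parse_header(sample))
```

Splitting on the first colon only keeps values that themselves contain colons, such as urls, intact.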
This is a sample header
% SiSU 2.0 [declared file-type identifier with markup version]
@title: [title text] [this header is the only one that is mandatory]
 :subtitle: [subtitle if any]
 :language: English
@creator:
 :author: [Lastname, First names]
 :illustrator: [Lastname, First names]
 :translator: [Lastname, First names]
 :prepared_by: [Lastname, First names]
@date:
 :published: [year or yyyy-mm-dd]
 :created: [year or yyyy-mm-dd]
 :issued: [year or yyyy-mm-dd]
 :available: [year or yyyy-mm-dd]
 :modified: [year or yyyy-mm-dd]
 :valid: [year or yyyy-mm-dd]
 :added_to_site: [year or yyyy-mm-dd]
 :translated: [year or yyyy-mm-dd]
@rights:
 :copyright: Copyright (C) [Year and Holder]
 :license: [Use License granted]
 :text: [Year and Holder]
 :translation: [Name, Year]
 :illustrations: [Name, Year]
@classify:
 :topic_register: SiSU:markup sample:book;book:novel:fantasy
 :type:
 :subject:
 :description:
 :keywords:
 :abstract:
 :isbn: [ISBN]
 :loc: [Library of Congress classification]
 :dewey: [Dewey classification]
 :pg: [Project Gutenberg text number]
@links:
 { SiSU }http://www.sisudoc.org
 { FSF }http://www.fsf.org
@make:
 :skin: skin_name [skins change default settings related to the appearance of documents generated]
 :num_top: 1
 :headings: [text to match for each level (e.g. PART; Chapter; Section; Article; or another: none; BOOK|FIRST|SECOND; none; CHAPTER;)
 :breaks: new=:C; break=1
 :promo: sisu, ruby, sisu_search_libre, open_society
 :bold: [regular expression of words/phrases to be made bold]
 :italics: [regular expression of words/phrases to italicise]
@original:
 :language: [language]
@notes:
 :comment:
 :prefix: [prefix is placed just after table of contents]
Heading levels are :A~, :B~, :C~, 1~, 2~, 3~ ... with :A - :C being part / section
headings, followed by other heading levels, and 1 - 6 being headings followed
by substantive text or sub-headings. :A~ is usually the title; :A~? is a
conditional level 1 heading (used where a stand-alone document may be imported
into another).
:A~ [heading text]
Top level heading [this usually has similar content to the title
@title: ] NOTE: the heading levels described here are in 0.38 notation, see
heading
:B~ [heading text]
Second level heading [this is a heading level divider]
:C~ [heading text]
Third level heading [this is a heading level divider]
1~ [heading text]
Top level heading preceding substantive text of document or sub-heading 2, the
heading level that would normally be marked 1. or 2. or 3. etc. in a document,
and the level on which sisu by default would break html output into named
segments, names are provided automatically if none are given (a number),
otherwise takes the form 1~my_filename_for_this_segment
2~ [heading text]
Second level heading preceding substantive text of document or sub-heading 3,
the heading level that would normally be marked 1.1 or 1.2 or 1.3 or 2.1 etc.
in a document.
3~ [heading text]
Third level heading preceding substantive text of document, that would normally
be marked 1.1.1 or 1.1.2 or 1.2.1 or 2.1.1 etc. in a document
1~filename level 1 heading,
% the primary division such as Chapter that is followed by substantive text,
% and may be further subdivided (this is the level on which by default html
% segments are made)
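A heading line as described above can be recognised with a single regular expression. The Python sketch below is an illustration under stated assumptions, not the engine's own parser: the name parse_heading and the returned fields are hypothetical, and only the levels :A to :C and 1 to 6 named in this section are accepted.

```python
import re

# level marker, optional "?" (conditional heading), optional segment name, text
HEADING = re.compile(r"^(:[A-C]|[1-6])~(\?)?(\S*)\s+(.+)$")


def parse_heading(line):
    """Return heading details for a heading line, or None for other text."""
    m = HEADING.match(line)
    if not m:
        return None
    level, conditional, name, text = m.groups()
    return {
        "level": level.lstrip(":"),
        "conditional": conditional is not None,
        "name": name or None,  # e.g. the segment name in 1~my_segment
        "text": text,
    }


for line in (":A~ SiSU Manual", "1~intro Introduction", "ordinary paragraph"):
    print(line, "->", parse_heading(line))
```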
markup example:
normal text, *{emphasis}*, !{bold text}!, /{italics}/, _{underscore}_, "{citation}", ^{superscript}^, ,{subscript},, +{inserted text}+, -{strikethrough}-, #{monospace}# normal text
*{emphasis}* [note: can be configured to be represented by bold, italics or underscore]
!{bold text}!
_{underscore}_
/{italics}/
"{citation}"
^{superscript}^
,{subscript},
+{inserted text}+
-{strikethrough}-
#{monospace}#
resulting output:
normal text,
emphasis,
bold text,
italics,
underscore,
"citation", ^superscript^, [subscript], ++inserted text++,
--strikethrough--, monospace
normal text
emphasis
[note: can be configured to be represented by bold, italics or underscore]
bold text
italics
underscore
"citation"
^superscript^
[subscript]
++inserted text++
--strikethrough--
monospace
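The text attribute markers all share one shape: a marker character, the text in curly braces, and the same marker again. The Python sketch below shows how such markers could be rewritten for one output format. The mapping to html element names is an assumption made for illustration (sisu's own html output may differ), the name to_html is hypothetical, and nested markup is not handled.

```python
import re

# Assumed mapping from marker character to an html element (illustrative only).
TAGS = {"*": "em", "!": "b", "/": "i", "_": "u", '"': "cite",
        "^": "sup", ",": "sub", "+": "ins", "-": "del", "#": "code"}

# marker, "{", text, "}", then the same marker again (back-reference \1)
INLINE = re.compile(r'([*!/_"^,+\-#])\{(.+?)\}\1')


def to_html(text):
    """Replace SiSU text attribute markup with the assumed html elements."""
    return INLINE.sub(
        lambda m: "<{0}>{1}</{0}>".format(TAGS[m.group(1)], m.group(2)), text)


print(to_html("normal text, *{emphasis}*, !{bold text}!, /{italics}/"))
```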
markup example:
ordinary paragraph
_1 indent paragraph one step
_2 indent paragraph two steps
_9 indent paragraph nine steps
resulting output:
ordinary paragraph
indent paragraph one step
indent paragraph two steps
indent paragraph nine steps
markup example:
_* bullet text
_1* bullet text, first indent
_2* bullet text, two step indent
resulting output:
* bullet text
* bullet text, first indent
* bullet text, two step indent
Numbered List (not to be confused with headings/titles, (document structure))
markup example:
# numbered list numbered list 1., 2., 3, etc.
_# numbered list numbered list indented a., b., c., d., etc.
markup example:
_0_1 first line no indent, rest of paragraph indented one step
_1_0 first line indented, rest of paragraph no indent
in each case level may be 0-9
resulting output:
first line no indent, rest of paragraph indented one step
first line indented, rest of paragraph no indent
in each case level may be 0-9
Footnotes and endnotes are marked up at the location where they would be
indicated within a text. They are automatically numbered. The output type
determines whether footnotes or endnotes will be produced
markup example:
~{ a footnote or endnote }~
resulting output:
[^12]
markup example:
normal text~{ self contained endnote marker & endnote in one }~ continues
resulting output:
normal text[^13] continues
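Automatic numbering of notes can be pictured as follows. This Python sketch is a simplified illustration, not the sisu implementation: the name extract_notes is hypothetical, numbering starts at 1 for the text passed in, and only the plain ~{ ... }~ form is handled (asterisk and editor note variants are not).

```python
import re

NOTE = re.compile(r"~\{\s*(.+?)\s*\}~")


def extract_notes(text):
    """Replace each ~{ note }~ with a numbered marker and collect the notes."""
    notes = []

    def repl(match):
        notes.append(match.group(1))
        return "[^{}]".format(len(notes))

    return NOTE.sub(repl, text), notes


body, notes = extract_notes(
    "normal text~{ self contained endnote marker & endnote in one }~ continues")
print(body)
print(notes)
```

An output module is then free to place the collected notes at the foot of a page or at the end of the document, as the output type requires.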
markup example:
normal text ~{* unnumbered asterisk footnote/endnote, insert multiple asterisks if required }~ continues
normal text ~{** another unnumbered asterisk footnote/endnote }~ continues
resulting output:
normal text [^*] continues
normal text [^**] continues
markup example:
normal text ~[* editors notes, numbered asterisk footnote/endnote series ]~ continues
normal text ~[+ editors notes, numbered asterisk footnote/endnote series ]~ continues
resulting output:
normal text [^*3] continues
normal text [^+2] continues
Alternative endnote pair notation for footnotes/endnotes:
% note the endnote marker "~^"
normal text~^ continues
^~ endnote text following the paragraph in which the marker occurs
the standard and pair notation cannot be mixed in the same document
urls found within text are marked up automatically. A url within text is
automatically hyperlinked to itself and by default decorated with angled
braces, unless they are contained within a code block (in which case they are
passed as normal text), or escaped by a preceding underscore (in which case the
decoration is omitted).
markup example:
normal text http://www.sisudoc.org/ continues
resulting output:
normal text <http://www.sisudoc.org/> continues
An escaped url without decoration
markup example:
normal text _http://www.sisudoc.org/ continues
deb _http://www.jus.uio.no/sisu/archive unstable main non-free
resulting output:
normal text http://www.sisudoc.org/ continues
deb http://www.jus.uio.no/sisu/archive unstable main non-free
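The rule for urls found in running text (decorate with angled braces unless escaped by a preceding underscore) can be sketched in Python as below. This is an illustration only: the name decorate_urls is hypothetical, hyperlinking itself is left out, code blocks are not considered, and a url is naively taken to run up to the next whitespace.

```python
import re

# optional escaping underscore, then a url running up to the next whitespace
URL = re.compile(r"(_?)(https?://\S+)")


def decorate_urls(text):
    """Wrap bare urls in angled braces; drop the underscore of escaped ones."""
    def repl(match):
        escaped, url = match.groups()
        return url if escaped else "<{}>".format(url)

    return URL.sub(repl, text)


print(decorate_urls("normal text http://www.sisudoc.org/ continues"))
print(decorate_urls("deb _http://www.jus.uio.no/sisu/archive unstable main non-free"))
```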
where a code block is used there is neither decoration nor hyperlinking, code
blocks are discussed later in this document
resulting output:
deb http://www.jus.uio.no/sisu/archive unstable main non-free
deb-src http://www.jus.uio.no/sisu/archive unstable main non-free
To link text or an image to a url the markup is as follows
markup example:
about { SiSU }http://url.org markup
resulting output:
about SiSU <http://www.sisudoc.org/> markup
A shortcut notation is available so the url link may also be provided
automatically as a footnote
markup example:
about {~^ SiSU }http://url.org markup
resulting output:
about SiSU <http://www.sisudoc.org/> [^14] markup
Internal document links to a tagged location, including an ocn
markup example:
{ tux.png 64x80 }image
% various url linked images
{tux.png 64x80 "a better way" }http://www.sisudoc.org/
{GnuDebianLinuxRubyBetterWay.png 100x101 "Way Better - with Gnu/Linux, Debian and Ruby" }http://www.sisudoc.org/
{~^ ruby_logo.png "Ruby" }http://www.ruby-lang.org/en/
resulting output:
[ tux.png ]
tux.png 64x80 "Gnu/Linux - a better way" <http://www.sisudoc.org/>
GnuDebianLinuxRubyBetterWay.png 100x101 "Way Better - with Gnu/Linux, Debian
and Ruby" <http://www.sisudoc.org/>
[ ruby_logo (png missing) ] [^15]
linked url footnote shortcut
{~^ [text to link] }http://url.org
% maps to: { [text to link] }http://url.org ~{ http://url.org }~
% which produces hyper-linked text within a document/paragraph,
% with an endnote providing the url for the text location used in the hyperlink
text marker *~name
note at a heading level the same is automatically achieved by providing names
to headings 1, 2 and 3 i.e. 2~[name] and 3~[name] or in the case of
auto-heading numbering, without further intervention.
Tables may be prepared in either of two forms
markup example:
table{ c3; 40; 30; 30;

This is a table
this would become column two of row one
column three of row one is here

And here begins another row
column two of row two
column three of row two, and so on

}table
resulting output:
[table omitted, see other document formats]
a second form may be easier to work with in cases where there is not much
information in each column
markup example:
[^17]
!_ Table 3.1: Contributors to Wikipedia, January 2001 - June 2005

{table~h 24; 12; 12; 12; 12; 12; 12;}
                                |Jan. 2001|Jan. 2002|Jan. 2003|Jan. 2004|July 2004|June 2006
Contributors*                   |       10|      472|    2,188|    9,653|   25,011|   48,721
Active contributors**           |        9|      212|      846|    3,228|    8,442|   16,945
Very active contributors***     |        0|       31|      190|      692|    1,639|    3,016
No. of English language articles|       25|   16,000|  101,000|  190,000|  320,000|  630,000
No. of articles, all languages  |       25|   19,000|  138,000|  490,000|  862,000|1,600,000

* Contributed at least ten times; ** at least 5 times in last month; *** more than 100 times in last month.
resulting output:
Table 3.1: Contributors to Wikipedia, January 2001 - June 2005
[table omitted, see other document formats]
* Contributed at least ten times; ** at least 5 times in last month; *** more
than 100 times in last month.
basic markup:
poem{
Your poem here
}poem
Each verse in a poem is given an object number.
markup example:
poem{
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
}poem
resulting output:
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
basic markup:
group{
Your grouped text here
}group
A group is treated as an object and given a single object number.
markup example:
group{
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
}group
resulting output:
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
Code tags code{ ... }code (used as with other group tags described above) are
used to escape regular sisu markup, and have been used extensively within this
document to provide examples of
SiSU
markup. You cannot however use code tags to escape code tags. They are however
used in the same way as group or poem tags.
A code-block is treated as an object and given a single object number. [an option to number each line of code may be considered at some later time]
use of code tags instead of poem compared, resulting output:
`Fury said to a
mouse, That he
met in the
house,
"Let us
both go to
law: I will
prosecute
YOU. --Come,
I'll take no
denial; We
must have a
trial: For
really this
morning I've
nothing
to do."
Said the
mouse to the
cur, "Such
a trial,
dear Sir,
With
no jury
or judge,
would be
wasting
our
breath."
"I'll be
judge, I'll
be jury,"
Said
cunning
old Fury:
"I'll
try the
whole
cause,
and
condemn
you
to
death."'
From
SiSU
2.7.7 on you can number codeblocks by placing a hash after the opening code tag
code{# as demonstrated here:
1 | `Fury said to a
2 | mouse, That he
3 | met in the
4 | house,
5 | "Let us
6 | both go to
7 | law: I will
8 | prosecute
9 | YOU. --Come,
10 | I'll take no
11 | denial; We
12 | must have a
13 | trial: For
14 | really this
15 | morning I've
16 | nothing
17 | to do."
18 | Said the
19 | mouse to the
20 | cur, "Such
21 | a trial,
22 | dear Sir,
23 | With
24 | no jury
25 | or judge,
26 | would be
27 | wasting
28 | our
29 | breath."
30 | "I'll be
31 | judge, I'll
32 | be jury,"
33 | Said
34 | cunning
35 | old Fury:
36 | "I'll
37 | try the
38 | whole
39 | cause,
40 | and
41 | condemn
42 | you
43 | to
44 | death."'
To break a line within a "paragraph object", two backslashes \\
with a space before and a space or newline after them
may be used.
To break a line within a "paragraph object", two backslashes \\ with a space before and a space or newline after them \\ may be used.
The html break br enclosed in angle brackets (though undocumented) is available
in versions prior to 3.0.13 and 2.9.7 (it remains available for the time being,
but is deprecated).
Page breaks are only relevant and honored in some output formats. A page break
or a new page may be inserted manually using the following markup on a line on
its own:
<:pb>
or
<:pn>
page new <:pn> breaks the page, starts a new page.
page break <:pb> breaks a column, starts a new column, if using columns, else
breaks the page, starts a new page.
To make an index, append to the paragraph the book index term that relates to
it, using an equal sign and curly braces.
Currently two levels are provided, a main term and if needed a sub-term.
Sub-terms are separated from the main term by a colon.
Paragraph containing main term and sub-term.
={Main term:sub-term}
The index syntax starts on a new line, but there should not be an empty line
between paragraph and index markup.
The structure of the resulting index would be:
Main term, 1
  sub-term, 1
Several terms may relate to a paragraph, they are separated by a semicolon. If
the term refers to more than one paragraph, indicate the number of paragraphs.
Paragraph containing main term, second term and sub-term.
={first term; second term: sub-term}
The structure of the resulting index would be:
First term, 1,
Second term, 1,
  sub-term, 1
If multiple sub-terms appear under one paragraph, they are separated under the
main term heading from each other by a pipe symbol.
Paragraph containing main term, second term and sub-term.
={Main term:sub-term+1|second sub-term}

A paragraph that continues discussion of the first sub-term
The plus one in the example provided indicates the first sub-term spans one
additional paragraph. The logical structure of the resulting index would be:
Main term, 1,
  sub-term, 1-2,
  second sub-term, 1,
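The index syntax (terms separated by semicolons, a colon before sub-terms, pipes between sub-terms, and a plus sign with the number of additional paragraphs spanned) can be unpacked as in the Python sketch below. It is a hedged illustration rather than the engine's code: the name parse_index and the tuple layout are assumptions, and the object number of the paragraph is passed in by the caller. The sample markup is the one given in the 0.69 markup history entry of this document.

```python
import re


def parse_index(markup, ocn):
    """Parse ={...} book index markup attached to the object numbered ocn.

    Returns a list of (main term, (first, last), [(sub-term, (first, last))]).
    """
    m = re.fullmatch(r"=\{(.+?)\}", markup.strip())
    if not m:
        raise ValueError("not book index markup")

    def span(term):
        # "Debian+2" means the term covers this object and the next two
        name, plus, extra = term.strip().partition("+")
        last = ocn + int(extra) if plus else ocn
        return name.strip(), (ocn, last)

    entries = []
    for item in m.group(1).split(";"):
        main, colon, subs = item.partition(":")
        main_name, main_span = span(main)
        sub_entries = [span(s) for s in subs.split("|")] if colon else []
        entries.append((main_name, main_span, sub_entries))
    return entries


sample = ("={GNU/Linux community distribution:Debian+2|Fedora|Gentoo;"
          "Free Software Foundation+5}")
for entry in parse_index(sample, 1):
    print(entry)
```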
It is possible to build a document by creating a master document that requires
other documents. The documents required may be complete documents that could be
generated independently, or they could be markup snippets, prepared so as to be
easily available to be placed within another text. If the calling document is a
master document (built from other documents), it should be named with the
suffix
.ssm
Within this document you would provide information on the other documents that
should be included within the text. These may be other documents that would be
processed in a regular way, or markup bits prepared only for inclusion within a
master document
.sst
regular markup file, or
.ssi
(insert/information) A secondary file of the composite document is built prior
to processing with the same prefix and the suffix
._sst
basic markup for importing a document into a master document
<< filename1.sst
<< filename2.ssi
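Building a composite document from a master document amounts to replacing each import line with the content of the named file. The Python sketch below illustrates that idea only: the name build_composite is hypothetical, included files are looked up relative to the master document (an assumption), and nested imports are not followed.

```python
from pathlib import Path


def build_composite(master: Path) -> str:
    """Return the master document text with "<< filename" lines expanded."""
    out = []
    for line in master.read_text(encoding="utf-8").splitlines():
        if line.startswith("<< "):
            # assumed: the file name is relative to the master document
            included = master.parent / line[3:].strip()
            out.append(included.read_text(encoding="utf-8").rstrip("\n"))
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

A caller could write the returned text to a file with the same prefix and the ._sst suffix before further processing.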
The form described above should be relied on. Within the Vim editor it results
in the text thus linked becoming hyperlinked to the document it is calling in
which is convenient for editing. Alternative markup forms for importing
documents have been under consideration, and have occasionally been supported:
<< filename.ssi
<<{filename.ssi}
% using textlink alternatives
<< |filename.ssi|@|^|
2.0 introduced new headers and is therefore incompatible with 1.0 though otherwise the same with the addition of a couple of tags (i.e. a superset)
0.38 is substantially current for version 1.0
0.16 is deprecated but still supported, though file names were changed at 0.37
* sisu --query=[sisu version [0.38] or 'history']
provides a short history of changes to SiSU markup
SiSU 2.0
(2010-03-06:09/6) same as 1.0, apart from the changing of headers and the
addition of a monospace tag; related headers are now grouped, e.g.
@title:
 :subtitle:
@creator:
 :author:
 :translator:
 :illustrator:
@rights:
 :text:
 :illustrations:
see document markup samples, and sisu --help headers
the monospace tag takes the form of a hash '#'
#{ this enclosed text would be monospaced }#
1.0
(2009-12-19:50/6) same as 0.69
0.69
(2008-09-16:37/2) (same as 1.0) and as previous (0.57) with the addition of
book index tags
/^={.+?}$/
e.g. appended to a paragraph, on a new-line (without a blank line in between):
={GNU/Linux community distribution:Debian+2|Fedora|Gentoo;Free Software Foundation+5}
logical structure produced, assuming this is the first text "object":
Free Software Foundation, 1-6
GNU/Linux community distribution, 1
  Debian, 1-3
  Fedora, 1
  Gentoo,
0.66
(2008-02-24:07/7) same as previous, adds semantic tags, [experimental and not-used]
/[:;]{.+?}[:;][a-z+]/
0.57
(2007w34/4)
SiSU 0.57 is the same as 0.42 with the introduction of a shortcut to use the
headers @title and @creator in the first heading [expanded using the contents
of the headers @title: and @author:]
:A~ @title by @author
0.52
(2007w14/6) declared document type identifier at start of text/document:
SiSU 0.52
or, backward compatible using the comment marker:
% SiSU 0.38
variations include 'SiSU (text|master|insert) [version]' and 'sisu-[version]'
0.51
(2007w13/6) skins changed (simplified), markup unchanged
0.42
(2006w27/4) * (asterisk) type endnotes, used e.g. in relation to author
SiSU 0.42 is the same as 0.38 with the introduction of some additional endnote
types. It introduces some variations on endnotes, in particular the use of the
asterisk:
~{* for example for describing an author }~ and ~{** for describing a second author }~
* for example for describing an author
** for describing a second author
and
~[* my note ]~ or ~[+ another note ]~
which numerically increments an asterisk and plus respectively
*1 my note
+1 another note
0.38
(2006w15/7) introduced new/alternative notation for headers, e.g. @title:
(instead of 0~title), and accompanying document structure markup,
:A,:B,:C,1,2,3 (maps to previous 1,2,3,4,5,6)
SiSU 0.38 introduced alternative experimental header and heading/structure
markers,
@headername: and headings :A~ :B~ :C~ 1~ 2~ 3~
as the equivalent of:
0~headername and headings 1~ 2~ 3~ 4~ 5~ 6~
The internal document markup of SiSU 0.16 remains valid and standard, though
note that SiSU 0.37 introduced a new file naming convention.
SiSU has in effect two sets of levels to be considered. Using 0.38 notation
these are the A-C headings/levels, which precede ordinary paragraphs
(pre-substantive text), and the 1-3 headings/levels, which are followed by
ordinary text. This may be conceptualised as levels A,B,C, 1,2,3, and using
such letter number notation, in effect: A must exist, optional B and C may
follow in sequence (not strict); 1 must exist, optional 2 and 3 may follow in
sequence. That is, there are two independent heading level sequences A,B,C and
1,2,3 (using the 0.16 standard notation, 1,2,3 and 4,5,6). On the positive
side, the 0.38 A,B,C,1,2,3 alternative makes explicit an aspect of structuring
documents in SiSU that is not otherwise obvious to the newcomer (though it
appears more complicated, it is more in your face and likely to be understood
fairly quickly); the substantive text follows levels 1,2,3 and it is 'nice' to
do most work in those levels.
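The correspondence between the 0.16 heading markers and the 0.38 notation can be written down as a small lookup table. The Ruby sketch below is illustrative only; the names LEVEL_MAP and to_038_heading are assumptions and nothing of the kind is claimed to exist in SiSU.

```ruby
# Map 0.16-style heading markers (1~ .. 6~) to the 0.38 notation
# (:A~ :B~ :C~ 1~ 2~ 3~). Hypothetical names, for illustration only.
LEVEL_MAP = {
  '1~' => ':A~', '2~' => ':B~', '3~' => ':C~',
  '4~' => '1~',  '5~' => '2~',  '6~' => '3~'
}.freeze

# Rewrite the heading marker at the start of a line, leaving every
# other line untouched.
def to_038_heading(line)
  line.sub(/\A([1-6]~)/) { LEVEL_MAP.fetch(Regexp.last_match(1)) }
end
```

Because 1~, 2~ and 3~ exist in both notations with different meanings, a conversion like this is only safe on files known to use the 0.16 notation throughout.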
0.37
(2006w09/7) introduced new file naming convention, .sst (text), .ssm
(master), .ssi (insert), markup syntax unchanged
SiSU 0.37 introduced a new file naming convention, using the file extensions
.sst .ssm and .ssi to replace .s1 .s2 .s3 .r1 .r2 .r3 and .si
this is captured by the following file 'rename' instruction:
rename 's/\.s[123]$/\.sst/' *.s{1,2,3}
rename 's/\.r[123]$/\.ssm/' *.r{1,2,3}
rename 's/\.si$/\.ssi/' *.si
The internal document markup remains unchanged, from
SiSU
0.16
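Where the rename utility is not available, the same mapping of file extensions can be expressed in a few lines of Ruby. This is a sketch under that assumption; the method name new_sisu_name is hypothetical.

```ruby
# Return the 0.37 file name for a pre-0.37 SiSU source file name:
#   .s1 .s2 .s3 -> .sst, .r1 .r2 .r3 -> .ssm, .si -> .ssi
# Names with any other extension are returned unchanged.
# Hypothetical helper for illustration; not part of SiSU.
def new_sisu_name(old_name)
  old_name
    .sub(/\.s[123]\z/, '.sst')
    .sub(/\.r[123]\z/, '.ssm')
    .sub(/\.si\z/, '.ssi')
end

# To rename the files in the current directory one could then run:
#   Dir.glob('*.{s1,s2,s3,r1,r2,r3,si}').each do |f|
#     File.rename(f, new_sisu_name(f))
#   end
```

The renaming loop is left commented out so that loading the sketch does not touch any files.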
0.35
(2005w52/3) sisupod, zipped content file introduced
0.23
(2005w36/2) utf-8 for markup file
0.22
(2005w35/3) image dimensions may be omitted if rmagick is available to be
relied upon
0.20.4
(2005w33/4) header 0~links
0.16
(2005w25/2) substantial changes introduced to make markup cleaner, header
0~title type, and headings [1-6]~ introduced, also percentage sign (%) at
start of a text line as comment marker
SiSU 0.16 (0.15 development branch) introduced the use of
the header 0~ and headings/structure 1~ 2~ 3~ 4~ 5~ 6~
in place of the 0.1 header, heading/structure notation.
In SiSU 0.1, headers and headings structure were represented by header 0{~ and
headings/structure 1{ 2{ 3{ 4{~ 5{ 6{
SiSU
has plaintext and binary filetypes, and can process either type of document.
SiSU
documents are prepared as plain-text (utf-8) files with
SiSU
markup. They may make reference to and contain images (for example), which are
stored in the directory beneath them _sisu/image.
SiSU
plaintext markup files are of three types that may be distinguished by the file
extension used: regular text .sst; master documents .ssm, composite documents
that incorporate other text, which can be any regular text or text insert; and
inserts, the contents of which are like regular text except these are marked
.ssi and are not processed on their own.
SiSU
processing can be done directly against sisu documents, which may be located
locally or on a remote server for which a url is provided.
SiSU
source markup can be shared with the command:
sisu -s [filename]
Regular text files (.sst) are the most common form of document in SiSU; see
the section on SiSU markup.
<http://www.sisudoc.org/sisu/sisu_markup>
<http://www.sisudoc.org/sisu/sisu_manual>
Master documents (.ssm) are composite documents which incorporate other SiSU
documents. These may be either regular SiSU text (.sst), which may be generated
independently, or inserts prepared solely for the purpose of being incorporated
into one or more master documents.
The mechanism by which master files incorporate other documents is described as
one of the headings under
SiSU
markup in the
SiSU
manual.
Note: Master documents may be prepared in a similar way to regular documents,
and processing will occur normally if a .sst file is renamed .ssm without
requiring any other documents; the .ssm marker flags that the document may
contain other documents.
Note: a secondary file of the composite document is built prior to processing
with the same prefix and the suffix ._sst [^18]
<http://www.sisudoc.org/sisu/sisu_markup>
<http://www.sisudoc.org/sisu/sisu_manual>
Inserts are documents prepared solely for the purpose of being incorporated
into one or more master documents. They resemble regular
SiSU
text files except they are ignored by the
SiSU
processor. Making a file a .ssi file is a quick and convenient way of flagging
that it is not intended that the file should be processed on its own.
A sisupod is a zipped
SiSU
text file or set of
SiSU
text files and any associated images that they contain (this will be extended
to include sound and multimedia-files)
SiSU
plaintext files rely on a recognised directory structure to find contents such
as images associated with documents; for example, the images for all documents
contained in a directory are located in the sub-directory _sisu/image. Without
the ability to create a sisupod it can be inconvenient to manually identify all
other files associated with a document. A sisupod automatically bundles all
associated files with the document that is turned into a pod.
The structure of the sisupod is such that it may for example contain a single
document and its associated images; a master document and its associated
documents and anything else; or the zipped contents of a whole directory of
prepared
SiSU
documents.
The command to create a sisupod is:
sisu -S [filename]
Alternatively, make a pod of the contents of a whole directory:
sisu -S
SiSU
processing can be done directly against a sisupod; which may be located locally
or on a remote server for which a url is provided.
<http://www.sisudoc.org/sisu/sisu_commands>
<http://www.sisudoc.org/sisu/sisu_manual>
SiSU
offers alternative XML input representations of documents as a proof of
concept, experimental feature. They are however not strictly maintained, and
incomplete and should be handled with care.
convert from sst to simple xml representations (sax, dom and node):
sisu --to-sax [filename/wildcard] or sisu --to-sxs [filename/wildcard]
sisu --to-dom [filename/wildcard] or sisu --to-sxd [filename/wildcard]
sisu --to-node [filename/wildcard] or sisu --to-sxn [filename/wildcard]
convert to sst from any sisu xml representation (sax, dom and node):
sisu --from-xml2sst [filename/wildcard [.sxs.xml,.sxd.xml,.sxn.xml]]
or the same:
sisu --from-sxml [filename/wildcard [.sxs.xml,.sxd.xml,.sxn.xml]]
Information on the current configuration of
SiSU
should be available with the help command:
sisu -v
which is an alias for:
sisu --help env
Either of these should be executed from within a directory that contains sisu
markup source documents.
SiSU
configuration parameters are adjusted in the configuration file, which can be
used to override the defaults set. This includes such things as which directory
interim processing should be done in and where the generated output should be
placed.
The
SiSU
configuration file is a yaml file, which means indentation is significant.
SiSU
resource configuration is determined by looking at the following files if they
exist:
./_sisu/sisurc.yml
~/.sisu/sisurc.yml
/etc/sisu/sisurc.yml
The search is in the order listed, and the first one found is used.
In the absence of instructions in any of these it falls back to the internal
program defaults.
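The lookup order described above amounts to "first existing file wins". A minimal Ruby sketch follows, assuming nothing about SiSU's internals: the constant SISURC_CANDIDATES, the method find_sisurc and its exists: parameter (used to make the check replaceable in tests) are all hypothetical names.

```ruby
# Candidate locations, in the order in which they are searched.
SISURC_CANDIDATES = [
  './_sisu/sisurc.yml',
  '~/.sisu/sisurc.yml',
  '/etc/sisu/sisurc.yml'
].freeze

# Return the expanded path of the first candidate that exists, or nil
# when none exists (the case in which internal defaults would apply).
# Hypothetical helper for illustration; not part of SiSU.
def find_sisurc(candidates = SISURC_CANDIDATES, exists: File.method(:exist?))
  candidates.map { |path| File.expand_path(path) }
            .find { |path| exists.call(path) }
end
```

A nil result corresponds to the fallback to program defaults; later candidates are never consulted once an earlier one is found.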
Configuration determines the output and processing directories and the database
access details.
If
SiSU
is installed a sample sisurc.yml may be found in /etc/sisu/sisurc.yml
Skins modify the default appearance of document output on a document,
directory, or site wide basis. Skins are looked for in the following locations:
./_sisu/skin
~/.sisu/skin
/etc/sisu/skin
Within the skin directory are the following default sub-directories for
document skins:
./skin/doc
./skin/dir
./skin/site
A skin is placed in the appropriate directory and the file named skin_[name].rb
The skin itself is a ruby file which modifies the default appearances set in
the program.
Documents take on a document skin, if the header of the document specifies a
skin to be used.
@skin: skin_united_nations
A directory may be mapped on to a particular skin, so all documents within that
directory take on a particular appearance. If a skin exists in the skin/dir
with the same name as the document directory, it will automatically be used for
each of the documents in that directory, (except where a document specifies the
use of another skin, in the skin/doc directory).
A personal habit is to place all skins within the doc directory, and to create
symbolic links to them from the site or dir directories as required.
A site skin modifies the program default skin.
With
SiSU
installed sample skins may be found in:
/etc/sisu/skin/doc and
/usr/share/doc/sisu/markup-samples/samples/_sisu/skin/doc
(or equivalent directory) and if sisu-markup-samples is installed also under:
/usr/share/doc/sisu/markup-samples-non-free/samples/_sisu/skin/doc
Samples of list.yml and promo.yml (which are used to create the right column
list) may be found in:
/usr/share/doc/sisu/markup-samples-non-free/samples/_sisu/skin/yml (or
equivalent directory)
CSS files to modify the appearance of
SiSU
html, XHTML or XML may be placed in the configuration directory (./_sisu/css,
~/.sisu/css or /etc/sisu/css) and these will be copied to the output
directories with the command sisu -CC.
The basic CSS file for html output is html.css; placing a file of that name in
directory _sisu/css or equivalent will result in the default file of that name
being overwritten.
HTML: html.css
XML DOM: dom.css
XML SAX: sax.css
XHTML: xhtml.css
The default homepage may use homepage.css or html.css
Under consideration is to permit the placement of a CSS file with a different
name in directory _sisu/css or equivalent, and to change the default CSS file
that is looked for in a skin.[^19]
SiSU
v3 has new options for the source directory tree, and output directory
structures of which there are 3 alternatives.
The document source directory is the directory in which sisu processing
commands are given. It contains the sisu source files (.sst .ssm .ssi), or
(for sisu v3 may contain) subdirectories with language codes which contain the
sisu source files, so all English files would go in subdirectory en/, French in
fr/, Spanish in es/ and so on. ISO 639-1 codes are used (as varied by po4a). A
list of available languages (and possible sub-directory names) can be obtained
with the command "sisu --help lang". The list of languages is limited to
languages supported by XeTeX polyglossia.
./subject_name/
% files stored at this level e.g. sisu_manual.sst or
% for sisu v3 may be under language sub-directories
% e.g. en/ fr/ es/
./subject_name/_sisu
% configuration file e.g. sisurc.yml
./subject_name/_sisu/skin
% skins in various skin directories doc, dir, site, yml
The output directory root can be set in the sisurc.yml file. Under the root,
subdirectories are made for each directory in which a document set resides. If
you have a directory named poems or conventions, that directory will be created
under the output directory root and the output for all documents contained in
the directory of a particular name will be generated to subdirectories beneath
that directory (poems or conventions). A document will be placed in a
subdirectory of the same name as the document with the filetype identifier
stripped (.sst .ssm).
The last part of a directory path, representing the sub-directory in which a
document set resides, is the directory name that will be used for the output
directory. This has implications for the organisation of document collections
as it could make sense to place documents of a particular subject, or type
within a directory identifying them. This grouping as suggested could be by
subject (sales_law, english_literature); or just as conveniently by some other
classification (X University). The mapping means it is also possible to place
in the same output directory documents that are for organisational purposes
kept separately, for example documents on a given subject of two different
institutions may be kept in two different directories of the same name, under a
directory named after each institution, and these would be output to the same
output directory. Skins could be associated with each institution on a
directory basis and resulting documents will take on the appropriate different
appearance.
There are 3 possible output structures, described as being by language, by
filetype or by filename; the selection is made in sisurc.yml
#% output_dir_structure_by: language; filetype; or filename
output_dir_structure_by: language   #(language & filetype, preferred?)
#output_dir_structure_by: filetype
#output_dir_structure_by: filename  #(default, closest to original v1 & v2)
The by language directory structure separates output files by language code
(all files of a given language), and within the language directory by filetype.
Its selection is configured in sisurc.yml
output_dir_structure_by: language
|-- en
|-- epub
|-- hashes
|-- html
|   |-- viral_spiral.david_bollier
|   |-- manifest
|   |-- qrcode
|   |-- odt
|   |-- pdf
|   |-- sitemaps
|   |-- txt
|   |-- xhtml
|   `-- xml
|-- po4a
|   `-- live-manual
|       |-- po
|       |-- fr
|       `-- pot
`-- _sisu
    |-- css
    |-- image
    |-- image_sys -> ../../_sisu/image_sys
    `-- xml
        |-- rnc
        |-- rng
        `-- xsd
#by: language
subject_dir/en/manifest/filename.html
The by filetype directory structure separates output files by filetype, all
html files in one directory pdfs in another and so on. Filenames are given a
language extension.
Its selection is configured in sisurc.yml
output_dir_structure_by: filetype
|-- epub
|-- hashes
|-- html
|-- viral_spiral.david_bollier
|-- manifest
|-- qrcode
|-- odt
|-- pdf
|-- po4a
|-- live-manual
|   |-- po
|   |-- fr
|   `-- pot
|-- _sisu
|   |-- css
|   |-- image
|   |-- image_sys -> ../../_sisu/image_sys
|   `-- xml
|       |-- rnc
|       |-- rng
|       `-- xsd
|-- sitemaps
|-- txt
|-- xhtml
`-- xml
#by: filetype
subject_dir/html/filename/manifest.en.html
The by filename directory structure places most output of a particular file
(the different filetypes) in a common directory.
Its selection is configured in sisurc.yml
output_dir_structure_by: filename
|-- epub
|-- po4a
|-- live-manual
|   |-- po
|   |-- fr
|   `-- pot
|-- _sisu
|   |-- css
|   |-- image
|   |-- image_sys -> ../../_sisu/image_sys
|   `-- xml
|       |-- rnc
|       |-- rng
|       `-- xsd
|-- sitemaps
|-- src
|-- pod
`-- viral_spiral.david_bollier
#by: filename
subject_dir/filename/manifest.en.html
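The three "#by:" patterns given for the manifest file can be compared side by side in a short Ruby sketch. The method name manifest_path is hypothetical, and substituting a language code other than "en" into the patterns is an assumption drawn from the examples rather than something stated in this manual.

```ruby
# Build the manifest path for each output_dir_structure_by setting,
# following the three "#by:" patterns given in this section.
# Hypothetical helper for illustration; not part of SiSU.
def manifest_path(structure, subject_dir, filename, lang = 'en')
  case structure
  when 'language'
    File.join(subject_dir, lang, 'manifest', "#{filename}.html")
  when 'filetype'
    File.join(subject_dir, 'html', filename, "manifest.#{lang}.html")
  when 'filename'
    File.join(subject_dir, filename, "manifest.#{lang}.html")
  else
    raise ArgumentError, "unknown structure: #{structure}"
  end
end
```

Only in the by language structure does the language appear as a directory; in the other two it is carried in the file name.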
./subject_name/
% containing sub_directories named after the generated files from which they are made
./subject_name/src
% contains shared source files text and binary e.g. sisu_manual.sst and sisu_manual.sst.zip
./subject_name/_sisu
% configuration file e.g. sisurc.yml
./subject_name/_sisu/skin
% skins in various skin directories doc, dir, site, yml
./subject_name/_sisu/css
./subject_name/_sisu/image
% images for documents contained in this directory
./subject_name/_sisu/mm

./sisupod/
% files stored at this level e.g. sisu_manual.sst
./sisupod/_sisu
% configuration file e.g. sisurc.yml
./sisupod/_sisu/skin
% skins in various skin directories doc, dir, site, yml
./sisupod/_sisu/css
./sisupod/_sisu/image
% images for documents contained in this directory
./sisupod/_sisu/mm
SiSU
is about the ability to auto-generate documents. Home pages are regarded as
custom built items, and are not created by
SiSU.
More accurately,
SiSU
has a default home page, which will not be appropriate for use with other
sites, and the means to provide your own home page instead in one of two ways
as part of a site's configuration, these being:
1. through placing your home page and other custom built documents in the
subdirectory _sisu/home/ (this probably being the easier and more convenient
option)
2. through providing what you want as the home page in a skin.
Document sets are contained in directories, usually organised by site or
subject. Each directory can/should have its own homepage. See the section on
directory structure and organisation of content.
Custom built pages, including the home page index.html, may be placed within the
configuration directory _sisu/home/ in any of the locations that are searched
for the configuration directory, namely ./_sisu ; ~/.sisu ; /etc/sisu. From
there they are copied to the root of the output directory with the command:
sisu -CC
Skins are described in a separate section, but basically are a file written in
the programming language Ruby that may be provided to change the defaults that
are provided with sisu with respect to individual documents, a directory's
contents or for a site.
If you wish to provide a homepage within a skin the skin should be in the
directory _sisu/skin/dir and have the name of the directory for which it is to
become the home page. Documents in the directory commercial_law would have the
homepage modified in skin_commercial_law.rb; or the directory poems in
skin_poems.rb
class Home
  def homepage
    # place the html content of your homepage here, this will become index.html
    <<HOME
<html>
<head></head>
<doc>
<p>this is my new homepage.</p>
</doc>
</html>
HOME
  end
end
Current markup examples and document output samples are provided at
<http://www.jus.uio.no/sisu/SiSU/examples.html>
For some documents hardly any markup is required at all, other than a header
and an indication of the levels to be taken into account by the program in
generating its output.
SiSU
output can easily and conveniently be indexed by a number of standalone
indexing tools, such as Lucene, Hyperestraier.
Because the document structure of sites created is clearly defined, and the
text object citation system is available hypothetically at least, for all forms
of output, it is possible to search the sql database, and either read results
from that database, or just as simply map the results to the html output, which
has richer text markup.
In addition to this
SiSU
has the ability to populate a relational sql type database with documents at an
object level, with object numbers that are shared across different output
types, which makes them searchable with that degree of granularity. Basically,
your match criteria is met by these documents and at these locations within
each document, which can be viewed within the database directly or in various
output formats.
SiSU
feeds sisu marked up documents into the sql type databases PostgreSQL[^20]
and/or SQLite[^21], together with information related to document structure.
This is one of the more interesting output forms, as all the structural data of
the documents are retained (though can be ignored by the user of the database
should they so choose). All site texts/documents are (currently) streamed to
four tables:
* one containing semantic (and other) headers, including, title, author,
subject, (the Dublin Core...);
* another the substantive texts by individual "paragraph" (or object) -
along with structural information, each paragraph being identifiable by its
paragraph number (if it has one which almost all of them do), and the
substantive text of each paragraph quite naturally being searchable (both in
formatted and clean text versions for searching); and
* a third containing endnotes cross-referenced back to the paragraph from
which they are referenced (both in formatted and clean text versions for
searching).
* a fourth table with a one to one relation with the headers table contains
full text versions of output, eg. pdf, html, xml, and ascii.
There is of course the possibility to add further structures.
At this level
SiSU
loads a relational database with documents chunked into objects, their smallest
logical structurally constituent parts, as text objects, with their object
citation number and all other structural information needed to construct the
document. Text is stored (at this text object level) with and without
elementary markup tagging, the stripped version being so as to facilitate ease
of searching.
Being able to search a relational database at an object level with the
SiSU
citation system is an effective way of locating content generated by
SiSU.
As individual text objects of a document are stored (and indexed) together with
object numbers, and all versions of the document have the same numbering,
complex searches can be tailored to return just the locations of the search
results relevant for all available output formats, with live links to the
precise locations in the database or in html/xml documents; or, the structural
information provided makes it possible to search the full contents of the
database and have headings in which search content appears, or to search only
headings etc. (as the Dublin Core is incorporated it is easy to make use of
that as well).
SiSU
- Structured information, Serialized Units - a document publishing system,
postgresql dependency package
Information related to using postgresql with sisu (and related to the
sisu_postgresql dependency package, which is a dummy package to install
dependencies needed for
SiSU
to populate a postgresql database, this being part of
SiSU
- man sisu).
sisu -D [instruction] [filename/wildcard if required]
sisu -D --pg --[instruction] [filename/wildcard if required]
Mappings to two databases are provided by default, postgresql and sqlite, the
same commands are used within sisu to construct and populate databases however
-d (lowercase) denotes sqlite and -D (uppercase) denotes postgresql,
alternatively --sqlite or --pgsql may be used
-D or --pgsql
may be used interchangeably.
SiSU
- Structured information, Serialized Units - a document publishing system.
Information related to using sqlite with sisu (and related to the sisu_sqlite
dependency package, which is a dummy package to install dependencies needed for
SiSU
to populate an sqlite database, this being part of
SiSU
- man sisu).
sisu -d [instruction] [filename/wildcard if required]
sisu -d --(sqlite|pg) --[instruction] [filename/wildcard if required]
Mappings to two databases are provided by default, postgresql and sqlite, the
same commands are used within sisu to construct and populate databases however
-d (lowercase) denotes sqlite and -D (uppercase) denotes postgresql,
alternatively --sqlite or --pgsql may be used
-d or --sqlite
may be used interchangeably.
Sample search frontend <http://search.sisudoc.org> [^22] A small database and
sample query front-end (search form) that makes use of the citation system,
object citation numbering, to demonstrate functionality.[^23]
SiSU
can provide information on which documents are matched and at what locations
within each document the matches are found. These results are relevant across
all outputs using object citation numbering, which includes html, XML, EPUB,
LaTeX, PDF and indeed the SQL database. You can then refer to one of the other
outputs or in the SQL database expand the text within the matched objects
(paragraphs) in the documents matched.
Note you may set results either for documents matched and object number
locations within each matched document meeting the search criteria; or display
the names of the documents matched along with the objects (paragraphs) that
meet the search criteria.[^24]
The following is feedback on the setup on a machine provided by the help
command:
sisu --help sql
Postgresql
  user:             ralph
  current db set:   SiSU_sisu
  port:             5432
  dbi connect:      DBI:Pg:database=SiSU_sisu;port=5432
sqlite
  current db set:   /home/ralph/sisu_www/sisu/sisu_sqlite.db
  dbi connect       DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
Note on databases built
By default, [unless otherwise specified] databases are built on a directory
basis, from collections of documents within that directory. The name of the
directory you choose to work from is used as the database name, i.e. if you are
working in a directory called /home/ralph/ebook the database SiSU_ebook is
used. [otherwise a manual mapping for the collection is necessary]
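The default naming rule can be stated in one line of Ruby. This is only a restatement of the rule described above; the method name default_db_name is hypothetical.

```ruby
# Derive the default database name from the working directory:
# the last path component prefixed with "SiSU_".
# Hypothetical helper for illustration; not part of SiSU.
def default_db_name(working_dir)
  "SiSU_#{File.basename(working_dir)}"
end
```

Two directories with the same final name in different places would therefore map to the same database name, which is where a manual mapping becomes necessary.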
The generated search form must be copied manually to the webserver directory as
instructed
SiSU
- Structured information, Serialized Units - a document publishing system
sisu_webrick [port]
or
sisu -W [port]
sisu_webrick is part of SiSU (man sisu). sisu_webrick starts the Ruby Webrick
web-server and points it to the directories to which SiSU output is written,
providing a list of these directories (assuming SiSU is in use and they exist).
The default port for sisu_webrick is set to 8081, this may be modified in the
yaml file: ~/.sisu/sisurc.yml a sample of which is provided as
/etc/sisu/sisurc.yml (or in the equivalent directory on your system).
sisu_webrick may be started on its own with the command: sisu_webrick [port]
or using the sisu command with the -W flag: sisu -W [port]
Where no port is given and settings are unchanged the default port is 8081.
sisu -W [port] starts
Ruby
Webrick web-server, serving
SiSU
output directories, on the port provided, or if no port is provided and the
defaults have not been changed in ~/.sisu/sisurc.yml then on port 8081
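The order of precedence for the port (command line, then configuration, then the built-in default of 8081) can be sketched as follows. The names DEFAULT_WEBRICK_PORT and webrick_port are assumptions, and how the configured value is read from sisurc.yml is deliberately left out.

```ruby
# Built-in default port, as stated in this section.
DEFAULT_WEBRICK_PORT = 8081

# Pick the port: the command line value if given, otherwise the value
# from the configuration file if set, otherwise the default.
# Hypothetical helper for illustration; not part of SiSU.
def webrick_port(cli_port = nil, configured_port = nil)
  Integer(cli_port || configured_port || DEFAULT_WEBRICK_PORT)
end
```

Integer() accepts both numbers and numeric strings, so a port passed as a command line argument string is handled, and a non-numeric value raises an error rather than being silently ignored.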
For more information on
SiSU
see: <http://www.sisudoc.org/> or <http://www.jus.uio.no/sisu>
or man sisu
Ralph Amissah <ralph@amissah.com> or <ralph.amissah@gmail.com>
sisu(1)
sisu_vim(7)
SiSU
processing instructions can be run against remote source documents by providing
the url of the documents against which the processing instructions are to be
carried out. The remote
SiSU
documents can either be sisu marked up files in plaintext (.sst or .ssm), or
zipped sisu files (sisupod.zip or filename.ssp).
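The two kinds of remote source can be told apart by file extension. The Ruby sketch below uses only the standard uri library; the method name remote_source_kind, the returned symbols and the example.org addresses in the comments are assumptions made for illustration, not part of SiSU.

```ruby
require 'uri'

# Classify a remote source by the extension of the url path:
# .sst/.ssm are plaintext markup, .zip/.ssp are sisupods.
# Hypothetical helper for illustration; not part of SiSU.
def remote_source_kind(url)
  path = URI.parse(url).path.to_s
  case File.extname(path)
  when '.sst', '.ssm' then :markup
  when '.zip', '.ssp' then :sisupod
  else :unknown
  end
end

# e.g. remote_source_kind('http://example.org/docs/sample.sst')
#      remote_source_kind('http://example.org/docs/sisupod.zip')
```

Anything with another extension is reported as unknown rather than guessed at.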
.sst / .ssm - sisu text files
SiSU
can be run against source text files on a remote machine: provide the
processing instruction and the url. The source file and any associated parts
(such as images) will be downloaded and generated locally.
sisu -3 http://[provide url to valid .sst or .ssm file]
Any of the source documents in the sisu examples page can be used in this way,
see <http://www.jus.uio.no/sisu/SiSU/examples.html> and use the url to the
.sst for the desired document.
NOTE: to set up a remote machine to serve
SiSU
documents in this way, images should be in the directory relative to the
document source ../_sisu/image
sisupod - zipped sisu files
A sisupod is the zipped content of a sisu marked up text or texts and any other
associated parts to the document such as images.
SiSU
can be run against a sisupod on a (local or) remote machine: provide the
processing instruction and the url, and the sisupod will be downloaded and the
documents it contains generated locally.
sisu -3 http://[provide url to valid sisupod.zip or .ssp file]
Any of the source documents in the sisu examples page can be used in this way,
see <http://www.jus.uio.no/sisu/SiSU/examples.html> and use the url for the
desired document.
Once properly configured
SiSU
output can be automatically posted once generated to a designated remote
machine using either rsync, or scp.
In order to do this some ssh authentication agent and keychain or similar tool
will need to be configured. Once that is done the placement on a remote host
can be done seamlessly with the -r (for scp) or -R (for rsync) flag, which
may be used in conjunction with other processing flags, e.g.
sisu -3R sisu_remote.sst
[expand on the setting up of an ssh-agent / keychain]
As
SiSU
is generally operated using the command line, and works within a Unix type
environment,
SiSU
the program and all documents can just as easily be on a remote server, to
which you are logged on using a terminal, and commands and operations would be
pretty much the same as they would be on your local machine.
Installation is currently most straightforward and tested on the
Debian
platform, as there are packages for the installation of sisu and all
requirements for what it does.
SiSU
is available directly from the
Debian
Sid and testing archives (and possibly Ubuntu), assuming your
/etc/apt/sources.list is set accordingly:
aptitude update
aptitude install sisu-complete
The following /etc/apt/sources.list setting permits the download of additional
markup samples:
#/etc/apt/sources.list
deb http://ftp.fi.debian.org/debian/ unstable main non-free contrib
deb-src http://ftp.fi.debian.org/debian/ unstable main non-free contrib
The aptitude commands become:
aptitude update
aptitude install sisu-complete sisu-markup-samples
If there are newer versions of
SiSU
upstream of the
Debian
archives, they will be available by adding the following to your
/etc/apt/sources.list
#/etc/apt/sources.list
deb http://www.jus.uio.no/sisu/archive unstable main non-free
deb-src http://www.jus.uio.no/sisu/archive unstable main non-free
repeat the aptitude commands
aptitude update
aptitude install sisu-complete sisu-markup-samples
Note however that it is not necessary to install sisu-complete if not all
components of sisu are to be used. Installing just the package sisu will
provide basic functionality.
RPMs are provided, though untested; they are prepared by running alien against
the source package and against the debs.
They may be downloaded from:
<http://www.jus.uio.no/sisu/SiSU/download.html#rpm>
as root type:
rpm -i [rpm package name]
To install
SiSU
from source check information at:
<http://www.jus.uio.no/sisu/SiSU/download.html#current>
* download the source package
* Unpack the source
Two alternative modes of installation from source are provided: setup.rb (by
Minero Aoki) and a rant (by Stefan Lang) built install file. In either case the
first steps are the same: download and unpack the source file.
For basic use
SiSU
is only dependent on the programming language in which it is written
Ruby,
and
SiSU
will be able to generate html, EPUB, various XMLs, including ODF (and will also
produce LaTeX). Further actions, such as using a database (postgresql or
sqlite)[^25] or converting LaTeX to pdf, rely on the installation of additional
dependencies, which the source tarball does not take care of.
setup.rb
This is a standard ruby installer; using setup.rb is a three step process. In
the root directory of the unpacked
SiSU
as root type:
ruby setup.rb config
ruby setup.rb setup
#[and as root:]
ruby setup.rb install
further information on setup.rb is available from:
<http://i.loveruby.net/en/projects/setup/>
<http://i.loveruby.net/en/projects/setup/doc/usage.html>
install
The "install" file provided is an installer prepared using "rant". In the root
directory of the unpacked
SiSU
as root type:
ruby install base
or for a more complete installation:
ruby install
This makes use of Rant (by Stefan Lang) and the provided Rantfile. It has been
configured to do post installation setup, configuration and generation of a
first test file. Note however, that additional external package dependencies,
such as tetex-extra, are not taken care of for you.
Further information on "rant" is available from:
<http://rubyforge.org/frs/?group_id=615>
For a list of alternative actions you may type:
ruby install help
ruby install -T
To check which version of sisu is installed:
sisu -v
Depending on your mode of installation one or a number of markup sample files
may be found either in the directory:
or
change directory to the appropriate one:
cd /usr/share/doc/sisu/markup-samples/samples
Having moved to the directory that contains the markup samples (see
instructions above if necessary), choose a file and run sisu against it
sisu -NhwoabxXyv free_as_in_freedom.rms_and_free_software.sam_williams.sst
this will generate html including a concordance file, opendocument text format,
plaintext, XHTML and various forms of XML.
Assuming a LaTeX engine such as tetex or texlive is installed with the required
modules (done automatically on selection of sisu-pdf in
Debian)
Having moved to the directory that contains the markup samples (see
instructions above if necessary), choose a file and run sisu against it
sisu -pv free_as_in_freedom.rms_and_free_software.sam_williams.sst
sisu -3 free_as_in_freedom.rms_and_free_software.sam_williams.sst
should generate most available output formats: html including a concordance
file, opendocument text format, plaintext, XHTML and various forms of XML, and
pdf
Relational databases need some setting up - you must have permission to create
the database and write to it when you run sisu.
Assuming you have the database installed and the requisite permissions
sisu --sqlite --recreate
sisu --sqlite -v --import free_as_in_freedom.rms_and_free_software.sam_williams.sst
sisu --pgsql --recreate
sisu --pgsql -v --import free_as_in_freedom.rms_and_free_software.sam_williams.sst
Type:
man sisu
The man pages are also available online, though not always kept as up to date
as within the package itself:
* sisu.1 <http://www.jus.uio.no/sisu/man/sisu.1.html> [^26]
* sisu.8 <http://www.jus.uio.no/sisu/man/sisu.8.html> [^27]
* man directory <http://www.jus.uio.no/sisu/man> [^28]
sisu --help
sisu --help --env
sisu --help --commands
sisu --help --markup
<http://www.jus.uio.no/sisu/SiSU>
A number of markup samples (along with output) are available from:
<http://www.jus.uio.no/sisu/SiSU/examples.html>
Additional markup samples are packaged separately in the file:
***
On
Debian
they are available in non-free;[^29] to include them it is necessary to include
non-free in your /etc/apt/sources.list, or to obtain them from the sisu home
site.
The directory:
./data/sisu/v2/conf/editor-syntax-etc/
./data/sisu/v3/conf/editor-syntax-etc/
/usr/share/sisu/v2/conf/editor-syntax-etc
/usr/share/sisu/v3/conf/editor-syntax-etc
contains rudimentary sisu syntax highlighting files for:
* (g)vim <http://www.vim.org>
package: sisu-vim
status: largely done
there is a vim syntax highlighting and folds component
* gedit <http://www.gnome.org/projects/gedit>
* gobby <http://gobby.0x539.de/>
file: sisu.lang
place in:
/usr/share/gtksourceview-1.0/language-specs
or
~/.gnome2/gtksourceview-1.0/language-specs
status: very basic syntax highlighting
comments: this editor features display line wrap and is used by Gobby!
* nano <http://www.nano-editor.org>
file: nanorc
save as:
~/.nanorc
status: basic syntax highlighting
comments: assumes dark background; no display line-wrap; does line breaks
* diakonos (an editor written in ruby) <http://purepistos.net/diakonos>
file: diakonos.conf
save as:
~/.diakonos/diakonos.conf
includes:
status: basic syntax highlighting
comments: assumes dark background; no display line-wrap
* kate & kwrite <http://kate.kde.org>
file: sisu.xml
place in:
/usr/share/apps/katepart/syntax
or
~/.kde/share/apps/katepart/syntax
[settings::configure kate::{highlighting,filetypes}]
[tools::highlighting::{markup,scripts}::SiSU]
* nedit <http://www.nedit.org>
file: sisu_nedit.pats
nedit -import sisu_nedit.pats
status: a very clumsy first attempt [not really done]
comments: this editor features display line wrap
* emacs <http://www.gnu.org/software/emacs/emacs.html>
files: sisu-mode.el
to file ~/.emacs add the following 2 lines:
(add-to-list 'load-path "/usr/share/sisu/v2/conf/editor-syntax-etc/emacs")
(require 'sisu-mode.el)
[not done / not yet included]
* vim & gvim <http://www.vim.org>
files:
package is the most comprehensive sisu syntax highlighting and editor
environment provided to date (is for vim/ gvim, and is separate from the
contents of this directory)
status: this includes: syntax highlighting; vim folds; some error checking
comments: this editor features display line wrap
NOTE:
[SiSU parses files with long lines or line breaks, but display linewrap
(without line-breaks) is a convenient editor feature to have for sisu markup]
SiSU
markup is fairly minimalistic; it consists of: a (largely optional) document
header, made up of information about the document (such as when it was
published, who authored it, and granting what rights) and any processing
instructions; and markup within the substantive text of the document, which is
related to document structure and typeface.
SiSU
must be able to discern the structure of a document, (text headings and their
levels in relation to each other), either from information provided in the
document header or from markup within the text (or from a combination of both).
Processing is done against an abstraction of the document comprising
information on the document's structure and its objects,[2] which the program
serializes (providing the object numbers) and which are assigned hash sum
values based on their content. This abstraction of information about document
structure, objects, (and hash sums), provides considerable flexibility in
representing documents in different ways and for different purposes (e.g.
search, document layout, publishing, content certification, concordance etc.),
and makes it possible to take advantage of some of the strengths of established
ways of representing documents, (or indeed to create new ones).
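The idea of serialized objects with hash sums can be illustrated with a minimal
sketch. It is not SiSU's actual code: it assumes that objects are simply blocks
of text separated by blank lines, that SHA-256 is the digest in use, and the
names abstract_objects, :ocn and :digest are hypothetical.

```ruby
require 'digest'

# Illustrative only: split a text into objects on blank lines, number the
# objects sequentially, and attach a SHA-256 digest of each object's content.
# The hash keys (:ocn, :text, :digest) are assumptions made for this sketch.
def abstract_objects(source)
  blocks = source.split(/\n\s*\n/).map(&:strip).reject(&:empty?)
  blocks.each_with_index.map do |text, i|
    { ocn: i + 1, text: text, digest: Digest::SHA256.hexdigest(text) }
  end
end

sample = "A Heading\n\nThe first paragraph.\n\nThe second paragraph.\n"

# Print object number, a shortened digest, and the object's text.
abstract_objects(sample).each do |o|
  puts "#{o[:ocn]} #{o[:digest][0, 12]} #{o[:text]}"
end
```

Because the number depends only on position and the digest only on content,
either can be used to check that a given output still corresponds to the
source object it was generated from.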
* sparse/minimal markup (clean utf-8 source texts). Documents are prepared in
a single UTF-8 file using a minimalistic mnemonic syntax. Typical literature,
documents like "War and Peace", require almost no markup, and most of the
headers are optional.
* markup is easily readable/parsable by the human eye, (basic markup is simpler
and more sparse than the most basic HTML), [this may also be converted
to XML representations of the same input/source document].
* markup defines document structure (this may be done once in a header
pattern-match description, or for heading levels individually); basic text
attributes (bold, italics, underscore, strike-through etc.) as required; and
semantic information related to the document (header information, extended
beyond the Dublin core and easily further extended as required); the headers
may also contain processing instructions.
SiSU
markup is primarily an abstraction of document structure and document metadata
to permit taking advantage of the basic strengths of existing alternative
practical standard ways of representing documents (html, epub, xml, odf, latex,
pdf, sql), be that for browser viewing, paper publication, sql search etc.
* for output, produces reasonably elegant output in established industry and
institutionally accepted open standard formats,[3] and takes advantage of the
different strengths of various standard formats for representing documents;
amongst the output formats currently supported are:
* html - both as a single scrollable text and a segmented document
* xhtml
* epub
* XML - both in sax and dom style xml structures for further development as
required
* ODF - open document format, the iso standard for document storage
* LaTeX - used to generate pdf
* pdf (via LaTeX)
* sql - population of an sql database, (at the same object level that is
used to cite text within a document)
Also produces: concordance files; document content certificates (md5 or sha256
digests of headings, paragraphs, images etc.) and html manifests (and sitemaps
of content). It takes advantage of the strengths implicit in these very
different output types (e.g. PDFs produced using the typesetting of LaTeX, and
databases populated with documents at an individual object/paragraph level,
making possible granular search and related possibilities).
* ensuring content can be cited in a meaningful way regardless of selected
output format. Online publishing (and publishing in multiple document formats)
lacks a useful way of citing text internally within documents (important to
academics generally and to lawyers) as page numbers are meaningless across
browsers and formats. sisu seeks to provide a common way of pinpointing the
text within a document, (which can be utilized for citation and by search
engines).
The outputs share a common numbering system that is meaningful (to man and
machine) across all digital outputs whether paper, screen, or database
oriented, (pdf, HTML, EPUB, xml, sqlite, postgresql), this numbering system can
be used to reference content.
* Granular search within documents. SQL databases are populated at an object
level (roughly headings, paragraphs, verse, tables) and become searchable with
that degree of granularity, the output information provides the
object/paragraph numbers which are relevant across all generated outputs; it is
also possible to look at just the matching paragraphs of the documents in the
database; [output indexing also works well with search indexing tools like
hyperestraier].
* long term maintainability of document collections in a world of changing
formats, having a very sparsely marked-up source document base. There is a
considerable degree of future-proofing: output representations are
"upgradeable", and new document formats may be added without modification of
existing prepared texts, e.g. the addition of an odf (open document text)
module in 2006, epub in 2009, and html5 output sometime in future.
* SQL search aside, documents are generated as required and static once
generated.
* documents produced are static files, and may be batch processed, this needs
to be done only once but may be repeated for various reasons as desired
(updated content, addition of new output formats, updated technology document
presentations/representations)
* document source (plaintext utf-8) if shared on the net may be used as input
and processed locally to produce the different document outputs
* document source may be bundled together (automatically) with associated
documents (multiple language versions or master document with inclusions) and
images and sent as a zip file called a sisupod, if shared on the net these too
may be processed locally to produce the desired document outputs
* generated document outputs may automatically be posted to remote sites.
* for basic document generation, the only software dependency is
Ruby,
and a few standard Unix tools (this covers plaintext, HTML, EPUB, XML, ODF,
LaTeX). To use a database you of course need that, and to convert the LaTeX
generated to pdf, a latex processor like tetex or texlive.
* as a developer's tool it is flexible and extensible
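The common numbering and granular search described in the list above can be
sketched as follows. This is an illustration under stated assumptions, not a
description of SiSU's templates or database schema: the function names, the
"o1"-style html ids and the bracketed plaintext numbers are all hypothetical.
The point shown is only that every output carries the same object numbers, so
a search hit can be cited identically in any format.

```ruby
# Illustrative only: the same numbered objects rendered two ways.
def number_objects(paragraphs)
  paragraphs.each_with_index.map { |text, i| { ocn: i + 1, text: text } }
end

# Minimal html escaping for the sketch.
def escape_html(text)
  text.gsub(/[&<>"]/, '&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;')
end

# Hypothetical html form: each object gets an anchor named after its number.
def to_html(objects)
  objects.map do |o|
    %(<p id="o#{o[:ocn]}">#{escape_html(o[:text])} <a href="#o#{o[:ocn]}">#{o[:ocn]}</a></p>)
  end.join("\n")
end

# Hypothetical plaintext form: the object number follows the text in brackets.
def to_plaintext(objects)
  objects.map { |o| "#{o[:text]} [#{o[:ocn]}]" }.join("\n\n")
end

# Granular search: return the numbers of the objects that contain the term.
def search(objects, term)
  objects.select { |o| o[:text].downcase.include?(term.downcase) }.map { |o| o[:ocn] }
end

objects = number_objects(['Rights & licences', 'A second paragraph'])
puts to_html(objects)
puts to_plaintext(objects)
puts search(objects, 'second').inspect
```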
Syntax highlighting for
SiSU
markup is available for a number of text editors.
SiSU
is less about document layout than about finding a way with little markup to be
able to construct an abstract representation of a document that makes it
possible to produce multiple representations of it which may be rather
different from each other and used for different purposes, whether layout and
publishing, or search of content
i.e. to be able to take advantage from this minimal preparation starting point
of some of the strengths of rather different established ways of representing
documents for different purposes, whether for search (relational database, or
indexed flat files generated for that purpose whether of complete documents, or
say of files made up of objects), online viewing (e.g. html, xml, pdf), or
paper publication (e.g. pdf)...
The solution arrived at is to extract structural information about the
document (about headings within the document) and to track objects (which
are serialized and also given hash values) in the manner described. This makes
possible representations that are quite different from those offered at
present. For example, objects could be saved individually and identified by
their hashes, with an index of how the objects relate to each other to form a
document.
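The last possibility, objects saved individually by hash together with an index
of how they form a document, might look something like the sketch below. It is
a hypothetical illustration, not an existing SiSU feature: the in-memory Hash
standing in for the object store, the choice of SHA-256, and the names
store_document and rebuild_document are all assumptions.

```ruby
require 'digest'

# Illustrative only: store each object once under the digest of its content,
# and return an ordered index of digests that describes the document.
def store_document(paragraphs, store = {})
  index = paragraphs.map do |text|
    key = Digest::SHA256.hexdigest(text)
    store[key] ||= text # identical objects are stored only once
    key
  end
  [store, index]
end

# Reassemble the document by looking up each digest in index order.
def rebuild_document(store, index)
  index.map { |key| store.fetch(key) }
end

store, index = store_document(['A Heading', 'Repeated text.', 'Repeated text.'])
puts "objects stored: #{store.length}, objects in document: #{index.length}"
puts rebuild_document(store, index)
```

In such a scheme the index alone defines the document, and identical objects
shared between documents would need to be stored only once.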
man sisu
man sisu-concordance
man sisu-epub
man sisu-git
man sisu-harvest
man sisu-html
man sisu-odf
man sisu-pdf
man sisu-pg
man sisu-po
man sisu-sqlite
man sisu-txt
man 7 sisu_complete
man 7 sisu_pdf
man 7 sisu_postgresql
man 7 sisu_sqlite
man sisu_termsheet
man sisu_webrick
Note
SiSU
documentation is prepared in
SiSU
and output is available in multiple formats including amongst others html, pdf,
odf and epub, which may also be accessed via the html pages[^30]
<http://sisudoc.org/sisu/sisu_manual/index.html>
file:///usr/share/doc/sisu/html/sisu.1.html
/usr/share/doc/sisu/html/sisu_pdf.7.html
/usr/share/doc/sisu/html/sisu_postgresql.7.html
/usr/share/doc/sisu/html/sisu_sqlite.7.html
/usr/share/doc/sisu/html/sisu_webrick.1.html
<http://www.jus.uio.no/sisu/man/sisu.1.html>
<http://www.jus.uio.no/sisu/man/sisu_complete.7.html>
<http://www.jus.uio.no/sisu/man/sisu_pdf.7.html>
<http://www.jus.uio.no/sisu/man/sisu_postgresql.7.html>
<http://www.jus.uio.no/sisu/man/sisu_sqlite.7.html>
<http://www.jus.uio.no/sisu/man/sisu_webrick.1.html>
<http://www.linux-watch.com/news/NS7542722606.html>
<http://www.jus.uio.no/sisu/the_wealth_of_networks.yochai_benkler>
<http://advocacy.postgresql.org/>
<http://en.wikipedia.org/wiki/Postgresql>
29. the
Debian
Free Software guidelines require that everything distributed within
Debian
can be changed - and the documents are authors' works that while freely
distributable are not freely changeable.
30. named index.html or more extensively through sisu_manifest.html