Diffstat (limited to 'data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt')
-rw-r--r-- data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt | 600
1 files changed, 600 insertions, 0 deletions
diff --git a/data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt b/data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt
new file mode 100644
index 00000000..e8413379
--- /dev/null
+++ b/data/doc/manuals_generated/sisu_manual/sisu_search/plain.txt
@@ -0,0 +1,600 @@
+SISU - SISU INFORMATION STRUCTURING UNIVERSE - SEARCH [0.58],
+RALPH AMISSAH
+****************************************************************************
+
+SISU SEARCH
+===========
+
+1. SISU SEARCH - INTRODUCTION
+-----------------------------
+
+*SiSU* output can easily and conveniently be indexed by a number of standalone
+indexing tools, such as Lucene and Hyperestraier.
+
+
+Because the document structure of sites created is clearly defined, and the
+text object citation system is available, hypothetically at least, for all
+forms of output, it is possible to search the sql database and either read
+results from that database or, just as simply, map the results to the html
+output, which has richer text markup.
+
+
+In addition to this, *SiSU* has the ability to populate a relational sql type
+database with documents at an object level, with object numbers that are
+shared across different output types, which makes them searchable with that
+degree of granularity. In effect, a search tells you which documents meet your
+match criteria and at which locations within each document; the matches can be
+viewed within the database directly or in various output formats.
+
+
+2. SQL
+------
+
+2.1 POPULATING SQL TYPE DATABASES
+.................................
+
+*SiSU* feeds sisu marked-up documents into sql type databases, PostgreSQL[^1]
+and/or SQLite[^2], together with information related to document structure.
+
+
+- [1]: <http://www.postgresql.org/>
+
+- <http://advocacy.postgresql.org/>
+
+- <http://en.wikipedia.org/wiki/Postgresql>
+
+- [2]: <http://www.hwaci.com/sw/sqlite/>
+
+- <http://en.wikipedia.org/wiki/Sqlite>
+
+This is one of the more interesting output forms, as all the structural data of
+the documents are retained (though can be ignored by the user of the database
+should they so choose). All site texts/documents are (currently) streamed to
+four tables:
+
+
+ * one containing semantic (and other) headers, including title, author,
+ subject (the Dublin Core...);
+
+
+ * another containing the substantive texts by individual "paragraph" (or
+ object), along with structural information, each paragraph being identifiable
+ by its paragraph number (if it has one, which almost all of them do), and the
+ substantive text of each paragraph quite naturally being searchable (both in
+ formatted and clean text versions for searching);
+
+
+ * a third containing endnotes cross-referenced back to the paragraph from
+ which they are referenced (both in formatted and clean text versions for
+ searching); and
+
+
+ * a fourth table, with a one to one relation with the headers table,
+ containing full text versions of output, e.g. pdf, html, xml, and ascii.
+
+
+There is of course the possibility to add further structures.
+
+
+At this level *SiSU* loads a relational database with documents chunked into
+objects, their smallest logical structurally constituent parts, as text
+objects, with their object citation number and all other structural information
+needed to construct the document. Text is stored (at this text object level)
+with and without elementary markup tagging, the stripped version being kept to
+make searching easier.
+
+
+Being able to search a relational database at an object level with the *SiSU*
+citation system is an effective way of locating content generated by *SiSU*. As
+the individual text objects of a document are stored (and indexed) together
+with their object numbers, and all versions of the document have the same
+numbering, complex searches can be tailored to return just the locations of the
+search results, which are relevant for all available output formats, with live
+links to the precise locations in the database or in html/xml documents.
+Alternatively, the structural information provided makes it possible to search
+the full contents of the database and return the headings under which the
+search content appears, or to search only headings etc. (as the Dublin Core is
+incorporated it is easy to make use of that as well).
+
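+
+The idea of an object-level search can be illustrated with a small shell
+sketch. This is a toy stand-in, not the *SiSU* database schema: the three
+column layout (document name, object number, clean text), the sample rows and
+the function name search_objects are all assumptions made for the illustration.
+
+
```shell
# Toy stand-in for the object table: document name, object number, clean text.
# (Hypothetical data and layout; not SiSU's actual schema.)
objects=$(printf '%s\t%s\t%s\n' \
  'free_culture' '12' 'Copyright law regulates culture' \
  'free_culture' '47' 'A free culture supports creators' \
  'gpl3' '3' 'The licence protects free software')

# search_objects TERM: print "document<TAB>object number" for every object
# whose clean text contains TERM (case-insensitive substring match).
search_objects() {
  printf '%s\n' "$objects" |
    awk -F '\t' -v term="$1" \
      'index(tolower($3), tolower(term)) { print $1 "\t" $2 }'
}

search_objects 'free'
```
+
+
+Each line of output names a matching document and the number of the object in
+which the match was found, which is the kind of result an object-level query
+against the database would be expected to give.
+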
+
+3. POSTGRESQL
+-------------
+
+3.1 NAME
+........
+
+*SiSU* - Structured information, Serialized Units - a document publishing
+system, postgresql dependency package
+
+
+3.2 DESCRIPTION
+...............
+
+Information related to using postgresql with sisu (and related to the
+sisu_postgresql dependency package, which is a dummy package to install
+dependencies needed for *SiSU* to populate a postgresql database, this being
+part of *SiSU* - man sisu).
+
+
+3.3 SYNOPSIS
+............
+
+ sisu -D [instruction] [filename/wildcard if required]
+
+
+ sisu -D --pg --[instruction] [filename/wildcard if required]
+
+
+3.4 COMMANDS
+............
+
+Mappings to two databases are provided by default, postgresql and sqlite. The
+same commands are used within sisu to construct and populate both databases;
+however, -d (lowercase) denotes sqlite and -D (uppercase) denotes postgresql.
+Alternatively, --sqlite or --pgsql may be used.
+
+
+*-D or --pgsql* may be used interchangeably.
+
+
+3.4.1 CREATE AND DESTROY DATABASE
+.................................
+
+*--pgsql --createall*
+initial step, creates required relations (tables, indexes) in existing
+(postgresql) database (a database should be created manually and given the same
+name as working directory, as requested) (rb.dbi)
+
+
+*sisu -D --createdb*
+creates database where no database existed before
+
+
+*sisu -D --create*
+creates database tables where no database tables existed before
+
+
+*sisu -D --Dropall*
+destroys database (including all its content)! kills data and drops tables,
+indexes and database associated with a given directory (and directories of the
+same name).
+
+
+*sisu -D --recreate*
+destroys existing database and builds a new empty database structure
+
+
+3.4.2 IMPORT AND REMOVE DOCUMENTS
+.................................
+
+*sisu -D --import -v [filename/wildcard]*
+populates database with the contents of the file. Imports document(s)
+specified to a postgresql database (at an object level).
+
+
+*sisu -D --update -v [filename/wildcard]*
+updates file contents in database
+
+
+*sisu -D --remove -v [filename/wildcard]*
+removes specified document from postgresql database.
+
+
+4. SQLITE
+---------
+
+4.1 NAME
+........
+
+*SiSU* - Structured information, Serialized Units - a document publishing
+system.
+
+
+4.2 DESCRIPTION
+...............
+
+Information related to using sqlite with sisu (and related to the sisu_sqlite
+dependency package, which is a dummy package to install dependencies needed for
+*SiSU* to populate an sqlite database, this being part of *SiSU* - man sisu).
+
+
+4.3 SYNOPSIS
+............
+
+ sisu -d [instruction] [filename/wildcard if required]
+
+
+ sisu -d --(sqlite|pg) --[instruction] [filename/wildcard if required]
+
+
+4.4 COMMANDS
+............
+
+Mappings to two databases are provided by default, postgresql and sqlite. The
+same commands are used within sisu to construct and populate both databases;
+however, -d (lowercase) denotes sqlite and -D (uppercase) denotes postgresql.
+Alternatively, --sqlite or --pgsql may be used.
+
+
+*-d or --sqlite* may be used interchangeably.
+
+
+4.4.1 CREATE AND DESTROY DATABASE
+.................................
+
+*--sqlite --createall*
+initial step, creates required relations (tables, indexes) in existing
+(sqlite) database (a database should be created manually and given the same
+name as working directory, as requested) (rb.dbi)
+
+
+*sisu -d --createdb*
+creates database where no database existed before
+
+
+*sisu -d --create*
+creates database tables where no database tables existed before
+
+
+*sisu -d --dropall*
+destroys database (including all its content)! kills data and drops tables,
+indexes and database associated with a given directory (and directories of the
+same name).
+
+
+*sisu -d --recreate*
+destroys existing database and builds a new empty database structure
+
+
+4.4.2 IMPORT AND REMOVE DOCUMENTS
+.................................
+
+*sisu -d --import -v [filename/wildcard]*
+populates database with the contents of the file. Imports document(s)
+specified to an sqlite database (at an object level).
+
+
+*sisu -d --update -v [filename/wildcard]*
+updates file contents in database
+
+
+*sisu -d --remove -v [filename/wildcard]*
+removes specified document from sqlite database.
+
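+
+The create and import commands listed for both databases can be strung
+together for a first-time setup. The sketch below only prints the commands
+rather than running them; the order shown (createdb, create, import), the
+helper name sisu_db_plan and the '*.sst' file pattern are assumptions for
+illustration, and the exact steps needed may differ between sqlite and
+postgresql.
+
+
```shell
# sisu_db_plan FLAG FILES: print (do not run) a plausible first-time sequence.
# FLAG is -d for sqlite or -D for postgresql; FILES is a filename or wildcard.
# The ordering is an assumption drawn from the command descriptions above.
sisu_db_plan() {
  flag=$1
  files=$2
  printf 'sisu %s --createdb\n' "$flag"
  printf 'sisu %s --create\n' "$flag"
  printf 'sisu %s --import -v %s\n' "$flag" "$files"
}

sisu_db_plan -d '*.sst'
```
+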
+
+5. INTRODUCTION
+---------------
+
+5.1 SEARCH - DATABASE FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES,
+INCLUDING OBJECT CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL)
+..............................................................................
+
+Sample search frontend [link:] <http://search.sisudoc.org> [^3] A small
+database and sample query front-end (search form) that makes use of the
+citation system, _object citation numbering_, to demonstrate functionality.[^4]
+
+
+- [3]: <http://search.sisudoc.org>
+
+- [4]: (which could be extended further with current back-end). As regards scaling
+ of the database, it is as scalable as the database (here Postgresql) and
+ hardware allow.
+
+*SiSU* can provide information on which documents are matched and at what
+locations within each document the matches are found. These results are
+relevant across all outputs using object citation numbering, which includes
+html, XML, LaTeX, PDF and indeed the SQL database. You can then refer to one of
+the other outputs or in the SQL database expand the text within the matched
+objects (paragraphs) in the documents matched.
+
+
+Note that you may set results to show either the documents matched together
+with the object number locations within each matched document that meet the
+search criteria; or the names of the documents matched along with the objects
+(paragraphs) that meet the search criteria.[^5]
+
+
+- [5]: When this feature was demonstrated to an IBM software innovations
+ evaluator in 2004 he said, to paraphrase: this could be of interest to us. We
+ have large document management systems; you can search hundreds of thousands
+ of documents and we can tell you which documents meet your search criteria,
+ but there is no way we can tell you, without opening each document, where
+ within each your matches are found.
+
+*sisu -F --webserv-webrick*
+builds a cgi web search frontend for the database created
+
+
+The following is feedback on the setup on a machine, as provided by the help
+command:
+
+
+ sisu --help sql
+
+
+
+ Postgresql
+ user: ralph
+ current db set: SiSU_sisu
+ port: 5432
+ dbi connect: DBI:Pg:database=SiSU_sisu;port=5432
+ sqlite
+ current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db
+ dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
+
+Note on databases built
+
+
+By default, [unless otherwise specified] databases are built on a directory
+basis, from collections of documents within that directory. The name of the
+directory you choose to work from is used as the database name, i.e. if you are
+working in a directory called /home/ralph/ebook the database SiSU_ebook is
+used. [otherwise a manual mapping for the collection is necessary]
+
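+
+The naming convention just described can be expressed as a one-line shell
+helper. The function name sisu_db_name is hypothetical; the "SiSU_" prefix and
+the use of the directory name follow the note above.
+
+
```shell
# sisu_db_name DIR: print the database name implied by a working directory,
# i.e. "SiSU_" followed by the last component of the directory path.
sisu_db_name() {
  printf 'SiSU_%s\n' "$(basename "$1")"
}

sisu_db_name /home/ralph/ebook
```
+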
+
+5.2 SEARCH FORM
+...............
+
+*sisu -F*
+generates a sample search form, which must be copied to the web-server cgi
+directory
+
+
+*sisu -F --webserv-webrick*
+generates a sample search form for use with the webrick server, which must be
+copied to the web-server cgi directory
+
+
+*sisu -Fv*
+as above, and provides some information on setting up hyperestraier
+
+
+*sisu -W*
+starts the webrick server which should be available wherever sisu is properly
+installed
+
+
+The generated search form must be copied manually to the webserver directory as
+instructed.
+
+
+6. HYPERESTRAIER
+----------------
+
+See the documentation for hyperestraier:
+
+
+ <http://hyperestraier.sourceforge.net/>
+
+
+ /usr/share/doc/hyperestraier/index.html
+
+
+ man estcmd
+
+
+on sisu_hyperestraier:
+
+
+ man sisu_hyperestraier
+
+
+ /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html
+
+
+NOTE: the examples that follow assume that sisu output is placed in the
+directory /home/ralph/sisu_www
+
+
+(A) to generate the index within the webserver directory to be indexed:
+
+
+ estcmd gather -sd [index name] [directory path to index]
+
+
+the following are examples that will need to be tailored according to your
+needs:
+
+
+ cd /home/ralph/sisu_www
+
+
+ estcmd gather -sd casket /home/ralph/sisu_www
+
+
+you may use the 'find' command together with 'egrep' to limit indexing to
+particular document collection directories within the web server directory:
+
+
+ find /home/ralph/sisu_www -type f |
+ egrep '/home/ralph/sisu_www/sisu/.+\.html$' |
+ estcmd gather -sd casket -
+
+
+Check which directories in the webserver/output directory (~/sisu_www or
+elsewhere depending on configuration) you wish to include in the search index.
+
+
+As sisu duplicates output in multiple file formats, it is probably preferable
+to limit the estraier index to html output; it may also be desirable to
+exclude the files 'plain.txt', 'toc.html' and 'concordance.html', as these
+duplicate information held in other html output, e.g.
+
+
+ find /home/ralph/sisu_www -type f |
+ egrep '/sisu_www/(sisu|bookmarks)/.+\.html$' |
+ egrep -v '(doc|concordance)\.html$' |
+ estcmd gather -sd casket -
+
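+
+Before handing a file list to estcmd it can help to check what a filter
+actually lets through. The sketch below applies the same include and exclude
+patterns (with the dots written as \. so that they match literally) to a few
+made-up paths; the function name html_for_index and the sample paths are
+assumptions for illustration, and grep -E is used as the equivalent of egrep.
+
+
```shell
# html_for_index: read file paths on stdin, keep html output under the
# sisu/ and bookmarks/ collections, and drop doc.html and concordance.html.
html_for_index() {
  grep -E '/sisu_www/(sisu|bookmarks)/.+\.html$' |
    grep -E -v '(doc|concordance)\.html$'
}

# Hypothetical sample paths; only the first should survive the filter.
printf '%s\n' \
  /home/ralph/sisu_www/sisu/free_culture/1.html \
  /home/ralph/sisu_www/sisu/free_culture/doc.html \
  /home/ralph/sisu_www/sisu/free_culture/concordance.html \
  /home/ralph/sisu_www/sisu/free_culture/plain.txt \
  /home/ralph/sisu_www/other/index.html |
  html_for_index
```
+
+
+Note that 'plain.txt' is already excluded by the requirement that paths end in
+.html, so only the html duplicates need to be named in the second pattern.
+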
+
+from your current document preparation/markup directory, you would construct a
+rune along the following lines:
+
+
+ find /home/ralph/sisu_www -type f |
+ egrep '/home/ralph/sisu_www/([specify first directory for inclusion]|[specify second directory for inclusion]|[another directory for inclusion? ...])/.+\.html$' |
+ egrep -v '(doc|concordance)\.html$' |
+ estcmd gather -sd /home/ralph/sisu_www/casket -
+
+
+(B) to set up the search form
+
+
+(i) copy estseek.cgi to your cgi directory and set file permissions to 755:
+
+
+ sudo cp -vi /usr/lib/estraier/estseek.cgi /usr/lib/cgi-bin
+
+
+ sudo chmod -v 755 /usr/lib/cgi-bin/estseek.cgi
+
+
+ sudo cp -v /usr/share/hyperestraier/estseek.* /usr/lib/cgi-bin
+
+
+ [see estraier documentation for paths]
+
+
+(ii) edit estseek.conf, with attention to the lines starting 'indexname:' and
+'replace:':
+
+
+ indexname: /home/ralph/sisu_www/casket
+
+
+ replace: ^file:///home/ralph/sisu_www{{!}}http://localhost
+
+
+ replace: /index.html?${{!}}/
+
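+
+The effect of the two 'replace:' lines can be previewed with sed. This is only
+an emulation, assuming that each rule acts as a regular expression substitution
+on the document URI, with the text before the {{!}} separator as the pattern
+and the text after it as the replacement; the function name display_url is
+hypothetical.
+
+
```shell
# display_url URI: emulate the two replace rules on a single document URI.
# Rule 1 maps the local file prefix to the web server address.
# Rule 2 trims a trailing /index.html (or /index.htm) to a bare slash.
display_url() {
  printf '%s\n' "$1" |
    sed -E -e 's#^file:///home/ralph/sisu_www#http://localhost#' \
           -e 's#/index\.html?$#/#'
}

display_url file:///home/ralph/sisu_www/sisu/index.html
```
+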
+
+(C) to test using webrick, start webrick:
+
+
+ sisu -W
+
+
+and try opening the url: <http://localhost:8081/cgi-bin/estseek.cgi>
+
+
+DOCUMENT INFORMATION (METADATA)
+*******************************
+
+METADATA
+--------
+
+Document Manifest @
+<http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html>
+
+
+*Dublin Core* (DC)
+
+
+/DC tags included with this document are provided here./
+
+
+DC Title: _SiSU - SiSU information Structuring Universe - Search [0.58]_
+
+
+DC Creator: _Ralph Amissah_
+
+
+DC Rights: _Copyright (C) Ralph Amissah 2007, part of SiSU documentation,
+License GPL 3_
+
+
+DC Type: _information_
+
+
+DC Date created: _2002-08-28_
+
+
+DC Date issued: _2002-08-28_
+
+
+DC Date available: _2002-08-28_
+
+
+DC Date modified: _2007-09-16_
+
+
+DC Date: _2007-09-16_
+
+
+*Version Information*
+
+
+Sourcefile: _sisu_search._sst_
+
+
+Filetype: _SiSU text insert 0.58_
+
+
+Sourcefile Digest, MD5(sisu_search._sst)= _52c1d6d3c3082e6b236c65debc733a05_
+
+
+Skin_Digest:
+MD5(/home/ralph/grotto/theatre/dbld/sisu-dev/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
+_20fc43cf3eb6590bc3399a1aef65c5a9_
+
+
+*Generated*
+
+
+Document (metaverse) last generated: _Sun Sep 23 04:11:05 +0100 2007_
+
+
+Generated by: _SiSU_ _0.59.0_ of 2007w38/0 (2007-09-23)
+
+
+Ruby version: _ ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]_
+
+
+
+==============================================================================
+
+ title: SiSU - SiSU information Structuring Universe - Search [0.58]
+
+ creator: Ralph Amissah
+
+ rights: Copyright (C) Ralph Amissah 2007, part of SiSU documentation,
+ License GPL 3
+
+ type: information
+
+ subject: ebook, epublishing, electronic book, electronic publishing,
+ electronic document, electronic citation, data structure,
+ citation systems, search
+
+ date.created: 2002-08-28
+
+ date.issued: 2002-08-28
+
+ date.available: 2002-08-28
+
+ date.modified: 2007-09-16
+
+ date: 2007-09-16
+
+
+
+
+
+==============================================================================
+
+Other versions of this document:
+manifest:
+ http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html
+html:
+ http://www.jus.uio.no/sisu/sisu_search/toc.html
+pdf:
+ http://www.jus.uio.no/sisu/sisu_search/portrait.pdf
+ http://www.jus.uio.no/sisu/sisu_search/landscape.pdf
+plaintext (plain text):
+ http://www.jus.uio.no/sisu/sisu_search/plain.txt
+at:
+ http://www.jus.uio.no/sisu
+* Generated by: SiSU 0.59.0 of 2007w38/0 (2007-09-23)
+* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
+* Last Generated on: Sun Sep 23 04:11:52 +0100 2007
+* SiSU http://www.jus.uio.no/sisu