path: root/man/man8/sisu_search.8
Diffstat (limited to 'man/man8/sisu_search.8')
-rw-r--r--man/man8/sisu_search.8639
1 files changed, 639 insertions, 0 deletions
diff --git a/man/man8/sisu_search.8 b/man/man8/sisu_search.8
new file mode 100644
index 00000000..039f6e7c
--- /dev/null
+++ b/man/man8/sisu_search.8
@@ -0,0 +1,639 @@
+.TH "sisu_search" "1" "2007-09-16" "0.58.3" "SiSU - SiSU information Structuring Universe"
+.SH
+SISU \- SISU INFORMATION STRUCTURING UNIVERSE \- SEARCH \ [0.58],
+RALPH AMISSAH
+.BR
+
+.SH
+SISU SEARCH
+.BR
+
+.SH
+1. SISU SEARCH \- INTRODUCTION
+.BR
+
+.BR
+.B SiSU
+output can easily and conveniently be indexed by a number of standalone
+indexing tools, such as Lucene and Hyperestraier.
+
+.BR
+Because the document structure of sites created is clearly defined, and the
+text object citation system is available (hypothetically at least) for all forms
+of output, it is possible to search the sql database and either read results
+from that database or, just as simply, map the results to the html output, which
+has richer text markup.
+
+.BR
+In addition to this,
+.B SiSU
+has the ability to populate a relational sql type database with documents at
+an object level, with object numbers that are shared across different output
+types, which makes them searchable with that degree of granularity. Basically,
+a search reports which documents meet your match criteria and at which
+locations within each document, and the matches can be viewed within the
+database directly or in various output formats.
+
+.SH
+2. SQL
+.BR
+
+.SH
+2.1 POPULATING SQL TYPE DATABASES
+
+.BR
+.B SiSU
+feeds sisu marked\-up documents into the sql type databases PostgreSQL[^1]
+and/or SQLite[^2], together with information related to document structure.
+
+.BR
+This is one of the more interesting output forms, as all the structural data of
+the documents are retained (though can be ignored by the user of the database
+should they so choose). All site texts/documents are (currently) streamed to
+four tables:
+
+.BR
+ * one containing semantic (and other) headers, including title, author,
+ subject (the Dublin Core...);
+
+.BR
+ * another containing the substantive texts by individual "paragraph" (or
+ object), along with structural information, each paragraph being identifiable
+ by its paragraph number (if it has one, which almost all of them do), and the
+ substantive text of each paragraph being searchable (in both formatted and
+ clean text versions);
+
+.BR
+ * a third containing endnotes cross\-referenced back to the paragraph from
+ which they are referenced (in both formatted and clean text versions for
+ searching); and
+
+.BR
+ * a fourth table, with a one to one relation with the headers table, containing
+ full text versions of output, e.g. pdf, html, xml, and ascii.
+
+.BR
+There is of course the possibility to add further structures.
+
+.BR
+At this level
+.B SiSU
+loads a relational database with documents chunked into objects, their
+smallest logical structurally constituent parts, as text objects, with their
+object citation number and all other structural information needed to construct
+the document. Text is stored (at this text object level) with and without
+elementary markup tagging, the stripped version serving to make searching
+easier.
+
+.BR
+Being able to search a relational database at an object level with the
+.B SiSU
+citation system is an effective way of locating content generated by
+.BR SiSU .
+As the individual text objects of a document are stored (and indexed) together
+with their object numbers, and all versions of the document have the same
+numbering, complex searches can be tailored to return just the locations of the
+search results relevant for all available output formats, with live links to
+the precise locations in the database or in html/xml documents. Alternatively,
+the structural information provided makes it possible to search the full
+contents of the database and list the headings under which the search content
+appears, or to search only headings, etc. (as the Dublin Core is incorporated
+it is easy to make use of that as well).
+
+.SH
+3. POSTGRESQL
+.BR
+
+.SH
+3.1 NAME
+
+.BR
+.B SiSU
+\- Structured information, Serialized Units \- a document publishing system,
+postgresql dependency package
+
+.SH
+3.2 DESCRIPTION
+
+.BR
+Information related to using postgresql with sisu (and related to the
+sisu_postgresql dependency package, which is a dummy package to install
+dependencies needed for
+.B SiSU
+to populate a postgresql database, this being part of
+.B SiSU
+\- man sisu).
+
+.SH
+3.3 SYNOPSIS
+
+.BR
+ sisu \-D \ [instruction] \ [filename/wildcard \ if \ required]
+
+.BR
+ sisu \-D \-\-pg \-\-[instruction] \ [filename/wildcard \ if \ required]
+
+.SH
+3.4 COMMANDS
+
+.BR
+Mappings to two databases are provided by default, postgresql and sqlite. The
+same commands are used within sisu to construct and populate databases; however,
+\-d (lowercase) denotes sqlite and \-D (uppercase) denotes postgresql.
+Alternatively, \-\-sqlite or \-\-pgsql may be used.
+
+.BR
+.B \-D or \-\-pgsql
+may be used interchangeably.
+
+.SH
+3.4.1 CREATE AND DESTROY DATABASE
+
+.TP
+.B \ \-\-pgsql \ \-\-createall
+initial step, creates required relations (tables, indexes) in
+existing (postgresql) database (a database should be created
+manually and given the same name as working directory, as
+requested) (rb.dbi)
+
+.TP
+.B \ sisu \ \-D \ \-\-createdb
+creates database where no database existed before
+
+.TP
+.B \ sisu \ \-D \ \-\-create
+creates database tables where no database tables existed
+before
+
+.TP
+.B \ sisu \ \-D \ \-\-dropall
+destroys database (including all its content)! kills data
+and drops tables, indexes and database associated with a
+given directory (and directories of the same name).
+
+.TP
+.B \ sisu \ \-D \ \-\-recreate
+destroys existing database and builds a new empty database
+structure
+
+.SH
+3.4.2 IMPORT AND REMOVE DOCUMENTS
+
+.TP
+.B \ sisu \ \-D \ \-\-import \ \-v \ \ [filename/wildcard]
+populates database with the contents of the file. Imports document(s)
+specified to a postgresql database (at an object level).
+
+.TP
+.B \ sisu \ \-D \ \-\-update \ \-v \ \ [filename/wildcard]
+updates file contents in database
+
+.TP
+.B \ sisu \ \-D \ \-\-remove \ \-v \ \ [filename/wildcard]
+removes specified document from postgresql database.
+
+.SH
+4. SQLITE
+.BR
+
+.SH
+4.1 NAME
+
+.BR
+.B SiSU
+\- Structured information, Serialized Units \- a document publishing system.
+
+.SH
+4.2 DESCRIPTION
+
+.BR
+Information related to using sqlite with sisu (and related to the sisu_sqlite
+dependency package, which is a dummy package to install dependencies needed for
+.B SiSU
+to populate an sqlite database, this being part of
+.B SiSU
+\- man sisu).
+
+.SH
+4.3 SYNOPSIS
+
+.BR
+ sisu \-d \ [instruction] \ [filename/wildcard \ if \ required]
+
+.BR
+ sisu \-d \-\-(sqlite|pg) \-\-[instruction] \ [filename/wildcard \ if \
+ required]
+
+.SH
+4.4 COMMANDS
+
+.BR
+Mappings to two databases are provided by default, postgresql and sqlite. The
+same commands are used within sisu to construct and populate databases; however,
+\-d (lowercase) denotes sqlite and \-D (uppercase) denotes postgresql.
+Alternatively, \-\-sqlite or \-\-pgsql may be used.
+
+.BR
+.B \-d or \-\-sqlite
+may be used interchangeably.
+
+.SH
+4.4.1 CREATE AND DESTROY DATABASE
+
+.TP
+.B \ \-\-sqlite \ \-\-createall
+initial step, creates required relations (tables, indexes) in
+existing (sqlite) database (a database should be created
+manually and given the same name as working directory, as
+requested) (rb.dbi)
+
+.TP
+.B \ sisu \ \-d \ \-\-createdb
+creates database where no database existed before
+
+.TP
+.B \ sisu \ \-d \ \-\-create
+creates database tables where no database tables existed
+before
+
+.TP
+.B \ sisu \ \-d \ \-\-dropall
+destroys database (including all its content)! kills data
+and drops tables, indexes and database associated with a
+given directory (and directories of the same name).
+
+.TP
+.B \ sisu \ \-d \ \-\-recreate
+destroys existing database and builds a new empty database
+structure
+
+.SH
+4.4.2 IMPORT AND REMOVE DOCUMENTS
+
+.TP
+.B \ sisu \ \-d \ \-\-import \ \-v \ \ [filename/wildcard]
+populates database with the contents of the file. Imports document(s)
+specified to an sqlite database (at an object level).
+
+.TP
+.B \ sisu \ \-d \ \-\-update \ \-v \ \ [filename/wildcard]
+updates file contents in database
+
+.TP
+.B \ sisu \ \-d \ \-\-remove \ \-v \ \ [filename/wildcard]
+removes specified document from sqlite database.
+
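+.BR
+As a rough sketch of how the create and import commands above fit together,
+the following shell function only prints the commands it would run; it does not
+call sisu. The function name sisu_db_commands and the file name example.sst are
+hypothetical, and the order shown (createall first, then import) is an
+assumption based on createall being described as the initial step.
+
+.nf
```shell
# Hypothetical dry-run helper: prints sisu commands instead of running them.
# Usage: sisu_db_commands sqlite|pgsql file...
sisu_db_commands() {
  backend=$1
  shift
  case "$backend" in
    sqlite) flag=-d ;;   # -d (lowercase) denotes sqlite
    pgsql)  flag=-D ;;   # -D (uppercase) denotes postgresql
    *) echo "unknown backend: $backend" >&2; return 1 ;;
  esac
  echo "sisu $flag --createall"
  for f in "$@"; do
    echo "sisu $flag --import -v $f"
  done
}

# Print the commands for loading one (made up) file into sqlite.
sisu_db_commands sqlite example.sst
```
+.fi
+
+.BR
+Once the printed commands look right for your collection, they can be run by
+hand from the document directory.
+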
+.SH
+5. INTRODUCTION
+.BR
+
+.SH
+5.1 SEARCH \- DATABASE FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES,
+INCLUDING OBJECT CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL)
+
+.BR
+Sample search frontend <http://search.sisudoc.org> \ [^3] A small database and
+sample query front\-end (search form) that makes use of the citation system,
+.IR "object citation numbering" ,
+to demonstrate functionality.[^4]
+
+.BR
+.B SiSU
+can provide information on which documents are matched and at what locations
+within each document the matches are found. These results are relevant across
+all outputs using object citation numbering, which includes html, XML, LaTeX,
+PDF and indeed the SQL database. You can then refer to one of the other outputs
+or in the SQL database expand the text within the matched objects (paragraphs)
+in the documents matched.
+
+.BR
+Note that results may be displayed in either of two ways: as the documents
+matched, with the object number locations within each matched document that
+meet the search criteria; or as the names of the documents matched along with
+the objects (paragraphs) that meet the search criteria.[^5]
+
+.TP
+.B \ sisu \ \-F \ \-\-webserv\-webrick
+builds a cgi web search frontend for the database created
+
+.BR
+The help command gives the following feedback on the setup of a given
+machine:
+
+.BR
+ sisu \-\-help sql
+
+
+.nf
+ Postgresql
+ user: ralph
+ current db set: SiSU_sisu
+ port: 5432
+ dbi connect: DBI:Pg:database=SiSU_sisu;port=5432
+ sqlite
+ current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db
+ dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
+.fi
+
+.BR
+Note on databases built
+
+.BR
+By default (unless otherwise specified), databases are built on a
+directory basis, from collections of documents within that directory. The name
+of the directory you choose to work from is used as the database name, i.e. if
+you are working in a directory called /home/ralph/ebook the database SiSU_ebook
+is used (otherwise a manual mapping for the collection is necessary).
+
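+.BR
+The directory naming rule described above can be sketched in shell as follows.
+The function name sisu_db_name is hypothetical and is not part of sisu; it only
+mirrors the stated rule that the working directory name, prefixed with SiSU_,
+is used as the database name.
+
+.nf
```shell
# Hypothetical helper: derive the default database name from a directory
# path (defaults to the current directory), following the rule that the
# directory name is prefixed with SiSU_.
sisu_db_name() {
  dir=${1:-$PWD}
  echo "SiSU_$(basename "$dir")"
}

# The example from the text: working in /home/ralph/ebook
sisu_db_name /home/ralph/ebook
```
+.fi
+
+.BR
+For the directory /home/ralph/ebook this prints SiSU_ebook, the name given in
+the example above.
+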
+.SH
+5.2 SEARCH FORM
+
+.TP
+.B \ sisu \ \-F
+generates a sample search form, which must be copied to
+the web\-server cgi directory
+
+.TP
+.B \ sisu \ \-F \ \-\-webserv\-webrick
+generates a sample search form for use with the webrick
+server, which must be copied to the web\-server cgi directory
+
+.TP
+.B \ sisu \ \-Fv
+as above, and provides some information on setting up
+hyperestraier
+
+.TP
+.B \ sisu \ \-W
+starts the webrick server which should be available
+wherever sisu is properly installed
+
+.BR
+The generated search form must be copied manually to the webserver directory,
+as instructed.
+
+.SH
+6. HYPERESTRAIER
+.BR
+
+.BR
+See the documentation for hyperestraier:
+
+.BR
+ <http://hyperestraier.sourceforge.net/>
+
+.BR
+ /usr/share/doc/hyperestraier/index.html
+
+.BR
+ man estcmd
+
+.BR
+on sisu_hyperestraier:
+
+.BR
+ man sisu_hyperestraier
+
+.BR
+ /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html
+
+.BR
+NOTE: the examples that follow assume that sisu output is placed in the
+directory /home/ralph/sisu_www
+
+.BR
+(A) to generate the index within the webserver directory to be indexed:
+
+.BR
+ estcmd gather \-sd \ [index \ name] \ [directory \ path \ to \ index]
+
+.BR
+the following are examples that will need to be tailored according to your
+needs:
+
+.BR
+ cd /home/ralph/sisu_www
+
+.BR
+ estcmd gather \-sd casket /home/ralph/sisu_www
+
+.BR
+you may use the \'find\' command together with \'egrep\' to limit indexing to
+particular document collection directories within the web server directory:
+
+.BR
+ find /home/ralph/sisu_www \-type f | egrep
+ \'/home/ralph/sisu_www/sisu/.+?.html$\' |estcmd gather \-sd casket \-
+
+.BR
+Check which directories in the webserver/output directory (~/sisu_www or
+elsewhere depending on configuration) you wish to include in the search index.
+
+.BR
+As sisu duplicates output in multiple file formats, it is probably
+preferable to limit the estraier index to html output. It may also be
+desirable to exclude the files \'plain.txt\', \'toc.html\' and
+\'concordance.html\', as these duplicate information held in other html output,
+e.g.
+
+.BR
+ find /home/ralph/sisu_www \-type f | egrep
+ \'/sisu_www/(sisu|bookmarks)/.+?.html$\' | egrep \-v
+ \'(doc|concordance).html$\' |estcmd gather \-sd casket \-
+
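+.BR
+To check what such a filter lets through before indexing anything, the two
+egrep stages can be tried on a short list of sample paths. This is only a
+sketch: the function name select_for_index and the directory example_doc are
+made up, grep \-E is used in place of egrep, and a literal dot is written as
+[.] so that the pattern needs no backslashes.
+
+.nf
```shell
# Hypothetical filter: reads file paths on stdin and keeps html files under
# the sisu and bookmarks collections, dropping doc.html and concordance.html.
select_for_index() {
  grep -E '/sisu_www/(sisu|bookmarks)/.+[.]html$' |
    grep -E -v '(doc|concordance)[.]html$'
}

# Dry run on sample paths; nothing is passed to estcmd here.
select_for_index <<'EOF'
/home/ralph/sisu_www/sisu/example_doc/toc.html
/home/ralph/sisu_www/sisu/example_doc/doc.html
/home/ralph/sisu_www/sisu/example_doc/concordance.html
/home/ralph/sisu_www/sisu/example_doc/plain.txt
/home/ralph/sisu_www/other/example_doc/toc.html
EOF
```
+.fi
+
+.BR
+With this pattern toc.html is still indexed; add it to the exclusion list if
+you prefer to leave it out. In real use the function would sit between the
+find command and estcmd gather, as in the pipeline shown above.
+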
+.BR
+From your current document preparation/markup directory, you would construct a
+rune along the following lines:
+
+.BR
+ find /home/ralph/sisu_www \-type f | egrep \'/home/ralph/sisu_www/([specify \
+ first \ directory \ for \ inclusion]|[specify \ second \ directory \ for \
+ inclusion]|[another \ directory \ for \ inclusion? \ ...])/.+?.html$\' |
+ egrep \-v \'(doc|concordance).html$\' |estcmd gather \-sd
+ /home/ralph/sisu_www/casket \-
+
+.BR
+(B) to set up the search form
+
+.BR
+(i) copy estseek.cgi to your cgi directory and set file permissions to 755:
+
+.BR
+ sudo cp \-vi /usr/lib/estraier/estseek.cgi /usr/lib/cgi\-bin
+
+.BR
+ sudo chmod \-v 755 /usr/lib/cgi\-bin/estseek.cgi
+
+.BR
+ sudo cp \-v /usr/share/hyperestraier/estseek.* /usr/lib/cgi\-bin
+
+.BR
+ \ [see \ estraier \ documentation \ for \ paths]
+
+.BR
+(ii) edit estseek.conf, with attention to the lines starting \'indexname:\' and
+\'replace:\':
+
+.BR
+ indexname: /home/ralph/sisu_www/casket
+
+.BR
+ replace: ^file:///home/ralph/sisu_www{{!}}http://localhost
+
+.BR
+ replace: /index.html?${{!}}/
+
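+.BR
+The effect of the two replace rules on a document URI can be illustrated with
+sed. This is an emulation for illustration only: estseek.cgi applies the rules
+itself and does not call sed, the function name rewrite_uri is made up, and
+the dot in index.html is written as [.] to match it literally.
+
+.nf
```shell
# Emulates the two replace rules with sed, purely to show their effect:
# the local file prefix becomes the web server address, and a trailing
# index.html (or index.htm) is reduced to a trailing slash.
rewrite_uri() {
  sed -E -e 's|^file:///home/ralph/sisu_www|http://localhost|' -e 's|/index[.]html?$|/|'
}

echo 'file:///home/ralph/sisu_www/sisu/index.html' | rewrite_uri
```
+.fi
+
+.BR
+The sample URI is rewritten to http://localhost/sisu/, which is the form of
+link the search results should then carry.
+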
+.BR
+(C) to test using webrick, start webrick:
+
+.BR
+ sisu \-W
+
+.BR
+and try opening the URL: <http://localhost:8081/cgi\-bin/estseek.cgi>
+
+.SH
+DOCUMENT INFORMATION (METADATA)
+.BR
+
+.SH
+METADATA
+.BR
+
+.BR
+Document Manifest @
+<http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html>
+
+.BR
+.B Dublin Core
+(DC)
+
+.BR
+.I DC tags included with this document are provided here.
+
+.BR
+DC Title:
+.I SiSU \- SiSU information Structuring Universe \- Search \ [0.58]
+
+.BR
+DC Creator:
+.I Ralph Amissah
+
+.BR
+DC Rights:
+.I Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL
+3
+
+.BR
+DC Type:
+.I information
+
+.BR
+DC Date created:
+.I 2002\-08\-28
+
+.BR
+DC Date issued:
+.I 2002\-08\-28
+
+.BR
+DC Date available:
+.I 2002\-08\-28
+
+.BR
+DC Date modified:
+.I 2007\-09\-16
+
+.BR
+DC Date:
+.I 2007\-09\-16
+
+.BR
+.B Version Information
+
+.BR
+Sourcefile:
+.I sisu_search._sst
+
+.BR
+Filetype:
+.I SiSU text insert 0.58
+
+.BR
+Sourcefile Digest, MD5(sisu_search._sst)=
+.I 52c1d6d3c3082e6b236c65debc733a05
+
+.BR
+Skin_Digest:
+MD5(/home/ralph/grotto/theatre/dbld/sisu\-dev/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
+.I 20fc43cf3eb6590bc3399a1aef65c5a9
+
+.BR
+.B Generated
+
+.BR
+Document (metaverse) last generated:
+.I Sun Sep 23 01:14:04 +0100 2007
+
+.BR
+Generated by:
+.I SiSU
+.I 0.58.3
+of 2007w36/4 (2007\-09\-06)
+
+.BR
+Ruby version:
+.I ruby 1.8.6 (2007\-06\-07 patchlevel 36) \ [i486\-linux]
+
+.TP
+.BI 1.
+<http://www.postgresql.org/>
+ <http://advocacy.postgresql.org/>
+ <http://en.wikipedia.org/wiki/Postgresql>
+.TP
+.BI 2.
+<http://www.hwaci.com/sw/sqlite/>
+ <http://en.wikipedia.org/wiki/Sqlite>
+.TP
+.BI 3.
+<http://search.sisudoc.org>
+.TP
+.BI 4.
+(which could be extended further with current back-end). As regards scaling
+of the database, it is as scalable as the database (here Postgresql) and
+hardware allow.
+.TP
+.BI 5.
+When this feature was demonstrated to an IBM software innovations evaluator in
+2004, he said, to paraphrase: this could be of interest to us. We have large
+document management systems; you can search hundreds of thousands of documents
+and we can tell you which documents meet your search criteria, but there is no
+way we can tell you, without opening each document, where within each your
+matches are found.
+
+.TP
+Other versions of this document:
+.TP
+manifest: <http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html>
+.TP
+html: <http://www.jus.uio.no/sisu/sisu_search/toc.html>
+.TP
+pdf: <http://www.jus.uio.no/sisu/sisu_search/portrait.pdf>
+.TP
+pdf: <http://www.jus.uio.no/sisu/sisu_search/landscape.pdf>
+." .TP
+." manpage: http://www.jus.uio.no/sisu/sisu_search/sisu_search.1
+.TP
+at: <http://www.jus.uio.no/sisu>
+.TP
+* SiSU http://www.jus.uio.no/sisu