Diffstat (limited to 'man/man8/sisu_search.8')
-rw-r--r-- man/man8/sisu_search.8 639
1 files changed, 0 insertions, 639 deletions
diff --git a/man/man8/sisu_search.8 b/man/man8/sisu_search.8
deleted file mode 100644
index bbe444e4..00000000
--- a/man/man8/sisu_search.8
+++ /dev/null
@@ -1,639 +0,0 @@
-.TH "sisu_search" "1" "2007-09-16" "0.59.1" "SiSU"
-.SH
-SISU \- SEARCH,
-RALPH AMISSAH
-.BR
-
-.SH
-SISU SEARCH
-.BR
-
-.SH
-1. SISU SEARCH \- INTRODUCTION
-.BR
-
-.BR
-.B SiSU
-output can easily and conveniently be indexed by a number of standalone
-indexing tools, such as Lucene and Hyperestraier.
-
-.BR
-Because the document structure of sites created is clearly defined, and the
-text object citation system is available, hypothetically at least, for all
-forms of output, it is possible to search the sql database and either read
-results from that database or, just as simply, map the results to the html
-output, which has richer text markup.
-
-.BR
-In addition to this,
-.B SiSU
-has the ability to populate a relational sql type database with documents at
-an object level, with object numbers that are shared across different output
-types, which makes them searchable with that degree of granularity. Basically,
-your match criteria are met by these documents and at these locations within
-each document, which can be viewed within the database directly or in various
-output formats.
-
-.SH
-2. SQL
-.BR
-
-.SH
-2.1 POPULATING SQL TYPE DATABASES
-
-.BR
-.B SiSU
-feeds sisu marked up documents into sql type databases, PostgreSQL[^1] and/or
-SQLite[^2], together with information related to document structure.
-
-.BR
-This is one of the more interesting output forms, as all the structural data of
-the documents are retained (though they can be ignored by the user of the database
-should they so choose). All site texts/documents are (currently) streamed to
-four tables:
-
-.BR
- * one containing semantic (and other) headers, including title, author,
- subject (the Dublin Core...);
-
-.BR
- * another containing the substantive texts by individual "paragraph" (or object) \-
- along with structural information, each paragraph being identifiable by its
- paragraph number (if it has one, which almost all of them do), and the
- substantive text of each paragraph quite naturally being searchable (both in
- formatted and clean text versions for searching);
-
-.BR
- * a third containing endnotes cross\-referenced back to the paragraph from
- which they are referenced (both in formatted and clean text versions for
- searching); and
-
-.BR
- * a fourth table, with a one to one relation with the headers table, containing
- full text versions of output, e.g. pdf, html, xml, and ascii.
-
-.BR
-There is of course the possibility to add further structures.
-
-.BR
-At this level
-.B SiSU
-loads a relational database with documents chunked into objects, their
-smallest logical structurally constituent parts, as text objects, with their
-object citation number and all other structural information needed to construct
-the document. Text is stored (at this text object level) with and without
-elementary markup tagging, the stripped version being provided to facilitate
-searching.
-
-.BR
-Being able to search a relational database at an object level with the
-.B SiSU
-citation system is an effective way of locating content generated by
-.BR SiSU .
-As individual text objects of a document are stored (and indexed) together with
-object numbers, and all versions of the document have the same numbering,
-complex searches can be tailored to return just the locations of the search
-results relevant for all available output formats, with live links to the
-precise locations in the database or in html/xml documents; or, the structural
-information provided makes it possible to search the full contents of the
-database and list the headings in which search content appears, or to search only
-headings etc. (as the Dublin Core is incorporated it is easy to make use of
-that as well).
-
-.SH
-3. POSTGRESQL
-.BR
-
-.SH
-3.1 NAME
-
-.BR
-.B SiSU
-\- Structured information, Serialized Units \- a document publishing system,
-postgresql dependency package
-
-.SH
-3.2 DESCRIPTION
-
-.BR
-Information related to using postgresql with sisu (and related to the
-sisu_postgresql dependency package, which is a dummy package to install
-dependencies needed for
-.B SiSU
-to populate a postgresql database, this being part of
-.B SiSU
-\- man sisu).
-
-.SH
-3.3 SYNOPSIS
-
-.BR
- sisu \-D [instruction] [filename/wildcard if required]
-
-.BR
- sisu \-D \-\-pg \-\-[instruction] [filename/wildcard if required]
-
-.SH
-3.4 COMMANDS
-
-.BR
-Mappings to two databases are provided by default, postgresql and sqlite. The
-same commands are used within sisu to construct and populate databases; however,
-\-d (lowercase) denotes sqlite and \-D (uppercase) denotes postgresql.
-Alternatively, \-\-sqlite or \-\-pgsql may be used.
-
-.BR
-.B \-D or \-\-pgsql
-may be used interchangeably.
-
-.SH
-3.4.1 CREATE AND DESTROY DATABASE
-
-.TP
-.B \-\-pgsql \-\-createall
-initial step, creates required relations (tables, indexes) in
-existing (postgresql) database (a database should be created
-manually and given the same name as working directory, as
-requested) (rb.dbi)
-
-.TP
-.B sisu \-D \-\-createdb
-creates database where no database existed before
-
-.TP
-.B sisu \-D \-\-create
-creates database tables where no database tables existed
-before
-
-.TP
-.B sisu \-D \-\-Dropall
-destroys database (including all its content)! kills data
-and drops tables, indexes and database associated with a
-given directory (and directories of the same name).
-
-.TP
-.B sisu \-D \-\-recreate
-destroys existing database and builds a new empty database
-structure
-
-.SH
-3.4.2 IMPORT AND REMOVE DOCUMENTS
-
-.TP
-.B sisu \-D \-\-import \-v [filename/wildcard]
-populates database with the contents of the file. Imports document(s)
-specified to a postgresql database (at an object level).
-
-.TP
-.B sisu \-D \-\-update \-v [filename/wildcard]
-updates file contents in database
-
-.TP
-.B sisu \-D \-\-remove \-v [filename/wildcard]
-removes specified document from postgresql database.
-
-.SH
-4. SQLITE
-.BR
-
-.SH
-4.1 NAME
-
-.BR
-.B SiSU
-\- Structured information, Serialized Units \- a document publishing system.
-
-.SH
-4.2 DESCRIPTION
-
-.BR
-Information related to using sqlite with sisu (and related to the sisu_sqlite
-dependency package, which is a dummy package to install dependencies needed for
-.B SiSU
-to populate an sqlite database, this being part of
-.B SiSU
-\- man sisu).
-
-.SH
-4.3 SYNOPSIS
-
-.BR
- sisu \-d [instruction] [filename/wildcard if required]
-
-.BR
- sisu \-d \-\-(sqlite|pg) \-\-[instruction] [filename/wildcard if
- required]
-
-.SH
-4.4 COMMANDS
-
-.BR
-Mappings to two databases are provided by default, postgresql and sqlite. The
-same commands are used within sisu to construct and populate databases; however,
-\-d (lowercase) denotes sqlite and \-D (uppercase) denotes postgresql.
-Alternatively, \-\-sqlite or \-\-pgsql may be used.
-
-.BR
-.B \-d or \-\-sqlite
-may be used interchangeably.
-
-.SH
-4.4.1 CREATE AND DESTROY DATABASE
-
-.TP
-.B \-\-sqlite \-\-createall
-initial step, creates required relations (tables, indexes) in
-existing (sqlite) database (a database should be created
-manually and given the same name as working directory, as
-requested) (rb.dbi)
-
-.TP
-.B sisu \-d \-\-createdb
-creates database where no database existed before
-
-.TP
-.B sisu \-d \-\-create
-creates database tables where no database tables existed
-before
-
-.TP
-.B sisu \-d \-\-dropall
-destroys database (including all its content)! kills data
-and drops tables, indexes and database associated with a
-given directory (and directories of the same name).
-
-.TP
-.B sisu \-d \-\-recreate
-destroys existing database and builds a new empty database
-structure
-
-.SH
-4.4.2 IMPORT AND REMOVE DOCUMENTS
-
-.TP
-.B sisu \-d \-\-import \-v [filename/wildcard]
-populates database with the contents of the file. Imports document(s)
-specified to an sqlite database (at an object level).
-
-.TP
-.B sisu \-d \-\-update \-v [filename/wildcard]
-updates file contents in database
-
-.TP
-.B sisu \-d \-\-remove \-v [filename/wildcard]
-removes specified document from sqlite database.
-
-.SH
-5. INTRODUCTION
-.BR
-
-.SH
-5.1 SEARCH \- DATABASE FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES,
-INCLUDING OBJECT CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL)
-
-.BR
-Sample search frontend <http://search.sisudoc.org> [^3]: a small database and
-sample query front\-end (search form) that makes use of the citation system,
-.I object citation numbering
-to demonstrate functionality.[^4]
-
-.BR
-.B SiSU
-can provide information on which documents are matched and at what locations
-within each document the matches are found. These results are relevant across
-all outputs using object citation numbering, which includes html, XML, LaTeX,
-PDF and indeed the SQL database. You can then refer to one of the other outputs
-or in the SQL database expand the text within the matched objects (paragraphs)
-in the documents matched.
-
-.BR
-Note that you may set results to show either the documents matched and the object
-number locations within each matched document that meet the search criteria, or
-the names of the documents matched along with the objects (paragraphs) that
-meet the search criteria.[^5]
-
-.TP
-.B sisu \-F \-\-webserv\-webrick
-builds a cgi web search frontend for the database
-created
-
-.BR
-The following is feedback on the setup of a machine, as provided by the help
-command:
-
-.BR
- sisu \-\-help sql
-
-
-.nf
- Postgresql
- user: ralph
- current db set: SiSU_sisu
- port: 5432
- dbi connect: DBI:Pg:database=SiSU_sisu;port=5432
- sqlite
- current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db
- dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
-.fi
-
-.BR
-Note on databases built
-
-.BR
-By default, [unless otherwise specified] databases are built on a
-directory basis, from collections of documents within that directory. The name
-of the directory you choose to work from is used as the database name, i.e. if
-you are working in a directory called /home/ralph/ebook the database SiSU_ebook
-is used. [otherwise a manual mapping for the collection is
-necessary]
-
-.SH
-5.2 SEARCH FORM
-
-.TP
-.B sisu \-F
-generates a sample search form, which must be copied to
-the web\-server cgi directory
-
-.TP
-.B sisu \-F \-\-webserv\-webrick
-generates a sample search form for use with the webrick
-server, which must be copied to the web\-server cgi
-directory
-
-.TP
-.B sisu \-Fv
-as above, and provides some information on setting up
-hyperestraier
-
-.TP
-.B sisu \-W
-starts the webrick server which should be available
-wherever sisu is properly installed
-
-.BR
-The generated search form must be copied manually to the webserver directory as
-instructed.
-
-.SH
-6. HYPERESTRAIER
-.BR
-
-.BR
-See the documentation for hyperestraier:
-
-.BR
- <http://hyperestraier.sourceforge.net/>
-
-.BR
- /usr/share/doc/hyperestraier/index.html
-
-.BR
- man estcmd
-
-.BR
-on sisu_hyperestraier:
-
-.BR
- man sisu_hyperestraier
-
-.BR
- /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html
-
-.BR
-NOTE: the examples that follow assume that sisu output is placed in the
-directory /home/ralph/sisu_www
-
-.BR
-(A) to generate the index within the webserver directory to be indexed:
-
-.BR
- estcmd gather \-sd [index name] [directory path to index]
-
-.BR
-the following are examples that will need to be tailored according to your
-needs:
-
-.BR
- cd /home/ralph/sisu_www
-
-.BR
- estcmd gather \-sd casket /home/ralph/sisu_www
-
-.BR
-you may use the \'find\' command together with \'egrep\' to limit indexing to
-particular document collection directories within the web server directory:
-
-.BR
- find /home/ralph/sisu_www \-type f | egrep
- \'/home/ralph/sisu_www/sisu/.+?.html$\' |estcmd gather \-sd casket \-
-
-.BR
-Check which directories in the webserver/output directory (~/sisu_www or
-elsewhere depending on configuration) you wish to include in the search index.
-
-.BR
-As sisu duplicates output in multiple file formats, it is probably
-preferable to limit the estraier index to html output, and it may also be
-desirable to exclude files \'plain.txt\', \'toc.html\' and
-\'concordance.html\', as these duplicate information held in other html output,
-e.g.
-
-.BR
- find /home/ralph/sisu_www \-type f | egrep
- \'/sisu_www/(sisu|bookmarks)/.+?.html$\' | egrep \-v
- \'(doc|concordance).html$\' |estcmd gather \-sd casket \-
-
-.BR
-from your current document preparation/markup directory, you would construct a
-rune along the following lines:
-
-.BR
- find /home/ralph/sisu_www \-type f | egrep \'/home/ralph/sisu_www/([specify
- first directory for inclusion]|[specify second directory for
- inclusion]|[another directory for inclusion? \...])/.+?.html$\' |
- egrep \-v \'(doc|concordance).html$\' |estcmd gather \-sd
- /home/ralph/sisu_www/casket \-
-
-.BR
-(B) to set up the search form
-
-.BR
-(i) copy estseek.cgi to your cgi directory and set file permissions to 755:
-
-.BR
- sudo cp \-vi /usr/lib/estraier/estseek.cgi /usr/lib/cgi\-bin
-
-.BR
- sudo chmod \-v 755 /usr/lib/cgi\-bin/estseek.cgi
-
-.BR
- sudo cp \-v /usr/share/hyperestraier/estseek.* /usr/lib/cgi\-bin
-
-.BR
- [see estraier documentation for paths]
-
-.BR
-(ii) edit estseek.conf, with attention to the lines starting \'indexname:\' and
-\'replace:\':
-
-.BR
- indexname: /home/ralph/sisu_www/casket
-
-.BR
- replace: ^file:///home/ralph/sisu_www{{!}}http://localhost
-
-.BR
- replace: /index.html?${{!}}/
-
-.BR
-(C) to test using webrick, start webrick:
-
-.BR
- sisu \-W
-
-.BR
-and try opening the url: <http://localhost:8081/cgi\-bin/estseek.cgi>
-
-.SH
-DOCUMENT INFORMATION (METADATA)
-.BR
-
-.SH
-METADATA
-.BR
-
-.BR
-Document Manifest @
-<http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html>
-
-.BR
-.B Dublin Core
-(DC)
-
-.BR
-.I DC tags included with this document are provided here.
-
-.BR
-DC Title:
-.I SiSU \- Search
-
-.BR
-DC Creator:
-.I Ralph Amissah
-
-.BR
-DC Rights:
-.I Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL
-3
-
-.BR
-DC Type:
-.I information
-
-.BR
-DC Date created:
-.I 2002\-08\-28
-
-.BR
-DC Date issued:
-.I 2002\-08\-28
-
-.BR
-DC Date available:
-.I 2002\-08\-28
-
-.BR
-DC Date modified:
-.I 2007\-09\-16
-
-.BR
-DC Date:
-.I 2007\-09\-16
-
-.BR
-.B Version Information
-
-.BR
-Sourcefile:
-.I sisu_search._sst
-
-.BR
-Filetype:
-.I SiSU text insert 0.58
-
-.BR
-Sourcefile Digest, MD5(sisu_search._sst)=
-.I c085c2eb6d68f1b7d50435f673ede407
-
-.BR
-Skin_Digest:
-MD5(/home/ralph/grotto/theatre/dbld/builds/sisu/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
-.I 20fc43cf3eb6590bc3399a1aef65c5a9
-
-.BR
-.B Generated
-
-.BR
-Document (metaverse) last generated:
-.I Tue Sep 25 02:54:48 +0100 2007
-
-.BR
-Generated by:
-.I SiSU
-.I 0.59.1
-of 2007w39/2 (2007\-09\-25)
-
-.BR
-Ruby version:
-.I ruby 1.8.6 (2007\-06\-07 patchlevel 36) \ [i486\-linux]
-
-.TP
-.BI 1.
-<http://www.postgresql.org/>
- <http://advocacy.postgresql.org/>
- <http://en.wikipedia.org/wiki/Postgresql>
-.TP
-.BI 2.
-<http://www.hwaci.com/sw/sqlite/>
- <http://en.wikipedia.org/wiki/Sqlite>
-.TP
-.BI 3.
-<http://search.sisudoc.org>
-.TP
-.BI 4.
-(which could be extended further with current back-end). As regards scaling
-of the database, it is as scalable as the database (here Postgresql) and
-hardware allow.
-.TP
-.BI 5.
-Of this feature, when demonstrated to an IBM software innovations evaluator in
-2004, he said, to paraphrase: this could be of interest to us. We have large
-document management systems; you can search hundreds of thousands of documents
-and we can tell you which documents meet your search criteria, but there is no
-way we can tell you, without opening each document, where within each your
-matches are found.
-
-.TP
-Other versions of this document:
-.TP
-manifest: <http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html>
-.TP
-html: <http://www.jus.uio.no/sisu/sisu_search/toc.html>
-.TP
-pdf: <http://www.jus.uio.no/sisu/sisu_search/portrait.pdf>
-.TP
-pdf: <http://www.jus.uio.no/sisu/sisu_search/landscape.pdf>
-.\" .TP
-.\" manpage: http://www.jus.uio.no/sisu/sisu_search/sisu_search.1
-.TP
-at: <http://www.jus.uio.no/sisu>
-.TP
-.TP
-* Generated by: SiSU 0.59.1 of 2007w39/2 (2007-09-25)
-.TP
-* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
-.TP
-* Last Generated on: Tue Sep 25 02:54:52 +0100 2007
-.TP
-* SiSU http://www.jus.uio.no/sisu