From 50d45c6deb0afd2e4222d2e33a45487a9d1fa676 Mon Sep 17 00:00:00 2001
From: Ralph Amissah
Date: Sun, 23 Sep 2007 05:16:21 +0100
Subject: primarily todo with sisu documentation, changelog reproduced below:

* start documenting sisu using sisu
* sisu markup source files in
  data/doc/sisu/sisu_markup_samples/sisu_manual/
  /usr/share/doc/sisu/sisu_markup_samples/sisu_manual/
* default output [sisu -3] in
  data/doc/manuals_generated/sisu_manual/
  /usr/share/doc/manuals_generated/sisu_manual/
  (adds substantially to the size of sisu package!)
* help related edits
* manpage, work on ability to generate manpages, improved
* param, exclude footnote mark count when occurs within code block
* plaintext changes made
* shared_txt, line wrap visited
* file:// link option introduced (in addition to existing https?:// and ftp://)
  a bit arbitrarily, diff here, [double check changes in sysenv and hub]
* minor adjustments
* html url match refinement
* css added tiny_center
* plaintext
* endnotes fix
* footnote adjustment to make more easily distinguishable from substantive text
* flag -a only [flags -A -e -E dropped] controlled by modifiers --unix/msdos
  --footnote/endnote
* defaults, homepage
* renamed homepage (instead of index) implications for modifying skins, which
  need likewise to have any homepage entry renamed
* added link to sisu_manual in homepage
* css the css for the default homepage is renamed homepage.css (instead of
  index.css) [consider removing this and relying on html.css]
* ruby version < ruby1.9
* place stop on installation and working with for now [ruby String.strip
  broken in ruby 1.9.0 (2007-09-10 patchlevel 0) [i486-linux], 2007-09-18:38/2]
* debian/control restrict use to ruby > 1.8.4 and ruby < 1.9
* debian
* debian/control restrict use to ruby > 1.8.4 and ruby < 1.9
* sisu-doc new sub-package for sisu documentation debian/control and
  sisu-doc.install
---
 man/man8/sisu_search.8 | 639 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 639 insertions(+)
create mode 100644 man/man8/sisu_search.8

diff --git a/man/man8/sisu_search.8 b/man/man8/sisu_search.8
new file mode 100644
index 00000000..039f6e7c
--- /dev/null
+++ b/man/man8/sisu_search.8
@@ -0,0 +1,639 @@
+.TH "sisu_search" "1" "2007-09-16" "0.58.3" "SiSU - SiSU information Structuring Universe"
+.SH
+SISU \- SISU INFORMATION STRUCTURING UNIVERSE \- SEARCH \ [0.58],
+RALPH AMISSAH
+.BR
+
+.SH
+SISU SEARCH
+.BR
+
+.SH
+1. SISU SEARCH \- INTRODUCTION
+.BR
+
+.BR
+.B SiSU
+output can easily and conveniently be indexed by a number of standalone
+indexing tools, such as Lucene and Hyperestraier.
+
+.BR
+Because the document structure of the sites created is clearly defined, and the
+text object citation system is available (hypothetically at least) for all forms
+of output, it is possible to search the sql database and either read results
+from that database, or just as simply map the results to the html output, which
+has richer text markup.
+
+.BR
+In addition to this
+.B SiSU
+has the ability to populate a relational sql type database with documents at
+an object level, with object numbers that are shared across different output
+types, which makes them searchable with that degree of granularity. Basically,
+results tell you that your match criteria are met by these documents, and at
+these locations within each document, which can be viewed within the database
+directly or in various output formats.
+
+.SH
+2. SQL
+.BR
+
+.SH
+2.1 POPULATING SQL TYPE DATABASES
+
+.BR
+.B SiSU
+feeds sisu markup documents into sql type databases, PostgreSQL[^1] and/or
+SQLite[^2], together with information related to document structure.
+
+.BR
+This is one of the more interesting output forms, as all the structural data of
+the documents are retained (though it can be ignored by the user of the database
+should they so choose).
+All site texts/documents are (currently) streamed to
+four tables:
+
+.BR
+ * one containing semantic (and other) headers, including title, author,
+ subject (the Dublin Core...);
+
+.BR
+ * another containing the substantive texts by individual "paragraph" (or object) \-
+ along with structural information, each paragraph being identifiable by its
+ paragraph number (if it has one, which almost all of them do), and the
+ substantive text of each paragraph quite naturally being searchable (both in
+ formatted and clean text versions for searching);
+
+.BR
+ * a third containing endnotes cross\-referenced back to the paragraph from
+ which they are referenced (both in formatted and clean text versions for
+ searching); and
+
+.BR
+ * a fourth table, with a one to one relation with the headers table, containing
+ full text versions of output, e.g. pdf, html, xml, and ascii.
+
+.BR
+There is of course the possibility to add further structures.
+
+.BR
+At this level
+.B SiSU
+loads a relational database with documents chunked into objects, their
+smallest logical structural constituent parts, as text objects, with their
+object citation number and all other structural information needed to construct
+the document. Text is stored (at this text object level) with and without
+elementary markup tagging, the stripped version being there to facilitate
+searching.
+
+.BR
+Being able to search a relational database at an object level with the
+.B SiSU
+citation system is an effective way of locating content generated by
+.BR SiSU .
+As individual text objects of a document are stored (and indexed) together with
+object numbers, and all versions of the document have the same numbering,
+complex searches can be tailored to return just the locations of the search
+results relevant for all available output formats, with live links to the
+precise locations in the database or in html/xml documents; or, the structural
+information provided makes it possible to search the full contents of the
+database and return the headings under which matching content appears, or to
+search only headings etc. (as the Dublin Core is incorporated it is easy to make
+use of that as well).
+
+.SH
+3. POSTGRESQL
+.BR
+
+.SH
+3.1 NAME
+
+.BR
+.B SiSU
+\- Structured information, Serialized Units \- a document publishing system,
+postgresql dependency package
+
+.SH
+3.2 DESCRIPTION
+
+.BR
+Information related to using postgresql with sisu (and related to the
+sisu_postgresql dependency package, which is a dummy package to install
+dependencies needed for
+.B SiSU
+to populate a postgresql database, this being part of
+.B SiSU
+\- man sisu).
+
+.SH
+3.3 SYNOPSIS
+
+.BR
+ sisu \-D \ [instruction] \ [filename/wildcard \ if \ required]
+
+.BR
+ sisu \-D \-\-pg \-\-[instruction] \ [filename/wildcard \ if \ required]
+
+.SH
+3.4 COMMANDS
+
+.BR
+Mappings to two databases are provided by default, postgresql and sqlite. The
+same commands are used within sisu to construct and populate both databases;
+however \-d (lowercase) denotes sqlite and \-D (uppercase) denotes postgresql.
+Alternatively \-\-sqlite or \-\-pgsql may be used.
+
+.BR
+.B \-D or \-\-pgsql
+may be used interchangeably.
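The backend flags described above, and the directory-based naming rule given in the note on databases built (section 5.1), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SiSU's own code: the helper names `backend_for_flag`, `pg_database_name` and `pg_dbi_connect` are hypothetical, and the connect-string format is copied from the `sisu --help sql` output shown in section 5.1, assuming the default port 5432.

```python
import os.path

# Hypothetical mapping of the flags described in this section:
# lowercase -d / --sqlite selects sqlite, uppercase -D / --pgsql selects postgresql.
BACKEND_FLAGS = {
    "-d": "sqlite",
    "--sqlite": "sqlite",
    "-D": "postgresql",
    "--pgsql": "postgresql",
}


def backend_for_flag(flag: str) -> str:
    """Return the database backend selected by a sisu command-line flag."""
    try:
        return BACKEND_FLAGS[flag]
    except KeyError:
        raise ValueError(f"not a database flag: {flag!r}") from None


def pg_database_name(workdir: str) -> str:
    """Default database name: 'SiSU_' plus the working directory's name."""
    # normpath drops a trailing slash so that basename() is not empty
    return "SiSU_" + os.path.basename(os.path.normpath(workdir))


def pg_dbi_connect(workdir: str, port: int = 5432) -> str:
    """Build a DBI connect string in the form printed by `sisu --help sql`."""
    return f"DBI:Pg:database={pg_database_name(workdir)};port={port}"


if __name__ == "__main__":
    print(backend_for_flag("-D"))                  # postgresql
    print(pg_database_name("/home/ralph/ebook"))   # SiSU_ebook
    print(pg_dbi_connect("/home/ralph/sisu"))      # DBI:Pg:database=SiSU_sisu;port=5432
```

For a working directory named sisu this reproduces the `DBI:Pg:database=SiSU_sisu;port=5432` line from that help output. The manual mapping that the note says is otherwise necessary is not modelled here.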
+
+.SH
+3.4.1 CREATE AND DESTROY DATABASE
+
+.TP
+.B \ \-\-pgsql \ \-\-createall
+\ initial \ step, \ creates \ required \ relations \ (tables, \ indexes) \ in
+\ existing \ (postgresql) \ database \ (a \ database \ should \ be \ created \
+manually \ and \ given \ the \ same \ name \ as \ working \ directory, \ as \
+requested) \ (rb.dbi) \
+
+.TP
+.B \ sisu \ \-D \ \-\-createdb
+\ creates \ database \ where \ no \ database \ existed \ before \
+
+.TP
+.B \ sisu \ \-D \ \-\-create
+\ creates \ database \ tables \ where \ no \ database \ tables \ existed \
+before \
+
+.TP
+.B \ sisu \ \-D \ \-\-dropall
+\ destroys \ database \ (including \ all \ its \ content)! \ kills \ data \
+and \ drops \ tables, \ indexes \ and \ database \ associated \ with \ a \
+given \ directory \ (and \ directories \ of \ the \ same \ name). \
+
+.TP
+.B \ sisu \ \-D \ \-\-recreate
+\ destroys \ existing \ database \ and \ builds \ a \ new \ empty \ database
+\ structure \
+
+.SH
+3.4.2 IMPORT AND REMOVE DOCUMENTS
+
+.TP
+.B \ sisu \ \-D \ \-\-import \ \-v \ \ [filename/wildcard]
+populates database with the contents of the file. Imports document(s)
+specified to a postgresql database (at an object level).
+
+.TP
+.B \ sisu \ \-D \ \-\-update \ \-v \ \ [filename/wildcard]
+updates file contents in database
+
+.TP
+.B \ sisu \ \-D \ \-\-remove \ \-v \ \ [filename/wildcard]
+removes specified document from postgresql database.
+
+.SH
+4. SQLITE
+.BR
+
+.SH
+4.1 NAME
+
+.BR
+.B SiSU
+\- Structured information, Serialized Units \- a document publishing system.
+
+.SH
+4.2 DESCRIPTION
+
+.BR
+Information related to using sqlite with sisu (and related to the sisu_sqlite
+dependency package, which is a dummy package to install dependencies needed for
+.B SiSU
+to populate an sqlite database, this being part of
+.B SiSU
+\- man sisu).
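As an illustration of the object-level searching described in sections 2.1 and 5.1 (reporting which documents match and at which object numbers), here is a minimal sketch using Python's built-in sqlite3 module. The tables `documents` and `objects`, and the columns `ocn`, `marked_up` and `clean_text`, are hypothetical and are not the schema SiSU creates; matching uses a plain SQL LIKE on the stripped text rather than any full-text index.

```python
import sqlite3
from collections import defaultdict


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with hypothetical object-level tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE objects (
            doc_id     INTEGER NOT NULL REFERENCES documents(id),
            ocn        INTEGER NOT NULL,  -- object citation number
            marked_up  TEXT NOT NULL,     -- text with elementary markup
            clean_text TEXT NOT NULL,     -- stripped text, used for searching
            PRIMARY KEY (doc_id, ocn)
        );
        """
    )
    conn.executemany(
        "INSERT INTO documents (id, title) VALUES (?, ?)",
        [(1, "Manual"), (2, "Notes")],
    )
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?, ?)",
        [
            (1, 1, "<b>Search</b> intro", "Search intro"),
            (1, 2, "Database setup", "Database setup"),
            (1, 3, "Object <i>search</i> by number", "Object search by number"),
            (2, 1, "Unrelated text", "Unrelated text"),
            (2, 2, "More search notes", "More search notes"),
        ],
    )
    return conn


def search_locations(conn: sqlite3.Connection, term: str) -> dict[str, list[int]]:
    """Map each matching document title to the object numbers that match."""
    rows = conn.execute(
        """
        SELECT d.title, o.ocn
        FROM objects AS o JOIN documents AS d ON d.id = o.doc_id
        WHERE o.clean_text LIKE '%' || ? || '%'  -- ASCII case-insensitive
        ORDER BY d.id, o.ocn
        """,
        (term,),
    )
    hits = defaultdict(list)
    for title, ocn in rows:
        hits[title].append(ocn)
    return dict(hits)


if __name__ == "__main__":
    conn = build_demo_db()
    print(search_locations(conn, "search"))
```

Running it prints `{'Manual': [1, 3], 'Notes': [2]}`: each matched document together with the object numbers where the term occurs. Because the numbering is shared across outputs, such locations could in principle be mapped to the same places in the html or pdf versions.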
+
+.SH
+4.3 SYNOPSIS
+
+.BR
+ sisu \-d \ [instruction] \ [filename/wildcard \ if \ required]
+
+.BR
+ sisu \-d \-\-(sqlite|pg) \-\-[instruction] \ [filename/wildcard \ if \
+ required]
+
+.SH
+4.4 COMMANDS
+
+.BR
+Mappings to two databases are provided by default, postgresql and sqlite. The
+same commands are used within sisu to construct and populate both databases;
+however \-d (lowercase) denotes sqlite and \-D (uppercase) denotes postgresql.
+Alternatively \-\-sqlite or \-\-pgsql may be used.
+
+.BR
+.B \-d or \-\-sqlite
+may be used interchangeably.
+
+.SH
+4.4.1 CREATE AND DESTROY DATABASE
+
+.TP
+.B \ \-\-sqlite \ \-\-createall
+\ initial \ step, \ creates \ required \ relations \ (tables, \ indexes) \ in
+\ existing \ (sqlite) \ database \ (a \ database \ should \ be \ created \
+manually \ and \ given \ the \ same \ name \ as \ working \ directory, \ as \
+requested) \ (rb.dbi) \
+
+.TP
+.B \ sisu \ \-d \ \-\-createdb
+\ creates \ database \ where \ no \ database \ existed \ before \
+
+.TP
+.B \ sisu \ \-d \ \-\-create
+\ creates \ database \ tables \ where \ no \ database \ tables \ existed \
+before \
+
+.TP
+.B \ sisu \ \-d \ \-\-dropall
+\ destroys \ database \ (including \ all \ its \ content)! \ kills \ data \
+and \ drops \ tables, \ indexes \ and \ database \ associated \ with \ a \
+given \ directory \ (and \ directories \ of \ the \ same \ name). \
+
+.TP
+.B \ sisu \ \-d \ \-\-recreate
+\ destroys \ existing \ database \ and \ builds \ a \ new \ empty \ database
+\ structure \
+
+.SH
+4.4.2 IMPORT AND REMOVE DOCUMENTS
+
+.TP
+.B \ sisu \ \-d \ \-\-import \ \-v \ \ [filename/wildcard]
+populates database with the contents of the file. Imports document(s)
+specified to an sqlite database (at an object level).
+
+.TP
+.B \ sisu \ \-d \ \-\-update \ \-v \ \ [filename/wildcard]
+updates file contents in database
+
+.TP
+.B \ sisu \ \-d \ \-\-remove \ \-v \ \ [filename/wildcard]
+removes specified document from sqlite database.
+
+.SH
+5.
INTRODUCTION
+.BR
+
+.SH
+5.1 SEARCH \- DATABASE FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES,
+INCLUDING OBJECT CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL)
+
+.BR
+Sample search frontend \ [^3]: a small database and
+sample query front\-end (search form) that makes use of the citation system,
+.I object citation numbering
+to demonstrate functionality.[^4]
+
+.BR
+.B SiSU
+can provide information on which documents are matched and at what locations
+within each document the matches are found. These results are relevant across
+all outputs using object citation numbering, which includes html, XML, LaTeX,
+PDF and indeed the SQL database. You can then refer to one of the other outputs
+or, in the SQL database, expand the text within the matched objects (paragraphs)
+in the documents matched.
+
+.BR
+Note that you may set results to display either the documents matched together
+with the object number locations, within each matched document, that meet the
+search criteria; or the names of the documents matched along with the objects
+(paragraphs) that meet the search criteria.[^5]
+
+.TP
+.B \ sisu \ \-F \ \-\-webserv\-webrick
+\ builds \ a \ cgi \ web \ search \ frontend \ for \ the \ database \ created
+\
+
+.BR
+The following is an example of the feedback on the database setup of one
+machine, as provided by the help command:
+
+.BR
+ sisu \-\-help sql
+
+
+.nf
+ Postgresql
+ user: ralph
+ current db set: SiSU_sisu
+ port: 5432
+ dbi connect: DBI:Pg:database=SiSU_sisu;port=5432
+ sqlite
+ current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db
+ dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
+.fi
+
+.BR
+Note on databases built
+
+.BR
+By default, \ [unless \ otherwise \ specified] databases are built on a
+directory basis, from collections of documents within that directory. The name
+of the directory you choose to work from is used as the database name, e.g. if
+you are working in a directory called /home/ralph/ebook the database SiSU_ebook
+is used.
+\ [otherwise \ a \ manual \ mapping \ for \ the \ collection \ is \
+necessary]
+
+.SH
+5.2 SEARCH FORM
+
+.TP
+.B \ sisu \ \-F
+\ generates \ a \ sample \ search \ form, \ which \ must \ be \ copied \ to \
+the \ web\-server \ cgi \ directory \
+
+.TP
+.B \ sisu \ \-F \ \-\-webserv\-webrick
+\ generates \ a \ sample \ search \ form \ for \ use \ with \ the \ webrick \
+server, \ which \ must \ be \ copied \ to \ the \ web\-server \ cgi \ directory
+\
+
+.TP
+.B \ sisu \ \-Fv
+\ as \ above, \ and \ provides \ some \ information \ on \ setting \ up \
+hyperestraier \
+
+.TP
+.B \ sisu \ \-W
+\ starts \ the \ webrick \ server \ which \ should \ be \ available \
+wherever \ sisu \ is \ properly \ installed \
+
+.BR
+The generated search form must be copied manually to the webserver directory as
+instructed.
+
+.SH
+6. HYPERESTRAIER
+.BR
+
+.BR
+See the documentation for hyperestraier:
+
+.BR
+
+
+.BR
+ /usr/share/doc/hyperestraier/index.html
+
+.BR
+ man estcmd
+
+.BR
+on sisu_hyperestraier:
+
+.BR
+ man sisu_hyperestraier
+
+.BR
+ /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html
+
+.BR
+NOTE: the examples that follow assume that sisu output is placed in the
+directory /home/ralph/sisu_www.
+
+.BR
+(A) to generate the index within the webserver directory to be indexed:
+
+.BR
+ estcmd gather \-sd \ [index \ name] \ [directory \ path \ to \ index]
+
+.BR
+the following are examples that will need to be tailored according to your
+needs:
+
+.BR
+ cd /home/ralph/sisu_www
+
+.BR
+ estcmd gather \-sd casket /home/ralph/sisu_www
+
+.BR
+you may use the \'find\' command together with \'egrep\' to limit indexing to
+particular document collection directories within the web server directory:
+
+.BR
+ find /home/ralph/sisu_www \-type f | egrep
+ \'/home/ralph/sisu_www/sisu/.+?.html$\' |estcmd gather \-sd casket \-
+
+.BR
+Check which directories in the webserver/output directory (~/sisu_www or
+elsewhere depending on configuration) you wish to include in the search
+index.
+
+.BR
+As sisu duplicates output in multiple file formats, it is probably
+preferable to limit the estraier index to html output, and it may also be
+desirable to exclude files \'plain.txt\', \'toc.html\' and
+\'concordance.html\', as these duplicate information held in other html output,
+e.g.:
+
+.BR
+ find /home/ralph/sisu_www \-type f | egrep
+ \'/sisu_www/(sisu|bookmarks)/.+?.html$\' | egrep \-v
+ \'(doc|concordance).html$\' |estcmd gather \-sd casket \-
+
+.BR
+from your current document preparation/markup directory, you would construct a
+rune along the following lines:
+
+.BR
+ find /home/ralph/sisu_www \-type f | egrep \'/home/ralph/sisu_www/([specify \
+ first \ directory \ for \ inclusion]|[specify \ second \ directory \ for \
+ inclusion]|[another \ directory \ for \ inclusion? \ ...])/.+?.html$\' |
+ egrep \-v \'(doc|concordance).html$\' |estcmd gather \-sd
+ /home/ralph/sisu_www/casket \-
+
+.BR
+(B) to set up the search form
+
+.BR
+(i) copy estseek.cgi to your cgi directory and set file permissions to 755:
+
+.BR
+ sudo cp \-vi /usr/lib/estraier/estseek.cgi /usr/lib/cgi\-bin
+
+.BR
+ sudo chmod \-v 755 /usr/lib/cgi\-bin/estseek.cgi
+
+.BR
+ sudo cp \-v /usr/share/hyperestraier/estseek.* /usr/lib/cgi\-bin
+
+.BR
+ \ [see \ estraier \ documentation \ for \ paths]
+
+.BR
+(ii) edit estseek.conf, with attention to the lines starting \'indexname:\' and
+\'replace:\':
+
+.BR
+ indexname: /home/ralph/sisu_www/casket
+
+.BR
+ replace: ^file:///home/ralph/sisu_www{{!}}http://localhost
+
+.BR
+ replace: /index.html?${{!}}/
+
+.BR
+(C) to test using webrick, start webrick:
+
+.BR
+ sisu \-W
+
+.BR
+and try opening the url:
+
+.SH
+DOCUMENT INFORMATION (METADATA)
+.BR
+
+.SH
+METADATA
+.BR
+
+.BR
+Document Manifest @
+
+
+.BR
+.B Dublin Core
+(DC)
+
+.BR
+.I DC tags included with this document are provided here.
+
+.BR
+DC Title:
+.I SiSU \- SiSU information Structuring Universe \- Search \ [0.58]
+
+.BR
+DC Creator:
+.I Ralph Amissah
+
+.BR
+DC Rights:
+.I Copyright (C) Ralph Amissah 2007, part of SiSU documentation, License GPL
+3
+
+.BR
+DC Type:
+.I information
+
+.BR
+DC Date created:
+.I 2002\-08\-28
+
+.BR
+DC Date issued:
+.I 2002\-08\-28
+
+.BR
+DC Date available:
+.I 2002\-08\-28
+
+.BR
+DC Date modified:
+.I 2007\-09\-16
+
+.BR
+DC Date:
+.I 2007\-09\-16
+
+.BR
+.B Version Information
+
+.BR
+Sourcefile:
+.I sisu_search._sst
+
+.BR
+Filetype:
+.I SiSU text insert 0.58
+
+.BR
+Sourcefile Digest, MD5(sisu_search._sst)=
+.I 52c1d6d3c3082e6b236c65debc733a05
+
+.BR
+Skin_Digest:
+MD5(/home/ralph/grotto/theatre/dbld/sisu\-dev/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
+.I 20fc43cf3eb6590bc3399a1aef65c5a9
+
+.BR
+.B Generated
+
+.BR
+Document (metaverse) last generated:
+.I Sun Sep 23 01:14:04 +0100 2007
+
+.BR
+Generated by:
+.I SiSU
+.I 0.58.3
+of 2007w36/4 (2007\-09\-06)
+
+.BR
+Ruby version:
+.I ruby 1.8.6 (2007\-06\-07 patchlevel 36) \ [i486\-linux]
+
+.TP
+.BI 1.
+
+
+
+.TP
+.BI 2.
+
+
+.TP
+.BI 3.
+
+.TP
+.BI 4.
+(which could be extended further with current back-end). As regards scaling
+of the database, it is as scalable as the database (here Postgresql) and
+hardware allow.
+.TP
+.BI 5.
+of this feature when demonstrated to an IBM software innovations evaluator in
+2004 he said to paraphrase: this could be of interest to us. We have large
+document management systems, you can search hundreds of thousands of documents
+and we can tell you which documents meet your search criteria, but there is no
+way we can tell you without opening each document where within each your
+matches are found.
+
+.TP
+Other versions of this document:
+.TP
+manifest:
+.TP
+html:
+.TP
+pdf:
+.TP
+pdf:
+." .TP
+."
manpage: http://www.jus.uio.no/sisu/sisu_search/sisu_search.1
+.TP
+at:
+.TP
+.TP
+* Generated by: SiSU 0.58.3 of 2007w36/4 (2007-09-06)
+.TP
+* Ruby version: ruby 1.8.6 (2007-06-07 patchlevel 36) [i486-linux]
+.TP
+* Last Generated on: Sun Sep 23 01:14:07 +0100 2007
+.TP
+* SiSU http://www.jus.uio.no/sisu