Diffstat (limited to 'data/doc/sisu/html/sisu_search.8.html')
-rw-r--r-- | data/doc/sisu/html/sisu_search.8.html | 513 |
1 files changed, 513 insertions, 0 deletions
diff --git a/data/doc/sisu/html/sisu_search.8.html b/data/doc/sisu/html/sisu_search.8.html
new file mode 100644
index 00000000..e7ec9e8a
--- /dev/null
+++ b/data/doc/sisu/html/sisu_search.8.html
@@ -0,0 +1,513 @@
+<!-- manual page source format generated by PolyglotMan v3.2, -->
+<!-- available at http://polyglotman.sourceforge.net/ -->
+
+<html>
+<head>
+<title>"sisu_search"("1") manual page</title>
+</head>
+<body bgcolor='white'>
+<a href='#toc'>Table of Contents</a><p>
+SISU - SISU INFORMATION STRUCTURING UNIVERSE - SEARCH [0.58], RALPH AMISSAH
+
+<p> SISU SEARCH
+<p> 1. SISU SEARCH - INTRODUCTION
+<p> <b>SiSU</b> output can easily and
+conveniently be indexed by a number of standalone indexing tools, such
+as Lucene, Hyperestraier.
+<p> Because the document structure of the sites created
+is clearly defined, and the text object citation system is available (hypothetically
+at least) for all forms of output, it is possible to search the sql database
+and either read results from that database or, just as simply, map the results
+to the html output, which has richer text markup.
+<p> In addition to this,
+<b>SiSU</b> can populate a relational sql type database with documents
+at an object level, with object numbers that are shared across different
+output types, which makes them searchable with that degree of granularity.
+Basically, a search reports which documents meet your match criteria and at which locations
+within each document, and the results can be viewed within the database directly
+or in various output formats.
+<p> 2. SQL
+<p> 2.1 POPULATING SQL TYPE DATABASES
+
+<p> <b>SiSU</b> feeds sisu marked-up documents into sql type databases, PostgreSQL[^1]
+and/or SQLite[^2], together with information related to document
+structure.
+<p> This is one of the more interesting output forms, as all the
+structural data of the documents are retained (though can be ignored by
+the user of the database should they so choose). 
All site texts/documents
+are (currently) streamed to four tables:
+<p> * one containing semantic
+(and other) headers, including title, author,<br>
+ subject, (the Dublin Core...);<br>
+
+<p> * another containing the substantive texts by individual paragraph,<br>
+ along with structural information, each paragraph being identifiable
+by its<br>
+ paragraph number (if it has one, which almost all of them do), and the<br>
+ substantive text of each paragraph quite naturally being searchable
+(both in<br>
+ formatted and clean text versions for searching);<br>
+
+<p> * a third containing endnotes cross-referenced back to the paragraph
+from<br>
+ which they are referenced (both in formatted and clean text versions
+for<br>
+ searching); and<br>
+
+<p> * a fourth table, with a one to one relation with the headers table,
+containing<br>
+ full text versions of output, e.g. pdf, html, xml, and ascii.<br>
+
+<p> There is of course the possibility to add further structures.
+<p> At this
+level <b>SiSU</b> loads a relational database with documents chunked into objects,
+their smallest logical structurally constituent parts, as text objects,
+with their object citation number and all other structural information
+needed to construct the document. Text is stored (at this text object level)
+with and without elementary markup tagging, the stripped version being
+kept to facilitate ease of searching. 
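<p> As an illustration only, the object-level storage described above can be sketched with Python's standard sqlite3 module. The table and column names used here (documents, objects, ocn, body, clean) are assumptions made for this sketch; they are not the schema that SiSU itself creates.

```python
import sqlite3


def build_sample_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: names are illustrative, not SiSU's own.
    conn.executescript("""
        CREATE TABLE documents (
            doc_id INTEGER PRIMARY KEY,
            title  TEXT,
            author TEXT
        );
        CREATE TABLE objects (
            doc_id INTEGER REFERENCES documents(doc_id),
            ocn    INTEGER,  -- object citation number
            body   TEXT,     -- text with elementary markup
            clean  TEXT      -- stripped text, used for searching
        );
    """)
    conn.execute("INSERT INTO documents VALUES (1, 'Sample Text', 'A. Author')")
    conn.executemany(
        "INSERT INTO objects VALUES (?, ?, ?, ?)",
        [
            (1, 1, "<b>Search</b> at object level", "Search at object level"),
            (1, 2, "Plain paragraph", "Plain paragraph"),
            (1, 3, "More on <i>object</i> numbers", "More on object numbers"),
        ],
    )
    return conn


def search(conn, term):
    """Return (title, ocn) pairs for objects whose clean text contains term."""
    rows = conn.execute(
        "SELECT d.title, o.ocn FROM objects o "
        "JOIN documents d ON d.doc_id = o.doc_id "
        "WHERE o.clean LIKE ? ORDER BY d.doc_id, o.ocn",
        (f"%{term}%",),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_sample_db()
    # Each hit names a document and the object number where the match occurs.
    print(search(conn, "object"))
```

<p> Because the query runs against the stripped column, markup tags do not produce matches of their own, and each hit is reported as a document title together with an object number.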
+<p> Being able to search a relational
+database at an object level with the <b>SiSU</b> citation system is an effective
+way of locating content generated by <b>SiSU</b>. Because text objects are stored with their object numbers, and all versions
+of the document have the same numbering, complex searches can be tailored
+to return just the locations of the search results relevant for all available
+output formats, with live links to the precise locations in the database
+or in html/xml documents. Alternatively, the structural information provided makes
+it possible to search the full contents of the database and return the headings
+under which search content appears, or to search only headings, etc. (as the
+Dublin Core is incorporated, it is easy to make use of that as well).
+<p> 3. POSTGRESQL
+<p> 3.1 NAME
+<p> <b>SiSU</b> - Structured information, Serialized Units -
+a document publishing system, postgresql dependency package
+<p> 3.2 DESCRIPTION
+
+<p> Information related to using postgresql with sisu (and related to the
+sisu_postgresql dependency package, which is a dummy package to install
+dependencies needed for <b>SiSU</b> to populate a postgresql database, this being
+part of <b>SiSU</b> - man sisu).
+<p> 3.3 SYNOPSIS
+<p> sisu -D [instruction] [filename/wildcard if required]<br>
+
+<p> sisu -D --pg --[instruction] [filename/wildcard if required]<br>
+
+<p> 3.4 COMMANDS
+<p> Mappings to two databases are provided by default, postgresql
+and sqlite. The same commands are used within sisu to construct and populate
+databases; however, -d (lowercase) denotes sqlite and -D (uppercase) denotes
+postgresql. Alternatively, --sqlite or --pgsql may be used.
+<p> <b>-D or --pgsql</b> may
+be used interchangeably. 
+<p> 3.4.1 CREATE AND DESTROY DATABASE
+<p>
+<dl>
+
+<dt><b> --pgsql --createall</b>
+</dt>
+<dd> initial step, creates required relations (tables, indexes) in existing
+ (postgresql) database (a database should be created manually and given
+ the same name as working directory, as requested) (rb.dbi)
+<p> </dd>
+
+<dt><b> sisu -D --createdb</b> </dt>
+<dd> creates database where no database
+ existed before
+<p> </dd>
+
+<dt><b> sisu -D --create</b> </dt>
+<dd> creates database tables where no database
+ tables existed before
+<p> </dd>
+
+<dt><b> sisu -D --dropall</b> </dt>
+<dd> destroys database (including all its content)! Kills data
+and drops tables, indexes and database associated with a given directory
+ (and directories of the same name).
+<p> </dd>
+
+<dt><b> sisu -D --recreate</b> </dt>
+<dd> destroys existing
+database and builds a new empty database structure
+<p> </dd>
+</dl>
+3.4.2 IMPORT AND REMOVE DOCUMENTS
+<p>
+<dl>
+
+<dt><b> sisu -D --import -v [filename/wildcard]</b> </dt>
+<dd>populates database with
+the contents of the file. Imports document(s) specified to a postgresql
+database (at an object level).
+<p> </dd>
+
+<dt><b> sisu -D --update -v [filename/wildcard]</b> </dt>
+<dd>updates
+file contents in database
+<p> </dd>
+
+<dt><b> sisu -D --remove -v [filename/wildcard]</b> </dt>
+<dd>removes
+specified document from postgresql database.
+<p> </dd>
+</dl>
+4. SQLITE
+<p> 4.1 NAME
+<p> <b>SiSU</b>
+- Structured information, Serialized Units - a document publishing system.
+
+<p> 4.2 DESCRIPTION
+<p> Information related to using sqlite with sisu (and related
+to the sisu_sqlite dependency package, which is a dummy package to install
+dependencies needed for <b>SiSU</b> to populate an sqlite database, this being
+part of <b>SiSU</b> - man sisu). 
+<p> 4.3 SYNOPSIS
+<p> sisu -d [instruction] [filename/wildcard if required]<br>
+
+<p> sisu -d --(sqlite|pg) --[instruction] [filename/wildcard if required]<br>
+
+<p> 4.4 COMMANDS
+<p> Mappings to two databases are provided by default, postgresql
+and sqlite. The same commands are used within sisu to construct and populate
+databases; however, -d (lowercase) denotes sqlite and -D (uppercase) denotes
+postgresql. Alternatively, --sqlite or --pgsql may be used.
+<p> <b>-d or --sqlite</b> may
+be used interchangeably.
+<p> 4.4.1 CREATE AND DESTROY DATABASE
+<p>
+<dl>
+
+<dt><b> --sqlite --createall</b>
+</dt>
+<dd> initial step, creates required relations (tables, indexes) in existing
+ (sqlite) database (a database should be created manually and given
+ the same name as working directory, as requested) (rb.dbi)
+<p> </dd>
+
+<dt><b> sisu -d --createdb</b> </dt>
+<dd> creates database where
+ no database existed before
+<p> </dd>
+
+<dt><b> sisu -d --create</b> </dt>
+<dd> creates database tables where
+ no database tables existed before
+<p> </dd>
+
+<dt><b> sisu -d --dropall</b> </dt>
+<dd> destroys database (including all its content)!
+ Kills data and drops tables, indexes and database associated with a given
+ directory (and directories of the same name).
+<p> </dd>
+
+<dt><b> sisu -d --recreate</b> </dt>
+<dd> destroys
+ existing database and builds a new empty database structure
+<p> </dd>
+</dl>
+4.4.2 IMPORT AND REMOVE DOCUMENTS
+<p>
+<dl>
+
+<dt><b> sisu -d --import -v [filename/wildcard]</b> </dt>
+<dd>populates database
+with the contents of the file. Imports document(s) specified to an sqlite
+database (at an object level).
+<p> </dd>
+
+<dt><b> sisu -d --update -v [filename/wildcard]</b> </dt>
+<dd>updates
+file contents in database
+<p> </dd>
+
+<dt><b> sisu -d --remove -v [filename/wildcard]</b> </dt>
+<dd>removes
+specified document from sqlite database.
+<p> </dd>
+</dl>
+5. 
INTRODUCTION
+<p> 5.1 SEARCH - DATABASE
+FRONTEND SAMPLE, UTILISING DATABASE AND SISU FEATURES, INCLUDING OBJECT
+CITATION NUMBERING (BACKEND CURRENTLY POSTGRESQL)
+<p> Sample search frontend
+<<a href='http://search.sisudoc.org'>http://search.sisudoc.org</a>
+> [^3] A small database and sample query front-end
+(search form) that makes use of the citation system, <i>object</i> citation numbering,
+to demonstrate functionality.[^4]
+<p> <b>SiSU</b> can provide information on which
+documents are matched and at what locations within each document the matches
+are found. These results are relevant across all outputs using object citation
+numbering, which includes html, XML, LaTeX, PDF and indeed the SQL database.
+You can then refer to one of the other outputs or, in the SQL database, expand
+the text within the matched objects (paragraphs) in the documents matched.
+
+<p> Note that you may set results to show either the documents matched and the object number
+locations within each matched document that meet the search criteria, or
+the names of the documents matched along with the objects (paragraphs)
+that meet the search criteria.[^5]
+<p>
+<dl>
+
+<dt><b> sisu -F --webserv-webrick</b> </dt>
+<dd> builds a cgi web
+ search frontend for the database created
+<p> The following is feedback on
+the setup on a machine provided by the help command:
+<p> sisu --help sql<br>
+
+<p>
+<p> <br>
+<pre> Postgresql
+   user: ralph
+   current db set: SiSU_sisu
+   port: 5432
+   dbi connect: DBI:Pg:database=SiSU_sisu;port=5432
+ sqlite
+   current db set: /home/ralph/sisu_www/sisu/sisu_sqlite.db
+   dbi connect DBI:SQLite:/home/ralph/sisu_www/sisu/sisu_sqlite.db
+</pre>
+<p> Note on databases built
+<p> By default, [unless otherwise specified] databases
+are built on a directory basis, from collections of documents within that
+directory. The name of the directory you choose to work from is used as
+the database name, i.e. 
if you are working in a directory called /home/ralph/ebook,
+the database SiSU_ebook is used. [otherwise a manual mapping for the collection
+ is
+<p> </dd>
+</dl>
+5.2 SEARCH FORM
+<p>
+<dl>
+
+<dt><b> sisu -F</b> </dt>
+<dd> generates a sample search form, which must be
+ copied to the web-server cgi directory
+<p> </dd>
+
+<dt><b> sisu -F --webserv-webrick</b> </dt>
+<dd> generates a sample
+ search form for use with the webrick server, which must be copied to the web-server
+ cgi directory
+<p> </dd>
+
+<dt><b> sisu
+ -Fv</b> </dt>
+<dd> as above, and provides some information on setting up
+<p> </dd>
+
+<dt><b> sisu -W</b> </dt>
+<dd> starts
+ the webrick server which should be available
+<p> The generated search form
+must be copied manually to the webserver directory as instructed
+<p> </dd>
+</dl>
+6. HYPERESTRAIER
+
+<p> See the documentation for hyperestraier:
+<p> <<a href='http://hyperestraier.sourceforge.net/'>http://hyperestraier.sourceforge.net/</a>
+><br>
+
+<p> /usr/share/doc/hyperestraier/index.html<br>
+
+<p> man estcmd<br>
+
+<p> on sisu_hyperestraier:
+<p> man sisu_hyperestraier<br>
+
+<p> /usr/share/doc/sisu/sisu_markup/sisu_hyperestraier/index.html<br>
+
+<p> NOTE: the examples that follow assume that sisu output is placed in
+the directory /home/ralph/sisu_www
+<p> (A) to generate the index within the
+webserver directory to be indexed:
+<p> estcmd gather -sd [index name] [directory
+ path to index]<br>
+
+<p> The following are examples that will need to be tailored according to
+your needs:
+<p> cd /home/ralph/sisu_www<br>
+
+<p> estcmd gather -sd casket /home/ralph/sisu_www<br>
+
+<p> You may use the 'find' command together with 'egrep' to limit indexing to
+particular document collection directories within the web server directory:
+
+<p> find /home/ralph/sisu_www -type f | egrep<br>
+ '/home/ralph/sisu_www/sisu/.+?.html$' |estcmd gather -sd casket -<br>
+
+<p> Check which directories 
in the webserver/output directory (~/sisu_www
+or elsewhere depending on configuration) you wish to include in the search
+index.
+<p> As sisu duplicates output in multiple file formats, it is probably
+preferable to limit the estraier index to html output, and it may also
+be desirable to exclude the files 'plain.txt', 'toc.html' and 'concordance.html', as
+these duplicate information held in other html output, e.g.
+<p> find /home/ralph/sisu_www
+-type f | egrep<br>
+ '/sisu_www/(sisu|bookmarks)/.+?.html$' | egrep -v<br>
+ '(doc|concordance).html$' |estcmd gather -sd casket -<br>
+
+<p> From your current document preparation/markup directory, you would construct
+a rune along the following lines:
+<p> find /home/ralph/sisu_www -type f
+| egrep '/home/ralph/sisu_www/([specify<br>
+ first directory for inclusion]|[specify second directory for<br>
+ inclusion]|[another directory for inclusion? ...])/.+?.html$' |<br>
+ egrep -v '(doc|concordance).html$' |estcmd gather -sd<br>
+ /home/ralph/sisu_www/casket -<br>
+
+<p> (B) to set up the search form
+<p> (i) copy estseek.cgi to your cgi directory
+and set file permissions to 755:
+<p> sudo cp -vi /usr/lib/estraier/estseek.cgi
+/usr/lib/cgi-bin<br>
+
+<p> sudo chmod -v 755 /usr/lib/cgi-bin/estseek.cgi<br>
+
+<p> sudo cp -v /usr/share/hyperestraier/estseek.* /usr/lib/cgi-bin<br>
+
+<p> [see estraier documentation for paths]<br>
+
+<p> (ii) edit estseek.conf, with attention to the lines starting 'indexname:'
+and 'replace:':
+<p> indexname: /home/ralph/sisu_www/casket<br>
+
+<p> replace: ^file:///home/ralph/sisu_www{{!}}<a href='http://localhost'>http://localhost</a>
+<br>
+
+<p> replace: /index.html?${{!}}/<br>
+
+<p> (C) to test using webrick, start webrick:
+<p> sisu -W<br>
+
+<p> and try opening the url: <<a href='http://localhost:8081/cgi-bin/estseek.cgi'>http://localhost:8081/cgi-bin/estseek.cgi</a>
+>
+<p> DOCUMENT
+INFORMATION (METADATA)
+<p> METADATA
+<p> Document Manifest @ <<a 
href='http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html'>http://www.jus.uio.no/sisu/sisu_manual/sisu_search/sisu_manifest.html</a>
+>
+
+<p> <b>Dublin Core</b> (DC)
+<p> <i>DC</i> tags included with this document are provided here.
+
+<p> DC Title: <i>SiSU</i> - SiSU information Structuring Universe - Search [0.58]
+<p>
+ DC Creator: <i>Ralph</i> Amissah
+<p> DC Rights: <i>Copyright</i> (C) Ralph Amissah 2007,
+part of SiSU documentation, License GPL 3
+<p> DC Type: <i>information</i>
+<p> DC Date
+created: <i>2002-08-28</i>
+<p> DC Date issued: <i>2002-08-28</i>
+<p> DC Date available: <i>2002-08-28</i>
+
+<p> DC Date modified: <i>2007-09-16</i>
+<p> DC Date: <i>2007-09-16</i>
+<p> <b>Version Information</b>
+
+<p> Sourcefile: <i>sisu_search._sst</i>
+<p> Filetype: <i>SiSU</i> text insert 0.58
+<p> Sourcefile
+Digest, MD5(sisu_search._sst)= <i>52c1d6d3c3082e6b236c65debc733a05</i>
+<p> Skin_Digest:
+MD5(/home/ralph/grotto/theatre/dbld/sisu-dev/sisu/data/doc/sisu/sisu_markup_samples/sisu_manual/_sisu/skin/doc/skin_sisu_manual.rb)=
+
+<p><i>20fc43cf3eb6590bc3399a1aef65c5a9</i>
+<p> <b>Generated</b>
+<p> Document (metaverse) last
+generated: <i>Sun</i> Sep 23 01:14:04 +0100 2007
+<p> Generated by: <i>SiSU</i> <i>0.58.3</i> of
+2007w36/4 (2007-09-06)
+<p> Ruby version: <i>ruby</i> 1.8.6 (2007-06-07 patchlevel 36)
+ [i486-linux]
+<p>
+<ol>
+<b>.</b><li><<a href='http://www.postgresql.org/'>http://www.postgresql.org/</a>
+> <<a href='http://advocacy.postgresql.org/'>http://advocacy.postgresql.org/</a>
+><br>
+ <<a href='http://en.wikipedia.org/wiki/Postgresql'>http://en.wikipedia.org/wiki/Postgresql</a>
+><br>
+ </li><b>.</b><li><<a href='http://www.hwaci.com/sw/sqlite/'>http://www.hwaci.com/sw/sqlite/</a>
+> <<a href='http://en.wikipedia.org/wiki/Sqlite'>http://en.wikipedia.org/wiki/Sqlite</a>
+><br>
+ </li><b>.</b><li><<a href='http://search.sisudoc.org'>http://search.sisudoc.org</a>
+> </li><b>.</b><li>(which could be extended further with current 
+back-end). As regards scaling of the database, it is as scalable as the database
+(here Postgresql) and hardware allow. </li><b>.</b><li>When this feature was demonstrated
+to an IBM software innovations evaluator in 2004, he said, to paraphrase:
+this could be of interest to us. We have large document management systems;
+you can search hundreds of thousands of documents and we can tell you which
+documents meet your search criteria, but there is no way we can tell you,
+without opening each document, where within each your matches are found.
+
+<p> </dd>
+
+<dt>Other versions of this document: </dt>
+<dd></dd>
+
+<dt>manifest: <<a href='http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html'><a href='http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html'>http://www.jus.uio.no/sisu/sisu_search/sisu_manifest.html</a>
+</a>
+>
+</dt>
+<dd></dd>
+
+<dt>html: <<a href='http://www.jus.uio.no/sisu/sisu_search/toc.html'><a href='http://www.jus.uio.no/sisu/sisu_search/toc.html'>http://www.jus.uio.no/sisu/sisu_search/toc.html</a>
+</a>
+> </dt>
+<dd></dd>
+
+<dt>pdf: <<a href='http://www.jus.uio.no/sisu/sisu_search/portrait.pdf'><a href='http://www.jus.uio.no/sisu/sisu_search/portrait.pdf'>http://www.jus.uio.no/sisu/sisu_search/portrait.pdf</a>
+</a>
+>
+</dt>
+<dd></dd>
+
+<dt>pdf: <<a href='http://www.jus.uio.no/sisu/sisu_search/landscape.pdf'><a href='http://www.jus.uio.no/sisu/sisu_search/landscape.pdf'>http://www.jus.uio.no/sisu/sisu_search/landscape.pdf</a>
+</a>
+> </dt>
+<dd> </dd>
+
+<dt>at: <<a href='http://www.jus.uio.no/sisu'><a href='http://www.jus.uio.no/sisu'>http://www.jus.uio.no/sisu</a>
+</a>
+>
+</dt>
+<dd></dd>
+
+<dt>* Generated by: SiSU 0.58.3 of 2007w36/4 (2007-09-06) </dt>
+<dd></dd>
+
+<dt>* Ruby version: ruby
+1.8.6 (2007-06-07 patchlevel 36) [i486-linux] </dt>
+<dd></dd>
+
+<dt>* Last Generated on: Sun Sep 23
+01:14:07 +0100 2007 </dt>
+<dd></dd>
+
+<dt>* SiSU <a href='http://www.jus.uio.no/sisu'>http://www.jus.uio.no/sisu</a>
+ </dt> 
+<dd></dd>
+</dl>
+<p>
+</body>
+</html>