% SiSU insert 2.0
@title: SiSU
:subtitle: SQL and Search
@creator:
:author: Amissah, Ralph
@date:
:created: 2002-08-28
:issued: 2002-08-28
:available: 2002-08-28
:published: 2007-09-16
:modified: 2011-02-07
@rights:
:copyright: Copyright (C) Ralph Amissah 2007
:license: GPL 3 (part of SiSU documentation)
@classify:
:subject: ebook, epublishing, electronic book, electronic publishing, electronic document, electronic citation, data structure, citation systems, search
:A~? @title @creator
:B~? SiSU Search
:C~? Search
1~search_sql SQL
={ SiSU sql; SiSU search }
2~ Populate the database
={ SiSU search:populate database }
To populate the sql database, run sisu against a sisu markup file with one of the following sets of flags:
``` code
sisu --sqlite filename.sst
```
creates an sqlite3 database containing searchable content of just the sisu markup document selected
``` code
sisu --sqlite --update filename.sst
```
creates an sqlite3 database containing searchable content of marked up document(s) selected by the user from a common directory
``` code
sisu --pg --update filename.sst
```
fills a postgresql database with searchable content of marked up document(s) selected by the user from a common directory
For postgresql, the first time the command is run in a given directory the user will be prompted to create the requisite database. At the time of writing, the prompt sisu provides is as follows:
``` code
no connection with pg database established, you may need to run:
createdb "SiSU.7a.current"
after that don't forget to run:
sisu --pg --createall
before attempting to populate the database
```
The named database that sisu expects to find must exist, and if necessary must be created using postgresql tools. If the database exists but the database tables do not, sisu will attempt to create the tables it needs, the equivalent of the requested #{sisu --pg --createall}# command.
Once this is done, the sql database is populated and ready to be queried.
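The populate step can also be scripted. What follows is a minimal sketch in Python, assuming sisu and the postgresql tools are on the PATH. The helper names #{populate_command}#, #{pg_setup_commands}# and #{run}# are hypothetical, only the flags quoted above are used, and the database name is the one from the sample prompt, which may differ on a given installation.

```python
import subprocess


def populate_command(backend, files, update=True):
    """Build the sisu argument list for populating an sql database.

    backend is "sqlite" or "pg", matching the flags described above.
    """
    if backend not in ("sqlite", "pg"):
        raise ValueError("backend must be 'sqlite' or 'pg'")
    cmd = ["sisu", "--" + backend]
    if update:
        cmd.append("--update")
    return cmd + list(files)


def pg_setup_commands(dbname):
    """First-time postgresql steps, in the order the sisu prompt suggests."""
    return [["createdb", dbname], ["sisu", "--pg", "--createall"]]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


# Dry run: show what would be executed for a first postgresql import.
for step in pg_setup_commands("SiSU.7a.current"):
    run(step)
run(populate_command("pg", ["filename.sst"]))
```

Passing #{dry_run=False}# to the hypothetical #{run}# helper would execute the commands rather than print them.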
2~ SQL type databases
SiSU feeds sisu markup documents into sql type databases, PostgreSQL~{ \\ \\ }~ and/or SQLite~{ \\ }~, together with information related to document structure.
This is one of the more interesting output forms, as all the structural data of the documents is retained (though it can be ignored by the user of the database should they so choose). All site texts/documents are (currently) streamed to four tables:
_1* one containing semantic (and other) headers, including title, author, subject (the Dublin Core...);
_1* another containing the substantive texts by individual "paragraph" (or object), along with structural information, each paragraph being identifiable by its paragraph number (if it has one, which almost all of them do), and the substantive text of each paragraph quite naturally being searchable (both in formatted and clean text versions for searching);
_1* a third containing endnotes cross-referenced back to the paragraph from which they are referenced (both in formatted and clean text versions for searching); and
_1* a fourth table, with a one to one relation with the headers table, containing full text versions of output, e.g. pdf, html, xml, and ascii.
There is of course the possibility to add further structures.
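The four-table layout described above can be sketched as follows, using SQLite from Python. This is an illustration only: the table and column names are hypothetical and are not the schema sisu actually creates, and the full text table carries only a subset of the output formats.

```python
import sqlite3

# Hypothetical schema mirroring the four tables described in the text.
SCHEMA = """
CREATE TABLE metadata (   -- semantic headers (title, author, subject ...)
    doc_id  INTEGER PRIMARY KEY,
    title   TEXT,
    author  TEXT,
    subject TEXT
);
CREATE TABLE objects (    -- one row per paragraph (text object)
    doc_id  INTEGER REFERENCES metadata(doc_id),
    ocn     INTEGER,      -- paragraph number, NULL if the object has none
    body    TEXT,         -- text with elementary markup
    clean   TEXT          -- stripped text, used for searching
);
CREATE TABLE endnotes (   -- notes, tied to the paragraph that cites them
    doc_id  INTEGER REFERENCES metadata(doc_id),
    ocn     INTEGER,
    body    TEXT,
    clean   TEXT
);
CREATE TABLE fulltexts (  -- one to one with metadata
    doc_id    INTEGER PRIMARY KEY REFERENCES metadata(doc_id),
    html      TEXT,
    xml       TEXT,
    plaintext TEXT
);
"""


def build_demo():
    """Create an in-memory database holding one tiny sample document."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO metadata VALUES (1, 'SiSU', 'Amissah, Ralph', 'search')")
    con.execute("INSERT INTO objects VALUES (1, 1, '*{Populate}* the database', 'Populate the database')")
    con.execute("INSERT INTO endnotes VALUES (1, 1, 'a note', 'a note')")
    con.execute("INSERT INTO fulltexts VALUES (1, '<p>...</p>', '<para>...</para>', '...')")
    return con


con = build_demo()
# Join headers, paragraphs and endnotes through the shared identifiers.
rows = con.execute(
    "SELECT m.title, o.ocn, e.clean FROM metadata m "
    "JOIN objects o ON o.doc_id = m.doc_id "
    "JOIN endnotes e ON e.doc_id = o.doc_id AND e.ocn = o.ocn"
).fetchall()
print(rows)
```

Keeping the paragraph number in both the paragraph and endnote tables is what allows a note to be traced back to the paragraph that references it.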
At this level SiSU loads a relational database with documents chunked into objects, their smallest logical structurally constituent parts, as text objects, with their object citation number and all other structural information needed to construct the document. Text is stored (at this text object level) with and without elementary markup tagging, the stripped version being there to facilitate searching.
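As a rough illustration of storing each text object with and without markup, the sketch below strips inline markers of the form #{ X{ text }X }# from a paragraph and pairs the two versions with an object number. It assumes a simplified subset of sisu inline markup and is not the routine sisu itself uses; the names #{strip_markup}# and #{as_row}# are hypothetical.

```python
import re

# Inline markers of the form X{ ... }X, for example *{ ... }* or #{ ... }#.
# Assumption: only this simplified subset of marker characters is handled.
INLINE = re.compile(r'([*!/_#^,+\-"])\{\s*(.+?)\s*\}\1')


def strip_markup(text):
    """Return text with inline markers removed and their contents kept."""
    previous = None
    while previous != text:  # repeat so that nested markers are also removed
        previous = text
        text = INLINE.sub(r"\2", text)
    return text


def as_row(ocn, text):
    """Pair an object number with the formatted and the clean text."""
    return (ocn, text, strip_markup(text))


print(as_row(47, "the requested #{sisu --pg --createall}# command"))
```

Searching then runs against the clean column, while the formatted column is kept for display.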
Being able to search a relational database at an object level with the SiSU citation system is an effective way of locating content generated by SiSU. As individual text objects of a document are stored (and indexed) together with object numbers, and all versions of the document have the same numbering, complex searches can be tailored to return just the locations of the search results relevant for all available output formats, with live links to the precise locations in the database or in html/xml documents. Alternatively, the structural information provided makes it possible to search the full contents of the database and return the headings in which search content appears, or to search only headings, etc. (as the Dublin Core is incorporated it is easy to make use of that as well).
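An object-level search of this kind might look like the following sketch, again using SQLite from Python. The table layout, the function names and the #{name.format#number}# location pattern are all assumptions made for illustration, and are not taken from sisu.

```python
import sqlite3


def demo_db():
    """In-memory table of text objects: document, number, heading flag, text."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE objects (doc TEXT, ocn INTEGER, is_heading INTEGER, clean TEXT)"
    )
    con.executemany("INSERT INTO objects VALUES (?, ?, ?, ?)", [
        ("sisu_sql", 1, 1, "SQL"),
        ("sisu_sql", 2, 1, "Populate the database"),
        ("sisu_sql", 3, 0, "To populate the sql database, run sisu against a markup file"),
        ("sisu_sql", 4, 0, "Once this is done, the database is ready to be queried"),
    ])
    return con


def search(con, term, headings_only=False):
    """Return (document, object number) pairs whose clean text contains term."""
    sql = "SELECT doc, ocn FROM objects WHERE clean LIKE '%' || ? || '%'"
    if headings_only:
        sql += " AND is_heading = 1"
    sql += " ORDER BY doc, ocn"
    return con.execute(sql, (term,)).fetchall()


def locations(hits, formats=("html", "xml")):
    """Turn each hit into one location per output format (hypothetical pattern)."""
    return [f"{doc}.{fmt}#{ocn}" for doc, ocn in hits for fmt in formats]


con = demo_db()
hits = search(con, "populate")
print(hits)
print(locations(hits))
print(search(con, "database", headings_only=True))
```

Because every output format shares the same object numbers, a single query result can be mapped to locations in all of them.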