| Commit message | Author | Age | Files | Lines |
|
Three small follow-ups to the ocda/outputs split:
1. Add src/sisudoc/ocda/package.d (module sisudoc.ocda) as a 2-line
public re-export of sisudoc.ocda.abstraction. Provides downstream
consumers with a canonical "import sisudoc.ocda;" entry point and
a stable handle for eventual peer-repo packaging of the
abstraction library.
2. Fix the D import-path root in dub.json so it matches the declared
module names:
- spine:abstraction sub-package
"importPaths": [ "./src/sisudoc" ] -> [ "./src" ]
- main package buildTypes (dmd, ldc2, ldmd2, gdc, gdmd)
"-I=src/sisudoc" -> "-I=src"
The modules are named sisudoc.ocda.* / sisudoc.outputs.* /
sisudoc.* so the filesystem-based resolver needs to see
./src as the root (so <root>/sisudoc/ocda/X.d resolves).
3. Replace dyaml sub-package's destructive preGenerateCommands
("rm -rf ./src/ext_depends/D-YAML/{examples,testsuite}") with
declarative excludedSourceFiles globs. The two directories do
not exist in the vendored D-YAML tree, so the rm was a no-op
in practice; the glob form is defensive (would silently skip
them if they were ever re-introduced) and removes the
destructive side-effect from every build.
(assisted by Claude-Code)
|
Modules and imports rewritten to sisudoc.ocda.* and
sisudoc.outputs.*; dub.json excludedSourceFiles and the
spine:abstraction sub-package sourcePaths collapsed to
./src/sisudoc/ocda.
Verified: nix build .#spine-overlay-ldc clean.
(assisted by Claude-Code)
|
Create new directories ocda and outputs under ./src/sisudoc in order to
separate the document abstraction library from downstream output
processing (the build is broken until paths and modules are fixed)
|
(also cgi_sqlite_search_form.d did not belong here)
|
css: align body-flow <ul>/<li> & <details>/<summary> with <p>
Not used by sisudoc-spine itself, but for hand-authored body-flow markup
such as the current homepage. A body-flow block is added to each of the
four html CSS string heredocs in src/sisudoc/io_out/xmls_css.d
Existing tags are left in place and untouched.
(assisted by Claude-Code)
|
phase1 step2: move SSP serialiser into sisudoc.abstraction package
git mv src/sisudoc/io_out/create_abstraction_txt.d to
src/sisudoc/abstraction/ssp.d
Module rename: sisudoc.io_out.create_abstraction_txt
-> sisudoc.abstraction.ssp
Completes phase1: after this commit the sisudoc.abstraction package has
zero outgoing edges into sisudoc.io_out. The library produces both the
in-memory document object model AND the .ssp text serialisation without
referencing any output-side module.
The serialiser previously imported sisudoc.io_out.paths_output for the
single purpose of constructing the .ssp output path. That import is
dropped; the path construction is inlined as three lines of std.path
(chainPath / asNormalizedPath / array) producing
<output_path>/<language>/abstraction/<doc_uid_out>.ssp
which is byte-for-byte the same path the previous spineOutPaths!() call
produced.
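The inlined path construction can be pictured roughly as follows. This is a minimal Python sketch of the idea, not the D code from the commit: `ssp_output_path` and its parameter names are hypothetical, and `os.path.join` / `os.path.normpath` merely stand in for std.path's chainPath / asNormalizedPath.

```python
import os.path


def ssp_output_path(output_path: str, language: str, doc_uid_out: str) -> str:
    """Build <output_path>/<language>/abstraction/<doc_uid_out>.ssp."""
    # Join the components, then normalise away redundant separators
    # and "." segments, much as a normalised chained path would.
    joined = os.path.join(output_path, language, "abstraction", doc_uid_out + ".ssp")
    return os.path.normpath(joined)
```

The point of the sketch is only that the path needs nothing beyond generic path helpers, which is why the output-side import could be dropped.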
Updated:
- src/sisudoc/abstraction/ssp.d - module decl + inline path
- src/sisudoc/abstraction/package.d - public import .ssp
- src/sisudoc/spine.d - import sisudoc.abstraction.ssp (x2)
Completes decouple abstraction phase1
(assisted by Claude-Code)
|
phase1 step1: introduce sisudoc.abstraction package re-export surface
Create src/sisudoc/abstraction/package.d as a library-facing re-export
module for the document-abstraction stage.
The surface currently re-exports:
- sisudoc.meta.metadoc (spineAbstraction, A-layer entry)
- sisudoc.meta.metadoc_from_src (docAbstraction, B-layer entry)
No code moves; no behaviour change. The package exists so external
consumers can `import sisudoc.abstraction;` and reach the entry points
without depending on spine's internal directory layout.
(assisted by Claude-Code)
|
phase0 step2: move curation modules from meta/ to io_out/curate/
Curation modules moved to src/sisudoc/io_out/curate/, module
declarations renamed sisudoc.io_out.curate.metadoc_curate* from
sisudoc.meta.metadoc_curate* and updated spine.d imports. File contents
are otherwise unchanged.
Completes phase0: meta/ now has zero io_out imports - the abstraction
core's outgoing deps are now only:
meta/ internals + io_in/ + ext_depends/D-YAML
(assisted by Claude-Code)
|
phase0: drop vestigial io_out.hub coupling from meta/metadoc.d
phase0 step1: abstraction-library extraction/decoupling: meta/ should
not import io_out/.
Removed the unused `import sisudoc.io_out.hub;` and `mixin outputHub;`
from `template spineAbstraction()` (the load-bearing UFCS site is
spine.d:92, which has its own `mixin outputHub;`).
(assisted by Claude-Code)
|
- include all (doc abstraction) .ssp in pod zip and in digests
- fixed: for multi-language pods built with --pod2, only the last
language's .ssp file was being written into pod.zip and listed in
.digests.txt. Each language's .ssp file was on disk in the pod
directory (copied during its own per-language pass), but the zip was
rebuilt once per language, each build overwriting the previous one,
so only the last language's .ssp remained. The solution follows the
pattern already used for .sstm and .ssi: wait for the last language,
then iterate manifest_list_of_languages internally.
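The "wait for the last language, then iterate the whole list" pattern can be sketched as below. This is an illustrative Python sketch under stated assumptions, not the D implementation: `build_pod_zip`, `process_languages` and the placeholder .ssp content are hypothetical; only the in-pod location media/abstraction/{doc_uid}.{lang}.ssp is taken from this log.

```python
import io
import zipfile
from typing import Dict, List, Optional


def build_pod_zip(ssp_by_language: Dict[str, bytes], doc_uid: str) -> bytes:
    """Write every language's .ssp into one archive in a single pass."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for lang, content in ssp_by_language.items():
            archive.writestr(f"media/abstraction/{doc_uid}.{lang}.ssp", content)
    return buffer.getvalue()


def process_languages(languages: List[str], doc_uid: str) -> Optional[bytes]:
    """Run per-language passes; assemble the zip only on the last one."""
    collected: Dict[str, bytes] = {}
    pod_zip = None
    for index, lang in enumerate(languages):
        # Placeholder for the per-language .ssp produced on that pass.
        collected[lang] = f"% abstraction for {lang}\n".encode()
        if index == len(languages) - 1:
            # Last language: iterate everything collected so far, so no
            # earlier language is overwritten by a later rebuild.
            pod_zip = build_pod_zip(collected, doc_uid)
    return pod_zip
```

Building the archive once, at the end, is what removes the overwrite: there is no intermediate zip for a later pass to clobber.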
(assisted by Claude-Code)
|
(assisted by Claude-Code)
|
- fatal error on missing/unwritable --sqlite-db-path
(assisted by Claude-Code)
|
- odd highlighting issue ... must result from my org config, but "fix"
makes things easier for me.
|
Add int[] children_headings field to DocObj_MetaInfo_ and
compute it in the post-processing pass of metadoc_from_src.d,
right after last_descendant_ocn. Single O(n) pass builds a
parent_ocn -> child heading OCNs map, then assigns to each
heading object. Useful for tree-structured output.
The .ssp serializer now reads directly from the abstraction
field instead of pre-computing its own map.
metadoc_object_setter.d: +1 line (field declaration)
metadoc_from_src.d: +17 lines (computation)
create_abstraction_txt.d: -10 lines (simplified)
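The single-pass construction of the parent_ocn -> child heading OCNs map might look like the following. It is a minimal Python sketch, not the code in metadoc_from_src.d: the function name and the dict keys "ocn", "is_a" and "parent_ocn" are assumptions chosen to mirror the field names mentioned in this log.

```python
from collections import defaultdict


def children_headings(objects):
    """Map each parent OCN to the OCNs of its direct child headings.

    `objects` is a document-ordered list of dicts with the (assumed)
    keys "ocn", "is_a" and "parent_ocn".
    """
    children = defaultdict(list)
    for obj in objects:  # one O(n) pass over the objects
        if obj["is_a"] == "heading":
            children[obj["parent_ocn"]].append(obj["ocn"])
    return dict(children)
```

Because objects are visited in document order, each child list comes out already sorted by position, so a second pass can simply assign `children.get(ocn, [])` to every heading.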
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
Finer-grained control over when .ssp files are produced:
--show-abstraction writes .ssp to OUTPUT/lang/abstraction/
independently of any pod flag
--pod builds pod without .ssp bundled
--pod2 builds pod with .ssp in media/abstraction/
Changes to spine.d:
- show_abstraction() now only responds to its own flag and
pod2, no longer triggered by source_or_pod
- Add pod2 to opts init, getopt, OptActions
- pod() returns true for both --pod and --pod2
- source_or_pod() includes pod2
Changes to source_pod.d:
- Remove per-document pod directory (rmdirRecurse) before
regeneration, ensuring clean slate on every run. This
prevents stale content from previous runs (e.g. a --pod2
run followed by --pod would otherwise leave an outdated
media/abstraction/ directory)
- Gate abstraction directory creation and .ssp bundling on
pod2 flag specifically
Tested: --pod (no .ssp), --pod2 (.ssp in pod + zip),
--show-abstraction (standalone .ssp), --pod after --pod2
(stale abstraction cleaned up). All 35 sample documents pass.
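The flag gating described above can be summarised in a small sketch. This is Python for illustration only, not the OptActions struct in spine.d: the class name `Opts` and the field names `show_abstraction_flag`, `source` and `pod_flag` are hypothetical, and treating --source as one input to source_or_pod() is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Opts:
    show_abstraction_flag: bool = False  # --show-abstraction
    source: bool = False                 # --source (assumed input)
    pod_flag: bool = False               # --pod
    pod2: bool = False                   # --pod2

    def show_abstraction(self) -> bool:
        # Responds only to its own flag and to --pod2,
        # no longer to source_or_pod.
        return self.show_abstraction_flag or self.pod2

    def pod(self) -> bool:
        # True for both --pod and --pod2.
        return self.pod_flag or self.pod2

    def source_or_pod(self) -> bool:
        # Includes --pod2 alongside the earlier flags.
        return self.source or self.pod_flag or self.pod2
```

With this shape, `Opts(pod_flag=True)` builds a pod without generating .ssp, while `Opts(pod2=True)` both builds the pod and triggers the abstraction output.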
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
Add empty-string guards to array property loops
(.stow_link, .lev4_subtoc, .anchor_tag) so entries with
zero-length values are not emitted. Empty properties have
no value for PEG parsing - absent lines are faster to skip
than matching a property name to find an empty value.
Removes 1488 empty .anchor_tag: lines from Wealth of
Networks .ssp alone.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
- Add explicit child heading OCN lists to heading objects,
pre-computed in a single O(n) pass over the body section
before serialization. This makes the document tree directly
navigable without scanning - each heading lists its direct
sub-heading OCNs.
- Example output for a chapter heading:
[10] heading :1
.last_descendant: 65
.children: 14 24 42 57
- Implementation: builds an int[][int] map (parent_ocn ->
child heading OCNs) from one pass over the body objects,
then emits .children: during serialization for headings
that have entries in the map.
- The tree was already reconstructable from parent_ocn +
last_descendant_ocn, but .children makes it immediate -
no scanning required to find a heading's sub-structure.
- Tested against all 35 sample documents - zero failures.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
- Make the .ssp format a complete representation of the
document abstraction by serializing all remaining fields
from ObjGenericComposite (only omitting ptr.* runtime
indices which are meaningless outside the in-memory context).
- New fields added:
.ancestors_collapsed: - collapsed level ancestor chain
.dom_status: - DOM structure marked-up tags status[8]
.dom_status_collapsed: - DOM structure collapsed status[8]
.heading_lev_collapsed: - collapsed heading level
.parent_lev: - parent heading level (markup)
.o_n_type: - object numbering type (0=ocn, 1=non, 2=bkidx)
.is_of_type: - para/block type classification
.attrib: - general attributes string
.meta_lang: - block language (group/block/quote)
.meta_syntax: - codeblock syntax from metainfo
.sha256: - hex-encoded SHA-256 digest of object content
.has: images_no_dim - image without dimensions flag
.table_aligns: - column alignment array
.table_walls: - table walls/borders flag
.stow_link: - extracted URLs (one per line)
.heading_lev_anchor: - heading level anchor tag
.segment_epub: - EPUB segment anchor tag
.heading_ancestors_text: - pipe-separated ancestor headings
.lev4_subtoc: - sub-table-of-contents entries (one per line)
.anchor_tag: - additional anchor tags (one per line)
- Tested against all 35 sample documents - zero failures.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
- For heading objects, the identifier was always emitted on the
declaration line (e.g. "[10] heading :1 10") even when it was
just the OCN repeated. Now only emits the identifier when it
differs from the OCN (i.e. when there is a named segment like
"acknowledgments" or "a1"), reducing redundancy.
Before: [10] heading :1 10
After: [10] heading :1
Named segments still appear: [0] heading :1 a1
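The rule for the declaration line can be sketched as follows. This is a hypothetical Python helper written to match the before/after examples above, not the serializer's own code; the function name and parameters are assumptions.

```python
def heading_declaration(ocn: int, level: str, identifier: str) -> str:
    """Format a heading declaration, omitting a redundant identifier."""
    line = f"[{ocn}] heading {level}"
    # Emit the identifier only when it names a segment, i.e. when it
    # is not simply the OCN repeated.
    if identifier and identifier != str(ocn):
        line += f" {identifier}"
    return line
```

Called with `(10, ":1", "10")` it yields the shortened form, and with `(0, ":1", "a1")` it keeps the named segment.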
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
- When --source/--pod is used, automatically generate the .ssp
document abstraction and bundle it into the pod at
media/abstraction/{doc_uid}.{lang}.ssp
- This makes show_abstraction implicitly true when source_or_pod
is active, so the .ssp file is generated before the pod
assembler runs (abstraction runs before outputHub, and
source_or_pod is the first task in outputHub).
- Changes:
paths_source.d:
Add abstraction_root() path helper to _PodPaths struct,
following the same pattern as image_root(). Produces
paths like pod/media/abstraction/ for both zpod (inside
zip) and filesystem_open_zpod (open directory).
source_pod.d:
- Create media/abstraction/ directory in
podArchive_directory_tree
- Bundle .ssp file in pod_zipMakeReady: reads from the
abstraction output directory, copies to open pod
directory, adds to zip archive, computes SHA-256 digest
- Write .ssp digest in zipArchiveDigest alongside sstm
and ssi digests
spine.d:
Make show_abstraction() return true when source_or_pod is
active (previously only returned true for explicit
--show-abstraction flag).
- The .ssp is always included when building pods - no exclusion
flag for this experimental feature to keep things simple.
Not generated for non-pod outputs (--text, --html, etc.)
unless --show-abstraction is explicitly passed.
- Tested against all 35 sample documents - zero failures.
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
--show-abstraction-db flag to write per-document
- SQLite database of document abstraction
(Claude-Code primary assist)
- Add a new output mode that serializes the in-memory document
abstraction to a per-document SQLite database. This complements
the .ssp text format (--show-abstraction) with a queryable
database representation of the same data.
- Schema:
metadata table - key/value pairs for document metadata
(title, creator, dates, rights, classify, identifiers,
language, notes, make settings, doc_has counts)
objects table - one row per document object with columns:
section, seq (position within section), ocn, is_a,
is_of_part, is_of_type, heading_level, identifier,
parent_ocn, last_descendant_ocn, ancestors,
indent/bullet/lang, has_* flags, segment/anchor tags,
table/code properties, text content
Indexed on: section, ocn, parent_ocn, is_a, heading_level
- Uses prepared statements via d2sqlite3 (existing dependency)
for safe and efficient insertion. Each document produces a
standalone .abstraction.db file in the abstraction/ output
directory.
- New files:
src/sisudoc/io_out/create_abstraction_db.d
Follows the same pattern as create_abstraction_txt.d.
Creates schema, populates metadata via key/value inserts,
then iterates all sections writing objects with prepared
statements within a single transaction.
- Changes to spine.d:
- Add "show-abstraction-db" to opts init, getopt, OptActions
- Add to abstraction(), require_processing_files(), and
meta_processing_general() gates
- Insert call at both spineAbstraction sites
- Tested against all 35 sample documents (including 9-language
live-manual) - zero failures. Works standalone or combined
with --show-abstraction and other output flags.
- Example queries the database supports:
SELECT ocn, heading_level, text FROM objects
WHERE is_a = 'heading' AND section = 'body';
SELECT * FROM objects WHERE parent_ocn = 10;
SELECT key, value FROM metadata WHERE key LIKE 'title.%';
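To make the example queries concrete, here is a self-contained Python sketch that builds a much-reduced version of the two tables in memory and fills them with invented rows. The column subset, the column types, the index name and the sample data (including the key "title.main") are assumptions for illustration; the real schema is the one created by create_abstraction_db.d.

```python
import sqlite3


def demo_abstraction_db() -> sqlite3.Connection:
    """Create a tiny in-memory stand-in for a per-document abstraction db."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE metadata (key TEXT, value TEXT);
        CREATE TABLE objects (
            section TEXT, seq INTEGER, ocn INTEGER, is_a TEXT,
            heading_level INTEGER, parent_ocn INTEGER, text TEXT
        );
        CREATE INDEX idx_objects_parent_ocn ON objects (parent_ocn);
        """
    )
    with conn:  # parameterised inserts inside a single transaction
        conn.executemany(
            "INSERT INTO metadata VALUES (?, ?)",
            [("title.main", "Sample"), ("creator.author", "A. N. Author")],
        )
        conn.executemany(
            "INSERT INTO objects VALUES (?, ?, ?, ?, ?, ?, ?)",
            [
                ("body", 0, 10, "heading", 1, 0, "Chapter"),
                ("body", 1, 11, "para", None, 10, "First paragraph"),
                ("body", 2, 14, "heading", 2, 10, "Section"),
            ],
        )
    return conn
```

Running the three queries listed above against this connection returns the two heading rows, the two objects whose parent_ocn is 10, and the single title.* metadata row respectively.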
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
--show-abstraction flag to write .ssp document abstraction files
- Add a new output mode that serializes the in-memory document
abstraction (produced by spineAbstraction) to a human-readable,
line-oriented text format (.ssp). This captures the full object
model after parsing and abstraction but before output generation.
- The .ssp format uses unambiguous line prefixes:
@section { } - section boundaries (head/toc/body/endnotes/...)
[N] type - object declaration with OCN
.name: value - object properties (only non-defaults)
| content - text content lines
% comment - comments
- New files:
src/sisudoc/io_out/create_abstraction_txt.d
Serializer module following the same template pattern as
metadoc_show_summary.d. Walks doc.abstraction() section by
section, writing metadata preamble (@meta, @make, @doc_has)
then each object with its properties and text content.
Output goes to {output_path}/{lang}/abstraction/{doc}.ssp
- Changes to spine.d:
- Add "show-abstraction" to opts initialization, getopt, and
OptActions struct
- Add show_abstraction to abstraction(), require_processing_files(),
and meta_processing_general() so the flag triggers full document
processing
- Insert call at both spineAbstraction sites (parallel and serial
branches), gated by show_abstraction flag, following the same
pattern as show_config/show_summary/show_make
- Tested against all 35 sample documents (including multilingual
live-manual in 9 languages) - zero failures. Works standalone
(--show-abstraction) or combined with other output flags
(--show-abstraction --html --text). No effect on existing code
paths when the flag is not used.
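Because every line kind has its own prefix, a reader of the format can dispatch on the first character. The following Python sketch shows one way that could work; it is not a parser shipped with spine, and the function name, the returned tuples, the regular expressions and the assumption that leading indentation is insignificant are all illustrative.

```python
import re

OBJECT_RE = re.compile(r"^\[(\d+)\]\s+(\S+)")
PROPERTY_RE = re.compile(r"^\.([A-Za-z0-9_]+):\s*(.*)$")


def classify_ssp_line(line: str):
    """Classify one .ssp line by its prefix; returns (kind, payload)."""
    text = line.rstrip("\n").lstrip()
    if not text:
        return ("blank", None)
    if text.startswith("%"):
        return ("comment", text[1:].strip())
    if text.startswith("@"):
        # Section boundary, e.g. "@body {": keep only the section name.
        return ("section", text[1:].split("{")[0].strip())
    if text.startswith("|"):
        # Text content: drop the pipe and one following space if present.
        return ("content", text[2:] if text.startswith("| ") else text[1:])
    match = OBJECT_RE.match(text)
    if match:
        return ("object", (int(match.group(1)), match.group(2)))
    match = PROPERTY_RE.match(text)
    if match:
        return ("property", (match.group(1), match.group(2)))
    return ("other", text)
```

Since no two prefixes overlap, the order of the checks does not matter, which is the practical benefit of the unambiguous prefixes.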
Co-Authored-By: Anthropic Claude Opus 4.6 (1M context)
|
- claude contributed src
- processes zip from url using (system
installed) curl for download
|
|
- claude contributed src
- Opens the zip with std.zip.ZipArchive (reads the whole file into
memory)
- Locates pod.manifest inside the archive to discover document paths
and languages
- Extracts markup files (.sst/.ssm/.ssi) as in-memory strings
- Extracts images as in-memory byte arrays
- Extracts conf/dr_document_make if present
- Presents these to the existing pipeline as if they were read from
the filesystem
- Some security mitigations:
- Zip Slip / Path Traversal: Reject entries containing `..` or
starting with `/`; canonicalize resolved paths and verify they
fall within extraction root
- Zip Bomb: Check `ArchiveMember.size` before extracting; enforce
per-file (50MB) and total size limits (500MB)
- Entry Count: Limit number of entries (a pod should have at most
~100 files)
- Path Depth: Limit paths to a maximum of 10 components
- Symlinks: Verify no symlinks in extracted content before
processing (post-extraction recursive scan)
- Filename Validation: Only allow expected characters; reject null
bytes
- Malformed Zips: Catch `ZipException` from `std.zip.ZipArchive`
constructor
- Cleanup on error
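Several of the checks listed above can be run on the archive's directory before anything is extracted. The sketch below shows that idea in Python with the standard zipfile module; it is not the D code contributed in this commit. The function name, the exact limit of 100 entries, and reading "MB" as 1024 * 1024 bytes are assumptions, and the symlink scan and filename character validation are left out.

```python
import io
import zipfile

MAX_ENTRIES = 100                    # assumed from "at most ~100 files"
MAX_FILE_SIZE = 50 * 1024 * 1024     # per-file limit (50MB)
MAX_TOTAL_SIZE = 500 * 1024 * 1024   # total limit (500MB)
MAX_DEPTH = 10                       # maximum path components


def validate_pod_zip(data: bytes) -> list:
    """Return the entry names if every check passes, else raise ValueError."""
    try:
        archive = zipfile.ZipFile(io.BytesIO(data))
    except zipfile.BadZipFile as exc:
        raise ValueError("malformed zip") from exc
    infos = archive.infolist()
    if len(infos) > MAX_ENTRIES:
        raise ValueError("too many entries")
    total = 0
    for info in infos:
        name = info.filename
        # Zip Slip / path traversal: reject ".." and absolute paths.
        if ".." in name or name.startswith("/"):
            raise ValueError(f"unsafe path: {name!r}")
        if len(name.split("/")) > MAX_DEPTH:
            raise ValueError(f"path too deep: {name!r}")
        # Zip bomb: check declared sizes before extracting anything.
        if info.file_size > MAX_FILE_SIZE:
            raise ValueError(f"entry too large: {name!r}")
        total += info.file_size
        if total > MAX_TOTAL_SIZE:
            raise ValueError("archive too large in total")
    return [info.filename for info in infos]
```

Checking declared sizes is only a first line of defence; a stricter reader would also cap the bytes actually produced while decompressing each entry.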
|
- FIXES issue with .tex files and xetex finding image paths when run
within latex/ output directory
|
- revisit links (fix later)
|
- preferable, endnote parent object number
available for use (as here in text output,
compare "endnotes, add caller ocn" commit)
|
- spine --text [--output=output path] [markup source]
|
- appears to work, but needs review
|
- plus minor housekeeping/tidy
|
- tics a bit cumbersome where single quotes work
just as well
- testing required (special cases not covered)
- diverges from sisu markup which will need an
update sometime
|
- struct replaces tuple
- some direct naming of structs returned
(instead of use of auto) - minor
|
- serial processing (need to be built serially)
- multilingual pods, copy all languages before zip