
Database

The incremental_graph/database module wraps a LevelDB instance and exposes it as a typed, namespace-scoped key–value store for the incremental graph engine.


Conceptual overview

Namespaces (x / y)

Every key outside the top-level _meta sublevel is stored inside a namespace sublevel: currently x (live data) or y (the staging namespace used during schema migrations). At the LevelDB level this means every namespaced key is prefixed with !x! or !y!. Callers never deal with these prefixes directly; the RootDatabase class encapsulates them.

Sub-sublevels

Within each namespace there are further typed sublevels:

Sublevel   | Purpose
---------- | -------
values     | The computed output value for each graph node
freshness  | Whether a node is up-to-date or potentially-outdated
inputs     | Input dependency list for each node
revdeps    | Reverse-dependency index (input → list of dependents)
counters   | Monotonic integer tracking how many times a value changed
timestamps | Creation and last-modification ISO timestamps
meta       | Namespace metadata (currently just the schema version)

There is also a top-level _meta sublevel (outside the x/y namespace) that stores the database format marker.

Key format

Node keys are JSON-serialised objects of the form {"head":"<name>","args":[...]}, for example:

{"head":"all_events","args":[]}
{"head":"event","args":["abc123"]}
{"head":"transcription","args":["/path/to/audio.mp3"]}
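The stored key is the JSON text itself, so property order matters: {"args":[],"head":"event"} would be a different key from {"head":"event","args":[]}. Below is a minimal sketch of helpers that always emit head before args. makeNodeKey and parseNodeKey are hypothetical names, not the module's exported API.

```javascript
// Hypothetical helpers illustrating the stored-key format; not the module's real API.

/**
 * Build the stored key for a graph node.
 * JSON.stringify emits properties in insertion order, so `head` is
 * always written before `args`, matching the documented form.
 */
function makeNodeKey(head, args = []) {
    if (typeof head !== 'string') {
        throw new TypeError('head must be a string');
    }
    if (!Array.isArray(args)) {
        throw new TypeError('args must be an array');
    }
    return JSON.stringify({ head, args });
}

/** Parse a stored key back into its { head, args } parts. */
function parseNodeKey(storedKey) {
    const parsed = JSON.parse(storedKey);
    if (typeof parsed.head !== 'string' || !Array.isArray(parsed.args)) {
        throw new Error(`Not a node key: ${storedKey}`);
    }
    return { head: parsed.head, args: parsed.args };
}

console.log(makeNodeKey('event', ['abc123']));
// prints {"head":"event","args":["abc123"]}
```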

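At the raw LevelDB level, stored keys are concatenated with their sublevel prefixes. In the following examples, the last line is a plain-string key in the top-level _meta sublevel: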

!x!!values!{"head":"all_events","args":[]}
!x!!freshness!{"head":"all_events","args":[]}
!_meta!format
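The prefixing is mechanical. Each sublevel contributes one !name! wrapper, so nesting x and values produces !x!!values!. Below is a minimal sketch that composes raw keys this way. toRawKey is a hypothetical name, and the sketch assumes ! is the separator, as in the examples.

```javascript
// Hypothetical helper (not part of the module's API) that reproduces the raw-key
// layout shown above, assuming '!' as the sublevel separator.

const SEPARATOR = '!';

/**
 * Compose a raw LevelDB key from a list of sublevel names and a stored key.
 * Each sublevel contributes '!name!', so nesting x -> values yields '!x!!values!'.
 */
function toRawKey(sublevels, storedKey) {
    const prefix = sublevels
        .map((name) => {
            if (name.includes(SEPARATOR)) {
                throw new Error(`Sublevel name must not contain '${SEPARATOR}': ${name}`);
            }
            return SEPARATOR + name + SEPARATOR;
        })
        .join('');
    return prefix + storedKey;
}

console.log(toRawKey(['x', 'values'], '{"head":"all_events","args":[]}'));
// prints !x!!values!{"head":"all_events","args":[]}
console.log(toRawKey(['_meta'], 'format'));
// prints !_meta!format
```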

Filesystem rendering

The database exposes two complementary operations for dumping and restoring its complete state to/from a plain directory tree:

const { renderToFilesystem, scanFromFilesystem } = require('./database');

// snapshotAndRestore is an illustrative wrapper: CommonJS modules cannot use top-level await.
async function snapshotAndRestore(capabilities, rootDatabase) {
    // Dump every key/value pair to disk
    await renderToFilesystem(capabilities, rootDatabase, '/path/to/snapshot');

    // Restore the database from a snapshot (clears all existing entries first)
    await scanFromFilesystem(capabilities, rootDatabase, '/path/to/snapshot');
}

Key → file-path mapping

Each raw LevelDB key is translated to a relative file path inside the snapshot directory. The algorithm depends on the key type:

Data sublevels (values, freshness, inputs, revdeps, counters, timestamps)

The stored key is a JSON-serialised NodeKey object {"head":"...","args":[...]}. It is decomposed into human-readable path segments, similar to how /api/graph/nodes encodes graph nodes in URLs:

!x!!values!{"head":"all_events","args":[]}
→ x/values/all_events

!x!!values!{"head":"event","args":["abc123"]}
→ x/values/event/abc123

!x!!values!{"head":"transcription","args":["/audio/x.mp3"]}
→ x/values/transcription/%2Faudio%2Fx.mp3

String arguments are percent-encoded: / → %2F, % → %25, ! → %21, and ~ → %7E. In addition, the literal dot-segment path components . and .. are encoded as %2E and %2E%2E. This prevents path traversal while keeping the key ↔ path mapping bijective. Non-string arguments (numbers, booleans, arrays, objects) are JSON-encoded and prefixed with ~. A literal ~ in a string argument is always encoded as %7E, so an encoded string segment never begins with a raw ~ and the prefix stays unambiguous.
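Below is a minimal sketch of the per-argument encoding described above. encodeArgSegment and decodeArgSegment are hypothetical names. One assumption goes beyond the text: the JSON of a non-string argument is percent-encoded with the same rules as a string, so a / inside an array or object argument cannot split the path.

```javascript
// Minimal sketch of the documented argument-segment encoding (hypothetical names).
// ASSUMPTION: the JSON text of non-string arguments is percent-encoded with the
// same rules as strings, so a '/' inside it cannot split the path.

const ESCAPES = { '%': '%25', '/': '%2F', '!': '%21', '~': '%7E' };
const UNESCAPES = { '%25': '%', '%2F': '/', '%21': '!', '%7E': '~' };

/** Percent-encode the four reserved characters, then guard dot segments. */
function encodeString(text) {
    // One pass over the input, so a '%' produced by an escape is never re-escaped.
    const escaped = text.replace(/[%\/!~]/g, (ch) => ESCAPES[ch]);
    if (escaped === '.') return '%2E';
    if (escaped === '..') return '%2E%2E';
    return escaped;
}

/** Inverse of encodeString. */
function decodeString(segment) {
    if (segment === '%2E') return '.';
    if (segment === '%2E%2E') return '..';
    // One pass, so '%252F' decodes to '%2F' rather than '/'.
    return segment.replace(/%(?:25|2F|21|7E)/g, (esc) => UNESCAPES[esc]);
}

/** Encode one NodeKey argument as a single path segment. */
function encodeArgSegment(arg) {
    if (typeof arg === 'string') return encodeString(arg);
    // Non-strings: '~' marker + JSON. Encoded strings never start with a raw '~'.
    return '~' + encodeString(JSON.stringify(arg));
}

/** Decode one path segment back into the original argument. */
function decodeArgSegment(segment) {
    if (segment.startsWith('~')) return JSON.parse(decodeString(segment.slice(1)));
    return decodeString(segment);
}
```

Both directions make a single pass over their input. This matters for decoding: the literal string %2F encodes to %252F, and a naive sequential replacement (first %25, then %2F) would turn it into / instead of %2F.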

Meta sublevels (_meta, meta)

The stored key is a plain string (e.g. format, version). It is used as a single percent-encoded path segment:

!_meta!format    → _meta/format
!x!!meta!version → x/meta/version
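The following sketch combines both key types into the forward mapping. It assumes the node head is percent-encoded with the same rules as string arguments, which the text above does not state. It finds the sublevel names by reading exactly one or two !name! prefixes rather than splitting the whole key on !, so a ! inside the stored key is never consulted. parseRawKey and the other helper names are hypothetical.

```javascript
// Sketch of keyToRelativePath. ASSUMPTIONS (not stated in the docs): '!' is the
// sublevel separator, the node head is percent-encoded like a string argument,
// and non-string JSON is percent-encoded as well. Helper names are hypothetical.

const ESCAPES = { '%': '%25', '/': '%2F', '!': '%21', '~': '%7E' };

function encodeString(text) {
    const escaped = text.replace(/[%\/!~]/g, (ch) => ESCAPES[ch]);
    if (escaped === '.') return '%2E'; // guard dot segments
    if (escaped === '..') return '%2E%2E';
    return escaped;
}

function encodeArgSegment(arg) {
    return typeof arg === 'string'
        ? encodeString(arg)
        : '~' + encodeString(JSON.stringify(arg));
}

/** Read exactly `depth` '!name!' prefixes; later '!' characters are left alone. */
function parseRawKey(rawKey) {
    const depth = rawKey.startsWith('!_meta!') ? 1 : 2;
    const sublevels = [];
    let rest = rawKey;
    for (let i = 0; i < depth; i++) {
        const end = rest.indexOf('!', 1);
        if (rest[0] !== '!' || end === -1) {
            throw new Error(`Malformed raw key: ${rawKey}`);
        }
        sublevels.push(rest.slice(1, end));
        rest = rest.slice(end + 1);
    }
    return { sublevels, storedKey: rest };
}

function keyToRelativePath(rawKey) {
    const { sublevels, storedKey } = parseRawKey(rawKey);
    const last = sublevels[sublevels.length - 1];
    if (last === '_meta' || last === 'meta') {
        // Plain-string key: a single percent-encoded segment.
        return [...sublevels, encodeString(storedKey)].join('/');
    }
    // NodeKey: the head, then one segment per argument.
    const { head, args } = JSON.parse(storedKey);
    return [...sublevels, encodeString(head), ...args.map(encodeArgSegment)].join('/');
}

console.log(keyToRelativePath('!x!!values!{"head":"event","args":["abc123"]}'));
// prints x/values/event/abc123
```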

File-path → key mapping (inverse)

relativePathToKey is the exact inverse of keyToRelativePath:

  1. Determine sublevel depth: if the first segment is _meta → depth 1; otherwise depth 2.
  2. Extract sublevels: the first depth segments are the sublevel names.
  3. Determine key type: if the last sublevel is _meta or meta → plain string; otherwise NodeKey.
  4. Reconstruct key:
    • Plain string: decode the single remaining segment and reassemble the LevelDB key.
    • NodeKey: first remaining segment is the node head; subsequent segments are decoded arguments; reassemble as JSON.stringify({head, args}) and build the LevelDB key.
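The four steps above translate fairly directly into code. This sketch assumes the same encoding rules as the forward direction and /-separated relative paths. The helper names are hypothetical.

```javascript
// Sketch of relativePathToKey, mirroring the four steps above. ASSUMPTIONS: the same
// encoding rules as the forward mapping, and '/'-separated relative paths.

const UNESCAPES = { '%25': '%', '%2F': '/', '%21': '!', '%7E': '~' };

function decodeString(segment) {
    if (segment === '%2E') return '.';
    if (segment === '%2E%2E') return '..';
    // Single pass, so '%252F' decodes to '%2F' rather than '/'.
    return segment.replace(/%(?:25|2F|21|7E)/g, (esc) => UNESCAPES[esc]);
}

function decodeArgSegment(segment) {
    return segment.startsWith('~')
        ? JSON.parse(decodeString(segment.slice(1)))
        : decodeString(segment);
}

function relativePathToKey(relativePath) {
    const segments = relativePath.split('/');
    // 1. Sublevel depth.
    const depth = segments[0] === '_meta' ? 1 : 2;
    // 2. Sublevel names, rebuilt as '!name!' prefixes.
    const sublevels = segments.slice(0, depth);
    const prefix = sublevels.map((name) => `!${name}!`).join('');
    const rest = segments.slice(depth);
    // 3. Key type, from the last sublevel.
    const last = sublevels[sublevels.length - 1];
    if (last === '_meta' || last === 'meta') {
        // 4a. Plain string: exactly one remaining segment.
        if (rest.length !== 1) throw new Error(`Expected one key segment: ${relativePath}`);
        return prefix + decodeString(rest[0]);
    }
    // 4b. NodeKey: head, then arguments.
    if (rest.length === 0) throw new Error(`Missing node head: ${relativePath}`);
    const [headSegment, ...argSegments] = rest;
    const storedKey = JSON.stringify({
        head: decodeString(headSegment),
        args: argSegments.map(decodeArgSegment),
    });
    return prefix + storedKey;
}
```

Step 4 rebuilds the stored key with JSON.stringify({head, args}). The result is byte-identical to the original key only if that key was itself produced by JSON.stringify with head before args, which is the form described under Key format.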

Bijection guarantee

For all keys generated by this database, the mapping key → path is a bijection, so the round trip key → path → key reproduces the original key exactly:

relativePathToKey(keyToRelativePath(key)) === key   // for all valid keys

The ! character in argument values is encoded as %21 before splitting, so it can never be mistaken for the LevelDB sublevel separator. This is the P1 fix from the initial implementation.

Stale-key deletion (P2)

scanFromFilesystem clears all existing entries from the database before importing. This ensures that keys present in the database but absent from the snapshot directory (i.e., deleted entries) do not survive the restore. Afterwards the database contains exactly the entries in the snapshot, as the bijection/restore semantics require.
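The difference between clearing first and merging is easiest to see on a toy store. This hypothetical sketch uses a Map in place of LevelDB and an array of [key, value] pairs in place of the files read from the snapshot directory. restoreWithClear and restoreWithoutClear are illustrative names only.

```javascript
// Hypothetical in-memory illustration of "clear, then import".
// A Map stands in for the LevelDB instance; snapshot stands in for
// the key/value pairs read from the snapshot directory.

function restoreWithClear(db, snapshotEntries) {
    db.clear(); // drop every existing entry first
    for (const [key, value] of snapshotEntries) {
        db.set(key, value);
    }
    return db;
}

function restoreWithoutClear(db, snapshotEntries) {
    for (const [key, value] of snapshotEntries) {
        db.set(key, value); // stale keys are left untouched
    }
    return db;
}

const snapshot = [['!x!!values!{"head":"a","args":[]}', 1]];
const makeDb = () => new Map([
    ['!x!!values!{"head":"a","args":[]}', 0],
    ['!x!!values!{"head":"deleted","args":[]}', 99],
]);

// Only the snapshot's key survives the clearing restore.
console.log([...restoreWithClear(makeDb(), snapshot).keys()]);
// The merging restore keeps the stale "deleted" entry.
console.log([...restoreWithoutClear(makeDb(), snapshot).keys()]);
```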

Value serialisation

Values are stored as JSON. renderToFilesystem writes JSON.stringify(value) to each file; scanFromFilesystem reads each file and calls JSON.parse(content) before writing back to the database.
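The following is a minimal sketch of the per-file half of this, using Node's synchronous fs API. writeValueFile and readValueFile are hypothetical names. The sketch assumes the /-separated relative path has already been computed from the key, and it writes exactly JSON.stringify(value) with no trailing newline.

```javascript
// Minimal sketch of writing and reading one value file (hypothetical helpers).
// Assumes `relativePath` was already produced by the key -> path mapping.

const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

function writeValueFile(snapshotDir, relativePath, value) {
    const filePath = path.join(snapshotDir, ...relativePath.split('/'));
    fs.mkdirSync(path.dirname(filePath), { recursive: true }); // create parent dirs
    fs.writeFileSync(filePath, JSON.stringify(value), 'utf8');
}

function readValueFile(snapshotDir, relativePath) {
    const filePath = path.join(snapshotDir, ...relativePath.split('/'));
    return JSON.parse(fs.readFileSync(filePath, 'utf8'));
}

// Round-trip one value through a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'snapshot-'));
writeValueFile(dir, 'x/values/event/abc123', { title: 'Example', tags: ['a'] });
console.log(readValueFile(dir, 'x/values/event/abc123'));
```

path.join normalises . and .. components. Because the key encoding escapes literal dot segments, an argument equal to .. never reaches path.join as a real parent-directory component.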

No locking

Neither renderToFilesystem nor scanFromFilesystem acquires any lock. Callers that require atomicity must arrange their own locking around these calls.
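A caller may need render and scan calls not to interleave within a single process. In that case, a promise-chain mutex is one lightweight option. createMutex and withLock are hypothetical and not part of the database module. A lock like this does nothing for other processes touching the same directory or database.

```javascript
// Hypothetical caller-side lock: a promise chain that runs tasks one at a time.
// Nothing here is part of the database module.

function createMutex() {
    let tail = Promise.resolve();
    return function withLock(task) {
        // Start `task` only after every previously queued task has settled.
        const result = tail.then(() => task());
        // Swallow errors on the chain itself so one failure does not block later
        // tasks; the caller still sees the rejection through `result`.
        tail = result.catch(() => {});
        return result;
    };
}

// Usage (renderToFilesystem/scanFromFilesystem as documented above):
// const withDatabaseLock = createMutex();
// await withDatabaseLock(() => renderToFilesystem(capabilities, rootDatabase, dir));
// await withDatabaseLock(() => scanFromFilesystem(capabilities, rootDatabase, dir));
```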


Checkpointing and synchronisation

The live LevelDB now lives outside the git repository (<workingDirectory>/generators-leveldb/). The git repository stores a rendered filesystem snapshot under <workingDirectory>/generators-database/rendered/. Three higher-level operations are available:

  • checkpointDatabase(capabilities, message, rootDatabase) – renders the live database into the tracked snapshot directory and commits it (a no-op if nothing has changed). Used where a single rendered snapshot is needed, such as sync.
  • runMigrationInTransaction(capabilities, rootDatabase, preMessage, postMessage, callback) – wraps the whole migration in one gitstore transaction, commits the rendered snapshot before the migration body runs, executes the migration, then commits the rendered post-migration snapshot in the same transaction.
  • synchronizeNoLock(capabilities, options) – renders the current database, synchronises the rendered repository with the remote generators repository, and then scans the updated rendered snapshot back into the live database.
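The ordering in runMigrationInTransaction can be sketched with injected stand-ins. transaction, renderSnapshot and commit below are hypothetical fakes that record their calls. The real gitstore primitives (docs/gitstore.md) will have different names and signatures, and this is not the module's implementation.

```javascript
// Hypothetical sketch of the ordering described for runMigrationInTransaction.
// `transaction`, `renderSnapshot` and `commit` are injected stand-ins, not the
// real gitstore API.

async function runMigrationSketch(deps, preMessage, postMessage, migrate) {
    const { transaction, renderSnapshot, commit } = deps;
    return transaction(async (tx) => {
        await renderSnapshot(tx); // render the pre-migration state
        await commit(tx, preMessage);
        const result = await migrate(); // the migration body
        await renderSnapshot(tx); // render the post-migration state
        await commit(tx, postMessage); // same transaction as the first commit
        return result;
    });
}

// Fake dependencies that simply record the call order.
const calls = [];
const fakeDeps = {
    transaction: async (body) => {
        calls.push('begin');
        const result = await body('tx');
        calls.push('end');
        return result;
    },
    renderSnapshot: async () => { calls.push('render'); },
    commit: async (_tx, message) => { calls.push(`commit:${message}`); },
};
```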

See docs/gitstore.md for the gitstore primitives that back these operations.