Graph Synchronisation
The Problem
After graph.build() creates the initial graph snapshot, your tables continue to receive INSERTs, UPDATEs, and DELETEs. The graph must stay in sync without requiring a full rebuild.
“Sync” means the graph index follows committed changes in registered Postgres tables without users manually babysitting every update. Anything weaker is a manual refresh or manual delta apply, not full sync.
Current implementation note: the open-source V1 path is trigger capture
into graph._sync_log, explicit graph.apply_sync() replay, edge-buffer
visibility, and graph.vacuum() CSR rebuild. The full-sync contract below is
the target contract. WAL/background-worker replay, crash-replay from WAL, and
automatic schema-stale handling are not proven by the current implementation.
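The V1 path above can be exercised end to end with a few calls. The function names come from this document; exact argument lists and return shapes are not specified here, so treat this as a sketch:

```sql
-- Sketch of the V1 trigger/apply/vacuum loop (names from this document;
-- signatures and return shapes are unspecified assumptions).
SELECT graph.build();          -- initial snapshot; installs capture triggers
-- ...application writes accumulate in graph._sync_log...
SELECT graph.apply_sync();     -- replay captured row changes into the graph
SELECT graph.vacuum();         -- merge buffered edge deltas into a fresh CSR
SELECT * FROM graph.status();  -- lag, pending deltas, schema state
```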
Full-Sync Contract
For registered graph objects, pgGraph full sync provides:
- Registered row changes are captured automatically after commit.
INSERT, UPDATE, and DELETE on registered node tables are reflected without a full rebuild.
- Indexed property changes update Bloom signatures and TokenIndex entries.
- Deleted rows are tombstoned and excluded from query results.
- Relationship/FK changes are captured into an edge delta buffer.
- Traversal reads base CSR plus pending edge deltas so committed relationship changes are query-visible before vacuum.
- graph.vacuum() merges edge deltas into a rebuilt CSR.
- Crash recovery replays unapplied committed changes without losing writes.
- Schema changes affecting registered graph objects are detected and reported as stale or invalid.
- graph.status() exposes sync lag, pending deltas, stale schema state, and rebuild/vacuum requirements.
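A hypothetical graph.status() call might surface the contract fields like this; the column names are illustrative only, not a documented schema:

```sql
-- Illustrative only: the contract names the information, not the columns.
SELECT * FROM graph.status();
-- sync_lag | pending_edge_deltas | schema_state | needs_vacuum
-- ---------+---------------------+--------------+-------------
--        0 |                 128 | current      | f
```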
Full sync has two separate responsibilities:
| Layer | Scope | Behavior |
|---|---|---|
| Data sync | Row INSERT / UPDATE / DELETE, indexed property changes, FK/edge-column changes | Automatic capture and apply for registered tables |
| Schema sync | Table/column/FK drop, rename, type change, PK change, new FK/table discovery | Detect and mark stale/invalid; require explicit refresh/rebuild before trusting the graph |
pgGraph should not silently guess a new graph model after DDL. If a registered
table, column, primary key, or foreign key changes, the graph metadata becomes
stale or invalid until the user runs graph.refresh_schema() /
graph.auto_discover() and graph.build().
Sync Strategies
pgGraph supports three sync strategies, each with different trade-offs:
| Strategy | Latency | Complexity | Overhead | Best For |
|---|---|---|---|---|
| Manual | Full rebuild | None | Zero (no triggers) | Batch/analytical workloads |
| Trigger | Fresh after graph.apply_sync() (query-time freshness where enabled) | Low | One trigger write per row (~15μs) | Current V1 trigger/apply path |
| WAL (BGW) | Roadmap | Medium | Near-zero write overhead | High-write-throughput, CDC |
Strategy 1: Manual Sync
Set graph.sync_mode = 'manual' in postgresql.conf.
No triggers are installed. No background worker runs. The graph is only updated when you explicitly call graph.build().
-- Rebuild the graph (e.g., nightly cron)
SELECT graph.build();
Use case: Analytics, batch processing, data warehouses. Query the graph as a point-in-time snapshot. Rebuild nightly or after ETL loads.
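If the pg_cron extension is installed, the nightly rebuild can be scheduled in-database rather than from an external cron; the job name and schedule below are illustrative:

```sql
-- Rebuild the graph every night at 03:00 via pg_cron.
SELECT cron.schedule('graph-nightly-rebuild', '0 3 * * *',
                     $$SELECT graph.build()$$);
```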
Strategy 2: Trigger-Based Delta Capture and Apply (Current)
Set graph.sync_mode = 'trigger'.
When trigger sync is enabled, graph.build() or graph.enable_sync() installs
AFTER INSERT/UPDATE/DELETE triggers on every registered table. These triggers
capture committed row changes in graph._sync_log; current verification covers
explicit graph.apply_sync() plus graph.vacuum(), not WAL replay or a
crash-replay background worker.
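Conceptually, the capture trigger installed on each registered table looks like the following. The trigger and function names here are illustrative assumptions; the extension creates and manages these objects itself:

```sql
-- Roughly what enable_sync() sets up per registered table.
-- Object names are assumptions, not the extension's documented API.
CREATE TRIGGER contacts_graph_sync
    AFTER INSERT OR UPDATE OR DELETE ON contacts
    FOR EACH ROW
    EXECUTE FUNCTION graph.sync_row_trigger();
```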
How It Works
INSERT INTO contacts (id, name, company_id) VALUES (99, 'Alice', 42);
│
▼
AFTER INSERT trigger fires → graph.sync_row('contacts', 'INSERT', NEW)
│
├── 1. ResolutionIndex: allocate node_index for ('contacts', '99')
├── 2. NodeStore: set is_active = true, compute Bloom signature
├── 3. TokenIndex: index 'name:Alice' (case-sensitive exact match)
├── 4. EdgeBuffer: push (node_99 → node_42, type='works_at')
│ (if company_id FK is registered as an edge)
└── 5. Return — total overhead: 5–20μs

UPDATE contacts SET status = 'inactive' WHERE id = 99;
│
▼
AFTER UPDATE trigger fires → graph.sync_row('contacts', 'UPDATE', NEW, OLD)
│
├── 1. ResolutionIndex: lookup ('contacts', '99') → existing node_index
├── 2. NodeStore: recompute Bloom signature with new properties
├── 3. TokenIndex: remove old tokens, insert new tokens
├── 4. EdgeBuffer: if FK columns changed, buffer edge mutations
└── 5. Return — overhead: 10–30μs

DELETE FROM contacts WHERE id = 99;
│
▼
AFTER DELETE trigger fires → graph.sync_row('contacts', 'DELETE', OLD)
│
├── 1. ResolutionIndex: lookup ('contacts', '99') → node_index
├── 2. NodeStore: set is_active = false (tombstone)
├── 3. TokenIndex: remove all tokens for this node
└── 4. Return — overhead: 3–10μs
(dead edges cleaned up during next vacuum)
Edge Buffer and Vacuum
Trigger-based sync does NOT rebuild the CSR on every write. That would be catastrophically expensive. Instead:
- Node changes (Bloom, TokenIndex, is_active) are applied immediately — these are in-place writes to flat arrays.
- Edge changes (new FK values, deleted nodes) are buffered in a lock-free edge mutation buffer.
- Periodic vacuum (every 60s by default, configurable) drains the buffer and rebuilds the CSR.
Traversal consults pending edge deltas alongside the base CSR. graph.status()
reports needs_vacuum when enough edge deltas have accumulated that they should
be merged into a rebuilt CSR.
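A maintenance job can poll status() and vacuum only when the extension says it is worthwhile. The needs_vacuum field comes from the text above, though its exact shape is an assumption:

```sql
-- Vacuum only when the extension reports accumulated edge deltas.
DO $$
BEGIN
    IF (SELECT needs_vacuum FROM graph.status()) THEN
        PERFORM graph.vacuum();
    END IF;
END
$$;
```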
-- Force an immediate edge rebuild
SELECT graph.vacuum();
Trigger Overhead
Overhead per trigger invocation (V1 spec estimates):
| Operation | Overhead | Impact on Write TPS |
|---|---|---|
| INSERT (new node) | ~15μs | <1% at 10K writes/sec |
| UPDATE (properties) | ~20μs | <1% at 10K writes/sec |
| DELETE (tombstone) | ~8μs | <0.5% at 10K writes/sec |
At 10,000 writes/second, the total trigger overhead is ~200ms/second — negligible for OLTP workloads.
Roadmap: WAL-Based Sync (Background Worker)
WAL/BGW sync is explicitly out of scope for the V1 trigger-focused implementation. If
graph.sync_mode = 'wal' is selected today, graph.build() fails with a clear error
directing the operator to trigger or manual mode. The design below remains the
roadmap target for a later release.
Set graph.sync_mode = 'wal'.
This is the highest-performance option. No triggers are installed. Instead, a pgrx Background Worker (BGW) consumes changes from a logical replication slot.
How It Works
┌──────────────────────────────────────────────────────────────────┐
│ PostgreSQL WAL (Write-Ahead Log) │
│ │
│ LSN 0/1A3B4C → INSERT contacts (99, 'Alice', 42) │
│ LSN 0/1A3B4D → UPDATE contacts SET status='inactive' WHERE id=1│
│ LSN 0/1A3B4E → DELETE FROM invoices WHERE id=500 │
│ │
└────────────────────────┬─────────────────────────────────────────┘
│
Logical Replication Slot
(pgoutput protocol)
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Background Worker (Rust, registered in _PG_init) │
│ │
│ Loop: │
│ 1. pg_logical_slot_get_changes('graph_slot', ...) │
│ 2. Parse change: table, operation, columns │
│ 3. Match against registered tables │
│ 4. Apply to graph (same logic as trigger sync) │
│ 5. Acknowledge LSN → Postgres can reclaim WAL │
│ │
│ Vacuum: same periodic CSR rebuild as trigger mode │
└──────────────────────────────────────────────────────────────────┘
Setup Requirements
-- PostgreSQL must have logical replication enabled
-- postgresql.conf: wal_level = logical
-- The extension creates the replication slot automatically.
ALTER SYSTEM SET wal_level = logical;
-- Restart Postgres, then:
SELECT graph.build(); -- creates slot 'graph_slot' if sync_mode = 'wal'
Advantages Over Triggers
| Aspect | Triggers | WAL BGW |
|---|---|---|
| Write path overhead | ~15μs per row | Zero (WAL is written anyway) |
| Bulk load performance | Triggers fire per row | Changes consumed async |
| COPY command support | Triggers fire per row | Consumed from WAL after commit |
| Transaction visibility | After each statement | After transaction commit |
| Crash recovery | Triggers are transactional | LSN-based replay from slot |
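Step 1 of the worker loop maps onto stock Postgres slot functions. For manual experimentation you can watch a slot with the built-in test_decoding plugin; the worker itself would speak pgoutput, whose binary stream is read with pg_logical_slot_get_binary_changes instead:

```sql
-- Manual illustration of slot-based change consumption (test_decoding).
SELECT pg_create_logical_replication_slot('graph_demo_slot', 'test_decoding');
-- ...commit some INSERT/UPDATE/DELETE traffic...
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes('graph_demo_slot', NULL, NULL);
SELECT pg_drop_replication_slot('graph_demo_slot');
```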
Background Worker Registration
// In _PG_init():
BackgroundWorkerBuilder::new("graph WAL sync")
    .set_function("graph.bgworker_main")
    .set_library("graph")
    .enable_spi_access()
    .set_start_time(BgWorkerStartTime::RecoveryFinished)
    .set_restart_time(Some(Duration::from_secs(5)))
    .load();
LSN Checkpointing
The BGW periodically writes its confirmed LSN to a file ($PGDATA/graph/wal_checkpoint). On crash recovery:
- BGW reads the last confirmed LSN from the checkpoint file.
- Reconnects to the replication slot.
- Postgres replays all WAL events from the last confirmed LSN.
- Graph catches up automatically — zero data loss.
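Slot progress is also visible from SQL. confirmed_flush_lsn is the position Postgres treats as durably consumed (and up to which WAL can be reclaimed), which is what the checkpoint file shadows:

```sql
-- Inspect how far the slot has been acknowledged.
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'graph_slot';
```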
Comparison Matrix
| Feature | Manual | Trigger | WAL (BGW) |
|---|---|---|---|
| Setup complexity | None | Auto (after build) | Requires wal_level = logical |
| Write overhead | Zero | ~15μs/row | Zero |
| Node property freshness | Stale until rebuild | Instant | Near-instant (async) |
| Edge freshness | Stale until rebuild | Next vacuum (60s) | Next vacuum (60s) |
| COPY bulk load | No sync | Per-row triggers | Async (efficient) |
| Crash recovery | Rebuild from tables | Rebuild from tables | LSN replay (fast) |
| Multi-database | N/A | N/A | Future: cross-DB via external replication |
| Recommended for | Analytics, batch | Most OLTP workloads | High write throughput |
Hybrid Strategy: Build + WAL
The recommended production deployment for large databases:
1. Create logical replication slot (exports a snapshot name)
2. Build graph using that specific exported snapshot (snapshot-consistent)
3. WAL BGW starts consuming from the slot's confirmed_flush_lsn
4. All changes after the snapshot are replayed by the BGW
5. Zero gap, zero data loss
This is exactly how Postgres physical replicas work:
base backup + WAL replay = consistent replica

Important: The replication slot MUST be created before the build snapshot is taken. The slot creation exports a snapshot name (via pg_export_snapshot()), and graph uses that specific snapshot for COPY TO STDOUT. This guarantees that no mutations are lost between the snapshot and the start of WAL consumption. If you build first and create the slot afterwards, any mutations that occur during the build window would be silently lost.
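The ordering guarantee can be demonstrated with plain pg_export_snapshot(); the extension would do the equivalent internally via the replication protocol's exported slot snapshot:

```sql
-- Session A: export a snapshot (stays valid while this txn is open).
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();          -- e.g. '00000003-0000001B-1'

-- Session B: build against exactly that snapshot.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
-- COPY ... TO STDOUT here sees the exported snapshot; every change
-- after it arrives through the slot, so nothing falls into a gap.
```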
This hybrid approach runs automatically when sync_mode = 'wal':
-- This single call does everything:
-- 1. Creates logical replication slot (exports snapshot)
-- 2. Builds the graph from the exported snapshot
-- 3. Starts the BGW to consume WAL changes from the slot's LSN
SELECT graph.build();
Schema Change Handling
Row-level sync is not enough for a trustworthy Postgres extension. Users also expect pgGraph to notice when the registered schema no longer matches reality.
Schema sync uses Postgres DDL/event-trigger or catalog validation hooks to detect:
- registered table dropped or renamed
- registered column dropped, renamed, or type-changed
- primary key changed
- foreign key added, removed, or changed
- registered edge source/target column invalidated
- registered filter or weight column no longer numeric
Detected DDL should update graph metadata rather than guessing silently:
| Schema state | Meaning | Query behavior |
|---|---|---|
| current | Registered graph metadata matches Postgres catalogs | Queries run normally |
| stale | New schema information may need discovery/build | Existing graph may serve with warning/status flag |
| invalid | Registered table/column/PK/edge no longer exists or changed incompatibly | Unsafe graph queries return a clear error until rebuild |
The user-facing recovery path should be:
SELECT graph.refresh_schema(); -- or graph.auto_discover(...)
SELECT graph.build();
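One plausible detection mechanism is a Postgres event trigger on ddl_command_end, which fires after every DDL statement; the handler function name below is hypothetical, not a confirmed part of any API:

```sql
-- Hedged sketch: fire after each DDL statement so the extension can
-- compare affected objects against registered graph metadata and mark
-- the graph stale/invalid. graph.on_ddl_command is a hypothetical name.
CREATE EVENT TRIGGER graph_schema_watch
    ON ddl_command_end
    EXECUTE FUNCTION graph.on_ddl_command();
```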