Graph Synchronisation
The Problem
After graph.build() creates the initial graph snapshot, your tables continue to receive INSERTs, UPDATEs, and DELETEs. The graph must stay in sync without requiring a full rebuild.
“Sync” means the graph index follows committed changes in registered Postgres tables without users manually babysitting every update. Anything weaker is a manual refresh or manual delta apply, not full sync.
Current implementation note: the open-source V1 path is trigger capture
into graph._sync_log, explicit graph.apply_sync() replay, edge-buffer
visibility, and graph.vacuum() CSR rebuild. The full-sync contract below is
the target contract. WAL/background-worker replay, crash-replay from WAL, and
automatic schema-stale handling are not proven by the current implementation.
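The V1 path above can be exercised end to end with a few calls. The function names come from this document; exact argument lists and return shapes are not specified here, so treat this as a sketch:

```sql
-- Sketch of the V1 trigger/apply/vacuum loop (names from this document;
-- signatures and return shapes are unspecified assumptions).
SELECT graph.build();          -- initial snapshot; installs capture triggers
-- ...application writes accumulate in graph._sync_log...
SELECT graph.apply_sync();     -- replay captured row changes into the graph
SELECT graph.vacuum();         -- merge buffered edge deltas into a fresh CSR
SELECT * FROM graph.status();  -- lag, pending deltas, schema state
```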
Full-Sync Contract
For registered graph objects, pgGraph full sync provides:
- Registered row changes are captured automatically after commit.
INSERT, UPDATE, and DELETE on registered node tables are reflected without a full rebuild.
- Indexed property changes update Bloom signatures and TokenIndex entries.
- Deleted rows are tombstoned and excluded from query results.
- Relationship/FK changes are captured into an edge delta buffer.
- Traversal reads base CSR plus pending edge deltas so committed relationship changes are query-visible before vacuum.
- graph.vacuum() merges edge deltas into a rebuilt CSR.
- Crash recovery replays unapplied committed changes without losing writes.
- Schema changes affecting registered graph objects are detected and reported as stale or invalid.
- graph.status() exposes sync lag, pending deltas, stale schema state, and rebuild/vacuum requirements.
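A hypothetical graph.status() call might surface the contract fields like this; the column names are illustrative only, not a documented schema:

```sql
-- Illustrative only: the contract names the information, not the columns.
SELECT * FROM graph.status();
-- sync_lag | pending_edge_deltas | schema_state | needs_vacuum
-- ---------+---------------------+--------------+-------------
--        0 |                 128 | current      | f
```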
Full sync has two separate responsibilities:
| Layer | Scope | Behavior |
|---|---|---|
| Data sync | Row INSERT / UPDATE / DELETE, indexed property changes, FK/edge-column changes | Automatic capture and apply for registered tables |
| Schema sync | Table/column/FK drop, rename, type change, PK change, new FK/table discovery | Detect and mark stale/invalid; require explicit refresh/rebuild before trusting the graph |
pgGraph should not silently guess a new graph model after DDL. If a registered
table, column, primary key, or foreign key changes, the graph metadata becomes
stale or invalid until the user runs graph.refresh_schema() /
graph.auto_discover() and graph.build().
Sync Strategies
pgGraph supports three sync strategies, each with different trade-offs:
| Strategy | Latency | Complexity | Overhead | Best For |
|---|---|---|---|---|
| Manual | Full rebuild | None | Zero (no triggers) | Batch/analytical workloads |
| Trigger | Fresh after graph.apply_sync() (query-time freshness where enabled) | Low | One trigger write per row (~15μs) | Current V1 trigger/apply path |
| WAL (BGW) | Roadmap | Medium | Near-zero write overhead | High-write-throughput, CDC |
Strategy 1: Manual Sync
Set graph.sync_mode = 'manual' in postgresql.conf.
No triggers are installed. No background worker runs. The graph is only updated when you explicitly call graph.build().
-- Rebuild the graph (e.g., nightly cron)
SELECT graph.build();
Use case: Analytics, batch processing, data warehouses. Query the graph as a point-in-time snapshot. Rebuild nightly or after ETL loads.
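If the pg_cron extension is installed, the nightly rebuild can be scheduled in-database rather than from an external cron; the job name and schedule below are illustrative:

```sql
-- Rebuild the graph every night at 03:00 via pg_cron.
SELECT cron.schedule('graph-nightly-rebuild', '0 3 * * *',
                     $$SELECT graph.build()$$);
```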
Strategy 2: Trigger-Based Delta Capture and Apply (Current)
Set graph.sync_mode = 'trigger'.
When trigger sync is enabled, graph.build() or graph.enable_sync() installs
AFTER INSERT/UPDATE/DELETE triggers on every registered table. These triggers
capture committed row changes in graph._sync_log; current verification covers
explicit graph.apply_sync() plus graph.vacuum(), not WAL replay or a
crash-replay background worker.
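Conceptually, the capture trigger installed on each registered table looks like the following. The trigger and function names here are illustrative assumptions; the extension creates and manages these objects itself:

```sql
-- Roughly what enable_sync() sets up per registered table.
-- Object names are assumptions, not the extension's documented API.
CREATE TRIGGER contacts_graph_sync
    AFTER INSERT OR UPDATE OR DELETE ON contacts
    FOR EACH ROW
    EXECUTE FUNCTION graph.sync_row_trigger();
```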
How It Works
INSERT INTO contacts (id, name, company_id) VALUES (99, 'Alice', 42);
│
▼
AFTER INSERT trigger fires → graph.sync_row('contacts', 'INSERT', NEW)
│
├── 1. ResolutionIndex: allocate node_index for ('contacts', '99')
├── 2. NodeStore: set is_active = true, compute Bloom signature
├── 3. TokenIndex: index 'name:Alice' (case-sensitive exact match)
├── 4. EdgeBuffer: push (node_99 → node_42, type='works_at')
│ (if company_id FK is registered as an edge)
└── 5. Return — total overhead: 5–20μs

UPDATE contacts SET status = 'inactive' WHERE id = 99;
│
▼
AFTER UPDATE trigger fires → graph.sync_row('contacts', 'UPDATE', NEW, OLD)
│
├── 1. ResolutionIndex: lookup ('contacts', '99') → existing node_index
├── 2. NodeStore: recompute Bloom signature with new properties
├── 3. TokenIndex: remove old tokens, insert new tokens
├── 4. EdgeBuffer: if FK columns changed, buffer edge mutations
└── 5. Return — overhead: 10–30μs

DELETE FROM contacts WHERE id = 99;
│
▼
AFTER DELETE trigger fires → graph.sync_row('contacts', 'DELETE', OLD)
│
├── 1. ResolutionIndex: lookup ('contacts', '99') → node_index
├── 2. NodeStore: set is_active = false (tombstone)
├── 3. TokenIndex: remove all tokens for this node
└── 4. Return — overhead: 3–10μs
(dead edges cleaned up during next vacuum)
Edge Buffer and Vacuum
Trigger-based sync does NOT rebuild the CSR on every write. That would be catastrophically expensive. Instead:
- Node changes (Bloom, TokenIndex, is_active) are applied immediately — these are in-place writes to flat arrays.
- Edge changes (new FK values, deleted nodes) are buffered in a lock-free edge mutation buffer.
- Periodic vacuum (every 60s by default, configurable) drains the buffer and rebuilds the CSR.
Traversal consults pending edge deltas alongside the base CSR. graph.status()
reports needs_vacuum when enough edge deltas have accumulated that they should
be merged into a rebuilt CSR.
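A maintenance job can poll status() and vacuum only when the extension says it is worthwhile. The needs_vacuum field comes from the text above, though its exact shape is an assumption:

```sql
-- Vacuum only when the extension reports accumulated edge deltas.
DO $$
BEGIN
    IF (SELECT needs_vacuum FROM graph.status()) THEN
        PERFORM graph.vacuum();
    END IF;
END
$$;
```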
-- Force an immediate edge rebuild
SELECT graph.vacuum();
Trigger Overhead
Overhead per trigger invocation (V1 spec estimates):
| Operation | Overhead | Impact on Write TPS |
|---|---|---|
| INSERT (new node) | ~15μs | <1% at 10K writes/sec |
| UPDATE (properties) | ~20μs | <1% at 10K writes/sec |
| DELETE (tombstone) | ~8μs | <0.5% at 10K writes/sec |
At 10,000 writes/second, the total trigger overhead is ~200ms/second — negligible for OLTP workloads.
Roadmap: WAL-Based Sync (Background Worker)
WAL/BGW sync is explicitly out of scope for the V1 trigger-focused implementation. If
graph.sync_mode = 'wal' is selected today, graph.build() fails with a clear error
directing the operator to trigger or manual mode. The design below remains the
roadmap target for a later release.
Set graph.sync_mode = 'wal'.
This is the highest-performance option. No triggers are installed. Instead, a pgrx Background Worker (BGW) consumes changes from a logical replication slot.
How It Works
┌──────────────────────────────────────────────────────────────────┐
│ PostgreSQL WAL (Write-Ahead Log) │
│ │
│ LSN 0/1A3B4C → INSERT contacts (99, 'Alice', 42) │
│ LSN 0/1A3B4D → UPDATE contacts SET status='inactive' WHERE id=1│
│ LSN 0/1A3B4E → DELETE FROM invoices WHERE id=500 │
│ │
└────────────────────────┬─────────────────────────────────────────┘
│
Logical Replication Slot
(pgoutput protocol)
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ Background Worker (Rust, registered in _PG_init) │
│ │
│ Loop: │
│ 1. pg_logical_slot_get_changes('graph_slot', ...) │
│ 2. Parse change: table, operation, columns │
│ 3. Match against registered tables │
│ 4. Apply to graph (same logic as trigger sync) │
│ 5. Acknowledge LSN → Postgres can reclaim WAL │
│ │
│ Vacuum: same periodic CSR rebuild as trigger mode │
└──────────────────────────────────────────────────────────────────┘
Setup Requirements
-- PostgreSQL must have logical replication enabled
-- postgresql.conf: wal_level = logical
-- The extension creates the replication slot automatically.
ALTER SYSTEM SET wal_level = logical;
-- Restart Postgres, then:
SELECT graph.build(); -- creates slot 'graph_slot' if sync_mode = 'wal'
Advantages Over Triggers
| Aspect | Triggers | WAL BGW |
|---|---|---|
| Write path overhead | ~15μs per row | Zero (WAL is written anyway) |
| Bulk load performance | Triggers fire per row | Changes consumed async |
| COPY command support | Triggers fire per row | Consumed from WAL after commit |
| Transaction visibility | After each statement | After transaction commit |
| Crash recovery | Triggers are transactional | LSN-based replay from slot |
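Step 1 of the worker loop maps onto stock Postgres slot functions. For manual experimentation you can watch a slot with the built-in test_decoding plugin; the worker itself would speak pgoutput, whose binary stream is read with pg_logical_slot_get_binary_changes instead:

```sql
-- Manual illustration of slot-based change consumption (test_decoding).
SELECT pg_create_logical_replication_slot('graph_demo_slot', 'test_decoding');
-- ...commit some INSERT/UPDATE/DELETE traffic...
SELECT lsn, xid, data
FROM pg_logical_slot_get_changes('graph_demo_slot', NULL, NULL);
SELECT pg_drop_replication_slot('graph_demo_slot');
```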
Background Worker Registration
// In _PG_init():
BackgroundWorkerBuilder::new("graph WAL sync")
    .set_function("graph.bgworker_main")
    .set_library("graph")
    .enable_spi_access()
    .set_start_time(BgWorkerStartTime::RecoveryFinished)
    .set_restart_time(Some(Duration::from_secs(5)))
    .load();
LSN Checkpointing
The BGW periodically writes its confirmed LSN to a file ($PGDATA/graph/wal_checkpoint). On crash recovery:
- BGW reads the last confirmed LSN from the checkpoint file.
- Reconnects to the replication slot.
- Postgres replays all WAL events from the last confirmed LSN.
- Graph catches up automatically — zero data loss.
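Slot progress is also visible from SQL. confirmed_flush_lsn is the position Postgres treats as durably consumed (and up to which WAL can be reclaimed), which is what the checkpoint file shadows:

```sql
-- Inspect how far the slot has been acknowledged.
SELECT slot_name, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'graph_slot';
```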
Comparison Matrix
| Feature | Manual | Trigger | WAL (BGW) |
|---|---|---|---|
| Setup complexity | None | Auto (after build) | Requires wal_level = logical |
| Write overhead | Zero | ~15μs/row | Zero |
| Node property freshness | Stale until rebuild | Instant | Near-instant (async) |
| Edge freshness | Stale until rebuild | Next vacuum (60s) | Next vacuum (60s) |
| COPY bulk load | No sync | Per-row triggers | Async (efficient) |
| Crash recovery | Rebuild from tables | Rebuild from tables | LSN replay (fast) |
| Multi-database | N/A | N/A | Future: cross-DB via external replication |
| Recommended for | Analytics, batch | Most OLTP workloads | High write throughput |
Hybrid Strategy: Build + WAL
The recommended production deployment for large databases:
1. Create logical replication slot (exports a snapshot name)
2. Build graph using that specific exported snapshot (snapshot-consistent)
3. WAL BGW starts consuming from the slot's confirmed_flush_lsn
4. All changes after the snapshot are replayed by the BGW
5. Zero gap, zero data loss
This is exactly how Postgres physical replicas work:
base backup + WAL replay = consistent replica

Important: The replication slot MUST be created before the build snapshot is taken. The slot creation exports a snapshot name (via pg_export_snapshot()), and graph uses that specific snapshot for COPY TO STDOUT. This guarantees that no mutations are lost between the snapshot and the start of WAL consumption. If you build first and create the slot afterwards, any mutations that occur during the build window would be silently lost.
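The ordering guarantee can be demonstrated with plain pg_export_snapshot(); the extension would do the equivalent internally via the replication protocol's exported slot snapshot:

```sql
-- Session A: export a snapshot (stays valid while this txn is open).
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();          -- e.g. '00000003-0000001B-1'

-- Session B: build against exactly that snapshot.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
-- COPY ... TO STDOUT here sees the exported snapshot; every change
-- after it arrives through the slot, so nothing falls into a gap.
```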
This hybrid approach runs automatically when sync_mode = 'wal':
-- This single call does everything:
-- 1. Creates logical replication slot (exports snapshot)
-- 2. Builds the graph from the exported snapshot
-- 3. Starts the BGW to consume WAL changes from the slot's LSN
SELECT graph.build();
Schema Change Handling
Row-level sync is not enough for a trustworthy Postgres extension. Users also expect pgGraph to notice when the registered schema no longer matches reality.
Schema sync uses Postgres DDL/event-trigger or catalog validation hooks to detect:
- registered table dropped or renamed
- registered column dropped, renamed, or type-changed
- primary key changed
- foreign key added, removed, or changed
- registered edge source/target column invalidated
- registered filter or weight column no longer numeric
Detected DDL should update graph metadata rather than guessing silently:
| Schema state | Meaning | Query behavior |
|---|---|---|
| current | Registered graph metadata matches Postgres catalogs | Queries run normally |
| stale | New schema information may need discovery/build | Existing graph may serve with warning/status flag |
| invalid | Registered table/column/PK/edge no longer exists or changed incompatibly | Unsafe graph queries return a clear error until rebuild |
The user-facing recovery path should be:
SELECT graph.refresh_schema(); -- or graph.auto_discover(...)
SELECT graph.build();
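One plausible detection mechanism is a Postgres event trigger on ddl_command_end, which fires after every DDL statement; the handler function name below is hypothetical, not a confirmed part of any API:

```sql
-- Hedged sketch: fire after each DDL statement so the extension can
-- compare affected objects against registered graph metadata and mark
-- the graph stale/invalid. graph.on_ddl_command is a hypothetical name.
CREATE EVENT TRIGGER graph_schema_watch
    ON ddl_command_end
    EXECUTE FUNCTION graph.on_ddl_command();
```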