SQL API Reference
Overview
graph exposes all functionality through Set-Returning Functions (SRFs) in the graph schema. There is no proprietary query language. Every function returns a virtual table that integrates seamlessly with standard SQL JOIN, WHERE, and ORDER BY.
All functions use REGCLASS for table references, enforcing compile-time table validation. All functions perform ACL pre-flight checks before accessing the graph. All errors are reported as standard SQL ERROR with HINT messages — graph never crashes Postgres.
The “Aha!” Moment
-- Install and graph-query your entire database in 10 seconds
CREATE EXTENSION graph;
SELECT * FROM graph.auto_discover();
-- Done. Now query:
SELECT u.name, g.depth, g.path
FROM graph.traverse('users', 'U-123', 4) g
JOIN users u ON u.id = g.node_id
WHERE u.status = 'active';Discovery Functions
graph.auto_discover(schema_name DEFAULT 'public')
The hero function. Reads information_schema, discovers all tables and FK relationships, registers them, and builds the graph — automatically.
Runs automatically on
CREATE EXTENSION graph;— you don’t need to call this explicitly. The extension bootstrap SQL callsgraph.auto_discover()on thepublicschema during installation. Call it manually to discover additional schemas, or to re-discover after schema changes.
-- Runs automatically on CREATE EXTENSION graph; — no need to call manually.
-- Use explicitly for additional schemas or re-discovery:
SELECT * FROM graph.auto_discover(); -- re-discover public schema
SELECT * FROM graph.auto_discover('sales'); -- discover a specific schema
-- Returns a summary of what was discovered:
-- type | name | details
-- 'table' | 'users' | 'id column: id, 5 property columns'
-- 'table' | 'orders' | 'id column: id, 3 property columns'
-- 'edge' | 'customer' | 'orders.customer_id → users.id'
-- 'edge' | 'product' | 'orders.product_id → products.id'
-- 'build' | 'graph' | '2,500,000 nodes, 8,200,000 edges, 312 MB, 12.4s'What happens:
- Queries
information_schema.tablesfor all user tables. - Queries
information_schema.key_column_usage+table_constraintsfor all FK relationships. - Registers each table via
graph.add_table()(auto-selects id column and property columns). - Registers each FK as an edge via
graph.add_edge()(auto-generates edge labels from FK column names). - Calls
graph.build(). - Returns a summary table of everything discovered.
Column selection heuristic:
- Primary key: Uses the
PRIMARY KEYconstraint. Tables with single-column PKs are registered with the PK column directly. Tables with composite primary keys are classified as either junction tables (if all PK columns are foreign keys → registered as edges, not nodes) or composite entities (if at least one PK column is not a FK → registered as nodes with a JSON array key, e.g.,'["order_123","4"]'). - Property columns: All
text,varchar, andnamecolumns with max length ≤ 128 characters are indexed. Binary columns (bytea), numeric columns, and generated columns are excluded. - Filter columns: No numeric columns are auto-registered as filter columns — use
graph.add_filter_column()manually for those.
Edge label generation: FK column customer_id → edge label customer. FK column parent_category_id → edge label parent_category. The _id suffix is stripped automatically.
Setup Functions
graph.add_table(table_name, id_columns, columns, tenant_column)
Manually register a table as a node source. Use this for fine-grained control (otherwise auto_discover() handles it).
-- Minimal: just table and primary key
SELECT graph.add_table('customers', id_columns := ARRAY['id']);
-- With specific columns to index for property filtering
SELECT graph.add_table(
'customers',
id_columns := ARRAY['id'],
columns := ARRAY['name', 'email', 'status', 'city']
);
-- With tenant isolation
SELECT graph.add_table(
'customers',
id_columns := ARRAY['id'],
columns := ARRAY['name', 'email', 'status'],
tenant_column := 'organisation_id'
);
-- Composite key
SELECT graph.add_table(
'order_lines',
id_columns := ARRAY['order_id', 'line_number'],
columns := ARRAY['sku', 'status']
);Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
table_name | regclass | (required) | Table to register |
id_columns | text[] | (required) | One or more primary/unique key columns. Each must be NOT NULL and the tuple must be unique. |
columns | text[] | NULL | Columns to index. NULL = auto-detect. |
tenant_column | text | NULL | Column for multi-tenant RLS integration. |
Compatibility overload:
SELECT graph.add_table('customers', id_column := 'id');The singular id_column text overload remains for existing single-key callers.
New code should use id_columns text[], including for one-column keys.
Composite primary keys store node ids as canonical JSON array text generated by
jsonb_build_array(col1::text, col2::text, ...)::text, for example
'["order_123","4"]'. Callers may pass that string as seed_id, but generated
SQL and ORM integrations should prefer graph helper constructors when available
instead of hand-concatenating JSON.
graph.add_edge(from_table, from_column, to_table, to_column, label, ...)
Register a relationship as an edge. No FK constraint required — use this for any column that references another table, even without a formal foreign key.
-- Basic relationship
SELECT graph.add_edge(
'orders', 'customer_id',
'customers', 'id',
'customer'
);
-- Self-referential relationship
SELECT graph.add_edge(
'employees', 'manager_id',
'employees', 'id',
'reports_to'
);
-- One-directional edge (default is bidirectional)
-- Unidirectional edges are only traversable in the registered direction.
-- Starting a traverse() from the TARGET node will NOT expand backward
-- through a unidirectional edge. Only the source → target direction works.
SELECT graph.add_edge(
'follows', 'follower_id',
'users', 'id',
'follows',
bidirectional := false
);
-- Edge-table relationship with per-row labels
-- edge_label / edge_path will contain values such as 'officer_of'
-- and 'registered_address', not the generic fallback label.
SELECT graph.add_edge(
'relationships'::regclass, 'node_id_start',
'entities'::regclass, 'node_id_end',
'relationship',
label_column := 'rel_type'
);Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
from_table | regclass | (required) | Source table |
from_column | text | (required) | FK column in source table |
to_table | regclass | (required) | Target table |
to_column | text | (required) | PK column in target table |
label | text | (required) | Human-readable edge name (this is the value you pass to edge_types in traverse()) |
bidirectional | bool | true | Create edges in both directions |
weight_column | text | NULL | Optional numeric source-table column used by weighted shortest path |
label_column | text | NULL | Optional source-table column whose per-row values become edge labels in edge_label and edge_path |
Edge type limit: 254 distinct edge labels (u8). Returns an error if exceeded. Values 0 and 255 are reserved internally (0 = untyped/null, 255 = internal sentinel).
Terminology note: The label you assign here is what you pass in the edge_types TEXT[] array when calling graph.traverse() or other query functions. The two names refer to the same concept — label is used during registration, edge_types during querying.
Dynamic labels: If label_column is set, each edge row uses that column’s
value as its relationship label. Blank or NULL values fall back to label.
This is useful for edge tables such as Panama Papers relationships, where one
table contains many relationship types (officer_of, registered_address,
intermediary_of, …).
Bidirectional labels: bidirectional := true stores a traversable reverse
edge, but it does not invent a second relationship label. The reverse hop uses
the same label as the source row. For example, if an officer_of row connects
an officer to an entity, a traversal from the entity back to the officer still
reports officer_of. Treat the label as the relationship type, and use path
to see which direction the traversal walked.
No FK required: add_edge() works for any column-to-column relationship. If your tables don’t have formal FK constraints, register the logical edge manually:
-- Tables without FK constraints — still works
SELECT graph.add_edge(
'user_profiles', 'user_id',
'users', 'id',
'has_profile'
);graph.add_filter_column(table_name, column_name, column_type)
Register a source column for traversal filter pushdown. This is distinct from
Bloom/TokenIndex search. Registered filter columns are eligible for pruning
inside traversal; unregistered columns remain valid in filter := ... but are
verified during hydration.
-- Register numeric columns for filtering
SELECT graph.add_filter_column('invoices', 'amount', column_type := 'numeric');
SELECT graph.add_filter_column('users', 'risk_score', column_type := 'numeric');
SELECT graph.add_filter_column('transactions', 'created_at', column_type := 'timestamptz');Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
table_name | regclass | (required) | Table that owns the column |
column_name | text | (required) | Column available to filter helpers |
column_type | text | (required) | Filter type: 'text', 'numeric', 'boolean', 'date', 'timestamptz', or 'uuid' |
The implementation may use compact internal representations for hot-path
pushdown, but the SQL contract is type-aware. Any range, precision, scale, or
timezone limitations must be documented by column_type and surfaced as clear
errors.
Sparse array warning: The FilterIndex allocates one u32 per node in the entire graph, not just nodes from the registered table. If you register invoices.amount in a graph with 10M nodes but only 500K invoices, the remaining 9.5M entries are zeroed. Only register filter columns on tables that represent a large fraction of your graph, or accept the RAM overhead (4 bytes × total node count per column).
Use for: Amounts, risk scores, timestamps, status, UUID ownership keys, and other columns you want to filter during traversal without the cost of hydrating every candidate row.
Do NOT confuse with weight columns. Filter columns prune BFS results ($O(1)$ check per node). Weight columns change the pathfinding algorithm from BFS to Dijkstra ($O(V + E \log V)$). These serve completely different mathematical purposes.
Dynamic SQL Safety Contract
Every SQL-facing identifier and value is untrusted input. regclass validates
table references, but it does not validate column names, property keys, edge
labels, JSONB filter payloads, or tenant columns.
Implementation requirements:
- resolve table arguments to OIDs and re-check
pg_catalog.pg_class/pg_catalog.pg_namespace; - validate column names against
pg_catalog.pg_attributefor the resolved table OID before build, search, hydration, trigger, or verification SQL uses them; - quote identifiers with Postgres identifier APIs such as
quote_identifier/quote_ident, never by manual string quoting; - pass values through SPI parameters or typed pgrx values; never interpolate
seed_id,property_value, tenant values, JSONB filter values, or edge labels into SQL strings; - reject raw JSONB filters that reference unregistered columns, unsupported operators, invalid type casts, or unknown table/edge scopes before building any SQL;
- schema-qualify internal references as
graph.*orpg_catalog.*to avoidsearch_pathsurprises.
This contract applies to COPY queries during graph.build(), hydration,
filter verification, trigger generation, search verification, and admin/catalog
helpers.
Build Functions
graph.build(concurrently)
Build (or rebuild) the graph from all registered tables.
SELECT *
FROM graph.build(concurrently := false);
-- Returns:
-- build_id | status | nodes_loaded | edges_loaded | build_time_ms | memory_used_mb | sync_modeWhat happens:
- Estimates memory needed. Returns
ERRORwithHINTif it would exceedgraph.memory_limit_mb. - Scans registered tables using
graph.build_scan_mode. The defaultselectmode uses SPI cursors;copyis reserved until a safe pgrx COPY reader is available. - Builds CSR EdgeStore, NodeStore, BloomFilter, TokenIndex.
- Persists to
$PGDATA/graph/graph.graphwith atomic rename. - Installs sync triggers (if
graph.sync_mode = 'trigger').
Safe to call while serving queries. Old graph continues serving until the new one is ready.
Concurrency: build() acquires an advisory lock (pg_advisory_xact_lock). Concurrent calls to build() or vacuum() from other sessions will block until the lock is released. Only one build/vacuum can run at a time.
Primary key handling: All PK values are cast to their ::text representation internally. integer PKs become '42', uuid PKs are stored in standard hyphenated format ('550e8400-e29b-41d4-a716-446655440000'). Composite primary keys are supported — tables with composite PKs store a JSON array string (e.g., '["order_123","4"]') generated via jsonb_build_array(). Users reference composite-PK nodes using this JSON array format as the seed_id in query functions.
concurrently := false is the default. The function blocks until the snapshot
is built and installed, then returns one completed row.
concurrently := true queues or starts a background build and returns
immediately with status = 'queued' or status = 'running'. Existing queries
continue using the previous graph snapshot until the new graph is complete and
installed atomically.
Capacity note: concurrent builds temporarily hold the old serving graph and the
new graph being built. Plan for old mmap pages, new build-time structures,
per-backend heap indexes, Postgres shared buffers, and OS headroom. As a
practical starting point, provision at least 2x the base graph memory before
using concurrent builds. See 04_memory_model.md.
graph.vacuum()
Rebuild the CSR from pending edge mutations.
SELECT graph.vacuum();The background worker calls this automatically. Call manually after large bulk mutations.
Query Functions
graph.traverse(...)
The core traversal function. Traverse outward from one node or many starting nodes. Returns ordinary SQL rows and hydrates source records as JSONB by default.
graph.traverse(
seed_table REGCLASS,
seed_id TEXT,
max_depth INT DEFAULT 3,
edge_types TEXT[] DEFAULT NULL,
direction TEXT DEFAULT 'any',
node_tables REGCLASS[] DEFAULT NULL,
filter JSONB DEFAULT NULL,
tenant TEXT DEFAULT NULL,
strategy TEXT DEFAULT 'bfs',
uniqueness TEXT DEFAULT 'node_global',
include_start BOOLEAN DEFAULT true,
hydrate BOOLEAN DEFAULT true,
limit INT DEFAULT 1000,
offset INT DEFAULT 0,
max_nodes INT DEFAULT current_setting('graph.max_nodes')::int,
max_frontier INT DEFAULT current_setting('graph.max_frontier')::int
)
RETURNS TABLE (
root_table REGCLASS,
root_id TEXT,
node_table REGCLASS,
node_id TEXT,
depth INT,
path JSONB,
edge_path JSONB,
node JSONB
)Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
seed_table | regclass | (required) | Table of the seed node |
seed_id | text | (required) | Primary key of the seed node |
max_depth | int | 3 | Max traversal depth |
edge_types | text[] | NULL | Edge type filter. NULL = traverse all. |
direction | text | 'any' | 'any' or 'out'. 'in' is reserved for future reverse-adjacency traversal. |
node_tables | regclass[] | NULL | Only return nodes from these tables |
filter | jsonb | NULL | Structured filter built by helpers such as graph.equals(), graph.greater_than(), and graph.all() |
tenant | text | NULL | Tenant/account/workspace scope |
strategy | text | 'bfs' | 'bfs' or 'dfs'. Use graph.weighted_shortest_path() for weighted pathfinding. |
uniqueness | text | 'node_global' | Duplicate visit policy |
include_start | boolean | true | Include starting nodes at depth 0 |
hydrate | boolean | true | Return full source rows as JSONB |
limit | int | 1000 | Page size after traversal-time filtering |
offset | int | 0 | Pagination offset after traversal-time filtering |
max_nodes | int | GUC | Query circuit breaker |
max_frontier | int | GUC | Frontier circuit breaker |
-- Preferred: helper-built structured filter
SELECT *
FROM graph.traverse(
'users',
'U-123',
max_depth := 5,
edge_types := ARRAY['transferred_to'],
filter := graph.all(
graph.greater_than('amount', 10000),
graph.at_least('risk_score', 80)
)
);
-- Advanced/generated-query form
SELECT *
FROM graph.traverse(
'users',
'U-123',
max_depth := 5,
filter := '{"node":{"where":{"jurisdiction":{"eq":"Panama"}}}}'::jsonb
);String filters such as filter_condition := 'amount > 10000' are not part of
the public traversal API. They are hard to validate safely and too close to
dynamic SQL. Public callers use helper functions that return structured JSONB.
Additional GUC-controlled limits (circuit breakers):
graph.max_nodes— max nodes visited per query (default 100,000)graph.max_frontier— max frontier size (default 100,000)
Pagination execution order:
BFS / traversal strategy
-> traversal-time pushdown filters
-> deterministic traversal result ordering
-> OFFSET / LIMIT
-> hydration of the final page
-> hydration-time verification for unregistered filtersOFFSET is applied before hydration, so deep offsets do not hydrate skipped
rows. Filters on unregistered columns are verified during hydration and may
return fewer than limit rows on a page, just as an SRF followed by an outer
WHERE clause can produce a partially filled page. Register frequently used
filter columns when stable page density matters.
Planner note: traversal functions are SRFs with data-dependent cardinality.
The extension should declare conservative COST and ROWS values, but callers
should materialize large coordinate result sets before joining to wide
application tables when the planner chooses a poor nested-loop plan:
WITH nearby AS MATERIALIZED (
SELECT *
FROM graph.traverse('users', 'U-123', max_depth := 5, hydrate := false)
)
SELECT u.*
FROM nearby g
JOIN users u
ON g.node_table = 'users'::regclass
AND u.id::text = g.node_id;For very large hydration jobs, insert traversal coordinates into a temporary
table, ANALYZE it, then join to source tables so the planner has real
statistics.
ORM integration notes:
- Prisma, Hibernate, and SQLAlchemy users should call graph functions through parameterized native SQL, not by string-building table, column, or filter fragments.
- Prefer named parameters for scalar values such as
seed_id,tenant,limit, andoffset. - Pass table names as explicit SQL casts such as
$1::regclassonly when the ORM/driver supports bindingregclasscleanly; otherwise keep table names as static SQL fragments from trusted migration metadata. - For multi-start traversal, ORMs that cannot bind
graph.node_ref[]should usegraph.traverse_search()or populate a temporary table of(node_table regclass, node_id text)coordinates and call the coordinate ingestion API when available. - Hydrated
node jsonbis the most ORM-friendly output for read models. Usehydrate := falseonly when the application intentionally joins coordinates back to typed ORM entities.
Usage Examples
-- Basic: everything connected to user U-123 within 4 hops
SELECT * FROM graph.traverse('users', 'U-123', 4);
-- Filtered: only follow 'transferred_to' and 'shares_device_with' edges
SELECT * FROM graph.traverse('users', 'U-123', 4,
ARRAY['transferred_to', 'shares_device_with']);
-- Numeric filter: only traverse nodes where amount > 10,000
-- (requires graph.add_filter_column('invoices', 'amount', column_type := 'numeric') first)
SELECT * FROM graph.traverse('users', 'U-123', 5,
filter := graph.greater_than('amount', 10000));
-- Combined: edge type filter + numeric filter
SELECT * FROM graph.traverse('users', 'U-123', 5,
edge_types := ARRAY['transferred_to'],
filter := graph.greater_than('risk_score', 80));
-- The hydration pattern: traverse then JOIN for full data
-- This is where Postgres RLS enforcement happens automatically
SELECT u.name, u.email, g.depth, g.path
FROM graph.traverse('users', 'U-123', 4) g
JOIN users u ON u.id = g.node_id
WHERE u.status = 'active';Why path JSONB Is Critical
AI agents don’t just need to know that two entities are connected — they need
the exact chain to explain why. The path column returns ordered node
coordinates, and edge_path returns the edge labels/directions between them:
SELECT g.path
FROM graph.traverse('users', 'U-123', 4) g
WHERE g.node_id = 'C-789';
-- path: [
-- {"table":"users","id":"U-123"},
-- {"table":"transfers","id":"T-456"},
-- {"table":"accounts","id":"A-111"},
-- {"table":"companies","id":"C-789"}
-- ]
-- The AI agent can now explain: "U-123 transferred to T-456,
-- which shares an account with A-111, who is director of C-789"edge_path is aligned to the gaps between path entries:
path[0] --edge_path[0]--> path[1] --edge_path[1]--> path[2]So a depth-2 row with edge_path = ["officer_of", "officer_of"] is not a
duplicate relationship. It means the selected node is two hops from the root,
and both hops used the officer_of relationship type. Keep the full
edge_path when you need to explain the route; add a separate distinct summary
only for display.
SELECT depth,
node_table::regclass::text AS source_table,
node_id,
graph.format_path(path, edge_path) AS readable_path,
path,
edge_path,
node->>'name' AS name,
node
FROM graph.traverse_search(
'name', 'mossack fonseca',
search_mode := 'contains',
search_limit := 1,
max_depth := 4,
hydrate := true
)
ORDER BY depth, source_table, node_id;For SQL table viewers that render JSON objects as [object Object], select only
the display helper:
SELECT depth,
node_table::regclass::text AS source_table,
node_id,
graph.format_path(path, edge_path) AS readable_path,
node->>'name' AS name,
node
FROM graph.traverse_search(
'name', 'mossack fonseca',
search_mode := 'contains',
search_limit := 1,
max_depth := 4,
hydrate := true
)
ORDER BY depth, source_table, node_id;graph.format_path(path, edge_path, separator)
Format traversal coordinates and edge labels as readable hop text. This is a
presentation helper; path and edge_path remain the structured API.
SELECT graph.format_path(
'[{"table":"entities","id":"10000266"},
{"table":"officers","id":"12160432"},
{"table":"entities","id":"10000001"}]'::jsonb,
'["officer_of", "officer_of"]'::jsonb
);
-- entities:10000266 --officer_of--> officers:12160432 |
-- officers:12160432 --officer_of--> entities:10000001| Parameter | Type | Default | Description |
|---|---|---|---|
path | jsonb | (required) | JSONB array from graph.traverse() or graph.traverse_search() |
edge_path | jsonb | (required) | JSONB array of labels aligned between path entries |
separator | text | `' | '` |
graph.shortest_path(source_table, source_id, target_table, target_id, max_depth, hydrate)
Find the shortest path between two specific nodes using bidirectional BFS.
-- Signature
graph.shortest_path(
source_table REGCLASS,
source_id TEXT,
target_table REGCLASS,
target_id TEXT,
max_depth INT DEFAULT 20,
hydrate BOOLEAN DEFAULT true
)
RETURNS TABLE (
step INT, -- Position in path (0 = source)
node_table REGCLASS,
node_id TEXT,
edge_label TEXT, -- Edge that reached this node (NULL for source)
node JSONB -- Hydrated source row when hydrate = true
)-- Find how user U-123 is connected to company C-456
SELECT step, node_table::regclass::text AS table_name, node_id, edge_label, node
FROM graph.shortest_path('users', 'U-123', 'companies', 'C-456');
-- Return coordinates only
SELECT step, node_table::regclass::text AS table_name, node_id, edge_label
FROM graph.shortest_path(
'users', 'U-123', 'companies', 'C-456',
hydrate := false
)
ORDER BY step;If no path exists: Returns empty result set (not an error).
Bidirectional BFS: Expands from both source and target simultaneously. Meets in the middle. Typically 2× faster than single-direction BFS for shortest path.
graph.search(...)
Find nodes by property value and return verified, hydrated results by default.
graph.search(
property_key TEXT,
property_value TEXT,
table_filter REGCLASS DEFAULT NULL,
mode TEXT DEFAULT 'contains',
case_sensitive BOOLEAN DEFAULT false,
limit INT DEFAULT 100,
offset INT DEFAULT 0,
tenant TEXT DEFAULT NULL,
hydrate BOOLEAN DEFAULT true,
allow_bloom_candidates BOOLEAN DEFAULT false
)
RETURNS TABLE (
node_table REGCLASS,
node_id TEXT,
match_type TEXT,
score REAL,
verified BOOLEAN,
node JSONB
)Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
property_key | text | (required) | Property column name to search |
property_value | text | (required) | Value to match |
table_filter | regclass | NULL | Restrict to a specific table. NULL = search all. |
mode | text | 'contains' | 'contains', 'prefix', 'exact', or 'token' |
case_sensitive | boolean | false | Use case-sensitive matching when supported |
limit | int | 100 | Page size |
offset | int | 0 | Search result offset |
tenant | text | NULL | Tenant/account/workspace scope |
hydrate | boolean | true | Return full source rows as JSONB |
allow_bloom_candidates | boolean | false | Opt into lossy Bloom candidate rows |
limit and offset are reserved SQL grammar words. Use positional arguments
or quote them as named arguments, for example "limit" := 20.
Return columns:
| Column | Type | Description |
|---|---|---|
node_table | regclass | Table the matched node belongs to |
node_id | text | Primary key of the matched node |
match_type | text | Search mode or 'bloom_candidate' |
score | real | Mode-specific score. Current non-fuzzy modes return 1.0 |
verified | boolean | Whether the source row matched during verification |
node | jsonb | Hydrated source row when hydrate := true and verification succeeds |
Behaviour:
- Default search is case-insensitive
contains. - Search results are verified against the current source row before being returned in the developer-facing API.
- Bloom candidates are disabled unless
allow_bloom_candidates := true. - Bloom candidates are labeled
match_type = 'bloom_candidate'andverified = falseunless hydration verification succeeds. - Zero results: Returns an empty result set (not an error).
Search modes:
| Mode | Description |
|---|---|
contains | Case-insensitive substring search by default |
prefix | Prefix search, useful for autocomplete |
exact | Full-value exact match after configured case normalization |
token | Tokenized word search, useful for names with reordered parts |
-- Default verified hydrated search
SELECT node_id, node->>'name' AS name
FROM graph.search('name', 'mossack fonseca', table_filter := 'entities');
-- Prefix search for autocomplete
SELECT node_id, node->>'name' AS name
FROM graph.search('name', 'mossack', mode := 'prefix', "limit" := 20);
-- Combine with traversal: search → traverse pipeline
SELECT t.*
FROM graph.search('name', 'Jonathan Chan') s
CROSS JOIN LATERAL graph.traverse(s.node_table, s.node_id, 5) t
WHERE s.verified;Native Type Semantics
Graph coordinates are stored as text, but comparisons and filter pushdown must preserve Postgres type semantics.
| Type | Canonical behavior |
|---|---|
uuid | Store and compare canonical lowercase hyphenated uuid::text. Accept ORM-bound UUID parameters through Postgres casts. |
timestamptz | Compare instants, not display strings. Normalize through Postgres timestamptz; hydration may render offsets. |
date | Compare date values as dates, not midnight timestamps. |
numeric | Preserve numeric precision for hydration-time verification. Hot-path pushdown must document any scale/range limits. |
boolean | Accept true boolean values, not arbitrary text aliases in raw JSONB. |
jsonb | Hydration returns to_jsonb(source_row.*). JSONB columns remain nested JSON values inside node; traversal filters over JSONB require registered/generated columns or explicit JSON path helpers. |
| arrays | Hydrated as JSON arrays. Search/filter semantics are not implicit. |
NULL | Node identity columns must be non-null. Search and filters do not match NULL unless a dedicated is_null helper is added. |
Admin Functions
graph.status()
Return current engine state.
graph.status()
RETURNS TABLE (
node_count INT, -- Total nodes in graph
edge_count INT, -- Total edges in graph
memory_used_mb FLOAT, -- Current memory usage
memory_limit_mb INT, -- graph.memory_limit_mb GUC value
sync_mode TEXT, -- 'manual', 'trigger', or 'wal'
sync_status TEXT, -- 'idle', 'syncing', 'read_only'
last_build TIMESTAMPTZ, -- When build() last completed
last_vacuum TIMESTAMPTZ, -- When vacuum() last completed
edge_types TEXT[], -- List of registered edge labels
edge_buffer_used INT, -- Pending mutations in edge buffer
has_unidirectional_edges BOOL, -- Global flag for shortest_path strategy
-- Full-sync status fields
schema_status TEXT, -- 'current', 'stale', or 'invalid'
sync_lag BIGINT, -- Pending row changes not yet applied
pending_edge_deltas INT, -- Pending relationship mutations
needs_vacuum BOOL, -- Edge deltas should be merged into CSR
needs_rebuild BOOL -- Graph must be rebuilt before safe use
)SELECT * FROM graph.status();graph.reset()
Drop the in-memory graph and remove persisted files.
SELECT graph.reset();graph.remove_table(table_name) / graph.remove_edge(label)
Unregister. Requires graph.build() to take effect.
SELECT graph.remove_table('customers');
SELECT graph.remove_edge('ordered_by');ACL Pre-Flight Check
Every query function performs an O(1) ACL check before touching the graph engine. This ensures users cannot traverse tables they don’t have SELECT access to.
// Pseudocode for every graph query function:
fn check_acl(table_oid: Oid) {
let user_oid = pg_sys::GetUserId();
let result = pg_sys::pg_class_aclcheck(table_oid, user_oid, ACL_SELECT);
if result != ACLCHECK_OK {
return Err(GraphError::AclDenied {
table: table_oid.to_string(),
});
}
Ok(())
}This check happens for:
- The seed table in
graph.traverse() - Both source and target tables in
graph.shortest_path() - The table_filter in
graph.search()when specified - All registered tables that own the searched property when
graph.search()has notable_filter
Note: The ACL check covers the seed table. Nodes discovered during traversal may belong to tables the user doesn’t have access to — but those nodes are only useful when the user JOINs back to the source table, at which point Postgres enforces RLS and table-level permissions. See 09_security.md for the “ghost traversal” discussion.
GUC Parameters
All configurable via postgresql.conf, ALTER SYSTEM, or SET:
# Core
graph.enabled = on # Master kill switch
graph.data_dir = 'graph' # Subdirectory of $PGDATA
graph.memory_limit_mb = 2048 # Hard memory limit (ERROR, not crash)
# Sync
graph.sync_mode = 'trigger' # 'manual', 'trigger', or 'wal'
graph.vacuum_interval_secs = 60 # Auto-vacuum period
graph.edge_buffer_size = 100000 # Max pending edge mutations
# Query defaults
graph.default_max_depth = 5 # Default BFS depth
graph.max_nodes = 100000 # Default circuit breaker
graph.max_frontier = 100000 # Default frontier limit
# Indexing
graph.max_token_length = 128 # Max property value length for TokenIndex
graph.bloom_bits = 64 # Bloom signature width (read-only, fixed at 64-bit u64)
graph.bloom_hash_count = 1 # Number of hash functions per token
# Memory management
graph.auto_load = on # Load graph into RAM on server start
graph.oom_action = 'error' # 'error' (return SQL error) or 'readonly' (degrade)
graph.build_scan_mode = 'select' # 'select'; 'copy' reserved until safe pgrx supportError Codes
graph uses custom SQLSTATE codes so applications can programmatically handle errors rather than parsing message strings.
| SQLSTATE | Name | Raised By | Description |
|---|---|---|---|
PG001 | graph_oom | build(), vacuum() | Memory limit exceeded. HINT: increase graph.memory_limit_mb. |
PG002 | graph_acl_denied | traverse(), shortest_path(), search(), global analytics | User lacks SELECT on the specified table or lacks graph schema admin privileges for global analytics. |
PG003 | graph_not_built | All query functions | Graph not built yet. Call graph.build() first. |
PG004 | graph_edge_type_limit | add_edge() | 254 edge type limit exceeded. |
PG005 | graph_invalid_filter | traverse() | Invalid structured filter payload or unsupported filter helper/operator. |
PG006 | graph_build_locked | build(), vacuum() | Another build() or vacuum() is already in progress. |
PG007 | (reserved) | — | Reserved for future use. |
PG008 | graph_edge_buffer_full | Trigger sync | Edge mutation buffer is full. Graph in read-only mode. |
PG009 | graph_corrupt_file | Startup loader | .graph file failed magic/checksum validation. |
PG010 | graph_node_not_found | traverse(), shortest_path(), weighted_shortest_path() | Seed/source/target node not found in graph. |
All errors include a DETAIL field with context and a HINT field with the recommended fix.
Complete Example: Fraud Investigation
-- 1. Install and auto-discover
CREATE EXTENSION graph;
SELECT * FROM graph.auto_discover();
-- 2. An AI agent asks: "Find all active users connected to Fraudster U-123
-- within 4 hops via money transfers or shared devices"
SELECT u.name, u.email, u.risk_score, g.depth, g.path
FROM graph.traverse('users', 'U-123', 4,
ARRAY['transferred_to', 'shares_device_with']) g
JOIN users u ON u.id = g.node_id
WHERE u.status = 'active'
AND g.node_table = 'users'::regclass
ORDER BY g.depth, u.risk_score DESC;
-- 3. Agent found a suspicious user U-999. Show the exact connection chain.
SELECT p.step,
p.node_table::regclass::text AS table_name,
p.node_id,
p.edge_label,
p.node
FROM graph.shortest_path('users', 'U-123', 'users', 'U-999') p
ORDER BY p.step;
-- 4. Batch screening: check 1,000 entities against a sanctions list
WITH suspects AS (
SELECT id::text AS pk FROM users ORDER BY random() LIMIT 1000
)
SELECT s.pk, count(g.node_id) AS connections
FROM suspects s
CROSS JOIN LATERAL graph.traverse('users', s.pk, 5,
ARRAY['transferred_to']) g
WHERE g.node_table = 'sanctioned_entities'::regclass
GROUP BY s.pk
HAVING count(g.node_id) > 0;Weighted Shortest Path
When an edge is registered with a weight_column, the weighted path functions become available:
-- Register an edge with a weight column
SELECT graph.add_edge('invoices', 'customer_id', 'customers', 'id', 'billed_to',
weight_column := 'amount'
);
-- Weighted shortest path (Dijkstra, O(V + E log V))
graph.weighted_shortest_path(
source_table REGCLASS, source_id TEXT,
target_table REGCLASS, target_id TEXT
)
RETURNS TABLE (path_nodes TEXT[], total_cost INT)Dijkstra uses a BinaryHeap (min-priority queue) over the same CSR arrays with an added parallel weights: Vec<u32> array. See 01_architecture.md for implementation details.