
SQL API Reference

Overview

graph exposes all functionality through Set-Returning Functions (SRFs) in the graph schema. There is no proprietary query language. Every function returns a virtual table that integrates seamlessly with standard SQL JOIN, WHERE, and ORDER BY.

All functions use REGCLASS for table references, enforcing compile-time table validation. All functions perform ACL pre-flight checks before accessing the graph. All errors are reported as standard SQL ERROR with HINT messages — graph never crashes Postgres.


The “Aha!” Moment

-- Install and graph-query your entire database in 10 seconds
CREATE EXTENSION graph;
SELECT * FROM graph.auto_discover();

-- Done. Now query:
SELECT u.name, g.depth, g.path
FROM graph.traverse('users', 'U-123', 4) g
JOIN users u ON u.id = g.node_id
WHERE u.status = 'active';

Discovery Functions

graph.auto_discover(schema_name DEFAULT 'public')

The hero function. Reads information_schema, discovers all tables and FK relationships, registers them, and builds the graph — automatically.

Runs automatically on CREATE EXTENSION graph; — you don’t need to call this explicitly. The extension bootstrap SQL calls graph.auto_discover() on the public schema during installation. Call it manually to discover additional schemas, or to re-discover after schema changes.

-- Runs automatically on CREATE EXTENSION graph; — no need to call manually.
-- Use explicitly for additional schemas or re-discovery:
SELECT * FROM graph.auto_discover();          -- re-discover public schema
SELECT * FROM graph.auto_discover('sales');   -- discover a specific schema

-- Returns a summary of what was discovered:
-- type    | name       | details
-- 'table' | 'users'    | 'id column: id, 5 property columns'
-- 'table' | 'orders'   | 'id column: id, 3 property columns'
-- 'edge'  | 'customer' | 'orders.customer_id → users.id'
-- 'edge'  | 'product'  | 'orders.product_id → products.id'
-- 'build' | 'graph'    | '2,500,000 nodes, 8,200,000 edges, 312 MB, 12.4s'

What happens:

  1. Queries information_schema.tables for all user tables.
  2. Queries information_schema.key_column_usage + table_constraints for all FK relationships.
  3. Registers each table via graph.add_table() (auto-selects id column and property columns).
  4. Registers each FK as an edge via graph.add_edge() (auto-generates edge labels from FK column names).
  5. Calls graph.build().
  6. Returns a summary table of everything discovered.

Column selection heuristic:

  • Primary key: Uses the PRIMARY KEY constraint. Tables with single-column PKs are registered with the PK column directly. Tables with composite primary keys are classified as either junction tables (if all PK columns are foreign keys → registered as edges, not nodes) or composite entities (if at least one PK column is not a FK → registered as nodes with a JSON array key, e.g., '["order_123","4"]').
  • Property columns: All text, varchar, and name columns with max length ≤ 128 characters are indexed. Binary columns (bytea), numeric columns, and generated columns are excluded.
  • Filter columns: No numeric columns are auto-registered as filter columns — use graph.add_filter_column() manually for those.
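The primary-key classification rules above can be sketched as a tiny decision function (`classify_table` is a hypothetical name for illustration; the real heuristic lives inside `auto_discover()`):

```python
def classify_table(pk_columns, fk_columns):
    """Sketch of the documented PK heuristic: single-column PKs are
    plain nodes; composite PKs are junction tables when every PK
    column is also a FK, otherwise composite entities."""
    if len(pk_columns) == 1:
        return "node"               # registered with the PK column directly
    if all(col in fk_columns for col in pk_columns):
        return "junction"           # registered as edges, not nodes
    return "composite_entity"       # node with a JSON array key

# A classic many-to-many join table: both PK columns reference other tables.
assert classify_table(["user_id", "group_id"], {"user_id", "group_id"}) == "junction"
# Order lines: line_number is not a FK, so the table stays a node source.
assert classify_table(["order_id", "line_number"], {"order_id"}) == "composite_entity"
assert classify_table(["id"], set()) == "node"
```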

Edge label generation: FK column customer_id → edge label customer. FK column parent_category_id → edge label parent_category. The _id suffix is stripped automatically.
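A minimal sketch of the stated stripping rule (`edge_label` is an invented helper name, not an exported function):

```python
def edge_label(fk_column: str) -> str:
    """Derive an edge label from a FK column name by stripping a
    trailing '_id' suffix, mirroring the documented rule."""
    return fk_column[:-3] if fk_column.endswith("_id") else fk_column

assert edge_label("customer_id") == "customer"
assert edge_label("parent_category_id") == "parent_category"
assert edge_label("manager") == "manager"   # no suffix: label unchanged
```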


Setup Functions

graph.add_table(table_name, id_columns, columns, tenant_column)

Manually register a table as a node source. Use this for fine-grained control (otherwise auto_discover() handles it).

-- Minimal: just table and primary key
SELECT graph.add_table('customers', id_columns := ARRAY['id']);

-- With specific columns to index for property filtering
SELECT graph.add_table(
    'customers',
    id_columns := ARRAY['id'],
    columns := ARRAY['name', 'email', 'status', 'city']
);

-- With tenant isolation
SELECT graph.add_table(
    'customers',
    id_columns := ARRAY['id'],
    columns := ARRAY['name', 'email', 'status'],
    tenant_column := 'organisation_id'
);

-- Composite key
SELECT graph.add_table(
    'order_lines',
    id_columns := ARRAY['order_id', 'line_number'],
    columns := ARRAY['sku', 'status']
);

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| table_name | regclass | (required) | Table to register |
| id_columns | text[] | (required) | One or more primary/unique key columns. Each must be NOT NULL and the tuple must be unique. |
| columns | text[] | NULL | Columns to index. NULL = auto-detect. |
| tenant_column | text | NULL | Column for multi-tenant RLS integration. |

Compatibility overload:

SELECT graph.add_table('customers', id_column := 'id');

The singular id_column text overload remains for existing single-key callers. New code should use id_columns text[], including for one-column keys.

Composite primary keys store node ids as canonical JSON array text generated by jsonb_build_array(col1::text, col2::text, ...)::text, for example '["order_123","4"]'. Callers may pass that string as seed_id, but generated SQL and ORM integrations should prefer graph helper constructors when available instead of hand-concatenating JSON.
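For illustration, the canonical composite-key id can be reproduced client-side like this (`composite_node_id` is a hypothetical helper; prefer server-side constructors when they exist):

```python
import json

def composite_node_id(*key_parts) -> str:
    """Build the canonical composite-PK node id: a JSON array of the
    key parts cast to text, matching the documented
    jsonb_build_array(col1::text, col2::text, ...)::text shape."""
    return json.dumps([str(p) for p in key_parts], separators=(",", ":"))

assert composite_node_id("order_123", 4) == '["order_123","4"]'
```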


graph.add_edge(from_table, from_column, to_table, to_column, label, ...)

Register a relationship as an edge. No FK constraint required — use this for any column that references another table, even without a formal foreign key.

-- Basic relationship
SELECT graph.add_edge(
    'orders', 'customer_id',
    'customers', 'id',
    'customer'
);

-- Self-referential relationship
SELECT graph.add_edge(
    'employees', 'manager_id',
    'employees', 'id',
    'reports_to'
);

-- One-directional edge (default is bidirectional)
-- Unidirectional edges are only traversable in the registered direction.
-- Starting a traverse() from the TARGET node will NOT expand backward
-- through a unidirectional edge. Only the source → target direction works.
SELECT graph.add_edge(
    'follows', 'follower_id',
    'users', 'id',
    'follows',
    bidirectional := false
);

-- Edge-table relationship with per-row labels
-- edge_label / edge_path will contain values such as 'officer_of'
-- and 'registered_address', not the generic fallback label.
SELECT graph.add_edge(
    'relationships'::regclass, 'node_id_start',
    'entities'::regclass, 'node_id_end',
    'relationship',
    label_column := 'rel_type'
);

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| from_table | regclass | (required) | Source table |
| from_column | text | (required) | FK column in source table |
| to_table | regclass | (required) | Target table |
| to_column | text | (required) | PK column in target table |
| label | text | (required) | Human-readable edge name (this is the value you pass to edge_types in traverse()) |
| bidirectional | bool | true | Create edges in both directions |
| weight_column | text | NULL | Optional numeric source-table column used by weighted shortest path |
| label_column | text | NULL | Optional source-table column whose per-row values become edge labels in edge_label and edge_path |

Edge type limit: 254 distinct edge labels (u8). Returns an error if exceeded. Values 0 and 255 are reserved internally (0 = untyped/null, 255 = internal sentinel).
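One way to picture the u8 budget (a sketch; `allocate_edge_type_ids` is invented for illustration and is not part of the extension):

```python
def allocate_edge_type_ids(labels):
    """Assign u8 edge-type ids, keeping 0 (untyped/null) and 255
    (internal sentinel) reserved, so at most 254 labels fit."""
    if len(labels) > 254:
        raise ValueError("edge type limit exceeded: max 254 distinct labels")
    return {label: i for i, label in enumerate(labels, start=1)}  # ids 1..254

ids = allocate_edge_type_ids(["customer", "reports_to", "follows"])
assert ids == {"customer": 1, "reports_to": 2, "follows": 3}
assert 0 not in ids.values() and 255 not in ids.values()
```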

Terminology note: The label you assign here is what you pass in the edge_types TEXT[] array when calling graph.traverse() or other query functions. The two names refer to the same concept — label is used during registration, edge_types during querying.

Dynamic labels: If label_column is set, each edge row uses that column’s value as its relationship label. Blank or NULL values fall back to label. This is useful for edge tables such as Panama Papers relationships, where one table contains many relationship types (officer_of, registered_address, intermediary_of, …).

Bidirectional labels: bidirectional := true stores a traversable reverse edge, but it does not invent a second relationship label. The reverse hop uses the same label as the source row. For example, if an officer_of row connects an officer to an entity, a traversal from the entity back to the officer still reports officer_of. Treat the label as the relationship type, and use path to see which direction the traversal walked.

No FK required: add_edge() works for any column-to-column relationship. If your tables don’t have formal FK constraints, register the logical edge manually:

-- Tables without FK constraints — still works
SELECT graph.add_edge(
    'user_profiles', 'user_id',
    'users', 'id',
    'has_profile'
);

graph.add_filter_column(table_name, column_name, column_type)

Register a source column for traversal filter pushdown. This is distinct from Bloom/TokenIndex search. Registered filter columns are eligible for pruning inside traversal; unregistered columns remain valid in filter := ... but are verified during hydration.

-- Register numeric columns for filtering
SELECT graph.add_filter_column('invoices', 'amount', column_type := 'numeric');
SELECT graph.add_filter_column('users', 'risk_score', column_type := 'numeric');
SELECT graph.add_filter_column('transactions', 'created_at', column_type := 'timestamptz');

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| table_name | regclass | (required) | Table that owns the column |
| column_name | text | (required) | Column available to filter helpers |
| column_type | text | (required) | Filter type: 'text', 'numeric', 'boolean', 'date', 'timestamptz', or 'uuid' |

The implementation may use compact internal representations for hot-path pushdown, but the SQL contract is type-aware. Any range, precision, scale, or timezone limitations must be documented by column_type and surfaced as clear errors.

Sparse array warning: The FilterIndex allocates one u32 per node in the entire graph, not just nodes from the registered table. If you register invoices.amount in a graph with 10M nodes but only 500K invoices, the remaining 9.5M entries are zeroed. Only register filter columns on tables that represent a large fraction of your graph, or accept the RAM overhead (4 bytes × total node count per column).
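The overhead is easy to estimate up front (a hypothetical helper that just applies the 4-bytes-per-node rule stated above):

```python
def filter_column_overhead_mb(total_nodes: int, registered_columns: int) -> float:
    """One u32 (4 bytes) per node in the entire graph, per registered
    filter column, regardless of how few rows the owning table has."""
    return total_nodes * 4 * registered_columns / (1024 * 1024)

# 10M-node graph, one registered column: ~38 MB even if only 500K rows
# belong to the registered table.
assert round(filter_column_overhead_mb(10_000_000, 1), 1) == 38.1
```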

Use for: Amounts, risk scores, timestamps, status, UUID ownership keys, and other columns you want to filter during traversal without the cost of hydrating every candidate row.

Do NOT confuse with weight columns. Filter columns prune BFS results (an O(1) check per node). Weight columns change the pathfinding algorithm from BFS to Dijkstra (O(V + E log V)). These serve completely different purposes.

Dynamic SQL Safety Contract

Every SQL-facing identifier and value is untrusted input. regclass validates table references, but it does not validate column names, property keys, edge labels, JSONB filter payloads, or tenant columns.

Implementation requirements:

  • resolve table arguments to OIDs and re-check pg_catalog.pg_class / pg_catalog.pg_namespace;
  • validate column names against pg_catalog.pg_attribute for the resolved table OID before build, search, hydration, trigger, or verification SQL uses them;
  • quote identifiers with Postgres identifier APIs such as quote_identifier / quote_ident, never by manual string quoting;
  • pass values through SPI parameters or typed pgrx values; never interpolate seed_id, property_value, tenant values, JSONB filter values, or edge labels into SQL strings;
  • reject raw JSONB filters that reference unregistered columns, unsupported operators, invalid type casts, or unknown table/edge scopes before building any SQL;
  • schema-qualify internal references as graph.* or pg_catalog.* to avoid search_path surprises.

This contract applies to COPY queries during graph.build(), hydration, filter verification, trigger generation, search verification, and admin/catalog helpers.
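A rough sketch of the identifier half of this contract, assuming a catalog lookup has already produced the set of real column names (the real implementation should call the server's quote_ident, not reimplement it; these helper names are invented for illustration):

```python
import re

# Identifiers that need no quoting: lowercase start, safe charset.
VALID_IDENT = re.compile(r"^[a-z_][a-z0-9_$]*$")

def quote_identifier(name: str) -> str:
    """Approximation of Postgres quote_ident: pass through plain
    lowercase identifiers, otherwise double-quote and double any
    embedded double quotes."""
    if VALID_IDENT.match(name):
        return name
    return '"' + name.replace('"', '""') + '"'

def validate_column(column: str, catalog_columns: set) -> str:
    """Reject any column not present in the catalog snapshot (standing
    in for pg_attribute) before it can reach dynamic SQL."""
    if column not in catalog_columns:
        raise ValueError(f"unknown column: {column!r}")
    return quote_identifier(column)

cols = {"id", "name", "weird name"}
assert validate_column("name", cols) == "name"
assert validate_column("weird name", cols) == '"weird name"'
```

Values (seed ids, filter payloads, tenant keys) never go through this path at all; they travel as bound parameters.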


Build Functions

graph.build(concurrently)

Build (or rebuild) the graph from all registered tables.

SELECT * FROM graph.build(concurrently := false);
-- Returns:
-- build_id | status | nodes_loaded | edges_loaded | build_time_ms | memory_used_mb | sync_mode

What happens:

  1. Estimates memory needed. Returns ERROR with HINT if it would exceed graph.memory_limit_mb.
  2. Scans registered tables using graph.build_scan_mode. The default select mode uses SPI cursors; copy is reserved until a safe pgrx COPY reader is available.
  3. Builds CSR EdgeStore, NodeStore, BloomFilter, TokenIndex.
  4. Persists to $PGDATA/graph/graph.graph with atomic rename.
  5. Installs sync triggers (if graph.sync_mode = 'trigger').

Safe to call while serving queries. Old graph continues serving until the new one is ready.

Concurrency: build() acquires an advisory lock (pg_advisory_xact_lock). Concurrent calls to build() or vacuum() from other sessions will block until the lock is released. Only one build/vacuum can run at a time.

Primary key handling: All PK values are cast to their ::text representation internally. integer PKs become '42', uuid PKs are stored in standard hyphenated format ('550e8400-e29b-41d4-a716-446655440000'). Composite primary keys are supported — tables with composite PKs store a JSON array string (e.g., '["order_123","4"]') generated via jsonb_build_array(). Users reference composite-PK nodes using this JSON array format as the seed_id in query functions.

concurrently := false is the default. The function blocks until the snapshot is built and installed, then returns one completed row.

concurrently := true queues or starts a background build and returns immediately with status = 'queued' or status = 'running'. Existing queries continue using the previous graph snapshot until the new graph is complete and installed atomically.

Capacity note: concurrent builds temporarily hold the old serving graph and the new graph being built. Plan for old mmap pages, new build-time structures, per-backend heap indexes, Postgres shared buffers, and OS headroom. As a practical starting point, provision at least 2x the base graph memory before using concurrent builds. See 04_memory_model.md.


graph.vacuum()

Rebuild the CSR from pending edge mutations.

SELECT graph.vacuum();

The background worker calls this automatically. Call manually after large bulk mutations.


Query Functions

graph.traverse(...)

The core traversal function. Traverse outward from one node or many starting nodes. Returns ordinary SQL rows and hydrates source records as JSONB by default.

graph.traverse(
    seed_table REGCLASS,
    seed_id TEXT,
    max_depth INT DEFAULT 3,
    edge_types TEXT[] DEFAULT NULL,
    direction TEXT DEFAULT 'any',
    node_tables REGCLASS[] DEFAULT NULL,
    filter JSONB DEFAULT NULL,
    tenant TEXT DEFAULT NULL,
    strategy TEXT DEFAULT 'bfs',
    uniqueness TEXT DEFAULT 'node_global',
    include_start BOOLEAN DEFAULT true,
    hydrate BOOLEAN DEFAULT true,
    limit INT DEFAULT 1000,
    offset INT DEFAULT 0,
    max_nodes INT DEFAULT current_setting('graph.max_nodes')::int,
    max_frontier INT DEFAULT current_setting('graph.max_frontier')::int
) RETURNS TABLE (
    root_table REGCLASS,
    root_id TEXT,
    node_table REGCLASS,
    node_id TEXT,
    depth INT,
    path JSONB,
    edge_path JSONB,
    node JSONB
)

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| seed_table | regclass | (required) | Table of the seed node |
| seed_id | text | (required) | Primary key of the seed node |
| max_depth | int | 3 | Max traversal depth |
| edge_types | text[] | NULL | Edge type filter. NULL = traverse all. |
| direction | text | 'any' | 'any' or 'out'. 'in' is reserved for future reverse-adjacency traversal. |
| node_tables | regclass[] | NULL | Only return nodes from these tables |
| filter | jsonb | NULL | Structured filter built by helpers such as graph.equals(), graph.greater_than(), and graph.all() |
| tenant | text | NULL | Tenant/account/workspace scope |
| strategy | text | 'bfs' | 'bfs' or 'dfs'. Use graph.weighted_shortest_path() for weighted pathfinding. |
| uniqueness | text | 'node_global' | Duplicate visit policy |
| include_start | boolean | true | Include starting nodes at depth 0 |
| hydrate | boolean | true | Return full source rows as JSONB |
| limit | int | 1000 | Page size after traversal-time filtering |
| offset | int | 0 | Pagination offset after traversal-time filtering |
| max_nodes | int | GUC | Query circuit breaker |
| max_frontier | int | GUC | Frontier circuit breaker |
-- Preferred: helper-built structured filter
SELECT * FROM graph.traverse(
    'users', 'U-123',
    max_depth := 5,
    edge_types := ARRAY['transferred_to'],
    filter := graph.all(
        graph.greater_than('amount', 10000),
        graph.at_least('risk_score', 80)
    )
);

-- Advanced/generated-query form
SELECT * FROM graph.traverse(
    'users', 'U-123',
    max_depth := 5,
    filter := '{"node":{"where":{"jurisdiction":{"eq":"Panama"}}}}'::jsonb
);

String filters such as filter_condition := 'amount > 10000' are not part of the public traversal API. They are hard to validate safely and too close to dynamic SQL. Public callers use helper functions that return structured JSONB.

Additional GUC-controlled limits (circuit breakers):

  • graph.max_nodes — max nodes visited per query (default 100,000)
  • graph.max_frontier — max frontier size (default 100,000)

Pagination execution order:

BFS / traversal strategy -> traversal-time pushdown filters -> deterministic traversal result ordering -> OFFSET / LIMIT -> hydration of the final page -> hydration-time verification for unregistered filters

OFFSET is applied before hydration, so deep offsets do not hydrate skipped rows. Filters on unregistered columns are verified during hydration and may return fewer than limit rows on a page, just as an SRF followed by an outer WHERE clause can produce a partially filled page. Register frequently used filter columns when stable page density matters.
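As a mental model, the pipeline can be sketched with hypothetical helpers (the engine operates on CSR structures, not Python lists):

```python
def paginate(rows, pushdown, offset, limit, hydrate, verify):
    """Documented order of operations: pushdown filters, then
    deterministic ordering, then OFFSET/LIMIT, then hydration of only
    the final page, then hydration-time verification (which can leave
    the page short of `limit` rows)."""
    candidates = sorted(r for r in rows if pushdown(r))
    page = candidates[offset:offset + limit]
    return [h for h in map(hydrate, page) if verify(h)]

rows = [(1, "b"), (1, "a"), (2, "c"), (2, "d")]
page = paginate(rows,
                pushdown=lambda r: True,
                offset=1, limit=2,
                hydrate=lambda r: {"id": r[1]},
                verify=lambda h: h["id"] != "c")
# 'c' fails post-hydration verification, so the page holds one row, not two.
assert page == [{"id": "b"}]
```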

Planner note: traversal functions are SRFs with data-dependent cardinality. The extension should declare conservative COST and ROWS values, but callers should materialize large coordinate result sets before joining to wide application tables when the planner chooses a poor nested-loop plan:

WITH nearby AS MATERIALIZED (
    SELECT * FROM graph.traverse('users', 'U-123', max_depth := 5, hydrate := false)
)
SELECT u.*
FROM nearby g
JOIN users u
  ON g.node_table = 'users'::regclass
 AND u.id::text = g.node_id;

For very large hydration jobs, insert traversal coordinates into a temporary table, ANALYZE it, then join to source tables so the planner has real statistics.

ORM integration notes:

  • Prisma, Hibernate, and SQLAlchemy users should call graph functions through parameterized native SQL, not by string-building table, column, or filter fragments.
  • Prefer named parameters for scalar values such as seed_id, tenant, limit, and offset.
  • Pass table names as explicit SQL casts such as $1::regclass only when the ORM/driver supports binding regclass cleanly; otherwise keep table names as static SQL fragments from trusted migration metadata.
  • For multi-start traversal, ORMs that cannot bind graph.node_ref[] should use graph.traverse_search() or populate a temporary table of (node_table regclass, node_id text) coordinates and call the coordinate ingestion API when available.
  • Hydrated node jsonb is the most ORM-friendly output for read models. Use hydrate := false only when the application intentionally joins coordinates back to typed ORM entities.

Usage Examples

-- Basic: everything connected to user U-123 within 4 hops
SELECT * FROM graph.traverse('users', 'U-123', 4);

-- Filtered: only follow 'transferred_to' and 'shares_device_with' edges
SELECT * FROM graph.traverse('users', 'U-123', 4,
    ARRAY['transferred_to', 'shares_device_with']);

-- Numeric filter: only traverse nodes where amount > 10,000
-- (requires graph.add_filter_column('invoices', 'amount', column_type := 'numeric') first)
SELECT * FROM graph.traverse('users', 'U-123', 5,
    filter := graph.greater_than('amount', 10000));

-- Combined: edge type filter + numeric filter
SELECT * FROM graph.traverse('users', 'U-123', 5,
    edge_types := ARRAY['transferred_to'],
    filter := graph.greater_than('risk_score', 80));

-- The hydration pattern: traverse then JOIN for full data
-- This is where Postgres RLS enforcement happens automatically
SELECT u.name, u.email, g.depth, g.path
FROM graph.traverse('users', 'U-123', 4) g
JOIN users u ON u.id = g.node_id
WHERE u.status = 'active';

Why path JSONB Is Critical

AI agents don’t just need to know that two entities are connected — they need the exact chain to explain why. The path column returns ordered node coordinates, and edge_path returns the edge labels/directions between them:

SELECT g.path
FROM graph.traverse('users', 'U-123', 4) g
WHERE g.node_id = 'C-789';

-- path: [
--   {"table":"users","id":"U-123"},
--   {"table":"transfers","id":"T-456"},
--   {"table":"accounts","id":"A-111"},
--   {"table":"companies","id":"C-789"}
-- ]
-- The AI agent can now explain: "U-123 transferred to T-456,
-- which shares an account with A-111, who is director of C-789"

edge_path is aligned to the gaps between path entries:

path[0] --edge_path[0]--> path[1] --edge_path[1]--> path[2]

So a depth-2 row with edge_path = ["officer_of", "officer_of"] is not a duplicate relationship. It means the selected node is two hops from the root, and both hops used the officer_of relationship type. Keep the full edge_path when you need to explain the route; add a separate distinct summary only for display.
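The alignment rule is easy to reproduce (a sketch of what graph.format_path does conceptually; the separator shown is an assumption for illustration):

```python
def format_path(path, edge_path, separator=" | "):
    """edge_path[i] labels the hop from path[i] to path[i+1];
    render each hop as 'table:id --label--> table:id'."""
    hops = []
    for i, label in enumerate(edge_path):
        a, b = path[i], path[i + 1]
        hops.append(f"{a['table']}:{a['id']} --{label}--> {b['table']}:{b['id']}")
    return separator.join(hops)

path = [{"table": "entities", "id": "10000266"},
        {"table": "officers", "id": "12160432"},
        {"table": "entities", "id": "10000001"}]
out = format_path(path, ["officer_of", "officer_of"])
assert out == ("entities:10000266 --officer_of--> officers:12160432 | "
               "officers:12160432 --officer_of--> entities:10000001")
```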

SELECT depth,
       node_table::regclass::text AS source_table,
       node_id,
       graph.format_path(path, edge_path) AS readable_path,
       path,
       edge_path,
       node->>'name' AS name,
       node
FROM graph.traverse_search(
    'name', 'mossack fonseca',
    search_mode := 'contains',
    search_limit := 1,
    max_depth := 4,
    hydrate := true
)
ORDER BY depth, source_table, node_id;

For SQL table viewers that render JSON objects as [object Object], select only the display helper:

SELECT depth,
       node_table::regclass::text AS source_table,
       node_id,
       graph.format_path(path, edge_path) AS readable_path,
       node->>'name' AS name,
       node
FROM graph.traverse_search(
    'name', 'mossack fonseca',
    search_mode := 'contains',
    search_limit := 1,
    max_depth := 4,
    hydrate := true
)
ORDER BY depth, source_table, node_id;

graph.format_path(path, edge_path, separator)

Format traversal coordinates and edge labels as readable hop text. This is a presentation helper; path and edge_path remain the structured API.

SELECT graph.format_path(
    '[{"table":"entities","id":"10000266"},
      {"table":"officers","id":"12160432"},
      {"table":"entities","id":"10000001"}]'::jsonb,
    '["officer_of", "officer_of"]'::jsonb
);
-- entities:10000266 --officer_of--> officers:12160432 |
-- officers:12160432 --officer_of--> entities:10000001
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | jsonb | (required) | JSONB array from graph.traverse() or graph.traverse_search() |
| edge_path | jsonb | (required) | JSONB array of labels aligned between path entries |
| separator | text | `''` | Separator inserted between formatted hops |

graph.shortest_path(source_table, source_id, target_table, target_id, max_depth, hydrate)

Find the shortest path between two specific nodes using bidirectional BFS.

-- Signature
graph.shortest_path(
    source_table REGCLASS,
    source_id TEXT,
    target_table REGCLASS,
    target_id TEXT,
    max_depth INT DEFAULT 20,
    hydrate BOOLEAN DEFAULT true
) RETURNS TABLE (
    step INT,            -- Position in path (0 = source)
    node_table REGCLASS,
    node_id TEXT,
    edge_label TEXT,     -- Edge that reached this node (NULL for source)
    node JSONB           -- Hydrated source row when hydrate = true
)
-- Find how user U-123 is connected to company C-456
SELECT step,
       node_table::regclass::text AS table_name,
       node_id,
       edge_label,
       node
FROM graph.shortest_path('users', 'U-123', 'companies', 'C-456');

-- Return coordinates only
SELECT step,
       node_table::regclass::text AS table_name,
       node_id,
       edge_label
FROM graph.shortest_path(
    'users', 'U-123', 'companies', 'C-456',
    hydrate := false
)
ORDER BY step;

If no path exists: Returns empty result set (not an error).

Bidirectional BFS: Expands from both source and target simultaneously. Meets in the middle. Typically 2× faster than single-direction BFS for shortest path.
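The meet-in-the-middle idea, sketched over a plain adjacency dict rather than the extension's CSR arrays (function names are invented for illustration):

```python
def bidirectional_bfs(adj, source, target):
    """Shortest path by expanding from both endpoints and stopping
    when the two frontiers touch."""
    if source == target:
        return [source]
    fwd = {source: None}            # node -> parent, one side of the search
    bwd = {target: None}            # node -> parent, the other side
    f_front, b_front = [source], [target]
    while f_front and b_front:
        if len(f_front) > len(b_front):
            # Always expand the smaller frontier; this is where the
            # "typically 2x" saving over one-sided BFS comes from.
            fwd, bwd = bwd, fwd
            f_front, b_front = b_front, f_front
        next_front = []
        for node in f_front:
            for nb in adj.get(node, ()):
                if nb in fwd:
                    continue
                fwd[nb] = node
                if nb in bwd:       # frontiers met in the middle
                    return _stitch(fwd, bwd, nb, source)
                next_front.append(nb)
        f_front = next_front
    return []                       # no path: empty result, matching the docs

def _stitch(fwd, bwd, meet, source):
    half, n = [], meet
    while n is not None:            # walk meet -> root on the fwd side
        half.append(n)
        n = fwd[n]
    half.reverse()
    n = bwd[meet]
    while n is not None:            # append meet -> root on the bwd side
        half.append(n)
        n = bwd[n]
    return half if half[0] == source else half[::-1]

# Diamond graph: two equal-length routes from A to D.
adj = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
assert bidirectional_bfs(adj, "A", "D") == ["A", "B", "D"]
assert bidirectional_bfs(adj, "A", "Z") == []
```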


graph.search(...)

Find nodes by property value and return verified, hydrated results by default.

graph.search(
    property_key TEXT,
    property_value TEXT,
    table_filter REGCLASS DEFAULT NULL,
    mode TEXT DEFAULT 'contains',
    case_sensitive BOOLEAN DEFAULT false,
    limit INT DEFAULT 100,
    offset INT DEFAULT 0,
    tenant TEXT DEFAULT NULL,
    hydrate BOOLEAN DEFAULT true,
    allow_bloom_candidates BOOLEAN DEFAULT false
) RETURNS TABLE (
    node_table REGCLASS,
    node_id TEXT,
    match_type TEXT,
    score REAL,
    verified BOOLEAN,
    node JSONB
)

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| property_key | text | (required) | Property column name to search |
| property_value | text | (required) | Value to match |
| table_filter | regclass | NULL | Restrict to a specific table. NULL = search all. |
| mode | text | 'contains' | 'contains', 'prefix', 'exact', or 'token' |
| case_sensitive | boolean | false | Use case-sensitive matching when supported |
| limit | int | 100 | Page size |
| offset | int | 0 | Search result offset |
| tenant | text | NULL | Tenant/account/workspace scope |
| hydrate | boolean | true | Return full source rows as JSONB |
| allow_bloom_candidates | boolean | false | Opt into lossy Bloom candidate rows |

limit and offset are reserved SQL grammar words. Use positional arguments or quote them as named arguments, for example "limit" := 20.

Return columns:

| Column | Type | Description |
|---|---|---|
| node_table | regclass | Table the matched node belongs to |
| node_id | text | Primary key of the matched node |
| match_type | text | Search mode or 'bloom_candidate' |
| score | real | Mode-specific score. Current non-fuzzy modes return 1.0 |
| verified | boolean | Whether the source row matched during verification |
| node | jsonb | Hydrated source row when hydrate := true and verification succeeds |

Behaviour:

  • Default search is case-insensitive contains.
  • Search results are verified against the current source row before being returned in the developer-facing API.
  • Bloom candidates are disabled unless allow_bloom_candidates := true.
  • Bloom candidates are labeled match_type = 'bloom_candidate' and verified = false unless hydration verification succeeds.
  • Zero results: Returns an empty result set (not an error).

Search modes:

| Mode | Description |
|---|---|
| contains | Case-insensitive substring search by default |
| prefix | Prefix search, useful for autocomplete |
| exact | Full-value exact match after configured case normalization |
| token | Tokenized word search, useful for names with reordered parts |
-- Default verified hydrated search
SELECT node_id, node->>'name' AS name
FROM graph.search('name', 'mossack fonseca', table_filter := 'entities');

-- Prefix search for autocomplete
SELECT node_id, node->>'name' AS name
FROM graph.search('name', 'mossack', mode := 'prefix', "limit" := 20);

-- Combine with traversal: search → traverse pipeline
SELECT t.*
FROM graph.search('name', 'Jonathan Chan') s
CROSS JOIN LATERAL graph.traverse(s.node_table, s.node_id, 5) t
WHERE s.verified;

Native Type Semantics

Graph coordinates are stored as text, but comparisons and filter pushdown must preserve Postgres type semantics.

| Type | Canonical behavior |
|---|---|
| uuid | Store and compare canonical lowercase hyphenated uuid::text. Accept ORM-bound UUID parameters through Postgres casts. |
| timestamptz | Compare instants, not display strings. Normalize through Postgres timestamptz; hydration may render offsets. |
| date | Compare date values as dates, not midnight timestamps. |
| numeric | Preserve numeric precision for hydration-time verification. Hot-path pushdown must document any scale/range limits. |
| boolean | Accept true boolean values, not arbitrary text aliases in raw JSONB. |
| jsonb | Hydration returns to_jsonb(source_row.*). JSONB columns remain nested JSON values inside node; traversal filters over JSONB require registered/generated columns or explicit JSON path helpers. |
| arrays | Hydrated as JSON arrays. Search/filter semantics are not implicit. |
| NULL | Node identity columns must be non-null. Search and filters do not match NULL unless a dedicated is_null helper is added. |

Admin Functions

graph.status()

Return current engine state.

graph.status() RETURNS TABLE (
    node_count INT,                  -- Total nodes in graph
    edge_count INT,                  -- Total edges in graph
    memory_used_mb FLOAT,            -- Current memory usage
    memory_limit_mb INT,             -- graph.memory_limit_mb GUC value
    sync_mode TEXT,                  -- 'manual', 'trigger', or 'wal'
    sync_status TEXT,                -- 'idle', 'syncing', 'read_only'
    last_build TIMESTAMPTZ,          -- When build() last completed
    last_vacuum TIMESTAMPTZ,         -- When vacuum() last completed
    edge_types TEXT[],               -- List of registered edge labels
    edge_buffer_used INT,            -- Pending mutations in edge buffer
    has_unidirectional_edges BOOL,   -- Global flag for shortest_path strategy
    -- Full-sync status fields
    schema_status TEXT,              -- 'current', 'stale', or 'invalid'
    sync_lag BIGINT,                 -- Pending row changes not yet applied
    pending_edge_deltas INT,         -- Pending relationship mutations
    needs_vacuum BOOL,               -- Edge deltas should be merged into CSR
    needs_rebuild BOOL               -- Graph must be rebuilt before safe use
)
SELECT * FROM graph.status();

graph.reset()

Drop the in-memory graph and remove persisted files.

SELECT graph.reset();

graph.remove_table(table_name) / graph.remove_edge(label)

Unregister a table or edge. Changes take effect only after the next graph.build().

SELECT graph.remove_table('customers');
SELECT graph.remove_edge('ordered_by');

ACL Pre-Flight Check

Every query function performs an O(1) ACL check before touching the graph engine. This ensures users cannot traverse tables they don’t have SELECT access to.

// Pseudocode for every graph query function:
fn check_acl(table_oid: Oid) -> Result<(), GraphError> {
    let user_oid = pg_sys::GetUserId();
    let result = pg_sys::pg_class_aclcheck(table_oid, user_oid, ACL_SELECT);
    if result != ACLCHECK_OK {
        return Err(GraphError::AclDenied {
            table: table_oid.to_string(),
        });
    }
    Ok(())
}

This check happens for:

  • The seed table in graph.traverse()
  • Both source and target tables in graph.shortest_path()
  • The table_filter in graph.search() when specified
  • All registered tables that own the searched property when graph.search() has no table_filter

Note: The ACL check covers the seed table. Nodes discovered during traversal may belong to tables the user doesn’t have access to — but those nodes are only useful when the user JOINs back to the source table, at which point Postgres enforces RLS and table-level permissions. See 09_security.md for the “ghost traversal” discussion.


GUC Parameters

All configurable via postgresql.conf, ALTER SYSTEM, or SET:

# Core
graph.enabled = on                   # Master kill switch
graph.data_dir = 'graph'             # Subdirectory of $PGDATA
graph.memory_limit_mb = 2048         # Hard memory limit (ERROR, not crash)

# Sync
graph.sync_mode = 'trigger'          # 'manual', 'trigger', or 'wal'
graph.vacuum_interval_secs = 60      # Auto-vacuum period
graph.edge_buffer_size = 100000      # Max pending edge mutations

# Query defaults
graph.default_max_depth = 5          # Default BFS depth
graph.max_nodes = 100000             # Default circuit breaker
graph.max_frontier = 100000          # Default frontier limit

# Indexing
graph.max_token_length = 128         # Max property value length for TokenIndex
graph.bloom_bits = 64                # Bloom signature width (read-only, fixed at 64-bit u64)
graph.bloom_hash_count = 1           # Number of hash functions per token

# Memory management
graph.auto_load = on                 # Load graph into RAM on server start
graph.oom_action = 'error'           # 'error' (return SQL error) or 'readonly' (degrade)
graph.build_scan_mode = 'select'     # 'select'; 'copy' reserved until safe pgrx support

Error Codes

graph uses custom SQLSTATE codes so applications can programmatically handle errors rather than parsing message strings.

| SQLSTATE | Name | Raised By | Description |
|---|---|---|---|
| PG001 | graph_oom | build(), vacuum() | Memory limit exceeded. HINT: increase graph.memory_limit_mb. |
| PG002 | graph_acl_denied | traverse(), shortest_path(), search(), global analytics | User lacks SELECT on the specified table or lacks graph schema admin privileges for global analytics. |
| PG003 | graph_not_built | All query functions | Graph not built yet. Call graph.build() first. |
| PG004 | graph_edge_type_limit | add_edge() | 254 edge type limit exceeded. |
| PG005 | graph_invalid_filter | traverse() | Invalid structured filter payload or unsupported filter helper/operator. |
| PG006 | graph_build_locked | build(), vacuum() | Another build() or vacuum() is already in progress. |
| PG007 | (reserved) | — | Reserved for future use. |
| PG008 | graph_edge_buffer_full | Trigger sync | Edge mutation buffer is full. Graph in read-only mode. |
| PG009 | graph_corrupt_file | Startup loader | .graph file failed magic/checksum validation. |
| PG010 | graph_node_not_found | traverse(), shortest_path(), weighted_shortest_path() | Seed/source/target node not found in graph. |

All errors include a DETAIL field with context and a HINT field with the recommended fix.


Complete Example: Fraud Investigation

-- 1. Install and auto-discover
CREATE EXTENSION graph;
SELECT * FROM graph.auto_discover();

-- 2. An AI agent asks: "Find all active users connected to Fraudster U-123
--    within 4 hops via money transfers or shared devices"
SELECT u.name, u.email, u.risk_score, g.depth, g.path
FROM graph.traverse('users', 'U-123', 4,
                    ARRAY['transferred_to', 'shares_device_with']) g
JOIN users u ON u.id = g.node_id
WHERE u.status = 'active'
  AND g.node_table = 'users'::regclass
ORDER BY g.depth, u.risk_score DESC;

-- 3. Agent found a suspicious user U-999. Show the exact connection chain.
SELECT p.step,
       p.node_table::regclass::text AS table_name,
       p.node_id,
       p.edge_label,
       p.node
FROM graph.shortest_path('users', 'U-123', 'users', 'U-999') p
ORDER BY p.step;

-- 4. Batch screening: check 1,000 entities against a sanctions list
WITH suspects AS (
    SELECT id::text AS pk FROM users ORDER BY random() LIMIT 1000
)
SELECT s.pk, count(g.node_id) AS connections
FROM suspects s
CROSS JOIN LATERAL graph.traverse('users', s.pk, 5, ARRAY['transferred_to']) g
WHERE g.node_table = 'sanctioned_entities'::regclass
GROUP BY s.pk
HAVING count(g.node_id) > 0;

Weighted Shortest Path

When an edge is registered with a weight_column, the weighted path functions become available:

-- Register an edge with a weight column
SELECT graph.add_edge('invoices', 'customer_id', 'customers', 'id', 'billed_to',
    weight_column := 'amount'
);

-- Weighted shortest path (Dijkstra, O(V + E log V))
graph.weighted_shortest_path(
    source_table REGCLASS,
    source_id TEXT,
    target_table REGCLASS,
    target_id TEXT
) RETURNS TABLE (path_nodes TEXT[], total_cost INT)

Dijkstra uses a BinaryHeap (min-priority queue) over the same CSR arrays with an added parallel weights: Vec<u32> array. See 01_architecture.md for implementation details.
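A compact sketch of Dijkstra over CSR-shaped arrays, where `offsets[v]..offsets[v+1]` indexes the neighbor and weight slices for node `v` (illustrative Python mirroring the parallel weights array; not the extension's Rust):

```python
import heapq

def dijkstra_csr(offsets, targets, weights, source, target):
    """Min-heap Dijkstra over CSR arrays; returns (path, total_cost)
    or ([], None) when the target is unreachable."""
    n = len(offsets) - 1
    dist = [float("inf")] * n
    prev = [None] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == target:
            break
        if d > dist[v]:
            continue                 # stale heap entry, skip
        for i in range(offsets[v], offsets[v + 1]):
            u, w = targets[i], weights[i]
            if d + w < dist[u]:
                dist[u] = d + w
                prev[u] = v
                heapq.heappush(heap, (dist[u], u))
    if dist[target] == float("inf"):
        return [], None
    path, v = [], target
    while v is not None:             # walk predecessors back to source
        path.append(v)
        v = prev[v]
    return path[::-1], dist[target]

# Tiny graph: 0->1 (cost 4), 0->2 (cost 1), 2->1 (cost 2).
offsets = [0, 2, 2, 3]
targets = [1, 2, 1]
weights = [4, 1, 2]
path, cost = dijkstra_csr(offsets, targets, weights, 0, 1)
assert (path, cost) == ([0, 2, 1], 3)   # cheaper than the direct 0->1 edge
```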
