SQL API Reference

Overview

graph exposes all functionality through Set-Returning Functions (SRFs) in the graph schema. There is no proprietary query language. Every function returns a virtual table that integrates seamlessly with standard SQL JOIN, WHERE, and ORDER BY.

All functions use REGCLASS for table references, enforcing compile-time table validation. All functions perform ACL pre-flight checks before accessing the graph. All errors are reported as standard SQL ERROR with HINT messages — graph never crashes Postgres.

The “Aha!” Moment


-- Install and graph-query your entire database in 10 seconds
CREATE EXTENSION graph;
SELECT * FROM graph.auto_discover();
 
-- Done. Now query:
SELECT u.name, g.depth, g.path
FROM graph.traverse('users', 'U-123', 4) g
JOIN users u ON u.id = g.node_id
WHERE u.status = 'active';

Discovery Functions

`graph.auto_discover(schema_name DEFAULT 'public')`

The hero function. Reads information_schema, discovers all tables and FK relationships, registers them, and builds the graph — automatically.

Runs automatically on CREATE EXTENSION graph; — you don’t need to call this explicitly. The extension bootstrap SQL calls graph.auto_discover() on the public schema during installation. Call it manually to discover additional schemas, or to re-discover after schema changes.


-- Runs automatically on CREATE EXTENSION graph; — no need to call manually.
-- Use explicitly for additional schemas or re-discovery:
SELECT * FROM graph.auto_discover();          -- re-discover public schema
SELECT * FROM graph.auto_discover('sales');   -- discover a specific schema
 
-- Returns a summary of what was discovered:
--   type     | name            | details
--   'table'  | 'users'         | 'id column: id, 5 property columns'
--   'table'  | 'orders'        | 'id column: id, 3 property columns'
--   'edge'   | 'customer'      | 'orders.customer_id → users.id'
--   'edge'   | 'product'       | 'orders.product_id → products.id'
--   'build'  | 'graph'         | '2,500,000 nodes, 8,200,000 edges, 312 MB, 12.4s'

What happens:

Queries information_schema.tables for all user tables.
Queries information_schema.key_column_usage + table_constraints for all FK relationships.
Registers each table via graph.add_table() (auto-selects id column and property columns).
Registers each FK as an edge via graph.add_edge() (auto-generates edge labels from FK column names).
Calls graph.build().
Returns a summary table of everything discovered.

Column selection heuristic:

Primary key: Uses the PRIMARY KEY constraint. Tables with single-column PKs are registered with the PK column directly. Tables with composite primary keys are classified as either junction tables (if all PK columns are foreign keys → registered as edges, not nodes) or composite entities (if at least one PK column is not a FK → registered as nodes with a JSON array key, e.g., '["order_123","4"]').
Property columns: All text, varchar, and name columns with max length ≤ 128 characters are indexed. Binary columns (bytea), numeric columns, and generated columns are excluded.
Filter columns: No numeric columns are auto-registered as filter columns — use graph.add_filter_column() manually for those.

Edge label generation: FK column customer_id → edge label customer. FK column parent_category_id → edge label parent_category. The _id suffix is stripped automatically.

Setup Functions

`graph.add_table(table_name, id_columns, columns, tenant_column)`

Manually register a table as a node source. Use this for fine-grained control (otherwise auto_discover() handles it).


-- Minimal: just table and primary key
SELECT graph.add_table('customers', id_columns := ARRAY['id']);
 
-- With specific columns to index for property filtering
SELECT graph.add_table(
    'customers',
    id_columns := ARRAY['id'],
    columns := ARRAY['name', 'email', 'status', 'city']
);
 
-- With tenant isolation
SELECT graph.add_table(
    'customers',
    id_columns := ARRAY['id'],
    columns := ARRAY['name', 'email', 'status'],
    tenant_column := 'organisation_id'
);
 
-- Composite key
SELECT graph.add_table(
    'order_lines',
    id_columns := ARRAY['order_id', 'line_number'],
    columns := ARRAY['sku', 'status']
);

Parameters:

Parameter	Type	Default	Description
`table_name`	`regclass`	(required)	Table to register
`id_columns`	`text[]`	(required)	One or more primary/unique key columns. Each must be NOT NULL and the tuple must be unique.
`columns`	`text[]`	`NULL`	Columns to index. `NULL` = auto-detect.
`tenant_column`	`text`	`NULL`	Column for multi-tenant RLS integration.

Compatibility overload:


SELECT graph.add_table('customers', id_column := 'id');

The singular id_column text overload remains for existing single-key callers. New code should use id_columns text[], including for one-column keys.

Composite primary keys store node ids as canonical JSON array text generated by jsonb_build_array(col1::text, col2::text, ...)::text, for example '["order_123","4"]'. Callers may pass that string as seed_id, but generated SQL and ORM integrations should prefer graph helper constructors when available instead of hand-concatenating JSON.

`graph.add_edge(from_table, from_column, to_table, to_column, label, ...)`

Register a relationship as an edge. No FK constraint required — use this for any column that references another table, even without a formal foreign key.


-- Basic relationship
SELECT graph.add_edge(
    'orders', 'customer_id',
    'customers', 'id',
    'customer'
);
 
-- Self-referential relationship
SELECT graph.add_edge(
    'employees', 'manager_id',
    'employees', 'id',
    'reports_to'
);
 
-- One-directional edge (default is bidirectional)
-- Unidirectional edges are only traversable in the registered direction.
-- Starting a traverse() from the TARGET node will NOT expand backward
-- through a unidirectional edge. Only the source → target direction works.
SELECT graph.add_edge(
    'follows', 'follower_id',
    'users', 'id',
    'follows',
    bidirectional := false
);
 
-- Edge-table relationship with per-row labels
-- edge_label / edge_path will contain values such as 'officer_of'
-- and 'registered_address', not the generic fallback label.
SELECT graph.add_edge(
    'relationships'::regclass, 'node_id_start',
    'entities'::regclass, 'node_id_end',
    'relationship',
    label_column := 'rel_type'
);

Parameters:

Parameter	Type	Default	Description
`from_table`	`regclass`	(required)	Source table
`from_column`	`text`	(required)	FK column in source table
`to_table`	`regclass`	(required)	Target table
`to_column`	`text`	(required)	PK column in target table
`label`	`text`	(required)	Human-readable edge name (this is the value you pass to `edge_types` in `traverse()`)
`bidirectional`	`bool`	`true`	Create edges in both directions
`weight_column`	`text`	`NULL`	Optional numeric source-table column used by weighted shortest path
`label_column`	`text`	`NULL`	Optional source-table column whose per-row values become edge labels in `edge_label` and `edge_path`

Edge type limit: 254 distinct edge labels (u8). Returns an error if exceeded. Values 0 and 255 are reserved internally (0 = untyped/null, 255 = internal sentinel).

Terminology note: The label you assign here is what you pass in the edge_types TEXT[] array when calling graph.traverse() or other query functions. The two names refer to the same concept — label is used during registration, edge_types during querying.

Dynamic labels: If label_column is set, each edge row uses that column’s value as its relationship label. Blank or NULL values fall back to label. This is useful for edge tables such as Panama Papers relationships, where one table contains many relationship types (officer_of, registered_address, intermediary_of, …).

Bidirectional labels: bidirectional := true stores a traversable reverse edge, but it does not invent a second relationship label. The reverse hop uses the same label as the source row. For example, if an officer_of row connects an officer to an entity, a traversal from the entity back to the officer still reports officer_of. Treat the label as the relationship type, and use path to see which direction the traversal walked.

No FK required: add_edge() works for any column-to-column relationship. If your tables don’t have formal FK constraints, register the logical edge manually:


-- Tables without FK constraints — still works
SELECT graph.add_edge(
    'user_profiles', 'user_id',
    'users', 'id',
    'has_profile'
);

`graph.add_filter_column(table_name, column_name, column_type)`

Register a source column for traversal filter pushdown. This is distinct from Bloom/TokenIndex search. Registered filter columns are eligible for pruning inside traversal; unregistered columns remain valid in filter := ... but are verified during hydration.


-- Register numeric columns for filtering
SELECT graph.add_filter_column('invoices', 'amount', column_type := 'numeric');
SELECT graph.add_filter_column('users', 'risk_score', column_type := 'numeric');
SELECT graph.add_filter_column('transactions', 'created_at', column_type := 'timestamptz');

Parameters:

Parameter	Type	Default	Description
`table_name`	`regclass`	(required)	Table that owns the column
`column_name`	`text`	(required)	Column available to filter helpers
`column_type`	`text`	(required)	Filter type: `'text'`, `'numeric'`, `'boolean'`, `'date'`, `'timestamptz'`, or `'uuid'`

The implementation may use compact internal representations for hot-path pushdown, but the SQL contract is type-aware. Any range, precision, scale, or timezone limitations must be documented by column_type and surfaced as clear errors.

Sparse array warning: The FilterIndex allocates one u32 per node in the entire graph, not just nodes from the registered table. If you register invoices.amount in a graph with 10M nodes but only 500K invoices, the remaining 9.5M entries are zeroed. Only register filter columns on tables that represent a large fraction of your graph, or accept the RAM overhead (4 bytes × total node count per column).

Use for: Amounts, risk scores, timestamps, status, UUID ownership keys, and other columns you want to filter during traversal without the cost of hydrating every candidate row.

Do NOT confuse with weight columns. Filter columns prune BFS results ($O(1)$ check per node). Weight columns change the pathfinding algorithm from BFS to Dijkstra ($O(V + E \log V)$). These serve completely different mathematical purposes.

Dynamic SQL Safety Contract

Every SQL-facing identifier and value is untrusted input. regclass validates table references, but it does not validate column names, property keys, edge labels, JSONB filter payloads, or tenant columns.

Implementation requirements:

resolve table arguments to OIDs and re-check pg_catalog.pg_class / pg_catalog.pg_namespace;
validate column names against pg_catalog.pg_attribute for the resolved table OID before build, search, hydration, trigger, or verification SQL uses them;
quote identifiers with Postgres identifier APIs such as quote_identifier / quote_ident, never by manual string quoting;
pass values through SPI parameters or typed pgrx values; never interpolate seed_id, property_value, tenant values, JSONB filter values, or edge labels into SQL strings;
reject raw JSONB filters that reference unregistered columns, unsupported operators, invalid type casts, or unknown table/edge scopes before building any SQL;
schema-qualify internal references as graph.* or pg_catalog.* to avoid search_path surprises.

This contract applies to COPY queries during graph.build(), hydration, filter verification, trigger generation, search verification, and admin/catalog helpers.

Build Functions

`graph.build(concurrently)`

Build (or rebuild) the graph from all registered tables.


SELECT *
FROM graph.build(concurrently := false);
 
-- Returns:
-- build_id | status | nodes_loaded | edges_loaded | build_time_ms | memory_used_mb | sync_mode

What happens:

Estimates memory needed. Returns ERROR with HINT if it would exceed graph.memory_limit_mb.
Scans registered tables using graph.build_scan_mode. The default select mode uses SPI cursors; copy is reserved until a safe pgrx COPY reader is available.
Builds CSR EdgeStore, NodeStore, BloomFilter, TokenIndex.
Persists to $PGDATA/graph/graph.graph with atomic rename.
Installs sync triggers (if graph.sync_mode = 'trigger').

Safe to call while serving queries. Old graph continues serving until the new one is ready.

Concurrency: build() acquires an advisory lock (pg_advisory_xact_lock). Concurrent calls to build() or vacuum() from other sessions will block until the lock is released. Only one build/vacuum can run at a time.

Primary key handling: All PK values are cast to their ::text representation internally. integer PKs become '42', uuid PKs are stored in standard hyphenated format ('550e8400-e29b-41d4-a716-446655440000'). Composite primary keys are supported — tables with composite PKs store a JSON array string (e.g., '["order_123","4"]') generated via jsonb_build_array(). Users reference composite-PK nodes using this JSON array format as the seed_id in query functions.

concurrently := false is the default. The function blocks until the snapshot is built and installed, then returns one completed row.

concurrently := true queues or starts a background build and returns immediately with status = 'queued' or status = 'running'. Existing queries continue using the previous graph snapshot until the new graph is complete and installed atomically.

Capacity note: concurrent builds temporarily hold the old serving graph and the new graph being built. Plan for old mmap pages, new build-time structures, per-backend heap indexes, Postgres shared buffers, and OS headroom. As a practical starting point, provision at least 2x the base graph memory before using concurrent builds. See 04_memory_model.md.

`graph.vacuum()`

Rebuild the CSR from pending edge mutations.


SELECT graph.vacuum();

The background worker calls this automatically. Call manually after large bulk mutations.

Query Functions

`graph.traverse(...)`

The core traversal function. Traverse outward from one node or many starting nodes. Returns ordinary SQL rows and hydrates source records as JSONB by default.


graph.traverse(
    seed_table       REGCLASS,
    seed_id          TEXT,
    max_depth        INT DEFAULT 3,
    edge_types       TEXT[] DEFAULT NULL,
    direction        TEXT DEFAULT 'any',
    node_tables      REGCLASS[] DEFAULT NULL,
    filter           JSONB DEFAULT NULL,
    tenant           TEXT DEFAULT NULL,
    strategy         TEXT DEFAULT 'bfs',
    uniqueness       TEXT DEFAULT 'node_global',
    include_start    BOOLEAN DEFAULT true,
    hydrate          BOOLEAN DEFAULT true,
    limit            INT DEFAULT 1000,
    offset           INT DEFAULT 0,
    max_nodes        INT DEFAULT current_setting('graph.max_nodes')::int,
    max_frontier     INT DEFAULT current_setting('graph.max_frontier')::int
)
RETURNS TABLE (
    root_table  REGCLASS,
    root_id     TEXT,
    node_table  REGCLASS,
    node_id     TEXT,
    depth       INT,
    path        JSONB,
    edge_path   JSONB,
    node        JSONB
)

Parameters:

Parameter	Type	Default	Description
`seed_table`	`regclass`	(required)	Table of the seed node
`seed_id`	`text`	(required)	Primary key of the seed node
`max_depth`	`int`	`3`	Max traversal depth
`edge_types`	`text[]`	`NULL`	Edge type filter. `NULL` = traverse all.
`direction`	`text`	`'any'`	`'any'` or `'out'`. `'in'` is reserved for future reverse-adjacency traversal.
`node_tables`	`regclass[]`	`NULL`	Only return nodes from these tables
`filter`	`jsonb`	`NULL`	Structured filter built by helpers such as `graph.equals()`, `graph.greater_than()`, and `graph.all()`
`tenant`	`text`	`NULL`	Tenant/account/workspace scope
`strategy`	`text`	`'bfs'`	`'bfs'` or `'dfs'`. Use `graph.weighted_shortest_path()` for weighted pathfinding.
`uniqueness`	`text`	`'node_global'`	Duplicate visit policy
`include_start`	`boolean`	`true`	Include starting nodes at depth 0
`hydrate`	`boolean`	`true`	Return full source rows as JSONB
`limit`	`int`	`1000`	Page size after traversal-time filtering
`offset`	`int`	`0`	Pagination offset after traversal-time filtering
`max_nodes`	`int`	GUC	Query circuit breaker
`max_frontier`	`int`	GUC	Frontier circuit breaker


-- Preferred: helper-built structured filter
SELECT *
FROM graph.traverse(
    'users',
    'U-123',
    max_depth := 5,
    edge_types := ARRAY['transferred_to'],
    filter := graph.all(
        graph.greater_than('amount', 10000),
        graph.at_least('risk_score', 80)
    )
);
 
-- Advanced/generated-query form
SELECT *
FROM graph.traverse(
    'users',
    'U-123',
    max_depth := 5,
    filter := '{"node":{"where":{"jurisdiction":{"eq":"Panama"}}}}'::jsonb
);

String filters such as filter_condition := 'amount > 10000' are not part of the public traversal API. They are hard to validate safely and too close to dynamic SQL. Public callers use helper functions that return structured JSONB.

Additional GUC-controlled limits (circuit breakers):

graph.max_nodes — max nodes visited per query (default 100,000)
graph.max_frontier — max frontier size (default 100,000)

Pagination execution order:


BFS / traversal strategy
  -> traversal-time pushdown filters
  -> deterministic traversal result ordering
  -> OFFSET / LIMIT
  -> hydration of the final page
  -> hydration-time verification for unregistered filters

OFFSET is applied before hydration, so deep offsets do not hydrate skipped rows. Filters on unregistered columns are verified during hydration and may return fewer than limit rows on a page, just as an SRF followed by an outer WHERE clause can produce a partially filled page. Register frequently used filter columns when stable page density matters.

Planner note: traversal functions are SRFs with data-dependent cardinality. The extension should declare conservative COST and ROWS values, but callers should materialize large coordinate result sets before joining to wide application tables when the planner chooses a poor nested-loop plan:


WITH nearby AS MATERIALIZED (
    SELECT *
    FROM graph.traverse('users', 'U-123', max_depth := 5, hydrate := false)
)
SELECT u.*
FROM nearby g
JOIN users u
  ON g.node_table = 'users'::regclass
 AND u.id::text = g.node_id;

For very large hydration jobs, insert traversal coordinates into a temporary table, ANALYZE it, then join to source tables so the planner has real statistics.

ORM integration notes:

Prisma, Hibernate, and SQLAlchemy users should call graph functions through parameterized native SQL, not by string-building table, column, or filter fragments.
Prefer named parameters for scalar values such as seed_id, tenant, limit, and offset.
Pass table names as explicit SQL casts such as $1::regclass only when the ORM/driver supports binding regclass cleanly; otherwise keep table names as static SQL fragments from trusted migration metadata.
For multi-start traversal, ORMs that cannot bind graph.node_ref[] should use graph.traverse_search() or populate a temporary table of (node_table regclass, node_id text) coordinates and call the coordinate ingestion API when available.
Hydrated node jsonb is the most ORM-friendly output for read models. Use hydrate := false only when the application intentionally joins coordinates back to typed ORM entities.

Usage Examples


-- Basic: everything connected to user U-123 within 4 hops
SELECT * FROM graph.traverse('users', 'U-123', 4);
 
-- Filtered: only follow 'transferred_to' and 'shares_device_with' edges
SELECT * FROM graph.traverse('users', 'U-123', 4,
    ARRAY['transferred_to', 'shares_device_with']);
 
-- Numeric filter: only traverse nodes where amount > 10,000
-- (requires graph.add_filter_column('invoices', 'amount', column_type := 'numeric') first)
SELECT * FROM graph.traverse('users', 'U-123', 5,
    filter := graph.greater_than('amount', 10000));
 
-- Combined: edge type filter + numeric filter
SELECT * FROM graph.traverse('users', 'U-123', 5,
    edge_types := ARRAY['transferred_to'],
    filter := graph.greater_than('risk_score', 80));
 
-- The hydration pattern: traverse then JOIN for full data
-- This is where Postgres RLS enforcement happens automatically
SELECT u.name, u.email, g.depth, g.path
FROM graph.traverse('users', 'U-123', 4) g
JOIN users u ON u.id = g.node_id
WHERE u.status = 'active';

Why `path JSONB` Is Critical

AI agents don’t just need to know that two entities are connected — they need the exact chain to explain why. The path column returns ordered node coordinates, and edge_path returns the edge labels/directions between them:


SELECT g.path
FROM graph.traverse('users', 'U-123', 4) g
WHERE g.node_id = 'C-789';
-- path: [
--   {"table":"users","id":"U-123"},
--   {"table":"transfers","id":"T-456"},
--   {"table":"accounts","id":"A-111"},
--   {"table":"companies","id":"C-789"}
-- ]
-- The AI agent can now explain: "U-123 transferred to T-456, 
-- which shares an account with A-111, who is director of C-789"

edge_path is aligned to the gaps between path entries:


path[0] --edge_path[0]--> path[1] --edge_path[1]--> path[2]

So a depth-2 row with edge_path = ["officer_of", "officer_of"] is not a duplicate relationship. It means the selected node is two hops from the root, and both hops used the officer_of relationship type. Keep the full edge_path when you need to explain the route; add a separate distinct summary only for display.


SELECT depth,
       node_table::regclass::text AS source_table,
       node_id,
       graph.format_path(path, edge_path) AS readable_path,
       path,
       edge_path,
       node->>'name' AS name,
       node
FROM graph.traverse_search(
    'name', 'mossack fonseca',
    search_mode := 'contains',
    search_limit := 1,
    max_depth := 4,
    hydrate := true
)
ORDER BY depth, source_table, node_id;

For SQL table viewers that render JSON objects as [object Object], select only the display helper:


SELECT depth,
       node_table::regclass::text AS source_table,
       node_id,
       graph.format_path(path, edge_path) AS readable_path,
       node->>'name' AS name,
       node
FROM graph.traverse_search(
    'name', 'mossack fonseca',
    search_mode := 'contains',
    search_limit := 1,
    max_depth := 4,
    hydrate := true
)
ORDER BY depth, source_table, node_id;

`graph.format_path(path, edge_path, separator)`

Format traversal coordinates and edge labels as readable hop text. This is a presentation helper; path and edge_path remain the structured API.


SELECT graph.format_path(
    '[{"table":"entities","id":"10000266"},
      {"table":"officers","id":"12160432"},
      {"table":"entities","id":"10000001"}]'::jsonb,
    '["officer_of", "officer_of"]'::jsonb
);
 
-- entities:10000266 --officer_of--> officers:12160432 |
-- officers:12160432 --officer_of--> entities:10000001

Parameter	Type	Default	Description
`path`	`jsonb`	(required)	JSONB array from `graph.traverse()` or `graph.traverse_search()`
`edge_path`	`jsonb`	(required)	JSONB array of labels aligned between path entries
`separator`	`text`	`'	'`

`graph.shortest_path(source_table, source_id, target_table, target_id, max_depth, hydrate)`

Find the shortest path between two specific nodes using bidirectional BFS.


-- Signature
graph.shortest_path(
    source_table  REGCLASS,
    source_id     TEXT,
    target_table  REGCLASS,
    target_id     TEXT,
    max_depth     INT DEFAULT 20,
    hydrate       BOOLEAN DEFAULT true
)
RETURNS TABLE (
    step          INT,              -- Position in path (0 = source)
    node_table    REGCLASS,
    node_id       TEXT,
    edge_label    TEXT,             -- Edge that reached this node (NULL for source)
    node          JSONB             -- Hydrated source row when hydrate = true
)


-- Find how user U-123 is connected to company C-456
SELECT step, node_table::regclass::text AS table_name, node_id, edge_label, node
FROM graph.shortest_path('users', 'U-123', 'companies', 'C-456');
 
-- Return coordinates only
SELECT step, node_table::regclass::text AS table_name, node_id, edge_label
FROM graph.shortest_path(
    'users', 'U-123', 'companies', 'C-456',
    hydrate := false
)
ORDER BY step;

If no path exists: Returns empty result set (not an error).

Bidirectional BFS: Expands from both source and target simultaneously. Meets in the middle. Typically 2× faster than single-direction BFS for shortest path.

`graph.search(...)`

Find nodes by property value and return verified, hydrated results by default.


graph.search(
    property_key           TEXT,
    property_value         TEXT,
    table_filter           REGCLASS DEFAULT NULL,
    mode                   TEXT DEFAULT 'contains',
    case_sensitive         BOOLEAN DEFAULT false,
    limit                  INT DEFAULT 100,
    offset                 INT DEFAULT 0,
    tenant                 TEXT DEFAULT NULL,
    hydrate                BOOLEAN DEFAULT true,
    allow_bloom_candidates BOOLEAN DEFAULT false
)
RETURNS TABLE (
    node_table REGCLASS,
    node_id    TEXT,
    match_type TEXT,
    score      REAL,
    verified   BOOLEAN,
    node       JSONB
)

Parameters:

Parameter	Type	Default	Description
`property_key`	`text`	(required)	Property column name to search
`property_value`	`text`	(required)	Value to match
`table_filter`	`regclass`	`NULL`	Restrict to a specific table. `NULL` = search all.
`mode`	`text`	`'contains'`	`'contains'`, `'prefix'`, `'exact'`, or `'token'`
`case_sensitive`	`boolean`	`false`	Use case-sensitive matching when supported
`limit`	`int`	`100`	Page size
`offset`	`int`	`0`	Search result offset
`tenant`	`text`	`NULL`	Tenant/account/workspace scope
`hydrate`	`boolean`	`true`	Return full source rows as JSONB
`allow_bloom_candidates`	`boolean`	`false`	Opt into lossy Bloom candidate rows

limit and offset are reserved SQL grammar words. Use positional arguments or quote them as named arguments, for example "limit" := 20.

Return columns:

Column	Type	Description
`node_table`	`regclass`	Table the matched node belongs to
`node_id`	`text`	Primary key of the matched node
`match_type`	`text`	Search mode or `'bloom_candidate'`
`score`	`real`	Mode-specific score. Current non-fuzzy modes return `1.0`
`verified`	`boolean`	Whether the source row matched during verification
`node`	`jsonb`	Hydrated source row when `hydrate := true` and verification succeeds

Behaviour:

Default search is case-insensitive contains.
Search results are verified against the current source row before being returned in the developer-facing API.
Bloom candidates are disabled unless allow_bloom_candidates := true.
Bloom candidates are labeled match_type = 'bloom_candidate' and verified = false unless hydration verification succeeds.
Zero results: Returns an empty result set (not an error).

Search modes:

Mode	Description
`contains`	Case-insensitive substring search by default
`prefix`	Prefix search, useful for autocomplete
`exact`	Full-value exact match after configured case normalization
`token`	Tokenized word search, useful for names with reordered parts


-- Default verified hydrated search
SELECT node_id, node->>'name' AS name
FROM graph.search('name', 'mossack fonseca', table_filter := 'entities');
 
-- Prefix search for autocomplete
SELECT node_id, node->>'name' AS name
FROM graph.search('name', 'mossack', mode := 'prefix', "limit" := 20);
 
-- Combine with traversal: search → traverse pipeline
SELECT t.*
FROM graph.search('name', 'Jonathan Chan') s
CROSS JOIN LATERAL graph.traverse(s.node_table, s.node_id, 5) t
WHERE s.verified;

Native Type Semantics

Graph coordinates are stored as text, but comparisons and filter pushdown must preserve Postgres type semantics.

Type	Canonical behavior
`uuid`	Store and compare canonical lowercase hyphenated `uuid::text`. Accept ORM-bound UUID parameters through Postgres casts.
`timestamptz`	Compare instants, not display strings. Normalize through Postgres `timestamptz`; hydration may render offsets.
`date`	Compare date values as dates, not midnight timestamps.
`numeric`	Preserve numeric precision for hydration-time verification. Hot-path pushdown must document any scale/range limits.
`boolean`	Accept true boolean values, not arbitrary text aliases in raw JSONB.
`jsonb`	Hydration returns `to_jsonb(source_row.*)`. JSONB columns remain nested JSON values inside `node`; traversal filters over JSONB require registered/generated columns or explicit JSON path helpers.
arrays	Hydrated as JSON arrays. Search/filter semantics are not implicit.
`NULL`	Node identity columns must be non-null. Search and filters do not match `NULL` unless a dedicated `is_null` helper is added.

Admin Functions

`graph.status()`

Return current engine state.


graph.status()
RETURNS TABLE (
    node_count       INT,                -- Total nodes in graph
    edge_count       INT,                -- Total edges in graph
    memory_used_mb   FLOAT,              -- Current memory usage
    memory_limit_mb  INT,                -- graph.memory_limit_mb GUC value
    sync_mode        TEXT,               -- 'manual', 'trigger', or 'wal'
    sync_status      TEXT,               -- 'idle', 'syncing', 'read_only'
    last_build       TIMESTAMPTZ,        -- When build() last completed
    last_vacuum      TIMESTAMPTZ,        -- When vacuum() last completed
    edge_types       TEXT[],             -- List of registered edge labels
    edge_buffer_used INT,                -- Pending mutations in edge buffer
    has_unidirectional_edges BOOL,       -- Global flag for shortest_path strategy
 
    -- Full-sync status fields
    schema_status    TEXT,               -- 'current', 'stale', or 'invalid'
    sync_lag         BIGINT,             -- Pending row changes not yet applied
    pending_edge_deltas INT,             -- Pending relationship mutations
    needs_vacuum     BOOL,               -- Edge deltas should be merged into CSR
    needs_rebuild    BOOL                -- Graph must be rebuilt before safe use
)


SELECT * FROM graph.status();

`graph.reset()`

Drop the in-memory graph and remove persisted files.


SELECT graph.reset();

`graph.remove_table(table_name)` / `graph.remove_edge(label)`

Unregister. Requires graph.build() to take effect.


SELECT graph.remove_table('customers');
SELECT graph.remove_edge('ordered_by');

ACL Pre-Flight Check

Every query function performs an O(1) ACL check before touching the graph engine. This ensures users cannot traverse tables they don’t have SELECT access to.


// Pseudocode for every graph query function:
fn check_acl(table_oid: Oid) {
    let user_oid = pg_sys::GetUserId();
    let result = pg_sys::pg_class_aclcheck(table_oid, user_oid, ACL_SELECT);
    if result != ACLCHECK_OK {
        return Err(GraphError::AclDenied {
            table: table_oid.to_string(),
        });
    }
    Ok(())
}

This check happens for:

The seed table in graph.traverse()
Both source and target tables in graph.shortest_path()
The table_filter in graph.search() when specified
All registered tables that own the searched property when graph.search() has no table_filter

Note: The ACL check covers the seed table. Nodes discovered during traversal may belong to tables the user doesn’t have access to — but those nodes are only useful when the user JOINs back to the source table, at which point Postgres enforces RLS and table-level permissions. See 09_security.md for the “ghost traversal” discussion.

GUC Parameters

All configurable via postgresql.conf, ALTER SYSTEM, or SET:


# Core
graph.enabled = on                       # Master kill switch
graph.data_dir = 'graph'               # Subdirectory of $PGDATA
graph.memory_limit_mb = 2048               # Hard memory limit (ERROR, not crash)
 
# Sync
graph.sync_mode = 'trigger'              # 'manual', 'trigger', or 'wal'
graph.vacuum_interval_secs = 60          # Auto-vacuum period
graph.edge_buffer_size = 100000          # Max pending edge mutations
 
# Query defaults
graph.default_max_depth = 5              # Default BFS depth
graph.max_nodes = 100000         # Default circuit breaker
graph.max_frontier = 100000      # Default frontier limit
 
# Indexing
graph.max_token_length = 128             # Max property value length for TokenIndex
graph.bloom_bits = 64                    # Bloom signature width (read-only, fixed at 64-bit u64)
graph.bloom_hash_count = 1              # Number of hash functions per token
 
# Memory management
graph.auto_load = on              # Load graph into RAM on server start
graph.oom_action = 'error'               # 'error' (return SQL error) or 'readonly' (degrade)
graph.build_scan_mode = 'select'         # 'select'; 'copy' reserved until safe pgrx support

Error Codes

graph uses custom SQLSTATE codes so applications can programmatically handle errors rather than parsing message strings.

SQLSTATE	Name	Raised By	Description
`PG001`	`graph_oom`	`build()`, `vacuum()`	Memory limit exceeded. HINT: increase `graph.memory_limit_mb`.
`PG002`	`graph_acl_denied`	`traverse()`, `shortest_path()`, `search()`, global analytics	User lacks `SELECT` on the specified table or lacks graph schema admin privileges for global analytics.
`PG003`	`graph_not_built`	All query functions	Graph not built yet. Call `graph.build()` first.
`PG004`	`graph_edge_type_limit`	`add_edge()`	254 edge type limit exceeded.
`PG005`	`graph_invalid_filter`	`traverse()`	Invalid structured filter payload or unsupported filter helper/operator.
`PG006`	`graph_build_locked`	`build()`, `vacuum()`	Another `build()` or `vacuum()` is already in progress.
`PG007`	(reserved)	—	Reserved for future use.
`PG008`	`graph_edge_buffer_full`	Trigger sync	Edge mutation buffer is full. Graph in read-only mode.
`PG009`	`graph_corrupt_file`	Startup loader	`.graph` file failed magic/checksum validation.
`PG010`	`graph_node_not_found`	`traverse()`, `shortest_path()`, `weighted_shortest_path()`	Seed/source/target node not found in graph.

All errors include a DETAIL field with context and a HINT field with the recommended fix.

Complete Example: Fraud Investigation


-- 1. Install and auto-discover
CREATE EXTENSION graph;
SELECT * FROM graph.auto_discover();
 
-- 2. An AI agent asks: "Find all active users connected to Fraudster U-123 
--    within 4 hops via money transfers or shared devices"
SELECT u.name, u.email, u.risk_score, g.depth, g.path
FROM graph.traverse('users', 'U-123', 4, 
    ARRAY['transferred_to', 'shares_device_with']) g
JOIN users u ON u.id = g.node_id
WHERE u.status = 'active'
  AND g.node_table = 'users'::regclass
ORDER BY g.depth, u.risk_score DESC;
 
-- 3. Agent found a suspicious user U-999. Show the exact connection chain.
SELECT p.step,
       p.node_table::regclass::text AS table_name,
       p.node_id,
       p.edge_label,
       p.node
FROM graph.shortest_path('users', 'U-123', 'users', 'U-999') p
ORDER BY p.step;
 
-- 4. Batch screening: check 1,000 entities against a sanctions list
WITH suspects AS (
    SELECT id::text AS pk FROM users ORDER BY random() LIMIT 1000
)
SELECT s.pk, count(g.node_id) AS connections
FROM suspects s
CROSS JOIN LATERAL graph.traverse('users', s.pk, 5, 
    ARRAY['transferred_to']) g
WHERE g.node_table = 'sanctioned_entities'::regclass
GROUP BY s.pk
HAVING count(g.node_id) > 0;

Weighted Shortest Path

When an edge is registered with a weight_column, the weighted path functions become available:


-- Register an edge with a weight column
SELECT graph.add_edge('invoices', 'customer_id', 'customers', 'id', 'billed_to',
    weight_column := 'amount'
);
 
-- Weighted shortest path (Dijkstra, O(V + E log V))
graph.weighted_shortest_path(
    source_table REGCLASS, source_id TEXT,
    target_table REGCLASS, target_id TEXT
)
RETURNS TABLE (path_nodes TEXT[], total_cost INT)

Dijkstra uses a BinaryHeap (min-priority queue) over the same CSR arrays with an added parallel weights: Vec<u32> array. See 01_architecture.md for implementation details.

SQL API Reference

Overview

The “Aha!” Moment

Discovery Functions

graph.auto_discover(schema_name DEFAULT 'public')

Setup Functions

graph.add_table(table_name, id_columns, columns, tenant_column)

graph.add_edge(from_table, from_column, to_table, to_column, label, ...)

graph.add_filter_column(table_name, column_name, column_type)

Dynamic SQL Safety Contract

Build Functions

graph.build(concurrently)

graph.vacuum()

Query Functions

graph.traverse(...)

Usage Examples

Why path JSONB Is Critical

graph.format_path(path, edge_path, separator)

graph.shortest_path(source_table, source_id, target_table, target_id, max_depth, hydrate)

graph.search(...)

Native Type Semantics

Admin Functions

graph.status()

graph.reset()

graph.remove_table(table_name) / graph.remove_edge(label)

ACL Pre-Flight Check

GUC Parameters

Error Codes

Complete Example: Fraud Investigation

Weighted Shortest Path

`graph.auto_discover(schema_name DEFAULT 'public')`

`graph.add_table(table_name, id_columns, columns, tenant_column)`

`graph.add_edge(from_table, from_column, to_table, to_column, label, ...)`

`graph.add_filter_column(table_name, column_name, column_type)`

`graph.build(concurrently)`

`graph.vacuum()`

`graph.traverse(...)`

Why `path JSONB` Is Critical

`graph.format_path(path, edge_path, separator)`

`graph.shortest_path(source_table, source_id, target_table, target_id, max_depth, hydrate)`

`graph.search(...)`

`graph.status()`

`graph.reset()`

`graph.remove_table(table_name)` / `graph.remove_edge(label)`