Security Model

Design Principle

graph stores topology, not data. Your security policies stay in Postgres.

The graph index contains:

Node indices (u32)
Edge connections (source → target, type_id)
Bloom signatures (lossy, non-reconstructable)
TokenIndex tokens (property key:value pairs, ≤ graph.max_token_length chars)
Node metadata (source_table OID, source_pk)

It does NOT contain:

Full row payloads
Sensitive field values (beyond indexed short properties)
Passwords or PII, unless they sit in a column you explicitly register
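
To make the inventory above concrete, here is a minimal sketch of what such an index could look like in memory. The type and field names (`GraphIndex`, `NodeMeta`, `Edge`, `add_node`) are hypothetical and chosen for illustration; they are not taken from graph's source. The point is that nothing in the layout has room for a full row payload.

```rust
/// Hypothetical per-node metadata: enough to find the source row, nothing more.
#[derive(Debug, Clone, PartialEq)]
pub struct NodeMeta {
    pub source_table_oid: u32, // OID of the registered source table
    pub source_pk: String,     // primary key of the source row, as text
}

/// Hypothetical edge record: two node indices and an edge type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Edge {
    pub source: u32,
    pub target: u32,
    pub type_id: u16,
}

/// Hypothetical index layout mirroring the list above.
#[derive(Debug, Default)]
pub struct GraphIndex {
    pub nodes: Vec<NodeMeta>,       // position in the Vec is the u32 node index
    pub edges: Vec<Edge>,           // topology only
    pub bloom: Vec<u64>,            // one lossy signature per node
    pub tokens: Vec<(String, u32)>, // ("key:value", node index) for short properties
}

impl GraphIndex {
    /// Registers a node and returns its index. No row payload is copied.
    pub fn add_node(&mut self, table_oid: u32, pk: &str) -> u32 {
        let idx = self.nodes.len() as u32;
        self.nodes.push(NodeMeta {
            source_table_oid: table_oid,
            source_pk: pk.to_string(),
        });
        self.bloom.push(0); // empty signature until properties are indexed
        idx
    }
}
```

Anything beyond these fields has to be fetched from Postgres by joining on `source_pk`, which is where the table's own privileges and policies apply.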

Security Layers

graph implements security at three layers, each providing a different guarantee:


Layer 1: ACL Pre-Flight Check (BEFORE BFS)
   → "Can this user even START a traversal on this table?"
   → O(1), negligible cost, enforced by graph

Layer 2: Hydration-Time RLS (AFTER BFS, DURING JOIN)
   → "Can this user SEE the payload of each discovered node?"
   → Enforced by Postgres (standard RLS policies)

Layer 3: Tenant-Scoped Traversal (OPTIONAL, DURING BFS)
   → "Should the BFS even VISIT nodes from other tenants?"
   → Prevents topology leakage across tenant boundaries

Layer 1: ACL Pre-Flight Check

Before dropping into the Rust BFS hot loop, every graph query function verifies that the calling user has SELECT permission on the seed table. This uses Postgres’s internal C function pg_class_aclcheck() — the same function Postgres uses for its own permission checks.


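// Executed BEFORE any graph traversal begins.
// Both calls are raw FFI into Postgres, so they have to sit inside `unsafe`.
let user_oid = unsafe { pgrx::pg_sys::GetUserId() };

let acl_result = unsafe {
    pgrx::pg_sys::pg_class_aclcheck(
        seed_table_oid,
        user_oid,
        // AclMode is wider than the generated constant on newer Postgres versions
        pgrx::pg_sys::ACL_SELECT as pgrx::pg_sys::AclMode,
    )
};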
 
if acl_result != pgrx::pg_sys::ACLCHECK_OK {
    return Err(GraphError::AclDenied {
        table: table_name.to_string(),
    });
}

Where ACL Checks Happen

Function	Tables Checked
`graph.traverse()`	Seed table
`graph.shortest_path()`	Both source and target tables
`graph.search()`	table_filter (if specified)
`graph.build()`	All registered tables (during build)

Cost: O(1) per check, negligible next to the traversal itself. The check runs once per function call, not once per node visited.
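
The table above can be read as a small dispatch rule. The sketch below is illustrative only: `GraphCall`, `tables_to_check`, and `preflight` are hypothetical names, and the `can_select` closure stands in for the real `pg_class_aclcheck()` call so the example runs outside Postgres.

```rust
/// Stand-in for a Postgres table OID.
pub type TableOid = u32;

/// Hypothetical description of the four entry points listed above.
pub enum GraphCall {
    Traverse { seed: TableOid },
    ShortestPath { source: TableOid, target: TableOid },
    Search { table_filter: Option<TableOid> },
    Build { registered: Vec<TableOid> },
}

/// Which tables need a SELECT check for a given call.
pub fn tables_to_check(call: &GraphCall) -> Vec<TableOid> {
    match call {
        GraphCall::Traverse { seed } => vec![*seed],
        GraphCall::ShortestPath { source, target } => vec![*source, *target],
        // No filter means there is no specific table to check here.
        GraphCall::Search { table_filter } => table_filter.iter().copied().collect(),
        GraphCall::Build { registered } => registered.clone(),
    }
}

/// Runs the pre-flight check once per call, before any traversal work.
/// Returns the first table the caller may not read.
pub fn preflight(
    call: &GraphCall,
    can_select: impl Fn(TableOid) -> bool,
) -> Result<(), TableOid> {
    for table in tables_to_check(call) {
        if !can_select(table) {
            return Err(table);
        }
    }
    Ok(())
}
```

The number of checks depends on the call's arguments, not on how many nodes the traversal later visits.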

Layer 2: Hydration-Time RLS (Default Security)

The traversal function returns (node_table REGCLASS, node_id TEXT) pairs. When the user JOINs back to the source tables, Postgres RLS policies apply automatically:


-- RLS policy on users table (policies only take effect once RLS is enabled)
ALTER TABLE users ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON users
    USING (organisation_id = current_setting('app.tenant_id'));
 
-- Traversal returns ALL connected node IDs (fast)
-- The JOIN enforces RLS (secure)
SELECT u.name, u.email, g.depth
FROM graph.traverse('users', 'U-123', 4) g
JOIN users u ON u.id = g.node_id
WHERE g.node_table = 'users'::regclass;
 
-- If the user doesn't have SELECT on `users`, the query fails with a permission error instead of returning rows.
-- If RLS hides certain users, they are filtered out automatically.
-- graph doesn't need to know anything about your security policies.

This is the recommended pattern and the most Postgres-native approach. The traversal is fast (no security checks in the BFS loop), and RLS enforcement happens during hydration — exactly how Postgres is designed to work.

Layer 3: Tenant-Scoped Traversal (Optional)

For multi-tenant applications where you want to prevent topology leakage (user A shouldn’t even know that a node in tenant B exists), graph supports tenant-level filtering during BFS.

When a tenant_column is registered with graph.add_table(), the engine builds per-tenant bitmaps in the TokenIndex during build(). At query time, graph reads the current tenant context from a Postgres session variable (e.g., current_setting('app.tenant_id')) and restricts BFS expansion to nodes matching that tenant. Within that registered tenant scope, nodes outside the tenant are not expanded or returned by traversal/search. This is a topology-leakage mitigation, not a blanket zero-leakage guarantee for every deployment or artifact.


-- 1. Register table with tenant column
SELECT graph.add_table('users', id_columns := ARRAY['id'],
    columns := ARRAY['name', 'email'],
    tenant_column := 'organisation_id'
);
SELECT graph.build();
 
-- 2. Set tenant context (typically done by your app/middleware)
SET app.tenant_id = 'org_123';
 
-- 3. Traverse — only nodes belonging to org_123 are visited
SELECT * FROM graph.traverse('users', 'U-123', 5);

Note: Tenant isolation at the BFS level requires tenant_column to be registered on every table in the graph. Tables without a tenant_column are treated as shared/global and are always traversable.
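
As a rough illustration of the behaviour described above, the following sketch runs a BFS that skips nodes owned by another tenant. It is a simplified model, not graph's implementation: it uses a plain per-node tenant label rather than per-tenant bitmaps, and `Node` and `tenant_bfs` are hypothetical names. A node with `tenant: None` models a table registered without a tenant_column.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical node. `tenant == None` means shared/global.
pub struct Node {
    pub tenant: Option<&'static str>,
    pub neighbors: Vec<usize>,
}

/// BFS that neither expands nor returns nodes owned by another tenant.
/// Returns (node index, depth) pairs in visit order.
pub fn tenant_bfs(
    nodes: &[Node],
    seed: usize,
    max_depth: usize,
    tenant: &str,
) -> Vec<(usize, usize)> {
    // Shared nodes are always visible; tenant nodes only to their own tenant.
    let visible = |n: &Node| n.tenant.map_or(true, |t| t == tenant);

    let mut out = Vec::new();
    if !visible(&nodes[seed]) {
        return out;
    }
    let mut seen = HashSet::from([seed]);
    let mut queue = VecDeque::from([(seed, 0usize)]);
    while let Some((id, depth)) = queue.pop_front() {
        out.push((id, depth));
        if depth == max_depth {
            continue;
        }
        for &next in &nodes[id].neighbors {
            // The tenant test happens before the node ever enters the frontier.
            if visible(&nodes[next]) && seen.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    out
}
```

Because a foreign-tenant node never enters the frontier, anything reachable only through it stays undiscovered. A shared node, on the other hand, is expanded for every tenant, which is why the note above asks for a tenant_column on every table.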

The “Ghost Traversal” Problem

This is the one structural vulnerability you must understand.

The Problem

Consider this graph:


[Public Node A] ─────→ [Secret RLS Node B] ─────→ [Public Node C]

If a user starts at Node A and does a 2-hop traversal, the Rust BFS hot loop (which is blind to RLS) will traverse through Node B and return Node C.

When Postgres does the hydration JOIN:

The user sees Node A and Node C (they have access)
The user does NOT see the payload of Node B (RLS hides it)
However, by seeing that Node C was returned at depth 2, the user now knows that a relationship exists between A and C through some intermediate node

In highly classified environments, knowing the topology exists can be a security breach even without seeing the payload.
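
The leak can be reproduced with a toy model. In the sketch below, `bfs` is an ordinary traversal with no visibility checks and `hydrate` mimics the JOIN by dropping rows that RLS would hide. Both function names and the boolean visibility array are assumptions made for this illustration.

```rust
use std::collections::{HashSet, VecDeque};

/// Plain BFS over an adjacency list, blind to row-level security.
/// Returns (node index, depth) pairs in visit order.
pub fn bfs(adj: &[Vec<usize>], seed: usize, max_depth: usize) -> Vec<(usize, usize)> {
    let mut seen = HashSet::from([seed]);
    let mut queue = VecDeque::from([(seed, 0usize)]);
    let mut out = Vec::new();
    while let Some((id, depth)) = queue.pop_front() {
        out.push((id, depth));
        if depth == max_depth {
            continue;
        }
        for &next in &adj[id] {
            if seen.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    out
}

/// Models the hydration JOIN: rows hidden by RLS drop out of the result.
pub fn hydrate(hits: &[(usize, usize)], rls_visible: &[bool]) -> Vec<(usize, usize)> {
    hits.iter()
        .copied()
        .filter(|&(id, _)| rls_visible[id])
        .collect()
}
```

With A = 0, B = 1, C = 2, edges A → B → C, and B hidden by RLS, hydrating a 2-hop traversal from A yields A at depth 0 and C at depth 2. No depth-1 row is visible, so the gap itself reveals that a hidden intermediate node exists.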

Impact Assessment

Environment	Risk Level	Recommended Approach
Standard OLTP	Low	Layer 2 (hydration RLS) is sufficient
Multi-tenant SaaS	Medium	Use Layer 3 (tenant-scoped traversal)
Classified / financial	High	Use Layer 3 + application-level path filtering

Mitigation Options

Tenant-scoped traversal (Layer 3): If tenants should never see each other’s nodes, register a tenant_column on every table and always set the tenant session variable (e.g., app.tenant_id) before querying. Nodes from other tenants are never expanded in BFS.
Post-traversal path validation: After traversal, the application can filter paths where any intermediate node belongs to a table the user can’t access:


-- Application-level path filtering
-- (assumes g.path is an array of node IDs; checks table privileges only, not RLS)
WITH hits AS (
    SELECT * FROM graph.traverse('users', 'U-123', 5)
)
SELECT h.*
FROM hits h
WHERE NOT EXISTS (
    -- Remove results whose path passes through a table the caller cannot SELECT from
    SELECT 1
    FROM unnest(h.path) AS path_node
    JOIN hits p ON p.node_id = path_node
    WHERE NOT has_table_privilege(current_user, p.node_table, 'SELECT')
);
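
The same idea can be applied in application code after hydration, at row granularity instead of table granularity. The sketch below is a stricter variant of the query above, under two assumptions: each result carries its path as a list of node IDs (endpoints included), and the caller has already collected the set of IDs that survived the RLS-enforced JOIN. `Hit` and `filter_paths` are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical traversal row: the node reached plus the node IDs on the path to it.
pub struct Hit {
    pub node_id: String,
    pub path: Vec<String>,
}

/// Keeps only hits whose entire path consists of IDs the caller is allowed to see.
pub fn filter_paths(hits: Vec<Hit>, visible: &HashSet<String>) -> Vec<Hit> {
    hits.into_iter()
        .filter(|h| h.path.iter().all(|id| visible.contains(id)))
        .collect()
}
```

A result reached through a hidden intermediate node is dropped even when its endpoint is visible. That closes the ghost-traversal gap for returned rows, at the cost of hiding some nodes the user could legitimately read.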

The Honest Position

Layer 2 (hydration RLS) is often sufficient for ordinary application retrieval because payload visibility remains governed by Postgres. The ghost traversal problem matters when:

The existence of a relationship is itself classified
AND the user has access to nodes on both sides of a hidden node
AND the user can infer the hidden node’s identity from the path structure

Pick the security layer that matches your threat model; the Impact Assessment table above is a reasonable starting point.

Data Safety

What the Extension Stores

Data	Where	Reconstructable?	Risk
Node indices (u32)	RAM + .graph file	No (meaningless without mapping)	None
Edge connections	RAM + .graph file	Topology only	Low
Bloom signatures (u64)	RAM + .graph file	No (lossy hash)	None
TokenIndex tokens	RAM + .graph file	⚠️ Yes (plaintext key:value)	Medium
source_table + source_pk	RAM + .graph file	⚠️ Yes (identifies source row)	Medium

TokenIndex Security

The TokenIndex stores plaintext key:value tokens for properties ≤ graph.max_token_length (default 128 characters). This means:

name:alice is stored in the index
status:active is stored in the index
A 5000-character description field is NOT stored (exceeds threshold, falls back to Bloom)
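
The threshold rule can be sketched as a single branch. This is an illustration under stated assumptions rather than graph's code: the length limit is applied to the value (the source does not say whether the key counts), the Bloom side sets a single bit from the standard library hasher, and `Indexed` and `index_property` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// What ends up in the index for one property.
pub enum Indexed {
    Token(String), // plaintext "key:value", readable by anyone who can read the index
    Bloom(u64),    // lossy bits only; the value cannot be read back
}

/// Short values become plaintext tokens; long values fall back to a Bloom bit.
pub fn index_property(key: &str, value: &str, max_token_length: usize) -> Indexed {
    if value.chars().count() <= max_token_length {
        Indexed::Token(format!("{}:{}", key, value))
    } else {
        let mut hasher = DefaultHasher::new();
        (key, value).hash(&mut hasher);
        // Illustrative single-bit signature; a real Bloom filter would set several bits.
        Indexed::Bloom(1u64 << (hasher.finish() % 64))
    }
}
```

The practical consequence is that short, low-entropy columns are exactly the ones stored verbatim, so the decision about what to register matters most for those.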

Mitigation options:

Exclude sensitive columns from columns when registering tables:


-- Don't index SSN, DOB, or salary
SELECT graph.add_table('employees', id_columns := ARRAY['id'],
    columns := ARRAY['name', 'department', 'status']
    -- SSN, date_of_birth, salary are NOT indexed
);

Use Bloom-only mode (future): A configuration option to disable TokenIndex entirely and use only Bloom filters for property filtering. Bloom signatures are non-reconstructable.

.graph File Security

The .graph file is stored in $PGDATA/graph/. It has the same file permissions as other Postgres data files (owned by the postgres user, mode 0600).
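
An operator who wants to verify that claim on a running system could use a check along these lines. It is a small Unix-only sketch using the Rust standard library; `is_owner_only` is a hypothetical helper, not something graph ships.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Returns true when the file grants nothing to group or others (for example mode 0600).
pub fn is_owner_only(path: &Path) -> io::Result<bool> {
    let mode = fs::metadata(path)?.permissions().mode();
    // The low six permission bits cover group and other access.
    Ok((mode & 0o077) == 0)
}
```

Pointing it at the files under $PGDATA/graph/ requires running as the Postgres operating-system user or as root, since no one else can stat into that directory on a default installation.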

If an attacker gains access to $PGDATA, they already have access to your raw table data — the .graph file is the least of your concerns.

Extension Permissions

Functions

All graph functions live in the graph schema. Access requires USAGE on the schema:


-- Grant access to a role
GRANT USAGE ON SCHEMA graph TO app_role;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA graph TO app_role;
 
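-- Revoke build/admin functions from non-admin roles.
-- Postgres grants EXECUTE on new functions to PUBLIC by default, so unless the
-- extension already revokes that at install time, repeat these REVOKEs with FROM PUBLIC.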
REVOKE EXECUTE ON FUNCTION graph.build() FROM app_role;
REVOKE EXECUTE ON FUNCTION graph.reset() FROM app_role;
REVOKE EXECUTE ON FUNCTION graph.vacuum() FROM app_role;

Recommended Role Setup


-- Admin role: can build, vacuum, register tables
CREATE ROLE graph_admin;
GRANT ALL ON SCHEMA graph TO graph_admin;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA graph TO graph_admin;
 
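-- Query role: explicit whitelist of allowed functions
-- (Using explicit grants rather than GRANT ALL + REVOKE, because the REVOKE
-- pattern is fragile: adding new admin functions in future versions would
-- accidentally grant them to readers.)
-- The whitelist only holds if PUBLIC has no EXECUTE privilege, which Postgres
-- grants by default on new functions.
REVOKE EXECUTE ON ALL FUNCTIONS IN SCHEMA graph FROM PUBLIC;
CREATE ROLE graph_reader;
GRANT USAGE ON SCHEMA graph TO graph_reader;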
GRANT EXECUTE ON FUNCTION graph.traverse(regclass, text, int, text[], text, regclass[], jsonb, text, text, text, boolean, boolean, int, int, int, int) TO graph_reader;
GRANT EXECUTE ON FUNCTION graph.shortest_path(regclass, text, regclass, text, int, boolean) TO graph_reader;
GRANT EXECUTE ON FUNCTION graph.weighted_shortest_path(regclass, text, regclass, text) TO graph_reader;
GRANT EXECUTE ON FUNCTION graph.search(text, text, regclass, text, boolean, int, int, text, boolean, boolean) TO graph_reader;
GRANT EXECUTE ON FUNCTION graph.status() TO graph_reader;

Threat Model

Threat	Mitigation
SQL injection via table parameters	`REGCLASS` type validates at call time, plus `pg_catalog` cross-check
SQL injection via column names, property keys, filter payloads, or tenant values	Validate identifiers against `pg_catalog`, quote identifiers with Postgres APIs, bind values through SPI parameters, reject unsupported JSONB filter payloads before SQL generation
User traverses table they can’t access	ACL pre-flight check (Layer 1) — O(1), before BFS
Denial of service via deep traversal	Circuit breakers: `max_nodes`, `max_depth`, `max_frontier`
Memory exhaustion via large graph	`graph.memory_limit_mb` hard cap + OOM guard (ERROR, not crash)
Cross-tenant topology leakage	Layer 3 tenant-scoped traversal (optional)
Ghost traversal (see above)	Documented, with mitigation options for each security level
Stale graph serving wrong results	`is_active` tombstoning, generation counters (future)
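.graph file corruption or tampering	CRC32 checksum and magic bytes verification catch corruption and truncation; a CRC can be recomputed, so deliberate tampering is limited by $PGDATA file permissions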
Background worker crash	Auto-restart with 5-second backoff
Rust panic in extension	`catch_unwind` on all entry points → PG ERROR, not crash
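
To illustrate the circuit-breaker row, here is a minimal bounded BFS. The limit names `max_nodes`, `max_depth`, and `max_frontier` come from the table; everything else (`Limits`, `Tripped`, `bounded_bfs`) is hypothetical. Whether graph raises an error or silently truncates when a limit trips is not stated above, so this sketch assumes the depth cap stops expansion quietly and the other two limits abort with an error.

```rust
use std::collections::{HashSet, VecDeque};

/// Which breaker aborted the traversal.
#[derive(Debug, PartialEq)]
pub enum Tripped {
    MaxNodes,
    MaxFrontier,
}

pub struct Limits {
    pub max_nodes: usize,    // cap on nodes returned
    pub max_depth: usize,    // cap on hops from the seed
    pub max_frontier: usize, // cap on the pending queue size
}

/// BFS that bounds work regardless of graph shape.
pub fn bounded_bfs(adj: &[Vec<usize>], seed: usize, lim: &Limits) -> Result<Vec<usize>, Tripped> {
    let mut seen = HashSet::from([seed]);
    let mut queue = VecDeque::from([(seed, 0usize)]);
    let mut out = Vec::new();
    while let Some((id, depth)) = queue.pop_front() {
        if out.len() == lim.max_nodes {
            return Err(Tripped::MaxNodes);
        }
        out.push(id);
        if depth == lim.max_depth {
            continue; // depth cap: stop expanding, not an error
        }
        for &next in &adj[id] {
            if seen.insert(next) {
                queue.push_back((next, depth + 1));
                if queue.len() > lim.max_frontier {
                    return Err(Tripped::MaxFrontier);
                }
            }
        }
    }
    Ok(out)
}
```

The three limits guard different shapes: `max_depth` bounds long chains, `max_frontier` bounds high-fan-out hubs, and `max_nodes` bounds the total result regardless of shape.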