Benchmarking Strategy
Performance Targets (V1 Spec Estimates)
| Operation | Target (2M nodes) | Target (50M nodes) | Measurement |
|---|---|---|---|
| Entity search (TokenIndex) | < 200 μs | < 1 ms | p50 latency |
| 5-hop BFS (no filter) | < 1 ms | < 15 ms | p50 latency |
| 5-hop BFS (Bloom filter) | < 2 ms | < 20 ms | p50 latency |
| 7-hop path finding | < 5 ms | < 30 ms | p50 latency |
| graph.build() initial load | < 30s | < 5 min | Wall clock |
| Trigger sync overhead | < 20 μs/row | < 20 μs/row | Per-statement |
| Vacuum cycle | < 2s | < 30s | Wall clock |
| Crash recovery (mmap reload) | < 1s | < 15s | Time to first query |
Benchmark Suite
Layer 1: Rust Unit Benchmarks (criterion.rs)
These run the graph engine directly in Rust — no SQL, no Postgres overhead. They measure the raw data structure performance.
```
benches/
├── bfs_bench.rs          # BFS traversal at various scales
├── bloom_bench.rs        # Bloom filter compute + check
├── token_index_bench.rs  # TokenIndex insert + query
├── csr_build_bench.rs    # CSR construction from adjacency lists
└── graph_gen.rs          # Deterministic graph generators
```
Key benchmarks:
| Benchmark | What it measures | Graph size |
|---|---|---|
| bfs_5hop_100k | BFS latency at small scale | 100K nodes |
| bfs_5hop_1m | BFS latency at medium scale | 1M nodes |
| bfs_5hop_10m | BFS latency at large scale | 10M nodes |
| bfs_5hop_bloom | BFS with Bloom pre-filter | 1M nodes |
| bfs_5hop_token | BFS with TokenIndex filter | 1M nodes |
| search_token_1m | TokenIndex search latency | 1M nodes |
| bloom_compute | Bloom signature computation | 1K properties |
| csr_build_1m | CSR construction time | 1M nodes, 20M edges |
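A minimal criterion harness for the hot-loop benchmark might look like the sketch below. `build_benchmark_graph` is the generator defined later in this section; `Graph::from_parts` and `Graph::bfs` are assumed names for illustration, not a committed API.
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bfs_5hop_1m(c: &mut Criterion) {
    // Build the graph once, outside the timed loop: 1M nodes, avg degree 20, seed 42.
    let (nodes, edges) = build_benchmark_graph(1_000_000, 20, 42);
    let graph = Graph::from_parts(nodes, edges); // assumed constructor

    c.bench_function("bfs_5hop_1m", |b| {
        // Time only the traversal; black_box keeps the optimiser from eliding the call.
        b.iter(|| graph.bfs(black_box(12_345), 5))
    });
}

criterion_group!(benches, bfs_5hop_1m);
criterion_main!(benches);
```
Criterion reports the median of each run, which maps directly onto the p50 targets above.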
```bash
# Run all benchmarks
cargo bench

# Run a specific benchmark
cargo bench --bench bfs_bench

# Export raw results as JSON
cargo bench -- --output-format=json > bench_results.json
```
Layer 2: SQL Integration Benchmarks (pgbench + custom)
These measure end-to-end performance through the SQL interface, including pgrx overhead, Postgres process context switching, and result serialisation.
```sql
-- Benchmark setup: load the Panama Papers dataset
SELECT graph.add_table('nodes', id_columns := ARRAY['node_id'],
columns := ARRAY['name', 'jurisdiction', 'country_codes']);
SELECT graph.add_edge('edges', 'node_id_start', 'nodes', 'node_id', 'related_to');
SELECT graph.build();
-- Benchmark 1: Single traversal
\timing on
SELECT count(*) FROM graph.traverse('nodes', '12345', 5);
-- Expected: < 5ms
-- Benchmark 2: Batch screening (1000 entities)
WITH entities AS (
SELECT node_id::text AS pk
FROM nodes
ORDER BY random()
LIMIT 1000
)
SELECT e.pk, count(t.node_id)
FROM entities e
CROSS JOIN LATERAL graph.traverse('nodes', e.pk, 5) t
GROUP BY e.pk;
-- Expected: < 200ms for 1000 entities
-- Benchmark 3: Search + traverse pipeline
SELECT count(*)
FROM graph.search('name', 'Jonathan Chan') s
CROSS JOIN LATERAL graph.traverse('nodes', s.node_id, 5) t;
-- Expected: < 10ms
```
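For the "custom" half of this layer, a pgbench script can drive the batch-screening pattern under concurrency. The sketch below is illustrative: the file name, ID range, and client settings are assumptions, not part of the spec.
```sql
-- screen.sql: pgbench custom script (illustrative file name).
-- Pick a random node ID per transaction; 2000000 assumes the 2M-node dataset.
\set node_id random(1, 2000000)
SELECT count(*) FROM graph.traverse('nodes', (:node_id)::text, 5);
-- Run with, e.g.: pgbench -n -f screen.sql -c 8 -T 60 mydb
-- Add -r to get per-statement latencies in the final report.
```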
Layer 3: Comparative Benchmarks
Side-by-side comparison against:
- Postgres recursive CTE (native, no extension)
- Apache AGE (open-source graph extension)
- Neo4j (external graph database)
```sql
-- Benchmark: Postgres recursive CTE equivalent.
-- Note: with UNION ALL there is no visited-set, so nodes reachable along
-- multiple paths are re-expanded at every depth; DISTINCT dedupes only at the end.
WITH RECURSIVE graph_walk AS (
SELECT id, 0 AS depth
FROM nodes
WHERE id = 12345
UNION ALL
SELECT e.target_id, gw.depth + 1
FROM graph_walk gw
JOIN edges e ON e.source_id = gw.id
WHERE gw.depth < 5
)
SELECT count(DISTINCT id) FROM graph_walk;
-- Compare timing against:
SELECT count(*) FROM graph.traverse('nodes', '12345', 5);
```
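For the Apache AGE comparison, the closest equivalent is a variable-length-path query through AGE's cypher() function. The graph name and the node property used for matching below are assumptions about how the dataset would be loaded into AGE, not a prescribed setup.
```sql
-- Apache AGE equivalent (sketch); assumes LOAD 'age' and ag_catalog on the
-- search_path, and that the dataset was loaded into a graph named 'panama'.
SELECT count(*)
FROM cypher('panama', $$
    MATCH (a {node_id: '12345'})-[*1..5]->(b)
    RETURN DISTINCT b
$$) AS (b agtype);
```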
Scaling Benchmarks
Validate performance scaling from small to max capacity:
| Scale | Nodes | Edges | Expected BFS p50 (5-hop) | Expected Build Time |
|---|---|---|---|---|
| Tiny | 10K | 200K | < 0.1 ms | < 1s |
| Small | 100K | 2M | < 0.5 ms | < 3s |
| Medium | 1M | 20M | < 2 ms | < 15s |
| Large | 10M | 200M | < 8 ms | < 2 min |
| Very large | 50M | 1B | < 15 ms | < 5 min |
Graph Generator
All benchmarks use a deterministic power-law graph generator:
```rust
use rand::{rngs::StdRng, SeedableRng};

fn build_benchmark_graph(node_count: u32, avg_degree: u32, seed: u64)
-> (NodeStore, EdgeStore)
{
let mut rng = StdRng::seed_from_u64(seed);
// Power-law degree distribution: most nodes have few edges,
// some supernodes have many. Realistic for large relational datasets.
// ...
}
```
Seed = 42 is used for all published benchmarks, so anyone can reproduce the exact results.
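The elided degree-sampling step could use inverse-transform sampling from a Pareto-like distribution, as in the sketch below; the exponent is an illustrative choice, since the spec does not fix the distribution's parameters.
```rust
use rand::{rngs::StdRng, Rng};

/// Sample one node's out-degree from a power law with exponent `alpha`,
/// scaled so the expected value lands near `avg_degree` (sketch only).
fn sample_degree(rng: &mut StdRng, avg_degree: u32, max_degree: u32) -> u32 {
    let alpha = 2.5_f64; // illustrative exponent; the spec does not fix one
    // Inverse-transform sample of a Pareto(x_min = 1) distribution with
    // p(d) ∝ d^-alpha; its mean is (alpha - 1) / (alpha - 2) = 3 for alpha = 2.5.
    let u: f64 = rng.gen_range(0.0..1.0);
    let pareto = (1.0 - u).powf(-1.0 / (alpha - 1.0));
    // Rescale so the mean lands near avg_degree, then cap supernodes.
    let scaled = pareto * avg_degree as f64 / 3.0;
    scaled.round().clamp(1.0, max_degree as f64) as u32
}
```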
Metrics Collected
For every benchmark run, record:
```json
{
"benchmark": "bfs_5hop_1m_bloom",
"timestamp": "2025-01-01T00:00:00Z", // Example benchmark run time
"hardware": {
"cpu": "AMD EPYC 7R32",
"cores": 16,
"ram_gb": 64,
"storage": "NVMe"
},
"postgres_version": "17.2",
"graph": {
"nodes": 1000000,
"edges": 20000000,
"avg_degree": 20,
"generator_seed": 42
},
"results": {
"p50_us": 1200,
"p95_us": 2400,
"p99_us": 4100,
"max_us": 8300,
"iterations": 10000,
"throughput_qps": 8300
}
}
```
Continuous Benchmarking
Every PR must include benchmark results for:
- bfs_5hop_1m: regression detection for the hot loop.
- search_token_1m: regression detection for TokenIndex.
- csr_build_1m: regression detection for build performance.
Benchmark results are committed to benches/results/ for historical tracking.
A GitHub Actions workflow runs cargo bench on a standardised machine and fails the PR if any benchmark regresses by >10%.
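One way to implement that gate is criterion's built-in baseline support, sketched below; the baseline name and workflow wiring are assumptions, not the committed CI configuration.
```bash
# On main: record the current numbers as a named baseline.
cargo bench -- --save-baseline main

# On the PR branch: re-run against that baseline; criterion prints the
# per-benchmark change, which the workflow can parse for the >10% gate.
cargo bench -- --baseline main
```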