
Benchmarking Strategy

Performance Targets (V1 Spec Estimates)

| Operation | Target (2M nodes) | Target (50M nodes) | Measurement |
|---|---|---|---|
| Entity search (TokenIndex) | < 200 μs | < 1 ms | p50 latency |
| 5-hop BFS (no filter) | < 1 ms | < 15 ms | p50 latency |
| 5-hop BFS (Bloom filter) | < 2 ms | < 20 ms | p50 latency |
| 7-hop path finding | < 5 ms | < 30 ms | p50 latency |
| `graph.build()` initial load | < 30 s | < 5 min | Wall clock |
| Trigger sync overhead | < 20 μs/row | < 20 μs/row | Per-statement |
| Vacuum cycle | < 2 s | < 30 s | Wall clock |
| Crash recovery (mmap reload) | < 1 s | < 15 s | Time to first query |

Benchmark Suite

Layer 1: Rust Unit Benchmarks (criterion.rs)

These run the graph engine directly in Rust — no SQL, no Postgres overhead. They measure the raw data structure performance.

```
benches/
├── bfs_bench.rs         # BFS traversal at various scales
├── bloom_bench.rs       # Bloom filter compute + check
├── token_index_bench.rs # TokenIndex insert + query
├── csr_build_bench.rs   # CSR construction from adjacency lists
└── graph_gen.rs         # Deterministic graph generators
```

Key benchmarks:

| Benchmark | What it measures | Graph size |
|---|---|---|
| `bfs_5hop_100k` | BFS latency at small scale | 100K nodes |
| `bfs_5hop_1m` | BFS latency at medium scale | 1M nodes |
| `bfs_5hop_10m` | BFS latency at large scale | 10M nodes |
| `bfs_5hop_bloom` | BFS with Bloom pre-filter | 1M nodes |
| `bfs_5hop_token` | BFS with TokenIndex filter | 1M nodes |
| `search_token_1m` | TokenIndex search latency | 1M nodes |
| `bloom_compute` | Bloom signature computation | 1K properties |
| `csr_build_1m` | CSR construction time | 1M nodes, 20M edges |
```sh
# Run all benchmarks
cargo bench

# Run a specific benchmark
cargo bench --bench bfs_bench

# Export raw results as JSON
cargo bench -- --output-format=json > bench_results.json
```
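The `bfs_*` benches above all exercise a kernel of roughly the following shape. This is a minimal, std-only sketch assuming a CSR layout of `offsets`/`targets` arrays; the real `NodeStore`/`EdgeStore` types and the criterion harness are not shown here:

```rust
/// Count nodes reachable within `max_hops` of `start` over a CSR graph.
/// `offsets[v]..offsets[v + 1]` indexes the out-neighbours of `v` in `targets`.
fn bfs_reachable(offsets: &[usize], targets: &[u32], start: u32, max_hops: u32) -> usize {
    let n = offsets.len() - 1;
    let mut visited = vec![false; n];
    let mut frontier = vec![start];
    visited[start as usize] = true;
    let mut count = 1; // the start node itself
    for _ in 0..max_hops {
        let mut next = Vec::new();
        for &v in &frontier {
            for &t in &targets[offsets[v as usize]..offsets[v as usize + 1]] {
                if !visited[t as usize] {
                    visited[t as usize] = true;
                    count += 1;
                    next.push(t);
                }
            }
        }
        if next.is_empty() {
            break; // frontier exhausted before the hop limit
        }
        frontier = next;
    }
    count
}
```

The hot loop is the inner slice scan over `targets`, which is what the CSR layout exists to make cache-friendly.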

Layer 2: SQL Integration Benchmarks (pgbench + custom)

These measure end-to-end performance through the SQL interface, including pgrx overhead, Postgres process context switching, and result serialisation.

```sql
-- Benchmark setup: load Panama Papers dataset
SELECT graph.add_table('nodes',
    id_columns := ARRAY['node_id'],
    columns := ARRAY['name', 'jurisdiction', 'country_codes']);
SELECT graph.add_edge('edges', 'node_id_start', 'nodes', 'node_id', 'related_to');
SELECT graph.build();

-- Benchmark 1: Single traversal
\timing on
SELECT count(*) FROM graph.traverse('nodes', '12345', 5);
-- Expected: < 5 ms

-- Benchmark 2: Batch screening (1000 entities)
WITH entities AS (
    SELECT node_id::text AS pk FROM nodes ORDER BY random() LIMIT 1000
)
SELECT e.pk, count(t.node_id)
FROM entities e
CROSS JOIN LATERAL graph.traverse('nodes', e.pk, 5) t
GROUP BY e.pk;
-- Expected: < 200 ms for 1000 entities

-- Benchmark 3: Search + traverse pipeline
SELECT count(*)
FROM graph.search('name', 'Jonathan Chan') s
CROSS JOIN LATERAL graph.traverse('nodes', s.node_id, 5) t;
-- Expected: < 10 ms
```

Layer 3: Comparative Benchmarks

Side-by-side comparison against:

  1. Postgres recursive CTE (native, no extension)
  2. Apache AGE (open-source graph extension)
  3. Neo4j (external graph database)
```sql
-- Benchmark: Postgres recursive CTE equivalent
WITH RECURSIVE graph_walk AS (
    SELECT id, 0 AS depth FROM nodes WHERE id = 12345
    UNION ALL
    SELECT e.target_id, gw.depth + 1
    FROM graph_walk gw
    JOIN edges e ON e.source_id = gw.id
    WHERE gw.depth < 5
)
SELECT count(DISTINCT id) FROM graph_walk;

-- Compare timing against:
SELECT count(*) FROM graph.traverse('nodes', '12345', 5);
```

Scaling Benchmarks

Validate performance scaling from small to max capacity:

| Scale | Nodes | Edges | Expected BFS p50 (5-hop) | Expected build time |
|---|---|---|---|---|
| Tiny | 10K | 200K | < 0.1 ms | < 1 s |
| Small | 100K | 2M | < 0.5 ms | < 3 s |
| Medium | 1M | 20M | < 2 ms | < 15 s |
| Large | 10M | 200M | < 8 ms | < 2 min |
| Very large | 50M | 1B | < 15 ms | < 5 min |

Graph Generator

All benchmarks use a deterministic power-law graph generator:

```rust
use rand::{rngs::StdRng, SeedableRng};

fn build_benchmark_graph(node_count: u32, avg_degree: u32, seed: u64) -> (NodeStore, EdgeStore) {
    let mut rng = StdRng::seed_from_u64(seed);
    // Power-law degree distribution: most nodes have few edges,
    // some supernodes have many. Realistic for large relational datasets.
    // ...
}
```

All published benchmarks use seed = 42, so anyone can reproduce the exact results.
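The full generator lives in `graph_gen.rs`; the deterministic core can be sketched without any crates. In this sketch the `XorShift64` RNG stands in for `StdRng`, and the heavy tail comes from a crude heuristic (taking the minimum of two uniform draws biases targets toward low node ids, so a few "supernodes" accumulate many in-edges) rather than the real distribution:

```rust
/// Minimal deterministic RNG (xorshift64) so the sketch needs no crates;
/// the real generator uses rand's StdRng.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Generate `node_count * avg_degree` (source, target) pairs with a
/// heavy-tailed in-degree distribution. Same seed, same edges, every run.
fn generate_edges(node_count: u32, avg_degree: u32, seed: u64) -> Vec<(u32, u32)> {
    let mut rng = XorShift64(seed | 1); // avoid the all-zero xorshift state
    let edge_count = node_count as u64 * avg_degree as u64;
    let mut edges = Vec::with_capacity(edge_count as usize);
    for _ in 0..edge_count {
        let src = (rng.next() % node_count as u64) as u32;
        // Bias targets toward low node ids: min of two uniform draws.
        let a = rng.next() % node_count as u64;
        let b = rng.next() % node_count as u64;
        edges.push((src, a.min(b) as u32));
    }
    edges
}
```

Determinism here is structural, not incidental: the RNG state depends only on the seed, so the same `(node_count, avg_degree, seed)` triple always yields byte-identical edge lists.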


Metrics Collected

For every benchmark run, record:

```jsonc
{
  "benchmark": "bfs_5hop_1m_bloom",
  "timestamp": "2025-01-01T00:00:00Z",  // Example benchmark run time
  "hardware": {
    "cpu": "AMD EPYC 7R32",
    "cores": 16,
    "ram_gb": 64,
    "storage": "NVMe"
  },
  "postgres_version": "17.2",
  "graph": {
    "nodes": 1000000,
    "edges": 20000000,
    "avg_degree": 20,
    "generator_seed": 42
  },
  "results": {
    "p50_us": 1200,
    "p95_us": 2400,
    "p99_us": 4100,
    "max_us": 8300,
    "iterations": 10000,
    "throughput_qps": 8300
  }
}
```
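The `p50_us`/`p95_us`/`p99_us` fields are derived from the raw per-iteration samples. A minimal sketch, assuming microsecond samples and the nearest-rank percentile definition (one of several common conventions; criterion computes its own estimates):

```rust
/// Nearest-rank percentile over latency samples in microseconds.
/// `p` is in (0, 100]; the input need not be sorted.
fn percentile_us(samples: &[u64], p: f64) -> u64 {
    assert!(!samples.is_empty() && p > 0.0 && p <= 100.0);
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    // Nearest-rank: the smallest value with at least p% of samples <= it.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank - 1]
}
```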

Continuous Benchmarking

Every PR must include benchmark results for:

  1. bfs_5hop_1m — regression detection for the hot loop.
  2. search_token_1m — regression detection for TokenIndex.
  3. csr_build_1m — regression detection for build performance.

Benchmark results are committed to benches/results/ for historical tracking.

A GitHub Actions workflow runs cargo bench on a standardised machine and fails the PR if any benchmark regresses by >10%.
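The >10% gate reduces to a comparison of p50 latencies between the committed baseline and the candidate run. A hedged sketch of that check (the actual workflow and threshold configuration live in the repository, not here):

```rust
/// True if the candidate run is more than `threshold_pct` percent slower
/// than the baseline (both p50 latencies in microseconds).
fn regressed(baseline_p50_us: u64, candidate_p50_us: u64, threshold_pct: f64) -> bool {
    let limit = baseline_p50_us as f64 * (1.0 + threshold_pct / 100.0);
    candidate_p50_us as f64 > limit
}
```

A run landing exactly on the threshold passes; only a strictly larger slowdown fails the PR.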
