Benchmarking Strategy
Performance Targets (V1 Spec Estimates)
| Operation | Target (2M nodes) | Target (50M nodes) | Measurement |
|---|---|---|---|
| Entity search (TokenIndex) | < 200 μs | < 1 ms | p50 latency |
| 5-hop BFS (no filter) | < 1 ms | < 15 ms | p50 latency |
| 5-hop BFS (Bloom filter) | < 2 ms | < 20 ms | p50 latency |
| 7-hop path finding | < 5 ms | < 30 ms | p50 latency |
| graph.build() initial load | < 30s | < 5 min | Wall clock |
| Trigger sync overhead | < 20 μs/row | < 20 μs/row | Per-statement |
| Vacuum cycle | < 2s | < 30s | Wall clock |
| Crash recovery (mmap reload) | < 1s | < 15s | Time to first query |
Benchmark Suite
Layer 1: Rust Unit Benchmarks (criterion.rs)
These run the graph engine directly in Rust — no SQL, no Postgres overhead. They measure the raw data structure performance.
```
benches/
├── bfs_bench.rs          # BFS traversal at various scales
├── bloom_bench.rs        # Bloom filter compute + check
├── token_index_bench.rs  # TokenIndex insert + query
├── csr_build_bench.rs    # CSR construction from adjacency lists
└── graph_gen.rs          # Deterministic graph generators
```
Key benchmarks:
| Benchmark | What it measures | Graph size |
|---|---|---|
| bfs_5hop_100k | BFS latency at small scale | 100K nodes |
| bfs_5hop_1m | BFS latency at medium scale | 1M nodes |
| bfs_5hop_10m | BFS latency at large scale | 10M nodes |
| bfs_5hop_bloom | BFS with Bloom pre-filter | 1M nodes |
| bfs_5hop_token | BFS with TokenIndex filter | 1M nodes |
| search_token_1m | TokenIndex search latency | 1M nodes |
| bloom_compute | Bloom signature computation | 1K properties |
| csr_build_1m | CSR construction time | 1M nodes, 20M edges |
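A minimal criterion harness for the hot-loop benchmark might look like the sketch below. `build_benchmark_graph` is the generator defined later in this section; `Graph::from_parts` and `Graph::bfs` are assumed names for illustration, not a committed API.
```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn bfs_5hop_1m(c: &mut Criterion) {
    // Build the graph once, outside the timed loop: 1M nodes, avg degree 20, seed 42.
    let (nodes, edges) = build_benchmark_graph(1_000_000, 20, 42);
    let graph = Graph::from_parts(nodes, edges); // assumed constructor

    c.bench_function("bfs_5hop_1m", |b| {
        // Time only the traversal; black_box keeps the optimiser from eliding the call.
        b.iter(|| graph.bfs(black_box(12_345), 5))
    });
}

criterion_group!(benches, bfs_5hop_1m);
criterion_main!(benches);
```
Criterion reports the median of each run, which maps directly onto the p50 targets above.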
```bash
# Run all benchmarks
cargo bench

# Run a specific benchmark
cargo bench --bench bfs_bench

# Export raw results as JSON
cargo bench -- --output-format=json > bench_results.json
```
Layer 2: SQL Integration Benchmarks (pgbench + custom)
These measure end-to-end performance through the SQL interface, including pgrx overhead, Postgres process context switching, and result serialisation.
```sql
-- Benchmark setup: load the Panama Papers dataset
SELECT graph.add_table('nodes', id_columns := ARRAY['node_id'],
columns := ARRAY['name', 'jurisdiction', 'country_codes']);
SELECT graph.add_edge('edges', 'node_id_start', 'nodes', 'node_id', 'related_to');
SELECT graph.build();
-- Benchmark 1: Single traversal
\timing on
SELECT count(*) FROM graph.traverse('nodes', '12345', 5);
-- Expected: < 5ms
-- Benchmark 2: Batch screening (1000 entities)
WITH entities AS (
SELECT node_id::text AS pk
FROM nodes
ORDER BY random()
LIMIT 1000
)
SELECT e.pk, count(t.node_id)
FROM entities e
CROSS JOIN LATERAL graph.traverse('nodes', e.pk, 5) t
GROUP BY e.pk;
-- Expected: < 200ms for 1000 entities
-- Benchmark 3: Search + traverse pipeline
SELECT count(*)
FROM graph.search('name', 'Jonathan Chan') s
CROSS JOIN LATERAL graph.traverse('nodes', s.node_id, 5) t;
-- Expected: < 10ms
```
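For the "custom" half of this layer, a pgbench script can drive the batch-screening pattern under concurrency. The sketch below is illustrative: the file name, ID range, and client settings are assumptions, not part of the spec.
```sql
-- screen.sql: pgbench custom script (illustrative file name).
-- Pick a random node ID per transaction; 2000000 assumes the 2M-node dataset.
\set node_id random(1, 2000000)
SELECT count(*) FROM graph.traverse('nodes', (:node_id)::text, 5);
-- Run with, e.g.: pgbench -n -f screen.sql -c 8 -T 60 mydb
-- Add -r to get per-statement latencies in the final report.
```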
Layer 3: Comparative Benchmarks
Side-by-side comparison against:
- Postgres recursive CTE (native, no extension)
- Apache AGE (open-source graph extension)
- Neo4j (external graph database)
```sql
-- Benchmark: Postgres recursive CTE equivalent.
-- Note: with UNION ALL there is no visited-set, so nodes reachable along
-- multiple paths are re-expanded at every depth; DISTINCT dedupes only at the end.
WITH RECURSIVE graph_walk AS (
SELECT id, 0 AS depth
FROM nodes
WHERE id = 12345
UNION ALL
SELECT e.target_id, gw.depth + 1
FROM graph_walk gw
JOIN edges e ON e.source_id = gw.id
WHERE gw.depth < 5
)
SELECT count(DISTINCT id) FROM graph_walk;
-- Compare timing against:
SELECT count(*) FROM graph.traverse('nodes', '12345', 5);
```
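For the Apache AGE comparison, the closest equivalent is a variable-length-path query through AGE's cypher() function. The graph name and the node property used for matching below are assumptions about how the dataset would be loaded into AGE, not a prescribed setup.
```sql
-- Apache AGE equivalent (sketch); assumes LOAD 'age' and ag_catalog on the
-- search_path, and that the dataset was loaded into a graph named 'panama'.
SELECT count(*)
FROM cypher('panama', $$
    MATCH (a {node_id: '12345'})-[*1..5]->(b)
    RETURN DISTINCT b
$$) AS (b agtype);
```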
Scaling Benchmarks
Validate performance scaling from small to max capacity:
| Scale | Nodes | Edges | Expected BFS p50 (5-hop) | Expected Build Time |
|---|---|---|---|---|
| Tiny | 10K | 200K | < 0.1 ms | < 1s |
| Small | 100K | 2M | < 0.5 ms | < 3s |
| Medium | 1M | 20M | < 2 ms | < 15s |
| Large | 10M | 200M | < 8 ms | < 2 min |
| Very large | 50M | 1B | < 15 ms | < 5 min |
Graph Generator
All benchmarks use a deterministic power-law graph generator:
```rust
use rand::{rngs::StdRng, SeedableRng};

fn build_benchmark_graph(node_count: u32, avg_degree: u32, seed: u64)
-> (NodeStore, EdgeStore)
{
let mut rng = StdRng::seed_from_u64(seed);
// Power-law degree distribution: most nodes have few edges,
// some supernodes have many. Realistic for large relational datasets.
// ...
}
```
Seed = 42 is used for all published benchmarks, so anyone can reproduce the exact results.
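The elided degree-sampling step could use inverse-transform sampling from a Pareto-like distribution, as in the sketch below; the exponent is an illustrative choice, since the spec does not fix the distribution's parameters.
```rust
use rand::{rngs::StdRng, Rng};

/// Sample one node's out-degree from a power law with exponent `alpha`,
/// scaled so the expected value lands near `avg_degree` (sketch only).
fn sample_degree(rng: &mut StdRng, avg_degree: u32, max_degree: u32) -> u32 {
    let alpha = 2.5_f64; // illustrative exponent; the spec does not fix one
    // Inverse-transform sample of a Pareto(x_min = 1) distribution with
    // p(d) ∝ d^-alpha; its mean is (alpha - 1) / (alpha - 2) = 3 for alpha = 2.5.
    let u: f64 = rng.gen_range(0.0..1.0);
    let pareto = (1.0 - u).powf(-1.0 / (alpha - 1.0));
    // Rescale so the mean lands near avg_degree, then cap supernodes.
    let scaled = pareto * avg_degree as f64 / 3.0;
    scaled.round().clamp(1.0, max_degree as f64) as u32
}
```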
Metrics Collected
For every benchmark run, record:
```json
{
"benchmark": "bfs_5hop_1m_bloom",
"timestamp": "2025-01-01T00:00:00Z", // Example benchmark run time
"hardware": {
"cpu": "AMD EPYC 7R32",
"cores": 16,
"ram_gb": 64,
"storage": "NVMe"
},
"postgres_version": "17.2",
"graph": {
"nodes": 1000000,
"edges": 20000000,
"avg_degree": 20,
"generator_seed": 42
},
"results": {
"p50_us": 1200,
"p95_us": 2400,
"p99_us": 4100,
"max_us": 8300,
"iterations": 10000,
"throughput_qps": 8300
}
}
```
Continuous Benchmarking
Every PR must include benchmark results for:
- bfs_5hop_1m: regression detection for the hot loop.
- search_token_1m: regression detection for TokenIndex.
- csr_build_1m: regression detection for build performance.
Benchmark results are committed to benches/results/ for historical tracking.
A GitHub Actions workflow runs cargo bench on a standardised machine and fails the PR if any benchmark regresses by >10%.
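One way to implement that gate is criterion's built-in baseline support, sketched below; the baseline name and workflow wiring are assumptions, not the committed CI configuration.
```bash
# On main: record the current numbers as a named baseline.
cargo bench -- --save-baseline main

# On the PR branch: re-run against that baseline; criterion prints the
# per-benchmark change, which the workflow can parse for the >10% gate.
cargo bench -- --baseline main
```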