# Panama Papers Demo — End-to-End Setup
This guide takes you from a fresh machine to running graph queries against the ICIJ Panama Papers dataset inside PostgreSQL. Allow about 15 minutes for a first-time setup.
## Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| PostgreSQL | 14, 15, 16, or 17 | Homebrew, apt, or official installer |
| Rust | 1.78+ stable | rustup recommended |
| cargo-pgrx | Latest | Must match your pgrx crate version |
## 1. Install PostgreSQL

### macOS (Homebrew)

```sh
brew install postgresql@17
brew services start postgresql@17

# Verify
psql --version
pg_isready
```

### Ubuntu / Debian

```sh
sudo apt update
sudo apt install postgresql-17 postgresql-client-17
sudo systemctl enable --now postgresql
pg_isready
```

### Verify

```sh
# Should print that PostgreSQL is accepting connections
pg_isready
```

## 2. Install Rust and cargo-pgrx
```sh
# Install Rust (skip if already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# Install cargo-pgrx
cargo install --locked cargo-pgrx

# Initialize pgrx against your PostgreSQL installation
cargo pgrx init --pg17=$(which pg_config)
# Adjust the flag for your version: --pg14, --pg15, or --pg16
```

## 3. Build and Install the pgGraph Extension
From the root of the evokoa-pgGraph repository:
```sh
cd graph

# Build and install into your local PostgreSQL
cargo pgrx install --release
```

### Configure shared preload
Add the extension to `postgresql.conf` so its GUC parameters register on startup:
```sh
# Find your postgresql.conf
psql -c "SHOW config_file;"

# Add this line (or use ALTER SYSTEM):
# shared_preload_libraries = 'graph'
```

Then restart PostgreSQL:
```sh
# macOS (Homebrew)
brew services restart postgresql@17

# Linux
sudo systemctl restart postgresql
```

## 4. Download the Panama Papers Dataset
The raw CSV files are not included in the repository. You must download them yourself from the ICIJ Offshore Leaks Database:
Download link: ICIJ Offshore Leaks Database
Download the full CSV archive and extract it. You will get a set of CSV files for nodes and relationships.
### Place the files
Copy the extracted CSVs into `test_data/panama_papers/` so your directory structure looks like this:
```
evokoa-pgGraph/
├── graph/
├── docs/
├── bench/
└── test_data/
    ├── README.md
    ├── load.sql
    └── panama_papers/          ← create this directory
        ├── nodes-addresses.csv
        ├── nodes-entities.csv
        ├── nodes-intermediaries.csv
        ├── nodes-officers.csv
        ├── nodes-others.csv
        └── relationships.csv
```

> [!NOTE]
> The `test_data/panama_papers/` directory is in `.gitignore`. The CSV files will not be committed to the repository.
## 5. Create the Database and Load the Data
```sh
# Create the graph_test database
createdb graph_test

# Load tables, data, and indexes from the repo root
psql -d graph_test -f test_data/load.sql
```

The load script will:

- create 5 node tables (`entities`, `officers`, `intermediaries`, `addresses`, `others`) and 1 relationship table (`relationships`);
- `\copy` each CSV into its table;
- build GIN full-text indexes and relationship lookup indexes;
- print a row-count summary when finished.
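For reference, each per-table load step follows the psql `\copy` pattern sketched below (an illustration of the pattern, not the actual contents of `load.sql`):

```sql
-- Illustrative only: stream one CSV into its table with psql's \copy.
\copy entities FROM 'test_data/panama_papers/nodes-entities.csv' WITH (FORMAT csv, HEADER true)
```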
Expected output (approximate):

```
=== Panama Papers Dataset Loaded ===
   table_name   | row_count
----------------+-----------
 addresses      |   151,127
 entities       |   213,634
 intermediaries |    24,189
 officers       |   345,645
 others         |     2,884
 relationships  | 1,269,796
```

## 6. Enable the Extension and Build the Graph
```sh
psql -d graph_test
```

```sql
-- Install the extension
CREATE EXTENSION graph;

-- Discover tables, primary keys, and foreign-key edges from the public schema
SELECT * FROM graph.auto_discover('public');

-- Panama stores many relationship types in one edge table. Register that
-- source column so graph.shortest_path().edge_label and traverse().edge_path
-- return labels like officer_of, registered_address, and intermediary_of.
SELECT graph.add_edge(
    'relationships'::regclass, 'node_id_start',
    'entities'::regclass, 'node_id_end',
    'relationship',
    label_column := 'rel_type'
);

-- Verify what graph registered
SELECT * FROM graph.registered_tables();
SELECT * FROM graph.registered_edges();

-- Build the in-memory graph
SELECT * FROM graph.build();
```

The build step loads all registered nodes and edges into the engine’s in-memory structures. On the Panama Papers dataset this typically takes 2–5 seconds and uses roughly 200–400 MB of RAM.
### Verify

```sql
SELECT * FROM graph.status();
```

You should see node and edge counts matching the loaded data, `sync_mode`, memory usage, and a `last_build` timestamp.
## 7. Start Querying
You are ready to go. Open a psql session and enable timing:
```
\timing on
```

### Quick smoke tests
```sql
-- Search for an entity by name (case-insensitive substring by default)
SELECT node_table::regclass::text AS table_name,
       node_id,
       match_type,
       node->>'name' AS name
FROM graph.search('name', 'mossack fonseca')
LIMIT 10;

-- Traverse 2 hops from an entity
SELECT depth,
       node_table::regclass::text AS table_name,
       node_id,
       node->>'name' AS name
FROM graph.traverse('entities'::regclass, '10011209', 2)
ORDER BY depth, table_name
LIMIT 20;

-- Shortest path between two entities
SELECT step,
       node_table::regclass::text AS table_name,
       node_id,
       edge_label,
       node
FROM graph.shortest_path(
    'entities'::regclass, '10011209',
    'officers'::regclass, '12013972'
)
ORDER BY step;

-- Component overview
SELECT * FROM graph.component_stats();
```

### Full query cookbook
For the complete set of search, traversal, pathfinding, component, and agent workload patterns, see the Query Cookbook. It is organized by use case and includes ready-to-paste SQL for investigative workflows, compliance checks, and LLM-agent integration patterns.
## Troubleshooting
### `pg_isready` says “no response”

PostgreSQL is not running. Start it with `brew services start postgresql@17` or `sudo systemctl start postgresql`.
### `cargo pgrx install` fails

Make sure `cargo pgrx init` was run with the correct `pg_config` path for your PostgreSQL version. Run `which pg_config` and verify it points to the right installation.
### `CREATE EXTENSION graph` fails with “could not access file”

The shared library was not installed into the correct PostgreSQL lib directory. Re-run `cargo pgrx install --release` and check that `pg_config --pkglibdir` matches where pgrx is installing.
### `graph.build()` fails

Check `graph.status()` for error details. Common causes:

- tables not registered (run `graph.auto_discover('public')` first);
- memory limit too low (increase `graph.memory_limit_mb` in `postgresql.conf`).
### First query is slow (~5 seconds)

This is the auto-load cost on a fresh backend connection. The engine loads the persisted graph into memory on first use. Subsequent queries on the same connection will be fast. Set `graph.auto_load = on` in `postgresql.conf` to pre-warm on server startup.
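For example, the relevant `postgresql.conf` fragment:

```
# Pre-warm the persisted graph at server startup instead of on first query
graph.auto_load = on
```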
## Further Reading
| Document | What it covers |
|---|---|
| Installation & Configuration | Full GUC reference, upgrading, uninstalling |
| SQL API | Complete function signatures and behavior |
| Architecture | Engine internals and data flow |
| Memory Model | Capacity planning and memory formulas |
| Benchmarking | Methodology for reproducible performance measurement |
| Query Cookbook | Ready-to-run query patterns for the Panama Papers dataset |