
Panama Papers Demo — End-to-End Setup

This guide takes you from a fresh machine to running graph queries against the ICIJ Panama Papers dataset inside PostgreSQL. Allow about 15 minutes for a first-time setup.


Prerequisites

| Requirement | Version | Notes |
| --- | --- | --- |
| PostgreSQL | 14, 15, 16, or 17 | Homebrew, apt, or official installer |
| Rust | 1.78+ stable | rustup recommended |
| cargo-pgrx | Latest | Must match your pgrx crate version |

1. Install PostgreSQL

macOS (Homebrew)

```shell
brew install postgresql@17
brew services start postgresql@17

# Verify
psql --version
pg_isready
```

Ubuntu / Debian

```shell
sudo apt update
sudo apt install postgresql-17 postgresql-client-17
sudo systemctl enable --now postgresql
pg_isready
```

Verify

```shell
# Should print that PostgreSQL is accepting connections
pg_isready
```

2. Install Rust and cargo-pgrx

```shell
# Install Rust (skip if already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source "$HOME/.cargo/env"

# Install cargo-pgrx
cargo install --locked cargo-pgrx

# Initialize pgrx against your PostgreSQL installation
cargo pgrx init --pg17=$(which pg_config)
# Adjust the flag for your version: --pg14, --pg15, --pg16
```

3. Build and Install the pgGraph Extension

From the root of the evokoa-pgGraph repository:

```shell
cd graph

# Build and install into your local PostgreSQL
cargo pgrx install --release
```

Configure shared preload

Add the extension to postgresql.conf so GUC parameters register on startup:

```shell
# Find your postgresql.conf
psql -c "SHOW config_file;"

# Add this line (or use ALTER SYSTEM):
# shared_preload_libraries = 'graph'
```

Then restart PostgreSQL:

```shell
# macOS (Homebrew)
brew services restart postgresql@17

# Linux
sudo systemctl restart postgresql
```

4. Download the Panama Papers Dataset

The raw CSV files are not included in the repository. You must download them yourself from the ICIJ Offshore Leaks Database:

Download link: ICIJ Offshore Leaks Database

Download the full CSV archive and extract it. You will get a set of CSV files for nodes and relationships.

Place the files

Copy the extracted CSVs into test_data/panama_papers/ so your directory structure looks like this:

```
evokoa-pgGraph/
├── graph/
├── docs/
├── bench/
└── test_data/
    ├── README.md
    ├── load.sql
    └── panama_papers/        ← create this directory
        ├── nodes-addresses.csv
        ├── nodes-entities.csv
        ├── nodes-intermediaries.csv
        ├── nodes-officers.csv
        ├── nodes-others.csv
        └── relationships.csv
```

> [!NOTE]
> The test_data/panama_papers/ directory is in .gitignore. The CSV files will not be committed to the repository.
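Before loading, you can sanity-check the placement with a small shell helper. This is a sketch, not part of the repository; the default path and filenames simply mirror the tree above:

```shell
# Sketch: confirm all six expected CSVs are present before running load.sql.
check_panama_csvs() {
  local dir="${1:-test_data/panama_papers}"
  local missing=0
  local f
  for f in nodes-addresses.csv nodes-entities.csv nodes-intermediaries.csv \
           nodes-officers.csv nodes-others.csv relationships.csv; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

if check_panama_csvs; then echo "all files present"; fi
```

Any `missing:` line means the load script's \copy step would fail for that table.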


5. Create the Database and Load the Data

```shell
# Create the graph_test database
createdb graph_test

# Load tables, data, and indexes from the repo root
psql -d graph_test -f test_data/load.sql
```

The load script will:

  • create 5 node tables (entities, officers, intermediaries, addresses, others) and 1 relationship table (relationships);
  • \copy each CSV into its table;
  • build GIN full-text indexes and relationship lookup indexes;
  • print a row-count summary when finished.
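If a count in the summary looks off, you can cross-check it against the raw files. The helper below is a quick sketch (not part of the repo) that assumes each CSV carries exactly one header row:

```shell
# Sketch: print data-row counts per CSV (total lines minus the header).
csv_rows() {
  local f
  for f in "$@"; do
    [ -f "$f" ] || continue
    printf '%s %s\n' "$f" "$(( $(wc -l < "$f") - 1 ))"
  done
}

csv_rows test_data/panama_papers/*.csv
```

Each printed count should match the corresponding row_count in the load summary.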

Expected output (approximate):

```
=== Panama Papers Dataset Loaded ===
   table_name   | row_count
----------------+-----------
 addresses      |   151,127
 entities       |   213,634
 intermediaries |    24,189
 officers       |   345,645
 others         |     2,884
 relationships  | 1,269,796
```

6. Enable the Extension and Build the Graph

```shell
psql -d graph_test
```

```sql
-- Install the extension
CREATE EXTENSION graph;

-- Discover tables, primary keys, and foreign-key edges from the public schema
SELECT * FROM graph.auto_discover('public');

-- Panama stores many relationship types in one edge table. Register that
-- source column so graph.shortest_path().edge_label and traverse().edge_path
-- return labels like officer_of, registered_address, and intermediary_of.
SELECT graph.add_edge(
    'relationships'::regclass,
    'node_id_start', 'entities'::regclass,
    'node_id_end', 'relationship',
    label_column := 'rel_type'
);

-- Verify what graph registered
SELECT * FROM graph.registered_tables();
SELECT * FROM graph.registered_edges();

-- Build the in-memory graph
SELECT * FROM graph.build();
```

The build step loads all registered nodes and edges into the engine’s in-memory structures. On the Panama Papers dataset this typically takes 2–5 seconds and uses roughly 200–400 MB of RAM.
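As a back-of-envelope check that your machine has headroom, you can estimate memory from the row counts. The per-node and per-edge byte costs below are illustrative assumptions chosen to land in the quoted range, not measured values from the engine:

```shell
# Rough sketch: estimate in-memory graph size from node and edge counts.
# bytes_per_node and bytes_per_edge are assumed averages, not engine figures.
estimate_graph_mb() {
  local nodes="$1" edges="$2"
  local bytes_per_node=200 bytes_per_edge=100
  echo $(( (nodes * bytes_per_node + edges * bytes_per_edge) / 1048576 ))
}

# Panama Papers totals from the load summary: 737,479 nodes, 1,269,796 edges
estimate_graph_mb 737479 1269796
```

For the exact accounting the engine uses, see the Memory Model document listed under Further Reading.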

Verify

SELECT * FROM graph.status();

You should see node and edge counts matching the loaded data, sync_mode, memory usage, and a last_build timestamp.


7. Start Querying

You are ready to go. Open a psql session and enable timing:

```
\timing on
```

Quick smoke tests

```sql
-- Search for an entity by name (case-insensitive substring by default)
SELECT node_table::regclass::text AS table_name,
       node_id,
       match_type,
       node->>'name' AS name
FROM graph.search('name', 'mossack fonseca')
LIMIT 10;

-- Traverse 2 hops from an entity
SELECT depth,
       node_table::regclass::text AS table_name,
       node_id,
       node->>'name' AS name
FROM graph.traverse('entities'::regclass, '10011209', 2)
ORDER BY depth, table_name
LIMIT 20;

-- Shortest path between two entities
SELECT step,
       node_table::regclass::text AS table_name,
       node_id,
       edge_label,
       node
FROM graph.shortest_path(
    'entities'::regclass, '10011209',
    'officers'::regclass, '12013972'
)
ORDER BY step;

-- Component overview
SELECT * FROM graph.component_stats();
```

Full query cookbook

For the complete set of search, traversal, pathfinding, component, and agent workload patterns, see:

bench/QUERY_COOKBOOK.md

The cookbook is organized by use case and includes ready-to-paste SQL for investigative workflows, compliance checks, and LLM-agent integration patterns.


Troubleshooting

pg_isready says “no response”

PostgreSQL is not running. Start it with brew services start postgresql@17 or sudo systemctl start postgresql.

cargo pgrx install fails

Make sure cargo pgrx init was run with the correct pg_config path for your PostgreSQL version. Run which pg_config and verify it points to the right installation.

CREATE EXTENSION graph fails with “could not access file”

The shared library was not installed into the correct PostgreSQL lib directory. Re-run cargo pgrx install --release and check that pg_config --pkglibdir matches where pgrx is installing.

graph.build() fails

Check graph.status() for error details. Common causes:

  • tables not registered (run graph.auto_discover('public') first);
  • memory limit too low (increase graph.memory_limit_mb in postgresql.conf).
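A raised limit in postgresql.conf might look like the fragment below. The 1024 MB figure is an arbitrary example, not a recommendation; size it to your dataset and restart PostgreSQL after changing it:

```
# postgresql.conf
shared_preload_libraries = 'graph'
graph.memory_limit_mb = 1024   # example value; size to your dataset
```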

First query is slow (~5 seconds)

This is the auto-load cost on a fresh backend connection. The engine loads the persisted graph into memory on first use. Subsequent queries on the same connection will be fast. Set graph.auto_load = on in postgresql.conf to pre-warm on server startup.


Further Reading

| Document | What it covers |
| --- | --- |
| Installation & Configuration | Full GUC reference, upgrading, uninstalling |
| SQL API | Complete function signatures and behavior |
| Architecture | Engine internals and data flow |
| Memory Model | Capacity planning and memory formulas |
| Benchmarking | Methodology for reproducible performance measurement |
| Query Cookbook | Ready-to-run query patterns for the Panama Papers dataset |