# Panama Papers Demo — End-to-End Setup
This guide takes you from a fresh machine to running graph queries against the ICIJ Panama Papers dataset inside PostgreSQL. Allow about 15 minutes for a first-time setup.
## Prerequisites
| Requirement | Version | Notes |
|---|---|---|
| PostgreSQL | 14, 15, 16, or 17 | Homebrew, apt, or official installer |
| Rust | 1.78+ stable | rustup recommended |
| cargo-pgrx | Latest | Must match your pgrx crate version |
## 1. Install PostgreSQL

### macOS (Homebrew)

```sh
brew install postgresql@17
brew services start postgresql@17

# Verify
psql --version
pg_isready
```

### Ubuntu / Debian

```sh
sudo apt update
sudo apt install postgresql-17 postgresql-client-17
sudo systemctl enable --now postgresql
pg_isready
```

### Verify

```sh
# Should print that PostgreSQL is accepting connections
pg_isready
```

## 2. Install Rust and cargo-pgrx
```sh
# Install Rust (skip if already installed)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

# Install cargo-pgrx
cargo install --locked cargo-pgrx

# Initialize pgrx against your PostgreSQL installation
cargo pgrx init --pg17=$(which pg_config)
# Adjust the flag for your version: --pg14, --pg15, or --pg16
```

## 3. Build and Install the pgGraph Extension
From the root of the evokoa-pgGraph repository:
```sh
cd graph

# Build and install into your local PostgreSQL
cargo pgrx install --release
```

### Configure shared preload
Add the extension to `postgresql.conf` so its GUC parameters register on startup:
```sh
# Find your postgresql.conf
psql -c "SHOW config_file;"

# Add this line (or use ALTER SYSTEM):
# shared_preload_libraries = 'graph'
```

Then restart PostgreSQL:
```sh
# macOS (Homebrew)
brew services restart postgresql@17

# Linux
sudo systemctl restart postgresql
```

## 4. Download the Panama Papers Dataset
The raw CSV files are not included in the repository. You must download them yourself from the ICIJ Offshore Leaks Database:
Download link: ICIJ Offshore Leaks Database
Download the full CSV archive and extract it. You will get a set of CSV files for nodes and relationships.
### Place the files
Copy the extracted CSVs into `test_data/panama_papers/` so your directory structure looks like this:
```
evokoa-pgGraph/
├── graph/
├── docs/
├── bench/
└── test_data/
    ├── README.md
    ├── load.sql
    └── panama_papers/          ← create this directory
        ├── nodes-addresses.csv
        ├── nodes-entities.csv
        ├── nodes-intermediaries.csv
        ├── nodes-officers.csv
        ├── nodes-others.csv
        └── relationships.csv
```

> [!NOTE]
> The `test_data/panama_papers/` directory is in `.gitignore`. The CSV files will not be committed to the repository.
## 5. Create the Database and Load the Data
```sh
# Create the graph_test database
createdb graph_test

# Load tables, data, and indexes from the repo root
psql -d graph_test -f test_data/load.sql
```

The load script will:

- create 5 node tables (`entities`, `officers`, `intermediaries`, `addresses`, `others`) and 1 relationship table (`relationships`);
- `\copy` each CSV into its table;
- build GIN full-text indexes and relationship lookup indexes;
- print a row-count summary when finished.
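For reference, each per-table load step follows the psql `\copy` pattern sketched below (an illustration of the pattern, not the actual contents of `load.sql`):

```sql
-- Illustrative only: stream one CSV into its table with psql's \copy.
\copy entities FROM 'test_data/panama_papers/nodes-entities.csv' WITH (FORMAT csv, HEADER true)
```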
Expected output (approximate):

```
=== Panama Papers Dataset Loaded ===
   table_name   | row_count
----------------+-----------
 addresses      |   151,127
 entities       |   213,634
 intermediaries |    24,189
 officers       |   345,645
 others         |     2,884
 relationships  | 1,269,796
```

## 6. Enable the Extension and Build the Graph
```sh
psql -d graph_test
```

```sql
-- Install the extension
CREATE EXTENSION graph;

-- Discover tables, primary keys, and foreign-key edges from the public schema
SELECT * FROM graph.auto_discover('public');

-- Panama stores many relationship types in one edge table. Register that
-- source column so graph.shortest_path().edge_label and traverse().edge_path
-- return labels like officer_of, registered_address, and intermediary_of.
SELECT graph.add_edge(
    'relationships'::regclass, 'node_id_start',
    'entities'::regclass, 'node_id_end',
    'relationship',
    label_column := 'rel_type'
);

-- Verify what graph registered
SELECT * FROM graph.registered_tables();
SELECT * FROM graph.registered_edges();

-- Build the in-memory graph
SELECT * FROM graph.build();
```

The build step loads all registered nodes and edges into the engine’s in-memory structures. On the Panama Papers dataset this typically takes 2–5 seconds and uses roughly 200–400 MB of RAM.
### Verify

```sql
SELECT * FROM graph.status();
```

You should see node and edge counts matching the loaded data, `sync_mode`, memory usage, and a `last_build` timestamp.
## 7. Start Querying
You are ready to go. Open a psql session and enable timing:
```
\timing on
```

### Quick smoke tests
```sql
-- Search for an entity by name (case-insensitive substring by default)
SELECT node_table::regclass::text AS table_name,
       node_id,
       match_type,
       node->>'name' AS name
FROM graph.search('name', 'mossack fonseca')
LIMIT 10;

-- Traverse 2 hops from an entity
SELECT depth,
       node_table::regclass::text AS table_name,
       node_id,
       node->>'name' AS name
FROM graph.traverse('entities'::regclass, '10011209', 2)
ORDER BY depth, table_name
LIMIT 20;

-- Shortest path between two entities
SELECT step,
       node_table::regclass::text AS table_name,
       node_id,
       edge_label,
       node
FROM graph.shortest_path(
    'entities'::regclass, '10011209',
    'officers'::regclass, '12013972'
)
ORDER BY step;

-- Component overview
SELECT * FROM graph.component_stats();
```

### Full query cookbook
For the complete set of search, traversal, pathfinding, component, and agent workload patterns, see the Query Cookbook. It is organized by use case and includes ready-to-paste SQL for investigative workflows, compliance checks, and LLM-agent integration patterns.
## Troubleshooting
### `pg_isready` says “no response”

PostgreSQL is not running. Start it with `brew services start postgresql@17` or `sudo systemctl start postgresql`.
### `cargo pgrx install` fails

Make sure `cargo pgrx init` was run with the correct `pg_config` path for your PostgreSQL version. Run `which pg_config` and verify it points to the right installation.
### `CREATE EXTENSION graph` fails with “could not access file”

The shared library was not installed into the correct PostgreSQL lib directory. Re-run `cargo pgrx install --release` and check that `pg_config --pkglibdir` matches where pgrx is installing.
### `graph.build()` fails

Check `graph.status()` for error details. Common causes:

- tables not registered (run `graph.auto_discover('public')` first);
- memory limit too low (increase `graph.memory_limit_mb` in `postgresql.conf`).
### First query is slow (~5 seconds)

This is the auto-load cost on a fresh backend connection. The engine loads the persisted graph into memory on first use. Subsequent queries on the same connection will be fast. Set `graph.auto_load = on` in `postgresql.conf` to pre-warm on server startup.
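For example, the relevant `postgresql.conf` fragment:

```
# Pre-warm the persisted graph at server startup instead of on first query
graph.auto_load = on
```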
## Further Reading
| Document | What it covers |
|---|---|
| Installation & Configuration | Full GUC reference, upgrading, uninstalling |
| SQL API | Complete function signatures and behavior |
| Architecture | Engine internals and data flow |
| Memory Model | Capacity planning and memory formulas |
| Benchmarking | Methodology for reproducible performance measurement |
| Query Cookbook | Ready-to-run query patterns for the Panama Papers dataset |