About the Method
The Problem
In April 2026, the Department of Justice released approximately 3.5 million pages of Epstein-related files under the Epstein Files Transparency Act (H.R. 4405). The release was the largest single disclosure of investigative material in the case's history.
This analysis ingested the full production into eight structured databases totaling 13.4 GB: 1.4 million documents, 1.9 million emails, 2.8 million indexed pages, 1.5 million entity records, a 606-node knowledge graph with 2,302 relationship edges, and a concordance spanning every Bates number in the release.
A corpus that size cannot be read sequentially. Keyword search finds only what you already know to look for. The documents that matter most — the ones that reveal what the curation was designed to obscure — are not the ones with obvious keywords. They are the ones sitting at structural chokepoints in the release: the gaps between Bates numbers, the clusters of zero-byte placeholder files, the sequential exhibits that tell a story only when read in order, the entity co-occurrences that surface relationships never reported.
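The Bates-gap chokepoint described above can be sketched in a few lines. This is a minimal illustration, not the production tooling: the Bates prefix format and sample numbers are hypothetical, and a real pass would also handle sub-page suffixes and multiple custodial prefixes.

```python
import re

def find_bates_gaps(bates_numbers):
    """Return (start, end, missing) tuples for gaps in a Bates sequence.

    Assumes a letters-and-hyphens prefix followed by a zero-padded integer,
    e.g. 'DOJ-OGR-00012345' (a hypothetical format for illustration).
    """
    parsed = []
    for b in bates_numbers:
        m = re.match(r"([A-Z-]+?)(\d+)$", b)
        if m:
            parsed.append((m.group(1), int(m.group(2))))
    parsed.sort()
    gaps = []
    for (p1, n1), (p2, n2) in zip(parsed, parsed[1:]):
        # Only compare numbers within the same prefix series.
        if p1 == p2 and n2 - n1 > 1:
            gaps.append((n1, n2, n2 - n1 - 1))
    return gaps

sample = ["DOJ-OGR-00000001", "DOJ-OGR-00000002", "DOJ-OGR-00000007"]
print(find_bates_gaps(sample))  # [(2, 7, 4)] — four pages missing between 2 and 7
```

Clustering those gap tuples by size and position is what turns a list of missing pages into a curation signature.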
The Approach
Analytical techniques drawn from signals intelligence and large-scale data forensics were applied to this document corpus. The core structure is the same: a multi-source, multi-clock, internally contradictory dataset where the truth lives in the delta between sources, not inside any single source.
Align timestamps across clocks. Catalog identifiers. Decode structured records. Query databases. Map networks. Detect anomalies. Cross-correlate everything. Applied to a government document release, that means dates, Bates numbers, OCR text, entity graphs, and gap analysis — systematically, across the entire production.
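The first step above, aligning timestamps across clocks, can be sketched as a shift onto a common UTC reference. The source names and offsets here are invented for illustration; in practice the offsets would be estimated from records that appear in more than one source.

```python
from datetime import datetime, timezone, timedelta

# Hypothetical per-source clock offsets from UTC (email server vs. scanner
# station). Real offsets would be derived from overlapping records.
CLOCK_OFFSETS = {
    "email_server": timedelta(hours=-5),  # assume US Eastern, DST ignored
    "scanner": timedelta(0),              # assume already UTC
}

def to_utc(source, naive_dt):
    """Shift a naive timestamp from a named source onto a common UTC clock."""
    return (naive_dt - CLOCK_OFFSETS[source]).replace(tzinfo=timezone.utc)

a = to_utc("email_server", datetime(2006, 3, 14, 9, 30))
b = to_utc("scanner", datetime(2006, 3, 14, 14, 30))
print(a == b)  # True: the same instant seen through two different clocks
```

Only after this normalization do sequence detection and gap analysis across sources become meaningful.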
The curation is the evidence. What was preserved reveals the curators' thesis. What was removed reveals what they needed to protect. Reading the two against each other reconstructs the operation.
The Pipeline
The analysis follows a 16-phase pipeline. Each phase is deterministic and reproducible. The phases build on each other: early phases produce structured data, later phases detect patterns, and the final phases cross-validate findings across independent datasets.
- Acquisition — Raw corpus download, hash, chain of custody
- Structured parsing — OCR, entity extraction, Bates number cataloging
- Semantic layer — Document classification, topic extraction, summary generation
- Entity cataloging — Name resolution, deduplication, role assignment
- Knowledge graph construction — Relationship mapping across all documents
- Timeline reconstruction — Date alignment, sequence detection, gap analysis
- Bates gap analysis — Missing page detection, cluster identification, pattern scoring
- Co-occurrence analysis — Entity pairs, frequency anomalies, cluster detection
- Cross-reference validation — Court records, public filings, news archives
- Anomaly detection — Zero-byte files, OCR artifacts, format inconsistencies
- Sequential narrative reconstruction — Exhibit chains, deposition threads
- Financial tracing — Dollar amounts, wire references, account patterns
- Communication mapping — Email chains, phone records, visitor logs
- Index-space analysis — Mathematical operations against corpus positions
- Curation delta — Pre-release vs. post-release comparison, redaction pattern analysis
- Cross-dataset validation — Findings verified against independent corpus crawls
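As a concrete example of one phase, the co-occurrence analysis reduces to counting entity pairs per document and flagging pairs whose frequency is anomalous. The sketch below shows the counting core under simplified assumptions; entity names and document IDs are placeholders, and the real phase layers frequency scoring on top.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(docs):
    """Count how often each entity pair appears in the same document.

    `docs` maps a document ID to the set of entities extracted from it.
    Pairs are sorted so (A, B) and (B, A) count as one key.
    """
    pairs = Counter()
    for entities in docs.values():
        for a, b in combinations(sorted(entities), 2):
            pairs[(a, b)] += 1
    return pairs

docs = {
    "DOC-001": {"Person A", "Person B", "Person C"},
    "DOC-002": {"Person A", "Person B"},
    "DOC-003": {"Person B", "Person C"},
}
counts = cooccurrence_counts(docs)
print(counts[("Person A", "Person B")])  # 2
```

A pair that co-occurs far more often than either entity's individual frequency predicts is the kind of relationship the phase is built to surface.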
The Standard
Every finding published on this site meets two tests:
- Net-new — The finding has not been surfaced elsewhere in this framing. If mainstream reporting has already covered the same facts with the same implications, it is not published here.
- Material — The finding is specific, sourced, and consequential. A fact a reasonable reader would want on the record. Not a colorful detail. Not a restatement of something already understood.
Both tests must pass. Every finding cites specific DOJ document identifiers, page numbers, and direct quotes. Every source document is attached with a SHA-256 hash. Every claim is independently verifiable against the public record.
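Checking an attached document against its published SHA-256 hash needs nothing beyond the standard library. The file path below is a placeholder; the function itself produces the same digest as the common `sha256sum` tool.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file in 1 MB chunks and return its hex SHA-256 digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the hash published alongside the finding
# (path and expected value are illustrative placeholders).
# assert sha256_of("DOJ-OGR-00012345.pdf") == published_hash
```

Streaming in chunks keeps memory flat even for multi-gigabyte scan files.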
Verification
No finding in this collection requires a security clearance, a FOIA request, or institutional access to verify. Every source document is a US Government work released under the Epstein Files Transparency Act — public domain under 17 U.S.C. § 105.
Each finding page includes the exact document, the exact pages, the exact quotes, and step-by-step verification instructions. If you cannot complete the verification steps for a given finding, the finding has failed its own standard.