Methodology

The methodology, exposed start to finish.

We don't ask you to trust us — we ask you to check. Every score carries the calculation that produced it: the sources, the cross-referenced signals, the hash that seals the result. This page explains, step by step, how we get to a number.

The pipeline, in depth

The volumes below reflect the system's operation today — they grow with new sources and change over time; they aren't a fixed commitment.

Collection

Today, about 95,000 documents a day from 63 public sources worldwide: news, SAM.gov, SEC EDGAR, interconnection queues from regional grid operators (ISOs), sector RSS feeds, and USPTO.

Deterministic filter

Code-based rules, not AI, screen the collected documents and discard what isn't relevant before any AI cost is incurred. Volume drops to about 190 documents a day, which move on to analysis.
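A minimal sketch of what a rule-based filter of this kind can look like. The production rule set isn't published on this page, so the source labels, the keywords in `RELEVANT_TERMS`, and the names `Document`, `passes_filter`, and `deterministic_filter` are all hypothetical. It illustrates the property the text claims: the same document always gets the same verdict, and no model is called.

```python
from dataclasses import dataclass

# Hypothetical rules for illustration; the production rule set isn't published.
ALLOWED_SOURCES = {"news", "sam.gov", "sec-edgar", "iso-queue", "sector-rss", "uspto"}
RELEVANT_TERMS = ("data center", "interconnection", "substation", "megawatt")


@dataclass(frozen=True)
class Document:
    source: str  # hypothetical source label
    text: str


def passes_filter(doc: Document) -> bool:
    """Pure function of the document: same input, same verdict, no AI call."""
    if doc.source not in ALLOWED_SOURCES:
        return False
    text = doc.text.lower()
    return any(term in text for term in RELEVANT_TERMS)


def deterministic_filter(docs: list[Document]) -> list[Document]:
    """Keep only the documents that pass every rule."""
    return [doc for doc in docs if passes_filter(doc)]
```

Because the rules are plain code, a discarded document can always be traced to the rule that dropped it.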

Extraction

An AI model reads each filtered document and extracts the entities that matter: who, what, where, how much, and when.

Correlation + Score

Deterministic logic (SQL and Python, not AI) cross-references independent signals: environmental permits, energy contracts, federal registrations, hiring. The more signals converge on a region, the higher its score.
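The convergence idea can be sketched as counting distinct signal types per region. This assumes signals arrive as `(region, signal_type)` pairs; the type names mirror the examples above, and `convergence_by_region` is a hypothetical helper, not the production scoring logic, whose weights this page doesn't give.

```python
from collections import defaultdict
from typing import Iterable

# Signal types taken from the examples in the text; the real taxonomy may differ.
SIGNAL_TYPES = {"environmental_permit", "energy_contract", "federal_registration", "hiring"}


def convergence_by_region(signals: Iterable[tuple[str, str]]) -> dict[str, int]:
    """Count distinct, recognized signal types per region.

    Repeats of the same type don't add convergence; only independent
    types do. Unrecognized types are ignored.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for region, signal_type in signals:
        if signal_type in SIGNAL_TYPES:
            seen[region].add(signal_type)
    return {region: len(types) for region, types in seen.items()}
```

Counting distinct types rather than raw items keeps a burst of, say, repeated hiring posts from looking like convergence.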

Prediction + Proof

The result is dated and sealed in an immutable ledger, identified by hash. Over time, every prediction becomes part of a public track record — including when we're wrong.

The score: CFI 0–100

The Compute Formation Index (CFI) gives a 0–100 score to each of the 49 monitored regions. The score doesn't count announcements — it measures the committed fraction of the interconnection queue that we can corroborate with independent signals. A queue inflated by speculative requests counts against the score, not for it.
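One way to read "committed fraction of the interconnection queue that we can corroborate" is as a megawatt-weighted share, sketched below. The `QueueEntry` fields, the `committed` flag, and the `min_signals=2` corroboration threshold are assumptions for illustration; the page doesn't publish the actual CFI formula or how the ± margin is derived.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class QueueEntry:
    megawatts: float
    committed: bool             # hypothetical flag, e.g. a signed interconnection agreement
    corroborating_signals: int  # independent signals matched to this entry


def cfi_sketch(queue: list[QueueEntry], min_signals: int = 2) -> Optional[float]:
    """Megawatt share of the queue that is committed AND corroborated, on 0-100.

    Speculative entries only add to the denominator, so an inflated queue
    lowers the result. No queue data returns None: a coverage gap, not a 0.
    """
    total = sum(e.megawatts for e in queue)
    if total == 0:
        return None
    backed = sum(
        e.megawatts
        for e in queue
        if e.committed and e.corroborating_signals >= min_signals
    )
    return 100.0 * backed / total
```

For example, 300 MW committed and corroborated in a 400 MW queue gives 75; adding 200 MW of speculative requests drops it to 50.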

The ± next to the score is the calculation's margin of uncertainty — also exposed, not hidden.

strong signals, committed queue

illustrative sample

78 ±6

REAL

"The buildout here is real."

Scale: 0–45 phantom · 45–68 not sure · 68–100 real

weak signals, no counterpart

illustrative sample

22 ±9

PHANTOM

"Announcement with no counterpart in the queue — signals don't corroborate."

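The scale maps a score to a label. A minimal sketch, assuming each cut-off (45 and 68) belongs to the higher band, since the scale doesn't say which side it falls on; `cfi_band` is a hypothetical name.

```python
def cfi_band(score: float) -> str:
    """Map a 0-100 CFI score to its label on the scale.

    Cut-offs 45 and 68 come from the scale; assigning each cut-off to
    the higher band is an assumption.
    """
    if not 0 <= score <= 100:
        raise ValueError("CFI scores run from 0 to 100")
    if score < 45:
        return "PHANTOM"
    if score < 68:
        return "NOT SURE"
    return "REAL"
```

Applied to the samples, 78 ±6 spans 72 to 84 and stays inside REAL, while 22 ±9 spans 13 to 31 and stays inside PHANTOM.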

The limits — spelled out

Earning trust means being upfront about where the system doesn't reach yet.

Geographic coverage

Today we cover the US, at county/state level. We don't reach node or feeder level yet — the granularity some operational decisions require.

Source: news vs. filing

About 93% of the sources shown today are news, not official documents (filings). We mark which is which on every item, and we never treat one as the other.

Descriptive, not predictive

Today's score is descriptive: it measures corroboration as it stands now. A validated predictive track record won't be complete until around December 2026, and we don't sell accuracy we haven't proven yet.

Coverage gap ≠ phantom

Absence of signal isn't proof of a phantom; it's a coverage gap, and we say which gap it is.

Snapshot, not live feed

Figures shown across the site, including illustrative visualizations, reflect the latest verified pull from public sources on a set schedule — not a real-time feed.

The ledger — check it yourself

Every score is sealed with a hash the moment it’s calculated. The seal records the date, the sources used, and the calculation that produced the number — nothing about it changes afterward without a trace. "Recompute it" leads to the methodology and the sources behind that specific score, so anyone can check the number’s path.
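Checking a seal comes down to recomputing a hash and comparing. The sketch below assumes SHA-256 over a canonical JSON encoding and a simple hash chain, where each entry commits to the previous one, so editing any earlier record breaks verification. The page doesn't state the actual hash function or ledger format; the record fields and the names `seal`, `append`, and `verify_ledger` are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry (assumption)


def seal(payload: dict) -> str:
    """SHA-256 hex digest of a canonical JSON encoding (sorted keys, no spaces)."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(ledger: list, record: dict) -> str:
    """Seal a record together with the previous entry's hash and append it."""
    prev = ledger[-1]["hash"] if ledger else GENESIS
    entry_hash = seal({"prev": prev, "record": record})
    ledger.append({"prev": prev, "record": dict(record), "hash": entry_hash})
    return entry_hash


def verify_ledger(ledger: list) -> bool:
    """Recompute every seal in order; any edited record or broken link fails."""
    prev = GENESIS
    for entry in ledger:
        if entry["prev"] != prev:
            return False
        if seal({"prev": prev, "record": entry["record"]}) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Changing a single field of a stored record, such as its score, makes `verify_ledger` return `False`, which is what "nothing changes without a trace" means in practice.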

sealed · ledger #a1b2c3 · recompute it ↗

illustrative example of the chip — every real score carries its own hash.

Straight questions

Do you have a track record of being right?

Not yet. That record won't be complete until around December 2026. Today we sell what's descriptive and checkable, not the prediction. We don't claim accuracy we haven't proven.

Is this opinion, or an index?

An index. We don't give an analyst's opinion — every number is the result of a rule applied to public signals, traceable back to the source.

How do you avoid counting the same project twice?

Sometimes the same project appears in the interconnection queue under several different filings, which inflates the queue. The system cross-references internal queue identifiers, filing dates, and request metadata to flag likely duplicates as "SUSPECT," with an explicit confidence level. That check already runs in the methodology behind every score; public, browsable access to those identifiers will come in a later phase of the site.
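The duplicate check can be sketched as a pairwise comparison that turns matching criteria into a confidence. Everything specific here is an assumption: the `QueueRequest` fields (including `poi`, the point of interconnection), the 90-day and 5% tolerances, equal weighting of the four checks, and the 0.75 threshold for flagging "SUSPECT".

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class QueueRequest:
    queue_id: str   # internal queue identifier
    filed: date
    developer: str  # hypothetical metadata field
    poi: str        # point of interconnection (hypothetical field)
    megawatts: float


def match_confidence(a: QueueRequest, b: QueueRequest,
                     max_days: int = 90, mw_tolerance: float = 0.05) -> float:
    """Fraction of (equally weighted, hypothetical) checks on which two requests agree."""
    checks = [
        a.poi == b.poi,
        a.developer == b.developer,
        abs((a.filed - b.filed).days) <= max_days,
        abs(a.megawatts - b.megawatts) <= mw_tolerance * max(a.megawatts, b.megawatts),
    ]
    return sum(checks) / len(checks)


def flag_suspects(requests: list[QueueRequest], threshold: float = 0.75):
    """Return (id_a, id_b, "SUSPECT", confidence) for likely duplicate pairs."""
    flags = []
    for a, b in combinations(requests, 2):
        if a.queue_id == b.queue_id:
            continue  # same filing, not a duplicate
        confidence = match_confidence(a, b)
        if confidence >= threshold:
            flags.append((a.queue_id, b.queue_id, "SUSPECT", confidence))
    return flags
```

Keeping the confidence next to the flag, rather than deleting the suspected copy, leaves the final call traceable and reversible.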

Want to see this applied to a real project?

Request sample report →