Methodology
The methodology, exposed start to finish.
We don't ask you to trust us — we ask you to check. Every score carries the calculation that produced it: the sources, the cross-referenced signals, the hash that seals the result. This page explains, step by step, how we get to a number.
The pipeline, in depth
The volumes below reflect the system's operation today — they grow with new sources and change over time; they aren't a fixed commitment.
Collection
About 95,000 documents a day as of today, from 63 global public sources: news, SAM.gov, SEC EDGAR, the interconnection queues of regional grid operators (ISOs), sector RSS feeds, and USPTO.
Deterministic filter
Code-based rules, not AI, read the collected documents and discard what isn't relevant before any AI cost is incurred. Volume drops to about 190 documents a day, and only those move on to analysis.
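As an illustration only, a code-based relevance rule might look like the sketch below. The `Document` fields, the keyword pattern, and the excluded source name are hypothetical; the actual rule set isn't published on this page.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    source: str
    title: str
    body: str


# Hypothetical keyword rule: keep documents that mention grid or compute buildout terms.
RELEVANT = re.compile(
    r"\b(data cent(?:er|re)|interconnection|substation|megawatts?|MW)\b",
    re.IGNORECASE,
)
# Hypothetical source rule: feeds that are dropped regardless of content.
EXCLUDED_SOURCES = frozenset({"sports-rss"})


def is_relevant(doc: Document) -> bool:
    """Apply the deterministic rules; no model call is involved."""
    if doc.source in EXCLUDED_SOURCES:
        return False
    return RELEVANT.search(f"{doc.title}\n{doc.body}") is not None


def filter_documents(docs):
    """Return only the documents that move on to analysis."""
    return [d for d in docs if is_relevant(d)]
```

Because the rules are plain code, the same input always gives the same output, and a discarded document never reaches the AI step.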
Extraction
An AI reads each of the filtered documents and extracts the entities that matter: who, what, where, how much, when.
Correlation + Score
Deterministic logic (SQL and Python, not AI) cross-references independent signals — environmental permit, energy contract, federal registration, hiring. The more signals converge, the higher the region's score.
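The convergence idea can be sketched in a few lines of Python. This is a simplified illustration, not the production logic: the signal names mirror the four examples above, the region labels are placeholders, and counting distinct signal types with equal weight is an assumption.

```python
from collections import defaultdict

# The four signal types named above; equal weighting is an assumption.
SIGNAL_TYPES = ("environmental_permit", "energy_contract", "federal_registration", "hiring")


def converging_signals(signals):
    """Map each region to the set of distinct recognized signal types seen there."""
    by_region = defaultdict(set)
    for region, signal_type in signals:
        if signal_type in SIGNAL_TYPES:
            by_region[region].add(signal_type)
    return dict(by_region)


def convergence(signals):
    """Fraction of signal types present per region (0.0 to 1.0)."""
    return {
        region: len(types) / len(SIGNAL_TYPES)
        for region, types in converging_signals(signals).items()
    }


observed = [
    ("region-a", "environmental_permit"),
    ("region-a", "energy_contract"),
    ("region-a", "energy_contract"),  # a repeat of the same type adds nothing
    ("region-a", "hiring"),
    ("region-b", "hiring"),
]
print(convergence(observed))
```

Counting distinct types rather than raw mentions is the point of the design: ten articles about the same contract are still one signal.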
Prediction + Proof
The result is dated and sealed in an immutable ledger, identified by hash. Over time, every prediction becomes part of a public track record — including when we're wrong.
The score: CFI 0–100
The Compute Formation Index (CFI) gives a 0–100 score to each of the 49 monitored regions. The score doesn't count announcements — it measures the committed fraction of the interconnection queue that we can corroborate with independent signals. A queue inflated by speculative requests counts against the score, not for it.
The ± next to the score is the calculation's margin of uncertainty — also exposed, not hidden.
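The exact CFI formula isn't published on this page, so the following is only a sketch of how a committed-fraction score with a ± margin could be computed. It assumes each queue request carries a size in megawatts and one of three hypothetical statuses, and it treats requests with unknown status as the source of the uncertainty band.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueueRequest:
    megawatts: float
    status: str  # hypothetical: "corroborated", "uncorroborated", or "unknown"


def cfi_with_margin(requests):
    """Return (score, margin) on a 0-100 scale.

    Lower bound: unknown requests are treated as not committed.
    Upper bound: unknown requests are treated as committed.
    The score is the midpoint and the margin is half the band width.
    """
    total = sum(r.megawatts for r in requests)
    if total <= 0:
        raise ValueError("queue has no capacity to score")
    corroborated = sum(r.megawatts for r in requests if r.status == "corroborated")
    unknown = sum(r.megawatts for r in requests if r.status == "unknown")
    low = 100.0 * corroborated / total
    high = 100.0 * (corroborated + unknown) / total
    return (low + high) / 2.0, (high - low) / 2.0


queue = [
    QueueRequest(300.0, "corroborated"),
    QueueRequest(100.0, "unknown"),
    QueueRequest(600.0, "uncorroborated"),
]
score, margin = cfi_with_margin(queue)
print(f"CFI {score:.0f} ± {margin:.0f}")
```

In this sketch, uncorroborated requests enlarge the denominator without adding to the numerator, which is how a queue inflated by speculative requests lowers the score.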
Illustrative sample, strong signals and committed queue: "The buildout here is real."
Illustrative sample, weak signals and no counterpart: "Announcement with no counterpart in the queue; signals don't corroborate."
The limits — spelled out
Earning trust means being upfront about where the system doesn't reach yet.
Geographic coverage
Today we cover the US, at county/state level. We don't reach node or feeder level yet — the granularity some operational decisions require.
Source: news vs. filing
About 93% of the sources shown today are news items, not official documents (filings). We mark which is which on every item, and we never treat one as the other.
Descriptive, not predictive
Today's score is descriptive: it measures corroboration now. A validated predictive track record will only be complete around December 2026, and we don't sell accuracy we haven't proven yet.
Coverage gap ≠ phantom
The absence of a signal isn't proof of a phantom project; it's a coverage gap, and we say which one.
Snapshot, not live feed
Figures shown across the site, including illustrative visualizations, reflect the latest verified pull from public sources, which runs on a set schedule. They are not a real-time feed.
The ledger — check it yourself
Every score is sealed with a hash the moment it's calculated. The seal records the date, the sources used, and the calculation that produced the number; nothing about it changes afterward without a trace. Selecting "Recompute it" leads to the methodology and the sources behind that specific score, so anyone can check the number's path.
illustrative example of the chip — every real score carries its own hash.
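To make the sealing idea concrete, here is a minimal sketch of a hash-chained ledger in Python. It assumes SHA-256 over a canonical JSON encoding and a chain in which each entry stores the previous entry's hash; the record fields and values are placeholders, and the page doesn't specify which hash function or record layout the real ledger uses.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def seal(payload: dict) -> str:
    """Hash a canonical JSON encoding so key order can't change the result."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_entry(ledger: list, record: dict) -> str:
    """Seal a record together with the previous entry's hash and append it."""
    previous = ledger[-1]["hash"] if ledger else GENESIS
    entry_hash = seal({"record": record, "previous": previous})
    ledger.append({"record": record, "previous": previous, "hash": entry_hash})
    return entry_hash


def verify_ledger(ledger: list) -> bool:
    """Recompute every hash; any edit to an earlier record breaks the chain."""
    previous = GENESIS
    for entry in ledger:
        body = {"record": entry["record"], "previous": entry["previous"]}
        if entry["previous"] != previous or seal(body) != entry["hash"]:
            return False
        previous = entry["hash"]
    return True


ledger = []
append_entry(ledger, {"region": "region-a", "score": 35, "margin": 5,
                      "date": "2025-01-01", "sources": ["filing-1", "news-7"]})
append_entry(ledger, {"region": "region-b", "score": 12, "margin": 3,
                      "date": "2025-01-01", "sources": ["news-2"]})
print(verify_ledger(ledger))
```

Anyone holding the records can rerun `verify_ledger` without trusting the publisher, which is the property a "check it yourself" ledger depends on.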
Straight questions
Do you have a track record of being right?
Not yet. A validated track record will only be complete around December 2026. Today we sell what's descriptive and checkable, not the prediction. We don't claim accuracy we haven't proven.
Is this opinion, or an index?
An index. We don't give an analyst's opinion — every number is the result of a rule applied to public signals, traceable back to the source.
How do you avoid counting the same project twice?
Interconnection requests sometimes duplicate the same project under different filings — which inflates the queue. The system cross-references internal queue identifiers, filing dates, and request metadata to flag likely duplicates as "SUSPECT," with an explicit confidence level. That check already runs in the methodology behind every score; public, browsable exposure of those identifiers arrives in a later phase of the site.
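A duplicate check of this kind could be sketched as follows. The `Request` fields, the point weights, the 5% size tolerance, the 30-day window, and the threshold are all hypothetical values chosen for illustration; the real matching rules and confidence scale aren't described on this page.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Request:
    queue_id: str
    county: str
    megawatts: float
    filed: date
    developer: str


def duplicate_confidence(a: Request, b: Request) -> int:
    """Return a 0-100 confidence that two requests describe the same project."""
    if a.county != b.county:
        return 0
    points = 0
    if abs(a.megawatts - b.megawatts) <= 0.05 * max(a.megawatts, b.megawatts):
        points += 40  # similar size
    if abs((a.filed - b.filed).days) <= 30:
        points += 30  # filed close together
    if a.developer.strip().lower() == b.developer.strip().lower():
        points += 30  # same developer after normalization
    return points


def flag_suspects(requests, threshold=70):
    """List request pairs whose confidence meets the threshold, labeled SUSPECT."""
    flagged = []
    for a, b in combinations(requests, 2):
        confidence = duplicate_confidence(a, b)
        if confidence >= threshold:
            flagged.append((a.queue_id, b.queue_id, "SUSPECT", confidence))
    return flagged


requests = [
    Request("Q-001", "Example County", 200.0, date(2025, 1, 10), "Acme Power"),
    Request("Q-017", "Example County", 205.0, date(2025, 1, 25), "ACME Power "),
    Request("Q-042", "Other County", 200.0, date(2025, 1, 10), "Acme Power"),
]
print(flag_suspects(requests))
```

Flagging rather than deleting keeps the decision visible: a pair stays in the data with its label and confidence, so a reviewer can see why it was discounted.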