Side note: the diagrams in this post are kind of AI slop, but I wrote everything else. I will get around to fixing the diagrams later.
There are many ways to build an NLP system that understands the large corpus of U.S. nuclear compliance material. One approach, which some Everstar competitors took, is to fine-tune an existing LLM on nuclear text. Another is to train a foundation model from scratch on nuclear documents, baking the domain into the weights. Both are a poor fit for this problem.
In the early LLM cycle there was a lot of hype around fine-tuning models and training domain models from scratch, but the economics and the shape of the data usually point the other way. It made more sense to build a retrieval-heavy GPT wrapper than to train a model. That was clear in 2023 and has only become clearer since. If you have dug through ADAMS (the NRC's Agencywide Documents Access and Management System), you know much of it is operationally irrelevant for licensing retrieval: meeting notices, routine memos, transmittals, duplicates, and other low-signal records. There is not enough high-quality, task-specific supervision to make model training the center of the system.
That leaves an agentic system whose quality depends on retrieval. The model does not know the regulatory corpus on its own, so it needs an efficient way to pull exact source material into context. This was one of my primary jobs at Everstar: ingest public and private documents so agents could understand the nuclear compliance landscape and synthesize new documents from prior precedents.
The documents ingested for Gordian came from many sources:
NRC ADAMS; eCFR Title 10; 10 CFR Parts 20, 35, 37, 50, 52, 53, 71, 73; 10 CFR 50 Appendix B; Part 21-style compliance; NUREG-1556 Vol. 17; NUREG-1801; NUREG-2191; SRPs; RAIs; NRC/licensee correspondence; NQA-1; DOE 10 CFR 830; DOE-STD-3009-2014; DOE O 433.1B; DOD AR 50-7.
ADAMS had most of the data we needed, and the rest could be scraped from government websites such as eCFR. ADAMS also had a painful search interface, which made downloading and indexing it for internal use even more important. The harder part was knowing which documents mattered. ADAMS contains millions of records, including both accepted and rejected compliance applications, so you cannot blindly treat every document as precedent. Theresa, our chief nuclear officer, and Excel Services, a nuclear compliance consultancy, helped identify which documents should be available for lookup and which should form the core of the nuclear brain.
The two primary methods of indexing large corpora of text are semantic search and lexical search. Semantic search embeds chunks into vector space and retrieves by meaning; lexical search builds an inverted index over tokens and usually scores exact term overlap with BM25. In practice, the useful system is hybrid: vectors catch paraphrase, BM25 catches citations, part numbers, acronyms, and exact regulatory language.
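To make the hybrid idea concrete, here is a minimal sketch, not the production pipeline. It scores documents with a from-scratch Okapi BM25, ranks them separately by cosine similarity over embeddings, and merges the two rankings with reciprocal rank fusion (RRF). RRF is my assumption for the merge step; other fusion schemes work too. The documents are toy text, the vectors are hand-written stand-ins for a real embedding model, and every function name is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization keeps citations such as "50.59" intact.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(toks) / avg_len))
        scores.append(score)
    return scores


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ranked_ids(scores, keep_zero=True):
    # Document indices sorted best-first; optionally drop non-matching docs.
    ids = [i for i, s in enumerate(scores) if keep_zero or s > 0]
    return sorted(ids, key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    # Each ranking contributes 1 / (k + rank) per document it contains.
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


def hybrid_search(query, query_vec, docs, doc_vecs):
    lexical = ranked_ids(bm25_scores(query, docs), keep_zero=False)
    dense = ranked_ids([cosine(query_vec, v) for v in doc_vecs])
    return reciprocal_rank_fusion([lexical, dense])


# Toy corpus and hand-written 2-d "embeddings" (assumptions, for illustration only).
docs = [
    "10 CFR 50.59 governs changes, tests, and experiments",
    "Licensees may modify a facility without prior approval in some cases",
    "Meeting notice for a public workshop",
]
doc_vecs = [[0.6, 0.4], [0.9, 0.2], [0.0, 1.0]]
query, query_vec = "10 CFR 50.59", [0.9, 0.1]

dense_only = ranked_ids([cosine(query_vec, v) for v in doc_vecs])
fused = hybrid_search(query, query_vec, docs, doc_vecs)
print("dense only:", dense_only)
print("hybrid:", fused)
```

In this toy setup the dense ranking alone prefers the paraphrase, while the exact citation match from BM25 lifts the document that actually names the regulation to the top of the fused list.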
The most interesting retrieval result came from contextual embeddings, a method popularized by posts like Anthropic's write-up on contextual retrieval. Contextual embeddings are especially useful for nuclear compliance documents, which can run hundreds or thousands of pages. A raw chunk often loses the document-level meaning that explains why the paragraph matters, so I generated compact context for each chunk before embedding it. To compare methods, we built an internal nuclear-specific retrieval benchmark with more than 2,000 examples. I used it to compare our ingestion path against plain embedding baselines and Atomic Canyon's public Fermi sparse retriever. In our tests, Fermi's learned sparse representation behaved more like lexical retrieval than like a dense semantic encoder.
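The sketch below illustrates the idea of contextualizing a chunk before embedding it and then scoring retrieval with recall@k. It is an illustration under assumptions, not our ingestion code: `build_context` is a hypothetical placeholder for the step where a model writes a short context from the full document, `overlap_score` is a toy token-overlap stand-in for embedding similarity, and the two documents are invented.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    doc_title: str
    section: str
    text: str


def build_context(chunk):
    # Hypothetical placeholder: a real pipeline would have a model write
    # this summary after reading the whole source document.
    return f"This excerpt is from '{chunk.doc_title}', section '{chunk.section}'."


def contextualize(chunk):
    # The context is prepended so it is embedded together with the chunk.
    return f"{build_context(chunk)}\n\n{chunk.text}"


def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(query, text):
    # Toy stand-in for embedding similarity (Jaccard overlap of tokens).
    q, t = tokens(query), tokens(text)
    return len(q & t) / len(q | t) if (q | t) else 0.0


def rank(query, texts_by_id):
    # Best-first chunk ids; ties fall back to id order.
    return sorted(texts_by_id, key=lambda cid: (-overlap_score(query, texts_by_id[cid]), cid))


def recall_at_k(ranked, relevant, k):
    # Fraction of the relevant chunks that appear in the top k results.
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & set(relevant)) / len(relevant)


# Invented chunks: the raw texts are nearly interchangeable out of context.
chunks = [
    Chunk("a", "Example Materials License Guide", "Leak Testing",
          "The program inspects sealed sources every six months."),
    Chunk("b", "Example Plant SLR Application", "Aging Management",
          "The program inspects buried piping every ten years."),
]
query = "How often does the SLR aging management program inspect?"
relevant = {"b"}

raw_ranking = rank(query, {c.chunk_id: c.text for c in chunks})
ctx_ranking = rank(query, {c.chunk_id: contextualize(c) for c in chunks})
print("raw recall@1:", recall_at_k(raw_ranking, relevant, 1))
print("contextual recall@1:", recall_at_k(ctx_ranking, relevant, 1))
```

With raw text the two chunks score identically against the query, so the right one only surfaces by luck of ordering. Once the document title and section travel with the chunk, the query has something to match on. A benchmark is the same loop over many (query, relevant chunk) pairs with the real retriever swapped in.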
The most direct product to build with a nuclear brain is a document search engine, so that is what we set up first. We made "Better ADAMS": natural-language document search that returned the relevant document, the exact supporting quote, and an explanation of why the quote answered the question.
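A result with that shape (document, exact quote, explanation) can be sketched as a small record plus a check that the quote really appears in the source. This is a hypothetical structure, not the Better ADAMS schema: the field names, the `quote_is_grounded` helper, and the sample document are all assumptions for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchHit:
    doc_id: str        # hypothetical identifier for the source document
    quote: str         # passage claimed to come verbatim from the document
    explanation: str   # why the quote answers the question


def normalize(text):
    # Collapse whitespace and case so line breaks from extraction
    # do not cause false mismatches.
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_is_grounded(hit, documents):
    """True only if the quote occurs in the referenced document's text."""
    source = documents.get(hit.doc_id)
    if source is None:
        return False
    return normalize(hit.quote) in normalize(source)


# Invented document text, for illustration only.
documents = {"DOC-1": "The licensee shall\n  maintain records for three years."}

good = SearchHit("DOC-1", "shall maintain records", "States the retention duty.")
bad = SearchHit("DOC-1", "shall maintain records for five years", "Misquoted duration.")

print(quote_is_grounded(good, documents))
print(quote_is_grounded(bad, documents))
```

A substring check like this is deliberately strict: a hit whose quote cannot be located in the source is dropped or regenerated rather than shown to the user.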
The second obvious product was a nuclear chatbot. I built this in two forms. First, I hosted a remote nuclear MCP (Model Context Protocol) server that exposed our document database, so users could plug in their preferred LLM and have it answer nuclear questions, pull sources, and dig for connections. Second, we built an internal agentic chat that ran more thinking, validation, and source-attribution steps before returning a response. The internal path had higher latency but produced higher-quality answers.
The lofty goal for Everstar while I was there was to generate a full Subsequent License Renewal (SLR) application draft, start to finish, that would need minimal revision. That could save millions of dollars on producing one of these applications. Retrieval was the first step, but draft generation had a separate architecture, so I will save that for another post.