Why we took the calculator away from the model

From the DocuStrata team · July 2026

Here is an uncomfortable fact about the technology our product is built on: large language models are extraordinary readers and unreliable calculators. Anyone building AI document tools knows this. Almost nobody designs around it, because designing around it means admitting it, and admitting it complicates the demo.

We'd rather complicate the demo. If DocuStrata's job is to read the documents you never will — loan disclosures, policies, contracts, statements — then a meaningful fraction of the questions you'll ask are money questions. What do I still owe? What's the total of payments? What does this fee schedule actually add up to? And a money answer that's fluent, confident, and wrong is worse than no answer at all, because it's wearing the costume of a right one.

What language models actually do with numbers

A language model doesn't compute. It predicts text. When it "multiplies" 59 by 415, it's producing the sequence of digits that seems most likely to follow, based on patterns in everything it has read. For famous arithmetic, that works. For your loan's arithmetic — numbers that have never appeared together anywhere in the world before your disclosure was printed — it works often enough to build false confidence and fails often enough to be disqualifying. The failures aren't dramatic. They're a transposed digit, a dropped carry, a plausible total that's off by a few hundred dollars. Exactly the kind of error a human skims past.

We are candid about this because we tripped over it ourselves — twice — during development, watching the model produce money math that looked right and wasn't. Two strikes was enough. The conclusion wasn't "prompt it better." The conclusion was architectural: the model should never be the thing doing arithmetic.

The calc step

So in DocuStrata, financial answers are produced by a division of labor.

The model does what it is genuinely superhuman at: reading. It finds the operative terms in your documents — the payment amount, the count, the balloon, the rate, the dates — and it cites where each one came from.

Then those terms go to a deterministic calculation step, running server-side, doing arithmetic the way a calculator does: exactly, every time, with no creativity. The number you see was computed, not composed.

Take the disclosure that says 59 payments of $415 with a final balloon of $9,115. The extracted terms are cited to the page. The math is then just math: 59 × $415 is $24,485, plus the $9,115 balloon is $33,600 — and if the document states a total of payments, the calc step reconciles against it and tells you whether the paper's own numbers agree with each other. Sometimes they don't. That's worth knowing too, and it's precisely the kind of thing a text predictor would smooth over rather than surface.
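As a rough illustration, the arithmetic and the reconciliation above can be sketched in a few lines of Python. This is a minimal sketch, not our production code: the names LoanTerms, total_of_payments, and reconcile are hypothetical, and it assumes the extracted terms arrive as exact decimal amounts.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class LoanTerms:
    """Terms extracted by the reader (hypothetical shape, for illustration only)."""
    payment_count: int
    payment_amount: Decimal
    balloon: Decimal
    stated_total: Optional[Decimal] = None  # total of payments, if the document prints one


def total_of_payments(terms: LoanTerms) -> Decimal:
    # Exact decimal arithmetic: no floats, no text prediction.
    return terms.payment_count * terms.payment_amount + terms.balloon


def reconcile(terms: LoanTerms) -> Optional[Decimal]:
    """Return computed total minus stated total, or None if no total is stated."""
    if terms.stated_total is None:
        return None
    return total_of_payments(terms) - terms.stated_total


# The disclosure from the example: 59 payments of $415 plus a $9,115 balloon.
terms = LoanTerms(59, Decimal("415.00"), Decimal("9115.00"),
                  stated_total=Decimal("33600.00"))
print(total_of_payments(terms))  # 33600.00
print(reconcile(terms))          # 0.00
```

A nonzero result from reconcile is the signal that the paper's own numbers disagree. Using Decimal rather than floating point keeps cents exact, which is the point of taking the arithmetic away from the model in the first place.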

The result self-audits. You can check the inputs against the cited passages in seconds, and the arithmetic on top of verified inputs is not a matter of trust at all.

Citations are the same idea, applied to words

The calc step is one instance of a general principle: when you delegate reading, verification becomes the product. The attention you no longer spend reading has to be replaced by something, and the only honest replacement is verification cheap enough that you'll actually do it.

That's what citations are for. Every answer in DocuStrata is grounded in passages from your own documents, and the passages are one click away. Not a bibliography gesture — an audit trail. The design target is that a skeptical reader can go from "answer" to "the sentence in the source that supports it" in under a minute. And when your documents don't support an answer, the product says so rather than improvising, because a system that admits ignorance is usable and a system that never does is a liability.
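One way to picture that rule is as a data shape in which an answer cannot exist without its supporting passages. The sketch below is an assumption made for illustration: Citation, Answer, finalize, and the wording of NOT_SUPPORTED are hypothetical names, not the product's actual interface.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Citation:
    document: str  # hypothetical document identifier
    page: int
    passage: str   # the supporting sentence, verbatim from the source


@dataclass(frozen=True)
class Answer:
    text: str
    citations: List[Citation] = field(default_factory=list)


# Hypothetical wording for the "documents don't support this" case.
NOT_SUPPORTED = "Your documents don't support an answer to this question."


def finalize(draft: str, citations: List[Citation]) -> Answer:
    # A draft with no supporting passage is replaced by an explicit
    # admission rather than shown to the user as if it were grounded.
    if not citations:
        return Answer(NOT_SUPPORTED)
    return Answer(draft, list(citations))


# Illustrative data only; the document name and page are made up.
grounded = finalize(
    "The total of payments is $33,600.",
    [Citation("loan-disclosure.pdf", 2, "59 payments of $415 with a final balloon of $9,115")],
)
ungrounded = finalize("The rate resets after year five.", [])
print(grounded.citations[0].page)  # 2
print(ungrounded.text == NOT_SUPPORTED)  # True
```

The design choice worth noting is that the refusal happens in ordinary code, after the model has drafted, so it does not depend on the model deciding to admit ignorance.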

Why architectural honesty matters in this category

There's a version of this essay we could have written with the seams hidden — "our advanced AI accurately analyzes your financial documents" — and it would demo identically. We think that version is how this product category dies. The first plausible wrong number attached to a real decision doesn't just burn one answer; it burns the delegation itself, and the user goes back to reading everything by hand, now with an extra subscription.

The alternative is to build like an instrument rather than an oracle. In precision work, you never trust a measurement you can't trace — you measure the part, you calibrate the gauge, and the number is the number because the process that produced it is inspectable. Same standard here: model reads, math computes, citations trace, and every link in that chain is one you can check.

Read nothing — but check anything, in seconds. That's the whole design.

Read nothing. Know everything. — docustrata.com

Answers are grounded in your own documents with citations; financial figures are computed server-side. Your documents are never used to train AI models.