An AI underwriting assistant adopted by a 120-person credit operation in 10 weeks

2026-04-08 / 4 min / ai / llm / underwriting / fintech / production

Not a model demo. A workflow tool the credit team actually opened every morning. Built in 10 weeks, took manual review off the top decile of cases, and saved roughly five minutes of handling time per accepted draft against the pre-launch six-minute baseline. Here is how it shipped without an LLM-replaces-humans pitch.

The setup

A mid-size fintech with roughly 120 people on the credit operation. Underwriters reviewed every loan application by hand, averaging about six minutes per case, with a long tail of variance. The top 30% of applications were obvious approvals, the bottom 20% obvious declines, and the middle 50% required real judgment. The team was good. The problem was that they spent most of their day on cases that did not need their judgment, and the cases that did need it sat in a queue.

The timing and adoption claims here refer to the initial 10-week rollout. Handling-time changes were measured from queue instrumentation against the pre-launch six-minute average, not from self-reported estimates.

The brief from the head of credit was deliberately narrow. They did not want an autonomous underwriter. They wanted a tool that drafted the case review for the underwriter to accept, edit, or reject. The credit policy, the scorecards, and the regulatory posture would not move.

What we did not build

We did not replace the existing credit scorecard or any policy logic. Those were business and compliance artefacts the team had spent years tuning, and they were correct for the cases they covered.

We did not ship anything that could auto-decision a real case in the first six weeks. The model drafted, the human submitted. That separation was the entire reason adoption worked.

The system

Three retrieval steps fed every draft. First, the case dossier: the application, the supporting documents, the scorecard output, and any prior interactions. Second, the current credit policy, segmented by product and risk band, with citations the model could reference. Third, a small set of historic decisions on similar cases with the original underwriter’s notes attached.
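
As a rough illustration of the three retrieval steps, here is a minimal sketch. The field names (`product`, `risk_band`, `amount`), the `PolicyClause` and `DraftContext` types, and the "closest loan amount" similarity rule are all assumptions made for the example, not the production retrieval logic.

```python
from dataclasses import dataclass, field


@dataclass
class PolicyClause:
    clause_id: str   # hypothetical identifier, e.g. "PL-4.2"
    product: str
    risk_band: str
    text: str


@dataclass
class DraftContext:
    dossier: dict                      # step 1: the case itself
    policy_clauses: list[PolicyClause]  # step 2: policy for this product and band
    precedents: list[dict] = field(default_factory=list)  # step 3: similar past cases


def build_context(dossier, policy, history, max_precedents=3):
    """Assemble the three retrieval inputs for one case."""
    # Step 2: keep only the clauses for this product and risk band.
    clauses = [
        c for c in policy
        if c.product == dossier["product"] and c.risk_band == dossier["risk_band"]
    ]
    # Step 3: stand-in similarity rule: same product and band, closest amount first.
    similar = [
        h for h in history
        if h["product"] == dossier["product"] and h["risk_band"] == dossier["risk_band"]
    ]
    similar.sort(key=lambda h: abs(h["amount"] - dossier["amount"]))
    return DraftContext(dossier, clauses, similar[:max_precedents])
```

Keeping the precedent set small is deliberate in this sketch: each precedent carries the original underwriter's notes, so a handful of close matches is easier for the model to cite than a long list.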

The LLM produced a structured review: a one-paragraph summary, three to five risk callouts, a recommended decision, and a citation into the credit policy for every claim it made. Output was strict JSON, and every citation was verified after generation. If the model cited a policy clause that did not say what it claimed, the draft was held and the underwriter saw a flag.
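
A minimal sketch of what strict output plus a first-pass citation check could look like. The JSON field names, the `clause_id` and `quote` keys, and the rule that a citation must quote its clause verbatim are assumptions for illustration; the real schema and check are not described here.

```python
import json

# Hypothetical field names for the four parts of the review.
REQUIRED_FIELDS = {"summary", "risk_callouts", "recommended_decision", "citations"}


def parse_review(raw: str) -> dict:
    """Parse the model output and reject anything off-schema."""
    review = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on non-JSON
    if not isinstance(review, dict):
        raise ValueError("review must be a JSON object")
    missing = REQUIRED_FIELDS - review.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not 3 <= len(review["risk_callouts"]) <= 5:
        raise ValueError("expected three to five risk callouts")
    return review


def unverified_citations(review: dict, policy: dict[str, str]) -> list[dict]:
    """Return citations whose quoted text is not found in the cited clause."""
    bad = []
    for cite in review["citations"]:
        clause_text = policy.get(cite.get("clause_id", ""), "").lower()
        quote = cite.get("quote", "").strip().lower()
        # An empty quote or an unknown clause counts as unverified.
        if not quote or quote not in clause_text:
            bad.append(cite)
    return bad


def should_hold(review: dict, policy: dict[str, str]) -> bool:
    """Hold the draft and flag it if any citation fails verification."""
    return bool(unverified_citations(review, policy))
```

A verbatim-quote check like this only catches citations that point at the wrong clause or at text that is not there. It does not catch a claim that paraphrases the clause misleadingly.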

Every edit the underwriter made to the draft was logged with the original text, the final text, and the section. That edit stream became the feedback signal for the next iteration of the prompt and retrieval.
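
The edit log can be pictured as one record per changed section. In this sketch the `DraftEdit` type, the `case_id` and `edited_at` fields, and the JSON Lines serialisation are assumptions; only the original text, final text, and section come from the description above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class DraftEdit:
    case_id: str        # assumed: needed to join edits back to cases
    section: str
    original_text: str
    final_text: str
    edited_at: str      # assumed: ISO 8601 timestamp in UTC


def record_edits(case_id: str, draft: dict, submitted: dict) -> list[DraftEdit]:
    """Emit one DraftEdit for each drafted section whose text changed."""
    now = datetime.now(timezone.utc).isoformat()
    return [
        DraftEdit(case_id, section, text, submitted.get(section, ""), now)
        for section, text in draft.items()
        if submitted.get(section, "") != text
    ]


def to_jsonl(edits: list[DraftEdit]) -> str:
    """Serialise edits as JSON Lines, one record per line."""
    return "\n".join(json.dumps(asdict(e)) for e in edits)
```

Unchanged sections produce no record, so a draft accepted as-is leaves an empty edit list.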

What made adoption work

Underwriters had veto power by default. The model never auto-submitted in the first six weeks. The tool sat next to their existing queue and offered drafts; they could ignore it entirely if they wanted. They didn’t, because from week two onward accepted-draft cases saved roughly five minutes of handling time compared with the pre-launch average.

We instrumented which sections of the draft got rewritten most often and used that to retune retrieval and the prompt. The summary stabilised quickly. The risk callouts took longer; underwriters rewrote them often in the first two weeks because the model was over-indexing on stale risk signals.
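
One way to get the "rewritten most often" view from an edit log is a per-section rewrite rate. This is a sketch under assumptions: edits are dicts with hypothetical `case_id` and `section` keys, and the rate is the share of shown drafts in which a section was changed at least once.

```python
from collections import Counter


def rewrite_rates(edits, drafts_shown, sections):
    """Share of shown drafts in which each section was rewritten, highest first."""
    if drafts_shown <= 0:
        raise ValueError("drafts_shown must be positive")
    # Count each (case, section) pair once, even if it was edited repeatedly.
    pairs = {(e["case_id"], e["section"]) for e in edits}
    counts = Counter(section for _, section in pairs)
    rates = {s: counts[s] / drafts_shown for s in sections}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))
```

Tracked week by week, a rate that stays high for one section while the others fall points to where retrieval or the prompt needs work.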

Auto-approval was added only on the top decile of cases, only after four weeks of shadow scoring, and only with a daily review of a random sample by a senior underwriter. The bands grew slowly and never crossed into the middle distribution.
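
The three conditions on auto-approval can be sketched as a gate plus a sampler. The function names, the convention that a higher score means a safer case, and the 28-day shadow threshold (four weeks) expressed in days are assumptions for the example.

```python
import random


def auto_approve_cutoff(scores, top_fraction=0.10):
    """Score at or above which a case falls in the top fraction of the batch."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(scores, reverse=True)  # assumes higher score = safer case
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]


def route(case_score, cutoff, shadow_days, min_shadow_days=28):
    """Auto-approve only after the shadow period, and only at or above the cutoff."""
    if shadow_days < min_shadow_days:
        return "human_review"
    return "auto_approve" if case_score >= cutoff else "human_review"


def daily_audit_sample(auto_approved_ids, sample_size, seed=None):
    """Pick a random sample of the day's auto-approvals for senior review."""
    ids = list(auto_approved_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(sample_size, len(ids)))
```

Note that ties at the cutoff score can push slightly more than the intended fraction through the gate; a production version would need an explicit tie rule.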

What did not work

Free-form prose summaries lost to structured fields. The first version produced a paragraph the underwriter had to scan. The second version gave them four labelled fields they could read in seconds. Adoption jumped in the same week.

Citation verification was harder than it looked. The model would phrase a policy clause in a way that was technically supported by the source but practically misleading. We added a second pass that compared the claim against the cited clause sentence by sentence. The pass was expensive per token but cheap compared to a wrong decision.
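
The shape of a sentence-by-sentence second pass might look like the sketch below. The `judge` callable is hypothetical and injected: in practice it would be a model-based comparison, which is where the per-token cost comes from. The `token_overlap` judge here is a toy stand-in so the example runs on its own, and the sentence splitter is a simple regex, not a production tokenizer.

```python
import re
from typing import Callable


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def best_support(claim: str, clause_text: str,
                 judge: Callable[[str, str], float]) -> tuple[float, str]:
    """Return (score, sentence) for the clause sentence that best supports the claim."""
    scored = [(judge(claim, s), s) for s in split_sentences(clause_text)]
    return max(scored, key=lambda pair: pair[0], default=(0.0, ""))


def token_overlap(claim: str, sentence: str) -> float:
    """Toy judge: fraction of the claim's tokens that appear in the sentence."""
    claim_tokens = set(re.findall(r"[a-z0-9%]+", claim.lower()))
    sentence_tokens = set(re.findall(r"[a-z0-9%]+", sentence.lower()))
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & sentence_tokens) / len(claim_tokens)
```

Scoring against single sentences, not the clause as a whole, means a claim stitched together from fragments of different sentences scores low, which is one plausible way to surface the "technically supported, practically misleading" cases for a human to check.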

Latency budgets shifted mid-project. We had budgeted five seconds per draft, but the team turned out to be happy to wait fifteen for a good one. We rebalanced toward quality on the slower path and saved the fast path for the obvious cases.

The lesson worth keeping

LLM products for expert workflows do not win by replacing the expert. They win by removing the tedious parts of the expert’s job, in a shape the expert recognises. The time savings on middle-case underwriting came from drafting and structured layout, not from automation. The automation that did ship, on the obvious decile, was the smallest part of the project and the easiest to defend in a regulator’s office.

If you are building one of these, design the draft around the workflow the expert already has. Do not redesign the workflow first. The model is the cheap part. The trust path is the expensive part.

If you have a credit or underwriting operation and want to know whether an LLM tool could remove tedious work without redesigning the workflow, send a brief.