
From four hours to twelve seconds: how an eight-person team handles 1,200 tickets a day.

A back-office team at a UK financial services firm was drowning in inbound queries. We built an AI agent that handles the straightforward ones and routes everything else to a human, with full traceability back to source.

£2.4m+

Annual saving

40%

Auto-resolved

12 sec

Response time

6 wks

To production

The challenge

The team handled customer queries across three product lines: card payments, mortgage administration, and savings products. Each query landed in the same inbox and was triaged manually before being routed to a specialist. The team's average response time had crept from under an hour at launch to four hours. Recruiting more specialists wasn't a clean answer — the work was knowledge-heavy and onboarding took six months.

The pattern was visible in the data: about 40% of queries were repeat or near-repeat questions. A customer asks about a payment delay; a specialist looks up the same policy section and sends a near-identical answer. The other 60% were genuinely novel — unusual questions, or complex cases requiring judgement.

We were brought in after the operations director had taken quotes from three larger consultancies. All three had pitched a six-figure scoping engagement with a strategy phase before any build work. None of them had committed to a delivery timeline.

The diagnostic

We ran a two-week diagnostic. First, we sat with the team for two days, watched how they handled inbound queries, and read enough of the resolved tickets to understand the patterns. Then we did the numbers.

The diagnostic identified three things:

  • About 40% of inbound traffic could be handled by an agent reading the existing knowledge base and resolved-ticket history. Confidence scoring would catch the cases where the agent should defer.
  • The remaining 60% needed faster routing, not automation. The bottleneck wasn't reading the question — it was finding the right specialist for that question.
  • The data was usable. The resolved-ticket archive was structured enough to embed and retrieve. The policy document needed re-chunking but was readable.

We told the operations director the build would take six weeks, quoted a fixed price, and they started us the following Monday.

What we built

Architecture · runtime flow
Inbound query → Triage (classify type + product line) → Retrieve (policy + 3 yrs of resolved tickets) → Draft + confidence (candidate response + score). High confidence: auto-respond, flagged for spot-check. Low confidence: escalate to a specialist with the draft attached.

The system has three components:

Knowledge layer

We chunked the policy document and three years of resolved tickets, generated embeddings, and indexed them in a vector store with metadata (product line, query type, resolution date). Embedding strategy was paragraph-level for the policy document and resolution-summary-level for the tickets — the team had been writing one-line summaries on close-out, which turned out to be the right granularity for retrieval.
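As a rough sketch, those two embedding granularities can be expressed as chunkers that attach the retrieval metadata. Everything here is illustrative — field names and structure are assumptions, and the embedding model and vector store the chunks feed into are omitted:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict  # e.g. source, product_line, query_type, resolution_date

def chunk_policy(document: str) -> list[Chunk]:
    """Paragraph-level chunks for the policy document."""
    return [Chunk(text=p.strip(), metadata={"source": "policy"})
            for p in document.split("\n\n") if p.strip()]

def chunk_tickets(tickets: list[dict]) -> list[Chunk]:
    """One chunk per resolution summary -- the one-line close-out note,
    which turned out to be the right granularity for retrieval."""
    return [Chunk(text=t["resolution_summary"],
                  metadata={"source": "ticket",
                            "product_line": t["product_line"],
                            "query_type": t["query_type"],
                            "resolution_date": t["resolved_on"]})
            for t in tickets]
```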

Agent layer

Inbound queries hit a triage prompt that classifies the query type and product line, then routes to one of five specialised retrieval flows. Each flow pulls relevant policy sections and similar past resolutions, drafts a candidate response, and runs a confidence check. Below threshold, the query routes to a human with the draft and supporting documents attached. Above threshold, the response is sent and the ticket auto-closed with a flag for spot-checking.
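The shape of that routing logic, as a minimal sketch. The triage, retrieval, and drafting stubs below are placeholders (the production versions are an LLM triage prompt and vector-store lookups); the point is the confidence gate, and the threshold value is illustrative:

```python
CONFIDENCE_THRESHOLD = 0.85  # illustrative; in practice tuned per product line

def triage(query: str) -> tuple[str, str]:
    """Classify query type and product line (keyword stub for illustration)."""
    qtype = "payment_delay" if "payment" in query.lower() else "general"
    product_line = "cards" if "card" in query.lower() else "savings"
    return qtype, product_line

def retrieve(qtype: str, product_line: str) -> list[str]:
    """Pull relevant policy sections and similar past resolutions (stubbed IDs)."""
    return [f"policy:{product_line}/{qtype}", f"ticket:{product_line}/{qtype}"]

def draft_response(query: str, docs: list[str]) -> tuple[str, float]:
    """Draft a candidate response and score it (stub: novel queries score low)."""
    confidence = 0.92 if "general" not in docs[0] else 0.41
    return f"Draft based on {len(docs)} sources.", confidence

def handle(query: str) -> dict:
    qtype, product_line = triage(query)
    docs = retrieve(qtype, product_line)
    draft, confidence = draft_response(query, docs)
    if confidence >= CONFIDENCE_THRESHOLD:
        # send and auto-close, flagged for later spot-check
        return {"action": "auto_respond", "response": draft, "spot_check": True}
    # below threshold: route to a human with draft and sources attached
    return {"action": "escalate", "draft": draft, "attachments": docs}
```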

Operations layer

Logging, monitoring, and an audit trail showing every decision the agent made, what it read, and how confident it was. Every auto-resolved ticket is reviewable; every escalated one has the agent's draft attached so the specialist can edit rather than start from scratch.
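An audit trail like this can be as simple as one append-only JSON line per decision. A sketch, with hypothetical field names rather than the client's actual schema:

```python
import json
from datetime import datetime, timezone

def audit_record(query: str, action: str, retrieved_ids: list[str],
                 confidence: float, draft: str) -> str:
    """Serialise one decision: what the agent was asked, what it read,
    how confident it was, and what it did. Appended to a JSONL log."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "retrieved": retrieved_ids,
        "confidence": confidence,
        "action": action,
        "draft": draft,
    })
```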

The whole thing runs on the client's infrastructure. No data leaves their environment. The vector store sits behind their existing access controls.

Build phase

Working software at the end of week one. Not the production system — a thin slice handling one product line on a sample of historical traffic, with the team able to inspect every decision. They found things we'd missed (the agent over-confidently answering questions that needed regulatory context the model didn't have access to). We tightened the confidence thresholds, added a regulatory-flag classifier, and re-tested.

By week three we had all three product lines wired up. Weeks four and five were spent on the operations layer — making sure the team could see what the agent was doing, override it, and feed corrections back into the retrieval set. Week six was production cutover, with a phased rollout: 10% of inbound traffic in the first week, 50% in the second, full traffic in the third.
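One common way to implement that kind of phased rollout is deterministic bucketing, so a given ticket takes the same path for the whole phase and the cohort only grows as the percentage rises. A sketch — the hashing scheme is an assumption, not the client's implementation:

```python
import hashlib

def in_rollout(ticket_id: str, percent: int) -> bool:
    """Hash the ticket ID into a stable 0-99 bucket; a ticket routed to the
    agent at 10% stays routed to the agent at 50% and 100%."""
    bucket = int(hashlib.sha256(ticket_id.encode()).hexdigest(), 16) % 100
    return bucket < percent
```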

Outcomes

After three months in production:

  • 40% of inbound tickets close without a human touching them.
  • 12-second average response time on auto-resolved tickets, down from four hours.
  • 73% of escalated tickets close in the first specialist response (up from 55%) — because specialists work from the agent's draft, not starting cold.
  • £2.4m annual saving, calculated on team time freed up and not having to recruit two more specialists this year.

The team didn't shrink. The work changed. The eight specialists now spend more time on the genuinely novel cases and less on routine policy lookup. Onboarding for new specialists is also down — they ramp on the agent's draft history, not the raw policy document.

What translates

The pattern works wherever you have:

  • A repetitive inbound stream — support, claims, intake, internal queries.
  • A knowledge base that's already structured, or close to it.
  • A team that's good at the hard cases but losing time on the routine ones.
  • Clear consequences for getting answers wrong, so confidence scoring matters.

It doesn't work where you have no historical resolution data to learn from, or where the work is fundamentally judgement-heavy from the first message.

Got something that looks like this?

We start with a two-week diagnostic. If the numbers don't work, we tell you and refund the second week.

Start with a diagnostic →