
From four hours to twelve seconds: how an eight-person team handles 1,200 tickets a day.

A back-office team at a UK financial services firm was drowning in inbound queries. We built an AI agent that handles the straightforward ones and routes everything else to a human, with full traceability back to source.

£2.4m+

Annual saving

40%

Auto-resolved

12 sec

Response time

6 wks

To production

The challenge

The team handled customer queries across three product lines: card payments, mortgage administration, and savings products. Each query landed in the same inbox and was triaged manually before being routed to a specialist. The team's average response time had crept from under an hour at launch to four hours. Recruiting more specialists wasn't a clean answer — the work was knowledge-heavy and onboarding took six months.

The pattern was visible in the data: about 40% of queries were repeat or near-repeat questions. A customer asks about a payment delay; a specialist looks up the same policy section and sends a near-identical answer. The other 60% were genuinely novel — unusual questions, or complex cases requiring judgement.

We were brought in after the operations director had taken quotes from three larger consultancies. All three had pitched a six-figure scoping engagement with a strategy phase before any build work. None of them had committed to a delivery timeline.

The diagnostic

We ran a two-week diagnostic. First, we sat with the team for two days, watched how they handled inbound queries, and read enough of the resolved tickets to understand the patterns. Then we did the numbers.

The diagnostic identified three things:

  • About 40% of inbound traffic could be handled by an agent reading the existing knowledge base and resolved-ticket history. Confidence scoring would catch the cases where the agent should defer.
  • The remaining 60% needed faster routing, not automation. The bottleneck wasn't reading the question — it was finding the right specialist for that question.
  • The data was usable. The resolved-ticket archive was structured enough to embed and retrieve. The policy document needed re-chunking but was readable.

We told the operations director the build would take six weeks, quoted a fixed price, and they started us the following Monday.

What we built

Architecture · runtime flow
Inbound query → Triage (classify type + product line) → Retrieve (policy + 3 yrs of resolved tickets) → Draft + confidence (candidate response + score). High confidence: auto-respond, flagged for spot-check. Low confidence: escalate to a specialist with the draft attached.

The system has three components:

Knowledge layer

We chunked the policy document and three years of resolved tickets, generated embeddings, and indexed them in a vector store with metadata (product line, query type, resolution date). Embedding strategy was paragraph-level for the policy document and resolution-summary-level for the tickets — the team had been writing one-line summaries on close-out, which turned out to be the right granularity for retrieval.
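As a rough sketch, those two embedding granularities can be expressed as chunkers that attach the retrieval metadata. Everything here is illustrative — field names and structure are assumptions, and the embedding model and vector store the chunks feed into are omitted:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    metadata: dict  # e.g. source, product_line, query_type, resolution_date

def chunk_policy(document: str) -> list[Chunk]:
    """Paragraph-level chunks for the policy document."""
    return [Chunk(text=p.strip(), metadata={"source": "policy"})
            for p in document.split("\n\n") if p.strip()]

def chunk_tickets(tickets: list[dict]) -> list[Chunk]:
    """One chunk per resolution summary -- the one-line close-out note,
    which turned out to be the right granularity for retrieval."""
    return [Chunk(text=t["resolution_summary"],
                  metadata={"source": "ticket",
                            "product_line": t["product_line"],
                            "query_type": t["query_type"],
                            "resolution_date": t["resolved_on"]})
            for t in tickets]
```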

Agent layer

Inbound queries hit a triage prompt that classifies the query type and product line, then routes to one of five specialised retrieval flows. Each flow pulls relevant policy sections and similar past resolutions, drafts a candidate response, and runs a confidence check. Below threshold, the query routes to a human with the draft and supporting documents attached. Above threshold, the response is sent and the ticket auto-closed with a flag for spot-checking.
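The shape of that routing logic, as a minimal sketch. The triage, retrieval, and drafting stubs below are placeholders (the production versions are an LLM triage prompt and vector-store lookups); the point is the confidence gate, and the threshold value is illustrative:

```python
CONFIDENCE_THRESHOLD = 0.85  # illustrative; in practice tuned per product line

def triage(query: str) -> tuple[str, str]:
    """Classify query type and product line (keyword stub for illustration)."""
    qtype = "payment_delay" if "payment" in query.lower() else "general"
    product_line = "cards" if "card" in query.lower() else "savings"
    return qtype, product_line

def retrieve(qtype: str, product_line: str) -> list[str]:
    """Pull relevant policy sections and similar past resolutions (stubbed IDs)."""
    return [f"policy:{product_line}/{qtype}", f"ticket:{product_line}/{qtype}"]

def draft_response(query: str, docs: list[str]) -> tuple[str, float]:
    """Draft a candidate response and score it (stub: novel queries score low)."""
    confidence = 0.92 if "general" not in docs[0] else 0.41
    return f"Draft based on {len(docs)} sources.", confidence

def handle(query: str) -> dict:
    qtype, product_line = triage(query)
    docs = retrieve(qtype, product_line)
    draft, confidence = draft_response(query, docs)
    if confidence >= CONFIDENCE_THRESHOLD:
        # send and auto-close, flagged for later spot-check
        return {"action": "auto_respond", "response": draft, "spot_check": True}
    # below threshold: route to a human with draft and sources attached
    return {"action": "escalate", "draft": draft, "attachments": docs}
```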

Operations layer

Logging, monitoring, and an audit trail showing every decision the agent made, what it read, and how confident it was. Every auto-resolved ticket is reviewable; every escalated one has the agent's draft attached so the specialist can edit rather than start from scratch.
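An audit trail like this can be as simple as one append-only JSON line per decision. A sketch, with hypothetical field names rather than the client's actual schema:

```python
import json
from datetime import datetime, timezone

def audit_record(query: str, action: str, retrieved_ids: list[str],
                 confidence: float, draft: str) -> str:
    """Serialise one decision: what the agent was asked, what it read,
    how confident it was, and what it did. Appended to a JSONL log."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "retrieved": retrieved_ids,
        "confidence": confidence,
        "action": action,
        "draft": draft,
    })
```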

The whole thing runs on the client's infrastructure. No data leaves their environment. The vector store sits behind their existing access controls.

Build phase

Working software at the end of week one. Not the production system — a thin slice handling one product line on a sample of historical traffic, with the team able to inspect every decision. They found things we'd missed (the agent over-confidently answering questions that needed regulatory context the model didn't have access to). We tightened the confidence thresholds, added a regulatory-flag classifier, and re-tested.

By week three we had all three product lines wired up. Weeks four and five were spent on the operations layer — making sure the team could see what the agent was doing, override it, and feed corrections back into the retrieval set. Week six was production cutover, with a phased rollout: 10% of inbound traffic in the first week, 50% in the second, full traffic in the third.
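One common way to implement that kind of phased rollout is deterministic bucketing, so a given ticket takes the same path for the whole phase and the cohort only grows as the percentage rises. A sketch — the hashing scheme is an assumption, not the client's implementation:

```python
import hashlib

def in_rollout(ticket_id: str, percent: int) -> bool:
    """Hash the ticket ID into a stable 0-99 bucket; a ticket routed to the
    agent at 10% stays routed to the agent at 50% and 100%."""
    bucket = int(hashlib.sha256(ticket_id.encode()).hexdigest(), 16) % 100
    return bucket < percent
```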

Outcomes

After three months in production:

  • 40% of inbound tickets close without a human touching them.
  • 12-second average response time on auto-resolved tickets, down from four hours.
  • 73% of escalated tickets close in the first specialist response (up from 55%) — because specialists work from the agent's draft, not starting cold.
  • £2.4m annual saving, calculated on team time freed up and not having to recruit two more specialists this year.

The team didn't shrink. The work changed. The eight specialists now spend more time on the genuinely novel cases and less on routine policy lookup. Onboarding for new specialists is also down — they ramp on the agent's draft history, not the raw policy document.

What translates

The pattern works wherever you have:

  • A repetitive inbound stream — support, claims, intake, internal queries.
  • A knowledge base that's already structured, or close to it.
  • A team that's good at the hard cases but losing time on the routine ones.
  • Clear consequences for getting answers wrong, so confidence scoring matters.

It doesn't work where you have no historical resolution data to learn from, or where the work is fundamentally judgement-heavy from the first message.

Got something that looks like this?

We start with a two-week diagnostic. If the numbers don't work, we tell you and refund the second week.

Start with a diagnostic →