Scale me AI

Case study · Fintech · Cyprus

We built a full AI sales platform for Marlowe Payments. A fintech group HubSpot wouldn't onboard.

Five regulated products across forex, gaming, crypto, and high-risk merchant payments. We replaced ~18 to 24 hours of manual SDR work per rep, per week with an orchestrator that discovers, qualifies, dual-LLM judges, and reaches every prospect across email, Telegram, WhatsApp, and outbound voice. Phase 1 in n8n in 3 weeks. Phase 2 a custom Convex + React 19 application that runs in production today.

Services used: Lead Generation Automation + AI Voice Agents + Workflow Automation

See how we work →

Today's outreach run

Live
Lead 04219 · Forex broker, EU-licensed · Judging

Verdict A (GPT-4o): Pursue · 84

Verdict B (Gemini): Pursue · 79 — synthesis layer agrees

  • First-touch email · payments product

    Style-matched · pain signal: payment-rail switch

    Sent
  • WhatsApp follow-up · day 3

    Per-lead rule: prefers short msgs

    Queued
  • Vapi voice call · day 7

    Async runtime · webhook close-out

    Queued

This week: ~210 leads queued

Run time: <1 hour

  • ~18-24 hrs / rep / week given back
  • ~210 qualified leads / week
  • 4 channels orchestrated
  • 2 independent LLM judges + synthesis
  • n8n → Convex: Phase 1 to Phase 2 build path

Client

About Marlowe Payments

Marlowe Payments is a Cyprus-headquartered fintech group with five products spread across forex tooling, a neobanking layer, an OTC desk, an invoicing product, and a consumer wallet. Their buyers are forex brokers, online gaming operators, crypto businesses, and high-risk merchants — verticals where the biggest off-the-shelf SDR and CRM platforms either refuse to onboard you, or strip out the features you actually need. So they were running an outbound team with spreadsheets, Gmail, Telegram, WhatsApp, and a phone, and burning a meaningful chunk of every rep's week on busywork that should have been automated.

| Field | Detail |
| --- | --- |
| Country | Cyprus (EU) |
| Verticals | Forex, online gaming, crypto / OTC, high-risk merchant payments |
| Products in scope | 5 fintech products (payments, neobanking, OTC, invoicing, consumer wallet) |
| Sales motion | Outbound to brokers, merchants, and platforms, typically in the 100 to 5,000 employee range |
| Why the big CRMs wouldn't take them | Off-the-shelf SDR tools refuse most high-risk verticals or limit features. Needed a system built for this from day one. |
| Disclosure | Anonymized under NDA |
| Engagement | Phase 1 PoC in n8n (~3 weeks), then Phase 2 custom Convex + React 19 production app (~10 to 14 weeks) |

Before

Sales is mostly busywork. Especially when you can't use HubSpot.

Marlowe's reps were good at closing. They were drowning in everything else. Three problems compounded.

  1. Manual research kills throughput

    Every prospect needs a 30-minute review before you know if they're worth a touch — Cyprus regulators, payment-rail switches, hiring posts, sanctions lists, decision-maker extraction. A rep doing that by hand can profile maybe 10 to 15 companies a day before energy collapses.

  2. Multi-channel outreach is fragmented

    Gmail, Telegram, WhatsApp, voice — each in its own app. Threads diverge. Replies get missed. Follow-ups slip. There was no single record of what was said where, and the team cobbled together a story from screenshots when leadership asked.

  3. Big CRMs won't take regulated verticals

    Forex, gaming, crypto, high-risk merchants. Most off-the-shelf SDR and outreach tools either refuse onboarding outright or quietly limit features once they figure out who the customers are. Marlowe needed a platform built for these verticals from day one — not a generic SaaS that would pull the rug.

Net effect: each rep was burning roughly 18 to 24 hours a week on work that wasn't closing. That's a half-FTE per seller, paid for in salary, recouped in nothing.

Before

  • ~18 to 24 hrs/rep/week on busywork
  • 30+ minutes per prospect, manual research
  • 4 channels in 4 separate apps
  • No unified history per lead
  • Off-the-shelf SDR tools refused the verticals

After

  • Reps focused on calls and closes
  • Profile + verdict in seconds
  • One thread per lead across all 4 channels
  • Every send, reply, and call logged + replayable
  • Built for the verticals from day one — owned by the client

Solution architecture

An orchestrator, ten tools, four subagents, and a real audit trail

We didn't bolt prompts onto a CRM. We built a typed, plan-driven pipeline where every step is replayable. The orchestrator hands work to tools, tools delegate harder questions to subagents, and every model call is logged with cost, latency, and the inputs that produced it.

The four layers

Layer 1

Orchestrator

Reads the user's discovery brief, calls Anthropic Claude with the orchestrator system prompt and the tool catalogue, returns a typed Plan with stages, fan-out, gates, and replan budget.

convex/orchestrator.ts · runs.plan
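
As a rough illustration of what "a typed Plan with stages, fan-out, gates, and replan budget" can mean in practice, here is a minimal sketch. The field names (`stages`, `fanOut`, `gate`, `replanBudget`) and the `validatePlan` helper are assumptions made for this example, not the production schema; only the tool names come from this case study.

```typescript
// Hypothetical Plan shape. Field names are illustrative assumptions,
// not the schema used in convex/orchestrator.ts.
type ToolName =
  | "discoverCandidates"
  | "extractCompanyProfile"
  | "scoreFit"
  | "judgeCompanies"
  | "draftOutreach";

interface Stage {
  tool: ToolName;
  fanOut: number; // maximum parallel invocations of this tool
  gate?: string; // optional gate rule set that must pass before the next stage
}

interface Plan {
  brief: string;
  stages: Stage[];
  replanBudget: number; // how many times the orchestrator may revise the plan
}

// Model output is untrusted input, so check it before a runner executes it.
function validatePlan(plan: Plan): string[] {
  const errors: string[] = [];
  if (plan.stages.length === 0) errors.push("plan has no stages");
  if (!Number.isInteger(plan.replanBudget) || plan.replanBudget < 0) {
    errors.push("replanBudget must be a non-negative integer");
  }
  plan.stages.forEach((stage, index) => {
    if (!Number.isInteger(stage.fanOut) || stage.fanOut < 1) {
      errors.push(`stage ${index} (${stage.tool}): fanOut must be at least 1`);
    }
  });
  return errors;
}

const examplePlan: Plan = {
  brief: "find forex brokers in Cyprus, 20 to 200 employees, EU-licensed",
  stages: [
    { tool: "discoverCandidates", fanOut: 3 },
    { tool: "extractCompanyProfile", fanOut: 10 },
    { tool: "scoreFit", fanOut: 10, gate: "eu-licensed" },
    { tool: "judgeCompanies", fanOut: 5 },
    { tool: "draftOutreach", fanOut: 5 },
  ],
  replanBudget: 2,
};

console.log(validatePlan(examplePlan)); // an empty array means the plan is well-formed
```

Validating the plan as data before running it is what makes a run replayable: the same Plan object can be stored, inspected, and executed again.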

Layer 2

Runner + Tools

Walks the Plan stage by stage. Dispatches Tools — single-step capabilities like discoverCandidates, scoreFit, judgeCompanies. Records every call as a tool_calls row.

convex/runner.ts · ~10 tools registered
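
The runner idea can be sketched in a few lines. This is a simplified, synchronous stand-in: real tools would be async backend functions, and the `ToolCallRow` fields, the `runStages` name, and the placeholder company names are assumptions for illustration only.

```typescript
// Illustrative runner. Synchronous stand-ins keep the control flow easy to follow.
type Tool = (input: unknown) => unknown;

interface ToolCallRow {
  stage: number;
  tool: string;
  input: unknown;
  output?: unknown;
  error?: string;
  latencyMs: number;
}

function runStages(
  stages: string[],
  registry: Map<string, Tool>,
  initialInput: unknown,
): { output: unknown; toolCalls: ToolCallRow[]; completed: boolean } {
  const toolCalls: ToolCallRow[] = [];
  let current = initialInput;
  let stage = 0;
  for (const name of stages) {
    const tool = registry.get(name);
    const startedAt = Date.now();
    if (tool === undefined) {
      // Unknown tools are recorded too, so the failure is visible in the log.
      toolCalls.push({ stage, tool: name, input: current, error: "unknown tool", latencyMs: 0 });
      return { output: current, toolCalls, completed: false };
    }
    try {
      const output = tool(current);
      toolCalls.push({ stage, tool: name, input: current, output, latencyMs: Date.now() - startedAt });
      current = output; // each stage feeds the next
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      toolCalls.push({ stage, tool: name, input: current, error: message, latencyMs: Date.now() - startedAt });
      return { output: current, toolCalls, completed: false };
    }
    stage += 1;
  }
  return { output: current, toolCalls, completed: true };
}

// Placeholder tools returning made-up company names.
const registry = new Map<string, Tool>([
  ["discoverCandidates", () => ["Example FX Ltd", "Sample Markets Ltd"]],
  ["scoreFit", (input) => (input as string[]).map((name) => ({ name, fit: 0.5 }))],
]);

const result = runStages(["discoverCandidates", "scoreFit"], registry, "forex brokers in Cyprus");
console.log(result.toolCalls.length, result.completed);
```

Because every call, including a failed one, produces a row with its input, a run can be inspected or replayed stage by stage.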

Layer 3

Subagents

Multi-step agentic loops invoked from inside Tools. Each owns its own prompt, tool catalogue, cost cap, and step limit. Logged to subagent_runs.

4 subagents · A1 (sync) + A2 (async voice) runtimes
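
A cost cap and a step limit are simple to enforce around any agentic loop. The sketch below assumes a hypothetical `step` callback that reports its own cost; the type names and limit values are illustrative, not taken from the production subagents.

```typescript
// Illustrative subagent loop. The limits and the StepResult shape are assumptions.
interface StepResult {
  done: boolean;
  costUsd: number;
}

interface SubagentLimits {
  maxSteps: number;
  maxCostUsd: number;
}

interface SubagentRun {
  steps: number;
  costUsd: number;
  stoppedBecause: "done" | "step_limit" | "cost_cap";
}

function runSubagent(step: (n: number) => StepResult, limits: SubagentLimits): SubagentRun {
  let costUsd = 0;
  for (let n = 1; n <= limits.maxSteps; n++) {
    const result = step(n);
    costUsd += result.costUsd;
    if (result.done) return { steps: n, costUsd, stoppedBecause: "done" };
    // Check the cap after every step so a runaway loop cannot keep spending.
    if (costUsd >= limits.maxCostUsd) return { steps: n, costUsd, stoppedBecause: "cost_cap" };
  }
  return { steps: limits.maxSteps, costUsd, stoppedBecause: "step_limit" };
}

// A stand-in step function: finishes on the third step at 0.25 USD per step.
const researchRun = runSubagent((n) => ({ done: n === 3, costUsd: 0.25 }), { maxSteps: 8, maxCostUsd: 2 });
console.log(researchRun);
```

Recording why the loop stopped matters as much as the limits themselves: a run that ended on `cost_cap` is a different signal from one that ended on `done`.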

Layer 4

External tools

Stateless API calls — Firecrawl, Perplexity, Serper, Jina, Bright Data. Cached where useful (firecrawl_cache table). Tracked via ad-hoc tool log + per-call cost.

Stateless · cached · cost-tracked
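
The caching idea behind a table like firecrawl_cache can be illustrated with a small time-to-live wrapper. In this sketch an in-memory `Map` stands in for the database table, and `makeCachedFetcher`, the TTL value, and the injected clock are all assumptions made for the example.

```typescript
// Illustrative TTL cache. The Map stands in for a database table.
interface CacheEntry {
  body: string;
  fetchedAt: number;
}

function makeCachedFetcher(
  fetchPage: (url: string) => string,
  ttlMs: number,
  now: () => number = () => Date.now(),
) {
  const cache = new Map<string, CacheEntry>();
  let misses = 0;
  return {
    get(url: string): string {
      const hit = cache.get(url);
      if (hit !== undefined && now() - hit.fetchedAt < ttlMs) return hit.body;
      misses += 1; // only cache misses reach the paid API
      const body = fetchPage(url);
      cache.set(url, { body, fetchedAt: now() });
      return body;
    },
    missCount: (): number => misses,
  };
}

// A fake clock makes expiry easy to demonstrate without waiting.
let fakeClock = 0;
const scraper = makeCachedFetcher((url) => `<html>${url}</html>`, 60000, () => fakeClock);
scraper.get("https://example.com");
scraper.get("https://example.com"); // served from cache
fakeClock = 61000;
scraper.get("https://example.com"); // entry expired, fetched again
console.log(scraper.missCount());
```

Counting misses separately from calls is one way to keep per-call cost tracking honest when most lookups never reach the external API.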

The pipeline, end to end

  1. Discover

    Web search + SERP APIs + scrapers running in parallel from a one-line brief like "find forex brokers in Cyprus, 20 to 200 employees, EU-licensed".

    orchestrator → discoverCandidates · serper · seed_companies

  2. Research

    For each company, scrape website, LinkedIn, regulators, public filings. Extract employees, HQ, jurisdiction, licences, products. Pain signals attached: hiring spikes, funding, regulatory filings, complaints, tech-stack gaps.

    extractCompanyProfile · research_agent · website_extractor

  3. Qualify

    Programmable gate rules (visual editor: "licensed in EU", "employee count 10 to 500", "active payment processing"). Plus AI fit scoring against the ICP. Sanctions list checked alongside. Failed leads keep their reason.

    convex/gateRules.ts · gateEval.ts · scoreFit

  4. Judge

    Two independent LLMs (Gemini 1.5-pro and GPT-4o) verdict each qualified company. A third synthesis layer reads both verdicts and produces one PURSUE / PASS recommendation, flagging agreement, conflict, and risk. Ambiguous cases surface for human review.

    judgeCompanies · judge_opinions table · synthesis schema

  5. Reach

    draftOutreach generates first-touch + follow-up messages from the company profile, pain signals, and the product knowledge base. Sends across email, Telegram, WhatsApp, and outbound voice (Vapi). Smart cadences fire automatically when no reply.

    draftOutreach · followUp · voiceConsult

  6. Learn

    Inbound replies → classifyAndDraftReply (interested / objection / question / unsubscribe). Each lead builds a fingerprint: tone, response time, message length, channel preference. Future drafts match it. Next-best-action card refreshes per lead.

    classifyAndDraftReply · styleLearn · NBA card

Every step writes back to crm_leads, every model call generates an llm_traces row, and the React frontend re-renders live via Convex subscriptions. No batch jobs. No nightly syncs. No black box.

Inside the pipeline

Five AI engines that replace what an SDR team does by hand

Each engine is a single concern, owned by one set of files in the codebase, with its own cost cap and traceable output. Together they cover the full SDR workflow from a one-line discovery brief to a next-best-action recommendation per lead.

Engine 01

Discover

Find the right companies before any human touches them. A natural-language brief becomes a parallel search across SERPs, scrapers, and curated registries. The Cyprus regulator list. EU licensing databases. Hiring boards. Sanctions lists. Each candidate lands with the source URLs that surfaced it.

  • Web Discovery Engine — Google + SERP APIs + custom registries
  • Seed company expansion — pull subsidiaries, parent groups, peers
  • Tech-stack and pain-signal detection per candidate

discoverCandidates · serper · seed_companies · firecrawl_cache

Engine 02

Qualify

Two filters in series. Programmable gate rules — boolean AND/OR over fields like jurisdiction, employee band, licence status, active payment processing. Anything that fails a hard rule is rejected with the reason logged. Then an AI fit score against the ICP, with reasoning per field, so a borderline lead surfaces the way a thoughtful SDR would describe it.

  • Visual rule editor — boolean AND/OR over any extracted field
  • AI fit scoring with field-level confidence map
  • Sanctions list / red-flag check runs alongside
  • Failed leads keep their reasons — no black-box rejections

convex/gateRules.ts · gateEval.ts · scoreFit · sanctions_list
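
Boolean AND/OR gate rules that keep their rejection reasons can be modelled as a small recursive evaluator. The `Rule` shape and the profile field names below are assumptions for illustration; the example gate mirrors the three rules quoted earlier ("licensed in EU", "employee count 10 to 500", "active payment processing").

```typescript
// Illustrative gate-rule evaluator. The Rule shape is an assumption,
// not the schema used in convex/gateRules.ts.
type FieldValue = string | number | boolean;
type Profile = Record<string, FieldValue>;

type Rule =
  | { kind: "eq"; field: string; value: FieldValue }
  | { kind: "between"; field: string; min: number; max: number }
  | { kind: "and"; rules: Rule[] }
  | { kind: "or"; rules: Rule[] };

interface GateResult {
  pass: boolean;
  reasons: string[]; // why the lead failed; empty when it passed
}

function collectReasons(results: GateResult[]): string[] {
  return results.reduce<string[]>((all, r) => all.concat(r.reasons), []);
}

function evalRule(rule: Rule, profile: Profile): GateResult {
  switch (rule.kind) {
    case "eq": {
      const pass = profile[rule.field] === rule.value;
      return { pass, reasons: pass ? [] : [`${rule.field} is not ${String(rule.value)}`] };
    }
    case "between": {
      const value = profile[rule.field];
      const pass = typeof value === "number" && value >= rule.min && value <= rule.max;
      return { pass, reasons: pass ? [] : [`${rule.field} not in ${rule.min}..${rule.max}`] };
    }
    case "and": {
      const failed = rule.rules.map((r) => evalRule(r, profile)).filter((r) => !r.pass);
      return { pass: failed.length === 0, reasons: collectReasons(failed) };
    }
    case "or": {
      const results = rule.rules.map((r) => evalRule(r, profile));
      const pass = results.some((r) => r.pass);
      return { pass, reasons: pass ? [] : collectReasons(results) };
    }
  }
}

const gate: Rule = {
  kind: "and",
  rules: [
    { kind: "eq", field: "licensedInEU", value: true },
    { kind: "between", field: "employees", min: 10, max: 500 },
    { kind: "eq", field: "activePaymentProcessing", value: true },
  ],
};

console.log(evalRule(gate, { licensedInEU: true, employees: 5, activePaymentProcessing: true }));
```

Returning reasons alongside the boolean is the part that prevents black-box rejections: a failed lead carries the exact rule that rejected it.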

Engine 03

Judge

Two AIs argue. A third one decides. Gemini 1.5-pro and GPT-4o each independently analyse every qualified company and produce a verdict with reasoning. A synthesis layer reads both, flags agreement / conflict / risk, and produces a single PURSUE or PASS recommendation. Disagreements between models are exactly the cases a human should review, instead of being hidden inside one model's confidence number.

  • Dual-LLM verdicts shown side by side in the UI
  • Synthesis layer — single recommendation with conflict + risk flags
  • Alternative-company suggestions (subsidiaries, peers in the same space)
  • Specific decision-maker contact recommendations per lead

judgeCompanies · judge_opinions · synthesis schema
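
To make the agreement / conflict / risk flags concrete, here is a deterministic stand-in for the synthesis step. In the platform described here the synthesis layer reads both verdicts with a model; this sketch replaces that with plain rules. The `maxScoreGap` threshold, the conservative "both must say PURSUE" policy, and the type names are assumptions.

```typescript
// Deterministic stand-in for the synthesis step; thresholds are assumptions.
type Decision = "PURSUE" | "PASS";

interface Verdict {
  model: string;
  decision: Decision;
  score: number; // 0 to 100
}

interface Synthesis {
  recommendation: Decision;
  agreement: boolean;
  needsHumanReview: boolean;
  flags: string[];
}

function synthesize(a: Verdict, b: Verdict, maxScoreGap = 20): Synthesis {
  const flags: string[] = [];
  const agreement = a.decision === b.decision;
  if (!agreement) {
    flags.push(`conflict: ${a.model} says ${a.decision}, ${b.model} says ${b.decision}`);
  }
  const gap = Math.abs(a.score - b.score);
  if (gap > maxScoreGap) flags.push(`risk: score gap of ${gap} points`);
  // Conservative default: recommend PURSUE only when both judges do.
  const recommendation: Decision = agreement && a.decision === "PURSUE" ? "PURSUE" : "PASS";
  // Any flag routes the lead to a human instead of hiding the disagreement.
  return { recommendation, agreement, needsHumanReview: flags.length > 0, flags };
}

// The two verdicts shown for lead 04219 in the dashboard example.
const lead04219 = synthesize(
  { model: "GPT-4o", decision: "PURSUE", score: 84 },
  { model: "Gemini", decision: "PURSUE", score: 79 },
);
console.log(lead04219.recommendation, lead04219.needsHumanReview);
```

Whatever does the synthesis, the useful property is the same: disagreement becomes an explicit field on the record rather than something averaged away.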

Engine 04

Reach

Email. Telegram. WhatsApp. Voice. One system. The first-touch is drafted from the company profile, the pain signals, and the product knowledge base — across whichever channels the lead has opened. Smart cadences fire automatically when no reply. Outbound voice is a real Vapi call with gpt-4o-mini, transcripts back into the lead profile. Multi-channel orchestration tracks what was sent on each channel, avoids duplicates, and picks the next channel based on observed reply patterns.

  • AI first-touch drafting per channel + per product
  • Smart follow-up sequences (D1, D3, D7) with context awareness
  • Inbound reply classification: interested / objection / question / unsubscribe
  • Outbound voice via Vapi — gpt-4o-mini, async runtime, transcript writeback

draftOutreach · firstTouch.ts · followUp.ts · voiceConsult · vapiHandlers.ts
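
A D1 / D3 / D7 cadence that stops on reply can be expressed as a pure function of timestamps. The day numbers come from the text above; which channel fires on which day, the `nextFollowUp` name, and the "earliest due step first" policy are assumptions for this sketch.

```typescript
// Illustrative cadence picker. Channel-per-day assignments are assumptions.
type Channel = "email" | "telegram" | "whatsapp" | "voice";

interface CadenceStep {
  day: number;
  channel: Channel;
}

const DAY_MS = 24 * 60 * 60 * 1000;

const DEFAULT_CADENCE: CadenceStep[] = [
  { day: 1, channel: "email" },
  { day: 3, channel: "whatsapp" },
  { day: 7, channel: "voice" },
];

function nextFollowUp(
  firstTouchAt: number,
  now: number,
  sentDays: number[],
  hasReplied: boolean,
  cadence: CadenceStep[] = DEFAULT_CADENCE,
): CadenceStep | null {
  if (hasReplied) return null; // a reply stops the automatic sequence
  const elapsedDays = Math.floor((now - firstTouchAt) / DAY_MS);
  for (const step of cadence) {
    const alreadySent = sentDays.indexOf(step.day) !== -1;
    if (step.day <= elapsedDays && !alreadySent) return step; // earliest due step first
  }
  return null;
}

const firstTouch = Date.UTC(2025, 0, 6);
console.log(nextFollowUp(firstTouch, firstTouch + 3 * DAY_MS, [1], false));
```

Keeping the function pure (timestamps in, next step out) means a scheduler can call it as often as it likes without risking a duplicate send, provided the sent days are recorded.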

Engine 05

Learn

Each lead gets its own little model. Tone, response time, message length, channel preference — the platform builds a fingerprint over time and matches every future draft to it. Per-lead rules accumulate from real behaviour ("WhatsApp only", "never on Fridays", "prefers short messages") with no manual entry. The product KB is RAG-indexed, so drafters retrieve accurate product specifics on demand instead of inventing them.

  • Communication-style fingerprint per lead, updated continuously
  • Per-lead rules sourced automatically from inbound messages
  • Product knowledge base RAG-indexed via @convex-dev/rag
  • Next-best-action card per lead: confidence, urgency, channel

styleLearn.ts · crm_leads.fingerprint · convex/kb · queryKnowledgeBase
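
A per-lead fingerprint can be maintained incrementally, without storing every past message. The sketch below covers only the numeric parts (message length, response time, channel preference); tone is left out because it would need a model call. All field and function names are assumptions, not the actual crm_leads.fingerprint structure.

```typescript
// Illustrative per-lead fingerprint using incremental means.
interface Fingerprint {
  samples: number;
  avgMessageLength: number;
  avgResponseHours: number;
  channelCounts: Record<string, number>;
}

interface InboundMessage {
  channel: string;
  text: string;
  responseHours: number; // time between our last send and this reply
}

const EMPTY_FINGERPRINT: Fingerprint = {
  samples: 0,
  avgMessageLength: 0,
  avgResponseHours: 0,
  channelCounts: {},
};

// Incremental mean: newAvg = oldAvg + (value - oldAvg) / n.
function updateFingerprint(prev: Fingerprint, msg: InboundMessage): Fingerprint {
  const samples = prev.samples + 1;
  const previousCount = prev.channelCounts[msg.channel] ?? 0;
  return {
    samples,
    avgMessageLength: prev.avgMessageLength + (msg.text.length - prev.avgMessageLength) / samples,
    avgResponseHours: prev.avgResponseHours + (msg.responseHours - prev.avgResponseHours) / samples,
    channelCounts: { ...prev.channelCounts, [msg.channel]: previousCount + 1 },
  };
}

function preferredChannel(fp: Fingerprint): string | null {
  let best: string | null = null;
  let bestCount = 0;
  for (const channel of Object.keys(fp.channelCounts)) {
    const count = fp.channelCounts[channel] ?? 0;
    if (count > bestCount) {
      best = channel;
      bestCount = count;
    }
  }
  return best;
}

let fp = EMPTY_FINGERPRINT;
fp = updateFingerprint(fp, { channel: "whatsapp", text: "ok, send it", responseHours: 2 });
fp = updateFingerprint(fp, { channel: "whatsapp", text: "thanks", responseHours: 4 });
fp = updateFingerprint(fp, { channel: "email", text: "call me", responseHours: 6 });
console.log(preferredChannel(fp), fp.avgResponseHours);
```

A drafter could then read `avgMessageLength` and `preferredChannel` to shape the next message, which is one plausible way rules like "prefers short messages" or "WhatsApp only" emerge without manual entry.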

How we built it

Three weeks to a working pipeline. Then a real production app.

We don't deliver a 14-week build before anyone sees a working version. The point of Phase 1 is to validate the flow on the actual verticals, the actual messaging style, the actual reply patterns — before anyone commits to the production architecture.

Phase 1 — Fast PoC

Tooling: n8n (self-hosted, EU region)
Window: ~3 weeks

An end-to-end pipeline wired to a sample list of ~100 prospects, sending real first-touch emails and Telegram messages, with a feature-flagged WhatsApp + Vapi voice call branch. Goal: prove that discover → research → qualify → judge → reach actually works on Marlowe's real verticals before anyone signed off on the production build. Pipeline started filling within the first week.

Phase 2 — Production app

Tooling: React 19 · Vite 7 · Tailwind v4 · Shadcn UI · Convex backend · ai-sdk
Window: ~10 to 14 weeks

A custom application Marlowe owns and runs in production. ~16 routes (including Login, Home, Runs, Leads, Companies, Messages, Brain, Subagents, Settings, Dev tools), 38 Convex tables, ~212 TypeScript files. Reactive UI via Convex subscriptions: drafts, runs, replies, and traces appear live without polling. Full LLM trace replay for any decision the platform made.

Phase 1 doesn't replace Phase 2. It de-risks Phase 2. By the time the production architecture went into design, every "wait, what about X" question had already been answered on a real cohort, with real reply data, in front of the actual reps.

Convex was the right choice for the production app because every UI element on Marlowe's screens is reactive — drafts, runs, replies, and traces have to appear the moment they happen. That removed an entire class of polling and websocket plumbing the team would otherwise have had to maintain.

Human-in-loop

Trust the AI gradually. Per lead. Per channel.

Marlowe operates in regulated and high-risk verticals where one bad send can become a compliance incident. The platform doesn't make you choose between full automation and full manual. It lets you move the dial per lead, per channel, while every step stays reviewable.

01

MANUAL — AI drafts, you send

The platform produces drafts on every channel, the rep reviews each one, and nothing leaves the building until a human clicks send. The starting point for any new vertical or any new product.

02

SEMI-AUTO — AI follow-ups, you handle replies

First-touch goes out automatically once approved. Smart follow-ups (D1, D3, D7) fire on schedule. The moment a real reply lands, the rep takes over. The most-used mode at Marlowe.

03

FULL AUTO — AI runs the whole sequence

Reserved for warm leads with a verified per-lead style fingerprint and a clear PURSUE verdict. Every send is still logged, replayable, and reversible.
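
The three modes reduce to one question per message: may this go out without a human click? A minimal sketch of that decision, with the mode names taken from the text and the `LeadTrust` fields assumed for illustration:

```typescript
// Illustrative mode check. The LeadTrust fields are assumptions.
type Mode = "MANUAL" | "SEMI_AUTO" | "FULL_AUTO";
type MessageKind = "first_touch" | "follow_up" | "reply";

interface LeadTrust {
  firstTouchApproved: boolean;
  fingerprintVerified: boolean;
  verdict: "PURSUE" | "PASS";
}

function canAutoSend(mode: Mode, kind: MessageKind, lead: LeadTrust): boolean {
  switch (mode) {
    case "MANUAL":
      return false; // a human clicks send on every draft
    case "SEMI_AUTO":
      if (kind === "first_touch") return lead.firstTouchApproved;
      return kind === "follow_up"; // replies are handed to the rep
    case "FULL_AUTO":
      // Only warm leads with a verified fingerprint and a clear PURSUE verdict qualify.
      return lead.fingerprintVerified && lead.verdict === "PURSUE";
  }
}

const warmLead: LeadTrust = { firstTouchApproved: true, fingerprintVerified: true, verdict: "PURSUE" };
console.log(canAutoSend("SEMI_AUTO", "reply", warmLead), canAutoSend("FULL_AUTO", "reply", warmLead));
```

Because the mode is an argument rather than a global setting, the same check can be applied per lead and per channel.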

Draft approval workflow

Every AI message can be reviewed before send. Approve individually, edit inline, or auto-send for trusted leads. ai_drafts table holds source-of-truth.

Send conditions are guardrails

Outreach toggle on, channel connected, template body not empty after substitution. If any condition fails, the platform falls back to AI-generated copy — never sends junk.

Pause and override

One-click pause for any lead. Override the AI's next action with your own. The AI resumes from context after the manual intervention, with no broken state.

The point isn't to take the human out of the loop. It's to take the human out of the busywork — and put them back in front of the decisions that need judgement.

Compliance posture

Built for verticals where one bad send becomes an incident

Forex, gaming, crypto, and high-risk merchants attract regulators in a way that B2B SaaS doesn't. The compliance work isn't a bolt-on at the end. It shapes the schema, the gates, and the kill-switches from day one.

Sanctions and red-flag checks at the gate

Every qualified company is checked against a sanctions list table before any outbound contact. Failures flag the lead with the reason. Outreach is blocked at the channel layer, not at the human-review layer — there is no path where a sanctioned entity gets a message by mistake.
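
Blocking at the channel layer means the check lives inside the only function that can deliver a message. The sketch below shows that shape with an exact match after name normalisation; real sanctions screening would need fuzzy matching and a maintained list. `makeGuardedSender` and the example names are hypothetical.

```typescript
// Illustrative channel-layer guard. Exact match after normalisation only.
function normalizeName(name: string): string {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

type Deliver = (to: string, body: string) => void;

interface SendResult {
  sent: boolean;
  reason?: string;
}

function makeGuardedSender(sanctionedNames: string[], deliver: Deliver) {
  const blocked = new Set(sanctionedNames.map(normalizeName));
  // Every send goes through this function, so the check cannot be skipped upstream.
  return function send(companyName: string, to: string, body: string): SendResult {
    if (blocked.has(normalizeName(companyName))) {
      return { sent: false, reason: `${companyName} matches the sanctions list` };
    }
    deliver(to, body);
    return { sent: true };
  };
}

const outbox: string[] = [];
const send = makeGuardedSender(["Example Sanctioned Ltd."], (to) => {
  outbox.push(to);
});
console.log(send("EXAMPLE  SANCTIONED LTD", "ops@example.com", "Hello"));
console.log(send("Example Broker Ltd", "sales@example.org", "Hello"));
```

The design point is that `deliver` is never exposed directly: callers only ever hold the guarded `send`.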

GDPR processor stance, EU data residency

Convex deployment, llm_traces, transcripts, drafts, and message history all stored in EU regions. Subject access requests are a single SQL-style query over crm_leads + activity tables. The processor relationship and DPA are scoped per engagement.

Outreach toggle and channel guardrails

Master outreach toggle per workspace. Per-channel kill-switches. A failed send-condition (channel disconnected, body empty after variable substitution, template missing) falls back to AI-generated copy — never sends junk and never silently fails.
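
A send-condition check along these lines can be sketched as a function that either returns the rendered template or records which conditions failed and signals the AI fallback. The `{{placeholder}}` syntax, the `chooseCopy` name, and the context fields are assumptions; what the platform does after the fallback decision is not modelled here.

```typescript
// Illustrative send-condition check. Placeholder syntax and names are assumptions.
interface SendContext {
  outreachEnabled: boolean; // master toggle
  channelConnected: boolean;
  template: string | null; // null when the template is missing
  vars: Record<string, string>;
}

type CopyChoice =
  | { source: "template"; body: string }
  | { source: "ai_fallback"; failedConditions: string[] };

function renderTemplate(template: string, vars: Record<string, string>): string {
  // Unknown placeholders become empty strings so an empty body can be detected.
  return template.replace(/\{\{(\w+)\}\}/g, (_match, key: string) => vars[key] ?? "").trim();
}

function chooseCopy(ctx: SendContext): CopyChoice {
  const failed: string[] = [];
  if (!ctx.outreachEnabled) failed.push("outreach toggle off");
  if (!ctx.channelConnected) failed.push("channel disconnected");
  const body = ctx.template === null ? "" : renderTemplate(ctx.template, ctx.vars);
  if (ctx.template === null) failed.push("template missing");
  else if (body.length === 0) failed.push("body empty after substitution");
  // Failures are returned, not swallowed, so nothing fails silently.
  if (failed.length > 0) return { source: "ai_fallback", failedConditions: failed };
  return { source: "template", body };
}

console.log(chooseCopy({ outreachEnabled: true, channelConnected: true, template: "{{intro}}", vars: {} }));
```

Returning the list of failed conditions, rather than a bare boolean, is what lets each fallback show up in the audit trail with its cause.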

Audit trail on every send, reply, and call

Each outbound message creates a tool_calls row, an llm_traces row, and a messages row. Each Vapi call creates a voice_calls row with the full transcript. Each lead has one chronological activity timeline that is the source of truth for any compliance question.
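
One chronological timeline per lead is, at its simplest, a filtered merge of rows from several tables. The table names below come from the text; the `ActivityRow` shape and `buildTimeline` helper are assumptions for illustration.

```typescript
// Illustrative timeline merge. The row shape is an assumption.
type SourceTable = "tool_calls" | "llm_traces" | "messages" | "voice_calls";

interface ActivityRow {
  table: SourceTable;
  leadId: string;
  at: number; // epoch milliseconds
  summary: string;
}

function buildTimeline(leadId: string, tables: ActivityRow[][]): ActivityRow[] {
  const rows: ActivityRow[] = [];
  for (const table of tables) {
    for (const row of table) {
      if (row.leadId === leadId) rows.push(row);
    }
  }
  // rows is a fresh array, so sorting it in place leaves the inputs untouched.
  return rows.sort((a, b) => a.at - b.at);
}

const messages: ActivityRow[] = [
  { table: "messages", leadId: "04219", at: 3000, summary: "first-touch email sent" },
  { table: "messages", leadId: "00001", at: 1500, summary: "other lead" },
];
const traces: ActivityRow[] = [
  { table: "llm_traces", leadId: "04219", at: 2000, summary: "draft generated" },
];
const calls: ActivityRow[] = [
  { table: "voice_calls", leadId: "04219", at: 9000, summary: "voice call transcript stored" },
];

console.log(buildTimeline("04219", [messages, traces, calls]).map((r) => r.summary));
```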

Reply classification before any auto-response

Inbound messages are classified (interested / objection / question / unsubscribe) before any draft is generated. Unsubscribes propagate across all channels for that lead and the platform refuses to draft against them.
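
Unsubscribe propagation can be sketched as a state transition that suppresses every channel at once, plus a guard that drafting must pass. The classifier itself would be a model call and is not shown; `LeadState`, `applyReply`, and `canDraft` are hypothetical names.

```typescript
// Illustrative unsubscribe propagation. Names and state shape are assumptions.
const ALL_CHANNELS = ["email", "telegram", "whatsapp", "voice"] as const;
type Channel = (typeof ALL_CHANNELS)[number];
type ReplyClass = "interested" | "objection" | "question" | "unsubscribe";

interface LeadState {
  id: string;
  suppressedChannels: Channel[];
}

function applyReply(lead: LeadState, replyClass: ReplyClass): LeadState {
  if (replyClass !== "unsubscribe") return lead;
  // An unsubscribe on any channel suppresses all of them for this lead.
  return { ...lead, suppressedChannels: [...ALL_CHANNELS] };
}

function canDraft(lead: LeadState, channel: Channel): boolean {
  return lead.suppressedChannels.indexOf(channel) === -1;
}

const before: LeadState = { id: "04219", suppressedChannels: [] };
const after = applyReply(before, "unsubscribe");
console.log(canDraft(before, "whatsapp"), canDraft(after, "whatsapp"));
```

Running the classification before any draft is generated means the guard sees the updated state first, so no draft is ever produced against an unsubscribed lead.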

Client owns the deployment and the keys

Marlowe's Convex project, Marlowe's API keys for every model and channel, Marlowe's repo. Scale me AI delivers the codebase and the design pattern. There is no vendor lock-in and nothing on our side to switch off.

If you're selling into regulated verticals — payments, gambling, crypto, financial services — this is the layer most "AI SDR" pitches gloss over. It's also the layer that makes or breaks whether you can run the platform at all.

Outcomes

A half-FTE per seller, given back to closing

Framing note: every number below is approximate and anonymized at Marlowe's request. Some are baseline figures from the internal team's own time tracking. Others are early post-launch estimates. Where uncertain, we hedge — and exact figures vary materially with the discovery brief and the vertical.

| Metric | Before (manual) | After (automated) |
| --- | --- | --- |
| Hours per rep, per week, on busywork | ~18 to 24 (a half-FTE) | ~3 to 5 (exception review only) |
| Time to profile + verdict per prospect | 30+ minutes manual | Seconds (parallel discover + research) |
| Qualified leads surfaced per week | ~40 to 60 manual | ~210 with traceable verdicts |
| Channels orchestrated per lead | 1 to 2, fragmented across apps | Up to 4 (email, Telegram, WhatsApp, voice), one thread |
| Single-judge errors caught by dual-LLM | Whatever the one model said | ~10 to 15% of single-judge approvals flipped or flagged |
| Disagreements surfaced for human review | Hidden inside one model's confidence | Explicit, side-by-side, with risk + alt-company flags |
| Audit completeness per send / call | Spreadsheet + screenshots | 100% logged with replayable LLM traces |
| Time to a working pipeline (Phase 1) | | ~3 weeks on n8n |

The headline metric is hours-given-back. The most underrated metric is the audit trail. Every PURSUE / PASS verdict, every send, every Vapi call, every reply classification has a replayable LLM trace attached. When leadership asks "why did we contact this company" or "why did we miss this one", the answer is a click away — not a Slack thread reconstruction.

And once the team trusted the platform, the second-order effects showed up. Reps stopped triaging the inbox. Leadership stopped asking for status updates. The pipeline started moving on its own and the humans went back to closing the deals the AI surfaced.

Anonymized under NDA

Anonymized client testimonial

We had five products and four sales channels and zero off-the-shelf tools that would touch our verticals. Scale me AI shipped an n8n version in three weeks that started filling the pipeline on day one, then rebuilt the whole thing as a custom Convex application that the team uses every day. The reps stopped doing busywork. The pipeline started moving on its own.

Head of Growth, Marlowe Payments

Real role, paraphrased wording, identity withheld

Under the hood

What we built it on

| Layer | Tool / vendor | Why |
| --- | --- | --- |
| Workflow engine, Phase 1 | n8n (self-hosted, EU) | Visual, fast to iterate, self-hostable for compliance, validated the pipeline on real prospects in week one |
| Frontend, Phase 2 | React 19 · Vite 7 · TypeScript 5.7 strict · TanStack Router · Tailwind v4 · Shadcn UI | Strict-typed, reactive UI. ai-sdk/react streams generation traces directly into the screen. ~16 routes, ~212 .tsx files organised by feature. |
| Backend, Phase 2 | Convex (database + functions + auth + scheduler + cron + file storage) | One platform. Reactive subscriptions remove polling. Argument-validated mutations. Zod schemas reused server-side. ~212 TypeScript files in /convex; 38 tables. |
| Forms + validation | react-hook-form + Zod | Same Zod schemas reused as Convex argument validators: one source of truth across client and server. |
| Orchestrator LLM | Anthropic Claude (Opus / Sonnet, cache_control: ephemeral) | Long-context reasoning for run planning. Prompt caching keeps the orchestrator cheap when the tool catalogue is stable. |
| Judges + drafting + voice | OpenAI GPT-4o + GPT-4o-mini (Vapi-compatible) | Structured judgement output, fast drafting, native compatibility with the Vapi voice agent. |
| Second judge + ICP scoring | Google Gemini (1.5-pro for verdicts, Flash for speed) | Second independent verdict for the dual-LLM judge layer. Cheap pre-filtering at scale. |
| Research subagent | Perplexity (sonar / sonar-pro with citation tracking) | Live-web grounded research with traceable citations. Feeds into the research_agent subagent. |
| Scraping + reading + LinkedIn | Firecrawl · Jina · Bright Data · Serper | Stateless API calls for site crawl, page reading, LinkedIn lookup, and Google SERP. firecrawl_cache table avoids re-scraping. |
| Channels | Gmail OAuth · Telegram Bot API + QR · WhatsApp Business API · Vapi (voice) · Resend (email) | All wired into Convex actions + http endpoints. Webhook close-out for async voice calls. |
| AI infrastructure (Convex addons) | @convex-dev/agent · @convex-dev/rag · @convex-dev/auth · @convex-dev/workpool | Agentic workflows + threads, knowledge-base retrieval, Resend OTP + sessions, concurrency caps on parallel work. |
| Hosting / data residency | EU regions | GDPR by design. Convex deployment region matches Marlowe's data residency requirement. |

We don't force a stack onto a client because we have a partner deal. We pick the tools that fit the verticals, the data residency rules, the model strengths, and the cost curve. If your stack is different, the design pattern still holds. The tools change. The orchestrator → tools → subagents → external tools structure doesn't.

Applicability

This pattern works for any B2B sales team the big CRMs don't serve well

If you recognize any of the following, the architecture in this case study transfers directly:

  • Your reps are spending half their week on manual prospect research, profile-building, and channel-switching instead of talking to buyers
  • You sell into verticals the big CRMs won't take — payments, gambling, crypto, financial services, regulated SaaS, MGA-style brokerage
  • You run multi-channel outbound (email + at least one of Telegram, WhatsApp, LinkedIn DM, voice) and the threads live in different tools
  • You have multiple products that share a sales motion and the team is constantly reinventing the same email three different ways
  • You've been pitched a generic AI SDR platform that's a black box and refuses to integrate with your stack on real terms
  • You need a real audit trail per send and per call because your regulator, your board, or your own team will eventually ask

The orchestrator → tools → subagents → external tools architecture is portable. The dual-LLM judge pattern is portable. The phase-1-then-phase-2 sequence is portable. Convex isn't the only backend that supports this; it's the one that fit Marlowe.

What changes per engagement is the verticals, the channel mix, the compliance surface, and the existing systems we have to integrate with. We scope around those four variables on the first call.

Want this for your sales department?

Book a 30-minute discovery call. We'll map your verticals, your channel mix, your existing stack, and what a Phase 1 in n8n would actually look like for you — before you commit to anything.

See how we work →

Related

Lead Generation Automation

The discover, research, qualify, and judge stages: the half of the platform that fills the pipeline with leads worth a rep's time. Includes the dual-LLM judge pattern and the synthesis layer.

See lead generation automation

AI Voice Agents

The outbound voice layer — Vapi + GPT-4o-mini, async runtime, webhook close-out, transcript writeback into the lead profile. Useful as a follow-up channel when email and chat have stalled.

See AI voice agents

Workflow Automation

The Phase 1 n8n proof-of-concept and the per-lead cadence engine. Smart follow-ups, kill-switches, send-conditions, and the unified inbox that converges every channel into one thread per lead.

See workflow automation

AI Integration Services

The Phase 2 custom Convex application. When n8n or Make goes wide enough but not deep enough, we build the production app — typed schema, replayable LLM traces, role-based access, full audit log.

See AI integration services

Sales is the most leveraged work in your business. Stop letting it be busywork.

A 30-minute call. We'll tell you whether a Phase 1 in n8n is the right move for you, what it would cost, and whether the architecture in this case study transfers to your verticals — before you commit to anything.

See case studies →