Voice transcripts to structured data with Claude tool use
Zachary Jackson
Why this comes up constantly
Voice agents are everywhere now. Every business with a phone line is staring at a pile of transcripts and asking some version of the same question: how do I turn this into something useful?
Useful means structured. The transcript says "yeah, the customer wants the exclusionary plan for a 2019 Range Rover, thirty-six months, wants a callback Tuesday at two." The CRM wants { coveragePlan: "EXCLUSIONARY", vehicle: { make: "Land Rover", model: "Range Rover", year: 2019 }, termMonths: 36, callbackAt: "2026-05-19T14:00:00" }. Getting from A to B reliably is the entire problem.
Most teams reach for regex, then for JSON-mode prompting, then for "well, we'll have a human review it." Each one breaks down in its own way. There's a better primitive.
What's wrong with the obvious approaches
Regex breaks the moment the customer phrases something differently. "Twenty-nineteen Rover," "the '19 Range," "a Range Rover from 2019" — three patterns, three regexes, and you still missed the one where they said "the same one we talked about last week."
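To make the brittleness concrete, here is a minimal TypeScript sketch of a single-pattern extractor. The pattern and the `extractYear` helper are hypothetical, written only for illustration. Of the four phrasings of the same vehicle, only the first one matches.

```typescript
// A deliberately naive extractor: it only understands "<four-digit year> Range Rover".
const YEAR_MODEL = /\b(\d{4})\s+range rover\b/i;

function extractYear(utterance: string): number | undefined {
  const match = YEAR_MODEL.exec(utterance);
  return match ? Number(match[1]) : undefined;
}

// Four ways a caller might refer to the same vehicle.
const phrasings: string[] = [
  "the exclusionary plan for a 2019 Range Rover",
  "Twenty-nineteen Rover",
  "the '19 Range",
  "a Range Rover from 2019",
];

for (const p of phrasings) {
  // Prints 2019 for the first phrasing and "no match" for the other three.
  console.log(`${JSON.stringify(p)} -> ${extractYear(p) ?? "no match"}`);
}
```

Each new phrasing means another pattern, and "the same one we talked about last week" can't be matched by any pattern at all.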
JSON-mode prompting ("respond only with a JSON object matching this shape...") works most of the time. Most of the time is the worst possible reliability profile for a backend job — failure is rare enough that you don't catch it in testing, common enough that it shows up in production at a rate that makes downstream systems unhappy. The failure modes are colorful: missing fields, extra fields, fields with the right name but the wrong type, JSON wrapped in markdown, JSON with a trailing comment explaining the model's reasoning.
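The sketch below runs a naive `JSON.parse` plus type check over four replies. The reply strings are hand-written stand-ins for the failure modes just listed, not captured model output, and `parseReply` is a hypothetical helper. Three of the four fail, and each fails for a different reason that would need its own repair path.

```typescript
const FENCE = "`".repeat(3); // three backticks, built at runtime

// Hand-written stand-ins for replies a JSON-mode prompt can produce.
const replies = {
  clean: '{"termMonths": 36}',
  markdownWrapped: `${FENCE}json\n{"termMonths": 36}\n${FENCE}`,
  trailingCommentary: '{"termMonths": 36}\n\nI chose 36 because the caller said three years.',
  wrongType: '{"termMonths": "thirty-six"}',
};

type ParseOutcome =
  | { ok: true; termMonths: number }
  | { ok: false; reason: string };

// Parse a reply the obvious way and report what went wrong, if anything.
function parseReply(reply: string): ParseOutcome {
  let parsed: unknown;
  try {
    parsed = JSON.parse(reply);
  } catch {
    return { ok: false, reason: "not valid JSON" };
  }
  if (typeof parsed !== "object" || parsed === null) {
    return { ok: false, reason: "not a JSON object" };
  }
  const term = (parsed as Record<string, unknown>).termMonths;
  if (typeof term !== "number") {
    return { ok: false, reason: `termMonths is ${typeof term}, expected number` };
  }
  return { ok: true, termMonths: term };
}

for (const [label, reply] of Object.entries(replies)) {
  console.log(label, parseReply(reply));
}
```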
Human review is what teams default to when the above two fail. It works until the volume of calls makes it unaffordable.
Tool use enforces the schema at decode time
Claude's tool use is the right primitive here. When you call the API with a tool definition and force the model to call that tool, the model isn't being asked to respond in a schema; it's filling in the arguments of a function whose input_schema you defined. That schema shapes the decoding directly, and the arguments come back as an already-parsed object in a tool_use block that is built to fit the shape you specified, not as text you have to parse.
That means:
- No JSON parsing or repair logic
- No "I asked for an integer and got 'thirty-six'" surprises
- No markdown wrappers, no trailing prose
- Validation failures become Zod failures inside your application, not edge cases in the model's prose
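On the response side, the tool call arrives as a `tool_use` content block whose `input` field is already an object. The sketch below uses minimal hand-written types rather than the official SDK's. The tool name `record_lead` and the `getToolInput` helper are hypothetical, and the example response is constructed by hand in the API's shape. `getToolInput` returns `unknown` on purpose, because the next step is still to validate it against your schema (Zod, in this post).

```typescript
// Minimal hand-written types for the parts of a Messages API response used here.
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ContentBlock = TextBlock | ToolUseBlock;

interface MessageResponse {
  content: ContentBlock[];
  stop_reason: string | null;
}

// Return the arguments the model passed to `toolName`. They are already an
// object, so there is no JSON.parse and no markdown stripping.
function getToolInput(response: MessageResponse, toolName: string): unknown {
  const block = response.content.find(
    (b): b is ToolUseBlock => b.type === "tool_use" && b.name === toolName,
  );
  if (!block) {
    throw new Error(`No ${toolName} call in response (stop_reason: ${response.stop_reason})`);
  }
  return block.input;
}

// A hand-written example response, shaped like a tool_use reply.
const example: MessageResponse = {
  stop_reason: "tool_use",
  content: [
    {
      type: "tool_use",
      id: "toolu_example",
      name: "record_lead",
      input: { name: "Dana Smith", budget: "10k_50k" },
    },
  ],
};

console.log(getToolInput(example, "record_lead"));
```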
```typescript
import { z } from 'zod';
import { extract } from 'voice-transcript-extractor';

// The record we want out of each call. Optional fields may be omitted
// when the transcript doesn't support a confident value.
const Lead = z.object({
  name: z.string(),
  email: z.string().email().optional(),
  callbackAt: z.string().optional(),
  budget: z.enum(['under_10k', '10k_50k', '50k_plus']).optional(),
});

// Raw call transcript text, loaded elsewhere.
declare const transcript: string;

const { data } = await extract({
  schema: Lead,
  description: 'Extract the customer contact details and callback time from this discovery call.',
  transcript,
});

// `data` is typed as z.infer<typeof Lead>: guaranteed to match the schema or throw.
```

A few things this gets right that JSON-mode doesn't
Optional fields stay optional. The Zod schema marks a field as optional, the tool schema reflects it, and the model is free to omit it when the transcript doesn't support a high-confidence value. JSON-mode prompts that say "if you don't know, return null" tend to leak hallucinations because the model is trained to produce a useful answer.
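The mechanism behind "optional stays optional" is the JSON Schema `required` array: a field that is present in `properties` but absent from `required` may be left out entirely. The sketch below uses a hypothetical, minimal `FieldSpec` in place of a Zod schema. It is not the library's actual Zod-to-tool-schema translator, only an illustration of how optionality ends up in the tool schema.

```typescript
// Hypothetical minimal field spec standing in for a Zod schema.
type FieldSpec = { type: "string" | "integer"; optional?: boolean };

interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: string }>;
  required: string[];
}

// Every field becomes a property; only non-optional fields go in `required`.
function toJsonSchema(fields: Record<string, FieldSpec>): ObjectSchema {
  const properties: Record<string, { type: string }> = {};
  const required: string[] = [];
  for (const [fieldName, spec] of Object.entries(fields)) {
    properties[fieldName] = { type: spec.type };
    if (!spec.optional) required.push(fieldName);
  }
  return { type: "object", properties, required };
}

const leadSchema = toJsonSchema({
  name: { type: "string" },
  email: { type: "string", optional: true },
  callbackAt: { type: "string", optional: true },
});

console.log(JSON.stringify(leadSchema.required)); // ["name"]
```

A tool call that omits `email` is valid against this schema, so leaving a field out is a legitimate answer rather than a gap the model feels pressure to fill.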
Enums are real constraints. z.enum(['under_10k', '10k_50k', '50k_plus']) translates to a tool schema constraint that prevents the model from inventing a fourth bucket. JSON-mode prompts that list "valid values" treat them as suggestions.
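In a tool definition, the allowed values live in an `enum` inside the input_schema the API receives, not in prompt text. The sketch shows that fragment for the budget field and a matching runtime guard of the kind you would still apply after the call. `budgetProperty` and `isBudget` are hypothetical names chosen for illustration.

```typescript
// JSON Schema fragment for the budget field: `enum` lists the only legal values.
const budgetProperty = {
  type: "string",
  enum: ["under_10k", "10k_50k", "50k_plus"],
} as const;

// Compile-time union of the allowed values, derived from the same list.
type Budget = (typeof budgetProperty.enum)[number];

// Runtime check mirroring the schema: anything outside the list is rejected.
function isBudget(value: unknown): value is Budget {
  return typeof value === "string" && (budgetProperty.enum as readonly string[]).includes(value);
}

console.log(isBudget("10k_50k"));   // true
console.log(isBudget("100k_plus")); // false
```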
Nested shapes pass through cleanly. A vehicle: { make, model, year } object stays nested. JSON-mode tends to flatten or shuffle nested shapes under pressure.
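Nesting survives because the schema itself is nested: `vehicle` is an object property with its own `properties` and `required`. The sketch below writes such a tool definition by hand. The tool name `record_quote` and the `requiredPaths` helper are hypothetical. The helper walks the schema to list every required path, which shows the nested requirements are still there.

```typescript
type JsonSchema = {
  type: string;
  properties?: Record<string, JsonSchema>;
  required?: string[];
};

// Illustrative tool definition; `vehicle` stays a nested object in input_schema.
const recordQuoteTool: { name: string; description: string; input_schema: JsonSchema } = {
  name: "record_quote",
  description: "Record the coverage quote discussed on the call.",
  input_schema: {
    type: "object",
    properties: {
      coveragePlan: { type: "string" },
      vehicle: {
        type: "object",
        properties: {
          make: { type: "string" },
          model: { type: "string" },
          year: { type: "integer" },
        },
        required: ["make", "model", "year"],
      },
      termMonths: { type: "integer" },
    },
    required: ["vehicle"],
  },
};

// List required fields as dotted paths, descending into nested objects.
function requiredPaths(schema: JsonSchema, prefix = ""): string[] {
  const paths: string[] = [];
  for (const key of schema.required ?? []) {
    const path = prefix ? `${prefix}.${key}` : key;
    paths.push(path);
    const child = schema.properties?.[key];
    if (child?.type === "object") paths.push(...requiredPaths(child, path));
  }
  return paths;
}

console.log(requiredPaths(recordQuoteTool.input_schema));
// [ 'vehicle', 'vehicle.make', 'vehicle.model', 'vehicle.year' ]
```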
Pairing tool use with a system prompt that says "say less"
The schema is half of the work. The other half is teaching the model to be conservative. The system prompt that ships with the extractor leads with:
Be precise and conservative — only populate fields you can support directly from the transcript. Omit any field you cannot infer with high confidence.
Without this hint, models will fill in fields with plausible-sounding answers because that's what they're trained to do. With this hint, the omission rate goes up — which is what you want when the downstream consumer is a CRM that prefers null to "probably_smith".
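Put together, a request pairs the conservative system prompt with a single tool and sets `tool_choice` so the model must call that tool instead of replying in prose. The sketch below only builds the Messages API request body as a plain object. `buildExtractionRequest` and the `record_lead` tool are hypothetical, the prompt is a punctuation-adjusted version of the instruction quoted above, and `max_tokens: 1024` is an arbitrary choice. The model ID is the library's stated default.

```typescript
interface ToolDefinition {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

interface ExtractionRequest {
  model: string;
  max_tokens: number;
  system: string;
  tools: ToolDefinition[];
  tool_choice: { type: "tool"; name: string };
  messages: { role: "user"; content: string }[];
}

const SYSTEM_PROMPT =
  "Be precise and conservative. Only populate fields you can support directly " +
  "from the transcript. Omit any field you cannot infer with high confidence.";

// Assemble a request body: conservative system prompt + one forced tool.
function buildExtractionRequest(tool: ToolDefinition, transcript: string): ExtractionRequest {
  return {
    model: "claude-haiku-4-5",
    max_tokens: 1024,
    system: SYSTEM_PROMPT,
    tools: [tool],
    tool_choice: { type: "tool", name: tool.name }, // force this specific tool
    messages: [{ role: "user", content: `Transcript:\n${transcript}` }],
  };
}

const body = buildExtractionRequest(
  {
    name: "record_lead",
    description: "Record the caller's contact details.",
    input_schema: {
      type: "object",
      properties: { name: { type: "string" }, email: { type: "string" } },
      required: ["name"],
    },
  },
  "Agent: Can I get your name? Caller: It's Dana.",
);

console.log(JSON.stringify(body, null, 2));
```

With the official TypeScript SDK, an object like this would be passed to `client.messages.create`. The schema guarantees the shape, and the system prompt pushes the model to leave fields empty rather than guess.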
Open-source library
The full extractor — including the Zod-to-tool-schema translation, the conservative system prompt, the timestamp-aware transcript formatter, and tests — is open-sourced as voice-transcript-extractor. It works against any Zod schema, any model in the Claude lineup (defaults to claude-haiku-4-5 for speed/cost), and accepts either a raw transcript string or a turn-by-turn array.
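Because the extractor accepts either a raw string or a turn-by-turn array, both forms need to be normalized into one prompt string at some point. The sketch below shows one way a timestamp-aware formatter could do that. The `Turn` shape, the `[hh:mm:ss] Speaker: text` layout, and both helpers are assumptions for illustration, not the library's actual types or output format.

```typescript
// Hypothetical turn shape; the library's real input type may differ.
interface Turn {
  speaker: string;
  text: string;
  startSeconds?: number; // offset from the start of the call
}

// Render a second offset as hh:mm:ss, dropping fractional seconds.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.floor(totalSeconds);
  const hh = String(Math.floor(s / 3600)).padStart(2, "0");
  const mm = String(Math.floor((s % 3600) / 60)).padStart(2, "0");
  const ss = String(s % 60).padStart(2, "0");
  return `${hh}:${mm}:${ss}`;
}

// Raw strings pass through; turn arrays become one line per turn,
// prefixed with [hh:mm:ss] when a timestamp is available.
function formatTranscript(input: string | Turn[]): string {
  if (typeof input === "string") return input;
  return input
    .map((t) => {
      const stamp = t.startSeconds === undefined ? "" : `[${formatTimestamp(t.startSeconds)}] `;
      return `${stamp}${t.speaker}: ${t.text}`;
    })
    .join("\n");
}

console.log(
  formatTranscript([
    { speaker: "Agent", text: "Which vehicle is this for?", startSeconds: 4 },
    { speaker: "Caller", text: "A 2019 Range Rover.", startSeconds: 83.6 },
  ]),
);
// [00:00:04] Agent: Which vehicle is this for?
// [00:01:23] Caller: A 2019 Range Rover.
```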
The takeaway
If you're turning transcripts into structured records and not using tool use yet, you're paying for it in either reliability or in the time someone spends fixing extractions. Schema-enforced output isn't a fancier prompt — it's a fundamentally different primitive. Use it.