Building Automation Pipelines with Claude API

Most of what gets written about the Claude API is either a ten-line 'hello world' example or an ambitious architecture diagram that leaves out all the messy bits in the middle. This is the messy bits version — practical patterns I've actually used to build pipelines that automate real enterprise workflows.

Sumit Mohanty
Staff TAM, Twilio · APAC

I got interested in the Claude API after using it as a customer for a while. I was doing something embarrassingly manual: summarising call notes after customer meetings, then copying key points into our CRM, then drafting follow-up emails. Three separate tasks, all requiring roughly the same information, all taking time I didn’t have.

The first pipeline I built automated all three. It wasn’t clever. It wasn’t sophisticated. But it saved me probably 45 minutes a day, and once it worked, I started seeing the same pattern everywhere.

This is what I want to walk through: not AI theory, but the practical approach to turning a repetitive manual task into something that runs itself.

The mental model that makes everything simpler

Before writing any code, I find it useful to think about what an automation pipeline actually is at its core: structured input → language model reasoning → structured output → action.

The input could be text, a transcript, a JSON blob from an API, an email. The reasoning is what Claude does — understanding, extracting, summarising, classifying, drafting. The output needs to be in a shape your downstream system can use. And the action is whatever happens next: a CRM update, a Slack message, a WhatsApp notification, a database write.
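That four-stage shape is worth seeing as code, even in toy form. Here's a minimal sketch — `reason_fn` stands in for a Claude call and `act_fn` for a downstream integration; both stubs below are invented for illustration:

```python
import json

def run_pipeline(raw_input: str, reason_fn, act_fn):
    # reasoning step: a model call that returns JSON text
    structured_output = json.loads(reason_fn(raw_input))
    # action step: whatever consumes the structured result
    return act_fn(structured_output)

# Stub "model" and stub action, just to show the shape end to end
fake_model = lambda text: json.dumps(
    {"category": "billing", "urgent": "refund" in text}
)
notify = lambda out: f"Route to {out['category']} queue (urgent={out['urgent']})"

print(run_pipeline("Customer asking about a refund", fake_model, notify))
# Route to billing queue (urgent=True)
```

Every pipeline in this post is a fleshed-out version of exactly this skeleton.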

Most failed automation projects I’ve seen break down at the “structured output” step. People get Claude to generate a nice-looking paragraph, then try to parse it with string matching. This works until it doesn’t, and debugging it is painful. The fix is simple: ask Claude to return JSON from the start. Structured outputs, either through system prompt instructions or Anthropic’s native structured output features, make your pipeline robust.
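Even then, it pays to parse defensively. Here's the helper I reach for — a sketch of my own convention, not anything from the Anthropic SDK: it strips a markdown fence if the model added one, and falls back to the outermost brace span before giving up:

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Parse a model response as JSON, tolerating markdown fences
    and stray prose around the object."""
    text = raw.strip()
    # Strip a ```json ... ``` fence if the model wrapped its answer in one
    fence = re.match(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    if fence:
        text = fence.group(1)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, if there is one
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            return json.loads(text[start:end + 1])
        raise
```

With a well-behaved prompt this is belt-and-braces, but it costs nothing and saves the occasional 2am debugging session.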

Example 1: Call summary + CRM update pipeline

This was the first one I built. After every customer call, I paste the meeting notes or transcript into a simple web interface, and the pipeline does three things: produces a 3-bullet summary, extracts action items with owners and due dates, and drafts a follow-up email.

Here’s the core of it:

# call_summary.py
import anthropic
import json

client = anthropic.Anthropic()

def process_call_notes(raw_notes: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="""You are a technical account manager assistant.
Given raw meeting notes, return a JSON object with:
- summary: array of exactly 3 bullet strings (key discussion points)
- action_items: array of objects with keys: task, owner, due_date
- follow_up_email: a brief, professional follow-up email (plain text)

Return only valid JSON, no explanation.""",
        messages=[{
            "role": "user",
            "content": f"Meeting notes:\n\n{raw_notes}"
        }]
    )
    return json.loads(response.content[0].text)

# Usage
notes = """
Spoke with Priya at 2pm. She's concerned about their
WhatsApp delivery rates dropping in the last week.
Engineering team thinks it might be template approval
delays. I need to check with our WA team by Thursday.
Also, her VP wants a QBR in June — I should schedule that.
"""

result = process_call_notes(notes)
print(result["follow_up_email"])

The output comes back as JSON I can work with directly. The action items get written to a Google Sheet via the Sheets API. The follow-up email drops into a draft in Gmail. The summary goes into the customer’s CRM record. None of this required anything exotic — just the Claude API and a few standard integrations.
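The Sheets write is a one-line transformation away from the pipeline output. A sketch — the helper name and the column order are my own convention, nothing the Sheets API requires:

```python
# Flatten action_items from the pipeline result into rows ready for a
# spreadsheet append call. Column order (task, owner, due_date) is my
# own choice; .get() defaults keep a missing field from breaking a row.
def action_items_to_rows(result: dict) -> list[list[str]]:
    return [
        [item.get("task", ""), item.get("owner", ""), item.get("due_date", "")]
        for item in result.get("action_items", [])
    ]

rows = action_items_to_rows({
    "action_items": [
        {"task": "Check template approvals with WA team",
         "owner": "me", "due_date": "Thursday"},
        {"task": "Schedule June QBR", "owner": "me", "due_date": "TBD"},
    ]
})
# rows can go straight into sheets.values().append(..., body={"values": rows})
```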

Example 2: Support ticket triage and routing

The second pipeline came from a conversation with a customer who was drowning in support volume. Their tier-one team was spending a huge amount of time just reading tickets and deciding which queue they belonged to — a task that required enough domain knowledge to be non-trivial, but not enough to justify senior engineer time.

We built a classifier that reads incoming ticket text, assigns a category, estimates urgency on a 1–3 scale, extracts any account identifiers mentioned, and suggests the first two questions an agent should ask. The whole thing runs on a webhook — a ticket comes in, Claude processes it, and the output writes back to the ticketing system before a human sees it.

# ticket_triage.py
def triage_ticket(ticket_text: str, product_context: str) -> dict:
    response = client.messages.create(
        model="claude-haiku-4-5-20251001",  # Haiku for speed + cost
        max_tokens=512,
        system=f"""You are a support triage assistant for {product_context}.

Classify this support ticket and return JSON:
{{
  "category": "billing|technical|account|feature_request|other",
  "urgency": 1-3,
  "account_ids": ["list of any account/customer IDs mentioned"],
  "suggested_questions": ["question 1", "question 2"],
  "one_line_summary": "brief description for agent queue view"
}}""",
        messages=[{
            "role": "user",
            "content": ticket_text
        }]
    )
    return json.loads(response.content[0].text)

Note the model choice here: claude-haiku-4-5-20251001 rather than Sonnet. For classification tasks where the reasoning required is bounded and the output is short, Haiku is meaningfully faster and cheaper. For the summarisation pipeline in Example 1, where nuanced writing matters, Sonnet earns its cost. This distinction matters at scale — picking the right model for the task is part of good pipeline design.
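The routing step that consumes the triage result is deliberately dumb code, not another model call. A sketch — the queue names and the escalation rule here are hypothetical placeholders for whatever your ticketing system actually uses:

```python
# Turn a triage result into a queue assignment. Urgency 3 (highest on
# the 1-3 scale) short-circuits to escalation regardless of category.
def route_ticket(triage: dict) -> str:
    if triage["urgency"] >= 3:
        return "escalation"
    queues = {
        "billing": "billing-team",
        "technical": "tier2-tech",
        "account": "account-mgmt",
        "feature_request": "product-intake",
    }
    return queues.get(triage["category"], "general")

print(route_ticket({"category": "technical", "urgency": 2}))  # tier2-tech
```

Keeping the routing deterministic means the model's job stays small and auditable: it classifies, and plain code decides what happens next.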

Example 3: Multi-step pipeline with external data

The most useful pipelines I’ve built aren’t single Claude calls — they’re chains where the output of one step feeds into the next, with external data pulled in along the way.

Here’s a simplified version of a customer health monitoring pipeline. It runs nightly, pulls usage data from Twilio’s APIs, cross-references it with account notes, and produces a risk flag and suggested action for any account showing anomalous behaviour.

# account_health.py
def assess_account_health(account_id: str) -> dict:

    # Step 1 — pull usage metrics from your platform API
    usage = get_usage_metrics(account_id, days=30)

    # Step 2 — pull last 3 CRM notes for context
    notes = get_crm_notes(account_id, limit=3)

    # Step 3 — ask Claude to reason about the combination
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=768,
        system="""You are a customer health analyst.
Given usage metrics and recent account notes, assess account health.
Return JSON:
{
  "health_score": 1-10,
  "risk_level": "low|medium|high",
  "risk_reason": "one sentence explanation",
  "recommended_action": "specific next step for the TAM",
  "urgency": "this_week|this_month|monitor"
}""",
        messages=[{
            "role": "user",
            "content": f"""Account: {account_id}

Usage (last 30 days):
{json.dumps(usage, indent=2)}

Recent account notes:
{chr(10).join(notes)}"""
        }]
    )

    result = json.loads(response.content[0].text)
    result["account_id"] = account_id
    return result

The accounts that come back with risk_level: "high" and urgency: "this_week" go into a prioritised list that I review every morning. Everything else gets checked weekly. Claude isn’t replacing my judgement here — it’s doing the first-pass pattern recognition so my judgement goes to where it’s most needed.
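That morning-list step is, again, plain code over the assessments. A sketch — the sort order (lowest health score first) is my own choice, not part of the pipeline above:

```python
# Filter a batch of nightly assessments down to the morning review list:
# only high risk + this_week, worst health scores at the top.
def morning_review_list(assessments: list[dict]) -> list[dict]:
    urgent = [
        a for a in assessments
        if a["risk_level"] == "high" and a["urgency"] == "this_week"
    ]
    return sorted(urgent, key=lambda a: a["health_score"])
```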

A few things I’ve learned the hard way

Always validate and handle JSON parse errors. Even with the best system prompts, occasionally Claude returns something that’s technically invalid JSON — maybe a trailing comma, maybe some explanatory text crept in. Wrap every json.loads() in a try/except and log the raw response when it fails. You’ll want that data.

Build in a human review layer for anything consequential. My call summary pipeline doesn’t send the follow-up email automatically. It drops it into Gmail drafts. I spend thirty seconds reviewing it before I hit send. For workflows where the output goes directly to customers, this kind of checkpoint is worth the friction.

Prompt caching is worth understanding. For pipelines that run the same system prompt repeatedly against different inputs, Anthropic’s prompt caching can meaningfully reduce both latency and cost. If your system prompt is long and static, it’s worth reading the caching documentation before you’re running a few thousand calls a day.
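The request shape looks like this, as I understand the caching docs — the system prompt goes in as a content block marked with cache_control, so repeated calls reuse the cached prefix. Built as a plain dict here so the shape is inspectable; you'd pass these kwargs to client.messages.create(**kwargs). Check the current documentation before relying on this:

```python
# The long, static part of the prompt -- imagine a couple of thousand
# tokens of triage instructions here.
LONG_SYSTEM_PROMPT = "You are a support triage assistant..."

def build_cached_request(user_text: str) -> dict:
    return {
        "model": "claude-haiku-4-5-20251001",
        "max_tokens": 512,
        # system as a list of blocks, with the static prompt marked
        # cacheable; only the user message changes between calls
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }
```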

Start boring and iterate. Every pipeline I’ve kept running long-term started as the simplest possible version. One input type, one output, minimal error handling. Ship it, use it, find the rough edges. The temptation to build the complete system before using any of it is strong and usually counterproductive. The real requirements emerge from real usage.

The automation that stays running is the one you built to solve an actual problem you had, not the one you built to demonstrate that automation was possible.

The Claude API is a genuinely useful tool for this kind of work. The abstractions are sensible, the documentation is clear, and the models are capable enough that you don’t spend most of your time fighting the AI. The interesting engineering is in the plumbing around it — the data sources, the output handling, the integrations — and that’s true for any language model integration.

If you’re thinking about building something similar and want to talk through the approach, my contact details are on the homepage. Happy to share what’s worked and what hasn’t.