Tote Taxi - LangGraph
Published on February 16, 2026
I Built an AI Chat Agent That Converts Conversations Into Bookings — In 3 Days
I just shipped a customer-facing AI chat agent for ToteTaxi, a luxury moving and logistics platform I built for a client serving the NYC tri-state area and the Hamptons. The agent doesn't just answer questions — it drives conversion. A customer says "I need to move a couch from Manhattan to Montauk on Saturday," and the agent checks service coverage, estimates pricing, verifies availability, and hands off to the booking wizard with 21 fields pre-filled. The customer reviews and pays.
Three days from first commit to production. 42 new tests. Six production bugs that only existed under a concurrency model I'd never used before. A security audit that found a critical vulnerability no standard tool would have caught. Here's what I learned.
Why Build It
ToteTaxi handles Mini Moves, Standard Delivery, Specialty Items (Pelotons, surfboards, cribs), and Airport Transfers. Customers have questions before they commit: Do you serve my zip code? How much for a Peloton delivery? Can I book this Saturday? What's the surcharge for Montauk?
The existing options were call, email, or bounce. A chat agent that answers instantly removes friction. But the real value isn't Q&A — it's the handoff. The agent gathers details through natural conversation, builds a complete booking snapshot, and drops a "Start Booking" button in the chat. One click, and the wizard opens with everything pre-filled. That's the conversion path that justifies the feature.
The Architecture
LangGraph + Claude Sonnet + Django SSE + custom React hook. Every piece of this stack was chosen for a specific reason, and the reasons all come back to the same principle: ship fast, keep it simple, maintain it solo.
LangGraph over a raw API loop. The agent needs to decide which tools to call based on conversation context. When a customer changes their mind mid-conversation — "actually, make that a full move instead of standard delivery" — the agent needs to re-invoke the handoff tool with corrected details. LangGraph's ReAct pattern (think, act, observe, think) handles multi-turn tool routing as a state machine. I wrote six tools that wrap existing business logic. No new database queries, just thin read-only wrappers around code that already works.
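For reference, the wiring is roughly this. It's a minimal sketch, not the production code: it assumes a recent langgraph release where create_react_agent takes a prompt argument, and the tool body, model string, and prompt text are placeholders.

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

SYSTEM_PROMPT = "..."  # the ~3,000-token knowledge base, loaded from one prompt file

@tool
def check_service_coverage(zip_code: str) -> dict:
    """Check whether ToteTaxi serves a zip code and return any geographic surcharge."""
    # Thin read-only wrapper; the real version calls existing business logic.
    return {"served": True, "surcharge_usd": 0}

agent = create_react_agent(
    model=ChatAnthropic(model="claude-sonnet-4-20250514"),
    tools=[check_service_coverage],  # five more read-only tools in the real agent
    prompt=SYSTEM_PROMPT,
)

# One turn: the graph loops think -> tool call -> observe until it has an answer.
result = agent.invoke({"messages": [("user", "Do you deliver to Montauk?")]})
print(result["messages"][-1].content)
```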
System prompt over RAG. The entire knowledge base — five service types, three pricing tiers, geographic surcharge rules, FAQs — fits in roughly 3,000 tokens. RAG would mean a vector database, an embedding pipeline, retrieval latency, a chunking strategy, and ongoing maintenance. For zero accuracy improvement. The system prompt approach means knowledge updates when the prompt updates. One file, one truth.
SSE over WebSockets. Server-Sent Events work over standard HTTP. No Django Channels, no ASGI migration, no protocol upgrade complexity. Django's StreamingHttpResponse handles it. The trade-off is one-directional streaming (server to client), which is exactly what chat needs — the client sends a POST, the server streams the response.
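The whole streaming endpoint is a plain Django view. A stripped-down sketch, with the agent call stubbed out and the event names as assumptions:

```python
import json

from django.http import StreamingHttpResponse

def run_agent(request):
    # Placeholder for the LangGraph invocation; yields streamed chunks as dicts.
    yield {"type": "token", "text": "Hi! How can I help with your move?"}

def chat_stream(request):
    def event_stream():
        for chunk in run_agent(request):
            yield f"event: {chunk['type']}\ndata: {json.dumps(chunk)}\n\n"
        yield "event: done\ndata: {}\n\n"

    response = StreamingHttpResponse(event_stream(), content_type="text/event-stream")
    response["Cache-Control"] = "no-cache"
    response["X-Accel-Buffering"] = "no"  # ask nginx-style proxies not to buffer the stream
    return response
```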
Gevent over ASGI. SSE connections hold a worker for the duration of the stream. With synchronous gunicorn, four workers means a maximum of four concurrent chat sessions. One flag change — --worker-class gevent — gives cooperative multitasking. Same four workers, hundreds of concurrent SSE streams. Zero code changes to existing Django views, DRF, Celery, Stripe, Redis. This decision also created the hardest bugs I've ever debugged, but I'll get to that.
Client-sent history over server-side sessions. The original plan was Redis-backed conversation state via LangGraph's checkpointer. Day one killed that — Docker's Redis didn't have the RedisJSON module, and the checkpointer API was unstable across versions. The fix was simple: the frontend sends the full chat history with every request. The backend rebuilds the conversation from it. Thirty-message cap to bound token cost. Slightly larger payloads, zero server-side state. For short customer service conversations, this is the right call.
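Rebuilding the conversation server-side is a few lines. The payload field names here are assumptions about the request shape, not the actual contract:

```python
from langchain_core.messages import AIMessage, HumanMessage

MAX_HISTORY = 30  # cap the replayed history to bound token cost

def rebuild_messages(history: list[dict], new_message: str) -> list:
    """Turn the client-sent chat history into LangChain messages for the agent."""
    messages = []
    for turn in history[-MAX_HISTORY:]:
        cls = HumanMessage if turn["role"] == "user" else AIMessage
        messages.append(cls(content=turn["content"]))
    messages.append(HumanMessage(content=new_message))
    return messages
```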
The Booking Handoff
This is the piece that separates a chatbot from a conversion tool.
The agent's build_booking_handoff tool produces a snapshot with 21 fields, including service type, tier, item count, item description, bag count, transfer direction, airport, terminal, flight date, flight time, pickup address, delivery address, date, packing preference, unpacking preference, COI requirements, and special instructions. A "Start Booking" button appears in the chat carrying this snapshot. Click it, and the booking wizard opens with everything pre-filled.
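The tool itself is plain Python; what matters is that it returns a structured snapshot the chat UI can recognize and attach to the button. A sketch of the shape, showing only a few of the fields and using field names I'm assuming rather than quoting:

```python
from langchain_core.tools import tool

@tool
def build_booking_handoff(service_type: str, pickup_address: str,
                          delivery_address: str, date: str,
                          special_instructions: str = "") -> dict:
    """Assemble a booking snapshot the chat UI turns into a Start Booking button."""
    return {
        "type": "booking_handoff",
        "snapshot": {
            "service_type": service_type,
            "pickup_address": pickup_address,
            "delivery_address": delivery_address,
            "date": date,
            "special_instructions": special_instructions,
            # ...the real tool carries the full set of handoff fields
        },
    }
```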
Getting this to work required solving a state management race condition. The booking wizard's authentication step runs initializeForUser() on mount, which resets the Zustand store to defaults. Any data set before auth resolves gets wiped. My first fix — a React useEffect watching the step counter — failed because React batches Zustand state updates within a single effect. The step goes from 0 to 1, but React never sees the intermediate value, so the effect never fires.
The real fix: put the prefill logic inside the store's initializeForUser() function itself. The store resets the data? The store also preserves the prefill during that reset. No React timing dependency, no effect ordering. The merge happens synchronously inside the function that was causing the wipe.
The pattern keeps coming up: when two systems fight over the same state, don't referee from the outside. Put the logic where the conflict lives.
The Gevent Bugs
Six production bugs across two days. All six slipped past the 308-test backend suite. All worked in local development. All manifested only under gevent's cooperative multitasking model.
The hardest was an infinite SSL recursion that took down every outbound HTTPS call in the application — Onfleet task creation, LangSmith trace uploads, potentially Stripe.
The symptom: "maximum recursion depth exceeded" in requests.post(). My first hypothesis was that the recursion limit was too low for gevent's deeper stacks. I bumped it from 1,000 to 3,000. Some things started working. I bumped it to 10,000. Still failing. That's when it clicked: you can't fix infinite recursion by raising the limit.
The root cause was preload_app = True in the gunicorn configuration. When preload is enabled, the master process imports the entire Django application — including Python's ssl module — before forking worker processes. Then each worker runs gevent.monkey.patch_all(), which replaces standard library functions with gevent-aware versions. But ssl was already imported. Monkey-patching wraps the already-loaded ssl functions with gevent wrappers. The wrappers try to call the "original" functions — which are now the wrappers themselves. Infinite loop.
The fix: preload_app = False. One line. Workers import the application after monkey-patching runs, so ssl is imported into an already-patched environment. No double-wrapping.
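Assuming a standard gunicorn.conf.py, the relevant settings end up looking like this:

```python
# gunicorn.conf.py (sketch of the relevant settings, not the full file)
worker_class = "gevent"  # cooperative workers: many concurrent SSE streams per worker
workers = 4
preload_app = False      # each worker imports Django after gevent monkey-patching,
                         # so ssl is never loaded before patch_all() runs
```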
The subtlest bug was a variable scoping issue in the frontend SSE parser. let eventType = '' was declared inside the while(true) reader loop, resetting on every reader.read() call. Under Django's dev server, SSE events arrive in large chunks — the event: and data: lines land in the same read. Under gevent, chunks are smaller. When event: tool_result arrives in one read and data: {...} arrives in the next, eventType has already reset to empty. The data line is silently skipped. The booking handoff data vanishes without an error.
Fix: move one variable declaration outside the loop.
The lesson from all six bugs is the same: if your production server uses a different concurrency model than your dev server, your dev environment is lying to you. Tests mock the streaming. The dev server serializes the concurrency. The only place these bugs exist is production.
The Security Audit
Before the chat agent, ToteTaxi had already gone through a full security audit — 36 fixes across six PRs, covering everything from payment amount verification to IDOR protection to error message sanitization. But traditional security tooling doesn't cover the attack surface that LLM-powered agents introduce.
I built a custom security agent — a Claude Code subagent with eight LLM-specific vulnerability categories baked into its system prompt: prompt injection, tool use safety, auth boundary manipulation, data exfiltration through tool outputs, conversation history injection, cost amplification, handoff security, and configuration leakage. Then I ran it alongside two built-in agents (a general code reviewer and an e-commerce security scanner) against the full codebase in parallel.
The custom agent found the single most important vulnerability: an IDOR through LLM-controlled tool arguments. The booking lookup tools accepted user_id as a parameter, and the value came from whatever the LLM decided to pass. The system prompt instructed the model to use the authenticated user's ID, but a prompt injection attack could override that instruction and query another user's bookings. The generic security agents flagged this as medium severity. The LLM-specific agent flagged it as critical — because it understood that an LLM tool argument isn't a validated form field. It's model output that can be manipulated.
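One way to close this class of hole is to take the identifier out of the model's hands entirely: build the tools per request, bound to the authenticated user, so there is no user_id argument for an injected instruction to override. A sketch of that pattern, not necessarily the exact fix that shipped:

```python
from langchain_core.tools import tool

def query_bookings(user_id: int, status: str) -> list[dict]:
    # Placeholder for the existing read-only booking query.
    return []

def make_booking_tools(authenticated_user_id: int) -> list:
    """Build per-request tools that are already scoped to the logged-in customer."""

    @tool
    def lookup_my_bookings(status: str = "upcoming") -> list[dict]:
        """Look up the chatting customer's bookings. Takes no user_id argument."""
        return query_bookings(user_id=authenticated_user_id, status=status)

    return [lookup_my_bookings]
```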
Thirty-five unique findings across the codebase. Four critical, eight high. Zero overlap between the custom LLM agent and the built-in agents. Every finding from the custom agent was novel.
The takeaway: when you build an AI feature, you need AI-aware security tooling. A SAST scanner won't find a prompt injection vulnerability. A penetration test might, but it requires LLM-specific expertise. A custom agent with a domain-specific security checklist bridges that gap.
The Observability Stack
LangSmith was already wired into LangGraph — it auto-traces when it detects the environment variables. Every agent invocation produces a trace: the full message chain, each LLM call, tool arguments and results, token counts, latency. Zero code changes to enable it.
I connected LangSmith to my development environment via MCP (Model Context Protocol), which let me query production traces inline while debugging. When the booking handoff data was showing up empty, I could check the LangSmith trace in seconds and confirm the agent was producing correct data. That narrowed the bug from "everything might be broken" to "the data is correct at the agent layer, the problem is in delivery." Five minutes of trace inspection saved hours of guessing.
What I'd Do Differently
Integration tests that run under gevent, not just Django's dev server. Every production bug I hit passed unit tests because the tests don't exercise the concurrency model. A CI step that runs smoke tests under gunicorn with gevent workers would have caught all six bugs before deploy.
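Even something this blunt would have done it: boot gunicorn with gevent workers in CI and hit the streaming endpoint for real. The module path, port, and endpoint below are assumptions:

```python
import subprocess
import time

import requests

def test_chat_stream_under_gevent():
    server = subprocess.Popen([
        "gunicorn", "config.wsgi:application",
        "--worker-class", "gevent", "--workers", "2",
        "--bind", "127.0.0.1:8001",
    ])
    try:
        time.sleep(3)  # crude wait for workers to boot; poll a health endpoint in real CI
        with requests.post(
            "http://127.0.0.1:8001/api/chat/stream/",
            json={"message": "Do you serve 11954?", "history": []},
            stream=True,
            timeout=60,
        ) as resp:
            assert resp.status_code == 200
            body = b"".join(resp.iter_content(chunk_size=None))
            assert b"event:" in body  # the response actually produced SSE frames
    finally:
        server.terminate()
        server.wait()
```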
I'd also add caching for stable tool results. Zip code coverage doesn't change — there's no reason to hit the database every time someone asks if we serve their area.
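As a sketch of what I mean (an assumption about future wiring, not current behavior), the coverage lookup could sit behind a per-worker cache:

```python
from functools import lru_cache

@lru_cache(maxsize=2048)
def coverage_for_zip(zip_code: str) -> dict:
    # Placeholder for the existing read-only lookup; cached results are reused across
    # chat sessions within a worker process. This is per-worker only; use Django's
    # cache framework instead if the answer ever needs central invalidation.
    return {"served": True, "surcharge_usd": 0}
```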
The Numbers
- Three days, conception to production
- 42 new backend tests, 308 total, zero regressions
- Six read-only tools wrapping existing business logic
- 21 handoff fields pre-filling the booking wizard
- Six production bugs found and fixed (all gevent-related)
- 35 security findings from the three-agent audit (4 critical, 8 high)
- 36 security fixes already shipped before the agent (6 PRs)
- 180-line custom React streaming hook, zero new frontend dependencies
- One system prompt (~3,000 tokens) replacing an entire RAG pipeline
The chat agent is live at totetaxi.com. The full codebase is on GitHub.
I'm a software engineer building production AI systems. Previously a commercial fisherman. Currently looking for my next role — if you're hiring for AI/ML engineering, I'd love to talk. LinkedIn · Portfolio