Hard-E: Technical Architecture

A Deep Dive into Multi-Agent Systems Engineering


Hard-E is a production-deployed AI application that clones the knowledge and capabilities of a Sales Director for a contracting company. This document outlines the technical decisions, architectural trade-offs, and engineering challenges involved in building a real-world agentic system.

The Core Problem

Sales teams need instant access to company knowledge, CRM data, and the ability to generate personalized content—but they need it to feel conversational and natural, not robotic. The challenge wasn't just connecting APIs; it was building a system that could handle complex, multi-turn conversations while managing state across specialized tasks like script generation and customer creation.

Key Insight: A single monolithic chatbot couldn't handle this complexity. The solution required a multi-agent architecture where specialized AI agents handle specific domains, coordinated by an intelligent routing system.

Architecture Overview: Hybrid Orchestration

The application uses a dual-flow architecture that routes requests based on complexity:

1. Manual Streaming Flow (Simple Queries)

2. SDK-Based Flow (Complex Workflows)

The Routing Decision: A keyword-based classifier in main() analyzes user input and session state to determine which flow handles the request. Keywords like "create script" or "add customer" trigger the SDK flow; everything else uses manual streaming.
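A minimal sketch of that router, with illustrative keyword lists and session keys rather than the production values:

```python
# Sketch of the keyword-based router in main(). The trigger phrases and
# session-state keys here are illustrative, not the production values.
SDK_TRIGGERS = ("create script", "add customer", "new client", "primer video")

def route_request(user_input: str, session_state: dict) -> str:
    """Return 'sdk' for complex stateful workflows, 'stream' for simple queries."""
    text = user_input.lower()

    # An in-progress workflow (e.g. a half-finished script draft) stays on the
    # SDK flow so the specialist agent can keep collecting missing fields.
    if session_state.get("script_draft") or session_state.get("new_client_draft"):
        return "sdk"

    # Explicit workflow keywords hand the request to the SDK-based agents.
    if any(trigger in text for trigger in SDK_TRIGGERS):
        return "sdk"

    # Everything else goes through the manual streaming path for fast output.
    return "stream"
```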

State Management Strategy

One of the hardest problems was maintaining conversation state across multiple turns. The OpenAI Agents SDK isn't stateful by default—each Runner.run() call is independent.

The Solution: Context-Aware Dataclasses

I created custom state containers (ScriptDraftState, NewClientState), stored them in Streamlit's session state, and passed them to the SDK via the context parameter:
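A minimal sketch of the pattern; the field names and session key are assumptions for illustration, not the production schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

import streamlit as st

@dataclass
class ScriptDraftState:
    """What the ScriptingAgent has collected so far (fields are illustrative)."""
    customer_name: Optional[str] = None
    job_type: Optional[str] = None
    talking_points: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        return [f for f in ("customer_name", "job_type") if getattr(self, f) is None]

# Persist the draft across Streamlit reruns so each turn sees prior progress.
if "script_draft" not in st.session_state:
    st.session_state.script_draft = ScriptDraftState()

# On each turn the draft is handed to the SDK via the context parameter, e.g.:
# result = await Runner.run(scripting_agent, user_input, context=st.session_state.script_draft)
```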

This approach enabled multi-turn conversations where the agent remembers what it's already collected and asks for only missing information.

Multi-Agent System Design

Hard-E uses seven specialized agents, each with focused responsibilities:

  1. TriageAgent: Routes queries to the appropriate specialist (no direct answers)
  2. LeapInteractionAgent: Handles all CRM operations (fetch jobs, add notes, update stages)
  3. KnowledgeAgent: Queries multiple S3 knowledge bases (training docs, project-specific files)
  4. AnalysisAgent: Analyzes patterns across 90+ distilled transcript JSONs
  5. WebSearchAgent_SDK: Performs external research via Perplexity API
  6. ScriptingAgent: Manages stateful script generation (standard and primer videos)
  7. NewClientsAgent: Handles customer creation with LLM-powered field extraction
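A condensed sketch of how agents like these can be wired together with the OpenAI Agents SDK; the instructions are abbreviated and tool lists omitted, so this stands in for the much longer production prompts:

```python
from agents import Agent, Runner

# Abbreviated instructions; the production agents also carry function tools
# for CRM access, S3 knowledge bases, and web search.
scripting_agent = Agent(
    name="ScriptingAgent",
    instructions="Collect the details needed for a sales script, then generate it.",
)

knowledge_agent = Agent(
    name="KnowledgeAgent",
    instructions="Answer questions using the company knowledge bases.",
)

triage_agent = Agent(
    name="TriageAgent",
    instructions="Never answer directly; hand off to the right specialist.",
    handoffs=[scripting_agent, knowledge_agent],
)

# result = await Runner.run(triage_agent, "Draft a script for the Smith roofing job")
# print(result.final_output)
```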

Why This Matters

Each agent has a narrow scope, making them easier to debug, update, and reason about. When script generation broke, I only had to look at ScriptingAgent logic—not the entire system.

Technical Challenges & Solutions

Challenge 1: Natural Language Entity Extraction

Problem: Voice input produced speech artifacts like "emily at demo dot com" instead of "emily@demo.com"

Solution: Built an LLM-powered extraction pipeline using GPT-4.1-mini with function calling:
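A minimal sketch of that extraction step, assuming the standard OpenAI Python client; the schema and normalization rules are simplified stand-ins for the production version:

```python
import json
from openai import OpenAI

client = OpenAI()

# Simplified stand-in schema: pull clean contact fields out of messy voice
# transcripts ("emily at demo dot com" -> "emily@demo.com").
EXTRACT_TOOL = {
    "type": "function",
    "function": {
        "name": "save_customer_fields",
        "description": "Record normalized customer contact details.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string", "description": "Valid email with speech artifacts resolved"},
                "phone": {"type": "string", "description": "Digits only"},
            },
            "required": ["name", "email"],
        },
    },
}

def extract_customer(transcript: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system", "content": "Extract customer fields from the transcript. "
                                          "Convert spoken forms like 'at' and 'dot' into a real email address."},
            {"role": "user", "content": transcript},
        ],
        tools=[EXTRACT_TOOL],
        tool_choice={"type": "function", "function": {"name": "save_customer_fields"}},
    )
    call = response.choices[0].message.tool_calls[0]
    return json.loads(call.function.arguments)

# extract_customer("new customer emily at demo dot com") -> {"name": ..., "email": "emily@demo.com"}
```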

Challenge 2: Package Version Compatibility

Problem: Building a streaming version required different package versions than production, leading to import errors

Solution: Implemented complete environment isolation, giving the production and streaming deployments separate virtual environments with independently pinned dependencies

Challenge 3: Virtual Environment Path Hardcoding

Problem: Moving directories broke virtual environments—executables had hardcoded shebangs pointing to old paths

Solution: Used python3.9 -m venv --upgrade .venv to rewrite all executable paths in place, preserving installed packages

Production Infrastructure

Hard-E runs on AWS EC2 (t3.small) with a robust deployment setup:

Dual-Domain Architecture

Infrastructure Components

Why This Setup Works

The dual-environment approach allows safe development and testing of the streaming version without touching production. If streaming crashes, production stays online. Both versions can be updated, restarted, and monitored independently.

API Integration Complexity

Hard-E orchestrates six external APIs simultaneously:

OpenAI (GPT-4.1, Whisper, TTS)
xAI Grok (Response Refinement)
Perplexity (Web Search)
Leap CRM (Customer/Job Data)
AWS S3 (Knowledge Bases)
ElevenLabs (Voice Synthesis)

Each integration required:

Data Pipeline Architecture

Hard-E pulls from multiple data sources with different access patterns:

S3 Knowledge Bases

Real-Time CRM Access

Web Search Fallback
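As one illustration of the S3 pattern, a minimal sketch of how the KnowledgeAgent might pull documents from a knowledge-base prefix with boto3; the bucket and prefix names are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

def load_knowledge_base(bucket: str, prefix: str) -> dict:
    """Read every text document under a prefix so the agent can search it.

    Bucket and prefix names are hypothetical; the production knowledge bases
    are split across several prefixes (training docs, project-specific files).
    """
    docs = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            docs[obj["Key"]] = body.decode("utf-8", errors="replace")
    return docs

# training_docs = load_knowledge_base("harde-knowledge", "training/")
```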

What I Learned

1. State Management is Hard

Multi-turn conversations require explicit state tracking. The SDK doesn't automatically remember context between calls—you have to architect for it from the start.

2. Rapid AI Evolution Requires Version Control

Package versions matter enormously. A minor OpenAI Agents SDK update broke production because it expected response types that didn't exist yet. Pinning versions and testing thoroughly before upgrades is non-negotiable.

3. Production Deployment is a Different Beast

Getting it to work locally is 30% of the job. The other 70% is systemd services, Nginx configs, SSL certificates, security hardening, and ensuring it stays running when you close your terminal.

4. User Experience Drives Architecture

The hybrid streaming model exists because users found the SDK's batched responses too slow for simple queries. Technical purity took a back seat to perceived responsiveness.
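For context, the manual streaming path can be as simple as forwarding token deltas straight into the UI. A minimal sketch, assuming the standard OpenAI client and Streamlit's st.write_stream:

```python
import streamlit as st
from openai import OpenAI

client = OpenAI()

def stream_answer(prompt: str):
    """Yield token deltas as they arrive so the UI renders text immediately."""
    stream = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta

# st.write_stream renders the generator incrementally instead of waiting for the full reply:
# st.write_stream(stream_answer(user_input))
```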

Current Status & Next Steps

Hard-E is production-ready with active dual-domain deployment. The system handles:

Planned enhancements include image analysis for job site photos, automated primer video creation via Shotstack API, and query logging for usage analytics.

Bottom Line: Hard-E demonstrates full-stack AI application development—from prompt engineering and state management to production infrastructure and API orchestration. It's not just connecting APIs; it's architecting a system that solves real business problems at scale.