Hard-E is a production-deployed AI application that clones the knowledge and capabilities of a Sales Director for a contracting company. This document outlines the technical decisions, architectural trade-offs, and engineering challenges involved in building a real-world agentic system.
The Core Problem
Sales teams need instant access to company knowledge, CRM data, and the ability to generate personalized content—but they need it to feel conversational and natural, not robotic. The challenge wasn't just connecting APIs; it was building a system that could handle complex, multi-turn conversations while managing state across specialized tasks like script generation and customer creation.
Architecture Overview: Hybrid Orchestration
The application uses a dual-flow architecture that routes requests based on complexity:
1. Manual Streaming Flow (Simple Queries)
- Use case: Knowledge base lookups, CRM queries, web searches
- Implementation: Direct OpenAI Chat Completions API calls with `stream=True` (sketched below)
- Why: Provides real-time, word-by-word text streaming for immediate user feedback
- Trade-off: Sacrifices the SDK's built-in state management for perceived responsiveness
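A minimal sketch of the manual streaming path, assuming the `openai` Python client and Streamlit's `st.write_stream`; the model name, helper function, and example prompt are illustrative, not the production code:

```python
# Minimal sketch of the manual streaming flow (illustrative names; not the
# exact production code). Assumes the `openai` client and Streamlit.
import streamlit as st
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment / secrets

def stream_simple_query(messages: list[dict]):
    """Yield text deltas as they arrive so the UI can render word by word."""
    response = client.chat.completions.create(
        model="gpt-4.1-mini",   # placeholder model name
        messages=messages,
        stream=True,            # the key setting: the API returns incremental chunks
    )
    for chunk in response:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta

# Inside the Streamlit chat loop, the generator is handed straight to the UI.
if "messages" not in st.session_state:
    st.session_state["messages"] = [
        {"role": "user", "content": "Summarize our roof replacement sales process."}
    ]
full_text = st.write_stream(stream_simple_query(st.session_state["messages"]))
```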
2. SDK-Based Flow (Complex Workflows)
- Use case: Multi-turn script generation, customer creation with validation
- Implementation: OpenAI Agents SDK (v0.0.17) with custom state management via dataclasses
- Why: Enables robust, stateful conversations that collect and validate information across multiple turns
- Trade-off: Slightly less responsive text display, but maintains conversation context perfectly
The `main()` function analyzes user input and session state to determine which flow handles the request. Keywords like "create script" or "add customer" trigger the SDK flow; everything else uses manual streaming.
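A simplified sketch of that routing decision; the trigger keywords, session-state keys, and helper name are illustrative:

```python
# Simplified sketch of the routing decision in main() (keywords, state keys,
# and the helper name are illustrative, not the exact production values).
SDK_TRIGGERS = ("create script", "add customer", "new customer")

def route_request(user_input: str, session_state: dict) -> str:
    """Pick a flow: the SDK flow for stateful workflows, manual streaming otherwise."""
    text = user_input.lower()

    # An in-progress multi-turn workflow keeps the conversation in the SDK flow.
    if session_state.get("script_draft_state") or session_state.get("new_client_state"):
        return "sdk"

    if any(trigger in text for trigger in SDK_TRIGGERS):
        return "sdk"

    return "manual_streaming"
```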
State Management Strategy
One of the hardest problems was maintaining conversation state across multiple turns. The OpenAI Agents SDK isn't stateful by default: each `Runner.run()` call is independent.
The Solution: Context-Aware Dataclasses
I created custom state containers (`ScriptDraftState`, `NewClientState`) that live in Streamlit's session state and are passed to the SDK via the `context` parameter:
- ScriptDraftState: Tracks client name, project type, focus material, job ID across script generation conversation
- NewClientState: Manages customer field collection with validation, speech artifact handling, and "skip" functionality
- Access pattern: Tools receive state via `ctx.context` and can read or modify it directly
This approach enabled multi-turn conversations where the agent remembers what it has already collected and asks only for the missing information.
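A sketch of what one of these context-aware dataclasses and its tool-access pattern can look like, assuming the Agents SDK's `function_tool` decorator and `RunContextWrapper` type; the tool and any fields beyond those listed above are illustrative:

```python
# Sketch of a state container and a tool that reads/writes it via ctx.context.
# Assumes the OpenAI Agents SDK's function_tool / RunContextWrapper pattern;
# the tool and helper names are illustrative.
from dataclasses import dataclass
from typing import Optional

from agents import RunContextWrapper, function_tool

@dataclass
class ScriptDraftState:
    client_name: Optional[str] = None
    project_type: Optional[str] = None
    focus_material: Optional[str] = None
    job_id: Optional[str] = None

    def missing_fields(self) -> list[str]:
        return [name for name, value in vars(self).items() if value is None]

@function_tool
def record_script_detail(
    ctx: RunContextWrapper[ScriptDraftState], field_name: str, value: str
) -> str:
    """Store one collected detail and report what is still missing."""
    setattr(ctx.context, field_name, value)
    missing = ctx.context.missing_fields()
    return f"Saved {field_name}. Still missing: {missing or 'nothing'}."

# The state object lives in Streamlit's session state and is passed on every turn:
# result = await Runner.run(scripting_agent, user_input, context=st.session_state.script_state)
```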
Multi-Agent System Design
Hard-E uses seven specialized agents, each with focused responsibilities:
- TriageAgent: Routes queries to the appropriate specialist (no direct answers)
- LeapInteractionAgent: Handles all CRM operations (fetch jobs, add notes, update stages)
- KnowledgeAgent: Queries multiple S3 knowledge bases (training docs, project-specific files)
- AnalysisAgent: Analyzes patterns across 90+ distilled transcript JSONs
- WebSearchAgent_SDK: Performs external research via Perplexity API
- ScriptingAgent: Manages stateful script generation (standard and primer videos)
- NewClientsAgent: Handles customer creation with LLM-powered field extraction
Why This Matters
Each agent has a narrow scope, making them easier to debug, update, and reason about. When script generation broke, I only had to look at the `ScriptingAgent` logic, not the entire system.
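A condensed sketch of how the triage-and-specialists structure can be wired with the Agents SDK's `Agent` and `handoffs`; the instructions are paraphrased and tool lists omitted, so this is not the production configuration:

```python
# Condensed sketch of the triage-and-specialists wiring (instructions are
# paraphrased; the real prompts and tool lists are much longer).
from agents import Agent

knowledge_agent = Agent(
    name="KnowledgeAgent",
    instructions="Answer questions using the S3 knowledge bases.",
    # tools=[query_training_docs, query_yarmouth_docs, ...]
)

leap_agent = Agent(
    name="LeapInteractionAgent",
    instructions="Handle CRM operations: fetch jobs, add notes, update stages.",
    # tools=[fetch_job, add_note, update_job_stage, ...]
)

triage_agent = Agent(
    name="TriageAgent",
    instructions="Route each request to the right specialist. Never answer directly.",
    handoffs=[knowledge_agent, leap_agent],  # plus the other specialists
)
```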
Technical Challenges & Solutions
Challenge 1: Natural Language Entity Extraction
Problem: Voice input produced speech artifacts like "emily at demo dot com" instead of "emily@demo.com"
Solution: Built an LLM-powered extraction pipeline using GPT-4.1-mini with function calling (a simplified sketch follows the list below):
- Advanced regex for spaced ("at") and fused ("emilyatdemo") email patterns
- Synonym mapping (user says "street" → system maps to "address")
- State code validation (MA → Massachusetts ID 21, not Missouri ID 25)
- Achieved 95%+ success rate for single-pass voice customer creation
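A simplified sketch of the normalization layer described above, covering only the spaced "at"/"dot" email case and a couple of the synonym and state-code mappings:

```python
# Simplified sketch of speech-artifact normalization ahead of the LLM
# extraction pass (patterns and mappings are a small illustrative subset).
import re

FIELD_SYNONYMS = {"street": "address", "cell": "phone"}
STATE_IDS = {"MA": 21, "MO": 25}  # Massachusetts vs. Missouri in the CRM's ID scheme

def normalize_spoken_email(text: str) -> str:
    """Turn 'emily at demo dot com' into 'emily@demo.com' (spaced artifacts only)."""
    text = re.sub(r"\s+at\s+", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"\s+dot\s+", ".", text, flags=re.IGNORECASE)
    return text

def canonical_field(name: str) -> str:
    """Map the user's wording ('street') onto the CRM field name ('address')."""
    return FIELD_SYNONYMS.get(name.lower(), name.lower())

print(normalize_spoken_email("emily at demo dot com"))  # emily@demo.com
print(canonical_field("Street"))                        # address
```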
Challenge 2: Package Version Compatibility
Problem: Building a streaming version required different package versions than production, leading to import errors
Solution: Implemented complete environment isolation:
- Separate project directories (`hard-e-production/`, `hard-e-streaming/`)
- Independent virtual environments with version-specific requirements
- Production: `openai-agents==0.0.4` | Streaming: `openai-agents>=0.0.17,<0.1`
- Dual systemd services managing lifecycle independently
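One small guard that complements this isolation (illustrative, not taken from the production code): fail fast at startup if the active environment carries the wrong SDK version. This assumes the `packaging` library is installed:

```python
# Illustrative startup guard: refuse to run if the active virtual environment
# does not satisfy the streaming app's openai-agents pin.
from importlib.metadata import version
from packaging.specifiers import SpecifierSet

REQUIRED = SpecifierSet(">=0.0.17,<0.1")  # streaming pin; production would use ==0.0.4
installed = version("openai-agents")

if installed not in REQUIRED:
    raise RuntimeError(
        f"openai-agents {installed} does not satisfy {REQUIRED}; "
        "is the right virtual environment active?"
    )
```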
Challenge 3: Virtual Environment Path Hardcoding
Problem: Moving directories broke virtual environments—executables had hardcoded shebangs pointing to old paths
Solution: Ran `python3.9 -m venv --upgrade .venv` to rewrite the executable paths in place, preserving installed packages
Production Infrastructure
Hard-E runs on AWS EC2 (t3.small) with a robust deployment setup:
Dual-Domain Architecture
- Production: `agent.harde.app` (port 8501)
- Streaming: `streaming.harde.app` (port 8504)
- SSL: Single Let's Encrypt certificate covering both domains
Infrastructure Components
- Nginx: Reverse proxy with dual server blocks, WebSocket support for Streamlit
- Systemd: Independent services (`harde-streamlit`, `harde-streaming`) with auto-restart
- Security: HTTP Basic Auth, HTTPS enforcement, restricted API key storage
- Process Management: Each version runs persistently, isolated from terminal sessions
Why This Setup Works
The dual-environment approach allows safe development and testing of the streaming version without touching production. If streaming crashes, production stays online. Both versions can be updated, restarted, and monitored independently.
API Integration Complexity
Hard-E orchestrates six external APIs simultaneously.
Each integration required:
- Error handling for rate limits, timeouts, and malformed responses
- Retry logic with exponential backoff (a generic sketch follows this list)
- Response validation and schema enforcement
- Secure credential management via `secrets.toml`
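A generic version of the retry pattern (illustrative; the production wrappers are per-API and catch specific rate-limit and timeout exceptions):

```python
# Generic retry-with-exponential-backoff wrapper (illustrative; real code
# should catch the specific rate-limit/timeout exceptions of each client).
import random
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 1.0):
    """Run call() and retry on failure, doubling the wait each time with jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise
            delay = base_delay * (2 ** (attempt - 1)) + random.uniform(0, 0.5)
            time.sleep(delay)

# Usage: with_retries(lambda: client.chat.completions.create(...))
```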
Data Pipeline Architecture
Hard-E pulls from multiple data sources with different access patterns:
S3 Knowledge Bases
- `training/`: General sales process documentation
- `Yarmouth/`: Project-specific knowledge (isolated routing)
- `distilled_transcripts/`: 90 structured JSON summaries for analysis
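A sketch of how the distilled-transcript prefix can be loaded for analysis, assuming `boto3` and a placeholder bucket name:

```python
# Sketch of a knowledge-base read against the prefixes listed above
# (bucket name is a placeholder; assumes AWS credentials are configured).
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "harde-knowledge"  # placeholder bucket name

def load_distilled_transcripts(prefix: str = "distilled_transcripts/") -> list[dict]:
    """Read every distilled-transcript JSON under the prefix for the AnalysisAgent."""
    summaries = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".json"):
                body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
                summaries.append(json.loads(body))
    return summaries
```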
Real-Time CRM Access
- Live job status, customer addresses, notes via Leap API
- Write-back capabilities for adding notes and updating job stages
Web Search Fallback
- Perplexity API for product specs, warranty info, competitive intelligence
- Triggered when internal data sources don't contain answers
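A sketch of that fallback, assuming Perplexity's OpenAI-compatible chat-completions endpoint; the model name is a placeholder to verify against current Perplexity documentation:

```python
# Sketch of the web-search fallback (model name is a placeholder; assumes
# Perplexity's OpenAI-compatible chat-completions API).
import os
from openai import OpenAI

perplexity = OpenAI(
    api_key=os.environ["PERPLEXITY_API_KEY"],
    base_url="https://api.perplexity.ai",
)

def web_search(question: str) -> str:
    """Used only when the S3 knowledge bases and CRM can't answer the question."""
    response = perplexity.chat.completions.create(
        model="sonar",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```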
What I Learned
1. State Management is Hard
Multi-turn conversations require explicit state tracking. The SDK doesn't automatically remember context between calls—you have to architect for it from the start.
2. Rapid AI Evolution Requires Version Control
Package versions matter enormously. A minor OpenAI Agents SDK update broke production because it expected response types that didn't exist yet. Pinning versions and testing thoroughly before upgrades is non-negotiable.
3. Production Deployment is a Different Beast
Getting it to work locally is 30% of the job. The other 70% is systemd services, Nginx configs, SSL certificates, security hardening, and ensuring it stays running when you close your terminal.
4. User Experience Drives Architecture
The hybrid streaming model exists because users found the SDK's batched responses too slow for simple queries. Technical purity took a back seat to perceived responsiveness.
Current Status & Next Steps
Hard-E is production-ready with active dual-domain deployment. The system handles:
- Multi-turn script generation (standard + primer videos)
- Voice-to-text customer creation with validation
- Real-time CRM operations
- Knowledge base queries across multiple sources
- Web search for external information
Planned enhancements include image analysis for job site photos, automated primer video creation via Shotstack API, and query logging for usage analytics.