Hard-E: Technical Architecture

A Deep Dive into Multi-Agent Systems Engineering


Hard-E is a production-deployed AI application that clones the knowledge and capabilities of a Sales Director for a contracting company. This document outlines the technical decisions, architectural trade-offs, and engineering challenges involved in building a real-world agentic system.

The Core Problem

Sales teams need instant access to company knowledge, CRM data, and the ability to generate personalized content—but they need it to feel conversational and natural, not robotic. The challenge wasn't just connecting APIs; it was building a system that could handle complex, multi-turn conversations while managing state across specialized tasks like script generation and customer creation.

Key Insight: A single monolithic chatbot couldn't handle this complexity. The solution required a multi-agent architecture where specialized AI agents handle specific domains, coordinated by an intelligent routing system.

Architecture Overview: Hybrid Orchestration

The application uses a dual-flow architecture that routes requests based on complexity:

1. Manual Streaming Flow (Simple Queries)

2. SDK-Based Flow (Complex Workflows)

The Routing Decision: A keyword-based classifier in main() analyzes user input and session state to determine which flow handles the request. Keywords like "create script" or "add customer" trigger the SDK flow; everything else uses manual streaming.
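A minimal sketch of that router, with illustrative keyword lists and session keys rather than the production values:

```python
# Sketch of the keyword-based router in main(). The trigger phrases and
# session-state keys here are illustrative, not the production values.
SDK_TRIGGERS = ("create script", "add customer", "new client", "primer video")

def route_request(user_input: str, session_state: dict) -> str:
    """Return 'sdk' for complex stateful workflows, 'stream' for simple queries."""
    text = user_input.lower()

    # An in-progress workflow (e.g. a half-finished script draft) stays on the
    # SDK flow so the specialist agent can keep collecting missing fields.
    if session_state.get("script_draft") or session_state.get("new_client_draft"):
        return "sdk"

    # Explicit workflow keywords hand the request to the SDK-based agents.
    if any(trigger in text for trigger in SDK_TRIGGERS):
        return "sdk"

    # Everything else goes through the manual streaming path for fast output.
    return "stream"
```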

State Management Strategy

One of the hardest problems was maintaining conversation state across multiple turns. The OpenAI Agents SDK isn't stateful by default—each Runner.run() call is independent.

The Solution: Context-Aware Dataclasses

I created custom state containers (ScriptDraftState, NewClientState), stored them in Streamlit's session state, and passed them to the SDK via the context parameter:
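A minimal sketch of the pattern; the field names and session key are assumptions for illustration, not the production schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

import streamlit as st

@dataclass
class ScriptDraftState:
    """What the ScriptingAgent has collected so far (fields are illustrative)."""
    customer_name: Optional[str] = None
    job_type: Optional[str] = None
    talking_points: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        return [f for f in ("customer_name", "job_type") if getattr(self, f) is None]

# Persist the draft across Streamlit reruns so each turn sees prior progress.
if "script_draft" not in st.session_state:
    st.session_state.script_draft = ScriptDraftState()

# On each turn the draft is handed to the SDK via the context parameter, e.g.:
# result = await Runner.run(scripting_agent, user_input, context=st.session_state.script_draft)
```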

This approach enabled multi-turn conversations where the agent remembers what it's already collected and asks for only missing information.

Multi-Agent System Design

Hard-E uses seven specialized agents, each with focused responsibilities:

  1. TriageAgent: Routes queries to the appropriate specialist (no direct answers)
  2. LeapInteractionAgent: Handles all CRM operations (fetch jobs, add notes, update stages)
  3. KnowledgeAgent: Queries multiple S3 knowledge bases (training docs, project-specific files)
  4. AnalysisAgent: Analyzes patterns across 90+ distilled transcript JSONs
  5. WebSearchAgent_SDK: Performs external research via Perplexity API
  6. ScriptingAgent: Manages stateful script generation (standard and primer videos)
  7. NewClientsAgent: Handles customer creation with LLM-powered field extraction
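A condensed sketch of how agents like these can be wired together with the OpenAI Agents SDK; the instructions are abbreviated and tool lists omitted, so this stands in for the much longer production prompts:

```python
from agents import Agent, Runner

# Abbreviated instructions; the production agents also carry function tools
# for CRM access, S3 knowledge bases, and web search.
scripting_agent = Agent(
    name="ScriptingAgent",
    instructions="Collect the details needed for a sales script, then generate it.",
)

knowledge_agent = Agent(
    name="KnowledgeAgent",
    instructions="Answer questions using the company knowledge bases.",
)

triage_agent = Agent(
    name="TriageAgent",
    instructions="Never answer directly; hand off to the right specialist.",
    handoffs=[scripting_agent, knowledge_agent],
)

# result = await Runner.run(triage_agent, "Draft a script for the Smith roofing job")
# print(result.final_output)
```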

Why This Matters

Each agent has a narrow scope, making them easier to debug, update, and reason about. When script generation broke, I only had to look at ScriptingAgent logic—not the entire system.

Technical Challenges & Solutions

Challenge 1: Natural Language Entity Extraction

Problem: Voice input produced speech artifacts like "emily at demo dot com" instead of "emily@demo.com"

Solution: Built an LLM-powered extraction pipeline using GPT-4.1-mini with function calling:
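A minimal sketch of that extraction step, assuming the standard OpenAI Python client; the schema and normalization rules are simplified stand-ins for the production version:

```python
import json
from openai import OpenAI

client = OpenAI()

# Simplified stand-in schema: pull clean contact fields out of messy voice
# transcripts ("emily at demo dot com" -> "emily@demo.com").
EXTRACT_TOOL = {
    "type": "function",
    "function": {
        "name": "save_customer_fields",
        "description": "Record normalized customer contact details.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "email": {"type": "string", "description": "Valid email with speech artifacts resolved"},
                "phone": {"type": "string", "description": "Digits only"},
            },
            "required": ["name", "email"],
        },
    },
}

def extract_customer(transcript: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system", "content": "Extract customer fields from the transcript. "
                                          "Convert spoken forms like 'at' and 'dot' into a real email address."},
            {"role": "user", "content": transcript},
        ],
        tools=[EXTRACT_TOOL],
        tool_choice={"type": "function", "function": {"name": "save_customer_fields"}},
    )
    call = response.choices[0].message.tool_calls[0]
    return json.loads(call.function.arguments)

# extract_customer("new customer emily at demo dot com") -> {"name": ..., "email": "emily@demo.com"}
```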

Challenge 2: Package Version Compatibility

Problem: Building a streaming version required different package versions than production, leading to import errors

Solution: Implemented complete environment isolation, giving the production and streaming deployments separate virtual environments with independently pinned dependencies

Challenge 3: Virtual Environment Path Hardcoding

Problem: Moving directories broke virtual environments—executables had hardcoded shebangs pointing to old paths

Solution: Used python3.9 -m venv --upgrade .venv to rewrite all executable paths in place, preserving installed packages

Production Infrastructure

Hard-E runs on AWS EC2 (t3.small) with a robust deployment setup:

Dual-Domain Architecture

Infrastructure Components

Why This Setup Works

The dual-environment approach allows safe development and testing of the streaming version without touching production. If streaming crashes, production stays online. Both versions can be updated, restarted, and monitored independently.

API Integration Complexity

Hard-E orchestrates six external APIs simultaneously:

OpenAI (GPT-4.1, Whisper, TTS)
xAI Grok (Response Refinement)
Perplexity (Web Search)
Leap CRM (Customer/Job Data)
AWS S3 (Knowledge Bases)
ElevenLabs (Voice Synthesis)

Each integration required:

Data Pipeline Architecture

Hard-E pulls from multiple data sources with different access patterns:

S3 Knowledge Bases

Real-Time CRM Access

Web Search Fallback
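As one illustration of the S3 pattern, a minimal sketch of how the KnowledgeAgent might pull documents from a knowledge-base prefix with boto3; the bucket and prefix names are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

def load_knowledge_base(bucket: str, prefix: str) -> dict:
    """Read every text document under a prefix so the agent can search it.

    Bucket and prefix names are hypothetical; the production knowledge bases
    are split across several prefixes (training docs, project-specific files).
    """
    docs = {}
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"].read()
            docs[obj["Key"]] = body.decode("utf-8", errors="replace")
    return docs

# training_docs = load_knowledge_base("harde-knowledge", "training/")
```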

What I Learned

1. State Management is Hard

Multi-turn conversations require explicit state tracking. The SDK doesn't automatically remember context between calls—you have to architect for it from the start.

2. Rapid AI Evolution Requires Version Control

Package versions matter enormously. A minor OpenAI Agents SDK update broke production because it expected response types that didn't exist yet. Pinning versions and testing thoroughly before upgrades is non-negotiable.

3. Production Deployment is a Different Beast

Getting it to work locally is 30% of the job. The other 70% is systemd services, Nginx configs, SSL certificates, security hardening, and ensuring it stays running when you close your terminal.

4. User Experience Drives Architecture

The hybrid streaming model exists because users found the SDK's batched responses too slow for simple queries. Technical purity took a back seat to perceived responsiveness.
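For context, the manual streaming path can be as simple as forwarding token deltas straight into the UI. A minimal sketch, assuming the standard OpenAI client and Streamlit's st.write_stream:

```python
import streamlit as st
from openai import OpenAI

client = OpenAI()

def stream_answer(prompt: str):
    """Yield token deltas as they arrive so the UI renders text immediately."""
    stream = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            yield delta

# st.write_stream renders the generator incrementally instead of waiting for the full reply:
# st.write_stream(stream_answer(user_input))
```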

Current Status & Next Steps

Hard-E is production-ready with active dual-domain deployment. The system handles:

Planned enhancements include image analysis for job site photos, automated primer video creation via Shotstack API, and query logging for usage analytics.

Bottom Line: Hard-E demonstrates full-stack AI application development—from prompt engineering and state management to production infrastructure and API orchestration. It's not just connecting APIs; it's architecting a system that solves real business problems at scale.