Hard-E: Technical Blueprint

An In-Depth Look at an Autonomous AI Employee Architecture


Hard-E is a production-deployed, multi-agent AI system designed to function as an autonomous "Sales Director Clone" for the residential contracting industry. This document provides a high-level overview of the system's architecture, key engineering decisions, and the technical solutions developed to overcome real-world challenges in building an AI employee.

Core Philosophy: From Assistant to Executor

The fundamental design goal was to create an AI that doesn't just answer questions but actively executes complex, multi-step business workflows. This required moving beyond a simple chatbot model to a system of coordinated, specialized AI agents that can manage state, operate autonomously, and integrate deeply with third-party services.

Key Insight: A single monolithic agent cannot efficiently handle tasks as different as CRM updates, video script generation, and pricing calculations. The solution is a distributed-cognition model in which specialized agents each own one domain, orchestrated by an intelligent routing layer.

Architecture: The Hybrid Processing Model

To balance user-perceived responsiveness with the need for robust state management, Hard-E employs a dual-flow architecture that intelligently routes requests based on their complexity.

1. Manual Streaming Flow (for Speed)

2. SDK-Based Flow (for Complexity)
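The document does not describe how requests are classified as simple or complex. Below is a minimal dispatcher sketch that assumes a hypothetical keyword heuristic plus a check for an in-progress workflow. Simple questions go to the low-latency streaming path, and anything needing tools or multi-turn state goes to the SDK path. `Request`, `COMPLEX_MARKERS`, and `choose_flow` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical markers of requests that need tools or multi-turn state.
# Hard-E's real classification criteria are not documented; these are illustrative.
COMPLEX_MARKERS = ("script", "new client", "price", "pricing", "crm", "job")


@dataclass
class Request:
    text: str
    active_workflow: Optional[str] = None  # e.g. "script_draft" while a workflow is collecting data


def choose_flow(req: Request) -> str:
    """Return "sdk" for stateful or tool-using work, "stream" for simple Q&A."""
    # A workflow already in progress must keep its state, so it stays on the SDK flow.
    if req.active_workflow is not None:
        return "sdk"
    lowered = req.text.lower()
    if any(marker in lowered for marker in COMPLEX_MARKERS):
        return "sdk"
    # Everything else is answered directly with a streamed completion for speed.
    return "stream"
```

Keeping in-progress workflows on the SDK path matters more than the keyword list. A short follow-up like "yes, go ahead" contains no markers but still depends on collected state.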

The Multi-Agent System

Hard-E's intelligence is distributed across several specialized agents. A central TriageAgent analyzes user intent and conversation history to route tasks to the appropriate expert.
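In Hard-E the TriageAgent is model-driven. The sketch below swaps that model call for a plain keyword scorer so that the routing logic, including the fallback to conversation history, runs on its own. Every agent name except TriageAgent's role, and every trigger word, is hypothetical. In the OpenAI Agents SDK this kind of routing is commonly expressed by giving a triage agent a `handoffs` list of specialist agents. The document does not say whether Hard-E uses handoffs or another mechanism.

```python
from typing import Dict, List, Tuple

# Hypothetical specialists and trigger words. The real TriageAgent uses a model,
# not keyword counts; this scorer only stands in for that call.
ROUTES: Dict[str, Tuple[str, ...]] = {
    "ScriptAgent": ("script", "video", "primer"),
    "CRMAgent": ("crm", "job", "client", "note"),
    "PricingAgent": ("price", "pricing", "quote", "estimate"),
}
DEFAULT_AGENT = "GeneralAgent"


def triage(message: str, history: List[str]) -> str:
    """Pick a specialist for `message`; `history` lists prior agent names, oldest first."""
    text = message.lower()
    # Score each specialist by how many of its trigger words appear in the message.
    scores = {agent: sum(word in text for word in triggers) for agent, triggers in ROUTES.items()}
    best_agent, best_score = max(scores.items(), key=lambda item: item[1])
    if best_score > 0:
        return best_agent
    # No clear intent: treat it as a follow-up and stay with the last specialist.
    return history[-1] if history else DEFAULT_AGENT
```

The history fallback is what lets a terse reply like "make it shorter" reach the agent that produced the draft, rather than a generic handler.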

Key Engineering Challenge: State Management

A core challenge was that the OpenAI Agents SDK is stateless by default. To enable multi-turn conversations, a custom state management solution was engineered.

Solution: Context-Aware Dataclasses

Python dataclasses such as ScriptDraftState and NewClientState model the information each workflow requires. The state object is kept in the user's session and passed into the SDK's Runner.run() method through the context parameter, which makes it available to every tool. With this pattern, an agent remembers what it has already collected and asks only for what is missing.
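Here is a minimal sketch of the pattern. The fields of `ScriptDraftState` are hypothetical, since the real ones are not documented, and `record_answer` is an illustrative helper. The same instance would stay in the user's session between turns, for example in Streamlit's `st.session_state`. With the SDK, it would be passed as `Runner.run(agent, user_input, context=state)`. A tool decorated with `@function_tool` could then read it through a `RunContextWrapper[ScriptDraftState]` parameter as `ctx.context`. The SDK wiring is left out here so the example runs standalone.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ScriptDraftState:
    # Hypothetical fields; the real ScriptDraftState's contents are not documented.
    client_name: Optional[str] = None
    project_type: Optional[str] = None
    tone: Optional[str] = None

    def missing_fields(self) -> List[str]:
        """Names of fields the agent still needs to ask the user for."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

    def is_complete(self) -> bool:
        return not self.missing_fields()


def record_answer(state: ScriptDraftState, field_name: str, value: str) -> str:
    """Tool-style helper: store one answer in the shared state and report what is left.

    As an SDK tool, the first parameter would be RunContextWrapper[ScriptDraftState]
    and the state would be read from ctx.context.
    """
    if field_name not in {f.name for f in fields(state)}:
        return f"Unknown field: {field_name}"
    setattr(state, field_name, value)
    remaining = state.missing_fields()
    return "All details collected." if not remaining else "Still need: " + ", ".join(remaining)
```

Because the tool reports exactly which fields are still `None`, the agent's next question can be derived from state instead of from the model re-reading the whole conversation.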

Autonomous Operations: The Primer Video Pipeline

The system's most advanced capability is its fully autonomous primer video workflow, which demonstrates the shift from reactive assistant to proactive executor.

Trigger & Orchestration

An email listener daemon continuously monitors a dedicated inbox. When a CRM automation sends a "Please produce Primer Video" email, it triggers the master workflow. Because the trigger is the CRM's email rather than any particular actor, this decoupled design lets the process start whether the job stage was changed by Hard-E or by a human.
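The listener can be sketched with the standard library's `imaplib`. The stack does list Gmail IMAP, but the polling interval, the server-side `UNSEEN`/`SUBJECT` search, and the `on_trigger` callback are assumptions. The document does not say whether the daemon polls or uses IMAP IDLE. Only the subject match is exercised without a live server.

```python
import email
import imaplib
import time
from email.message import Message
from typing import Callable

TRIGGER_SUBJECT = "Please produce Primer Video"


def is_primer_request(msg: Message) -> bool:
    """True when an email carries the CRM automation's trigger subject."""
    subject = msg.get("Subject", "") or ""
    return TRIGGER_SUBJECT.lower() in subject.lower()


def poll_inbox(host: str, user: str, password: str,
               on_trigger: Callable[[Message], None], interval_s: int = 60) -> None:
    """Poll the dedicated inbox forever, passing each unseen trigger email to on_trigger."""
    while True:
        with imaplib.IMAP4_SSL(host) as imap:  # the with-block logs out on exit
            imap.login(user, password)
            imap.select("INBOX")
            # Server-side filter: unread messages whose subject contains the trigger phrase.
            _, data = imap.search(None, "UNSEEN", "SUBJECT", f'"{TRIGGER_SUBJECT}"')
            for num in data[0].split():
                # A non-PEEK RFC822 fetch marks the message \Seen, so it is not reprocessed.
                _, parts = imap.fetch(num, "(RFC822)")
                msg = email.message_from_bytes(parts[0][1])
                if is_primer_request(msg):
                    on_trigger(msg)
        time.sleep(interval_s)
```

A production daemon would also need to survive dropped connections, for example by wrapping each polling pass in its own try/except.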

The 9-Step Automated Workflow:

  1. Fetch Context: Pulls all relevant customer and job data from Leap CRM.
  2. Generate Script: Creates a personalized 1-2 minute primer script.
  3. Generate Audio: A two-stage audio pipeline uses OpenAI TTS for a consistent base, then enhances it with ElevenLabs Speech-to-Speech for superior quality.
  4. Render Video: Uploads the audio to Cloudinary, which overlays it onto a base video template with background music.
  5. Upload to Delivery: The final MP4 is uploaded to a shared Google Drive folder.
  6. Notify Internally: Adds a note with the video link to the job in Leap CRM.
  7. Wait for Processing: A deliberate 15-minute delay allows Google Drive to finish processing the video.
  8. Deliver to Client: Sends a personalized email directly to the client with the playable video link.
  9. Cleanup: A try/except/finally block ensures all temporary files and cloud assets are deleted, regardless of success or failure.
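All nine steps share one shape: run the steps in order, and always clean up. The orchestration sketch below assumes each step is a hypothetical callable taking a shared `ctx` dict. Steps that create cloud assets, such as Cloudinary uploads, record their IDs in `ctx["cloud_assets"]`, and `delete_cloud_asset` is a caller-supplied hypothetical function. The 15-minute Google Drive delay is a parameter so it can be shortened in tests.

```python
import shutil
import tempfile
import time
from typing import Any, Callable, Dict, List

Step = Callable[[Dict[str, Any]], None]


def wait_for_drive(ctx: Dict[str, Any]) -> None:
    """Step 7: give Google Drive time to finish processing the uploaded MP4."""
    time.sleep(ctx["drive_wait_s"])


def run_primer_pipeline(job_id: str, steps: List[Step],
                        delete_cloud_asset: Callable[[str], None],
                        drive_wait_s: float = 15 * 60) -> bool:
    """Run the primer workflow for one job and return True on success.

    Steps share `ctx`; any step that creates a cloud asset appends its ID to
    ctx["cloud_assets"] so the finally block can delete it.
    """
    ctx: Dict[str, Any] = {
        "job_id": job_id,
        "workdir": tempfile.mkdtemp(prefix="primer_"),  # local scratch for audio/video files
        "cloud_assets": [],
        "drive_wait_s": drive_wait_s,
    }
    try:
        for step in steps:
            step(ctx)
        return True
    except Exception as exc:
        # Assumed policy: report and return False so the email listener keeps running.
        print(f"Primer pipeline failed for job {job_id}: {exc}")
        return False
    finally:
        # Runs after success and failure alike.
        shutil.rmtree(ctx["workdir"], ignore_errors=True)
        for asset_id in ctx["cloud_assets"]:
            try:
                delete_cloud_asset(asset_id)
            except Exception as exc:
                # Keep going so one failed deletion does not leave the others behind.
                print(f"Could not delete cloud asset {asset_id}: {exc}")
```

A full run might call `run_primer_pipeline(job_id, [fetch_context, generate_script, generate_audio, render_video, upload_to_drive, add_crm_note, wait_for_drive, email_client], delete_cloud_asset)`. Every name there except `wait_for_drive` is a placeholder for the corresponding step.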

Technology & Integration Landscape

Hard-E integrates the following cloud services, models, and APIs:

AWS EC2 (t3.small), AWS S3, Python 3.9, Streamlit, Nginx, OpenAI Agents SDK, GPT-4.1 and GPT-4.1-mini, Whisper and OpenAI TTS, ElevenLabs Speech-to-Speech (S2S), Perplexity API, Leap CRM API, Cloudinary, Google Drive API, Gmail IMAP

Future Vision: v3.0 Architecture

The current Streamlit-based application is fully functional, but the next generation of Hard-E is being planned to support true real-time voice and horizontal scalability.

The v3.0 Stack: The future architecture will migrate to a React frontend, a high-performance FastAPI backend, and Redis for distributed state management. This will enable multi-user support, webhook-based voice integration, and the ability to scale beyond a single server instance.
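Because v3.0 is still a plan, the following is only one plausible shape for Redis-backed state. Each session's dataclass is serialized to JSON under a per-user key with an expiry, so any FastAPI instance can resume a conversation. The key format, the 24-hour TTL, and the fields are hypothetical. The `get` and `set(..., ex=...)` calls match redis-py's `Redis` methods, but the store accepts any client with that interface.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional, Protocol


class KeyValueClient(Protocol):
    """The subset of a Redis client used here (redis-py's Redis.get / Redis.set)."""
    def get(self, name: str) -> Optional[bytes]: ...
    def set(self, name: str, value: str, ex: Optional[int] = None) -> object: ...


@dataclass
class ScriptDraftState:
    # Hypothetical fields, for illustration only.
    client_name: Optional[str] = None
    project_type: Optional[str] = None


class SessionStateStore:
    """Persists per-user workflow state outside any single server process."""

    def __init__(self, client: KeyValueClient, ttl_s: int = 24 * 3600) -> None:
        self.client = client
        self.ttl_s = ttl_s

    def _key(self, user_id: str) -> str:
        return f"hard-e:session:{user_id}:script_draft"  # hypothetical key scheme

    def save(self, user_id: str, state: ScriptDraftState) -> None:
        # JSON keeps the stored value readable from any instance; ex= sets the expiry.
        self.client.set(self._key(user_id), json.dumps(asdict(state)), ex=self.ttl_s)

    def load(self, user_id: str) -> ScriptDraftState:
        raw = self.client.get(self._key(user_id))  # bytes, or None if absent or expired
        return ScriptDraftState(**json.loads(raw)) if raw is not None else ScriptDraftState()
```

With redis-py installed, this would be wired up as `SessionStateStore(redis.Redis(host="localhost", port=6379))`. The expiry keeps abandoned drafts from accumulating once state no longer lives inside a single Streamlit process.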
