Building execution-grade AI systems

Agent architectures, durable workflows, and adversarial safety testing.

aaron@hertzfelt.io

AI Systems R&D · Founder, Hertzfelt Labs

Agent Architecture

Execution Flow

User Input: natural language query

AI Gateway: Vercel AI Gateway

Reasoning: Claude / GPT-4 / Gemini

Tool Execution: structured outputs + function calls

State Management: durable execution context

Response: streaming output

Vercel AI SDK + AI Gateway
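
As a rough illustration of the execution flow above, here is a minimal, dependency-free sketch of the input, reasoning, tool execution, state, and response stages. Every name in it (`reason`, `toolRegistry`, `runFlow`, `RunState`) is hypothetical, the "model" is a deterministic stub, and gateway routing and streaming are left out. It does not use the Vercel AI SDK or AI Gateway APIs.

```typescript
// Hypothetical types for the stages listed above; illustrative only.
type ToolCall = { tool: string; args: Record<string, number> };
type ModelTurn =
  | { kind: "tool"; call: ToolCall }
  | { kind: "final"; text: string };

interface RunState {
  input: string;
  steps: string[]; // trail kept by the state-management stage
}

// Stand-in for the reasoning stage: a deterministic stub, not a real model.
function reason(lastToolResult?: number): ModelTurn {
  if (lastToolResult === undefined) {
    return { kind: "tool", call: { tool: "add", args: { a: 2, b: 3 } } };
  }
  return { kind: "final", text: `The sum is ${lastToolResult}` };
}

// Tool registry: each tool takes structured args and returns a number.
const toolRegistry: Record<string, (args: Record<string, number>) => number> = {
  add: (args) => (args["a"] ?? 0) + (args["b"] ?? 0),
};

// Runs input -> reasoning -> tool execution -> state update -> response.
function runFlow(input: string, maxTurns = 4): { text: string; state: RunState } {
  const state: RunState = { input, steps: [`input:${input}`] };
  let lastResult: number | undefined;
  for (let turn = 0; turn < maxTurns; turn++) {
    const out = reason(lastResult);
    if (out.kind === "final") {
      state.steps.push("response");
      return { text: out.text, state };
    }
    const tool = toolRegistry[out.call.tool];
    if (!tool) throw new Error(`Unknown tool: ${out.call.tool}`);
    lastResult = tool(out.call.args);
    state.steps.push(`tool:${out.call.tool}=${lastResult}`);
  }
  throw new Error("Turn limit reached without a final response");
}
```

The turn limit is a design choice in this sketch: it keeps a misbehaving reasoning step from looping on tool calls indefinitely.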

View work @hertzfelt_io

Aaron Diemel

Founder, Hertzfelt Labs

I spent 15 years as an audio engineer—signal flow, systems thinking, shipping under pressure. In 2022, I made a bet: reskill entirely with AI as my copilot. No bootcamps, no traditional path. Just me and the models, learning to build together.

It worked. The mental models from audio engineering—signal chains, iterative refinement, translating vision into execution—transferred directly to AI systems architecture. I've scaled my capabilities at the same rate the technology has scaled, staying on the frontier while shipping production systems.

In 2023, I founded Hertzfelt Labs as a consulting practice and applied R&D lab. I work with founders and engineering teams on agent architectures, adversarial evaluation, and AI systems that actually execute. This site is built with the same tools and workflows I bring to client work.

X · GitHub · LinkedIn · aaron@hertzfelt.io

Selected Work

Systems that execute, resume, and adapt.

Agentic Audit Platform

Durable Execution Environment

Designed and prototyped AI-native audit execution: agents ingest documents, reason over structured state, call tools, request missing inputs, and pause/resume with human control points.

Durable Agents · Tool Calling · Human-in-the-Loop
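
The pause/resume behavior with human control points described for the audit platform could look roughly like the sketch below. It is an assumption-laden illustration, not the platform's actual design: the `Checkpoint` shape, the step names, and `advance` are all hypothetical, and persistence is reduced to a JSON round trip.

```typescript
// Hypothetical checkpoint shape; field names are illustrative only.
interface Checkpoint {
  nextStep: number;
  status: "running" | "awaiting_approval" | "done";
  log: string[];
}

interface AuditStep {
  name: string;
  needsApproval: boolean;
}

const auditSteps: AuditStep[] = [
  { name: "ingest_documents", needsApproval: false },
  { name: "flag_exceptions", needsApproval: true }, // human control point
  { name: "write_report", needsApproval: false },
];

// Advances until the workflow finishes or reaches an unapproved gate.
// `approved` lists the step names a human has signed off on.
function advance(cp: Checkpoint, approved: string[]): Checkpoint {
  const log = [...cp.log];
  let i = cp.nextStep;
  while (i < auditSteps.length) {
    const step = auditSteps[i];
    if (step === undefined) break;
    if (step.needsApproval && !approved.includes(step.name)) {
      return { nextStep: i, status: "awaiting_approval", log };
    }
    log.push(step.name);
    i++;
  }
  return { nextStep: i, status: "done", log };
}

// A checkpoint survives a process restart as plain JSON.
const saveCheckpoint = (cp: Checkpoint): string => JSON.stringify(cp);
const loadCheckpoint = (raw: string): Checkpoint => JSON.parse(raw) as Checkpoint;
```

Because the checkpoint carries the next step index and the log, a resumed run neither repeats completed steps nor skips the approval gate.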

VLM Training Pipeline

Vision-Language Fine-Tuning

End-to-end fine-tuning pipeline for IDEFICS2-8B on 107k financial documents. Full stack: dataset curation, preprocessing, QLoRA training on Modal GPUs, checkpointing, and inference API deployment.

VLM · QLoRA · Modal

XTRAP

Autonomous Red Team System

Self-improving adversarial evaluation: automated jailbreak generation, recursive refinement, and attack memory with multi-model attacker/defender loops.

Red Teaming · Adversarial AI · Multi-Agent
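
To make the attacker/defender loop with attack memory concrete, here is a deliberately toy sketch. It is not XTRAP's implementation: the defender is a literal keyword filter, the attacker only applies harmless string transforms to a canary string, and all names (`transforms`, `defenderBlocks`, `redTeamRound`) are hypothetical. Recursive refinement is not shown.

```typescript
// Toy attacker moves: harmless string transforms, named for bookkeeping.
type Transform = { name: string; apply: (s: string) => string };

const transforms: Transform[] = [
  { name: "identity", apply: (s) => s },
  { name: "uppercase", apply: (s) => s.toUpperCase() },
  { name: "spaced", apply: (s) => s.split("").join(" ") },
];

// Toy defender: blocks any input containing the literal marker "canary".
const defenderBlocks = (input: string): boolean => input.includes("canary");

interface AttackMemory {
  wins: Record<string, number>; // transform name -> times it got past the defender
}

// One round: try transforms, most historically successful first,
// and record which ones the defender failed to block.
function redTeamRound(seed: string, memory: AttackMemory): string[] {
  const ordered = [...transforms].sort(
    (a, b) => (memory.wins[b.name] ?? 0) - (memory.wins[a.name] ?? 0),
  );
  const bypasses: string[] = [];
  for (const t of ordered) {
    const candidate = t.apply(seed);
    if (!defenderBlocks(candidate)) {
      memory.wins[t.name] = (memory.wins[t.name] ?? 0) + 1;
      bypasses.push(t.name);
    }
  }
  return bypasses;
}
```

The memory object is what carries learning between rounds: later rounds spend their first attempts on whatever worked before.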

AI Safety Research

Adversarial Testing & Evaluation

Placed in the top ~1% of the Gray Swan Jailbreak Challenge. Work cited in OpenAI's o3-mini documentation. Participated in Anthropic safety programs. Documented Grok 3 vulnerabilities that were patched by xAI.

AI Safety · Red Teaming · Jailbreak Analysis

Rapid Prototyping

Agent Development

Shipping functional systems with Cursor, Claude Code, Codex, v0. Multi-agent 3D simulators, multi-model comparison tools, voice agents, and AI automation.

Cursor · Claude Code · v0

Audio Engineering

Billboard #1 Credit

15 years of signal processing and production. Credit on Eric Darius, Goin' All Out (Blue Note, 2008). This background informs multimodal and audio AI work.

Signal Processing · Audio Production · Billboard #1

How I Work With Agents

Speed comes from front-loaded architecture and domain translation, followed by delegation to coding agents. Here's the methodology behind shipping functional AI systems rapidly.

Architecture First

Front-load system design and domain translation before writing code. Define execution graphs, state schemas, and tool interfaces upfront.

System Design · Domain Modeling · State Schemas
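
One way to picture "execution graphs, state schemas, and tool interfaces upfront" is to write them as types before any agent logic exists. The sketch below assumes a tiny document-review flow. `ReviewState`, the node names, and `execute` are hypothetical and are not taken from any client project or from LangGraph.

```typescript
// Hypothetical state schema for a document-review run.
interface ReviewState {
  documentId: string;
  extracted: boolean;
  approved: boolean;
}

type NodeName = "extract" | "review" | "done";

// Each node transforms state and names its successor.
interface GraphNode {
  run: (state: ReviewState) => ReviewState;
  next: (state: ReviewState) => NodeName;
}

// The execution graph: every non-terminal node must be defined here,
// so a missing node is a compile-time error rather than a runtime surprise.
const graph: Record<Exclude<NodeName, "done">, GraphNode> = {
  extract: {
    run: (s) => ({ ...s, extracted: true }),
    next: () => "review",
  },
  review: {
    run: (s) => ({ ...s, approved: s.extracted }),
    next: (s) => (s.approved ? "done" : "extract"),
  },
};

// Walks the graph from "extract" until "done" or a step limit is hit.
function execute(start: ReviewState): { state: ReviewState; path: NodeName[] } {
  let state = start;
  let current: NodeName = "extract";
  const path: NodeName[] = [];
  while (current !== "done" && path.length < 10) {
    path.push(current);
    const node = graph[current];
    state = node.run(state);
    current = node.next(state);
  }
  return { state, path };
}
```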

CLAUDE.md & AGENTS.md

Maintain project context files that give coding agents full system awareness. Persistent memory across sessions for consistent architectural decisions.

CLAUDE.md · AGENTS.md · Context Files

MCP Servers

Custom and OSS Model Context Protocol servers for tool integration. Extend agent capabilities with workspace tools, database access, and external APIs.

Custom MCP · Supabase MCP · Workspace Tools
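
For readers new to MCP, the protocol exchanges JSON-RPC 2.0 messages, with `tools/list` to discover tools and `tools/call` to invoke one. The sketch below is a simplified in-memory dispatcher showing those message shapes. It is not the official SDK and omits transport, initialization, and capabilities. The `word_count` tool and the `handle` function are hypothetical.

```typescript
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: { name?: string; arguments?: Record<string, unknown> };
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

interface ToolDef {
  name: string;
  description: string;
  inputSchema: object; // JSON Schema describing the arguments
  handler: (args: Record<string, unknown>) => string;
}

// Hypothetical workspace tool, made up for illustration.
const toolDefs: ToolDef[] = [
  {
    name: "word_count",
    description: "Count words in a text snippet",
    inputSchema: { type: "object", properties: { text: { type: "string" } }, required: ["text"] },
    handler: (args) =>
      String(String(args["text"] ?? "").trim().split(/\s+/).filter(Boolean).length),
  },
];

// Dispatches the two tool-related methods; everything else is "method not found".
function handle(req: JsonRpcRequest): JsonRpcResponse {
  if (req.method === "tools/list") {
    const tools = toolDefs.map(({ name, description, inputSchema }) => ({ name, description, inputSchema }));
    return { jsonrpc: "2.0", id: req.id, result: { tools } };
  }
  if (req.method === "tools/call") {
    const tool = toolDefs.find((t) => t.name === req.params?.name);
    if (!tool) return { jsonrpc: "2.0", id: req.id, error: { code: -32602, message: "Unknown tool" } };
    const text = tool.handler(req.params?.arguments ?? {});
    return { jsonrpc: "2.0", id: req.id, result: { content: [{ type: "text", text }] } };
  }
  return { jsonrpc: "2.0", id: req.id, error: { code: -32601, message: "Method not found" } };
}
```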

Agent Delegation

Delegate implementation to coding agents after architecture is set. Claude Code for complex refactoring, Cursor for iteration, v0 for UI prototyping.

Claude Code · Cursor · Codex · v0

Structured Outputs

Type-safe tool calling with Zod schemas and JSON Schema. Reliable structured extraction and function execution with validation at every step.

Zod · JSON Schema · Tool Calling
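
The idea of validating tool arguments before execution can be shown without any dependency. The sketch below hand-rolls the check that a schema library such as Zod would normally perform. It is a stand-in for illustration, not Zod's API, and the `ExtractInvoiceArgs` shape and the function name are hypothetical.

```typescript
// Hypothetical argument shape for an invoice-extraction tool.
interface ExtractInvoiceArgs {
  vendor: string;
  totalCents: number;
}

type Parsed<T> = { ok: true; value: T } | { ok: false; errors: string[] };

// Parses raw model output and validates it field by field.
// Returns typed arguments on success, or every validation error on failure.
function parseExtractInvoiceArgs(raw: string): Parsed<ExtractInvoiceArgs> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["not valid JSON"] };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, errors: ["expected an object"] };
  }
  const obj = data as Record<string, unknown>;
  const vendor = obj["vendor"];
  const totalCents = obj["totalCents"];
  const errors: string[] = [];
  if (typeof vendor !== "string" || vendor.length === 0) {
    errors.push("vendor: non-empty string required");
  }
  if (typeof totalCents !== "number" || !Number.isInteger(totalCents) || totalCents < 0) {
    errors.push("totalCents: non-negative integer required");
  }
  if (errors.length > 0 || typeof vendor !== "string" || typeof totalCents !== "number") {
    return { ok: false, errors };
  }
  return { ok: true, value: { vendor, totalCents } };
}
```

Returning a result object instead of throwing lets the caller feed the error list back to the model for a retry, which is one common way to use validation failures.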

Evaluation & Iteration

Continuous evaluation against defined success criteria. Rapid iteration cycles with automated testing and human review gates.

Eval Harnesses · Test Suites · Review Gates
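
A minimal eval harness with a review gate might look like the following. This is a sketch under simple assumptions: exact-match scoring, a synchronous system under test, and a pass-rate threshold that decides when a human needs to look. `EvalCase`, `EvalReport`, and `runEvals` are hypothetical names.

```typescript
interface EvalCase {
  id: string;
  input: string;
  expected: string;
}

interface EvalReport {
  passed: string[];
  failed: string[];
  passRate: number;
  needsHumanReview: boolean; // true when the pass rate falls below the threshold
}

// Runs every case through the system and scores by exact match after trimming.
function runEvals(
  system: (input: string) => string,
  cases: EvalCase[],
  threshold: number,
): EvalReport {
  const passed: string[] = [];
  const failed: string[] = [];
  for (const c of cases) {
    const bucket = system(c.input).trim() === c.expected ? passed : failed;
    bucket.push(c.id);
  }
  const passRate = cases.length === 0 ? 0 : passed.length / cases.length;
  return { passed, failed, passRate, needsHumanReview: passRate < threshold };
}
```

Exact match is the simplest possible scorer. Real success criteria are usually looser, so the comparison line is the part most likely to be swapped out.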

Agent Tooling & Harness

Vercel AI SDK: Primary SDK

Claude Agent SDK: Agent Framework

OpenAI Agents SDK: Agent Framework

LangGraph: Orchestration

MCP Servers: Tool Integration

Structured Outputs: Type Safety

CLAUDE.md

# Project Context

## Architecture
- Durable agent execution with checkpoint-based state
- Tool calling via Vercel AI SDK with Zod schemas
- MCP servers for workspace and database integration

## Conventions
- All tools must return structured outputs
- Human approval gates for destructive operations
- State persisted to Supabase between sessions

## Current Focus
- Implementing document extraction pipeline
- Adding multi-model fallback for reliability
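
The "multi-model fallback" item in the context file above can be sketched as an ordered chain of providers tried in turn. This is a simplified illustration, not the project's code: real model clients are asynchronous, and the names `callWithFallback` and `ModelCall` are hypothetical.

```typescript
// A model call takes a prompt and returns text, throwing on failure.
// Synchronous here to keep the sketch small; real clients return promises.
type ModelCall = (prompt: string) => string;

interface FallbackResult {
  model: string; // which entry in the chain answered
  text: string;
  failures: string[]; // errors from earlier entries, for logging
}

// Tries each model in order and returns the first successful answer.
function callWithFallback(
  prompt: string,
  chain: Array<{ name: string; call: ModelCall }>,
): FallbackResult {
  const failures: string[] = [];
  for (const { name, call } of chain) {
    try {
      return { model: name, text: call(prompt), failures };
    } catch (err) {
      failures.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All models failed: ${failures.join("; ")}`);
}
```

Keeping the failure list in the result, even on success, makes it possible to notice when the primary model is silently degrading.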

Stack

Tools and platforms in active use.

Languages & Frameworks

TypeScript · JavaScript · Python · Next.js · React · PHP · CSS

Agent Tooling

Vercel AI SDK · Claude Agent SDK · OpenAI Agents SDK · LangGraph · Tool Calling · Structured Schema (Zod) · MCP Servers

Model Inference

Anthropic · OpenAI · Google Gemini · Vercel AI Gateway · OpenRouter · AWS Bedrock · Nous Research

Development

Cursor · Claude Code · Codex · Warp Terminal · GitHub · Replit

Deployment & Infrastructure

Vercel · AWS · SageMaker · Supabase · Convex

ML Ops

Modal Labs · Hugging Face · Weights & Biases · LangChain · LangSmith

Multimodal - Voice

LiveKit · OpenAI Realtime API · Eleven Labs · Cartesia

Multimodal - Image

Flux Models · Fal AI · Stability AI · HiggsField AI

Recent Thoughts

Insights on AI systems, safety research, and frontier model deployment.

Hertzfelt Labs @hertzfelt_io · Jan 14, 2025

Everyone calling 2025 the year of agents clearly missed the memo. That was 2024. This year? It's about operational intelligence—knowing when not to deploy agents. True flex isn't scaling agent frameworks; it's understanding their limits. Sometimes a lean LLM stack with precision tool integrations will outperform a bloated agent architecture. 2025 is the pivot point—less about chasing agent hype, more about architecting for efficiency, modularity, and actual ROI. It's the year of strategic deployment, not blind adoption.

Hertzfelt Labs @hertzfelt_io · Feb 1, 2025

🚀 Pushing frontier #AI safety to the limit. Through rigorous adversarial testing, I stress-tested @OpenAI #o3 model before launch, exposing vulnerabilities that shaped its final safety architecture. Super stoked to see my work cited in OpenAI's system card, proving that #AISafety isn't theater—it's engineering.