2024–Present

AI Safety Research

Adversarial Testing & Evaluation

AI SafetyRed TeamingJailbreak AnalysisModel Evaluation

Overview

Ongoing research in adversarial testing and model safety evaluation. Contributed to pre-release testing for major AI labs, with work cited in official documentation and vulnerabilities leading to production patches.

Challenges

Identifying novel attack vectors before public deployment
Operating under formal disclosure protocols
Systematic evaluation across model capabilities
Documenting findings for reproducibility

Approach

Developed systematic jailbreak taxonomy and testing methodology
Built automated evaluation harnesses for consistent scoring
Maintained responsible disclosure timelines with labs
Created detailed technical writeups for each finding

Tech Stack

PythonOpenAI APIAnthropic APIxAI APICustom Evaluation Tools

Outcomes

Top ~1% in Gray Swan AI Jailbreak Challenge (~58 verified)
Work cited in OpenAI o3-mini system card
Selected for Anthropic constitutional classifier testing
Grok3 vulnerabilities patched by xAI post-disclosure

Agentic Audit Platform VLM Training Pipeline XTRAP