2024–Present
AI Safety Research
Adversarial Testing & Evaluation
AI SafetyRed TeamingJailbreak AnalysisModel Evaluation
Overview
Ongoing research in adversarial testing and model safety evaluation. Contributed to pre-release testing for major AI labs, with work cited in official documentation and vulnerabilities leading to production patches.
Challenges
- Identifying novel attack vectors before public deployment
- Operating under formal disclosure protocols
- Systematic evaluation across model capabilities
- Documenting findings for reproducibility
Approach
- Developed systematic jailbreak taxonomy and testing methodology
- Built automated evaluation harnesses for consistent scoring
- Maintained responsible disclosure timelines with labs
- Created detailed technical writeups for each finding
Tech Stack
PythonOpenAI APIAnthropic APIxAI APICustom Evaluation Tools
Outcomes
- Top ~1% in Gray Swan AI Jailbreak Challenge (~58 verified)
- Work cited in OpenAI o3-mini system card
- Selected for Anthropic constitutional classifier testing
- Grok3 vulnerabilities patched by xAI post-disclosure
