AV // SEC

Systems & AI Security

ANISH VARMA

Security Researcher & Founding Engineer

Researching AI/ML systems, offensive security, trust-boundary failures, autonomous agents, parser security, and modern distributed infrastructure. Focus areas include whitebox vulnerability discovery, activation steering, and mechanistic interpretability.

Selected Research Publications

AI Security | Published

Mechanistic Alignment Abliteration in Qwen3.5: Steering Activation States

A deep dive into bypassing alignment guardrails in Qwen3.5 models at the weight level. By identifying safety representation subspaces and orthogonalizing steering weights, we study representation drift and guardrail activations without full model fine-tuning.

#mechanistic-interpretability #alignment-bypass #activation-patching #weight-steering
2026-05-18 | 12 min read

Active Engineering Systems

Systems Project

Kai Autonomous Cybersecurity Agent Framework

A high-performance autonomous agent framework built in Rust, engineered to execute offensive security diagnostics while maintaining rigid trust boundaries and sandboxed operation.

Rust, Python, LLM Security, Sandboxing, Exploit Dev
Systems Project

Qwen3.5 Alignment Abliteration Research

Abliterating alignment guardrails in Qwen3.5 via targeted weight modifications. Implementing activation patching to analyze representation steering and representation drift.

PyTorch, TransformerLens, Mechanistic Interpretability, Activation Patching

Recent Notes & Writing
