Tools

Updated often. Last update: December 19th 2025

Defensive Tools

TrojAI

Helps to map your models and applications to AI security standards and frameworks

Nightshade

A tool that turns any image into a data sample that is unsuitable for model training.

Insecure Plug-in

This shows common examples of vulnerability and example attack scenarios (listed in below sections also), but shows preventive strategies.

Frameworks

HarmBench

HarmBench is a standardized evaluation framework for automated red teaming.

Tensorflow

An end-to-end open source machine learning platform for everyone.

Models and Jailbreaks

LLM Attacks

Gives the primary methods of LLM attacks (in 2024, which are still relevant)

Prompt Injection Overview

Deep dive: Prompt Injection 101 for Large Language Models

Insecure Plug-ins

This shows common examples of vulnerability and example attack scenarios

Poetry Jailbreak

Shows evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR)

Shared Failure Modes

Some jailbreak and prompt-injection defenses use LLMs to evaluate LLM outputs, creating a shared failure mode that allows simple prompt injections to bypass guardrails undetected.

Token Smuggling

Token smuggling refers to techniques that bypass content filters while preserving the underlying meaning. It often focuses on exploiting the way language models process and understand text

Jailbreaking in the Haystack

NINJA (short for Needle-in-haystack jailbreak attack), a method that jailbreaks aligned LMs by appending benign, model-generated content to harmful user goal

Sites, blogs, communites

Hugging Face

The machine learning community collaborates on models, datasets, and applications.

Attacks and Countermeasure Blog

A Deep Dive into LLM Attacks and Countermeasures, by Nirvana El

Insecure Plug-ins (blog)

How insecure plug-in design enables attackers to automatically launch malicious requests

Jailbreak Cookbook

This is an extensive post. They suggest first reviewing the overview and empirical results sections to identify the most promising methods you’d like to explore or experiment with.