Helps map your models and applications to AI security standards and frameworks.
Nightshade
A tool that turns any image into a data sample that is unsuitable for model training.
Insecure Plug-in
Shows common examples of the vulnerability and example attack scenarios (also listed in the sections below), along with prevention strategies.
Frameworks
HarmBench
HarmBench is a standardized evaluation framework for automated red teaming.
TensorFlow
An end-to-end open source machine learning platform for everyone.
Models and Jailbreaks
LLM Attacks
Covers the primary LLM attack methods as of 2024, which remain relevant.
Prompt Injection Overview
Deep dive: Prompt Injection 101 for Large Language Models
Insecure Plug-ins
Shows common examples of the vulnerability and example attack scenarios.
Poetry Jailbreak
Shows evidence that adversarial poetry functions as a universal single-turn jailbreak technique for Large Language Models (LLMs). Across 25 frontier proprietary and open-weight models, curated poetic prompts yielded high attack-success rates (ASR).
Shared Failure Modes
Some jailbreak and prompt-injection defenses use LLMs to evaluate LLM outputs, creating a shared failure mode that allows simple prompt injections to bypass guardrails undetected.
Token Smuggling
Token smuggling refers to techniques that bypass content filters while preserving the underlying meaning. It often exploits the way language models process and understand text.
Jailbreaking in the Haystack
Introduces NINJA (short for Needle-in-haystack jailbreak attack), a method that jailbreaks aligned LMs by appending benign, model-generated content to a harmful user goal.
Sites, blogs, communities
Hugging Face
The machine learning community collaborates on models, datasets, and applications.
Attacks and Countermeasure Blog
A Deep Dive into LLM Attacks and Countermeasures, by Nirvana El
Insecure Plug-ins (blog)
How insecure plug-in design enables attackers to automatically launch malicious requests.
Jailbreak Cookbook
An extensive post. The authors suggest first reviewing the overview and empirical results sections to identify the most promising methods you’d like to explore or experiment with.