AI Security

Llama Guard

Meta's family of safety models, with accompanying recipes, for classifying safety risks in human-AI conversations.

ai, llm, safety, moderation, classifier, meta

Best For

Good for studying how model-based moderation of inputs and outputs can support, but not replace, application security controls.
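As a rough illustration of that layering, the sketch below places a guard-model check around a generation step while leaving an ordinary application control (an authorization check) in front of it. It assumes the guard model replies with "safe", or with "unsafe" followed by a line of comma-separated category codes; confirm the exact format against the model card for the version you use. The names parse_verdict, moderated_reply, classify, generate, and is_authorized are hypothetical and are not part of any Llama Guard API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Verdict:
    safe: bool
    categories: List[str] = field(default_factory=list)


def parse_verdict(raw: str) -> Verdict:
    """Parse a guard reply assumed to look like 'safe' or 'unsafe\\nS1,S10'."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if lines and lines[0].lower() == "safe":
        return Verdict(safe=True)
    # Empty or malformed output is treated as unsafe (fail closed).
    categories: List[str] = []
    if len(lines) > 1 and lines[0].lower() == "unsafe":
        categories = [c.strip() for c in lines[1].split(",") if c.strip()]
    return Verdict(safe=False, categories=categories)


def moderated_reply(
    user_msg: str,
    classify: Callable[[str], str],        # hypothetical: returns raw guard output
    generate: Callable[[str], str],        # hypothetical: the main assistant model
    is_authorized: Callable[[str], bool],  # hypothetical: an existing app control
) -> str:
    # The application control runs first; moderation adds to it, not replaces it.
    if not is_authorized(user_msg):
        return "denied: not authorized"
    # Input moderation: check the user's message before generating.
    if not parse_verdict(classify(user_msg)).safe:
        return "blocked: input flagged"
    answer = generate(user_msg)
    # Output moderation: check the model's answer before returning it.
    if not parse_verdict(classify(answer)).safe:
        return "blocked: output flagged"
    return answer
```

In a lab setting, classify would wrap a call to a locally hosted guard model, and the stub callables make it easy to test the control flow without loading any weights.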

Responsible Use

Use this tool only in owned environments, classroom labs, CTFs, or engagements where you have explicit written permission. Keep notes focused on findings, risk, and remediation.

Official Resource

https://github.com/meta-llama/llama-recipes/tree/main/recipes/responsible_ai/llama_guard