Guardrails
Protect LLM interactions with prompt guards that evaluate and filter requests and responses for harmful or policy-violating content.
Guardrails are security policies that inspect LLM requests and responses to detect and block harmful, policy-violating, or inappropriate content before it reaches the model or the user. You can apply prompt guards to the request phase, the response phase, or both.
To learn more about guardrails, see the following topic.
To set up guardrails, check out the following guides.
To track guardrails and content safety, see the following guide.
About guardrails
Protect LLM requests and responses from sensitive data exposure and harmful content using layered prompt guards.
Regex filters
Use custom regex patterns and built-in PII detectors to filter LLM requests and responses.
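The gateway applies these patterns for you, but the core idea can be sketched in a few lines of Python. The pattern names and redaction behavior below are illustrative only, not the gateway's built-in PII detectors.

```python
import re

# Illustrative regex patterns; a real deployment would rely on the gateway's
# built-in PII detectors or carefully tuned custom patterns.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def mask_pii(text: str) -> str:
    """Replace anything that matches a PII pattern with a placeholder."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"<{name.upper()}_MASKED>", text)
    return text

prompt = "My SSN is 123-45-6789 and my email is jane@example.com."
print(mask_pii(prompt))
# -> "My SSN is <SSN_MASKED> and my email is <EMAIL_MASKED>."
```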
OpenAI moderation
Use the OpenAI moderation API to detect potentially harmful content across categories including hate, harassment, self-harm, sexual content, and violence.
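The gateway calls the moderation endpoint on your behalf. As a point of reference, this is roughly what the same check looks like when you call the API directly with the official OpenAI Python SDK (assumes `OPENAI_API_KEY` is set in the environment):

```python
from openai import OpenAI  # official OpenAI Python SDK (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_flagged(text: str) -> bool:
    """Ask the moderation endpoint whether the text violates any category."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    ).results[0]
    if result.flagged:
        # List the categories that tripped the filter, e.g. "harassment".
        triggered = [name for name, hit in result.categories.model_dump().items() if hit]
        print("Blocked categories:", triggered)
    return result.flagged

if is_flagged("some user prompt"):
    raise ValueError("Request rejected by the moderation guard")
```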
AWS Bedrock Guardrails
Apply AWS Bedrock Guardrails to filter LLM requests and responses for policy-violating content.
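Under the hood this maps to the Bedrock ApplyGuardrail API. A minimal boto3 sketch, assuming you already created a guardrail and know its ID and version (the identifiers below are placeholders):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def allowed_by_guardrail(text: str, source: str = "INPUT") -> bool:
    """Return True if the guardrail allows the text; source is INPUT or OUTPUT."""
    response = bedrock.apply_guardrail(
        guardrailIdentifier="your-guardrail-id",  # placeholder
        guardrailVersion="1",                     # placeholder
        source=source,                            # "INPUT" for prompts, "OUTPUT" for responses
        content=[{"text": {"text": text}}],
    )
    # "GUARDRAIL_INTERVENED" means the guardrail blocked or masked content.
    return response["action"] != "GUARDRAIL_INTERVENED"

if not allowed_by_guardrail("some user prompt"):
    raise ValueError("Request rejected by the Bedrock guardrail")
```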
Google Model Armor
Apply Google Cloud Model Armor templates to sanitize LLM requests and responses.
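The gateway applies the Model Armor template for you. For reference only, the following sketch shows roughly how a direct call to a template could look with the google-cloud-modelarmor Python client; the project, location, and template names are placeholders, and the client surface shown is an assumption that may differ across library versions.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import modelarmor_v1

# Model Armor uses regional endpoints; us-central1 is a placeholder.
client = modelarmor_v1.ModelArmorClient(
    transport="rest",
    client_options=ClientOptions(api_endpoint="modelarmor.us-central1.rep.googleapis.com"),
)

request = modelarmor_v1.SanitizeUserPromptRequest(
    # Placeholder template resource name.
    name="projects/PROJECT_ID/locations/us-central1/templates/TEMPLATE_ID",
    user_prompt_data=modelarmor_v1.DataItem(text="some user prompt"),
)
response = client.sanitize_user_prompt(request=request)

# filter_match_state indicates whether any filter in the template matched.
print(response.sanitization_result.filter_match_state)
```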
Custom webhooks
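Send LLM requests and responses to your own webhook service for custom evaluation and filtering.

The request and response schema of the webhook is defined by the gateway; the JSON shape below is purely illustrative. A minimal Flask sketch of a webhook that rejects prompts containing a blocked phrase:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
BLOCKED_PHRASES = ["credit card number", "social security number"]

@app.post("/guard")
def guard():
    # Hypothetical payload shape: {"body": "<prompt or response text>"}.
    # Consult the gateway's webhook API reference for the real schema.
    text = request.get_json().get("body", "")
    if any(phrase in text.lower() for phrase in BLOCKED_PHRASES):
        # Signal the gateway to reject the request.
        return jsonify({"action": "reject", "reason": "blocked phrase"}), 200
    # Signal the gateway to pass the request through unchanged.
    return jsonify({"action": "pass"}), 200

if __name__ == "__main__":
    app.run(port=8000)
```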
Multi-layered guardrails
Run prompt guards in sequence, creating defense-in-depth protection.
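Conceptually, a layered guardrail is a chain of checks run in order, where each layer can reject the text or rewrite it before the next layer sees it. A small, self-contained Python sketch of that idea (the guard functions are illustrative, not the gateway's implementation):

```python
import re
from typing import Callable, List

# A guard takes text and either returns (possibly masked) text or raises.
Guard = Callable[[str], str]

class GuardRejection(Exception):
    """Raised when any layer decides the text must be blocked."""

def regex_guard(text: str) -> str:
    # Mask anything that looks like a US SSN before later layers see it.
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "<SSN_MASKED>", text)

def phrase_guard(text: str) -> str:
    if "ignore all previous instructions" in text.lower():
        raise GuardRejection("prompt-injection phrase detected")
    return text

def run_guards(text: str, guards: List[Guard]) -> str:
    """Run each guard in sequence; each layer sees the previous layer's output."""
    for guard in guards:
        text = guard(text)
    return text

clean_prompt = run_guards(
    "My SSN is 123-45-6789.",
    [regex_guard, phrase_guard],  # order matters: outermost layer runs first
)
print(clean_prompt)  # -> "My SSN is <SSN_MASKED>."
```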