Guardrails
Protect LLM interactions with prompt guards that evaluate and filter requests and responses for harmful or policy-violating content.
Guardrails are security policies that inspect LLM requests and responses to detect and block harmful, policy-violating, or inappropriate content before it reaches the model or the user. You can apply prompt guards to the request phase, the response phase, or both.
To learn more about guardrails, see the following topic.
To set up guardrails, check out the following guides.
To track guardrails and content safety, see the following guide.
About guardrails
Protect LLM requests and responses from sensitive data exposure and harmful content using layered prompt guards.
Regex filters
Use custom regex patterns and built-in PII detectors to filter LLM requests and responses.
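The gateway applies these patterns for you, but the core idea can be sketched in a few lines of Python. The pattern names and redaction behavior below are illustrative only, not the gateway's built-in PII detectors.

```python
import re

# Illustrative regex patterns; a real deployment would rely on the gateway's
# built-in PII detectors or carefully tuned custom patterns.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def mask_pii(text: str) -> str:
    """Replace anything that matches a PII pattern with a placeholder."""
    for name, pattern in PATTERNS.items():
        text = pattern.sub(f"<{name.upper()}_MASKED>", text)
    return text

prompt = "My SSN is 123-45-6789 and my email is jane@example.com."
print(mask_pii(prompt))
# -> "My SSN is <SSN_MASKED> and my email is <EMAIL_MASKED>."
```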
OpenAI moderation
Use the OpenAI moderation API to detect potentially harmful content across categories including hate, harassment, self-harm, sexual content, and violence.
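The gateway calls the moderation endpoint on your behalf. As a point of reference, this is roughly what the same check looks like when you call the API directly with the official OpenAI Python SDK (assumes `OPENAI_API_KEY` is set in the environment):

```python
from openai import OpenAI  # official OpenAI Python SDK (v1+)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def is_flagged(text: str) -> bool:
    """Ask the moderation endpoint whether the text violates any category."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=text,
    ).results[0]
    if result.flagged:
        # List the categories that tripped the filter, e.g. "harassment".
        triggered = [name for name, hit in result.categories.model_dump().items() if hit]
        print("Blocked categories:", triggered)
    return result.flagged

if is_flagged("some user prompt"):
    raise ValueError("Request rejected by the moderation guard")
```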
AWS Bedrock Guardrails
Apply AWS Bedrock Guardrails to filter LLM requests and responses for policy-violating content.
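Under the hood this maps to the Bedrock ApplyGuardrail API. A minimal boto3 sketch, assuming you already created a guardrail and know its ID and version (the identifiers below are placeholders):

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def allowed_by_guardrail(text: str, source: str = "INPUT") -> bool:
    """Return True if the guardrail allows the text; source is INPUT or OUTPUT."""
    response = bedrock.apply_guardrail(
        guardrailIdentifier="your-guardrail-id",  # placeholder
        guardrailVersion="1",                     # placeholder
        source=source,                            # "INPUT" for prompts, "OUTPUT" for responses
        content=[{"text": {"text": text}}],
    )
    # "GUARDRAIL_INTERVENED" means the guardrail blocked or masked content.
    return response["action"] != "GUARDRAIL_INTERVENED"

if not allowed_by_guardrail("some user prompt"):
    raise ValueError("Request rejected by the Bedrock guardrail")
```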
Google Model Armor
Apply Google Cloud Model Armor templates to sanitize LLM requests and responses.
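The gateway applies the Model Armor template for you. For reference only, the following sketch shows roughly how a direct call to a template could look with the google-cloud-modelarmor Python client; the project, location, and template names are placeholders, and the client surface shown is an assumption that may differ across library versions.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import modelarmor_v1

# Model Armor uses regional endpoints; us-central1 is a placeholder.
client = modelarmor_v1.ModelArmorClient(
    transport="rest",
    client_options=ClientOptions(api_endpoint="modelarmor.us-central1.rep.googleapis.com"),
)

request = modelarmor_v1.SanitizeUserPromptRequest(
    # Placeholder template resource name.
    name="projects/PROJECT_ID/locations/us-central1/templates/TEMPLATE_ID",
    user_prompt_data=modelarmor_v1.DataItem(text="some user prompt"),
)
response = client.sanitize_user_prompt(request=request)

# filter_match_state indicates whether any filter in the template matched.
print(response.sanitization_result.filter_match_state)
```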
Custom webhooks
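Send LLM requests and responses to your own webhook service for custom evaluation and filtering.

The request and response schema of the webhook is defined by the gateway; the JSON shape below is purely illustrative. A minimal Flask sketch of a webhook that rejects prompts containing a blocked phrase:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
BLOCKED_PHRASES = ["credit card number", "social security number"]

@app.post("/guard")
def guard():
    # Hypothetical payload shape: {"body": "<prompt or response text>"}.
    # Consult the gateway's webhook API reference for the real schema.
    text = request.get_json().get("body", "")
    if any(phrase in text.lower() for phrase in BLOCKED_PHRASES):
        # Signal the gateway to reject the request.
        return jsonify({"action": "reject", "reason": "blocked phrase"}), 200
    # Signal the gateway to pass the request through unchanged.
    return jsonify({"action": "pass"}), 200

if __name__ == "__main__":
    app.run(port=8000)
```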
Multi-layered guardrails
Run prompt guards in sequence, creating defense-in-depth protection.
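Conceptually, a layered guardrail is a chain of checks run in order, where each layer can reject the text or rewrite it before the next layer sees it. A small, self-contained Python sketch of that idea (the guard functions are illustrative, not the gateway's implementation):

```python
import re
from typing import Callable, List

# A guard takes text and either returns (possibly masked) text or raises.
Guard = Callable[[str], str]

class GuardRejection(Exception):
    """Raised when any layer decides the text must be blocked."""

def regex_guard(text: str) -> str:
    # Mask anything that looks like a US SSN before later layers see it.
    return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "<SSN_MASKED>", text)

def phrase_guard(text: str) -> str:
    if "ignore all previous instructions" in text.lower():
        raise GuardRejection("prompt-injection phrase detected")
    return text

def run_guards(text: str, guards: List[Guard]) -> str:
    """Run each guard in sequence; each layer sees the previous layer's output."""
    for guard in guards:
        text = guard(text)
    return text

clean_prompt = run_guards(
    "My SSN is 123-45-6789.",
    [regex_guard, phrase_guard],  # order matters: outermost layer runs first
)
print(clean_prompt)  # -> "My SSN is <SSN_MASKED>."
```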