OpenAI moderation
The OpenAI Moderation API detects potentially harmful content across categories including hate, harassment, self-harm, sexual content, and violence.
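To illustrate how a moderation verdict translates into a blocking decision, the following sketch parses a sample Moderation API response. The field names `flagged`, `categories`, and `category_scores` match the real API; the sample values and the `is_blocked` helper are illustrative, not part of agentgateway.

```python
import json

# Abridged example of a Moderation API response (illustrative values)
sample = json.loads('''{
  "results": [{
    "flagged": true,
    "categories": {"self-harm": true, "violence": false},
    "category_scores": {"self-harm": 0.97, "violence": 0.01}
  }]
}''')

def is_blocked(response: dict) -> bool:
    # Block the request if any result in the response is flagged
    return any(r["flagged"] for r in response["results"])

print(is_blocked(sample))  # True
```

A prompt guard applies the same kind of check before forwarding a request to the upstream model.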
Before you begin
Block harmful content
Configure the prompt guard to use OpenAI Moderation:
```yaml
kubectl apply -f - <<EOF
apiVersion: agentgateway.dev/v1alpha1
kind: AgentgatewayPolicy
metadata:
  name: openai-prompt-guard
  namespace: agentgateway-system
spec:
  targetRefs:
  - group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: openai
  backend:
    ai:
      promptGuard:
        request:
        - openAIModeration:
            policies:
            auth:
              secretRef:
                name: openai-secret
            model: omni-moderation-latest
          response:
            message: "Content blocked by moderation policy"
EOF
```

Test with content that triggers moderation.
```sh
curl -i "$INGRESS_GW_ADDRESS/openai" \
  -H "content-type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "I want to harm myself"
      }
    ]
  }'
```

Expected response:
```
HTTP/1.1 403 Forbidden

Content blocked by moderation policy
```
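Client applications can branch on this status code to distinguish a moderation block from a normal completion. A minimal sketch (the `handle_gateway_response` helper is hypothetical, not part of agentgateway):

```python
def handle_gateway_response(status: int, body: str) -> str:
    # A 403 from the gateway indicates the prompt guard blocked the request
    if status == 403:
        return f"blocked: {body}"
    # Any other status falls through to normal response handling
    return "ok"

print(handle_gateway_response(403, "Content blocked by moderation policy"))
# blocked: Content blocked by moderation policy
```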
Cleanup
You can remove the resources that you created in this guide.

```sh
kubectl delete AgentgatewayPolicy openai-prompt-guard -n agentgateway-system
```