Trust & Safety Infrastructure

Civitas AI

Enterprise-grade content moderation with ML-powered classification, configurable policies, human-in-the-loop review, and immutable audit trails.

EU AI Act Ready • NIST AI RMF • SOC 2 Controls

The Challenge

Content Moderation at Scale

  • Millions of user-generated content pieces daily
  • Toxic, harmful, and policy-violating content
  • Real-time decision requirements
  • Multi-platform, multi-language challenges

Regulatory Pressure

  • EU AI Act compliance requirements
  • Transparency and explainability mandates
  • Human oversight obligations
  • Immutable audit trail requirements

Operational Complexity

  • Inconsistent moderation decisions
  • No visibility into AI decision-making
  • Difficult policy enforcement
  • Missing evidence for appeals

The Solution

  • ML-powered automated classification
  • Configurable, versioned policies
  • Human-in-the-loop escalation
  • Cryptographically secured audit trail

Architecture

  • Cloudflare Pages: React frontend
  • Gateway: rate limiting, CORS, auth
  • Moderation: HuggingFace ML
  • Policy Engine: configurable rules
  • Supabase PostgreSQL (pooled)
  • Upstash Redis (TLS)
  • Cloud Run (serverless)

Live Demo

Dashboard Overview

Moderation Demo

Policy Management

API in Action

Request / Response

POST /api/v1/moderate
{
  "content": "Hello...",
  "source": "demo"
}
Response
{
  "action": "allow",
  "category_scores": {...}
}
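
A minimal Python sketch of building this request and reading the two response fields shown above. The base URL is a hypothetical placeholder, the Bearer header follows the REST API pattern listed under Integration Patterns, and nothing is sent until the request object is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Hypothetical base URL for illustration; substitute the real gateway host.
BASE_URL = "https://gateway.example.invalid"


def build_moderation_request(content: str, source: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /api/v1/moderate request."""
    body = json.dumps({"content": content, "source": source}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/moderate",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def parse_moderation_response(raw: bytes) -> tuple[str, dict]:
    """Return (action, category_scores) from a response body."""
    payload = json.loads(raw)
    # category_scores is treated as optional here; that is an assumption.
    return payload["action"], payload.get("category_scores", {})
```

Keeping request construction separate from sending makes the payload easy to inspect in tests without network access.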

Policy Engine

Configurable Thresholds

Toxicity → Block 0.80
Hate → Block 0.70
Harassment → Warn 0.75
Profanity → Warn 0.90
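
The thresholds above can be sketched as a small rule table in Python. This is an illustration, not the engine's actual logic: it assumes a score at or above the threshold triggers the rule, that the most severe triggered action wins, and that a missing category counts as 0.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    category: str
    action: str       # "block" or "warn"
    threshold: float


# Values copied from the threshold list above.
RULES = [
    Rule("toxicity", "block", 0.80),
    Rule("hate", "block", 0.70),
    Rule("harassment", "warn", 0.75),
    Rule("profanity", "warn", 0.90),
]

# Assumed severity ordering: block outranks warn, warn outranks allow.
SEVERITY = {"allow": 0, "warn": 1, "block": 2}


def evaluate(scores: dict, rules: list = RULES) -> str:
    """Return the most severe action triggered by the category scores."""
    action = "allow"
    for rule in rules:
        triggered = scores.get(rule.category, 0.0) >= rule.threshold
        if triggered and SEVERITY[rule.action] > SEVERITY[action]:
            action = rule.action
    return action
```

For example, `evaluate({"toxicity": 0.92, "hate": 0.95})` yields `"block"`, while `evaluate({"profanity": 0.95})` yields `"warn"`.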

Multi-Policy Support

  • Standard Community Guidelines: v1, Global, Published (Active)
  • Youth Safe Mode: v1, Under 13, Published (Active)
  • Relaxed Forum Policy: v1, US Forums, Draft

Human-in-the-Loop

Review Queue Workflow

1. Content Escalated: ML confidence below threshold or edge case detected
2. Moderator Review: Human reviews content with ML recommendations
3. Decision with Rationale: Approve/Reject/Escalate with mandatory explanation
4. Evidence Recorded: Immutable audit trail with cryptographic hash
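
The first three steps of the workflow can be sketched in Python as an escalation check plus a decision record that refuses an empty rationale. The confidence floor of 0.60 and all class and field names are hypothetical; the source states only that low confidence triggers escalation and that an explanation is mandatory.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ESCALATE = "escalate"


# Hypothetical value; the real threshold is not given in the source.
CONFIDENCE_FLOOR = 0.60


def needs_review(ml_confidence: float, floor: float = CONFIDENCE_FLOOR) -> bool:
    """Step 1: send content to the review queue when confidence is low."""
    return ml_confidence < floor


@dataclass(frozen=True)
class ReviewDecision:
    """Step 3: a moderator verdict that cannot exist without a rationale."""
    decision_id: str
    moderator: str
    verdict: Verdict
    rationale: str

    def __post_init__(self) -> None:
        if not self.rationale.strip():
            raise ValueError("rationale is mandatory")
```

Enforcing the rationale at construction time means a decision without an explanation can never reach the evidence step.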

Moderator Actions

Compliance & Audit

Immutable Evidence Records

{
  "id": "e1000000-0000-...",
  "control_id": "MOD-001",
  "decision_id": "d0000000-...",
  "automated_action": "block",
  "category_scores": {
    "toxicity": 0.92,
    "hate": 0.95
  },
  "submission_hash": "sha256:a7f3b...",
  "immutable": true,
  "integrity_hash": "sha256:c9d2e..."
}

Audit Trail Features

  • Cryptographic hash chain
  • Tamper detection triggers
  • Full decision lineage
  • CSV/JSON export
  • Policy version tracking
  • Human review rationale
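
One common way to build a hash chain with tamper detection is to hash each record together with the previous entry's hash. The Python sketch below shows that idea with the standard library. The canonical JSON encoding, the all-zero genesis value, and the `prev_hash` field are assumptions; the source does not specify how `integrity_hash` is computed.

```python
import hashlib
import json
from typing import Optional

# Assumed starting value for the first entry in the chain.
GENESIS = "sha256:" + "0" * 64


def record_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the previous entry's hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()
    return "sha256:" + digest


def append(chain: list, record: dict) -> None:
    """Append a record, linking it to the current chain tail."""
    prev = chain[-1]["integrity_hash"] if chain else GENESIS
    chain.append({
        "record": record,
        "prev_hash": prev,
        "integrity_hash": record_hash(record, prev),
    })


def verify(chain: list) -> Optional[int]:
    """Return the index of the first tampered entry, or None if intact."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        expected = record_hash(entry["record"], prev)
        if entry["prev_hash"] != prev or entry["integrity_hash"] != expected:
            return i
        prev = entry["integrity_hash"]
    return None
```

Because every hash depends on its predecessor, editing any stored record causes `verify` to report that entry's index.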

Regulatory Compliance

  • EU AI Act: Art. 9, 13, 14, 15, 17 (12 controls mapped)
  • NIST AI RMF: MAP, MEASURE, MANAGE, GOVERN (8 controls mapped)
  • ISO 42001: Clause 6, 8, 9 (6 controls mapped)
  • GDPR: Art. 22, 35 (5 controls mapped)
  • SOC 2: CC6, CC7, CC8 (7 controls mapped)

18 implemented controls with full traceability to regulatory requirements

Knowledge Graph

112 nodes • 138 relationships • Neo4j Aura

Integration Patterns

REST API

Direct HTTP integration with JSON payloads

POST /api/v1/moderate
Authorization: Bearer {api_key}
{"content": "...", "source": "web"}

Mobile SDK

Native iOS/Android with offline queue

CivitasSDK.moderate(text) { result ->
    when (result.action) {
        ALLOW -> publish()
        BLOCK -> reject()
    }
}

LLM Guardrails

Pre/post-processing for LLM outputs

llm_output = model.generate(prompt)
result = civitas.moderate(llm_output)
if result.action == "block":
    return SAFE_FALLBACK

Webhooks

Event-driven notifications

{
  "event": "moderation.decision",
  "action": "escalate",
  "decision_id": "..."
}
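
A receiver for this event might dispatch on the `action` field, as in the Python sketch below. The function name and the handler mapping are hypothetical, only the three fields shown in the sample payload are used, and request authentication is left out because the source does not describe it.

```python
import json
from typing import Callable, Dict


def handle_webhook(raw_body: str, handlers: Dict[str, Callable[[str], None]]) -> bool:
    """Route a moderation.decision event to the handler for its action.

    Returns True when a handler ran, False when the event was ignored.
    """
    event = json.loads(raw_body)
    if event.get("event") != "moderation.decision":
        return False  # unrelated event type
    handler = handlers.get(event.get("action"))
    if handler is None:
        return False  # no handler registered for this action
    handler(event["decision_id"])
    return True
```

For example, registering `{"escalate": notify_review_team}` would call a caller-supplied `notify_review_team` function with the decision ID whenever an escalation event arrives.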

Cloud Deployment

Live URLs

Frontend: civitas.pages.dev
API: gateway-xxx.run.app
Database: Supabase (us-west-2)
Graph: Neo4j Aura

Microservices: 4
API Latency (p95): <100ms
Uptime SLA: 99.9%

Roadmap

Phase 1: Foundation

Core moderation, policy engine, review queue, audit trail

Complete: ML Classification • Policy Rules • Evidence Chain

Phase 2: Scale

Multi-language support, custom ML models, real-time streaming

Q2 2026: i18n • Fine-tuning • WebSocket API

Phase 3: Enterprise

Multi-tenant, SSO, advanced analytics, SLA dashboard

Q4 2026: SAML/OIDC • Tenant Isolation • BI Integration

Get Started

Enterprise-grade content moderation, ready for production

Documentation

API reference, integration guides, and examples

Contact

proth1@gmail.com

License

MIT Open Source