The AI Policy Reviewer is responsible for evaluating AI-generated and user-generated content to ensure compliance with internal governance standards, regulatory requirements, and responsible AI principles. This role plays a key part in safeguarding model integrity by reviewing outputs for safety risks, bias, misinformation, harmful content, and policy violations, while ensuring consistent enforcement of AI usage guidelines.

Responsibilities:
- Review and score AI-generated responses against detailed policy rubrics. Assess outputs for safety, truthfulness, fairness, and alignment with community guidelines.
- Act as a quality assurance checkpoint for automated systems. Identify instances where the AI misinterprets policy.
- Handle complex "edge cases" where policy application is ambiguous. Make nuanced judgement calls regarding context, satire, or emerging risks.
- Analyze and review data to identify systematic flaws in the AI’s reasoning. Report patterns of bias, hallucination, or policy gaps.
- Collaborate with Policy teams to test and refine evaluation rubrics.
- Participate in adversarial testing (red teaming) by attempting to "jailbreak" the model or provoke unsafe responses.
- Work closely with Machine Learning Engineers to explain the "why" behind your ratings.
- Write high-quality examples (prompts and ideal responses) that serve as "golden sets" for training.

Qualifications:
- Minimum of 3 years of professional experience in Trust & Safety Operations, Content Policy, Risk Analysis, or Legal/Compliance review.
- Deep understanding of content moderation principles, including hate speech, harassment, misinformation, and graphic violence policies.
- Strong ability to deconstruct complex AI responses and identify logical flaws, hallucinations, or subtle biases.
- Clear and concise written communication skills.
- Proven emotional resilience and self-care strategies to handle exposure to disturbing AI-generated text and images.
- Comfortable working with dashboards, spreadsheets, and specialized review tools. Familiarity with LLMs (ChatGPT, Gemini, etc.).
- Proven ability to follow complex, detailed instructions and scoring rubrics with high consistency and accuracy.
- Understanding of global cultural and political nuances.