Austin / North America | Full-time | Hybrid / remote options | Last checked May 17, 2026
This role builds product workflows for voice-agent QA, including test creation, scenario management, evaluation results, analytics, debugging, and triage. The product helps teams test and improve production voice agents.
AI QA / Evaluation Engineer | Voice AI | Evaluation | Observability | AI Agent
San Francisco | Full-time | Hybrid | Last checked May 17, 2026
This role executes complex coding and GenAI data projects for enterprise AI clients. It combines technical planning, code-quality protocols, delivery operations, applied AI experimentation, and evaluation-oriented project execution.
AI QA / Evaluation Engineer | Evaluation | AI Coding | Data Pipeline | Forward Deployed
Remote US | Full-time | Remote | Last checked May 17, 2026
This role designs, deploys, and optimizes GenAI data solutions for enterprise clients. It owns data quality, validation logic, automation pipelines, observable workflows, and reproducible quality frameworks for model development.
AI QA / Evaluation Engineer | Evaluation | Data Pipeline | Workflow Automation | GenAI Data
New York, United States | Full-time | Hybrid | Last checked May 17, 2026
This Copilot Security role builds secure orchestration frameworks, AI-powered defenses, identity flows, guardrails, and privacy-first systems for agentic AI experiences that take actions on behalf of users.
AI QA / Evaluation Engineer | AI Agent | Security | Guardrails | Prompt Injection
United States | Full-time | Remote | Last checked May 17, 2026
This role builds safety and guardrail mechanisms for Airbnb AI systems spanning chatbots, AI assistants, and agent copilots. It covers risk monitoring, guardrails, failure-mode data loops, and responsible AI controls.
AI QA / Evaluation Engineer | AI Safety | Guardrails | LLM Evaluation | RAG