
QA Engineer

Indeed
Full-time
Onsite
No experience limit
No degree limit
79Q22222+22

Description

Summary: Seeking a QA Engineer (Mid/Senior) to own quality for AI-driven systems and integrations, focusing on testing LLM-powered features and non-deterministic systems.

Highlights:

1. Own end-to-end QA for AI agents, prompt pipelines, and integrations.
2. Design test strategies for non-deterministic AI systems.
3. Work on genuinely novel problems in AI QA.

### **The Role**

We're looking for a QA Engineer (Mid or Senior level) who can own quality across AI-driven systems and the integrations that hang off them. This is not traditional app QA. You'll test LLM-powered features, prompt pipelines, agent workflows, MCP integrations, and the GitHub-based delivery pipelines that power them.

You'll work directly inside our repos (including degen-engine and skeleton), partner with engineers using Claude Code and Gemini Code Assist, and shape how we verify non-deterministic systems. If "how do you QA an LLM?" is a question you've already started answering, keep reading.

### **What You'll Do**

* Own end-to-end QA for Skeleton: AI agents, prompt pipelines, MCP server integrations, scheduled jobs (Vercel Cron), data ingestion (Apify), and database flows (Drizzle ORM).
* Design test strategies for non-deterministic systems: evaluation harnesses, golden datasets, regression suites for prompts, output quality scoring, hallucination and drift detection.
* Write and maintain integration tests across our stack (Next.js, TypeScript, pnpm, Vercel, Sentry, Jira), including API contract tests for third-party integrations.
* Test inside GitHub directly: review PRs, run test suites in CI/CD, validate auto-deploys to main, and verify fixes before they ship.
* Partner with engineers using Claude Code, Gemini Code Assist, and our broader AI dev workflow, including writing test prompts, validating tool-use outputs, and stress-testing prompt caching strategies.
* Build and maintain monitoring and observability for AI features in production (Sentry, custom eval dashboards, cost and latency tracking).
* Define quality gates and release criteria for AI-powered features, and partner with engineering on incident response when production outputs drift.
* Triage and reproduce issues across integrated systems: when something breaks, you trace it from Slack notification through Vercel logs, Sentry traces, the database, and back to the prompt.

### **What We're Looking For**

### **Mid-Level (3–5 years)**

* 3+ years of QA / SDET / Test Engineering experience on production software.
* Hands-on experience testing AI / LLM-powered features in production (OpenAI, Anthropic, Gemini, or similar): prompt evals, output validation, regression testing.
* Strong TypeScript / JavaScript fundamentals; comfortable reading and writing code, not just black-box testing.
* Experience with modern web stacks: Next.js, REST/GraphQL APIs, serverless (Vercel / AWS Lambda), and at least one ORM (Drizzle, Prisma, etc.).
* Fluency in Git and GitHub workflows: PR review, branch protection, CI/CD pipelines, status checks.
* Experience writing automated tests with modern frameworks (Vitest, Jest, Playwright, Cypress).
* Comfort working in repos alongside engineers and contributing test code directly, not just filing tickets.

### **Senior (5+ years)**

* Everything in Mid-Level, plus deep experience defining QA strategy for AI / ML systems in production.
* Track record of building eval frameworks for LLM outputs (LLM-as-judge, golden datasets, A/B prompt testing, regression suites for non-deterministic systems).
* Experience with MCP (Model Context Protocol), tool use / function calling, agent frameworks, or multi-step LLM workflows.
* Comfort with observability stacks (Sentry, Datadog, custom dashboards) and ability to build them where they don't exist.
* Experience mentoring engineers on quality practices and shaping team-wide testing culture.
* Familiarity with prompt caching, model selection, context management, and other techniques for keeping AI systems fast and cheap in production.

### **Nice to Have**

* Direct experience with Claude (API, Claude Code, Anthropic SDK), Gemini Code Assist, or similar AI dev tools.
* Experience with Apify, Playwright, or other scraping / browser automation frameworks.
* Background testing data pipelines, ETL flows, or analytics systems.
* Experience with Jira automation, Slack apps, or Notion API.
* Open-source contributions to AI tooling or testing frameworks.
* Curiosity about prompt engineering, agent design, or the science of evaluating language models.

### **Why You'll Love It Here**

* Work on genuinely novel problems: QA for AI systems is being invented right now, and you'll help invent it here.
* Direct access to a small senior team building production AI pipelines from scratch: not a maintenance role, a frontier one.
* Modern stack, modern tools, no legacy debt to drag through.

Source: Indeed
Valentina Rodríguez
Indeed · HR

Company

Indeed

© 2025 Servanan International Pte. Ltd.