Open role
[Remote] Senior AI Quality Engineer (LLM Evaluation & Automation) 1754
Note: This is a remote position open to candidates in the USA. Softgic is a technology company seeking a Senior AI Quality Engineer to own the evaluation harness and quality gate that make agent quality measurable. The role involves building and maintaining the eval harness, integrating evaluations into CI, and defining release-gate thresholds.
Responsibilities
- Build and maintain the MVP eval harness: golden tasks, exception tasks, scorecard metrics, and regression packs
- Wire evals into CI so quality regressions fail builds and releases
- Define and maintain release-gate thresholds with Product and the Tech Lead
- Lay the groundwork for later adversarial and drift-testing expansion without overbuilding the MVP scope
Skills
- Experience evaluating ML, LLM, or non-deterministic systems
- Strong test and benchmark design capability
- Comfort working with noisy metrics, thresholds, and probabilistic behavior
- Good scripting and automation skills
Company Overview