MLOps Engineer jobs (2026)

Our index currently lists 63 open MLOps Engineer roles in San Francisco across AI-native companies, with posted total comp typically $243k–$388k. Five were added in the last week. Below: the live board, plus where demand is heading, the skills employers want, and how the interview has changed.

Open roles: 63
Median total comp: $243k–$388k
New this week: 5

Latest MLOps Engineer openings

A live sample from the 63 open roles — click any row for the full posting.

The MLOps Engineer hiring market in 2026

MLOps has become one of the highest-priority AI hires as companies move from prototypes to production. Acceler8talent's April 14, 2026 ranking places MLOps Specialist at #2 among the most in-demand machine-learning roles for 2026, noting the role is "often outpacing the need for traditional data scientists as companies move from AI prototypes to active deployment." The infrastructure buildout underneath it is broad: Spheron's June 10, 2026 AI-infrastructure roundup identifies seven hiring layers, all actively staffing: GPU compute, inference serving, training and fine-tuning orchestration, observability, vector databases, security, and infrastructure-as-code. Lightcast's work for the Stanford AI Index 2026 shows GPU, scalability, and workflow skills among the fastest-growing specialized skills in the market. A single June 2026 Indeed snapshot showed roughly 1,799 ML Infrastructure Engineer postings, and NVIDIA, hyperscalers, and frontier labs are all competing for the same systems-plus-ML profile, which keeps supply tight.

The skills employers actually want now

MLOps in 2026 demands a rare blend of distributed-systems depth and modern LLM-serving fluency. Live May 2026 postings are representative: NVIDIA's "Principal AI and ML Infra Software Engineer, GPU Clusters" listing requires distributed training at multi-thousand-GPU scale, while GM's "Staff ML Infrastructure Engineer (Compute)" role emphasizes scaling, efficiency, and high utilization of cutting-edge GPUs. The frameworks that recur most often across postings are:

  • Serving/inference: TensorRT-LLM, Triton, vLLM, SGLang
  • Distributed comms and scheduling: NCCL, Kubernetes-on-GPU
  • Compilation and performance: PyTorch compile (torch.compile)
  • Observability: Langfuse, Arize

The bar is genuinely two-sided: candidates need both ML understanding (what a model needs to train and serve well) and systems depth (how to schedule, scale, and observe GPU workloads). That dual requirement is why supply is constrained and why the role frequently outranks more classical data-analytics hires.

How the MLOps interview has changed

MLOps loops in 2026 have reoriented around inference economics and production reliability rather than generic pipeline questions. Interviews now test inference-cost optimization directly (tokens per dollar, KV-cache hit ratios, and prefix caching) alongside multi-tenant GPU scheduling, observability and evals at scale, and on-call scenarios for model-rollout safety. At the deepest end, JobRight's December 18, 2025 guide lists loop unrolling, memory coalescing, and operation fusion among Anthropic's Performance Engineer topics, a sign of how far kernel-level performance work has crept into the infrastructure interview. Time-to-fill averages around 89 days across AI/ML roles broadly (KORE1, February 13, 2026), but senior and staff roles at frontier labs and NVIDIA stretch to 3 to 6 months because the bar requires ML and systems depth simultaneously. Candidates should expect to reason quantitatively about serving cost and utilization on a whiteboard, not merely describe CI/CD for models.
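To make the "tokens per dollar" and cache-hit reasoning concrete, here is a minimal back-of-envelope sketch in Python. It is an illustration only: the `ServingProfile` fields, the function names, and every number in the usage example are hypothetical assumptions, not figures from any posting or vendor. Real serving cost also depends on batching, prompt and output length mix, and hardware, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass
class ServingProfile:
    """Hypothetical inputs for one GPU replica; all values are illustrative."""
    gpu_cost_per_hour: float      # USD per GPU-hour (assumed price)
    decode_tokens_per_sec: float  # sustained output tokens/s under load (assumed)
    utilization: float            # fraction of each hour spent serving, 0..1


def tokens_per_dollar(p: ServingProfile) -> float:
    """Output tokens generated per USD of GPU spend."""
    tokens_per_hour = p.decode_tokens_per_sec * 3600 * p.utilization
    return tokens_per_hour / p.gpu_cost_per_hour


def prefill_tokens_computed(prompt_tokens: int, cache_hit_ratio: float) -> float:
    """Prompt tokens that still need prefill compute when a fraction of the
    prompt is served from a prefix (KV) cache. Simplified linear model."""
    return prompt_tokens * (1.0 - cache_hit_ratio)


# Usage with made-up numbers: a $2.00/hour GPU at 2,500 tokens/s, 50% utilized.
profile = ServingProfile(gpu_cost_per_hour=2.0,
                         decode_tokens_per_sec=2500.0,
                         utilization=0.5)
print(tokens_per_dollar(profile))            # 2250000.0
print(prefill_tokens_computed(4000, 0.75))   # 1000.0
```

In this simplified model, doubling utilization doubles tokens per dollar at the same hardware price, and a 75% prefix-cache hit ratio cuts prefill work on a 4,000-token prompt to 1,000 tokens. That is the kind of quantitative trade-off these interview questions probe.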

How people are breaking in

MLOps is the strongest on-ramp for platform, SRE, and infrastructure engineers adding ML to an existing systems foundation; the Kubernetes-to-GPU-Kubernetes jump is repeatedly cited as the highest-leverage transition of 2026. Pluralsight's January 5, 2026 MLOps career guide lays out a concrete standard path: Kubernetes plus Python plus one cloud (AWS or GCP), one serving stack (vLLM, Triton, or TensorRT-LLM), and one observability tool (Langfuse or Arize). That stack maps cleanly onto what NVIDIA, hyperscalers, and neocloud providers such as CoreWeave, Lambda, Hyperbolic Labs, and Modal are hiring for. For candidates without a systems background, Course Report's April 24, 2026 ranking also lists dedicated MLOps bootcamps, with Fullstack Academy and TripleTen the most tracked. The practical playbook: take an existing platform/SRE skill set, layer on one serving stack and one observability tool, and build a GPU-serving or inference-optimization project that shows you can reason about cost and utilization, not just deploy a container.

How we read this

The counts, median comp, and skill frequencies here are computed from 63 live MLOps Engineer postings in San Francisco in our index (17 disclose a pay-transparency band), refreshed weekly; the board above is live. The market context is researched and cited below.

Sources

Frequently asked

How many MLOps Engineer jobs are open right now?

Our index tracks 63 live MLOps Engineer roles in San Francisco, refreshed weekly, with 5 added in the last week.

What do MLOps Engineer jobs pay?

Posted total comp typically runs $243k–$388k for roles that disclose a band. See the MLOps Engineer salary guide for percentiles and pay by level.

Is MLOps Engineer in demand in 2026?

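MLOps has become one of the highest-priority AI hires as companies move from prototypes to production. Acceler8talent's April 14, 2026 ranking places MLOps Specialist at #2 among the most in-demand machine-learning roles for 2026, noting the role is "often outpacing the need for traditional data scientists as companies move from AI prototypes to active deployment."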
