MLOps Engineer - Machine Learning Platform
Job Description
**What We Do**
At Goldman Sachs, our Engineers don’t just make things – we make things possible. Change the world by connecting people and capital with ideas. Solve the most challenging and pressing engineering problems for our clients. Join our engineering teams that build massively scalable software and systems, architect low latency infrastructure solutions, proactively guard against cyber threats, and leverage machine learning alongside financial engineering to continuously turn data into action. Create new businesses, transform finance, and explore a world of opportunity at the speed of markets.
Engineering, which comprises our Technology Division and global strategists’ groups, is at the critical center of our business, and our dynamic environment requires innovative strategic thinking and immediate, real solutions. Want to push the limit of digital possibilities? Start here.
**Who We Look For**
We are seeking a skilled and motivated engineer to join our Artificial Intelligence Platforms organization as an MLOps Engineer on our Machine Learning Services team.
You will be part of an expert team building and operating production-grade platform and backend systems leveraged by ML engineers and application teams across the entire firm. A key focus of this role is enabling **reliable, scalable, and observable deployment of Machine Learning and Large Language Models (LLMs).**
This role is best suited for engineers who enjoy working on **infrastructure, backend services, and distributed systems**, rather than primarily on model experimentation and development.
**Key Responsibilities:**
- Deliver scalable, efficient, secure, and automated processes for building, deploying, and monitoring Machine Learning models.
- Enable solutions that give business customers access to the latest AI/ML infrastructure, frameworks, and tooling to deliver high-impact outcomes.
- Develop and demonstrate deep subject-matter expertise in optimizing machine learning model deployments to scale to the specific needs of each business customer.
- Deliver high-quality, production-ready code leveraging CI/CD best practices.
- Author and maintain high-quality documentation for both the engineering team and business customers.
- Participate in **on-call and support rotations**, helping diagnose and resolve production issues.
- Continuously expand knowledge of the platform architecture with the goal of **taking ownership** of individual components.
- Stay up to date with advancements in **AI/ML frameworks, model serving technologies, and GenAI infrastructure.**
**Basic Qualifications**
- 2 years of experience in software engineering **(backend, platform, or infrastructure)**.
- 2 years of experience in **Python** or a similar backend programming language.
- 1 year of experience **supporting production ML systems** (MLOps, platform, or inference-related work).
- Basic understanding of **APIs** (REST or similar) and service-to-service communication.
- Experience working with **containers** (e.g., Docker).
- Familiarity with **Unix-based systems**.
- Exposure to **public cloud environments** (e.g., AWS or GCP), including core concepts such as compute, storage, and basic IAM.
- Experience working with **databases** (SQL or NoSQL).
- Solid grasp of **software engineering fundamentals**, including debugging, testing, and maintainable code design.
- Strong problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment.
- Curiosity and a strong desire to keep learning—especially in the **model inference and LLM platform space.**
**Preferred Qualifications:**
- 4 years of experience in software engineering **(backend, platform, or infrastructure)**.
- 4 years of experience **supporting production ML systems** (MLOps, platform, or inference-related work).
- 4 years of experience in **Python** or a similar backend programming language.
- Strong understanding of the end-to-end **Model Development Lifecycle (MDLC)**.
- Basic understanding of **distributed systems concepts** and exposure to **observability** concepts (logging, metrics, tracing).
- Experience building containerized runtime environments for model serving (e.g., **vLLM, SGLang, TensorRT, Triton, AWS Multi Model Server**).
- Experience with infrastructure-as-code tools, such as Terraform or CloudFormation.
- Experience with **Kubernetes** and other container orchestration platforms in the public cloud (e.g., AWS, GCP).
- Experience building Machine Learning models with frameworks such as **PyTorch and TensorFlow**.
- Excellent communication skills and the ability to articulate complex technical concepts to both technical and non-technical stakeholders.
**What Success Looks Like in This Role:**
- Can take a **well-defined task** and drive it to completion with minimal hand-holding.
- Asks **thoughtful questions** instead of getting blocked.
- Understands basic **trade-offs** (e.g., performance vs. simplicity, flexibility vs. reliability).
- Writes code that is **readable, testable, and easy for others to extend**.
- Shows curiosity about how the **entire system works end-to-end**, not just their assigned ticket.
**We Offer Best-In-Class Benefits**
Healthcare & Medical Insurance
We offer a wide range of health and welfare programs that vary depending on office location. These generally include medical, dental, short-term disability, long-term disability, life, accidental death, labor accident and business travel accident insurance.
Holiday & Vacation Policies
We offer competitive vacation policies based on employee level and office location. We promote time off from work to recharge by providing generous vacation entitlements and a minimum of three weeks expected vacation usage each year.
Financial Wellness & Retirement
We assist employees in saving and planning for retirement, offer financial support for higher education, and provide a number of benefits to help employees prepare for the unexpected. We offer live financial education and content on a variety of topics to address the spectrum of employees’ priorities.
Health Services
We offer a medical advocacy service for employees and family members facing critical health situations, and counseling and referral services through the Employee Assistance Program (EAP). We provide Global Medical, Security and Travel Assistance and a Workplace Ergonomics Program. We also offer state-of-the-art on-site health centers in certain offices.
Fitness
To encourage employees to live a healthy and active lifestyle, some of our offices feature on-site fitness centers. For eligible employees we typically reimburse fees paid for a fitness club membership or activity (up to a pre-approved amount).
Child Care & Family Care
We offer on-site child care centers that provide full-time and emergency back-up care, as well as mother and baby rooms and homework rooms. In every office, we provide advice and counseling services, expectant parent resources and transitional programs for parents returning from parental leave. Adoption, surrogacy, egg donation and egg retrieval stipends are also available.
Benefits at Goldman Sachs
Read more about the full suite of class-leading benefits our firm has to offer.
Opportunity Overview
CORPORATE TITLE: Associate
OFFICE LOCATION(S): Jersey City
JOB FUNCTION: Software Engineering
DIVISION: Engineering Division
SALARY RANGE: USD 115,000 - 180,000
