Principal ML Infrastructure Engineer
Location: Elkins Park
Posted on: June 23, 2025
Job Description:
Upwork ($UPWK) is the world’s largest work marketplace,
connecting businesses with highly skilled professionals worldwide.
From entrepreneurs to Fortune 100 enterprises, companies trust
Upwork’s platform to access expert talent, leverage AI-powered work
solutions, and drive meaningful business outcomes. Upwork’s
AI-powered platform has facilitated over $20 billion in economic
opportunity for professionals worldwide. With professionals
spanning 10,000 skills, including AI and machine learning, software
development, sales and marketing, customer support, finance and
accounting, and more, Upwork empowers businesses of all sizes to
scale, innovate, and build agile teams.

The Machine Learning Infrastructure & Data team is responsible for
architecting and building the foundational ML systems and tools
that enable efficient development, deployment, and management of
machine learning models at scale. As a Principal ML Infrastructure
Engineer on the Machine Learning Infrastructure & Data team, you
will play a pivotal role in designing, developing, and maintaining
robust and scalable ML infrastructure components to support the
company’s machine learning initiatives. You will collaborate
closely with cross-functional teams including machine learning
researchers, data scientists, and software engineers to build
state-of-the-art platforms and tools that accelerate the
development and deployment of machine learning models.

Responsibilities:
• Own technical
workstreams from start to finish, contribute to the team’s product
roadmap, and be responsible for major technical decisions and
tradeoffs. Participate effectively in the team’s planning, code
reviews, and design discussions.
• Consider the effects of projects across multiple teams and
proactively manage conflicts. Work together with partner teams to
achieve cross-departmental goals and satisfy broad requirements.
• Design, implement, and optimize distributed systems and
infrastructure components to support large-scale machine learning
workflows, including data ingestion, feature engineering, model
training, and serving.
• Develop and maintain frameworks, libraries, and tools to
streamline the end-to-end machine learning lifecycle, from data
preparation through model training, evaluation, deployment, and
monitoring.
• Architect and implement highly available, fault-tolerant, and
secure systems that meet the performance and scalability
requirements of production machine learning workloads.
• Collaborate and publish with machine learning researchers and
data scientists on novel research, and translate that research into
scalable and efficient software solutions.
• Stay current with the latest advancements in machine learning
infrastructure, distributed computing, and cloud technologies, and
integrate them into our platform to drive innovation.
• Mentor teammates, conduct code reviews, and uphold engineering
best practices to ensure the delivery of high-quality software
solutions.

What it takes to catch our eye:
• Senior/Leadership
level experience in ML infrastructure engineering, ideally at an
innovative technology company.
• Proven Impact: Show us your track record of delivering impactful
solutions.
• Innovative Thinker: Bring creativity and fresh ideas to the
table.
• Technical Proficiency: Solid foundation in software engineering
and ML concepts.
• Collaborative Mindset: Strong communication and teamwork skills
are a must.
• Continuous Learner: Stay updated with the latest advancements in
the field of AI.
• Our Team’s Tech Stack:
  - Compute: AWS, EKS, Databricks
  - Data: Snowflake, S3, SQLMesh, Feast
  - Workflow Automation: Airflow
  - Experiment Tracking: Weights & Biases, MLflow
  - LLM Inference: Fireworks, in-house deployment on EKS

At Upwork, you’ll shape talent solutions for how the world
works today. We are a remote-first organization working together to
create exciting remote work opportunities for a global community of
professionals. While we have physical offices in San Francisco and
Chicago, we also currently hire full-time employees in 19 states in
the United States. At the core of our vibrant culture are shared
values that form the foundation of our organization. These values
revolve around trust, risk-taking, customer focus, and excellence.
Our overarching mission is to create economic opportunities so that
people have better lives. We foster an environment where
individuals are encouraged to bring their authentic selves to work,
nurturing personal and professional growth through development
opportunities, mentorship programs, and participation in Upwork
Belonging Communities.
Keywords: Trenton, Principal ML Infrastructure Engineer, IT / Software / Systems, Elkins Park, New Jersey