Lead Software Engineer - SRE
Company: Kontakt.io
Location: New York City
Posted on: April 1, 2026
Job Description:
Kontakt.io is building the platform that care operations run on.
We reduce waste, cut costs, and improve revenue by improving
throughput, asset utilization and staff productivity. Our platform
uses AI, RTLS, and EHR data to enable self-learning agents to
automate workflows, adapt in real time, and orchestrate all of care
delivery operations. Easy to deploy and scale, it gives a clear
picture of spaces, equipment, and people, eliminating
inefficiencies and enhancing the patient experience. With
measurable 10X ROI and over 20 use cases, Kontakt.io is the go-to
platform for better and faster care delivery operations.

We are
looking for a Lead Software Engineer - SRE with a strong software
engineering foundation and a strategic mindset to drive the
reliability, scalability, and performance of our platform. This
role is part of our Infrastructure Engineering team and will play a
central part in shaping the architecture and direction of our SRE
function. The ideal candidate brings a deep understanding of
software engineering principles applied to infrastructure. Rather
than only maintaining systems, you will lead their design and construction,
developing automation, tooling, and resilient architecture that
enable high availability and fault tolerance across our entire
AWS-based platform. You’ll work hands-on in designing resilient
systems, improving deployment pipelines, and driving incident
management practices. As a technical leader, you’ll also mentor
engineers, shape technical strategy, and help build a culture of
accountability, ownership, and continuous improvement across the
organization.

Responsibilities:
Lead the design and implementation of scalable, fault-tolerant, and self-healing infrastructure and services across AWS and Kubernetes.
Collaborate with Product, Engineering, and Infrastructure teams to align SRE initiatives with business priorities and platform needs.
Define and drive adoption of SLIs, SLOs, and SLAs to ensure consistent performance and high reliability across the platform.
Own and evolve observability strategies using Prometheus, OpenTelemetry, Grafana, and related tooling.
Design and maintain infrastructure as code (Terraform) and drive GitOps best practices.
Oversee major incident response and on-call practices, including incident reviews and long-term remediation planning.
Mentor and support the growth of SRE and platform engineers, fostering a culture of engineering rigor and operational excellence.
Contribute to the long-term reliability roadmap and architecture of high-throughput, real-time systems in healthcare operations.
Drive process improvements in CI/CD, service ownership, chaos engineering, disaster recovery, and secure deployment.

What You Bring:
5 years of experience in Site Reliability Engineering, Cloud Infrastructure, or Platform Engineering.
5 years of software engineering experience building production-grade systems (Java, Python, Go, or similar).
Proven success scaling high-traffic, mission-critical platforms in SaaS, IoT, or healthcare environments.
Deep expertise in cloud platforms (especially AWS), Kubernetes, and distributed system architecture.
Hands-on experience with monitoring, logging, and observability tools (Prometheus, OpenTelemetry, Datadog, etc.).
Extensive knowledge of CI/CD automation, GitOps workflows, and infrastructure-as-code (Terraform, Helm, ArgoCD).
A track record of leading major incident response and running postmortems with a blameless, learning-focused approach.
Strong understanding of networking, access control, and security within regulated environments (HIPAA, SOC 2).
A leadership mindset: able to drive cross-functional alignment, lead initiatives, and mentor a high-performance SRE team.

Why You'll Love It Here:
Own Mission-Critical Reliability – Ensure hospitals and care facilities always stay online with a 99.99% uptime healthcare platform.
Scale AI-Powered Infrastructure – Work on real-time automation and self-healing cloud systems that orchestrate care delivery.
Drive Big Impact in Healthcare – Help reduce waste, optimize resources, and improve patient care with technology that delivers 10X ROI.
Automation-First Culture – Minimize manual ops with cutting-edge automation, observability, and incident response strategies.
Join a High-Performing Team – Work with top engineers, AI experts, and healthcare innovators solving real-world challenges.

Ready to Build the Future of Healthcare? Apply now and help scale the
platform that care operations run on.

We may use artificial intelligence (AI) tools to support parts of the
hiring process, such as reviewing applications, analyzing resumes, or
assessing responses. These tools assist our recruitment team but do not
replace human judgment. Final hiring decisions are ultimately made by
humans. If you would like more information about how your data is
processed, please contact us.