Chaos & Reliability Engineer 2 (remote)
Posted on: May 15, 2022
Are you interested in using the hottest tech to deploy
containers and manage large clusters of machines where all of an
enterprise's critical apps run? Do you take joy in figuring out how
to make applications more reliable?
Nordstrom is investing in how we design and deploy software. We are
obsessed with creating a joyful and automatic DevOps experience,
consistent across clouds and data centers, for our users.
As a Chaos and Resiliency Engineer, you will work closely
with Application Development teams and Platform teams, guiding them
in optimizing their applications and platforms to deliver
best-in-class reliability for Nordstrom's applications. Our Load and
Resiliency tests are business-critical components supporting our
customers' experience. In addition, the CRE team is responsible for
providing capabilities that engineering teams can use to ensure their
software is scalable, highly available, and resilient.
A day in the life...
Design, maintain, and optimize the Go-based Load Testing Engine for Nordstrom applications.
Research and develop chaos and reliability tests and integrate them into the CI pipeline.
Work with development teams to integrate their systems into routine load/reliability testing.
Contribute code back to open-source projects to fix bugs and add features needed by our team.
Build sensors to monitor load/reliability tests; troubleshoot and fix the tests when they break.
Share responsibility for being on call and conducting system architecture reviews.
You own this if you have...
Proficiency with software development in a well-known language (Go, Python, or Rust strongly preferred).
Experience with Chaos Engineering tools like Gremlin, Litmus, or Chaos Toolkit.
Experience managing and scaling workloads on Kubernetes or another container orchestration platform (e.g., OpenShift).
Familiarity with networking, APIs, Kafka, and distributed systems is a plus.
Ability to debug, optimize, and automate routine tasks.
Experience with Dispatcher/Worker architecture and with secure vault concepts and tools.
Strong interest in SRE topics like SLOs, Resiliency, Scaling, and Performance.
Strong interest in Chaos Engineering topics like Failure Mode, Fault Analysis, Alerting, and Synthetic Failures.
Eagerness to learn and soak in new information.
We've got you covered...
Our employees are our most important asset and that's reflected in
our benefits. Nordstrom is proud to offer a variety of benefits to
support employees and their families, including:
Medical/Vision, Dental, Retirement and Paid Time Away
Life Insurance and Disability
Merchandise Discount and EAP Resources
A few more important points...
The job posting highlights the most critical responsibilities and
requirements of the job. It's not all-inclusive. There may be
additional duties, responsibilities and qualifications for this job.
Nordstrom will consider qualified applicants with criminal
histories in a manner consistent with all legal requirements.