Amwell

Staff Site Reliability Engineer

Location: Colombia, CO
Workplace: Hybrid
Language: English
Posted: May 7, 2026
Last verified: May 8, 2026

Company Description

At Amwell, we’re transforming healthcare for all—powered by technology and inspired by people. Here, your ideas don’t just matter—they drive real change, improving lives on a global scale.

We marry technology and innovation with clinical excellence to provide trusted solutions that solve the healthcare industry’s biggest pain points. Our mission is to enable greater access to more convenient, affordable, and effective care.

We do this through our technology-enabled care platform that is designed to help our clients achieve their digital care ambitions – today and in the future. We offer programs spanning the full care continuum, including urgent, acute and specialty care, behavioral health, and services for the treatment of chronic conditions such as heart and cardiometabolic diseases. Programs are powered by Amwell as well as our growing partner network.

For almost two decades, Amwell has proudly served some of the largest and most sophisticated healthcare organizations in the U.S. and worldwide. Our team is passionate about technology’s role in transforming care delivery and making it more equitable, accessible, efficient, cost-effective and navigable for all.

Brief Overview 

As a Staff Site Reliability Engineer (P4), you will define and elevate the reliability standards across the platform. This role goes beyond owning individual services — you will establish the patterns, practices, and tooling that enable all teams to build and operate reliable systems at scale.

You will operate across team boundaries, identifying systemic reliability risks and designing cross-cutting solutions that improve the overall health of the platform. Acting as a bridge between service-level reliability and organizational maturity, you will help ensure reliability becomes a built-in property of the system rather than a reactive effort.

This role combines deep technical expertise with strong leadership and influence. You will mentor senior engineers, guide architectural decisions, and promote a culture of proactive reliability, observability, and operational excellence across the organization.

 

Core Responsibilities 

  • Define and evolve reliability standards, patterns, and tooling adopted across the platform.
  • Own the reliability posture for critical service domains and drive architectural reviews to ensure reliability, operability, and recovery are first-class concerns.
  • Design and implement cross-cutting reliability mechanisms such as circuit breakers, retry policies, graceful degradation, and load shedding.
  • Establish and maintain scalable SLO frameworks that teams can adopt with minimal friction.
  • Lead complex, multi-service incident response as an incident commander and drive high-quality postmortems focused on systemic improvements.
  • Identify recurring incident patterns and implement structural solutions to prevent future failures.
  • Improve incident response processes, tooling, escalation paths, and communication practices.
  • Design and drive observability strategies across services, including metrics, logs, traces, and alerting systems.
  • Ensure alerting is actionable and aligned with SLOs, and build shared dashboards and runbooks to reduce time to resolution.
  • Collaborate with Platform Engineering to strengthen infrastructure reliability across Kubernetes (EKS), networking, and data systems.
  • Contribute to infrastructure as code for reliability-critical components and validate disaster recovery, backup, and restore strategies.
  • Drive chaos engineering practices and ensure deployment pipelines include reliability safeguards such as health checks, canary releases, and rollback automation.
  • Lead capacity planning and performance optimization efforts across services and shared infrastructure.
  • Identify bottlenecks and failure risks across distributed systems and design solutions that improve resilience and recovery.
  • Mentor engineers through design reviews, incident leadership, and knowledge sharing.
  • Promote best practices, improve operational maturity across teams, and influence engineering culture toward proactive reliability investments.
  • Represent reliability concerns in cross-functional planning and contribute to long-term platform strategy.

Qualifications

  • 8+ years of experience in Site Reliability Engineering, infrastructure, or production engineering roles.
  • Strong experience operating and improving large-scale production systems in AWS environments.
  • Deep expertise in Kubernetes (preferably EKS), including networking, scheduling, and observability.
  • Hands-on experience with Infrastructure as Code tools such as Terraform or CDKTF.
  • Advanced understanding of distributed systems, networking, and failure modes.
  • Experience designing and managing observability stacks (e.g., Prometheus, Grafana, OpenSearch, OpenTelemetry).
  • Proven experience leading incident response for complex, multi-service production environments.
  • Demonstrated ability to drive systemic reliability improvements across teams and platforms.
  • Strong written communication skills, including experience creating postmortems, design documents, runbooks, and architectural proposals.
  • Experience with service mesh technologies (e.g., Istio) and mTLS is a plus.
  • Familiarity with GitOps workflows (e.g., ArgoCD, Flux) is a plus.
  • Experience working in compliance-driven environments (e.g., HIPAA, SOC 2, FedRAMP) is preferred.
  • Exposure to chaos engineering practices and cost-aware infrastructure design (FinOps) is a plus.

 

Do Well. Live Well. At Amwell. 

Driven by our mission and values, we foster a workplace where Delivering Awesome, being Customer First and operating as One Team aren’t just aspirations – they are how we work, every day.  

Our people are our greatest asset. We strive to empower their growth and development, not only as Amwellians but as individuals, through generous total rewards packages, a virtual-first work environment, work-life flexibility (including Summer Fridays and designated Mental Health Days), and opportunities to stretch and learn, to name a few. It’s our people who truly differentiate us. Ask anyone and they’ll tell you: you’ll never work with more passionate, driven, and caring team members.

We champion a culture of respect and inclusion, accountability and integrity, innovation and collaboration. At Amwell, you’ll do the most meaningful work of your career—improving healthcare for millions, growing alongside incredible teammates, and being valued for who you are.  

 

Working at Amwell:

Amwell is changing how care is delivered through online and mobile technology. We strive to make the hard work of healthcare look easy. To make this a reality, we look for people with a fast-paced, mission-driven mentality. Our culture prides itself on quality, efficiency, smarts, initiative, creative thinking, and a strong work ethic.

Our Core Values are One Team, Customer First, and Deliver Awesome. Customer First and Deliver Awesome are all about our products and services and how we strive to serve. As part of One Team, we operate the Amwell Cares program, which brings needed assistance to our communities: free healthcare for the underserved and for people affected by natural disasters, support for equality, recognition for doctors and nurses, and annual Amwell-matched donations to food banks. Amwell aims to be a force for good for our employees, our clients, and our communities.

Amwell cares deeply about and supports Diversity, Equity and Inclusion initiatives. These are reflected in our three DE&I Pillars: our Workplace, our Workforce and our Community.


Benefits

Additional Benefits

  • Medical Plan Coverage provided by Colmédica 
  • Plan Coverage provided by Pan American 
  • Hybrid Allowance 
  • Additional Paid Time Off 
  • Maternity Leave: 18 weeks
  • Parental/Paternity Leave: 2 mandatory weeks + 4 weeks
  • Mental Health and Resiliency 
  • Virtual Second Opinion coverage with the Cleveland Clinic
  • LinkedIn Learning 
  • Rewards and Recognition 
  • Service Anniversaries 
  • Annual Bonus 
  • Referral Program
  • Amwell tuition reimbursement benefit

 

https://business.amwell.com/privacy-policy/

 

 

