CloudDevs: Senior Site Reliability Engineer (SRE)

November 21, 2025

Headquarters: San Francisco

URL: https://clouddevs.com/

LOCATION: LATAM, EUROPE

CloudDevs works with fast-moving, venture-backed startups across the US. We’re building a pool of world-class Site Reliability Engineers for current roles and upcoming opportunities. You’ll either be placed directly into one of our partner startups or added to our vetted SRE network for future projects.

This role is ideal for engineers who care about reliability, metrics, performance, and building simple, scalable systems. If you enjoy designing for scale and improving how teams ship software, you’ll fit right in.

Key Responsibilities
Work as a hands-on engineer focused on system reliability, performance, and observability.
Define and monitor SLIs, SLOs, and error budgets.
Optimize monitoring cost and signal quality across metrics, logs, and traces.
Improve deployment safety, canary rollouts, and UAT pipelines.
Build tools for automated and local performance testing, and track benchmarks.
Lead resilience work like failover drills, chaos tests, and redundancy checks.
Partner with engineering teams to improve scaling patterns and architecture as the product grows.
Support incident response processes and help reduce operational noise.
Write clear, maintainable code in Go, Python, or Node.js.
Contribute to CI/CD improvements and automation efforts.
Collaborate with engineers across teams to raise reliability standards.

Requirements
5+ years in SRE, DevOps, or Platform Engineering roles.
Strong experience with cloud infrastructure (AWS preferred), Terraform, and Kubernetes.
Deep knowledge of observability tools like Datadog, Prometheus, or OpenTelemetry.
Strong debugging skills across services, networking, and data layers.
Hands-on experience designing and monitoring SLIs/SLOs.
Experience with CI/CD tools such as GitHub Actions, Jenkins, or ArgoCD.
Ability to write production-grade code in Go, Python, or Node.js.
Comfort working independently in fast-paced environments.

Nice to Have
Experience tuning observability costs and optimizing data ingestion.
Exposure to chaos engineering and progressive deployments.
Background with high-throughput or latency-sensitive systems.
AWS at scale (EKS, Lambda, DynamoDB, S3).
Experience in regulated industries like fintech, payments, or SOC 2 environments.
Performance testing pipelines or load-testing automation.
Experience handling systems processing tens of millions of API calls.

Open Pool for SREs
Even if you don’t meet every requirement or aren’t a fit for the current role, strong SREs with real production experience are welcome to join our talent pool. We regularly place engineers with different strengths across reliability, DevOps, platform, observability, backend, and infrastructure engineering.

To apply: https://weworkremotely.com/remote-jobs/clouddevs-senior-site-reliability-engineer-sre