Senior Site Reliability Engineer

Description:

Tenable's cloud-based exposure management platform helps organizations see, understand, and reduce cyber risk across their entire attack surface. Our SRE teams keep that platform reliable, scalable, and secure — and we're building the next generation of tooling to do it smarter.

This role sits within our SRE Infrastructure Management organization on a focused team dedicated to reducing operational toil through AI-powered automation. You'll build intelligent systems that replace manual workflows — from incident diagnostics to infrastructure provisioning to upgrade automation — using LLMs, agentic architectures, and deep SRE domain knowledge.

This isn't an operations role with some AI on the side. You'll spend most of your time writing production code: designing and building agentic workflows, integrating across observability and infrastructure platforms, and measuring the impact of what you ship against real toil data.

Your Opportunity

Design and build AI-powered agentic workflows that automate complex SRE operations — incident investigation, infrastructure provisioning, deployment reliability, and more.
Improve the accuracy, reliability, and observability of agent pipelines through evaluation frameworks, prompt engineering, retrieval strategies, and structured output validation.
Build developer tools and internal platforms — CLI tools, IDE plugins, and workflow automation — that engineers across the organization use daily.
Build tooling that connects across the SRE tech stack — Kubernetes, Terraform, Helm, CI/CD pipelines, observability platforms, and cloud infrastructure APIs.
Work on a focused team where everyone writes code, owns what they ship, and drives prioritization from measured toil data.
Participate in the SRE on-call rotation; we use on-call as a direct input into what we build, not just a firefighting duty.
Collaborate with SRE teams across the organization to identify automation opportunities and deliver tooling that gives engineers hours back.

What Success Looks Like

Within your first few months, you've shipped an agentic workflow that automates a real SRE toil category — and engineers are using it.
Within 6 months, you're independently designing and building AI-powered pipelines, contributing to the team's evaluation and accuracy practices, and your work is driving measurable toil reduction.
Within a year, you've become a go-to contributor on the team — shaping the roadmap, mentoring others on AI + SRE patterns, and building systems that scale the team's impact across engineering.

What You'll Need

5+ years of SRE, platform engineering, or infrastructure engineering experience.
Strong software engineering skills — you write production-quality code, not just scripts. Python is the primary language for our tooling stack.
Experience building with LLMs and AI in production or infrastructure contexts — integrating models into real systems, not just experimentation.
Experience building developer tools or internal platforms — CLI tools, IDE plugins, or workflow automation that other engineers use daily.
Deep experience with Kubernetes (EKS preferred): deployment, troubleshooting, Helm chart management, and cluster operations.
Experience with Infrastructure as Code (Terraform preferred) and CI/CD pipeline development.
Strong experience with AWS services and APIs.
Experience with observability platforms (Datadog, Coralogix, or similar) — both as a user during incidents and as an integration target for tooling.
Solid background in Bash scripting and Linux systems.
Comfortable working on a distributed team with emphasis on asynchronous collaboration and documented decision-making.
Bachelor's or Master's degree in Computer Science or Engineering, or equivalent experience.

Organization	Tenable
Industry	Engineering
Occupational Category	Senior Site Reliability Engineer
Job Location	Dublin, Ireland
Shift Type	Morning
Job Type	Full Time
Gender	No Preference
Career Level	Experienced Professional
Experience	5 Years
Posted at	2026-04-20 9:42 pm
Expires on	2026-09-02