Staff Site Reliability Engineer

Description:

This is an exciting opportunity for someone who is passionate about driving innovation, enhancing service reliability, and making a tangible impact on the organization's success.

What You Get To Do In This Role

Provide relief and sustainable resolution to issues within our infrastructure.
Use your knowledge and experience in software development, systems engineering, and networking to proactively prevent recurring issues.
Lead internal stakeholders and partner teams to improve the reliability, scalability and performance of the infrastructure through improved system design.
Champion and contribute to a culture that does not tolerate manual activity, resulting in automation that delivers repeatable and scalable responses to system issues.

Qualifications

To be successful in this role you have:

Experience in leveraging or critically thinking about how to integrate AI into work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
Excellent knowledge of Linux systems.
Comfort designing, authoring, testing, and debugging code in a team setting in a language such as Python, Go, Java, or Ruby.
Experience working with systems at scale, supporting critical services with a focus on automation, observability, availability, and performance.
Experience with MySQL and PostgreSQL database administration, troubleshooting, and performance tuning.
Experience developing and maintaining telemetry and monitoring solutions using OpenTelemetry standards to gain deep insight into system behaviour, proactively address issues, optimise performance, and improve efficiency through comprehensive data collection, analysis, and visualisation.
Proven experience in defining and managing SLAs.
Experience collaborating with development teams to ensure new services align with architectural standards and best practices.

Good To Have

Expertise in Observability and Monitoring of applications, services, and networks at scale.
Experience with DevOps automation, agile methodologies, and CI/CD pipelines such as GitLab CI/CD.
Experience writing test specifications and an understanding of the fundamentals of test automation.
Experience working with cloud technologies such as Azure and AWS.
Experience in configuration management of infrastructure using Ansible.
Experience with Kubernetes to orchestrate the deployment, scaling, and management of containers.
Hands-on experience with Microsoft Azure, Google Cloud (GCP) and Amazon Web Services (AWS), including designing, implementing, and maintaining reliable and scalable systems.

Organization	ServiceNow
Industry	Engineering
Occupational Category	Staff Site Reliability Engineer
Job Location	Dublin, Ireland
Shift Type	Morning
Job Type	Full Time
Gender	No Preference
Career Level	Intermediate
Experience	2 Years
Posted at	2025-09-16 10:24 am
Expires on	2026-07-22