Staff Site Reliability Engineer

 

Description:

This is an exciting opportunity for someone who is passionate about driving innovation, enhancing service reliability, and making a tangible impact on the organization's success.

What You Get To Do In This Role
 

  • Provide relief and sustainable resolution to issues within our infrastructure.
  • Use your knowledge and experience in software development, systems engineering, and networking to proactively prevent recurring issues.
  • Lead internal stakeholders and partner teams to improve the reliability, scalability and performance of the infrastructure through improved system design.
  • Champion and contribute to a culture that does not tolerate manual work, resulting in an automated environment that delivers repeatable and scalable responses to system issues.
     

Qualifications

To be successful in this role you have:
 

  • Experience leveraging AI, or thinking critically about how to integrate it, in work processes, decision-making, or problem-solving. This may include using AI-powered tools, automating workflows, analyzing AI-driven insights, or exploring AI's potential impact on the function or industry.
  • Excellent knowledge of Linux systems.
  • Comfortable designing, authoring, testing, and debugging code in a team setting in a language such as Python, Go, Java, or Ruby.
  • Experience working with systems at scale, supporting critical services with a focus on automation, observability, availability, and performance.
  • Experience with MySQL and PostgreSQL database administration, troubleshooting, and performance tuning.
  • Experience developing and maintaining telemetry and monitoring solutions using OpenTelemetry standards to gain deep insight into system behaviour, address issues proactively, optimise performance, and improve efficiency through comprehensive data collection, analysis, and visualisation.
  • Proven experience in defining and managing SLAs.
  • Experience collaborating with development teams to ensure new services align with architectural standards and best practices.
     

Good To Have
 

  • Expertise in observability and monitoring of applications, services, and networks at scale.
  • Experience with DevOps automation, agile methodologies, and CI/CD pipelines such as GitLab CI/CD.
  • Experience writing test specifications and an understanding of the fundamentals of test automation.
  • Experience working with cloud technologies such as Azure and AWS.
  • Experience in configuration management of infrastructure using Ansible.
  • Experience with Kubernetes to orchestrate the deployment, scaling, and management of containers.
  • Hands-on experience with Microsoft Azure, Google Cloud Platform (GCP), and Amazon Web Services (AWS), including designing, implementing, and maintaining reliable and scalable systems.

Organization: ServiceNow
Industry: Engineering
Occupational Category: Staff Site Reliability Engineer
Job Location: Dublin, Ireland
Shift Type: Morning
Job Type: Full Time
Gender: No Preference
Career Level: Intermediate
Experience: 2 Years
Posted at: 2025-09-16 10:24 am
Expires on: 2025-10-31