Senior Site Reliability Engineer

 

Description:

OpenShift SRE is a sophisticated, global, fast-paced team inside the world's open source leader, with constant opportunities to learn new skills and build new solutions to meet our customers' demands. As a Senior Site Reliability Engineer on this team, you will directly contribute to Red Hat's success in the rapidly growing Kubernetes as a Service (KaaS) market.

What You Will Do
 

  • Design and write automation software to provision, upgrade, monitor, and heal a large global fleet of Red Hat OpenShift clusters deployed across multiple public clouds
  • Identify single points of failure and other high-risk architecture issues; propose and implement more resilient solutions
  • Participate in the release cycles of our offerings, deploying code to integration, staging, and production environments, integrating with continuous integration (CI) and continuous delivery (CD) tooling, monitoring, and change management
  • Perform software updates, peer code reviews, testing, and Common Vulnerabilities and Exposures (CVE) analysis; respond to security threats
  • Interact with automated monitoring and healing infrastructure to ensure healthy environments
  • Provide engineering support to Red Hat's global technical support team to resolve customer issues
  • Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in our environment
  • Participate in a global on-call rotation, including periodic weekend and holiday on-call duties
     

What You Will Bring
 

  • 3+ years of software engineering experience using object-oriented languages; Golang and Python are preferred
  • Experience managing Linux-based systems in a public cloud like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure
  • Commercial experience with enterprise system monitoring; knowledge of Prometheus is a plus
  • Experience with container technology, Kubernetes, OpenShift, and configuration management tools (Red Hat Ansible Automation, Puppet, or Chef) is a big plus
  • Demonstrated ability to quickly and accurately troubleshoot systems issues
  • Solid written and verbal communication skills in English

Organization Red Hat
Industry Engineering
Occupational Category Senior Site Reliability Engineer
Job Location Dublin, Ireland
Shift Type Morning
Job Type Full Time
Gender No Preference
Career Level Experienced Professional
Experience 3 Years
Posted at 2025-09-25 9:18 am
Expires on 2025-11-09