Description:
OpenShift SRE is a sophisticated, global, fast-paced team inside the world's open source leader with constant opportunities to learn new skills and innovate new solutions to meet our customers' demands. As a Senior Software Engineer on this team, you will directly contribute to Red Hat's success in the rapidly growing Kubernetes as a Service (KaaS) market.
What You Will Do
- Design and write automation software to provision, upgrade, monitor, and heal a large global fleet of Red Hat OpenShift clusters deployed across multiple public clouds
- Identify single points of failure and other high-risk architecture issues; propose and implement more resilient solutions
- Participate in the release cycles of our offerings, deploying code to integration, staging, and production environments, integrating with continuous integration (CI) and continuous delivery (CD) tooling, monitoring, and change management
- Perform software updates, peer code reviews, testing, and Common Vulnerabilities and Exposures (CVE) analysis; respond to security threats
- Interact with automated monitoring and healing infrastructure to ensure healthy environments
- Provide engineering support to Red Hat's global technical support team to resolve customer issues
- Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in our environment
- Participate in a global on-call rotation, including periodic weekend and holiday on-call duties
What You Will Bring
- 3+ years of software engineering experience using object-oriented languages; Golang and Python are preferred
- Experience managing Linux-based systems in a public cloud like Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure
- Commercial experience with enterprise system monitoring; knowledge of Prometheus is a plus
- Experience with container technology, Kubernetes, OpenShift, and configuration management tools (Red Hat Ansible Automation, Puppet, or Chef) is a big plus
- Demonstrated ability to quickly and accurately troubleshoot systems issues
- Solid written and verbal communication skills in English