Principal Site Reliability Engineer

 

Description:

As the Information Company, our mission at OpenText is to create software solutions and deliver services that redefine the future of digital. Be part of a winning team that leads the way in Enterprise Information Management.
You will join a team of globally located Site Reliability Engineers to design, automate, build, operate, and continuously improve some of the services that back our customer-facing SaaS products. Your focus will be on NoSQL and Big Data technologies such as Elasticsearch, Cassandra, Kafka, Redis, and RabbitMQ, integrated with the underlying IaaS (VMware, AWS, GCP, Azure) and PaaS (Cloud Foundry, K8s, Anthos). You will be responsible for delivering and operating a highly available design that includes security, scalability, monitoring, upgradeability, and data backup and recovery, across non-production and production environments. You’ll work in a fast-paced organization while quickly learning new skills and creating ways to consistently meet service-level agreements for our global cloud services.

 

The best person for this role is someone who has a collaborative spirit. In our world, it’s not about being a hero and having all the answers; it’s about sometimes saying "I don't know" and working on finding solutions rather than starting with an assumption. The team needs someone who can ask questions, learn from others, and turn chaos into order. This role would be a great fit for someone with creative and innovative problem-solving skills. You will develop and implement solutions that operate at scale. Our teams are empowered and expected to improve our products to truly deliver a reliable experience to customers.

  • Designing, automating, building, operating, and continuously improving multiple backing services including Cassandra, Elasticsearch, Kafka, RabbitMQ, Redis, and Solr
  • Building software and systems to manage infrastructure and backing services for customer-facing OpenText applications through automation tools such as Terraform and Ansible
  • Working closely with development and application support teams to design, build, deploy, support, and monitor new and existing deployments, including gathering requirements and documenting the solution
  • Identifying tactical and strategic opportunities to improve service health, performance, reliability, and telemetry
  • Contributing to capacity planning and management processes
  • Supporting the migration of legacy deployments to modernized design patterns
  • Supporting and responding to service requests that satisfy our OLAs
  • Supporting the incident resolution process for the backing services we are responsible for
  • Participating in training and information sharing activities
  • Interacting with third-party providers who offer additional expertise and a layer of escalation support for our services
  • Implementing best practices and operating environments for Kafka, helping with topic creation and management, owning and managing the Kafka Schema Registry, helping new teams with Kafka usage, educating teams on Kafka capabilities, and helping teams to adopt new features
  • Implementing best practices and operating environments for Elasticsearch, helping with index creation and lifecycle management, shard scaling, index rollover and rollup strategies, and performance tuning
  • Implementing best practices and operating environments for Cassandra, helping with node sizing, datacenter and rack topology, replication factor, quorum configuration, and driver settings
  • Implementing best practices and operating environments for Redis, RabbitMQ, and Solr
  • Acting as backup for other team members when necessary
  • Troubleshooting problems and finding solutions to resolve issues
  • Building repeatable application technology design patterns
  • Learning new technology on your own or in conjunction with an online learning platform
  • Creating and updating documentation such as operational procedures, change execution plans, and incident write-ups
  • Working shifts when required
  • Participating in the required on-call rotation to provide 24x7x365 support

 

Organization OpenText
Industry IT / Telecom / Software
Occupational Category Principal Site Reliability Engineer
Job Location Cork, Ireland
Shift Type Morning
Job Type Full Time
Gender No Preference
Career Level Experienced Professional
Experience 10 Years
Posted at 2023-01-17 4:01 pm
Expires on 2024-06-04