Description:
When you join iCIMS, you join the team helping global companies transform business and the world through the power of talent. Our customers do amazing things: design rocket ships, create vaccines, deliver consumer goods globally, overnight, with a smile. As the Talent Cloud company, we empower these organizations to attract, engage, hire, and advance the right talent. We’re passionate about helping companies build a diverse, winning workforce and about building our home team. We're dedicated to fostering an inclusive, purpose-driven, and innovative work environment where everyone belongs.
Responsibilities
- Ensure Production Stability: Monitor availability and performance across the entire production environment to maintain optimal operations.
- Leverage Monitoring Tools: Track cloud resource utilization and performance metrics to identify trends and potential issues proactively.
- Data-Driven Insights: Generate regular performance reports and recommend enhancements based on detailed analysis.
- Incident Management Excellence: Lead the restoration of normal service operations swiftly, including assessment, research, escalation, communication, and resolution management.
- Execute Production Changes: Implement necessary changes to support both internal and external customer needs.
- Operational Support: Provide effective triage and resolution for operational support requests.
- Documentation & Standards: Review and refine SOPs, policies, procedures, and system requirements to ensure accuracy and relevance.
- Automation Development: Create and maintain automation scripts using Python and Java to streamline processes and reduce manual effort.
- Infrastructure as Code (IaC): Apply IaC practices to improve deployment efficiency, consistency, and scalability.
- Comprehensive Documentation: Prepare detailed electronic documentation, including SLAs, performance metrics, installation guides, and implementation guides.
- Reduce Manual Work: Identify repetitive tasks and implement automation solutions to eliminate inefficiencies.
- Performance Reviews: Participate in monthly metric reviews to support uptime goals of 99.9% to 99.99%.
- Drive Innovation: Demonstrate passion, initiative, and urgency in seeking innovative solutions and resolving issues effectively.
Qualifications
Technical Expertise
- 2 to 4 years of experience in administration and production support, including on-call responsibilities
- 1+ years of hands-on cloud provider experience with demonstrated knowledge
- Experience with observability tooling
Preferred Qualifications
- Experience with AWS and/or AWS certifications
- Exposure to other cloud technologies like Azure and GCP