Description:
- Manage and operate Apache Kafka (must-have skill): configure topics, manage partitions, ensure high availability, monitor metrics (e.g., consumer lag, throughput), and troubleshoot issues like message loss or latency.
- Work with Axon Framework (must-have skill): design and maintain event-driven systems using CQRS/ES (Command Query Responsibility Segregation / Event Sourcing) patterns, integrate with Kafka for event streaming, and ensure scalability and resilience of distributed applications.
- Manage and operate other messaging/streaming platforms such as NATS or MQ as needed.
Requirements
Responsibilities:
- Design, develop, and maintain automation scripts, tools, and integrations using languages such as Python, Go, Java, or Bash.
- Write clean, maintainable code with unit tests, debug issues, integrate with APIs, and use version control (Git).
- Administer Linux/Unix systems, including process management, file systems, permissions, kernel tuning, shell scripting, server configuration, updates, and security hardening.
- Build and manage cloud infrastructure on AWS, GCP, or Azure, leveraging IaC tools like Terraform or CloudFormation.
- Architect and operate scalable and highly available systems using Kubernetes, ECS, or other container orchestration tools.
- Configure and troubleshoot network protocols and services (TCP/IP, HTTP, DNS, VPNs, firewalls, load balancing) with diagnostic tools (e.g., Wireshark, traceroute).
- Implement observability practices using tools such as Splunk, Dynatrace, Prometheus, Grafana, Datadog, and Jaeger/Zipkin. Define SLIs/SLOs and build dashboards that give actionable insight into system health.
- Develop and maintain CI/CD pipelines with Jenkins, GitLab CI, or GitHub Actions to automate build, test, and deployment processes, including rollback strategies.
- Diagnose and resolve production issues through logs, metrics, and debugging tools. Participate in incident management, perform root cause analysis (RCA), and contribute to blameless postmortems.
- Implement security best practices: secrets management (e.g., Vault), zero-trust architectures, vulnerability management, and compliance with standards and regulations (SOC 2, GDPR).
Qualifications:
- BS in Computer Science or a related technical field (e.g., Physics, Mathematics) OR equivalent practical experience.
- 4–5 years of hands-on experience in software development, systems administration, and cloud infrastructure management.
- Proven expertise in Apache Kafka and Axon Framework (must-have).