Staff Systems Engineer (f/m/d)
Aleph Alpha
Overview:
We are seeking an experienced Staff Systems Engineer to join our growing infrastructure team. As we advance our AI stack and scale our infrastructure, you will play a pivotal role in designing, maintaining, and optimizing our systems. Your expertise will help ensure high availability, security, and performance while enabling seamless deployment for our customers and internal teams.
As a key technical leader, you will guide architectural decisions, mentor engineers, and drive improvements across our infrastructure. Your contributions will be instrumental in shaping the future of our AI-powered solutions.
Your Responsibilities:
Lead the design, development, and optimization of the Pharia AI stack and the supporting infrastructure.
Define best practices and guide teams in writing Helm charts and deploying their artifacts efficiently.
Architect, set up, and maintain highly available Kubernetes (K8s) clusters on STACKIT or similar cloud platforms.
Design, build, and maintain Kubernetes Operators.
Provide strategic guidance and hands-on assistance to customers for deploying and maintaining our products on their infrastructure.
Ensure compliance with security and reliability best practices; represent the team in audits and respond to security questionnaires.
Act as a technical leader, mentoring engineers through pair programming, code reviews, and technical discussions.
Drive automation efforts and improve CI/CD pipelines to enhance deployment efficiency and system resilience.
Collaborate with cross-functional teams to align infrastructure with business and product goals.
Your Profile:
Extensive experience in designing, deploying, and maintaining Kubernetes clusters in production environments.
Automation & CI/CD Expertise: Proficiency in tools such as Helm, Ansible, Terraform, ArgoCD, GitLab CI, and JFrog.
Experience designing and implementing Kubernetes Operators.
Strong programming skills in at least one language from our stack: Rust or Go.
Deep understanding of security, reliability, and scalability best practices for infrastructure.
Proven experience mentoring engineers, leading technical projects, and setting best practices.
Excellent communication and collaboration skills, with a track record of contributing to a culture of learning and innovation.
Experience working in fast-paced startup environments is a plus.
What You Can Expect From Us:
Become part of an AI revolution!
30 days of paid vacation
Access to a variety of fitness & wellness offerings via Wellhub
Mental health support through nilo.health
Substantially subsidized company pension plan for your future security
Subsidized Germany-wide transportation ticket
Budget for additional technical equipment
Flexible working hours and a hybrid working model for better work-life balance
Virtual Stock Option Plan