Location
This is a hybrid role that requires on-site work at our London office three (3) days a week. Our office is conveniently located in WeWork at 1 Mark Square, London, EC2A 4EG.
Elevator Pitch
Stacklok Cloud is a comprehensive security platform that combines open source package intelligence with a policy platform built on the open source project Minder, allowing developers to securely consume open source software while enabling security teams to effectively manage and maintain a robust security posture across the entire software supply chain.
As Stacklok Cloud is delivered to major companies across the world, ensuring its scalability, security, performance, and reliability is essential. We're seeking a Site Reliability Engineer II to contribute to initiatives within a product team, focusing on automation, monitoring, configuration management, continuous delivery, and incident response. This role involves applying both SRE and software engineering expertise to ship new features and serve as a resource for best practices in reliability, performance metrics, and system resilience. Additionally, participation in Stacklok's SRE guild will be integral, collaborating with peers to drive consistent practices in automation, observability, and reliability across all products, fostering a seamless and high-performing SaaS platform.
Join our team of exceptionally talented engineers and become part of a groundbreaking field that tackles critical challenges for developers and the OSS community. Contribute to an open source strategy that focuses on building and expanding an ecosystem for diverse OSS tools, and help shape the future of open source development with innovative and impactful work.
Success In The Role: 6-12 Months Expectations
Acclimatize to the Team: Familiarize yourself with our engineering processes. Build connections with team members, immerse yourself in our company culture, understand our virtues, and learn the way we work and collaborate.
Solid Understanding of Our Products and Services: Gain a solid understanding of Stacklok Cloud products and services, our vision of the platform as well as short and long-term goals to align your contributions to our objectives.
Deep Dive Into Stacklok Cloud Architecture: Become comfortable with the current infrastructure-as-code environment using Terraform to deploy SaaS software to Kubernetes on AWS.
Proficiency in Go and Python: Develop strong proficiency in Go, our primary programming language, focusing on best practices, idiomatic design patterns, effective error handling, and unit testing. Demonstrate strong foundational knowledge of Python, specifically in leveraging its capabilities for automation, scripting, and building internal tools.
Hybrid Contribution: Be an integral part of our product engineering team, advancing the reliability and operational efficiency of our systems. With a primary focus on platform functionalities, you'll also have opportunities to make direct contributions to feature development, enhancing the capabilities of our products.
Technical Guidance and Documentation: Support production infrastructure by contributing to and maintaining comprehensive documentation, including playbooks and architectural diagrams, to ensure team alignment.
In This Role You Will Have The Opportunity To
Shape The Future of Stacklok Cloud: As a site reliability engineer, you'll play a key role in supporting and enhancing our platform's reliability and performance. Your focus will include regular platform upgrades and the instrumentation and monitoring of production systems. You'll help advance our platform and shape strategies for the future of software supply chain security.
Embrace an Automate Everything Mindset: Contribute to a culture of automation by streamlining operational tasks and enhancing efficiency across the environment. You'll support automation initiatives for incident management tooling, application autoscaling, and recovery processes to ensure resilient systems that adapt to changing demands. Collaborating with a skilled team, you'll help automate playbooks, continuous delivery pipelines, and GitHub Terraform processes, driving improvements in service delivery and incident response.
Monitor and Improve Service Performance: Support end-to-end monitoring of service KPIs to drive improvements and maintain optimal performance. You'll regularly review logs and performance metrics, using shared tools and incident response automations to enhance system reliability. With an analytical mindset, you'll contribute to identifying areas for KPI improvement, helping us consistently meet and exceed our performance goals.
Learn and Grow with Mentorship Opportunities: Work alongside experienced engineers who will support your professional growth and skill development. By collaborating in a culture of empathy, curiosity, and psychological safety, you'll deepen your understanding of infrastructure and site reliability best practices. Engaging in code reviews and team discussions will allow you to refine your skills, share insights, and contribute to a strong, capable team. This role offers a clear path for growth, helping you build toward new responsibilities and technical expertise.
We understand that not everyone will meet every requirement listed, and that's perfectly okay! We encourage you to apply regardless of your self-assessment. We value a diverse range of skills and experiences and believe that your unique attributes can make a significant impact. We want to hear from you!
Desired Skills & Experience
Experience in Site Reliability Engineering supporting an enterprise SaaS service with evidence of maintaining high availability and performance in production environments.
Proficient in programming languages, particularly Go and Python, demonstrating the ability to write clean, efficient, and maintainable code.
Familiarity with Infrastructure as Code (IaC) principles, with proficiency in automation tools like Terraform for environment provisioning and configuration management.
Experience with a major cloud provider (AWS, Azure, Google), preferably AWS.
Understanding of cloud-native application deployment and management using technologies like Docker and Kubernetes, with exposure to scaling and recovery strategies.
Experience in automating incident response processes using platforms such as PagerDuty to improve response times and incident management efficiency.
Proficient in log aggregation and analysis tools such as AWS Athena and CloudWatch, enabling thorough performance reviews and proactive issue identification.
Exposure to defining and implementing Service Level Objectives (SLOs) and key performance indicators (KPIs) to drive service quality and operational excellence.
Knowledge of security best practices in site reliability, with an emphasis on operational security measures and maintaining a secure software supply chain.
Impact-Driven and Collaborative: Track record of delivering solutions that drive business outcomes; excellent written and verbal communication skills for engaging diverse stakeholders. Committed to fostering growth and continuous improvement within teams.
Versatile and Self-Starting: Adaptable in dynamic, startup environments, comfortable in varied roles, from individual contributor to conference presenter, and skilled at making technical topics accessible to broad audiences.