The Opportunity
Are you interested in making a difference? Would you like to work for a tech-for-good company whose reason for being is to help all boards and leadership teams become a powerful driver of performance and a force for good? Board Intelligence is on a mission to bring kindness and success together and to drive companies to think about what matters. We work with over 30,000 Chairs, CEOs and board members to embed the discipline of focus into their organisations, and we're helping a new board every day to focus on what matters. We are in it for the long term, so come and join us on this journey.
As a Senior Site Reliability Engineer (SRE), you'll join a team whose mission is to ensure the availability, performance, security and reliability of our platform and core services, so that they meet the needs of our internal and external users. You will lead projects across the entire breadth of our tech stack, from planning through to delivery and maintenance, and you will bring others on the team along with you rather than going it alone. You will be responsible for the visibility and monitoring of those systems, for building tooling and automation to reduce toil, and for responding to incidents as part of our 24/7 SRE on-call team.
Key responsibilities of the role
We're looking for a great Senior SRE to be a hands-on individual contributor to key technical projects and to help us build a first-class SRE function. This role will involve:
Hands-on work on technical projects, taking direction from the team Principals
Implementing and maintaining monitoring solutions / metric-driven alerting, logging and tracing
Troubleshooting in complex environments
Establishing and measuring SLIs and SLOs with engineering teams, and continuously improving relationships and ways of working with other engineering teams
Participating in periodic 24/7 paid on-call duties
Building and managing systems, infrastructure and applications using infrastructure as code and automation (Terraform, Ansible, K8s, Helm, Go)
Pair programming, knowledge sharing and running appropriate training sessions for the team
Writing well-defined tickets (and supporting documentation when required) and keeping them up to date

Traits
Strong communication skills, with the ability and openness to work across a varied range of stakeholders and the confidence to check and challenge when required
Cares about evolving SRE best practices (through a security lens) and is driven to find the right ways of working with the team
Is self-driven and constantly strives to improve everything with automation and monitoring
Is able and willing to travel to our physical datacentres in the UK should the need arise
Demonstrates and promotes positive attitudes and behaviours: collaboration, learning, sharing, respect and kindness

What experience and skills might you have?
We prefer to work with the best talent regardless of whether you are familiar with all of the tools that we use. We don't need you to be familiar with everything on this list, but experience in some or all of these areas will be useful, and a willingness to dive in and learn the others is essential.
A strong background in SRE/DevOps or Linux system administration
A strong background in system automation using configuration management systems such as Ansible, Chef or Puppet
A solid understanding of containerisation and container orchestration using tools such as Kubernetes
Experience creating automation using APIs
Experience of automation testing in an Agile software environment
Close familiarity with some or all of:
  Network management and optimisation
  PostgreSQL database management and optimisation
  Common security frameworks (CIS, NIST, OWASP)
Familiarity with public cloud services such as AWS, GCP or Azure
Familiarity with co-located physical infrastructure (we're currently hybrid)
A solid understanding of Continuous Integration (CI) and Continuous Deployment (CD)
Close familiarity with, or direct experience of, the trade-offs and design decisions software engineers need to make when developing applications that must perform and scale well in the real world
Experience with technical writing and/or reviewing technical designs
Strong experience and understanding of Agile practices, including Scrum and Kanban
An understanding of one or more of the following languages: Ruby, Java, Go, Bash/Shell
Strong experience with issue-tracking software such as Jira, and with the story management lifecycle in general

Engineering at Board Intelligence
Everyone says it, but in our case it's true: each member of our engineering team is amazing in their own right, but together they are what brings our product to life.
We're very proud of the team we've built: there are around 50 of us in Product and Tech now, after growing quickly in 2023/24. We have ambitious plans to further improve our ways of engineering and to continue to enable boards to 'see what matters'. You'll play a big role in helping us achieve this in 2025/26 and beyond.
Tech Stack
Our applications are written in Ruby (with Rails) or Java. Client-side web apps are written in React, and some services in Clojure, Java and Go.
Our platform consists of:
Multiple Kubernetes clusters for container orchestration
Apache Kafka and Redis (shortly Postgres) for event messaging
Postgres for data storage
OpenStack Swift for object storage
Juniper & Cisco networking devices
A number of internally written tools for managing the platform, written in Go

We run our own physical infrastructure, co-located in three datacentres across the UK. We also run a public cloud production environment on GCP for one of our products, and we're moving towards more public cloud for production and pre-production environments and pipelines.
Benefits
Competitive salary & pension scheme
Personal performance bonus
26 days holiday each calendar year
Bupa health & dental cover
Group life insurance
EAP; AIG Smart Health and Bereavement Counselling & Probate Helpline
Regular training & development, mini MBA series, lunch & learns
Cycle to work scheme
Competitive parental policies
Gym membership discounts
Monthly company socials