A World-Changing Company
Palantir builds the world's leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.
The Role
Product Reliability Engineers (PREs) are the driving forces of stability across Palantir's products and help to ensure our products are available 24/7. When something goes wrong, we are the first to respond and are responsible for triaging, troubleshooting, and coordinating the resolution.
Every day at Palantir is different: we're constantly evolving to better respond to customer needs, and as a PRE you will work closely with our engineering and business teams to minimize risks. You are a resourceful, creative, and agile problem solver who is able to work collaboratively and independently to resolve the most difficult and nebulous technical issues. This includes everything from creating product health metrics and automated alerts, to fixing product bugs, streamlining operational tasks, and developing and documenting strategies for responding to incidents.
Whatever the technical root cause of an issue, you'll play a central and critical role in resolving it, seeking not just a one-time fix but a permanent solution.
Core Responsibilities
Develop a deep understanding of Palantir's products and processes.
Collaborate with customer-facing, product, and infrastructure teams on the development and deployment of scalable, reliable software for our customers.
Deliver end-to-end improvements to stability by proactively preventing issues via telemetry and automation and directly reducing the need for reactive support.
Maintain and improve the operational capacity of our production databases, including resolving incidents and streamlining operational workflows.
Reduce operational overhead and make data-driven decisions about investments in stability and reliability.
Take part in an on-call rotation responsible for coordinating Palantir's response to critical incidents, ensuring efficient resolution with minimal customer impact.

What We Value
Excellent problem-solving skills, ability to break down and explain complex concepts, and strong attention to detail.
Comfort working in a fast-moving environment with dynamic objectives that require creative thinking to address product and customer needs.
Ability to work independently and make decisions autonomously, as well as to collaborate as part of a distributed team with members from our offices across America and Europe.
Experience coding with Java, Go, and/or web technologies (e.g. HTML, CSS, JavaScript, Python/Ruby, Django/Flask/Ruby on Rails) is a plus.
Experience with distributed computing systems and/or cloud infrastructures (e.g. Spark, Hadoop, YARN, Kubernetes, AWS) is a plus.

What We Require
Background in Computer Science, Engineering, Information Systems, or other technical field.