Do you care about creating scalable and highly reliable software systems? Do you know how to apply the finer aspects of software engineering to infrastructure and operations? Then you might be just the person we are looking for!
WPP Open Dynamic Content (ODC for short) is a young, ambitious and fast-growing organization where flexibility, commitment and quality are highly valued. We are here to change digital advertising globally, and quality is an essential precondition for that. We are not looking for quick fixes or workarounds; we want to develop our product with the utmost concern for stability, scalability and flexibility, guided by a long-term vision. This is where your expertise comes in!
We are currently conquering the globe with our solutions, and this brings new and interesting challenges. We are therefore looking for a Site Reliability Engineer who can help design a platform that works reliably and with low latency across multiple data centers. Responsibilities include, but are not limited to:
• Designing and implementing software that improves stability, scalability, availability and latency, and designing, building and running tests to verify these improvements
• Setting up system health monitoring and automated processes to prevent outages
• Defining corrective actions and supporting quick recovery from actual outages
• Implementing automation tools for continuous integration/delivery/deployment
• Helping the team develop good practices around monitoring and response
What we are looking for in you:
• You can debug both automated and human processes.
• You can work both in software engineering and in automation.
• You enjoy teaching and practice.
• You're able to find a balance in all things.
• You have some experience managing stateful distributed systems.
Bonus points if you meet one or more of the following requirements:
• Ability to program in one or more high-level languages (such as Python, Go or Clojure), with a proven track record of automation and an algorithmic approach to solving problems.
• In-depth knowledge of and experience with at least one of the following, combined with a desire to learn more: troubleshooting, host-based networking, Linux or UNIX engineering, systems programming, distributed systems, databases, cloud computing.
• Experience with one or more of the following: Terraform, AWS, Kubernetes, Helm, Prometheus, Grafana, Elasticsearch, Kibana, Redis, Kafka.