When was the last time you booked accommodation without checking its photos? Most probably never! Because having imagery information makes our decision-making process much easier and faster. However, picking up the best possible images of a hotel to show to the user is an interesting problem to solve, because it can be a naive random selection or a sophisticated machine learning model to know what the user truly wants at that moment.
Cloud at trivago
Insights, experiences and learnings from trivago's tech teams.
When operating applications in Kubernetes, proper lifecycle management is crucial to enable Kubernetes to manage applications correctly throughout their different phases: startup, runtime and shutdown. Improper or incomplete lifecycle management can lead to incidents with unforeseen and difficult to debug application behavior, such as random CrashLoopBackOffs, broken/zombie services not being restarted or even entire services not becoming healthy after a scheduled restart.
At trivago we operate a hybrid infrastructure of both on-premise machines and clusters on Google Cloud. Over time, we came up with a set of deployment guidelines for running our workloads as more and more of them are migrating to Google Cloud. These are not strict rules, but rather suggestions to best serve each team’s needs.
The price of reliability is the pursuit of the utmost simplicity.
— C.A.R. Hoare, Turing Award lecture
“The Cloud Native Computing Foundation (CNCF) hosts critical components of the global technology infrastructure. CNCF brings together the world’s top developers, end users, and vendors and runs the largest open source developer conferences. CNCF is part of the nonprofit Linux Foundation.” - CNCF
Many of our data pipelines interact with external services. The availability of an external service can adversly affect the health our pipelines. This is how we handle it using AWS Step Functions