Scalability and availability are key aspects of cloud native computing. If your microservice takes five minutes to start up, it becomes very difficult to meet the expectations because adjustments to traffic changes, regional failovers, hot-fixes and rollbacks are simply too slow. In this article, we show how we solved this and a few other problems by taking control of the process of updating our data and storing it in a highly available Redis setup.
Backend at trivago
Insights, experiences and learnings from trivago's tech teams.
At trivago we operate on petabytes of data. In live-traffic applications that are related to the bidding business cases we use our in-house in-memory key-value storage-service written in Java to keep data as close to calculation logic as possible.
When was the last time you booked accommodation without checking its photos? Most probably never! Because having imagery information makes our decision-making process much easier and faster. However, picking up the best possible images of a hotel to show to the user is an interesting problem to solve, because it can be a naive random selection or a sophisticated machine learning model to know what the user truly wants at that moment.
When operating applications in Kubernetes, proper lifecycle management is crucial to enable Kubernetes to manage applications correctly throughout their different phases: startup, runtime and shutdown. Improper or incomplete lifecycle management can lead to incidents with unforeseen and difficult to debug application behavior, such as random CrashLoopBackOffs, broken/zombie services not being restarted or even entire services not becoming healthy after a scheduled restart.
This article presents how trivago's search backend team used reactive programming in Java effectively when designing and implementing one of our many Java backend services. Compared to traditional imperative and functional programming, reactive programming requires a mindset-shift in order to apply the concepts and techniques effectively. The benefits we gain support us in some key challenges that every engineer is facing with essentially every (micro-) service in today’s backend architectures: handling of blocking IO, backpressure, managing highly varying loads as well as message and error propagation.
In the trivago backend, we use the reactive programming pattern for fetching prices from advertisers and updating our caches. This helps us to increase the responsiveness (i.e., scalability and resilience) of our backend. Thus, our backend system can alleviate high response times from internal components and our advertisers while staying responsive, even if downstream components fail entirely. Here is how we use the Java library Reactor Core to ensure those guarantees:
Metrics are one of the main building blocks in the topic of observability.Metrics are one of the main building blocks in the topic of observability.
In our new series, trivago Tech Check-in, we're introducing you to some of our tech talents from across the globe who help keep our metasearch engine running smoothly everyday. In this first edition, you'll meet Fabian Fritzsche, an engineering intern that works on the Microservice-System that feeds our GraphQL API with up-to-date hotel data.
At trivago we operate a hybrid infrastructure of both on-premise machines and clusters on Google Cloud. Over time, we came up with a set of deployment guidelines for running our workloads as more and more of them are migrating to Google Cloud. These are not strict rules, but rather suggestions to best serve each team's needs.