trivago provides travelers with an extensive collection of hotels, empowering them to compare prices and uncover the best vacation deals. With so many exceptional options available, we have introduced a new feature called "Favorites" to streamline the navigation process. This feature enables users to effortlessly save their preferred accommodations and access them later, ensuring ease of use. To access this feature, visit https://www.trivago.com/en-US/favorites.
How to substantially slow down your Node.js server
Back in March 2022, after spending a considerable amount of effort migrating our monolithic Node.js GraphQL server from Express to Fastify, we noticed absolutely no performance improvements in production. That hit us like a bombshell, especially because Fastify performed exceptionally well in our
k6 load tests in staging, where it responded to HTTP requests 107% (more than two times) faster on average than Express!
How to Survive a Regional Outage
As I’m writing this, we’re in the middle of our yearly load testing process.
Since a couple of years now, trivago conducts regular production load tests. We do this to test if all our services sustain the increased load we experience during the summer and winter months.
This year is now the 2nd time, where we also do another test: A "regional failover" test.
How we got on top of our data
Scalability and availability are key aspects of cloud native computing. If your microservice takes five minutes to start up, it becomes very difficult to meet the expectations because adjustments to traffic changes, regional failovers, hot-fixes and rollbacks are simply too slow. In this article, we show how we solved this and a few other problems by taking control of the process of updating our data and storing it in a highly available Redis setup.
Why and how we use primitive maps
At trivago we operate on petabytes of data. In live-traffic applications that are related to the bidding business cases we use our in-house in-memory key-value storage-service written in Java to keep data as close to calculation logic as possible.
How we build the Image Gallery on trivago
When was the last time you booked accommodation without checking its photos? Most probably never! Because having imagery information makes our decision-making process much easier and faster. However, picking up the best possible images of a hotel to show to the user is an interesting problem to solve, because it can be a naive random selection or a sophisticated machine learning model to know what the user truly wants at that moment.
Proper (Java) application life cycle management in Kubernetes
When operating applications in Kubernetes, proper lifecycle management is crucial to enable Kubernetes to manage applications correctly throughout their different phases: startup, runtime and shutdown. Improper or incomplete lifecycle management can lead to incidents with unforeseen and difficult to debug application behavior, such as random CrashLoopBackOffs, broken/zombie services not being restarted or even entire services not becoming healthy after a scheduled restart.
Java Reactive Programming - Effective Usage in a Real World Application
This article presents how trivago's search backend team used reactive programming in Java effectively when designing and implementing one of our many Java backend services. Compared to traditional imperative and functional programming, reactive programming requires a mindset-shift in order to apply the concepts and techniques effectively. The benefits we gain support us in some key challenges that every engineer is facing with essentially every (micro-) service in today’s backend architectures: handling of blocking IO, backpressure, managing highly varying loads as well as message and error propagation.
Reactive Programming - The Price You Have To Pay For A Responsive Backend
In the trivago backend, we use the reactive programming pattern for fetching prices from advertisers and updating our caches. This helps us to increase the responsiveness (i.e., scalability and resilience) of our backend. Thus, our backend system can alleviate high response times from internal components and our advertisers while staying responsive, even if downstream components fail entirely. Here is how we use the Java library Reactor Core to ensure those guarantees:
How To Get Fooled By Metrics
Metrics are one of the main building blocks in the topic of observability.
Hence, we have a lot of metrics within our applications and especially for the connections between our applications. Every outgoing request has its latency measured and we also record the sizes of the request and the response. These numbers are collected in histograms and based on that data, in our Grafana graphs, we create corresponding graphs that show us e.g. the median size of request- and response payloads or the 99th percentile of call durations.
trivago Tech Check-in: Meet Fabian
In our new series, trivago Tech Check-in, we're introducing you to some of our tech talents from across the globe who help keep our metasearch engine running smoothly everyday. In this first edition, you'll meet Fabian Fritzsche, an engineering intern that works on the Microservice-System that feeds our GraphQL API with up-to-date hotel data.
Google Cloud Workload-Placement-Guide
At trivago we operate a hybrid infrastructure of both on-premise machines and clusters on Google Cloud. Over time, we came up with a set of deployment guidelines for running our workloads as more and more of them are migrating to Google Cloud. These are not strict rules, but rather suggestions to best serve each team's needs.
Cross-Cluster Traffic Mirroring with Istio
The price of reliability is the pursuit of the utmost simplicity.
— C.A.R. Hoare, Turing Award lecture
Have you ever enthusiastically released a new, delightful version to production and then suddenly started hearing a concerning number of notification sounds? Gets your heart beating right? After all, you didn't really expect this to happen because it worked in the development environment.
ElasticWars Episode IV: A new field
On a normal day, we ingest a lot of data into our ELK clusters (~6TB across all of our data centers). This is mostly operational data (logs) from different components in our infrastructure. This data ranges from purely technical info (logs from our services) to data about which pages our users are loading (intersection between business and technical data).
Accommodation Consolidation: How we created an ETL pipeline on cloud
Imagine you go to your hotel for check-in and they say that your dog is not allowed even though the website clearly states that it is!
trivago gets information about millions of accommodations from hundreds of partners and they keep on updating. There are many differences not just in the data format, but also in the data itself. There can be many discrepancies in the information and consolidating them can be a very complex process. But it's our responsibility to provide the most accurate information to the best of our knowledge.
Follow us on