Backend at trivago

Accelerating experimentations through Simulations

During the development of customer-facing applications, time is crucial, especially when it comes to testing and analyzing changes before accepting them in production. This blog post explores how we developed a Java-based reactive tool that simulates production requests, giving us earlier indications of the effects of a change and more confidence in the hypotheses we formulate. In the long term, we want to significantly reduce A/B testing time and ensure seamless transitions.

Rishav Jayswal, Ricardo Hernandez-Montoya · 20 Nov 2023 · 8 min read

Building Our First GraphQL Server with Go: An Implementation Guide

trivago provides travelers with an extensive collection of hotels, empowering them to compare prices and uncover the best vacation deals. With so many exceptional options available, we have introduced a new feature called "Favorites" to streamline the navigation process. This feature enables users to effortlessly save their preferred accommodations and access them later, ensuring ease of use. To access this feature, visit https://www.trivago.com/en-US/favorites.

Kutlu Eren · 17 May 2023 · 5 min read

How to substantially slow down your Node.js server

Back in March 2022, after spending a considerable amount of effort migrating our monolithic Node.js GraphQL server from Express to Fastify, we noticed absolutely no performance improvements in production. That hit us like a bombshell, especially because Fastify performed exceptionally well in our k6 load tests in staging, where it responded to HTTP requests 107% (more than two times) faster on average than Express!

Abdelrahman Abdelhafez · 15 Sept 2022 · 7 min read

How to Survive a Regional Outage

As I’m writing this, we’re in the middle of our yearly load testing process.
For a couple of years now, trivago has conducted regular production load tests. We do this to check whether all our services can sustain the increased load we experience during the summer and winter months.
This year is the second time we are also running another test: a "regional failover" test.

Arne Claus · 15 Aug 2022 · 7 min read

How we got on top of our data

Scalability and availability are key aspects of cloud native computing. If your microservice takes five minutes to start up, it becomes very difficult to meet the expectations because adjustments to traffic changes, regional failovers, hot-fixes and rollbacks are simply too slow. In this article, we show how we solved this and a few other problems by taking control of the process of updating our data and storing it in a highly available Redis setup.

Kevin Beineke · 4 May 2022 · 9 min read

Why and how we use primitive maps

At trivago we operate on petabytes of data. In live-traffic applications related to our bidding business cases, we use an in-house, in-memory key-value storage service written in Java to keep data as close to the calculation logic as possible.

Mikhail Chernyakov · 9 Mar 2022 · 8 min read

How we build the Image Gallery on trivago

When was the last time you booked accommodation without checking its photos? Most probably never, because imagery makes our decision-making process much easier and faster. However, picking the best possible images of a hotel to show to the user is an interesting problem to solve: the solution can range from a naive random selection to a sophisticated machine learning model that predicts what the user truly wants at that moment.

Praneeth Peiris · 7 Jul 2021 · 16 min read

Proper (Java) application life cycle management in Kubernetes

When operating applications in Kubernetes, proper lifecycle management is crucial to enable Kubernetes to manage applications correctly throughout their different phases: startup, runtime and shutdown. Improper or incomplete lifecycle management can lead to incidents with unforeseen and difficult to debug application behavior, such as random CrashLoopBackOffs, broken/zombie services not being restarted or even entire services not becoming healthy after a scheduled restart.

Stefan Nothaas, Lars Heß · 9 Jun 2021 · 8 min read

Java Reactive Programming - Effective Usage in a Real World Application

This article presents how trivago's search backend team used reactive programming in Java effectively when designing and implementing one of our many Java backend services. Compared to traditional imperative and functional programming, reactive programming requires a mindset shift in order to apply the concepts and techniques effectively. The benefits we gain support us in some key challenges that every engineer faces with essentially every (micro)service in today’s backend architectures: handling of blocking IO, backpressure, managing highly varying loads, as well as message and error propagation.

Stefan Nothaas · 16 Mar 2021 · 15 min read

Reactive Programming - The Price You Have To Pay For A Responsive Backend

In the trivago backend, we use the reactive programming pattern for fetching prices from advertisers and updating our caches. This helps us to increase the responsiveness (i.e., scalability and resilience) of our backend. Thus, our backend system can alleviate high response times from internal components and our advertisers while staying responsive, even if downstream components fail entirely. Here is how we use the Java library Reactor Core to ensure those guarantees:

Kevin Beineke · 24 Feb 2021 · 3 min read

How To Get Fooled By Metrics

Metrics are one of the main building blocks in the topic of observability.

Hence, we have a lot of metrics within our applications and especially for the connections between our applications. Every outgoing request has its latency measured, and we also record the sizes of the request and the response. These numbers are collected in histograms, and from that data we build Grafana graphs that show us, for example, the median size of request and response payloads or the 99th percentile of call durations.
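To illustrate how a percentile can be derived from histogram data like this, here is a minimal sketch in Python. It assumes cumulative buckets and linear interpolation inside a bucket, which is a common approach for bucketed histograms. The function name, bucket boundaries, and counts are made up for illustration and are not taken from the article.

```python
from bisect import bisect_left


def estimate_quantile(q, upper_bounds, cumulative_counts):
    """Estimate the q-quantile (0 < q <= 1) from cumulative histogram buckets.

    upper_bounds: sorted upper bounds of the buckets (hypothetical values).
    cumulative_counts: number of observations <= each upper bound.
    Assumes observations are spread evenly inside each bucket.
    """
    total = cumulative_counts[-1]
    if total == 0:
        raise ValueError("histogram is empty")
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    i = bisect_left(cumulative_counts, rank)
    lower = upper_bounds[i - 1] if i > 0 else 0.0
    count_below = cumulative_counts[i - 1] if i > 0 else 0
    in_bucket = cumulative_counts[i] - count_below
    # Linear interpolation between the bucket's lower and upper bound.
    return lower + (upper_bounds[i] - lower) * (rank - count_below) / in_bucket


# Hypothetical latency buckets in milliseconds: all 100 requests landed in
# the (250, 500] bucket, whatever their actual durations were.
bounds = [100, 250, 500, 1000]
counts = [0, 0, 100, 100]
print(estimate_quantile(0.5, bounds, counts))  # 375.0
```

Because the estimate only knows which bucket an observation fell into, the reported median is 375 ms here even if every request actually took, say, 260 ms. The result depends on the chosen bucket boundaries as much as on the measured values.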

Dominik Sandjaja · 4 Dec 2020 · 6 min read

trivago Tech Check-in: Meet Fabian

In our new series, trivago Tech Check-in, we're introducing you to some of our tech talents from across the globe who help keep our metasearch engine running smoothly every day. In this first edition, you'll meet Fabian Fritzsche, an engineering intern who works on the microservice system that feeds our GraphQL API with up-to-date hotel data.

Ankia Wolf, Fabian Fritzsche · 3 Aug 2020 · 6 min read

Google Cloud Workload-Placement-Guide

At trivago we operate a hybrid infrastructure of both on-premise machines and clusters on Google Cloud. Over time, we came up with a set of deployment guidelines for running our workloads as more and more of them are migrating to Google Cloud. These are not strict rules, but rather suggestions to best serve each team's needs.

Arne Claus · 17 Jul 2020 · 10 min read

Cross-Cluster Traffic Mirroring with Istio

The price of reliability is the pursuit of the utmost simplicity.
— C.A.R. Hoare, Turing Award lecture

Have you ever enthusiastically released a new, delightful version to production and then suddenly started hearing a concerning number of notification sounds? Gets your heart beating, right? After all, you didn't really expect this to happen because it worked in the development environment.

Mert Acikportali · 10 Jun 2020 · 10 min read

ElasticWars Episode IV: A new field

On a normal day, we ingest a lot of data into our ELK clusters (~6TB across all of our data centers). This is mostly operational data (logs) from different components in our infrastructure. This data ranges from purely technical info (logs from our services) to data about which pages our users are loading (intersection between business and technical data).

Jorge Luis Betancourt · 3 Jun 2020 · 8 min read
