Read How to Survive a Regional Outage

How to Survive a Regional Outage

As I’m writing this, we’re in the middle of our yearly load testing process.
Since a couple of years now, trivago conducts regular production load tests. We do this to test if all our services sustain the increased load we experience during the summer and winter months.
This year is now the 2nd time, where we also do another test: A "regional failover" test.

SRE: On-Call Procedure at trivago

One of the many responsibilities of a Site Reliability Engineer (SRE), is to ensure uptime, availability and in some cases, consistency of the product. In this context, the product refers to the website, APIs, microservices, and servers. This responsibility of keeping the product up and running becomes particularly interesting if the product is used around the world 24 hours every day like trivago. And just like in the medical profession, someone has to be on call to react on failures and outages outside of the office hours.

How we got on top of our data

Scalability and availability are key aspects of cloud native computing. If your microservice takes five minutes to start up, it becomes very difficult to meet the expectations because adjustments to traffic changes, regional failovers, hot-fixes and rollbacks are simply too slow. In this article, we show how we solved this and a few other problems by taking control of the process of updating our data and storing it in a highly available Redis setup.

Read Improving Evaluation Practices in Natural Language Generation

Improving Evaluation Practices in Natural Language Generation

Throughout last year I had the opportunity to participate and collaborate on multiple research initiatives in the field of Natural Language Generation (NLG) in addition to my responsibilities as a Data Scientist at trivago. NLG is the process of automatically generating text from either text and/or non-linguistic data inputs. Some NLG applications include chatbots, image captioning, and report generation. These are application areas of high interest internally within trivago as we seek to leverage our rich data environment to enrich the user experience with potential NLG applications.

Read A preview of CSS Container Queries

A preview of CSS Container Queries

Have you ever wondered about how to create components that not only change their appearance according to the viewport’s width but instead also transform according to their parent element’s sizing? This might be the case when you have a card component that has the classical vertical layout on mobile:

Read trivago Tech Check-in: Meet Mohammad

trivago Tech Check-in: Meet Mohammad

trivago is the home to 500+ tech specialists from all corners of the globe – each with their own unique background and story of how they ended up here. Our trivago Tech Check-in series focuses on individual engineers' experience during their time at trivago. In this edition, you'll meet Mohammad Abed – a frontend software engineer who has been with trivago for 11 months now and is working on our Express Booking product.

Read trivago Tech Week 2021 in Review

trivago Tech Week 2021 in Review

In the middle of summer 2021, we hosted one of our favourite annual events of the year - trivago Tech Week! This year’s tech week had a new, hybrid format, featuring a wide variety of talks and exchange forums, hosted both by internal talents and external speakers. To make things even more engaging, the week included virtual opportunities for talents to gather and converse across departments, a gaming tournament and a highly anticipated live-music concert to tie everything together!

Read Postmortem: Removing all users from github.com/trivago

Postmortem: Removing all users from github.com/trivago

While engineering, we fix bugs, create new systems, build workflows and establish processes. Our job is to change things. Changing things can involve mistakes that ultimately lead to the failure of a particular system. To learn from these failures, a retrospective is helpful to get to the root of this problem. In the tech industry, a Blameless PostMortem is the right tool for this job.

How we build the Image Gallery on trivago

When was the last time you booked accommodation without checking its photos? Most probably never! Because having imagery information makes our decision-making process much easier and faster. However, picking up the best possible images of a hotel to show to the user is an interesting problem to solve, because it can be a naive random selection or a sophisticated machine learning model to know what the user truly wants at that moment.

Read Proper (Java) application life cycle management in Kubernetes

Proper (Java) application life cycle management in Kubernetes

When operating applications in Kubernetes, proper lifecycle management is crucial to enable Kubernetes to manage applications correctly throughout their different phases: startup, runtime and shutdown. Improper or incomplete lifecycle management can lead to incidents with unforeseen and difficult to debug application behavior, such as random CrashLoopBackOffs, broken/zombie services not being restarted or even entire services not becoming healthy after a scheduled restart.