Read Cross-Cluster Traffic Mirroring with Istio

Cross-Cluster Traffic Mirroring with Istio

The price of reliability is the pursuit of the utmost simplicity.
— C.A.R. Hoare, Turing Award lecture

Have you ever enthusiastically released a new, delightful version to production and then suddenly started hearing a concerning number of notification sounds? Gets your heart beating right? After all, you didn't really expect this to happen because it worked in the development environment.

Read ElasticWars Episode IV: A new field

ElasticWars Episode IV: A new field

On a normal day, we ingest a lot of data into our ELK clusters (~6TB across all of our data centers). This is mostly operational data (logs) from different components in our infrastructure. This data ranges from purely technical info (logs from our services) to data about which pages our users are loading (intersection between business and technical data).

Read Accommodation Consolidation: How we created an ETL pipeline on cloud

Accommodation Consolidation: How we created an ETL pipeline on cloud

Imagine you go to your hotel for check-in and they say that your dog is not allowed even though the website clearly states that it is!

trivago gets information about millions of accommodations from hundreds of partners and they keep on updating. There are many differences not just in the data format, but also in the data itself. There can be many discrepancies in the information and consolidating them can be a very complex process. But it's our responsibility to provide the most accurate information to the best of our knowledge.

Read Why We Chose Go

Why We Chose Go

To the outside, trivago appears to be one single software product providing our popular hotel meta search. Behind the scenes, however, it is home to dozens of projects and tools to support it. Teams are encouraged to choose the programming languages and frameworks that will get the job done best. Only few restrictions are placed on the teams in these decisions, primarily long-term maintainability. As a result, trivago has a largely polyglot code base that fosters creativity and diverse thinking. It allows us to make informed decisions based on actual requirements rather than legacy code or antiquated projects.

Read Getting Ready For The Big Data Apocalypse

Getting Ready For The Big Data Apocalypse

trivago Intelligence was born in 2013 with two main objectives: First, to provide bidding capability to the advertisers, who are listed on trivago, and second, to provide them with metrics related to their own hotels; like clicks, revenue, and bookings (typical BI data). This project faced a wave of inevitable data growth which lead to a refactoring process which produced a lot of learnings for the team. As I expect it to be useful for other teams who deal with similar challenges, this article will describe why a team started a full migration of technologies, how we did it and the result of it.

Read Circuit Breaker with AWS Step Functions

Circuit Breaker with AWS Step Functions

At trivago, we have several workflows which interact with external services. The health and availability of external services can have an impact on keeping our workflows alive and responsive. Think of an API call made to an external service which is down. Our workflows have to be prepared to expect these errors and adapt to it.

Read Nomad - our experiences and best practices

Nomad - our experiences and best practices

Hello from trivago's performance & monitoring team. One important part of our job is to ship more than a terabyte of logs and system metrics per day, from various data sources into elasticsearch, several time series databases and other data sinks. We do so by reading most of the data from multiple Kafka clusters and processing them with nearly 100 Logstashes. Our clusters currently consists of ~30 machines running Debian 7 with bare-metal installations of the aforementioned services. This summer we decided to migrate all of this to an on-premise [Nomad](https://www.nomadproject.io/ cluster) cluster.

Read Building fast and reliable web applications

Building fast and reliable web applications

Test, test, test. If you don’t, an issue is bound to crop up in production sooner or later.

We’ve all heard this mantra in one form or another. The importance of testing your software has been covered by countless articles, books and conferences. You worked hard on your code coverage and your downtime due to regression-related bugs has severely decreased.

Read Efficient Image Recovery at Scale Using Amazon S3 Versioning

Efficient Image Recovery at Scale Using Amazon S3 Versioning

If you’re using Amazon Web Services, then there is a higher possibility that you’re familiar with Amazon S3. Amazon S3 ( Simple Storage Service ) is a widely used service where we can store (theoretically unlimited amount of) our data with a high availability 99.99%. That’s why we, the Visual Content team at trivago, use Amazon S3 to store the images which you see on our website and many other tools.

Read Improving Your Data Layer with Rebase on Python

Improving Your Data Layer with Rebase on Python

Technology keeps getting better and better which, at some point, makes us think "Should I migrate to the latest version/technology or not?" Well when you decide to use a better technology for your application, you have to also consider rewriting the code that your application runs on. The business logic remains the same in most of the cases but the data model would definitely change if you are switching from SQL to some NoSQL Technology for example.

Read AWS Kinesis with Lambdas: Lessons Learned

AWS Kinesis with Lambdas: Lessons Learned

Almost six months ago, our team started the journey to replicate some of our data stored in on-premise MySQL machines to AWS. This included over a billion records stored in multiple tables. The new system had to be responsive enough to transfer any new incoming data from the MySQL database to AWS with minimal latency.

Read How trivago Reduced Memcached Memory Usage by 50%

How trivago Reduced Memcached Memory Usage by 50%

If you’ve never heard about Memcached, it is simply a high-performance, distributed memory caching system which uses a key-value store for strings and objects. Usually, it serves for saving data originally retrieved from a database or external services. As simple as it is, it can improve the performance of your website quite a bit. The API of Memcached is very simple and accessible from most of modern programming languages. A simple example: