Backend at trivago
Insights, experiences and learnings from trivago's tech teams.
Configuration management tools have recently gained a lot of popularity. At trivago we use SaltStack to automate our infrastructure. As the complexity of configuration files and formulas is increasing, we need a fast, reliable way to test our changes.
At trivago we store a subset of our realtime metric data in InfluxDB and we are quite impressed by the load it can handle. Despite all the joy, we had to learn some lessons the hard way. It is pretty easy to overload the database or the web browser by executing queries that return too many datapoints. To prevent that, we wrote Protector - a circuit breaker for Time series databases that blocks malicious queries.
At trivago we rely heavily on the ELK stack for our log processing. We stream our webserver access logs, error logs, performance benchmarks and all kind of diagnostic data into Kafka and process it from there into Elasticsearch using Logstash. Our preferred encoding within this pipeline is Google's Protocol Buffers, short protobuf. In this blog post, we will explain with an example how to read protobuf encoded messages from Kafka using Logstash.
The advances and growth of our Selenium based automated testing infrastructure generated an unexpected number of test results to evaluate. We had to rethink our reporting systems. Combining the power of Selenium with Kibana's graphing and filtering features totally changed our way of working. Now we have real-time testing feedback and the ability of filtering between thousands of tests, all in one Dashboard.
Caching data is an essential part in many high-load scenarios. A local 1st-level cache can augment a shared 2nd-level cache like Redis and Memcached to further boost performance. An in-process cache involves no network overhead, so the cache speed is only limited by local resources like CPU, memory transfer speed and locking. tCache is a production-proof local in-process cache for the JVM, which is part of trivago's OpenSource Java library triava (Apache v2 license). This article outlines cutting-edge features like data-aware evictions that are operating near lock-free.
Learn how we managed to move fast and create a new Symfony application without breaking our old legacy session handling. We write to our legacy session (which is file based) from our new project which uses PDO as the session storage.
Here at trivago we write a huge number of log messages every day that need to be stored and monitored. To handle all these messages we created Gollum, a tool that enables us to conveniently send messages from multiple sources to different services. While initially only covering log messages Gollum quickly evolved to a routing framework for all kinds of data. This blogpost is a short introduction to Gollum and how we use it at trivago.
At trivago we love hotels above everything else, but we also like metrics, we love to measure everything, compare, decide, improve and then rinse and repeat. In this blog entry we are going to describe our experience with InfluxDB, a time series database that we are using to store some real time metrics.