tl;dr: continuously monitor your CDN and origin servers on layer 3 with tools like MTR. Layer 3 issues on external middleware can have a significant impact on layer 7 web performance. In a recent rollout of a new cloud service, we monitored the impact of this service on web performance, UX and business metrics. For all cloud regions and origin servers, we had Synthetic and Real User Monitoring for our site in place.
Posts about Monitoring
Hello from trivago’s performance & monitoring team. One important part of our job is to ship more than a terabyte of logs and system metrics per day, from various data sources into elasticsearch, several time series databases and other data sinks. We do so by reading most of the data from multiple Kafka clusters and processing them with nearly 100 Logstashes. Our clusters currently consists of ~30 machines running Debian 7 with bare-metal installations of the aforementioned services.
Ever heard about Microservices? Those tiny litte pieces of code that are used to split a big pile of magic into smaller pieces of magic? Well, they’re not that tiny after all and require lots of preliminary work to use them properly. Have a look at this post to hear about my journey of splitting an existing monolith written in PHP up into several microservices written in Go.
We were not as happy as we could be with out Cucumber test reporting solution - so we decided to build a new and shiny one from scratch.
We’re a data-driven company. At trivago we love measuring everything. Collecting metrics and making decisions based on them comes naturally to all our engineers. This workflow also applies to performance, which is key to succeed in the modern Internet.
At trivago we store a subset of our realtime metric data in InfluxDB and we are quite impressed by the load it can handle. Despite all the joy, we had to learn some lessons the hard way. It is pretty easy to overload the database or the web browser by executing queries that return too many datapoints.
At trivago we rely heavily on the ELK stack for our log processing. We stream our webserver access logs, error logs, performance benchmarks and all kind of diagnostic data into Kafka and process it from there into Elasticsearch using Logstash.
The advances and growth of our Selenium based automated testing infrastructure generated an unexpected number of test results to evaluate. We had to rethink our reporting systems. Combining the power of Selenium with Kibana’s graphing and filtering features totally changed our way of working.
At trivago we love hotels above everything else, but we also like metrics, we love to measure everything, compare, decide, improve and then rinse and repeat. In this blog entry we are going to describe our experience with InfluxDB, a time series database that we are using to store some real time metrics.
Tackling hard problems is like going on an adventure. Solving a technical challenge feels like finding a hidden treasure. Want to go treasure hunting with us?View all current job openings