Hello from trivago's performance & monitoring team. One important part of our job is to ship more than a terabyte of logs and system metrics per day, from various data sources into elasticsearch, several time series databases and other data sinks. We do so by reading most of the data from multiple Kafka clusters and processing them with nearly 100 Logstashes. Our clusters currently consists of ~30 machines running Debian 7 with bare-metal installations of the aforementioned services. This summer we decided to migrate all of this to an on-premise [Nomad](https://www.nomadproject.io/ cluster) cluster.
Automation at trivago
Insights, experiences and learnings from trivago's tech teams.
We do think that our tech blog is full of interesting things powered by our engineers' great stories. Let me take you on a journey of how we maintain the trivago tech blog from the technical perspective and how we recently automated its deployment process.
When I joined trivago a year ago, we had problems with our releases. The traffic was increasing each day. When we put the server back into the load balancer without warming up the OPcache it would die. From time to time the warmup failed silently. Our DCO (data center operations) crew had to log into the servers and restart a few processes manually. During this time every release was very intense.
I love to take complex and tedious processes and automate the pain out of them until they are reduced to three or four steps!I love to take complex and tedious processes and automate the pain out of them until they are reduced to three or four steps!
The advances and growth of our Selenium based automated testing infrastructure generated an unexpected number of test results to evaluate. We had to rethink our reporting systems. Combining the power of Selenium with Kibana's graphing and filtering features totally changed our way of working. Now we have real-time testing feedback and the ability of filtering between thousands of tests, all in one Dashboard.