At trivago we operate a hybrid infrastructure of both on-premise machines and clusters on Google Cloud. Over time, we came up with a set of deployment guidelines for running our workloads as more and more of them are migrating to Google Cloud. These are not strict rules, but rather suggestions to best serve each team’s needs. Teams are meant to have full control over their workloads, but most of the time they “just want to run things” and do not have the time or resources to care much about managing the underlying infrastructure.
Posts about DevOps
The price of reliability is the pursuit of the utmost simplicity. — C.A.R. Hoare, Turing Award lecture Introduction Have you ever enthusiastically released a new, delightful version to production and then suddenly started hearing a concerning number of notification sounds? Gets your heart beating right? After all, you didn’t really expect this to happen because it worked in the development environment. This “But it worked in the development environment!
On a normal day, we ingest a lot of data into our ELK clusters (~6TB across all of our data centers). This is mostly operational data (logs) from different components in our infrastructure. This data ranges from purely technical info (logs from our services) to data about which pages our users are loading (intersection between business and technical data). At trivago,we use Kafka as a central hub for moving data between our systems (including logs).
“The Cloud Native Computing Foundation (CNCF) hosts critical components of the global technology infrastructure. CNCF brings together the world’s top developers, end users, and vendors and runs the largest open source developer conferences. CNCF is part of the nonprofit Linux Foundation.” - CNCF Last year, when visiting CloudNativeCon/KubeCon Europe in Barcelona (one of the biggest cloud-focused conferences in Europe), I noticed that there were some companies present in the exhibition space whose primary focus wasn’t software development.
To the outside, trivago appears to be one single software product providing our popular hotel meta search. Behind the scenes, however, it is home to dozens of projects and tools to support it. Teams are encouraged to choose the programming languages and frameworks that will get the job done best. Only few restrictions are placed on the teams in these decisions, primarily long-term maintainability. As a result, trivago has a largely polyglot code base that fosters creativity and diverse thinking.
Make was created in 1976 by Stuart Feldman at Bell Labs to help build C programs. But how can this 40+ year old piece of software help us develop and maintain our ever-growing amount of cloud-based microservices?
Hello from trivago’s performance & monitoring team. One important part of our job is to ship more than a terabyte of logs and system metrics per day, from various data sources into elasticsearch, several time series databases and other data sinks. We do so by reading most of the data from multiple Kafka clusters and processing them with nearly 100 Logstashes. Our clusters currently consists of ~30 machines running Debian 7 with bare-metal installations of the aforementioned services.
Testing your functionality is important, but what happens if other factors come into play? In this blog post we show how trivago handles non-functional testing for every commit and how we scaled it.
We do think that our tech blog is full of interesting things powered by our engineers’ great stories. Let us take you on a journey of how we maintain trivago tech blog from the technical perspective and how we recently automated its deployment process.
Tackling hard problems is like going on an adventure. Solving a technical challenge feels like finding a hidden treasure. Want to go treasure hunting with us?View all current job openings