Data Science at trivago

Women in Tech Meetup

Learning about risk-taking, sentiment analysis, cybersecurity, and psychological safety in one evening? That’s quite a mix, isn't it?

That’s what the audience was offered at the Women in Tech meetup, which took place on February 29th at trivago's office. Together with iteratec, trivago had the pleasure of bringing together the vibrant community of female tech talents and their allies on the occasion of International Women's Day, for an evening featuring stories about risk, safety, and continuous improvement.

Saskia Keil · 27 Mar 2024 · 4 min read

Real-world Insights: Anomaly Detection in Internet Traffic

This article is written for individuals in data science or analytics roles who are familiar with terms such as confidence interval, databases, or workflows. It is aimed at those who need to implement anomaly detection techniques for various types of users with different needs. In this article, I will share my experience from working at trivago, with a specific focus on internet traffic. Rather than delving into the details of mathematical models (as there are already many well-covered articles on this topic), I aim to provide insights into real-world situations encompassing a wide range of business needs. These situations require tailored solutions to cater to different types of stakeholders.

Peter Brejcak · 13 Feb 2024 · 8 min read

Accelerating experimentations through Simulations

During the development of customer-facing applications, time is crucial, especially when it comes to testing and analyzing changes before accepting them in production. This blog post explores how we developed a Java-based reactive tool to simulate production requests, which gives us quicker hints about the effects of the changes we introduce and more confidence in the hypotheses we formulate. As a long-term vision, we wish to significantly reduce A/B testing time and ensure seamless transitions.

Rishav Jayswal, Ricardo Hernandez-Montoya · 20 Nov 2023 · 8 min read

Marketing Attribution: Evaluating The Path to Purchase in the Product Ecosystem

While working with data and analyzing how our users interact with our products, it is essential to understand their behavior by tracking their past actions, such as opening notifications, interacting with a blog, or creating a new login on the platform. In that context, an attribution study refers to the method of grouping all of those actions into a specific pattern to generate one desired end result.

Shelly Leal · 6 Dec 2022 · 9 min read

Explore-exploit dilemma in Ranking model

Imagine that, out of thousands of accommodations that match a user search, you have to select the “best” 25 to show to the user. Which ones would you show: the ones you know perform well, or ones that have never been shown before, so that you can discover new high-potential accommodations? In the Data Science world, this is known as the exploitation (continue doing what works well) versus exploration (try something new to discover hidden potential) problem, and it is often explained using the well-known multi-armed bandit problem. The objective is to divide a fixed amount of resources between competing choices so as to maximize the expected gain, given that the properties of each choice are not fully known at the time of allocation.
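As a minimal sketch of the trade-off described above, the following Python snippet simulates an epsilon-greedy strategy, one common textbook approach to the multi-armed bandit problem. It is an illustration only and not the ranking model discussed in the article; the function name `epsilon_greedy`, the simulated click rates, and all parameter values are hypothetical.

```python
import random


def epsilon_greedy(true_rates, rounds=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit.

    true_rates: hypothetical click probabilities per arm (hidden from the policy).
    Returns (pulls per arm, estimated click rate per arm).
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Simulated feedback: 1.0 for a click, 0.0 otherwise.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0

        # Incremental update of the running mean for the chosen arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]

    return pulls, estimates


if __name__ == "__main__":
    pulls, estimates = epsilon_greedy([0.02, 0.05, 0.10])
    print("pulls per arm:", pulls)
    print("estimated rates:", [round(e, 3) for e in estimates])
```

In this sketch, setting `epsilon=0` disables exploration entirely: the policy keeps showing the first arm as long as it never earns a reward, which is the "hidden potential" problem the teaser describes.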

Aida Orujova · 4 Nov 2022 · 8 min read

Powering ML-Based Systems With Reliable Data: The Data Annotation Journey

In the last few years, organisations have been increasing their investments in building Machine Learning (ML) based systems. In practice, such systems often took longer than expected to be built or failed to deliver the promised outcome. Data availability and quality have been among the most significant reasons behind this phenomenon. Since each organisation had its custom problems, open datasets or even logged data were not always directly usable. As a result, data collection and annotation processes became more crucial, yet remained under-documented.

Omayma Said, Srinivas Ramesh Kamath · 1 Sept 2022 · 13 min read

Improving Evaluation Practices in Natural Language Generation

Throughout last year, I had the opportunity to participate in and collaborate on multiple research initiatives in the field of Natural Language Generation (NLG), in addition to my responsibilities as a Data Scientist at trivago. NLG is the process of automatically generating text from text and/or non-linguistic data inputs. Some NLG applications include chatbots, image captioning, and report generation. These application areas are of high interest within trivago, as we seek to leverage our rich data environment to enrich the user experience with potential NLG applications.

Saad Mahamood · 31 Mar 2022 · 8 min read

Deep Dive Into Data Science at trivago

What does Data Science at trivago look like in practice? Which major challenges have we encountered as a travel-tech company since the COVID-19 outbreak? What's it like to work in Data Science at trivago? In this Q&A with James Neaves (Business Intelligence Lead), Andrea Fernandez (Data Science Team Lead), and Sheetij Jain (Product Manager in User Profiling) we'll answer all these questions and more.

Andrea Fernandez, Sheetij Jain, James Neaves · 22 Oct 2020 · 16 min read

Getting Ready For The Big Data Apocalypse

trivago Intelligence was born in 2013 with two main objectives: first, to provide bidding capability to the advertisers listed on trivago, and second, to provide them with metrics related to their own hotels, such as clicks, revenue, and bookings (typical BI data). The project faced a wave of inevitable data growth, which led to a refactoring process that produced a lot of learnings for the team. As I expect it to be useful for other teams dealing with similar challenges, this article describes why the team started a full migration of technologies, how we did it, and what the result was.

Biel Mir · 16 Dec 2019 · 7 min read

How to Analyze SurveyMonkey Data in Python

As a user researcher, it is important to know more about our users and their preferences concerning our product. One way to do that is by conducting surveys.

To gather user feedback from our global markets, we need to conduct a survey with a slightly different set of questions/translations for different countries, then analyze the results and check whether user needs differ across countries.

Ruoyun Lin · 23 Sept 2019 · 8 min read

Machine Learning and Bathtubs - How Small Visual Changes Improve User Experience

While searching for "Spa and Wellness hotels in Berlin...", I land on trivago. Surprisingly, the main images of the hotels reflect exactly the spa concept that I am searching for. This helps me compare the hotels on the list and find the ideal accommodation for my vacation!

Sayon Kumar Saha · 21 Aug 2019 · 14 min read

RecSys Challenge 2019

Our data scientists and engineers love the challenges that their work presents to them on a daily basis and thrive in our agile environment where they can share their knowledge, learn from others, and work together to solve any problems that arise. We are always looking for ways to share the unique problem settings we encounter and to inspire a productive exchange on algorithm development and evaluation.

Jens Adamczak · 11 Mar 2019 · 5 min read

A New Functional Approach to Complex Types in Apache Hive

When faced with the challenge of storing, retrieving, and processing small or large amounts of data, structured query languages are typically not far away. These languages serve as a nice abstraction between the goal that is to be achieved and how it is actually done. The list of successful applications of this extra layer is long. MySQL users could switch from MyISAM to InnoDB or use new algorithms like Multi-Range-Read without changing their application. We, as Hive users, can effortlessly switch our complete processing from MapReduce to, say, Tez or Spark. All this is possible because SQL serves as an abstraction layer in between. However, in this article, I will outline what happens when SQL, specifically HiveQL, misbehaves, and which steps we are taking to recover.

Jan Filipiak · 30 Jan 2019 · 8 min read

Teardown, Rebuild: Migrating from Hive to PySpark

Machine Learning (ML) engineering and software development are both fundamentally about writing correct and robust algorithms. In ML engineering, we have the extra difficulty of ensuring mathematical correctness and avoiding the propagation of round-off errors in calculations that work with floating-point representations of numbers.

German I. Ramirez-Espinoza · 3 Dec 2018 · 14 min read
