Data Science at trivago

Implementing Data Validation with Great Expectations in Hybrid Environments

Data validation is an essential step in any data processing pipeline, as it ensures the integrity and accuracy of the data to be used across all subsequent processing steps. Great Expectations (GX) is an open-source framework that provides a flexible and efficient way to perform data validation, allowing data scientists and analysts to quickly identify and correct any issues with their data. In this article, we share our experience implementing Great Expectations for data validation in our Hadoop environment, and our take on its benefits and limitations.

Kamila Widyanto · 25 Apr 2023 · 6 min read

Marketing Attribution: Evaluating The Path to Purchase in the Product Ecosystem

While working with data and analyzing the interactions of our users with the products we have today, it is essential to understand their behaviors by tracking their past actions, such as opening notifications, interacting with a blog, or creating a new login in the platform. In that context, the attribution study refers to the method of grouping together all of those actions in a specific pattern to generate one desired end result.

Shelly Leal · 6 Dec 2022 · 9 min read

Explore-exploit dilemma in Ranking model

Imagine, out of thousands of accommodations that match a user search, you have to select the “best” 25 to show to the user. Which ones would you show- the ones you know perform well or ones that have never been shown before, so that you can discover new high-potential accommodations? In the Data Science world, this is known as exploitation (continue doing what works well) versus exploration (try something new to discover hidden potential) problem and is often explained using the well-known multi-armed bandit problem. The objective of the problem is to divide a fixed number of resources between competing choices to maximize their expected gains, given that the properties of each choice are not fully known at the time of allocation.

Aida Orujova · 4 Nov 2022 · 8 min read

Powering ML-Based Systems With Reliable Data: The Data Annotation Journey

In the last few years, organisations have been increasing their investments in building Machine Learning (ML) based systems. In practice, such systems often took longer than expected to be built or failed to deliver the promised outcome. Data availability and quality have been among the most significant reasons behind this phenomenon. Since each organisation had its custom problems, open datasets or even logged data were not always directly usable. As a result, data collection and annotation processes became more crucial, yet remained under-documented.

Omayma Said Srinivas Ramesh Kamath · 1 Sept 2022 · 13 min read

Data Science at trivago

Implementing Data Validation with Great Expectations in Hybrid Environments

Marketing Attribution: Evaluating The Path to Purchase in the Product Ecosystem

Explore-exploit dilemma in Ranking model

Powering ML-Based Systems With Reliable Data: The Data Annotation Journey

Popular tags

Featured articles

Implementing Data Validation with Great Expectations in Hybrid Environments

How we scaled our Prometheus setup

Being on-call as a software engineer - a challenging and fast learning experience

Java Reactive Programming - Effective Usage in a Real World Application

Learn Redis the hard way (in production)

trivago tech newsletter

Popular tags

Featured articles

Career? trivago.