Real-world Insights: Anomaly Detection in Internet Traffic

This article is written for individuals in data science or analytics roles who are familiar with terms such as confidence interval, databases, or workflows. It is aimed at those who need to implement anomaly detection techniques for various types of users with different needs. In this article, I will share my experience from working at trivago, with a specific focus on internet traffic. Rather than delving into the details of mathematical models (as there are already many well-covered articles on this topic), I aim to provide insights into real-world situations encompassing a wide range of business needs. These situations require tailored solutions to cater to different types of stakeholders.

Accelerating experimentations through Simulations

During the development of customer-facing applications, time is crucial, especially when it comes to testing and analyzing changes before accepting them in production. This blog post explores how we developed a Java-based reactive tool to simulate production requests. The tool gives us quicker hints about the effects of the changes we introduce and makes us more confident in the hypotheses we formulate. As a long-term vision, we wish to significantly reduce A/B testing time and ensure seamless transitions.

Marketing Attribution: Evaluating The Path to Purchase in the Product Ecosystem

While working with data and analyzing the interactions of our users with the products we have today, it is essential to understand their behaviors by tracking their past actions, such as opening notifications, interacting with a blog, or creating a new login in the platform. In that context, the attribution study refers to the method of grouping together all of those actions in a specific pattern to generate one desired end result.

Explore-exploit dilemma in Ranking model

Imagine that, out of thousands of accommodations that match a user search, you have to select the “best” 25 to show to the user. Which ones would you show: the ones you know perform well, or ones that have never been shown before, so that you can discover new high-potential accommodations? In the Data Science world, this is known as the exploitation (continue doing what works well) versus exploration (try something new to discover hidden potential) problem, and it is often explained using the well-known multi-armed bandit problem. The objective is to divide a fixed amount of resources between competing choices so as to maximize the expected gain, given that the properties of each choice are not fully known at the time of allocation.
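
To make the trade-off concrete, here is a minimal sketch of epsilon-greedy, one common textbook strategy for the multi-armed bandit problem. It is not the ranking approach described in the article. The function name, the click-through rates, and the parameter values are all hypothetical and chosen only for illustration.

```python
import random


def epsilon_greedy(true_rates, rounds=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over arms with hidden success rates.

    true_rates: hypothetical per-arm success probabilities (unknown to the policy).
    Returns (pulls, rewards): how often each arm was chosen and its total reward.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    rewards = [0.0] * n_arms

    def observed_mean(arm):
        # Arms that were never pulled are treated as having a mean of 0.
        return rewards[arm] / pulls[arm] if pulls[arm] else 0.0

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best observed mean so far.
            arm = max(range(n_arms), key=observed_mean)
        # Draw a Bernoulli reward from the arm's hidden success rate.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        pulls[arm] += 1
        rewards[arm] += reward
    return pulls, rewards


if __name__ == "__main__":
    # Three hypothetical accommodations with made-up click-through rates.
    pulls, rewards = epsilon_greedy([0.02, 0.05, 0.10])
    for arm, (n, r) in enumerate(zip(pulls, rewards)):
        print(f"arm {arm}: shown {n} times, observed rate {r / n:.3f}")
```

A larger `epsilon` spends more of the fixed budget on discovery, while a smaller one leans on what is already known to work.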

Powering ML-Based Systems With Reliable Data: The Data Annotation Journey

In the last few years, organisations have been increasing their investments in building Machine Learning (ML) based systems. In practice, such systems often took longer than expected to be built or failed to deliver the promised outcome. Data availability and quality have been among the most significant reasons behind this phenomenon. Since each organisation had its custom problems, open datasets or even logged data were not always directly usable. As a result, data collection and annotation processes became more crucial, yet remained under-documented.

Improving Evaluation Practices in Natural Language Generation

Throughout last year, I had the opportunity to participate in and collaborate on multiple research initiatives in the field of Natural Language Generation (NLG) in addition to my responsibilities as a Data Scientist at trivago. NLG is the process of automatically generating text from text and/or non-linguistic data inputs. Some NLG applications include chatbots, image captioning, and report generation. These are application areas of high interest internally within trivago as we seek to leverage our rich data environment to enrich the user experience with potential NLG applications.

Deep Dive Into Data Science at trivago

What does Data Science at trivago look like in practice? Which major challenges have we encountered as a travel-tech company since the COVID-19 outbreak? What's it like to work in Data Science at trivago? In this Q&A with James Neaves (Business Intelligence Lead), Andrea Fernandez (Data Science Team Lead), and Sheetij Jain (Product Manager in User Profiling) we'll answer all these questions and more.

Getting Ready For The Big Data Apocalypse

trivago Intelligence was born in 2013 with two main objectives: first, to provide bidding capability to the advertisers who are listed on trivago, and second, to provide them with metrics related to their own hotels, like clicks, revenue, and bookings (typical BI data). This project faced a wave of inevitable data growth, which led to a refactoring process that produced a lot of learnings for the team. As I expect it to be useful for other teams who deal with similar challenges, this article will describe why the team started a full migration of technologies, how we did it, and the result of it.

How to Analyze SurveyMonkey Data in Python

As a user researcher, it is important to know more about our users and their preferences concerning our product. One way to do that is by conducting surveys.

In order to gather user feedback from our global markets, we need to conduct a survey with a slightly different set of questions/translations for different countries, then analyze the results and compare whether user needs differ across countries.

RecSys Challenge 2019

Our data scientists and engineers love the challenges that their work presents to them on a daily basis and thrive in our agile environment where they can share their knowledge, learn from others, and work together to solve any problems that arise. We are always looking for ways to share the unique problem settings we encounter and to inspire a productive exchange on algorithm development and evaluation.

A New Functional Approach to Complex Types in Apache Hive

When faced with the challenge of storing, retrieving, and processing small or large amounts of data, structured query languages are typically not far away. These languages serve as a nice abstraction between the goal that is to be achieved and how it is actually done. The list of successful applications of this extra layer is long. MySQL users could switch from MyISAM to InnoDB or use new algorithms like Multi-Range Read without a change to their application. We, as Hive users, can effortlessly switch our complete processing from MapReduce to, say, Tez or Spark. All this is possible because SQL serves as an abstraction layer in between. However, in this article, I will outline the effects when SQL, specifically HiveQL, misbehaves, and which steps we are taking to recover.