Data Science at trivago

Insights, experiences and learnings from trivago's tech teams.

Read A New Functional Approach to Complex Types in Apache Hive

A New Functional Approach to Complex Types in Apache Hive

When faced with the challenge to store, retrieve and process small or large amounts of data, structured query languages are typically not far away. These languages serve as a nice abstraction between the goal that is to be achieved and how it is actually done. The list of successful applications of this extra layer is long. MySQL users could switch from MyISAM to InnoDB or use new algorithms like Multi-Range-Read without a change to their application. We, as Hive users, can effortlessly switch our complete processing from MapReduce to, say, Tez or Spark. All this is possible because of SQL serving as an abstraction layer in between. However, in this article, I will outline the effects when SQL - specifically hiveQL - misbehaves and which steps we are taking to recover.

Read the whole article ›
Read Teardown, Rebuild: Migrating from Hive to PySpark

Teardown, Rebuild: Migrating from Hive to PySpark

Machine Learning (ML) engineering and software development are both fundamentally about writing correct and robust algorithms. In ML engineering we have the extra difficulty of ensuring mathematical correctness and avoiding propagation of round-off errors in the calculations when working with floating-point representations of a number.

Read the whole article ›