Read Implementing Data Validation with Great Expectations in Hybrid Environments

Implementing Data Validation with Great Expectations in Hybrid Environments

Data validation is an essential step in any data processing pipeline, as it ensures the integrity and accuracy of the data to be used across all subsequent processing steps. Great Expectations (GX) is an open-source framework that provides a flexible and efficient way to perform data validation, allowing data scientists and analysts to quickly identify and correct any issues with their data. In this article, we share our experience implementing Great Expectations for data validation in our Hadoop environment, and our take on its benefits and limitations.

Read Marketing Attribution: Evaluating The Path to Purchase in the Product Ecosystem

Marketing Attribution: Evaluating The Path to Purchase in the Product Ecosystem

While working with data and analyzing the interactions of our users with the products we have today, it is essential to understand their behaviors by tracking their past actions, such as opening notifications, interacting with a blog, or creating a new login in the platform. In that context, the attribution study refers to the method of grouping together all of those actions in a specific pattern to generate one desired end result.

Read Explore-exploit dilemma in Ranking model

Explore-exploit dilemma in Ranking model

Imagine, out of thousands of accommodations that match a user search, you have to select the “best” 25 to show to the user. Which ones would you show- the ones you know perform well or ones that have never been shown before, so that you can discover new high-potential accommodations? In the Data Science world, this is known as exploitation (continue doing what works well) versus exploration (try something new to discover hidden potential) problem and is often explained using the well-known multi-armed bandit problem. The objective of the problem is to divide a fixed number of resources between competing choices to maximize their expected gains, given that the properties of each choice are not fully known at the time of allocation.

Read Powering ML-Based Systems With Reliable Data: The Data Annotation Journey

Powering ML-Based Systems With Reliable Data: The Data Annotation Journey

In the last few years, organisations have been increasing their investments in building Machine Learning (ML) based systems. In practice, such systems often took longer than expected to be built or failed to deliver the promised outcome. Data availability and quality have been among the most significant reasons behind this phenomenon. Since each organisation had its custom problems, open datasets or even logged data were not always directly usable. As a result, data collection and annotation processes became more crucial, yet remained under-documented.