Using Instacart data on customer orders over time to predict which previously purchased products will be in a user’s next order.

roshansridhar/Market-basket-analysis-using-bayseian-network

Market basket analysis using a Bayesian network

This project uses Instacart data on customer orders over time to predict which previously purchased products will be in a user’s next order. The dataset can be found here.

Topic: Repeat buyer prediction for E-Commerce

Problem Statement:

Create a Bayesian belief network to analyze customer behavior and predict the products in each customer's next purchase, along with the order in which those products are added to the cart in that transaction.

Data:

We used “The Instacart Online Grocery Shopping Dataset 2017” for our project. The dataset consists of 75,000 users and 49,000 products. For each product, we are given the aisle and department in which it is located. For each user, we are given the history of their past x transactions (where x can be anywhere between 5 and 100), the order in which each product was added to the cart in those transactions, and whether each product was a reorder for that user.
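To make the record structure concrete, the sketch below models a few order lines in plain Python and counts how often each user has reordered each product. The field names mirror columns in the public dataset files (`user_id`, `order_number`, `product_id`, `add_to_cart_order`, `reordered`). Flattening them into one record per order line is an assumption about preprocessing, and the sample rows and the helper `reorder_counts` are made up for illustration.

```python
from collections import Counter
from typing import NamedTuple


class OrderLine(NamedTuple):
    user_id: int
    order_number: int       # position of the order in the user's history
    product_id: int
    add_to_cart_order: int  # position of the product within the order
    reordered: int          # 1 if the user bought this product before, else 0


def reorder_counts(lines):
    """Count, per (user_id, product_id), how many order lines are reorders."""
    counts = Counter()
    for line in lines:
        if line.reordered == 1:
            counts[(line.user_id, line.product_id)] += 1
    return counts


# Made-up sample: user 1 buys product 10 in three orders and product 20 once.
sample = [
    OrderLine(1, 1, 10, 1, 0),
    OrderLine(1, 1, 20, 2, 0),
    OrderLine(1, 2, 10, 1, 1),
    OrderLine(1, 3, 10, 2, 1),
]
counts = reorder_counts(sample)
print(counts[(1, 10)])  # number of times user 1 reordered product 10
```

A count like this is one plausible way to obtain the "number of times reordered" value used as evidence later on.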

A brief overview of the data

Workflow

Bayesian network

The image above gives an overview of the dependent and independent variables in our data. Each arrow indicates a dependency. As shown in the diagram, the variable we are trying to predict (Reordered) depends directly on Reorder count, Add to cart order, and Aisle ID. In addition, Reorder count and Aisle ID also depend on Add to cart order, which is the only independent variable. The underlying mathematics of the network is explained below.
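The dependency structure can be written down as a small directed acyclic graph. The sketch below is illustrative only: it assumes the edges are the three parents of Reordered plus edges from Add to cart order into Reorder count and Aisle ID (the Aisle ID edge is our reading of the description, not something the diagram confirms here). The node names and the helper `factorization` are hypothetical.

```python
from graphlib import TopologicalSorter

# Each node maps to the set of its parents (the nodes it depends on).
# The aisle_id <- add_to_cart_order edge is an assumption, see the text above.
PARENTS = {
    "add_to_cart_order": set(),
    "reorder_count": {"add_to_cart_order"},
    "aisle_id": {"add_to_cart_order"},
    "reordered": {"reorder_count", "add_to_cart_order", "aisle_id"},
}


def factorization(parents):
    """Return the joint distribution as a product of conditional factors."""
    # static_order() yields every node after all of its parents and raises
    # graphlib.CycleError if the graph is not acyclic.
    order = TopologicalSorter(parents).static_order()
    factors = []
    for node in order:
        given = sorted(parents[node])
        if given:
            factors.append(f"P({node} | {', '.join(given)})")
        else:
            factors.append(f"P({node})")
    return " * ".join(factors)


joint = factorization(PARENTS)
print(joint)
```

Each factor corresponds to one conditional probability table in the network, so the root node gets an unconditional table and Reordered gets a table indexed by all three parents.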

Methodology:

Our goal is to predict

$$P(\text{reordered} = 1 \mid e)$$

Here, reordered = 1 indicates that the product was reordered, and e is the observed evidence, i.e. the event where
Add to cart order = ‘atco1’
Aisle = ‘aisle_id’
Number of times reordered = ‘recount_c’

Using Bayes’ theorem, we calculate the posterior probability as:
$$P(\text{reordered} = 1 \mid e) = \frac{P(e \mid \text{reordered} = 1)\, P(\text{reordered} = 1)}{P(e)}$$

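Because P(e) does not depend on the value of reordered, we can drop it and replace the equality with a proportionality:
$$P(\text{reordered} = 1 \mid e) \propto P(e \mid \text{reordered} = 1)\, P(\text{reordered} = 1)$$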
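Since reordered is binary, the proportional form can be turned back into a probability by normalizing over its two outcomes. The following is a minimal sketch of that step; the function name `posterior_reordered` and the example numbers are hypothetical, and the likelihoods would in practice come from the network's conditional probability tables.

```python
def posterior_reordered(prior, lik_reordered, lik_not_reordered):
    """Return P(reordered=1 | e) for a binary `reordered` variable.

    prior             -- P(reordered=1)
    lik_reordered     -- P(e | reordered=1)
    lik_not_reordered -- P(e | reordered=0)
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be a probability")
    # Unnormalized posteriors for both outcomes (likelihood * prior).
    unnorm_1 = lik_reordered * prior
    unnorm_0 = lik_not_reordered * (1.0 - prior)
    total = unnorm_1 + unnorm_0  # equals P(e) by the law of total probability
    if total == 0.0:
        raise ValueError("evidence has zero probability under both outcomes")
    return unnorm_1 / total


# Made-up numbers: evidence four times as likely when the product is reordered.
print(posterior_reordered(prior=0.5, lik_reordered=0.8, lik_not_reordered=0.2))
```

When both likelihoods are equal the evidence is uninformative and the function returns the prior unchanged.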

Once the posterior is calculated, it becomes our new prior for the next order:
$$P(\text{reordered} = 1) \leftarrow P(\text{reordered} = 1 \mid e)$$
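Reusing the posterior as the next prior amounts to folding the evidence from each order into a running belief. The sketch below illustrates this under the assumption that the evidence from different orders is conditionally independent given reordered; the helper names and the likelihood values are hypothetical.

```python
def update(prior, lik_1, lik_0):
    """One Bayes step: return P(reordered=1 | e) for binary `reordered`."""
    numerator = lik_1 * prior
    return numerator / (numerator + lik_0 * (1.0 - prior))


def sequential_posterior(initial_prior, likelihood_pairs):
    """Apply `update` once per order, reusing each posterior as the next prior.

    likelihood_pairs -- iterable of (P(e | reordered=1), P(e | reordered=0)),
                        one pair per order, oldest order first.
    Returns the list of beliefs after each order.
    """
    belief = initial_prior
    history = []
    for lik_1, lik_0 in likelihood_pairs:
        belief = update(belief, lik_1, lik_0)
        history.append(belief)
    return history


# Made-up likelihoods for three consecutive orders.
history = sequential_posterior(0.5, [(0.8, 0.2), (0.8, 0.2), (0.3, 0.6)])
print(history)
```

The belief rises while the evidence favors a reorder and falls again when a later order points the other way.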

For a user's first order there is no previous posterior, so we calculate an informed prior using the formula:
[formula image: informed prior]
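As an illustration of what an informed prior could look like, the sketch below uses the empirical reorder rate with additive (Laplace) smoothing. This particular choice is an assumption made for the example and may differ from the formula used in the project; `informed_prior` and `pseudo_count` are hypothetical names.

```python
def informed_prior(reordered_flags, pseudo_count=1.0):
    """Smoothed share of order lines that are flagged as reorders.

    Assumption: the prior is the empirical reorder rate with additive
    smoothing, so that an empty history falls back to 0.5.
    """
    n_total = len(reordered_flags)
    n_reordered = sum(reordered_flags)
    return (n_reordered + pseudo_count) / (n_total + 2.0 * pseudo_count)


# Made-up flags: three of four order lines were reorders.
print(informed_prior([1, 1, 0, 1]))
```

Smoothing keeps the prior strictly between 0 and 1, which matters because a prior of exactly 0 or 1 can never be moved by later evidence.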

Prediction:

The predictions are stored in predictions.csv, in the submission format required by the competition.
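A file of this kind can be produced from the per-product posteriors with a few lines of standard-library Python. The sketch below assumes the submission layout is a header `order_id,products` followed by one row per order with space-separated product IDs (or `None` when nothing is predicted), and it assumes a 0.5 decision threshold; neither detail is stated above, and `write_submission` is a hypothetical helper.

```python
import csv
import io


def write_submission(probabilities, out, threshold=0.5):
    """Write one row per order: order_id and space-separated product IDs.

    probabilities -- maps order_id -> {product_id: P(reordered=1 | e)}
    out           -- any writable text file object
    threshold     -- assumed cut-off for predicting a reorder
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["order_id", "products"])
    for order_id in sorted(probabilities):
        scores = probabilities[order_id]
        chosen = sorted(pid for pid, p in scores.items() if p >= threshold)
        # An order with no predicted products is written as the literal "None".
        writer.writerow([order_id, " ".join(map(str, chosen)) or "None"])


# Made-up posteriors for two orders; the second has no candidate products.
buffer = io.StringIO()
write_submission({17: {13107: 0.9, 21463: 0.2}, 34: {}}, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, pass a file opened with `open("predictions.csv", "w", newline="")`.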
