This project uses Instacart data on customer orders over time to predict which previously purchased products will be in a user’s next order. The dataset can be found here.
The aim is to create a Bayesian belief network that analyzes customer behavior and predicts each customer's next purchase, along with the order in which the products are bought in that transaction.
We used “The Instacart Online Grocery Shopping Dataset 2017” for our project. The dataset consists of 75,000 users and 49,000 products. For each product, we are given the aisle and department in which it is located. For each user, we are given the history of their past x transactions (where x ranges from 5 to 100), the order in which each product was added to the cart in those transactions, and whether that user reordered the product.
The image above gives an overview of the dependent and independent variables in our data. An arrow indicates a dependency. As shown in the diagram, the variable we are trying to predict (Reordered) depends directly on Reorder count, Add to cart order and Aisle ID. In addition, Reorder count and Aisle ID depend on Add to cart order, which is the independent variable. The underlying mathematics of the network is explained below.
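As a rough sketch of how the network's observed variables might be derived from the order history, the Python snippet below counts, for each (user, product) pair, how many times the product was reordered and records its add-to-cart position in the most recent order that contains it. The flattened `OrderLine` record is hypothetical: its fields loosely follow the Instacart columns `user_id`, `order_number`, `product_id`, `add_to_cart_order` and `reordered`, which live in separate CSV files in the real dataset. The rows are toy values.

```python
from collections import defaultdict
from typing import NamedTuple


class OrderLine(NamedTuple):
    """One product in one past order (hypothetical flattened record)."""
    user_id: int
    order_number: int
    product_id: int
    add_to_cart_order: int
    reordered: int  # 1 if the user had bought this product before, else 0


def build_features(lines):
    """Map (user_id, product_id) -> (reorder_count, latest_add_to_cart_order)."""
    reorder_count = defaultdict(int)
    latest = {}  # (user_id, product_id) -> (order_number, add_to_cart_order)
    for line in lines:
        key = (line.user_id, line.product_id)
        reorder_count[key] += line.reordered
        # Keep the add-to-cart position from the most recent order.
        if key not in latest or line.order_number > latest[key][0]:
            latest[key] = (line.order_number, line.add_to_cart_order)
    return {key: (reorder_count[key], pos) for key, (_, pos) in latest.items()}


history = [
    OrderLine(1, 1, 196, 1, 0),
    OrderLine(1, 2, 196, 2, 1),
    OrderLine(1, 2, 12427, 1, 0),
    OrderLine(1, 3, 196, 1, 1),
]
print(build_features(history))  # → {(1, 196): (2, 1), (1, 12427): (0, 1)}
```

Taking the add-to-cart position from the most recent order is an arbitrary choice made for this sketch; an average position over all orders would be an equally reasonable feature.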
Our goal is to predict

$$P(\text{reordered} = 1 \mid e)$$

where reordered = 1 indicates that the product was reordered, conditioned on the evidence e. The evidence e is the joint event where
Add to cart order = ‘atco1’
Aisle = ‘aisle_id’
Number of times reordered = ‘recount_c’
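Because Reordered's parents in the network are exactly these three evidence variables, P(reordered = 1 | e) is one entry of that node's conditional probability table. A minimal sketch, assuming discrete evidence values and add-alpha (Laplace) smoothing, estimates it by counting; `fit_cpt` and the toy rows are hypothetical.

```python
from collections import Counter


def fit_cpt(rows, alpha=1.0):
    """Estimate P(reordered = 1 | atco, aisle_id, recount) by smoothed counting.

    rows:  iterable of (atco, aisle_id, recount, reordered) tuples.
    alpha: add-alpha smoothing, so unseen evidence gets 0.5 instead of 0/0.
    """
    ones = Counter()    # times reordered == 1 for each evidence combination
    totals = Counter()  # times each evidence combination was observed
    for atco, aisle_id, recount, reordered in rows:
        e = (atco, aisle_id, recount)
        totals[e] += 1
        ones[e] += reordered

    def prob_reordered(atco, aisle_id, recount):
        e = (atco, aisle_id, recount)
        # Counter returns 0 for unseen keys without inserting them.
        return (ones[e] + alpha) / (totals[e] + 2 * alpha)

    return prob_reordered


rows = [(1, 24, 2, 1), (1, 24, 2, 1), (1, 24, 2, 0), (5, 83, 0, 0)]
p_reordered = fit_cpt(rows)
print(p_reordered(1, 24, 2))  # → 0.6
```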
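Using Bayes’ theorem, we calculate the posterior probability as:

$$P(\text{reordered} = 1 \mid e) = \frac{P(e \mid \text{reordered} = 1)\, P(\text{reordered} = 1)}{P(e)}$$

Since P(e) does not depend on the value of reordered, we can replace the equality with a proportionality:

$$P(\text{reordered} = 1 \mid e) \propto P(e \mid \text{reordered} = 1)\, P(\text{reordered} = 1)$$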
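With a binary target, the proportional form is enough in practice: compute the unnormalized score P(e | reordered = r) P(reordered = r) for r = 1 and r = 0, then divide by their sum, which equals P(e) by the law of total probability. A minimal sketch; the function name and the prior and likelihood values in the usage line are illustrative, not estimates from the dataset.

```python
def posterior_reordered(prior, lik_if_reordered, lik_if_not):
    """Return P(reordered = 1 | e) for a binary target.

    prior:            P(reordered = 1)
    lik_if_reordered: P(e | reordered = 1)
    lik_if_not:       P(e | reordered = 0)
    """
    # Unnormalized scores from the proportional form of Bayes' theorem.
    score_1 = lik_if_reordered * prior
    score_0 = lik_if_not * (1.0 - prior)
    # Their sum is P(e), so dividing by it restores the equality.
    return score_1 / (score_0 + score_1)


print(posterior_reordered(prior=0.3, lik_if_reordered=0.2, lik_if_not=0.05))
```

If both likelihoods are equal, the evidence carries no information and the posterior equals the prior.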
Once the posterior is calculated, it becomes the prior for the user's next order.
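The chaining of posteriors into priors can be sketched by feeding each result back in as the next prior. This sketch assumes the evidence from different orders is conditionally independent given reordered, a simplifying assumption not stated in the text; `update_over_orders` and the likelihood values are illustrative.

```python
def update_over_orders(prior, likelihoods):
    """Chain Bayes updates: each posterior becomes the prior for the next order.

    likelihoods: list of (P(e_t | reordered = 1), P(e_t | reordered = 0))
                 pairs, one per order t, oldest first.
    Returns the posterior P(reordered = 1 | e_1..e_t) after each order.
    """
    history = []
    p = prior
    for lik_1, lik_0 in likelihoods:
        numerator = lik_1 * p
        p = numerator / (numerator + lik_0 * (1.0 - p))  # posterior -> new prior
        history.append(p)
    return history


posteriors = update_over_orders(0.3, [(0.2, 0.05), (0.2, 0.05), (0.01, 0.04)])
print(posteriors)
```

Each update multiplies the prior odds by the likelihood ratio, so in the example the third order (ratio 0.25) exactly cancels the second (ratio 4), and the final posterior matches the one after the first order.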
For the first order, where no earlier posterior is available, we compute an informed prior using the formula:
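One common way to build an informed prior, offered purely as an assumption rather than as the formula used in this project, is to take each product's reorder rate across all users and shrink it toward the global reorder rate with a pseudo-count. `informed_priors`, its `strength` parameter and the toy rows are hypothetical.

```python
from collections import Counter


def informed_priors(rows, strength=10.0):
    """Hypothetical informed prior P(reordered = 1) per product.

    rows:     non-empty iterable of (product_id, reordered) pairs from all
              users' past orders.
    strength: pseudo-count weight given to the global reorder rate.
    """
    rows = list(rows)
    global_rate = sum(r for _, r in rows) / len(rows)
    bought = Counter(pid for pid, _ in rows)
    reordered = Counter(pid for pid, r in rows if r)
    # Products seen rarely stay close to the global rate; frequent ones
    # move toward their own observed reorder rate.
    return {
        pid: (reordered[pid] + strength * global_rate) / (bought[pid] + strength)
        for pid in bought
    }


rows = [(196, 1), (196, 1), (196, 0), (12427, 0)]
priors = informed_priors(rows, strength=2.0)
print(priors)  # → {196: 0.6, 12427: 0.3333333333333333}
```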
The predictions are stored in predictions.csv in the format required for the competition submission.
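A sketch of writing such a file, assuming the Kaggle Instacart Market Basket Analysis layout (an `order_id,products` header, space-separated product IDs, and the literal `None` for an order with no predicted reorders) and an assumed probability cut-off of 0.5. `write_submission` and the scores are hypothetical.

```python
import csv
import io


def write_submission(scores, out, threshold=0.5):
    """Write predictions as order_id,products rows.

    scores:    {order_id: {product_id: P(reordered = 1 | e)}}
    out:       writable text file object, e.g.
               open("predictions.csv", "w", newline="")
    threshold: assumed cut-off for predicting that a product is reordered.
    """
    writer = csv.writer(out)
    writer.writerow(["order_id", "products"])
    for order_id in sorted(scores):
        chosen = [
            str(pid)
            for pid, p in sorted(scores[order_id].items())
            if p >= threshold
        ]
        # An order with no predicted reorders is written as the literal "None".
        writer.writerow([order_id, " ".join(chosen) if chosen else "None"])


buffer = io.StringIO()
write_submission({17: {196: 0.8, 12427: 0.2}, 34: {13176: 0.1}}, buffer)
print(buffer.getvalue())
```

The cut-off trades precision for recall; in practice it would be tuned on held-out orders rather than fixed at 0.5.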