akiibot/sql_retailSalesProject1

Retail Sales Analysis SQL Project

Project Overview

Project Title: Retail Sales Analysis
Database: retailProject

Welcome to the Retail Sales Data Analysis project! This project involves analyzing a retail store's transaction dataset using structured SQL queries. The goal is to extract meaningful business insights regarding customer behavior, product performance, time-based trends, and operational metrics.

Objectives

  1. Set up a retail sales database: Create and populate a retail sales database with the provided sales data.
  2. Data Cleaning: Identify and remove any records with missing or null values.
  3. Exploratory Data Analysis (EDA): Perform basic exploratory data analysis to understand the dataset.
  4. Business Analysis: Use SQL to answer specific business questions and derive insights from the sales data.

📊 Key Analyses


General Insights

  1. Total number of transactions
  2. Unique customers and product categories
  3. Null value checks and cleanup

📅 Time-Based Analysis

  1. Daily and monthly sales trends
  2. Shift-wise performance (Morning, Afternoon, Evening)
  3. Best-selling months per year

🧑 Customer Behavior

  1. Repeat customers
  2. Average age by category
  3. Age group distribution
  4. Gender-based sales performance
  5. Top 5 highest-spending customers

📦 Product & Sales Insights

  1. Top-selling product categories
  2. Quantity sold per category
  3. Price and revenue analysis per category
  4. Transactions above threshold sale values

Project Structure

1. Database Setup

  • Database Creation: The project starts by creating a database named retailProject.
  • Table Creation: A table named retail_sales is created to store the sales data. The table structure includes columns for transaction ID, sale date, sale time, customer ID, gender, age, product category, quantity sold, price per unit, cost of goods sold (COGS), and total sale amount.
-- Create the database
CREATE DATABASE retailProject;

-- Create the retail_sales table
CREATE TABLE retail_sales (
    transactions_id INT PRIMARY KEY,
    sale_date DATE,
    sale_time TIME,
    customer_id INT,
    gender VARCHAR(10),
    age INT,
    category VARCHAR(35),
    quantity INT,
    price_per_unit FLOAT,
    cogs FLOAT,
    total_sale FLOAT
);
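
The objectives mention populating the table with the provided sales data, but the load step is not shown. A minimal sketch of one way to do it in PostgreSQL follows. It assumes the data is a CSV file with a header row and columns in the same order as the table; the file path is a hypothetical placeholder, not the project's actual file.

```sql
-- Assumption: the CSV has a header row and its columns match the table order.
-- The path is a hypothetical placeholder; replace it with the real location.
-- COPY reads the file on the database server, so the server process must be
-- able to access it.
COPY retail_sales (
    transactions_id, sale_date, sale_time, customer_id, gender, age,
    category, quantity, price_per_unit, cogs, total_sale
)
FROM '/path/to/retail_sales.csv'
WITH (FORMAT csv, HEADER true);

-- Quick sanity check after loading
SELECT * FROM retail_sales LIMIT 10;
```

If the file lives on the client machine instead, psql's \copy command or a GUI import wizard is the usual alternative.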

2. Data Exploration & Cleaning

  • Record Count: Determine the total number of records in the dataset.
  • Customer Count: Find out how many unique customers are in the dataset.
  • Category Count: Identify all unique product categories in the dataset.
  • Null Value Check: Check for any null values in the dataset and delete records with missing data.
SELECT COUNT(*) FROM retail_sales;
SELECT COUNT(DISTINCT customer_id) FROM retail_sales;
SELECT DISTINCT category FROM retail_sales;

-- Find records with a missing value in any column
SELECT * FROM retail_sales
WHERE 
    sale_date IS NULL OR sale_time IS NULL OR customer_id IS NULL OR 
    gender IS NULL OR age IS NULL OR category IS NULL OR 
    quantity IS NULL OR price_per_unit IS NULL OR cogs IS NULL OR
    total_sale IS NULL;

-- Delete those records
DELETE FROM retail_sales
WHERE 
    sale_date IS NULL OR sale_time IS NULL OR customer_id IS NULL OR 
    gender IS NULL OR age IS NULL OR category IS NULL OR 
    quantity IS NULL OR price_per_unit IS NULL OR cogs IS NULL OR
    total_sale IS NULL;
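
Before deleting anything, it can help to see which columns actually contain the missing values. The sketch below is one possible way to do that, not part of the original query set. It relies on the fact that COUNT(column) skips NULLs while COUNT(*) counts every row.

```sql
-- Number of NULLs per column: total rows minus non-NULL values in that column
SELECT
    COUNT(*)                          AS total_rows,
    COUNT(*) - COUNT(sale_date)       AS null_sale_date,
    COUNT(*) - COUNT(sale_time)       AS null_sale_time,
    COUNT(*) - COUNT(customer_id)     AS null_customer_id,
    COUNT(*) - COUNT(gender)          AS null_gender,
    COUNT(*) - COUNT(age)             AS null_age,
    COUNT(*) - COUNT(category)        AS null_category,
    COUNT(*) - COUNT(quantity)        AS null_quantity,
    COUNT(*) - COUNT(price_per_unit)  AS null_price_per_unit,
    COUNT(*) - COUNT(cogs)            AS null_cogs,
    COUNT(*) - COUNT(total_sale)      AS null_total_sale
FROM retail_sales;
```

Running it again after the DELETE should return zero in every null_ column.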

3. Data Analysis & Findings

The following SQL queries were developed to answer specific business questions:

  1. Write a SQL query to retrieve all columns for sales made on '2022-11-05':
SELECT *
FROM retail_sales
WHERE sale_date = '2022-11-05';
  2. Write a SQL query to retrieve all transactions where the category is 'Clothing' and the quantity sold is more than 4 in the month of Nov-2022:
SELECT *
FROM retail_sales
WHERE category = 'Clothing'
  AND quantity > 4
  AND sale_date BETWEEN '2022-11-01' AND '2022-11-30';
  3. Write a SQL query to calculate the total sales (total_sale) and number of orders for each category:
SELECT 
    category,
    SUM(total_sale) AS net_sale,
    COUNT(*) AS total_orders
FROM retail_sales
GROUP BY category;
  4. Write a SQL query to find the average age of customers who purchased items from the 'Beauty' category:
SELECT 
    ROUND(AVG(age), 2) AS avg_age
FROM retail_sales
WHERE category = 'Beauty';
  5. Write a SQL query to find all transactions where the total_sale is greater than 1000:
SELECT * FROM retail_sales
WHERE total_sale > 1000;
  6. Write a SQL query to find the total number of transactions (transactions_id) made by each gender in each category:
SELECT 
    category,
    gender,
    COUNT(transactions_id) AS total_transactions
FROM retail_sales
GROUP BY category, gender
ORDER BY category;
  7. Write a SQL query to calculate the average sale for each month and find the best-selling month in each year:
SELECT 
    year,
    month,
    avg_sale
FROM (
    SELECT 
        EXTRACT(YEAR FROM sale_date) AS year,
        EXTRACT(MONTH FROM sale_date) AS month,
        AVG(total_sale) AS avg_sale,
        RANK() OVER (
            PARTITION BY EXTRACT(YEAR FROM sale_date)
            ORDER BY AVG(total_sale) DESC
        ) AS rank
    FROM retail_sales
    GROUP BY year, month
) AS ranked_sales
WHERE rank = 1;
  8. Write a SQL query to find the top 5 customers based on the highest total sales:
SELECT 
    customer_id,
    SUM(total_sale) AS total_sales
FROM retail_sales
GROUP BY customer_id
ORDER BY total_sales DESC
LIMIT 5;
  9. Write a SQL query to find the number of unique customers who purchased items from each category:
SELECT 
    category,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM retail_sales
GROUP BY category;
  10. Write a SQL query to create each shift and count the number of orders per shift (Example: Morning < 12, Afternoon between 12 and 17, Evening > 17). (Learned something new here):
WITH hourly_sales AS (
    SELECT *,
        CASE
            WHEN EXTRACT(HOUR FROM sale_time) < 12 THEN 'Morning'
            WHEN EXTRACT(HOUR FROM sale_time) BETWEEN 12 AND 17 THEN 'Afternoon'
            ELSE 'Evening'
        END AS shift
    FROM retail_sales
)
SELECT 
    shift,
    COUNT(*) AS total_orders
FROM hourly_sales
GROUP BY shift;

11. Top 3 selling product categories (by total revenue)

SELECT 
    category,
    SUM(total_sale) AS total_revenue
FROM retail_sales
GROUP BY category
ORDER BY total_revenue DESC
LIMIT 3;

12. Repeat customers (customers who made more than 1 purchase)

SELECT 
    customer_id,
    COUNT(*) AS total_transactions
FROM retail_sales
GROUP BY customer_id
HAVING COUNT(*) > 1
ORDER BY total_transactions DESC;

13. Daily sales trend (total sale per day)

SELECT 
    sale_date,
    SUM(total_sale) AS daily_sales
FROM retail_sales
GROUP BY sale_date
ORDER BY sale_date;
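
The Key Analyses list also mentions monthly sales trends. The same idea can be rolled up to months; the query below is a minimal sketch of one way to do that in PostgreSQL and is not one of the project's original queries. The sale_month alias is introduced here for illustration.

```sql
-- Monthly sales trend: truncate each sale_date to the first day of its month,
-- then total the revenue and count the orders per month
SELECT
    DATE_TRUNC('month', sale_date)::DATE AS sale_month,
    SUM(total_sale) AS monthly_sales,
    COUNT(*) AS total_orders
FROM retail_sales
GROUP BY sale_month
ORDER BY sale_month;
```

Unlike the average-sale ranking in query 7, this keeps every month in the output, which makes it better suited to plotting a trend line.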

14. Average quantity sold per category

SELECT 
    category,
    ROUND(AVG(quantity), 2) AS avg_quantity_sold
FROM retail_sales
GROUP BY category
ORDER BY avg_quantity_sold DESC;

15. Price analysis per category (maximum, minimum, and average price per unit)

SELECT 
    category,
    MAX(price_per_unit) AS max_price,
    MIN(price_per_unit) AS min_price,
    ROUND(AVG(price_per_unit)::NUMERIC, 2) AS avg_price
FROM retail_sales
GROUP BY category
ORDER BY avg_price DESC;
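
The Key Analyses list names an age group distribution and gender-based sales performance, but no queries for them appear above. The sketch below shows how they might be written. The age bucket boundaries and labels are illustrative assumptions, not values taken from the project.

```sql
-- Age group distribution.
-- Assumption: the bucket boundaries below are illustrative only.
-- Rows with a NULL age would fall into the ELSE branch, so run this
-- after the cleaning step.
WITH age_groups AS (
    SELECT
        customer_id,
        total_sale,
        CASE
            WHEN age < 25 THEN 'Under 25'
            WHEN age BETWEEN 25 AND 39 THEN '25-39'
            WHEN age BETWEEN 40 AND 59 THEN '40-59'
            ELSE '60 and over'
        END AS age_group
    FROM retail_sales
)
SELECT
    age_group,
    COUNT(*) AS total_transactions,
    COUNT(DISTINCT customer_id) AS unique_customers,
    SUM(total_sale) AS total_revenue
FROM age_groups
GROUP BY age_group
ORDER BY total_revenue DESC;

-- Gender-based sales performance: order count, revenue, and average sale
SELECT
    gender,
    COUNT(*) AS total_transactions,
    SUM(total_sale) AS total_revenue,
    ROUND(AVG(total_sale)::NUMERIC, 2) AS avg_sale
FROM retail_sales
GROUP BY gender
ORDER BY total_revenue DESC;
```

The ::NUMERIC cast mirrors query 15: total_sale is a FLOAT column, and PostgreSQL's two-argument ROUND only accepts NUMERIC.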

Findings

  • Customer Demographics: The dataset includes customers from various age groups, with sales distributed across different categories such as Clothing and Beauty.
  • High-Value Transactions: Several transactions had a total sale amount greater than 1000, indicating premium purchases.
  • Sales Trends: Monthly analysis shows variations in sales, helping identify peak seasons.
  • Customer Insights: The analysis identifies the top-spending customers and the most popular product categories.

Reports

  • Sales Summary: A detailed report summarizing total sales, customer demographics, and category performance.
  • Trend Analysis: Insights into sales trends across different months and shifts.
  • Customer Insights: Reports on top customers and unique customer counts per category.

🛠️ Technologies Used

  1. PostgreSQL – For querying and analyzing sales data

  2. DBeaver – PostgreSQL GUI for running queries and visualizing results

Conclusion

I built this project to gain a comprehensive understanding of SQL for data analysis, covering database setup, data cleaning, exploratory data analysis, and business-driven SQL queries. The findings can help drive business decisions by revealing sales patterns, customer behavior, and product performance. This project was inspired by ZERO ANALYST.
