UBC-MDS/salesanalyzer

salesanalyzer_mds

Project Status: Active Documentation Status ci-cd codecov Python Version PyPI

A Python package designed to simplify retail sales data analysis for small to medium-sized businesses. This tool offers a set of pre-built functions that make it easy to identify market segments, predict future sales, and analyze seasonal revenue trends.

Why salesanalyzer_mds?

Small to medium-sized businesses (SMBs) often lack the resources for in-house data teams or complex analytics tools. salesanalyzer_mds bridges that gap with easy-to-use, specialized functions that let businesses extract valuable insights from their sales data without deep expertise in data science.

Key Benefits:

  • Tailored for SMBs: No need for expensive or complex tools. Our package is designed specifically for small to medium-sized businesses to help them make data-driven decisions with ease.
  • Easy-to-use functions: Simple, pre-built functions for common retail sales tasks so you can get started right away.
  • Cost-effective: Instead of hiring a full-time data analytics team or paying for expensive software, this package offers an affordable, one-stop solution to meet your business’s analytical needs.
  • Actionable Insights: Gain a better understanding of your market segments and sales trends, which can inform inventory management, marketing strategies, and customer outreach.

How It Fits into the Python Ecosystem

While existing Python packages such as Pandas and Scikit-learn provide powerful general-purpose tools for data manipulation and machine learning, they require significant customization and specialized knowledge to be applied effectively to retail sales analysis. salesanalyzer_mds complements these tools by streamlining common retail-specific tasks. It provides a suite of pre-built functions tailored to sales data, so businesses don't need to spend time building custom solutions.

Installation

$ pip install salesanalyzer_mds

Functions

  • segment_revenue_share: Segments products into three price categories (cheap, medium, expensive) and calculates each segment's share of total revenue.
  • predict_sales: Predicts future sales based on the provided historical data and the target.
  • sales_summary_statistics: Calculates a variety of summary statistics that provide insights into overall sales performance, customer behavior, and product performance.

Usage

The steps below show how to use salesanalyzer_mds to extract insights from your own sales data.

  1. Set up imports
from salesanalyzer_mds.sales_summary_statistics import sales_summary_statistics
from salesanalyzer_mds.segment_revenue_share import segment_revenue_share
from salesanalyzer_mds.predict_sales import predict_sales
import pandas as pd     # additional import to handle your sales data
  2. Load your sales data as a pandas DataFrame

  3. Retrieve the insights:

Summary statistics

sales_summary_statistics(your_sales_data)

The sales_summary_statistics() function returns a pandas DataFrame with:

  • 'total_revenue': The total revenue generated by all sales.
  • 'unique_customers': The number of unique customers.
  • 'average_order_value': The average value of an order (sum of revenue per invoice).
  • 'top_selling_product_quantity': The product with the highest quantity sold.
  • 'top_selling_product_revenue': The product with the highest total revenue.
  • 'average_revenue_per_customer': The average revenue generated by each customer.
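To make the definitions above concrete, here is a minimal sketch of how statistics like these could be computed directly with pandas. It is not the package's implementation. The column names 'InvoiceNo' and 'CustomerID' and the sample rows are assumptions made for illustration; 'Description', 'Quantity', and 'UnitPrice' follow the column names used elsewhere in this README.

```python
import pandas as pd

# Hypothetical sample data; 'InvoiceNo' and 'CustomerID' are assumed column names.
sales = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3"],
    "CustomerID": [1, 1, 2, 1],
    "Description": ["mug", "pen", "mug", "lamp"],
    "Quantity": [2, 10, 1, 1],
    "UnitPrice": [5.0, 1.0, 5.0, 30.0],
})

# Revenue per row is quantity times unit price.
revenue = sales["Quantity"] * sales["UnitPrice"]

total_revenue = revenue.sum()
unique_customers = sales["CustomerID"].nunique()

# Average order value: sum revenue per invoice, then average across invoices.
average_order_value = revenue.groupby(sales["InvoiceNo"]).sum().mean()

# Top products by units sold and by revenue.
top_selling_product_quantity = sales.groupby("Description")["Quantity"].sum().idxmax()
top_selling_product_revenue = revenue.groupby(sales["Description"]).sum().idxmax()

# Average revenue per customer: sum revenue per customer, then average.
average_revenue_per_customer = revenue.groupby(sales["CustomerID"]).sum().mean()

print(total_revenue, unique_customers, top_selling_product_quantity)
```

In this sample, the product with the most units sold ("pen") differs from the product with the most revenue ("lamp"), which is why the two are reported separately.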

Segment revenue share

segment_revenue_share(your_sales_data, 
                      price_col='UnitPrice', 
                      quantity_col='Quantity',
                      price_thresholds=None)      # replace column names with your data column names

The segment_revenue_share() function returns a pandas DataFrame showing the total revenue share for each price segment: 'cheap', 'medium', 'expensive'. The price thresholds that separate the segments can be set by the user or determined automatically:

  • Custom price thresholds can be set using the price_thresholds parameter.
  • If not specified, thresholds are automatically determined based on the data.
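The idea of price segmentation can be sketched with plain pandas as follows. This is an illustration under stated assumptions, not the package's code: the two cut-off values and the sample rows are hypothetical, and the format that price_thresholds expects is not documented here, so the sketch uses ordinary variables instead.

```python
import pandas as pd

# Hypothetical sample data using the column names from the call above.
sales = pd.DataFrame({
    "UnitPrice": [1.0, 2.0, 10.0, 50.0],
    "Quantity": [10, 5, 2, 1],
})

# Assumed cut-offs: below 5 is cheap, 5 up to 20 is medium, 20 and above is expensive.
low, high = 5.0, 20.0

# Assign each row to a price segment (left-closed intervals).
segment = pd.cut(
    sales["UnitPrice"],
    bins=[float("-inf"), low, high, float("inf")],
    labels=["cheap", "medium", "expensive"],
    right=False,
)

# Revenue per row, then each segment's percentage share of total revenue.
revenue = sales["UnitPrice"] * sales["Quantity"]
share = revenue.groupby(segment, observed=False).sum() / revenue.sum() * 100

print(share.round(2))
```

When thresholds are chosen automatically, one plausible approach is to derive them from the price distribution (for example, quantiles of the price column), but the method the package actually uses is not stated in this README.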

Predict sales

predict_sales(your_sales_data, 
              new_data,     # new sales data to base the predictions on
              numeric_features = ['UnitPrice'],
              categorical_features = ['Description', 'Country'], 
              target = 'Quantity', 
              date_feature = 'InvoiceDate')

The predict_sales() function returns a DataFrame with the predicted values and prints the model's mean squared error (MSE).
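For readers unfamiliar with MSE, the sketch below shows how the metric is defined: the average of the squared differences between actual and predicted values, where lower is better. The helper name and the sample numbers are hypothetical, and this README does not state which data split the package evaluates on.

```python
def mean_squared_error(actual, predicted):
    """Return the mean of squared differences between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one value is required")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical actual and predicted 'Quantity' values.
actual_quantity = [3, 5, 2]
predicted_quantity = [2.5, 5.0, 4.0]

mse = mean_squared_error(actual_quantity, predicted_quantity)
print(f"MSE: {mse:.3f}")
```

Because errors are squared, the result is expressed in squared units of the target (here, squared quantity), and a single large miss weighs more heavily than several small ones.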

Developer notes:

Install Development Version

  1. Clone the repository and navigate into the project root directory.

  2. Create a new environment with Python 3.10:

    conda create -n salesanalyzermds python=3.10
    conda activate salesanalyzermds
  3. Install Poetry by following these instructions, and then run the following bash command to install the necessary dependencies:

    poetry install

Running The Tests

To test the salesanalyzer_mds package, follow the steps below:

  1. Execute the tests using pytest from the root project directory:
pytest tests/
  2. To assess the branch coverage for this package:
pytest --cov=salesanalyzer_mds --cov-branch

Dependencies

This package relies on the following dependencies as outlined in pyproject.toml:

  • python = ">=3.10"
  • scikit-learn = ">=1.6.1"
  • pandas = ">=2.2.3"
  • pytest = ">=8.3.4"
  • jupyter = ">=1.1.1"
  • myst-nb = ">=1.1.2"
  • sphinx-autoapi = ">=3.4.0"
  • sphinx-rtd-theme = ">=3.0.2"

Contributors

  • Yeji Sohn
  • Daria Khon
  • Franklin Aryee

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

salesanalyzer_mds was created by Yeji Sohn, Daria Khon, and Franklin Aryee. It is licensed under the terms of the MIT license.

Credits

salesanalyzer_mds was created with cookiecutter and the py-pkgs-cookiecutter template.