ntehseen/Data-Balancing-with-Gen-AI-Credit-Card-Fraud-Detection

Data Balancing: Fraud Detection using Generative Adversarial Networks (GANs)

Credit Card Fraud Detection using Generative AI

Introduction

This project explores the use of Generative Adversarial Networks (GANs) for fraud detection on credit card transaction data. Fraudulent transactions are rare, so fraud datasets are heavily imbalanced; we train a GAN to generate synthetic fraudulent transactions that can be added to the data to offset that imbalance. The project covers data preprocessing, model architecture design, training and evaluation, and generation of synthetic data for analysis.

  • Accuracy Score for Generative AI: 0.936

Project Outline

  • Load the Dataset
  • Preprocess and Explore the data
  • Create the Generator model
  • Practice Task - Data Preprocessing for Neural Networks
  • Create the Discriminator model
  • Combine Generator and Discriminator models to Build The GAN
  • Train and evaluate our GAN
  • Generate synthetic data using the trained Generator
  • Principal Component Analysis for Data visualization

Importing the Modules

# Importing necessary modules and libraries
import numpy as np
import pandas as pd
import tensorflow as tf
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import TensorBoard
from sklearn.utils import shuffle

Importing the Data

# Load the dataset
data = pd.read_csv('/content/Creditcard_dataset.csv')
data.head()

Check the data shape:

data.shape

Remove rows with NaN values:

data.dropna(inplace=True)
data.shape

Remove the 'Time' column:

data = data.drop(columns='Time')
data.head()

Feature scaling of the 'Amount' column:

scaler = StandardScaler()
data['Amount'] = scaler.fit_transform(data[['Amount']])
data.head()

Let's split the genuine and fraudulent records into separate dataframes:

data_fraud = data[data['Class'] == 1]
data_genuine = data[data['Class'] == 0]
data_fraud.shape, data_genuine.shape  # confirm the size of each split
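
The imbalance that motivates the project can be checked with a per-class count. The sketch below is illustrative only: `class_balance` is a hypothetical helper that is not part of the original notebook, and the small DataFrame stands in for the real dataset.

```python
import pandas as pd


def class_balance(df, label_col="Class"):
    """Return per-class row counts and fractions for a labelled DataFrame."""
    counts = df[label_col].value_counts().sort_index()
    return pd.DataFrame({"count": counts, "fraction": counts / len(df)})


# Toy stand-in for the credit card data (values are made up for illustration).
toy = pd.DataFrame({
    "Amount": [1.0, 2.5, 0.3, 9.9, 4.2],
    "Class": [0, 0, 0, 0, 1],
})
balance = class_balance(toy)
print(balance)
```

On the real data the same call would be `class_balance(data)`.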

Split the data into features and labels:

X = data.drop('Class', axis=1)
y = data['Class']

Data Exploration

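Apply PCA to reduce the features 'X' to two dimensions:

pca = PCA(n_components=2)
transformed_data = pca.fit_transform(X)
df = pd.DataFrame(transformed_data)
# Assign labels by position: after dropna() the index of y may have gaps,
# and index-aligned assignment would misplace labels or introduce NaN
df['label'] = y.values
df.head()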

Use a scatter plot to visualize the data:

PCA Scatter Plot Placeholder
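
The README does not include the plotting code, so the following is only a sketch of how the two PCA components could be drawn. It assumes a DataFrame shaped like `df` above (component columns `0` and `1` plus a `label` column); `plot_pca_scatter` is a hypothetical helper and the data here is random.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_pca_scatter(df, x_col=0, y_col=1, label_col="label"):
    """Scatter the two PCA components, one colour per class label."""
    fig, ax = plt.subplots(figsize=(7, 5))
    for label, group in df.groupby(label_col):
        ax.scatter(group[x_col], group[y_col], s=8, alpha=0.6,
                   label=f"Class {label}")
    ax.set_xlabel("PC 1")
    ax.set_ylabel("PC 2")
    ax.legend()
    return ax


# Random stand-in for the PCA output (not real transaction data).
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 2)))
demo["label"] = rng.integers(0, 2, size=200)
ax = plot_pca_scatter(demo)
```

Outside a notebook, call `plt.show()` to display the figure.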

Building the Generator Model

Define a function to create the Generator model architecture:

# Generator model architecture
def build_generator():
    model = Sequential()
    # Define layers
    ...
    return model

# Build and visualize the generator model
gen = build_generator()
plot_model(gen, to_file='generator_model_plot.png', show_shapes=True, show_layer_names=True)
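
The layers are elided above. As a rough illustration of what a generator for tabular data might look like, here is a minimal sketch. Every number in it is an assumption: the noise size (`LATENT_DIM`), the layer widths, and `N_FEATURES = 29`, which supposes 29 feature columns remain after dropping 'Time' and 'Class' (use `X.shape[1]` on the real data). The name `build_generator_sketch` is hypothetical.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, BatchNormalization
from tensorflow.keras.models import Sequential

LATENT_DIM = 32   # assumed size of the input noise vector
N_FEATURES = 29   # assumed number of feature columns; check X.shape[1]


def build_generator_sketch(latent_dim=LATENT_DIM, n_features=N_FEATURES):
    """Map a noise vector to one synthetic transaction row (illustrative)."""
    return Sequential([
        Input(shape=(latent_dim,)),
        Dense(32, activation="relu"),
        BatchNormalization(),
        Dense(64, activation="relu"),
        BatchNormalization(),
        # Linear output: the scaled features are not confined to a fixed range.
        Dense(n_features, activation="linear"),
    ], name="generator_sketch")


gen_sketch = build_generator_sketch()
# Push a few zero vectors through the untrained network to check shapes.
sample = gen_sketch(np.zeros((4, LATENT_DIM), dtype="float32"))
print(sample.shape)
```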

Building the Discriminator Model

Define a function to create the Discriminator model architecture:

# Discriminator model architecture
def build_discriminator():
    model = Sequential()
    # Define layers
    ...
    return model

# Build and visualize the discriminator model
disc = build_discriminator()
plot_model(disc, to_file='discriminator_model_plot.png', show_shapes=True, show_layer_names=True)
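
The discriminator layers are also elided. A minimal sketch of a binary classifier that scores a transaction row as real (close to 1) or generated (close to 0) could look like this. The layer widths, learning rate, and `N_FEATURES = 29` are assumptions, and `build_discriminator_sketch` is a hypothetical name.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

N_FEATURES = 29   # assumed number of feature columns; check X.shape[1]


def build_discriminator_sketch(n_features=N_FEATURES):
    """Score one transaction row with a probability of being real (illustrative)."""
    model = Sequential([
        Input(shape=(n_features,)),
        Dense(64, activation="relu"),
        Dense(32, activation="relu"),
        # Sigmoid output so the score can be read as a probability.
        Dense(1, activation="sigmoid"),
    ], name="discriminator_sketch")
    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model


disc_sketch = build_discriminator_sketch()
# Score a few zero rows with the untrained network to check shapes.
scores = disc_sketch(np.zeros((4, N_FEATURES), dtype="float32"))
print(scores.shape)
```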

Combine Generator and Discriminator Models to Build The GAN

# Combine Generator and Discriminator models to build the GAN
def build_gan(generator, discriminator):
    ...
    return gan

# Build and visualize the GAN model
gans = build_gan(gen, disc)
plot_model(gans, to_file='gan_model_plot.png', show_shapes=True, show_layer_names=True)
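
The body of `build_gan` is elided. A common way to combine the two networks, shown here only as a sketch, is to stack them and set `discriminator.trainable = False` so that training the combined model updates the generator's weights only. The tiny stand-in models, their sizes, and the name `build_gan_sketch` are assumptions made so the snippet runs on its own.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam


def build_gan_sketch(generator, discriminator):
    """Stack generator and discriminator into one model (illustrative)."""
    # Freeze the discriminator inside the combined model.
    discriminator.trainable = False
    gan = Sequential([generator, discriminator], name="gan_sketch")
    gan.compile(optimizer=Adam(learning_rate=1e-4), loss="binary_crossentropy")
    return gan


# Tiny stand-in networks (hypothetical sizes: 8-dim noise, 4 features).
toy_gen = Sequential([Input(shape=(8,)),
                      Dense(16, activation="relu"),
                      Dense(4)])
toy_disc = Sequential([Input(shape=(4,)),
                       Dense(16, activation="relu"),
                       Dense(1, activation="sigmoid")])
toy_gan = build_gan_sketch(toy_gen, toy_disc)

# Noise in, discriminator score out.
out = toy_gan(np.zeros((3, 8), dtype="float32"))
print(out.shape)
```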

Train and Evaluate our GAN

# Train and evaluate the GAN
...
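
The training code is not shown in the README. The sketch below illustrates the general idea with a `tf.GradientTape` loop that updates the two networks directly, which is a swapped-in technique and not necessarily how the notebook trains through the combined model. It assumes both networks are trainable (`discriminator.trainable` is `True`); the step count, batch size, learning rates, toy models, and random data are all assumptions.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam


def train_gan_sketch(generator, discriminator, real_data, latent_dim,
                     steps=5, batch_size=32, seed=0):
    """One discriminator and one generator update per step; return the losses."""
    bce = tf.keras.losses.BinaryCrossentropy()
    d_opt = Adam(learning_rate=1e-4)
    g_opt = Adam(learning_rate=1e-4)
    rng = np.random.default_rng(seed)
    real_data = np.asarray(real_data, dtype="float32")
    ones = tf.ones((batch_size, 1))
    zeros = tf.zeros((batch_size, 1))
    history = []
    for _ in range(steps):
        # Sample a batch of real rows and an equally sized batch of noise.
        idx = rng.integers(0, len(real_data), size=batch_size)
        real = tf.constant(real_data[idx])
        noise = tf.constant(
            rng.normal(size=(batch_size, latent_dim)).astype("float32"))
        with tf.GradientTape() as d_tape, tf.GradientTape() as g_tape:
            fake = generator(noise, training=True)
            real_score = discriminator(real, training=True)
            fake_score = discriminator(fake, training=True)
            # Discriminator target: real rows -> 1, generated rows -> 0.
            d_loss = bce(ones, real_score) + bce(zeros, fake_score)
            # Generator target: discriminator outputs 1 on generated rows.
            g_loss = bce(ones, fake_score)
        d_vars = discriminator.trainable_variables
        g_vars = generator.trainable_variables
        d_opt.apply_gradients(zip(d_tape.gradient(d_loss, d_vars), d_vars))
        g_opt.apply_gradients(zip(g_tape.gradient(g_loss, g_vars), g_vars))
        history.append((float(d_loss), float(g_loss)))
    return history


# Toy run: random "real" rows with 4 features, 8-dim noise (all hypothetical).
toy_real = np.random.default_rng(1).normal(size=(256, 4))
toy_gen = Sequential([Input(shape=(8,)), Dense(16, activation="relu"), Dense(4)])
toy_disc = Sequential([Input(shape=(4,)), Dense(16, activation="relu"),
                       Dense(1, activation="sigmoid")])
losses = train_gan_sketch(toy_gen, toy_disc, toy_real, latent_dim=8, steps=5)
print(losses[-1])
```

In the project's setting, `real_data` would presumably be the fraud rows only (the features of `data_fraud`), since the goal is to synthesize fraudulent transactions.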

Generate Synthetic Data using the Trained Generator

Generate synthetic data using the trained generator:

# Generate synthetic data using the trained Generator
synthetic_data = generate_synthetic_data(gen, 1000)
...
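
`generate_synthetic_data` is not defined in the README. One plausible shape for it, given only as a sketch, is to sample Gaussian noise, pass it through the generator, and label the resulting rows as fraud so they can be appended to the real data. The function name with the `_sketch` suffix, the `latent_dim`, `columns`, and `seed` parameters, and the stand-in generator class are all hypothetical; the stand-in mimics a Keras model's `predict` and `input_shape` so the snippet runs without a trained network.

```python
import numpy as np
import pandas as pd


def generate_synthetic_data_sketch(generator, n_samples, latent_dim=None,
                                   columns=None, seed=None):
    """Sample noise, run it through the generator, return rows labelled as fraud."""
    if latent_dim is None:
        # Assumes the model exposes input_shape like (None, latent_dim).
        latent_dim = generator.input_shape[-1]
    rng = np.random.default_rng(seed)
    noise = rng.normal(size=(n_samples, latent_dim)).astype("float32")
    rows = np.asarray(generator.predict(noise, verbose=0))
    synthetic = pd.DataFrame(rows, columns=columns)
    synthetic["Class"] = 1  # every generated row is treated as fraudulent
    return synthetic


class StandInGenerator:
    """Hypothetical placeholder for a trained Keras generator."""
    input_shape = (None, 8)

    def predict(self, noise, verbose=0):
        # Not a real model: just returns three scaled noise columns.
        return noise[:, :3] * 2.0


# Toy real data: four genuine rows and one fraud row (made-up values).
real = pd.DataFrame({
    "V1": [0.1, -0.4, 0.3, 0.0, 2.1],
    "V2": [1.2, 0.7, -0.5, 0.2, -3.0],
    "Amount": [0.0, 0.5, -0.2, 0.1, 1.8],
    "Class": [0, 0, 0, 0, 1],
})
synthetic = generate_synthetic_data_sketch(
    StandInGenerator(), n_samples=3, columns=["V1", "V2", "Amount"], seed=0)

# Append the synthetic fraud rows to balance the two classes.
balanced = pd.concat([real, synthetic], ignore_index=True)
print(balanced["Class"].value_counts())
```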

Checking the individual feature distribution of synthetic and real fraud data:

# Plot histograms for feature distribution
...
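
The histogram code is elided as well. A minimal sketch of the comparison, assuming the real fraud rows and the synthetic rows are DataFrames with matching column names, overlays the two distributions feature by feature. `plot_feature_histograms` is a hypothetical helper and the data below is random.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_feature_histograms(real_df, synthetic_df, columns, bins=30):
    """Overlay real and synthetic histograms, one subplot per feature."""
    fig, axes = plt.subplots(1, len(columns),
                             figsize=(4 * len(columns), 3), squeeze=False)
    for ax, col in zip(axes[0], columns):
        ax.hist(real_df[col], bins=bins, alpha=0.5, density=True,
                label="real fraud")
        ax.hist(synthetic_df[col], bins=bins, alpha=0.5, density=True,
                label="synthetic")
        ax.set_title(col)
        ax.legend()
    fig.tight_layout()
    return axes[0]


# Random stand-ins for the two datasets (not real transaction data).
rng = np.random.default_rng(0)
real_demo = pd.DataFrame(rng.normal(0.0, 1.0, size=(300, 2)),
                         columns=["V1", "V2"])
synth_demo = pd.DataFrame(rng.normal(0.2, 1.1, size=(300, 2)),
                          columns=["V1", "V2"])
hist_axes = plot_feature_histograms(real_demo, synth_demo, ["V1", "V2"])
```

Closely overlapping histograms suggest the generator has captured that feature's marginal distribution; they say nothing about relationships between features.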

Model Evaluation

Calculate the accuracy score for Generative AI:

# Calculate accuracy score for Generative AI
...
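
The README does not say how the score below was computed or what it measures. Purely as an illustration of one possible reading, the sketch here computes a discriminator's accuracy at telling real rows (target 1) from synthetic rows (target 0). The function name, the 0.5 threshold, and the example scores are assumptions and are unrelated to the reported result.

```python
import numpy as np
from sklearn.metrics import accuracy_score


def real_vs_synthetic_accuracy(real_scores, synthetic_scores, threshold=0.5):
    """Accuracy of discriminator scores: real should be >= threshold, synthetic below."""
    real_scores = np.ravel(real_scores)
    synthetic_scores = np.ravel(synthetic_scores)
    scores = np.concatenate([real_scores, synthetic_scores])
    y_true = np.concatenate([np.ones(len(real_scores), dtype=int),
                             np.zeros(len(synthetic_scores), dtype=int)])
    y_pred = (scores >= threshold).astype(int)
    return accuracy_score(y_true, y_pred)


# Made-up discriminator outputs: two of three real rows and two of three
# synthetic rows fall on the correct side of the threshold.
acc = real_vs_synthetic_accuracy([0.9, 0.8, 0.4], [0.1, 0.6, 0.2])
print(round(acc, 3))
```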

Accuracy Score for Generative AI: 0.936

Conclusion

In this project, we have demonstrated the application of Generative Adversarial Networks (GANs) in fraud detection, specifically in generating synthetic fraudulent transactions to balance imbalanced datasets. By leveraging advanced techniques in machine learning and neural networks, we aim to enhance the accuracy and effectiveness of fraud detection systems, contributing to a safer and more secure financial environment.
