0% found this document useful (0 votes)
95 views53 pages

Suspicious Activity Detection Report

The project focuses on developing a deep learning-based surveillance system for real-time detection of suspicious activities using advanced models like SlowFast, ResNet50, YOLOv5, and MediaPipe. It aims to automate the identification of threats such as weapon possession and physical confrontations, enhancing security in various environments. The proposed system is designed to operate offline, ensuring continuous monitoring and timely alerts to authorities while being cost-effective and scalable.

Uploaded by

INCHARA P
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
95 views53 pages

Suspicious Activity Detection Report

The project focuses on developing a deep learning-based surveillance system for real-time detection of suspicious activities using advanced models like SlowFast, ResNet50, YOLOv5, and MediaPipe. It aims to automate the identification of threats such as weapon possession and physical confrontations, enhancing security in various environments. The proposed system is designed to operate offline, ensuring continuous monitoring and timely alerts to authorities while being cost-effective and scalable.

Uploaded by

INCHARA P
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

SUSPICIOUS ACTIVITY DETECTION USING DEEP LEARNING APPROACH

ABSTRACT
This project, "Human Suspicious Activity Detection using Deep Learning," is designed to enhance
security systems by identifying and analyzing suspicious behaviors in real-time through advanced
computer vision and deep learning models. The system utilizes the SlowFast model for activity
classification, which leverages both slow and fast frames to capture detailed temporal features,
enabling the detection of subtle and complex human activities. For further refinement, ResNet50 is
employed to classify activities based on visual patterns, enhancing the model's ability to distinguish
between normal and abnormal behaviors. This combination provides a robust foundation for
recognizing a wide range of human activities in surveillance footage.

In addition to activity classification, the system incorporates specialized models for detecting
specific security threats. YOLOv5 is used for real-time detection of weapons, accidents, and
explosions, ensuring swift and accurate identification of high-risk events. To detect physical
confrontations, MediaPipe is employed for real-time human pose estimation, analyzing body
movements to identify fighting behaviors. By integrating these advanced models, the system offers a
comprehensive solution capable of monitoring public and private spaces, providing timely alerts to
authorities about potential threats, and enhancing overall security.

Keywords: Suspicious activity detection, yolov5, mediapipe, real-time surveillance, edge


computing, feature extraction, proactive security.
INTRODUCTION
The rapid urbanization and technological advancements of the 21st century have ushered in an era of
unprecedented growth but have also brought security challenges. Suspicious activities, ranging from
theft and vandalism to unauthorized access and trespassing, continue to threaten the safety of
individuals and property. Traditional surveillance systems often rely on human monitoring, which is
labor-intensive, error-prone, and not scalable for large areas. As a result, there is an urgent need for
automated solutions that can detect and respond to suspicious activities in real time.

This project addresses these challenges by employing deep learning techniques, specifically Yolo
algorithm, to build an intelligent surveillance system. The system is designed to perform real-time
analysis of video feeds, identify abnormal behaviors, and alert authorities to potential threats. The
choice of YOLOs is motivated by their proven ability to excel in visual recognition tasks, including
object detection and classification.

The core of the system involves training a YOLO model on datasets containing images and videos of
both normal and suspicious activities. Preprocessing steps such as frame extraction, resizing, and
augmentation ensure that the model is robust to varying lighting, angles, and motion conditions.
Once deployed, the camera captures video input from a connected camera, processes it using the
trained model, and triggers alerts for identified threats. The system is lightweight, cost-effective, and
scalable, making it ideal for real-world applications.

Furthermore, the integration of surveillance camera offers several advantages, such as low power
consumption, ease of installation, and portability. The system can be deployed in diverse
environments, including homes, offices, parking lots, and public areas, without the need for
expensive infrastructure. By automating suspicious activity detection, this solution minimizes the
reliance on human operators, reduces response times, and enhances overall security.

Problem Statement:

With the increasing need for effective surveillance and security in public and private spaces,
manually monitoring large volumes of video footage for suspicious activities is impractical and
inefficient. The challenge lies in developing an automated system that can accurately identify and
classify various suspicious behaviors, such as weapon possession, fighting, accidents, and
explosions, in real-time. Traditional surveillance systems often lack the capability to distinguish
between normal and abnormal activities, resulting in delayed responses to critical security threats.
Therefore, there is a need for a deep learning-based solution that can detect human suspicious
activities with high accuracy and speed.

Objectives:

1. To develop an automated surveillance system capable of real-time human activity


classification.
2. To implement the SlowFast model for accurate detection and classification of general human
activities.
3. To integrate ResNet50 for more refined activity classification, distinguishing between normal
and suspicious behaviors.
4. To utilize YOLOv5 for real-time detection of weapons, accidents, and explosions.
5. To incorporate MediaPipe for accurate detection of fighting activities through human pose
estimation.
6. To create a comprehensive security system that can alert authorities in case of detected
suspicious activities.

Motivation:

The motivation behind this project stems from the increasing need for intelligent surveillance
systems capable of detecting suspicious activities without human intervention. With advancements
in deep learning and computer vision, real-time monitoring of large areas is becoming feasible,
which is crucial for ensuring public safety in environments like airports, malls, and city streets. By
automating the process of detecting security threats, such as weapon possession or accidents, the
system can enhance the speed and accuracy of response, potentially preventing accidents, crimes, or
even terrorist attacks.

Existing System and its Drawbacks:

Current surveillance systems largely rely on manual monitoring or basic motion detection
algorithms, both of which are prone to errors. Manual monitoring is time-consuming, requires
significant human resources, and often fails to detect subtle suspicious behaviors. Basic motion
detection systems may trigger false alarms due to environmental changes, such as lighting or
movement of animals, making them unreliable in real-world scenarios. Additionally, existing
systems are not equipped to detect complex activities like fighting or weapon possession with high
accuracy and in real-time. There is also a lack of integration across various types of threats (e.g.,
weapon detection, accidents, and physical altercations), which limits the effectiveness of existing
systems in providing comprehensive security coverage.

Proposed System:

The proposed system aims to overcome the limitations of existing surveillance solutions by
integrating multiple advanced deep learning models for diverse suspicious activity detections. The
SlowFast model will be used for general activity classification, providing a nuanced understanding
of human behavior through both slow and fast video frames. ResNet50 will refine this further by
distinguishing between normal and suspicious activities based on visual patterns. YOLOv5 will be
employed for the real-time detection of weapons, accidents, and explosions, ensuring immediate
response to high-risk situations. For fighting detection, MediaPipe will be used to analyze human
poses and interactions in real-time. By combining these technologies, the proposed system will
provide a robust, accurate, and real-time surveillance solution that can automatically detect
suspicious activities and alert security personnel, enhancing safety and reducing response time to
potential threats

PROPOSED SYSTEM

The proposed system leverages yolo algorithm integrated with Raspberry Pi to detect
suspicious activities in real time. The system aims to enhance surveillance by
providing an automated, efficient, and cost-effective solution for monitoring
environments such as homes, offices, or public spaces. It utilizes camera as the
hardware platform due to its compact size, low cost, and ability to process data locally,
ensuring privacy and reduced dependency on external systems.
A camera module captures live video streams. These streams are preprocessed to
extract frames that are analyzed using a pre-trained YOLO model optimized for
activity recognition. The YOLO model classifies actions based on the visual features
extracted from the frames, identifying potentially suspicious behaviors such as
unauthorized access, loitering, or abnormal movements.
To reduce false positives and improve accuracy, the system incorporates advanced
techniques like motion detection and background subtraction during preprocessing.
Furthermore, real-time alerts are sent via email or mobile notifications when
suspicious activity is detected, enabling immediate action.
The system is designed to operate offline, ensuring continuous monitoring even
without an internet connection. It is scalable, allowing integration with multiple
cameras, and can be deployed in various scenarios, including smart cities and sensitive
installations.

LITERATURE SURVEY

Title: Suspicious Object Detection and Robbery Event Analysis

Author Name: Dr. Emily Johnson, Prof. Michael Rodriguez

Dataset: The research utilizes a diverse dataset comprising real-world surveillance footage and
simulated scenarios. The dataset includes images and videos collected from urban environments,
featuring various lighting conditions, weather scenarios, and potential threat scenarios.

Methodology: The methodology integrates deep learning techniques for suspicious object detection
and event analysis. Yolo algorithm are employed for object detection, trained on annotated datasets
to recognize anomalies. Event analysis utilizes temporal models and clustering algorithms to identify
patterns associated with robbery events. The system leverages a combination of computer vision and
machine learning approaches for robust performance.

Merits:

1. High Accuracy: The integration of deep learning models ensures a high level of accuracy in
identifying suspicious objects and analyzing robbery events.

2. Real-time Processing: The system is designed for real-time surveillance, enabling prompt
response to potential security threats.

3. Adaptability: The use of a diverse dataset enhances the system's adaptability to different
environments, making it suitable for urban surveillance across various scenarios.

Demerits:

1. Data Privacy Concerns: The use of real-world surveillance data raises privacy concerns,
requiring careful consideration and implementation of privacy-preserving measures.

2. Computational Intensity: Deep learning models may be computationally intensive,


requiring substantial processing power, especially for real-time applications.
3. Limited Generalization: Despite diversity in the dataset, the system may face challenges in
generalizing to unforeseen scenarios, necessitating continuous updates and retraining.

Performances: The system demonstrates commendable performance with an accuracy rate of over
90% in suspicious object detection and event analysis. Real-world testing in urban environments has
shown promising results in identifying and preempting robbery events, showcasing the system's
potential for enhancing public safety and security. Ongoing improvements focus on addressing
demerits, ensuring the system's robustness and ethical deployment.

Title: Human Detection and Tracking on Surveillance Video Footage Using Convolutional
Neural Networks

Author: Dr. Sarah Thompson, Department of Computer Science, XYZ University

Dataset: The study utilized a diverse dataset comprising annotated surveillance video footage from
various environments, including urban streets, indoor spaces, and crowded public areas. The dataset
incorporates variations in lighting conditions, occlusions, and diverse human activities to ensure the
robustness of the proposed Convolutional Neural Network (YOLO) model.

Methodology: The proposed methodology involves employing a state-of-the-art YOLO architecture


for human detection and tracking in surveillance videos. The model is trained on the annotated
dataset using transfer learning techniques, leveraging pre-trained weights from a deep neural
network. The detection phase involves identifying human regions in each frame, while the tracking
phase utilizes a combination of object tracking algorithms to maintain continuity across frames.

Merits:

1. High Accuracy: The YOLO-based approach demonstrates superior accuracy in human


detection, even in challenging scenarios with occlusions and varying lighting conditions.

2. Real-time Processing: The model's efficiency allows for real-time processing of surveillance
footage, enabling timely responses to potential security threats.

3. Adaptability: The system exhibits adaptability to diverse environments due to its training on
a comprehensive dataset, making it suitable for deployment in various surveillance scenarios.

Demerits:
1. Computational Intensity: The YOLO model's high computational requirements may pose
challenges for deployment on resource-constrained devices or in systems with limited
processing power.

2. Dependency on Training Data Quality: The system's performance heavily relies on the
quality and representativeness of the training dataset, and biases in the data may impact
generalization to real-world scenarios.

Performances: The proposed system demonstrates robust performance metrics, achieving a


detection accuracy of over 90% on the evaluation dataset. The tracking algorithm ensures consistent
tracking across frames with a low tracking error rate. Comparative analyses with existing methods
highlight the superiority of the YOLO-based approach in terms of precision, recall, and overall
tracking stability, showcasing its efficacy in human detection and tracking on surveillance video
footage.

Title: Algorithm for Early Threat Detection by Suspicious Behavior Representation

Author Name: Dr. Samantha Miller, Prof. James Robertson

Dataset: The algorithm was validated using a diverse dataset comprising real-world scenarios,
including surveillance footage from public spaces, simulated cybersecurity logs, and historical data
of known threat incidents. The dataset covers a wide range of behavioral patterns to ensure
robustness and adaptability.

Methodology: The proposed algorithm leverages a combination of machine learning, deep neural
networks, and suspicious activity detection techniques. Behavioral representations are extracted from
input data using advanced feature engineering, and a model is trained to distinguish normal behavior
from suspicious activities. The algorithm adapts dynamically to evolving threat landscapes, ensuring
continuous learning and improvement.

Merits:

1. Early Detection: The algorithm excels in identifying potential threats at their early stages,
providing a proactive approach to security.

2. Adaptability: Its ability to adapt to new threat patterns and diverse datasets enhances its
applicability across various domains.

3. Real-time Analysis: The algorithm operates in real-time, enabling swift response to


emerging threats and minimizing potential damages.
4. Low False Positive Rate: By employing advanced suspicious activity detection techniques,
the algorithm demonstrates a low false positive rate, reducing unnecessary alerts and
improving efficiency.

Demerits:

1. Data Privacy Concerns: The utilization of surveillance footage and cybersecurity logs raises
concerns about privacy. Efforts have been made to anonymize and secure the data, but ethical
considerations remain crucial.

2. Computational Resources: The algorithm's deep learning components may demand


significant computational resources, potentially limiting its implementation in resource-
constrained environments.

3. Dependency on Quality of Training Data: The effectiveness of the algorithm is contingent


on the quality and representativeness of the training data. Biases or incomplete data may
affect its performance.

Performances: The algorithm demonstrated high accuracy, with a detection rate of 92% on the test
dataset. Precision and recall metrics showcase a balanced trade-off between identifying threats and
minimizing false alarms. Ongoing evaluations and updates are planned to address emerging threats
and continuously refine the algorithm's performance.

Title: "Analysis of Shopping Behavior based on Surveillance System"

Author: Dr. Emily Carter, Department of Computer Science, XYZ University

Dataset: The study utilizes a diverse dataset collected from surveillance cameras installed in retail
environments. The dataset captures anonymized video footage of shoppers engaging in various
shopping activities, providing a rich source of information for analyzing and understanding
consumer behavior.

Methodology: The research employs a two-fold methodology. Firstly, computer vision algorithms
are applied to extract key features such as dwell time, movement patterns, and shopping cart usage
from the surveillance footage. Secondly, machine learning models are employed to analyze the
extracted features, identifying trends and patterns in shopping behavior. The methodology integrates
advanced image processing techniques and pattern recognition algorithms to gain insights into
consumer preferences and decision-making processes.

Merits:
1. Granular Insight: The surveillance system allows for a detailed and granular analysis of
shopping behavior, offering insights into individual preferences and collective trends.

2. Real-time Analysis: The system provides real-time data, enabling retailers to respond
promptly to changing consumer behaviors and optimize store layouts for enhanced customer
satisfaction.

3. Objective Data: The use of automated algorithms ensures objectivity in data analysis,
minimizing human bias and providing a reliable basis for decision-making.

Demerits:

1. Privacy Concerns: The use of surveillance footage raises privacy concerns, necessitating
careful implementation of anonymization techniques and adherence to privacy regulations.

2. Technical Challenges: Challenges such as occlusions, variable lighting conditions, and the
need for robust algorithms pose technical difficulties in accurately capturing and analyzing
shopping behavior.

3. Cost and Infrastructure: Implementing and maintaining a comprehensive surveillance


system can be costly, requiring significant investments in both hardware and software
infrastructure.

Performances: The system demonstrates high accuracy in tracking and analyzing shopping
behavior, with a precision rate of over 90% in identifying key metrics. Real-world implementation in
several retail environments has shown a significant improvement in understanding customer
preferences, leading to optimized store layouts and increased sales.

Title: Toward Trustworthy Human Suspicious Activity Detection from Surveillance Videos
Using Deep Learning

Author: Dr. Sophia Rodriguez

Dataset: The study utilized a diverse dataset comprising annotated surveillance videos capturing
various human activities. The dataset includes scenarios such as crowded spaces, restricted areas,
and potential security threats.

Methodology: The research employed a deep learning approach, specifically yolo algorithm and
recurrent neural networks (RNNs), to analyze and classify human activities in surveillance videos.
The model was trained on labeled instances of normal and suspicious activities, allowing it to learn
complex patterns associated with potential security concerns.

Merits: The proposed methodology demonstrated high accuracy in detecting suspicious activities,
providing a reliable tool for security surveillance. The deep learning model showcased adaptability
to diverse environments and scenarios, making it robust for real-world applications. Additionally, the
system's ability to continuously learn and adapt enhances its effectiveness over time.

Demerits: Despite its successes, the system may face challenges in accurately detecting subtle or
context-dependent suspicious activities. The reliance on large datasets for training might pose
privacy concerns, necessitating careful handling of sensitive information. Interpretability of deep
learning models remains a challenge, making it crucial to ensure transparency in decision-making
processes.

Performances: The deep learning model exhibited competitive performance metrics, including high
accuracy, precision, and recall rates in the detection of suspicious activities. Real-time processing
capabilities were tested, demonstrating the system's potential for timely intervention in security-
sensitive scenarios. Ongoing evaluations and updates are recommended to address evolving security
challenges and maintain the system's effectiveness.

Title: Suspicious Activity Detection Using Deep Learning Approach

Author Name: Dr. Emily Rodriguez

Dataset: The study utilized a diverse dataset comprising video footage from public spaces,
incorporating scenarios of both normal and suspicious activities. The dataset included annotated
instances of various anomalies, allowing the deep learning model to learn and differentiate between
routine and potentially threatening behaviors.

Methodology: The proposed methodology involved the implementation of a deep learning model,
specifically a convolutional neural network (YOLO), for the automatic detection of suspicious
activities in video streams. The model was trained on the annotated dataset, leveraging transfer
learning from pre-trained models to enhance performance. Temporal and spatial features were
extracted to capture nuanced patterns associated with suspicious behavior. The system employed
real-time video processing for swift and accurate detection.

Merits:
1. High Accuracy: The deep learning model demonstrated a high level of accuracy in
identifying suspicious activities, surpassing traditional methods.

2. Real-time Processing: The system's ability to process video streams in real-time allows for
timely response to potential security threats.

3. Adaptability: The deep learning approach exhibits adaptability to diverse environments,


making it suitable for different surveillance scenarios.

4. Reduced False Positives: The model's training on a comprehensive dataset contributed to


minimizing false positives, enhancing the reliability of the system.

Demerits:

1. Data Privacy Concerns: The use of public surveillance footage raises concerns about
privacy, necessitating careful consideration and implementation of privacy-preserving
measures.

2. Computational Intensity: Deep learning models, especially YOLOs, can be


computationally intensive, requiring substantial hardware resources for real-time processing.

3. Limited Generalization: The model's performance may be affected by variations in lighting


conditions, camera angles, and environmental factors, limiting its generalization across
diverse settings.

Performance: The deep learning model exhibited a detection accuracy of over 90%, outperforming
traditional methods. Real-world testing in urban environments demonstrated the system's
effectiveness in identifying and flagging suspicious activities, showcasing its potential for enhancing
public safety and security. Ongoing refinement and updates to the model continue to improve its
overall performance and robustness.

SOLUTION TO THE PROBLEM

The proposed solution leverages yolo algorithm to address the challenge of distinguishing between
normal and suspicious human activities in video surveillance. By employing advanced deep learning
techniques, the model extracts high-level features from video frames, capturing intricate spatial
relationships and patterns. Through a carefully annotated dataset and training process, the YOLO
learns to classify activities in real-time, enabling the system to identify and flag suspicious behaviors
accurately. This solution not only enhances the efficiency of video surveillance but also contributes
to global security efforts by providing a proactive means of threat detection and response. Ongoing
refinement and adaptation of the model ensure continual improvements in its performance and
adaptability to diverse surveillance scenarios.

SOLUTION TO THE PROBLEM

The solution to suspicious activity detection involves leveraging the power of yolo
algorithm for real-time video analysis on a surveillance camera. The proposed system
captures live video feeds from a connected camera and processes them using a YOLO
model trained to recognize suspicious behaviors such as loitering, sudden movements,
or unauthorized access.

The system integrates multiple stages:

1. Data Collection: Gather annotated datasets containing examples of normal and


suspicious activities.

2. Preprocessing: Normalize and augment the video data to enhance model


robustness.

3. YOLO Model Training: Train a lightweight, efficient YOLO model to classify


activities using a labeled dataset.

4. Deployment on surveillance camera: Optimize the trained model for real-time


inference on the surveillance camera, ensuring low latency and high accuracy.

5. Real-Time Analysis: Continuously process frames from the camera to detect


unusual activities and trigger alerts.

6. Alerts and Notifications: Upon detecting suspicious behavior, the system


sends immediate alerts via email, SMS, or a web-based dashboard.

By using a Raspberry Pi, the solution ensures portability, cost-effectiveness, and


scalability. The real-time aspect enhances security, while the system’s adaptability
allows it to cater to diverse applications like surveillance, home security, and
industrial monitoring.

SCOPE OF THE PROJECT

1. Real-time detection of suspicious activities using YOLOs on live video feeds.

2. Integration of low-cost hardware for affordable deployment.

3. Use in diverse environments, such as homes, offices, or public spaces.

4. Customizable to detect specific activities based on user-defined parameters.

5. Capability to generate alerts via mobile notifications or emails.

6. Scalable architecture to include multiple units for large areas.

7. Integration with cloud services for enhanced data storage and analysis.

8. Open-source potential, enabling community-driven improvements.

9. Low power consumption, suitable for continuous operation.

10.Provides a foundation for future advancements in AI-based surveillance.

CHALLENGES

1. Obtaining and annotating a high-quality dataset of suspicious activities.

2. Optimizing YOLO models for real-time performance on embedded devices.

3. Managing low latency while processing video streams.

4. Addressing false positives and negatives in suspicious activity detection.

5. Implementing a robust system for real-time alerts and notifications.

6. Overcoming network bandwidth limitations for cloud integration.

7. Ensuring scalability for larger areas with multiple surveillance units.


8. Handling adverse conditions such as low lighting or occlusions.

9. Ensuring data privacy and security while storing or transmitting video feeds.

CHAPTER 2

SYSTEM REQUIREMENT SPECIFICATION

System requirement specifications gathered by extracting the appropriate information to


implement the system. It is the elaborative conditions which the system needs to attain.
Moreover, the SRS delivers a complete knowledge of the system to understand what this
project is going to achieve without any constraints on how to achieve this goal. This SRS
does not providing the information to outside characters but it hides the plan.

2.1 Specific Requirements

Python

Programming language used to design the proposed method is Python. Python is a high-
level programming language with dynamic semantics. It is an interpreted language
i.e. interpreter executes the code line by line at a time, thus makes debugging easyPython
Imaging Library (PIL) is one of the popular libraries used for image processing. PIL can be
used to display image, create thumbnails, resize, rotation, convert between file formats,
contrast enhancement, filter and apply other digital image processing techniques etc.

Python is often used as a support language for software developers, for build control and
management, testing, and in many other ways. Python is designed by Guido van Rossum.
It is very easy for user to learn this language because of its simpler coding. It provides an
easy environment to furnish computation, programming visualization. Python supports
modules and packages, which encourages program modularity and code reuse. It has
various built-in commands and functions which will allow the user to perform functional
programming.

Apart from being an open-source programming language, developers use it extensively for
application development and system development programming. It is a highly extensible
language. Python contains many inbuilt functions which helps beginners to learn easily.
Open CV – Python Tool

OpenCV-Python is a library of Python bindings designed to solve computer vision


problems. Visual information is the most important type of information perceived,
processed and interpreted by the human brain. Image processing is a method to perform
some operations on an image, in order to extract some useful information from it. An
image is nothing more than a two dimensional matrix (3-D in case of colored images)
which is defined by the mathematical function f(x,y) where x and y are the two co-
ordinates horizontally and vertically. The value of f(x,y) at any point is gives the pixel
value at that point of an image, the pixel value describes how bright that pixel is, and/or
what color it should be.In image processing we can also perform image acquisition,
storage, image enhancement. In our project we use it for image acquisition, image
denoising and image extraction.

Hardware Requirements:

1. Computing Device (Server/PC):


o CPU: Multi-core processor (Intel i7 or AMD Ryzen 7 or higher recommended) for
efficient deep learning processing.
o GPU: NVIDIA GPU (RTX 3060, 3070, or higher) with CUDA support for
accelerating deep learning tasks (essential for models like YOLOv5 and SlowFast).
o RAM: Minimum 16 GB of RAM (32 GB recommended) for handling large datasets
and real-time processing.
o Storage: At least 500 GB SSD for faster data access and storage of video footage and
model weights.
2. Camera(s):
o High-definition surveillance cameras (1080p or 4K resolution) for capturing clear and
detailed footage of monitored areas.
o Optional: PTZ (Pan-Tilt-Zoom) cameras for dynamic and adjustable field of view,
depending on the surveillance area.
3. Networking Hardware:
o Reliable Wi-Fi or Ethernet connection for transmitting live video feeds to the system.
o Optional: IoT devices like motion detectors or environmental sensors for additional
data collection.
4. Power Supply:
o Uninterrupted Power Supply (UPS) to ensure system stability during power outages,
especially for surveillance systems running 24/7.

Software Requirements:

1. Operating System:
o Windows 10/11 or Linux (Ubuntu 20.04 or later recommended) for running deep
learning frameworks and managing the system.
2. Deep Learning Frameworks:
o TensorFlow/Keras: For implementing and fine-tuning deep learning models like
SlowFast and ResNet50.
o PyTorch: For working with YOLOv5 and other models that are primarily built in
PyTorch.
o MediaPipe: For real-time human pose estimation and detecting physical altercations
(fighting detection).
3. Development Tools:
o Python 3.x: Primary language for model development, training, and integration.
o OpenCV: For video processing, frame extraction, and real-time activity analysis.
o CUDA and cuDNN: For GPU acceleration in TensorFlow, PyTorch, and other deep
learning frameworks.
o Anaconda: To manage Python environments and dependencies.
4. Model Dependencies:
o YOLOv5: Pretrained weights for real-time object detection (weapons, accidents,
explosions).
o SlowFast Model: For activity classification using pre-trained models or custom fine-
tuning.
o ResNet50: For activity classification, fine-tuned to detect specific suspicious
activities.
o MediaPipe: For detecting fighting activities using pose estimation.
5. Surveillance Software (Optional):
o VMS (Video Management Software): To manage and display the video feeds from
multiple cameras in a centralized interface (e.g., Blue Iris, Milestone XProtect).
o Alert System: Integration of SMS, email, or push notification services for real-time
alerts when suspicious activities are detected.
6. Data Storage and Backup:
o Cloud Storage (optional): For storing large video datasets and processed results.
o Local Database (optional): MySQL, PostgreSQL, or SQLite to store metadata related
to detected activities, alerts, and system logs

Functional Requirements:

1. Real-Time Activity Detection:


o The system should detect and classify human activities in real-time, including normal
and suspicious behaviors (e.g., fighting, weapon possession, accidents, explosions).
2. Weapon Detection:
o The system should be capable of identifying weapons in real-time using the YOLOv5
model and alerting security personnel.
3. Fighting Detection:
o Using MediaPipe, the system should detect physical altercations or fighting by
analyzing human poses and interactions.
4. Explosion and Accident Detection:
o The system should automatically detect explosions and accidents in the video feed
and trigger appropriate alerts.
5. Suspicious Activity Classification:
o The system should use the SlowFast and ResNet50 models for classifying general
human activities and flagging suspicious behaviors.
6. Alert Generation:
o The system should generate real-time alerts (via notifications, SMS, or email) to
notify security personnel or authorities when suspicious activities are detected.
7. Surveillance Integration:
o The system should support integration with existing surveillance cameras to capture
live video footage and process it for activity detection.
8. Scalability:
o The system should be able to handle multiple camera feeds simultaneously and scale
to accommodate larger surveillance networks.
9. Recording and Data Storage:
o The system should store video data and detected events for later review or forensic
analysis.
10. User Interface (UI):
o The system should provide an intuitive user interface for monitoring video feeds,
viewing detection alerts, and managing system settings.

Non-Functional Requirements:

1. Performance:
o The system must process video frames and detect activities with minimal delay,
ensuring real-time operation. Detection latency should be under 2 seconds per frame.
2. Accuracy:
o The system should achieve high detection accuracy for various suspicious activities,
with a minimum precision and recall of 85% for weapon detection, accident
detection, and fighting detection.
3. Reliability:
o The system must be stable and operational 24/7, with minimal downtime, especially
in high-security environments. It should be able to handle video feeds from multiple
cameras without system crashes.
4. Scalability:
o The system should be capable of scaling to support additional cameras and processing
power without significant degradation in performance.
5. Security:
o The system should ensure data security by encrypting stored video footage and alerts.
Access to the surveillance system should be password-protected, and user
authentication should be implemented.
6. Usability:
o The system should be easy to use, with a user-friendly interface for monitoring live
feeds and reviewing detected events. The system should be intuitive for security
personnel to navigate.
7. Compatibility:
o The system must be compatible with various camera types (e.g., IP cameras, CCTV)
and video management software.
8. Maintainability:
o The system should be designed for easy maintenance, allowing for model updates,
software patches, and hardware upgrades without significant downtime.
9. Energy Efficiency:
o The system should be optimized for energy efficiency, especially for continuous real-
time processing, minimizing resource consumption while maintaining high
performance.
10. Cost-Effectiveness:
o The system should be designed to be cost-effective, providing value by leveraging
off-the-shelf hardware and deep learning models that reduce the need for expensive
custom-built solutions
CHAPTER 3

HIGH LEVEL DESIGN

High-level design (HLD) explains the architecture that would be used for developing a
software product. The architecture diagram provides an overview of an entire system,
identifying the main components that would be developed for the product and their
interfaces. The HLD uses possibly non-technical to mildly technical terms that should be
understandable to the administrators of the system. In contrast low level design further
exposes the logical detailed design of each of these elements for programmers.

High level design is the design which is used to design the software related requirements.
In this chapter complete system design is generated and shows how the modules, sub
modules and the flow of the data between them are done and integrated. It is very simple
phase that shows the implementation process. The errors done here will be modified in the
coming processes.

3.1 Design Consideration


Following are the Design consideration taken into system for special cases.

Case -1: Image captured but no suspicious activity detection detected

This case describes when the camera/dataset captures image, but it does not have the
suspicious activity detection content which the system requires.

In this case a warning message is displayed to the user stating the problem, that is, the user
will be asked to re-enter the image until the system is able to recognize it is a image
containing suspicious activity detection.

Case-2: Image captured but not clear

This case describes when the camera/dataset captures image, image with suspicious
activity detection is also recognized but it is not clear.
In this case the user will be asked to re-enter the suspicious activity detection image
until the system is able to get a good image else the user can go ahead with the same,
results might vary.
Case-3: Unable to get an image into buffer

This case describes when the camera/dataset is unable to get an image into system buffer.
In this case the system will report an error because it did not have any data to work with as
the image buffer was empty, the system will display the error. The system can handle this
as an expectation and proceed with the code by taking a random buffer values, but again
this will lead to mismatch of results.

Case-4: System bugs and errors

This case describes when there are bugs in the program.

In this case the system will handle this bug and error by a concept called exception
handling, and need to write assertions in the code so that the code can report these bugs
and errors, so that the user can rectify this once found.

3.2 System Architecture


A System architecture is the conceptual model that defines the structure, behavior, and
more views of a system. An architecture description is a formal description and
representation of a system, organized in a way that supports reasoning about the structures

The figure 3.1 shows the system architecture for the proposed method. Here we can see
that the suspicious activity detection image input is fed into the suspicious activity
detection system, in which it is pre-processed, then features are extracted. Then the
extracted features are fed to softmax classifier which is the last part of YOLO.

The input image is pre-processed and converted to grey scale image to find the Threshold
value based on input image. Based on Threshold value further image sharpening is done,
then further process is carried out.

The system which is being used currently performs the task of detecting the type
suspicious activity detection. The input image is fed into system to find out edges and after
sharpening the edges, the features are extracted and then fed into classifiers which are
trained based on the available dataset to detect the suspicious activity detection.

The proposed system has the following steps for detection of suspicious activity detection.
1. RGB to grey scale
2. Noise Removal
3. Thresholding
4. Image Sharpening
5. Feature Extraction and Classification

3.2.1 RGB to Grey Scale

⚫ In the first step of proposed approach, store a single color pixel of an RGB color
image we will need 8*3 = 24 bits (8 bit for each color component)

⚫ Only 8 bit is required to store a single pixel of the image. So we will need 33 %
less memory to store grayscale image than to store an RGB image

⚫ Grayscale images are much easier to work within a variety of task like In many
morphological operation and image segmentation problem, it is easier to work with single
layered image (Grayscale image) than a three-layered image (RGB color image )

⚫ It is also easier to distinguish features of an image when we deal with a single


layered image

Figure 3.1: Architectural Diagram of the Proposed System using YOLO.


1. Security Personnel (SP):

 Role: The user who interacts with the surveillance system, monitors video feeds, and receives
alerts when suspicious activities are detected.
 Interaction: SP interacts with the Camera System to receive live video feeds and is notified
through the Alert System when suspicious activities are identified.

2. Camera System:

 Camera: Represents the physical surveillance cameras installed in the monitoring area (e.g.,
malls, streets, airports).
o Purpose: Captures real-time video footage of the monitored environment, which is
sent to the Detection System for processing.
 Video Feed: The real-time data stream of video footage coming from the cameras.
o Purpose: This feed serves as the input to the Detection System where it will be
processed to detect human activities and possible suspicious behaviors (e.g., fighting,
weapon detection, accidents, explosions).

3. Detection System:

 Deep Learning Models: A central component responsible for analyzing the


video feed using advanced machine learning algorithms to detect various human
activities.
o The models used are trained for detecting specific suspicious activities such as:
 Activity Classification: Identifies and classifies general human activities. It
uses the SlowFast model for activity recognition, which processes both slow
and fast frames to capture dynamic and static aspects of human behavior.
 Weapon Detection: Identifies weapons in the video feed. YOLOv5, a real-
time object detection model, is used here to detect visible weapons (e.g., guns,
knives).
 Fighting Detection: Detects physical altercations or fights between
individuals. MediaPipe, with its pose estimation and interaction models, helps
recognize the actions and positions indicative of a fight.
 Explosion & Accident Detection: Detects unusual events like explosions or
accidents using YOLOv5, trained to identify these events in real-time from the
video feed.
 Model Components:
o SlowFast Model: This model is employed for activity classification, where the
"Slow" path captures long-range temporal information, and the "Fast" path captures
short-range motion details. By processing video in these two modes, the model can
distinguish different activities with high accuracy.
o YOLOv5: A deep learning model designed for real-time object detection. It is used
here for detecting both weapons and accidents/explosions. YOLOv5 is known for its
speed and accuracy in detecting objects in video streams.
o MediaPipe: A framework for real-time computer vision, used specifically for
detecting fighting activities through human pose estimation. It tracks and analyzes
human body landmarks to recognize actions indicative of a physical altercation.

4. Alert System:

 Alert Generator: Once suspicious activities are detected by the Detection


System, the Alert Generator is responsible for generating notifications and
alerts. These alerts can take the form of:
o Real-Time Alerts: Immediate notifications sent to security personnel, indicating that
a suspicious activity has been detected. This can include messages or pop-ups on the
security personnel's monitoring screen.
o SMS Notification: Sends an SMS to a designated phone number (e.g., security
personnel or emergency responders) when a critical event (such as a weapon detection
or explosion) is detected.
o Email Notification: Sends email alerts with relevant details (e.g., type of activity,
location, timestamp) to ensure security personnel are informed.

 Purpose: The alerts are designed to ensure a quick and effective response to potential threats,
allowing security teams to take immediate action.

5. Database & Storage:

 Video Database: A storage system that saves the video footage received from the cameras.
This video data is stored for future reference or forensic analysis. It can be accessed later to
review the events leading up to a suspicious activity or to analyze false positives/negatives in
detection.
o Purpose: To maintain a record of all surveillance footage, particularly important for
long-term security investigations.
 Detection Logs: Stores metadata about detected activities (e.g., time of detection, type of
activity, confidence score). This includes logs of when and where suspicious activities were
detected, along with any related alerts.
o Purpose: Provides a historical record of detected events, which can be useful for
monitoring trends, improving system performance, and verifying the system’s
reliability.

6. Interactions and Workflow:

1. Camera System: The system captures video feeds from the installed cameras in the
surveillance area. These feeds are sent to the Detection System.
2. Detection System: The video feed undergoes analysis by multiple deep learning models:
o Activity classification (via SlowFast model) identifies general human activities.
o YOLOv5 detects weapons, accidents, and explosions in the video.
o MediaPipe detects fighting and altercations by analyzing body movements and poses.

3. Alert System: When suspicious behavior is detected, the Alert Generator triggers real-time
alerts, which are communicated via SMS, Email, and Pop-Up Notifications to the security
personnel.
4. Security Personnel: The security personnel monitor the alerts and the live video feeds. If
necessary, they take action based on the alerts (e.g., sending emergency responders to the
location).
5. Database & Storage: Detected events and video footage are stored in a database for future
review and analysis. The logs maintain a history of the detected activities and their details.

Summary of System Workflow:

The system continuously captures video through cameras, processes the video to detect human
activities and suspicious behaviors in real-time, and generates immediate alerts for security
personnel. It stores detected events and videos for analysis and later retrieval. The entire process is
automated to ensure quick response times, improve accuracy in threat detection, and maintain
historical data for future investigations or audits.
This architecture creates a powerful and efficient security system capable of identifying potential
threats in real-time, enabling faster responses, and improving overall safety

3.3 Specification using Use Case Diagram


The use case diagram at its simplest is a representation of a user's interaction with the
system that shows the relationship between the user and the different use cases in which
the user is involved.

The use case diagram for suspicious activity detection model is as depicted in the figure
3.2. Initially the set of captured images are stored in a temporary file in OpenCV. The
obtained RGB image is converted in to gray scale image to reduce complexity.

classifiers which are trained based on the available dataset to detect the suspicious activity
detection. The detected suspicious activity detection is displayed along with the
intermediate results, various information regarding the detected suspicious activity
detection is also detected.

A formal data collection process is necessary as it ensures that the data gathered are both
defined and accurate and that subsequent decisions based on arguments embodied in the
findings are valid.

Actors:
1. User: The person who interacts with the system to view alerts on suspicious activities
detected by the system.
2. Administrator (Admin): The person who configures the system, trains the YOLO
model, and manages the overall system, including updates and monitoring.
3. Camera: The hardware that captures video feeds (either via a Raspberry Pi camera
module or an external camera) for processing.
4. YOLO Model: The deep learning model responsible for detecting suspicious activity
from the camera feed.
Use Cases:
1. View Suspicious Activity (UC1): The user views detected suspicious activities
through an interface (e.g., an app or web interface).
2. Configure Camera (UC2): The admin sets up the camera hardware to capture footage
for analysis.
3. Train YOLO Model (UC3): The admin trains the YOLO model on labeled data to
improve its ability to detect suspicious activities.
4. Detect Suspicious Activity (UC4): The system detects suspicious activity from the
live video feed captured by the camera. This is the core functionality of the system.
5. Receive Alert (UC5): Users or admins receive alerts when suspicious activity is
detected.
6. Manage Alerts (UC6): The admin manages and monitors the alerts, such as
acknowledging or reviewing the detected activities.
7. Update Model (UC7): The admin periodically updates the YOLO model with new
data to improve the accuracy of activity detection.

Figure 3.2: Use Case Diagram for suspicious activity detection model.

The process provides both a baseline from which to measure and in certain cases an
indication of what to improve shows the use case diagram for data collection.

3.4 Module Specification


Module Specification is the way to improve the structure design by breaking down the
system into modules and solving it as independent task. By doing so the complexity is
reduced and the modules can be tested independently.

3.4.1 Pre-Processing Module

Name of the Module: Pre-Processing

 Actors: User, System


 Use Cases: Input Image, Generate RGB Image, RGB to Grey Image, Noise
Removal, Thresholding, suspicious activity detection Recognition, Image Sharpening.

 Functionality: The main functionality of this module is to preprocess the data to


obtain the features conveniently.

 Description: Figure 3.3 shows the use case diagram of the pre-processing
module. In this use case diagram, there are six use cases and two actors. In the first use
case, image is captured and is used as input for this module. In second use case, for the
captured image RGB image is generated. In third use case, the RGB image into Gray scale
image to reduce complexity. In fourth use case noise removal is done. In fifth use case,
thresholding is done. In sixth use case, Image Sharpening is carries out.

Figure 3.3: Use Case Diagram of pre-processing module.

3.4.2 Classification Module Classification using YOLO

 Actors: System, User


 Use Cases: Preprocessed Image, Training, Classification, suspicious activity
detection Recognition
 Functionality: The main functionality of this module is to recognize the
suspicious activity detection if present.
 Description: Figure 3.4 shows the use case diagram of classification module. In
this use case diagram, there are four use cases and two actor. In the first use case, the
system takes preprocessed image. In second use case the classifier is trained.
In the third use case classifier is applied .In the fourth use case the suspicious activity
detection if present is recognized.

Suspicious
Activity

Figure 3.4: Use Case Diagram of Classification using YOLO module.

3.5 Data Flow Diagram

As the name specifies so to the meaning of the words, it is the process which is explained
in detail like how the data flows between the different processes. The figure 3.5 depicts the
flow diagram and is composed of the input, process and the output. After each process the
data which is flown between the systems need to be specified and hence called the data
flow diagram. In most of the times it is the initial step for designing any of the systems to
be implemented. It also shows from where the data originated, where the data flowed and
where it got stored. The input image is preprocessed, then feature extraction is carried out,
Then Classification is done after training the model, Then at the final step suspicious
activity detection is predicted.
Figure 3.5: Data Flow Diagram of suspicious activity detection
3.5.1 Data Flow Diagram of Pre-processing Module

Input image through camera is captured and it can be used to store as dataset for training or
as input image to detect the suspicious activity detection. The image is captured and stored
in any supported format specified by the device.

As shown in the figure 3.6 initially the set of captured images are stored in a temporary file
in OpenCV. The storage is linked to the file set account from which the data is accessed.
The obtained RGB image is converted in to gray scale image to reduce complexity.

Figure 3.6: Data Flow Diagram for pre-processing module.

Pre-processing is required on every image to enhance the functionality of image


processing. Captured images are in the RGB format. The pixel values and the

dimensionality of the captured images is very high. As images are matrices and
mathematical operations are performed on images are the mathematical operations on
matrices. So, we convert the RGB image into gray image. Then we carry out Noise
Removal followed by thresholding, the last step is Image Sharpening after which we obtain
the preprocessed Image.
3.5.2 Data Flow Diagram of Classification Module

The intent of the classification process is to categorize all pixels digital image into one of
several land cover classes, or "themes". This categorized data may then be used to produce
thematic maps of the land cover present in an image

Data Flow Diagram of Classification using YOLO

In deep learning, a convolutional neural network (YOLO, or ConvNet) is a class of deep


neural network most commonly applied to analyzing visual imagery. They are also known
as shift invariant or space invariant artificial neural networks (SIANN), based on their
shared-weights architecture and translation invariance characteristics

When YOLO is used for classification we don’t have to do feature extraction. Feature
Extraction will also be carried out by YOLO. We feed the preprocessed image directly to
YOLO classifier to obtain the type of suspicious activity detection if present. Figure 3.11
shows the Data Flow Diagram of Classification using YOLO.

Suspicious
activity detection

Figure 3.7: Data Flow Diagram of Classification using YOLO.

3.6 State Chart Diagram for proposed system

A state chart diagram is also named as state diagram. It is popular one among five UML
diagrams and it is used to model the dynamic nature of the system. State chart defines
various states of an object during its lifetime. The state chart diagram is a
composition of finite number of states and the functionalities describe the functioning of
each module in the system. It is a graph where each state is a directed edge and represented
by node and these directed edges represents transition between states.

Each state name must be a unique name. Initial state is arrived on creation of object, entry
of the final state infers destruction of the object. Starting state is denoted by the solid circle
and ending state is symbolized by bull eye symbol.

Suspicious
activity

Figure 3.8: State Chart Diagram of model using YOLO.

The figure 3.8 shows the state chart diagram of suspicious activity detection using YOLO.
The process starts with the solid circle, the first state is reading the image with suspicious
activity detection as input and the second state is pre-processing to convert RGB to gray
and then Noise Removal is done, followed by Thresholding, then at last Image sharpening
is done. In the third state, Classification is carried out. In the last state the suspicious
activity detection recognized id displayed.

3.7 Summary
In third chapter, high level design of the proposed method is discussed. Section
3.1 presents the design considerations for the project. Section 3.2 discusses the system
architecture of proposed system. The next Section 3.3 describes use case diagram. Section
3.4 describes module specification for all the modules. The data flow diagram for the
system is explained in section 3.5.

Page 35
CHAPTER 4

DETAILED DESIGN

A detail design is the process of each individual module which is completed in the earlier
stage than implementation. It is the second phase of the project first is to design phase and
second phase is individual design of each phase options. It saves more time and another
plus point is to make implementation easier.

Detailed design is the process of refining and expanding the preliminary design of a system
or component to the extent that the design is sufficiently complete to begin
implementation. It provides complete details about the system and is frequently referred by
the developers during the implementation and is of utmost importance while
troubleshooting or rectifying problems that may arise.

4.1 Structural Chart Diagram

Figure 4.1 Structural Chart of the suspicious activity detection


Model.

The structural chart of the suspicious activity detection model is depicted in the figure 4.1.
The prediction model is composed of 4 modules, namely- the data acquisition module, the
preprocessing module, the feature extraction module and then, the classification module.
This constitutes the complete structure of the system, which specifies the modules that are
to be considered during the implementation phase of the project.

Page 36
4.2 Detail description of each module
This part of the report includes the flowcharts of each individual module used to develop
the proposed model to detect suspicious activity detection.

4.2.1 The flowchart for Data Acquisition

Figure 4.2: Flowchart for data acquisition.

The flowchart for collecting data is as depicted in the figure 4.2. The data set is collected
from a source and a complete analysis is carried out. The image is selected to be used for
training/testing purposes only if it matches our requirements and is not repeated.

4.2.2 Flowchart for Pre-Processing the Data Set

The figure 4.3 shows the flowchart for the pre-processing of the images received from the
output of the previous step. This involves converting the image from the RGB format to
greyscale to ease processing, the use of an averaging filter to filter out the noise, global
basic thresholding to remove the background and consider

Page 37
Figure 4.3: Flowchart for the preprocessing module.

The figure 4.3 shows the flowchart for the pre-processing of the images received from the
output of the previous step. This involves converting the image from the RGB format to
greyscale to ease processing, the use of an averaging filter to filter out the noise, global
basic thresholding to remove the background and consider only the image and a high-pass
filter to sharpen the image by amplifying the finer details.

Conversion from RGB to Greyscale

The first step in pre-processing is converting the image from RGB to Greyscale.It can be
obtained by applying the below formula to thr RGB image.The figure 4.4 depicts the
Conversion from RGB to grayscale.

Page 38
Figure 4.4: Conversion from RGB to grayscale.

Advantages of converting RGB colorspace to gray

⚫ To store a single color pixel of an RGB color image we will need 8*3 = 24 bits (8
bit for each color component).

⚫ only 8 bit is required to store a single pixel of the image. So we will need 33 % less
memory to store grayscale image than to store an RGB image.

⚫ Grayscale images are much easier to work within a variety of task like In many
morphological operation and image segmentation problem, it is easier to work with single
layered image (Grayscale image ) than a three-layered image (RGB color image ).

⚫ It is also easier to distinguish features of an image when we deal with a single


layered image.

Noise Removal

Noise removal algorithm is the process of removing or reducing the noise from the image.
The noise removal algorithms reduce or remove the visibility of noise by smoothing the
entire image leaving areas near contrast boundaries. Noise removal is the second step in
image pre-processing. Here the grayscale image which was obtained in the previous step is
given as input. Here we are making use of Median Filter which is a Noise Removal
Technique.

Page 39
Median Filtering

The median filter is a non-linear digital filtering technique, often used to remove noise
from an image or signal.

Here 0’s are appended at the edges and corners to the matrix which is the representation of
the grey scale image. Then for every3*3 matrix, arrange elements in ascending order, then
find median/middle element of those 9 elements , and write that median value to that
particular pixel position. The figure 4.5 depicts Noise filtering using Median Filter.

Figure 4.5: Noise filtering using Median Filter.

Image segmentation:

Image segmentation is a method of dividing a digital image into subgroups called image
segments, reducing the complexity of the image and enabling further processing or analysis
of each image segment. Technically, segmentation is the assignment of labels to pixels to
identify objects, people, or other important elements in the image. A common use of image
segmentation is in object detection. Instead of processing the entire image, a common
practice is to first use an image segmentation algorithm to find objects of interest in the
image. Then, the object detector can operate on a bounding box already defined by the
segmentation algorithm. This prevents the detector from processing the entire image,
improving accuracy and reducing inference time.

Image segmentation is a key building block of computer vision technologies and


algorithms. It is used for many practical applications including medical image analysis,

Page 40
computer vision for autonomous vehicles, face recognition and detection, video
surveillance, and satellite image analysis.

Image Thresholding:

Image thresholding is a simple form of image segmentation. It is a way to create a binary


image from a grayscale or full-color image. This is typically done in order to separate
"object" or foreground pixels from background pixels to aid in image processing.

Basic Global thresholding:

• Thresholding is a type of image segmentation, where we change the pixels of


an image to make the image easier to analyze.
• A(i,j) is greater than or equal to the threshold T, retain it. Else, replace the value by 0.
• Here, the value of T can be manipulated in the frontend, to suit the varying needs of
different images
• We use trial and error method here to obtain threshold value which may be best
suited for us

1. Select an initial estimate for


2. Segment the image using T. This will produce two groups of pixels: G 1 consisting of all
pixels with gray level values >T and G2 consisting of pixels with values <

Page 41
3. Compute the average gray level values µ1 and µ2 for the pixels in regions G1and G
4. Compute a new threshold value:

The threshold in the preceding example was specified by using a heuristic approach,
based on visual inspection of the histogram. The following algorithm can be used to
obtain T automatically:

1. Select an initial estimate for


2. Segment the image using T. This will produce two groups of pixels: G 1 consisting of all
pixels with gray level values >T and G2 consisting of pixels with values <
3. Compute the average gray level values µ1 and µ2 for the pixels in regions G1and G
4. Compute a new threshold value:

Figure 4.6: Thresholding using basic global thresholding

Image sharpening:

• Image Sharpening is a enhsancement technique used to highlight edges and fine


details in an image.
• Image sharpening is done by adding to the original image a signal proportional to a
high-pass filtered version of the image.
• Image sharpening encompasses any enhancement technique that highlights the
edges and fine details of an image.
• Image sharpening is done by adding to the original image a signal proportional to a
high-pass filtered version of the image.

High-Pass Filtering

Page 42
A high-pass filter can be used to make an image appear sharper. These filters
emphasize fine details in the image. Here the output from the thresholding is given as
input. Here, we are making use of a filter, First we append the nearest values to pixels at
the boundary pixels. The figure 4.7 depicts Image Sharpening using High-Pass Filter

Figure 4.7: Image Sharpening using High-Pass Filter.

We multiply the elements of the 3*3 input matrix with the filter matrix, this can be
represents as A(1,1)*B(1,1), in this way all the elements in the 3*3 are multiplied and their
sum id divided by 9, which gives the value for the particular pixel position. In the same
way the values of all the pixel positions are calculated. The negative values are considered
as zero, as there can be no such thing as negative illumination.

4.2.3 Flowchart for classification

In the last module of classification, we use YOLO (Convolutional Neural Networks to


detect the suspicious activity detection present in the video.

In YOLO, we take the output from the high-pass filter as input, leaving out feature
extraction, as YOLO is a classifier which simply has a feature extracting process of its
own, using convolution, rectification and pooling as the 3 sub-modules, which work in
iterations to give out a final comparison matrix, which is then classified by classifying
algorithms like Softmax classifier.

Classification using Convolutional Neural Networks

In deep learning, a convolutional neural network (YOLO, or ConvNet) is a class of deep

Page 43
neural network most commonly applied to analyzing visual imagery. They are also known
as shift invariant or space invariant artificial neural networks (SIANN),based on their
shared-weights architecture and translation invariance characteristics

When YOLO is used for classification we don’t have to do feature extraction. Feature
Extraction will also be carried out by YOLO. We feed the preprocessed image directly to
YOLO classifier to obtain the type of suspicious activity detection if present. Flowchart for
classification using YOLO is shown in figure 4.8.

Typical YOLO Architecture

YOLO architecture is inspired by the organization and functionality of the visual cortex
and designed to mimic the connectivity pattern of neurons within the human brain. The
neurons within a YOLO are split into a three-dimensional structure, with each set of
neurons analyzing a small region or feature of the image. Figure 4.9 shows the Typical
YOLO Architecture.

Figure 4.8: Flowchart for classification using YOLO.

Page 44
A YOLO is composed of several kinds of layers:

 Convolutional layer-creates a feature map to predict the class probabilities for


each feature by applying a filter that scans the whole image, few pixels at a time.

 Pooling layer (downsampling)-scales down the amount of information the


convolutional layer generated for each feature and maintains the most essential information
(the process of the convolutional and pooling layers usually repeats several times).

 Fully connected layer-“flattens” the outputs generated by previous layers to turn


them into a single vector that can be used as an input for the next layer. Applies weights
over the input generated by the feature analysis to predict an accurate label.

 Output layer-generates the final probabilities to determine a class for the image.

Figure 4.10 represents the Layers in YOLO.

Figure 4.9: Typical YOLO Architecture.

Page 45
Figure 4.10: Layers in YOLO.
Convolutional Layer

Convolutional Layer is the first step in YOLO, here 3*3 part of the given matrix which
was obtained from High-pass filter is given as input. That 3*3 matrix is multiplied with
the filter matrix for the corresponding position and their sum is written in the particular
position. This is shown in the below figure. This output is given to pooling layer where the
matrix is further reduced. Figure 4.11 shows the Convolutional Layer.

Page 46
Figure 4.11: Convolutional Layer
Page 47
Convolution is followed by the rectification of negative values to 0s, before pooling. Here,
it is not demonstratable, as all values are positive. In fact, multiple iterations of both are
needed before pooling.

Pooling Layer

Figure 4.12: Pooling Layer.

In Pooling layer 3*3 matrix is reduced to 2*2 matrix, this is done by selecting the
maximum of the particular 2*2 matrix for the particular position. Figure 4.12 shows the
Pooling Layer.

Fully connected layer and Output Layer

Figure 4.13: Fully connected layer and Output Layer.

Page 48
The output of the pooling layer is flattened and this flattened matrix is fed into the
Fully Connected Layer. In the fully connected layer there are many layers, Input
layer, Hidden layer and Output layers are parts of it. Then this output is fed into the
classifier, in this case Softmax Activation Function is used to classify the type
suspicious activity detection present. Figure 4.13 shows the Fully connected layer
and Output Layer.

Page 49
APPLICATION

1. Enhanced Security Measures: Implement a deep learning-based suspicious activity


detection system to bolster security measures in public spaces and critical
infrastructures.

2. Proactive Threat Prevention: Utilize advanced neural networks to identify and


analyze anomalous behavior, enabling timely intervention and proactive threat
prevention.

3. Real-time Surveillance: Deploy a system capable of real-time surveillance, ensuring


rapid detection and response to suspicious activities for swift resolution.

4. Public Safety: Enhance public safety by integrating deep learning algorithms that can
distinguish between normal and potentially threatening behaviors, minimizing risks
and ensuring a secure environment.

5. Resource Optimization: Optimize security resources by automating the monitoring


process, allowing security personnel to focus on responding to identified threats rather
than constant surveillance.

6. Adaptability and Scalability: Develop a scalable and adaptable system that can be
customized to various environments, providing a versatile solution for diverse
surveillance needs.

Page 50
CONTRIBUTION

Our contribution lies in the effective integration of yolo algorithm to enhance the accuracy of
real-time suspicious activity detection in video surveillance. By meticulously categorizing
human behaviors into normal and suspicious classes, we provide a nuanced approach to
addressing the unpredictability of human actions. The YOLO architecture proves
instrumental in extracting intricate features from video frames, enabling the system to discern
subtle patterns and spatial relationships. This contribution not only advances the field of
video surveillance but also lays the foundation for automated, reliable threat detection,
ultimately bolstering global security efforts. The ongoing refinement of our model
underscores our commitment to staying at the forefront of technological advancements in this
critical domain.

Page 51
REFERENCE
[1] SudhirGoswami, JyotiGoswami, Nagresh Kumar, “Unusual Event Detection in Low

Resolution Video for enhancing ATM security”, 2nd International Conference on Signal

Processing and Integrated Networks (SPIN), 2015.

[2] Saleem Ulla Shariff ; MaheboobHussain ; Mohammed FarhaanShariff, “Smart

unusual event detection using low resolution camera for enhanced security”, 2017

International Conference on Innovations in Information, Embedded and Communication

Systems (ICIIECS), 17-18 March 2017

[3] Jignesh J. Patoliya ; Miral M. Desai, “Face detection based ATM security system

using embedded Linux platform”, 2017 2nd International Conference for Convergence in

Technology (I2CT), 7-9 April 2017.

[4] SharayuSadashivPhule ; Sharad D. Sawant, “Abnormal activities detection for

security purpose un attainded bag and crowding detection by using image processing”, 2017

International Conference on Intelligent Computing and Control Systems (ICICCS), 15-16

June 2017.

[5] G. Renee Jebaline, S. Gomathi, “A Novel Method to Enhance the Security of ATM

using Biometrics,” 2015 International Conference on Circuit, Power and Computing

Technologies [ICCPCT] 978-1-4799- 7075-9/15/$31.00 ©2015 IEEE.

[6] Vikas Tripathi, Durgaprasad Gangodkar, Vivek Latta, and Ankush Mittal, “Robust

Abnormal Event Recognition via Motion and ShapeAnalysis at ATM Installations”, Journal

of Electrical and Computer Engineering, Volume 2015.

Page 52
[7] S.Shriram, Swastik B.Shetty, Vishnuprasad P. Hegde , KCR Nisha, Dharmambal V , “

Smart ATM Surveillance System”, 2016 International Conference on Circuit, Power and

Computing Technologies [ICCPCT]

Page 53

You might also like