DEEP LEARNING BASED MISSING OBJECT
DETECTION AND PERSON IDENTIFICATION:
AN APPLICATION FOR SMART CCTV
R. C. Dharmik
Assistant Professor, Department of Information Technology Yeshwantrao Chavan College of
Engineering, Nagpur, Maharashtra, (India).
Sushilkumar Chavhan
Assistant Professor, Department of Information Technology Yeshwantrao Chavan College of
Engineering, Nagpur, Maharashtra, (India).
S. R. Sathe
Professor, Department of CSE, VNIT Nagpur, (India).
Reception: 10/09/2022 Acceptance: 25/09/2022 Publication: 29/12/2022
Suggested citation:
Dharmik, R. C., Chavhan, S., and Sathe, S. R. (2022). Deep learning based missing object detection and person
identification: an application for smart CCTV. 3C Tecnología. Glosas de innovación aplicadas a la pyme, 11(2),
51-57. https://doi.org/10.17993/3ctecno.2022.v11n2e42.51-57
https://doi.org/10.17993/3ctecno.2022.v11n2e42.51-57
3C Tecnología. Glosas de innovación aplicadas a la pyme. ISSN: 2254-4143
Ed. 42 Vol. 11 N.º 2 August - December 2022
ABSTRACT
Security and protection are among the most crucial concerns in today's rapidly developing world. Deep
learning methods and computer vision help to address both. Object detection is the computer vision
subtask that allows us to recognise things in an image. Video is one source considered for detection,
and image processing technology helps to increase the effectiveness of state-of-the-art techniques.
CCTV is recognised as a key element alongside all of these technologies. In this article, we process
CCTV data in real time using a deep convolutional neural network. The main objective is to place the
content of the scene at the centre of the analysis. Using the YOLO technique, we were able to detect
the missing item with a 10% improvement in sparsity over the current state-of-the-art algorithm in the
context of surveillance systems, where object detection is a crucial step. The result can be used to
trigger immediate further action.
KEYWORDS
Deep Learning, Object Detection, Computer vision
1. INTRODUCTION
Object detection involves understanding the entire image: locating each item in the image and
assigning it the proper category. It is a subtask of computer vision, which also covers face detection
and skeleton identification, as discussed by Zhao, Zheng, Xu, and Wu (2019), Felzenszwalb, Girshick,
McAllester, and Ramanan (2010), Sung and Poggio (1998), Dollár, Wojek, Schiele, and Perona (2011),
and Sampat and Bovik (2003). The aim of object detection is to develop computational models that
provide the most fundamental information required by computer vision applications. The industrial
revolution is changing the face of computer vision, whose tasks now include identification,
surveillance, medicine, robotics, and self-driving cars. In the recent era, advances in deep learning
have accelerated the object detection task. Deep learning (DL) is a specific form of machine learning
that uses neural networks to learn in stages. It can therefore mimic human thought. Video analytics is
closely related to deep learning, which provides applications in different fields. Traditional object
detection is computationally complex, and its low-level features also lead to saturation as
complexity increases (Zhao et al., 2019).
Image classification and detection are the steps in detecting objects for surveillance. The primary
task in both operations is to obtain the relevant features, and deep learning is the better solution
for this: it learns from previous stages and extracts new features. A deep network may be a
conventional neural network or a multilayer perceptron network, consisting of activation functions and
various hidden layers. Images are extracted from the video at a fixed size, as the neural network
model requires (Sung and Poggio, 1998) when fully integrated networks are necessary. However, size
reduction results in information loss inside the image, reducing accuracy, precision, and sparsity as
well as applicability to surveillance systems (Zhao et al., 2019). The Viola-Jones detector (Viola and
Jones, 2001) and the HOG detector (Dalal and Triggs, 2005) are two examples of conventional object
detection algorithms. Algorithms based on deep learning fall into two main categories: one-stage and
two-stage. The two-stage algorithms RCNN and SPPNet (Girshick, Donahue, Darrell, and Malik, 2013; Liu,
Anguelov, Erhan, Szegedy, Reed, Fu, and Berg, 2016), Fast RCNN and Faster RCNN (He, Zhang, Ren, and
Sun, 2014), Mask R-CNN, Pyramid Networks/FPN (Girshick, 2015), and G-RCNN (Ren, He, Girshick, and Sun,
2015) propose object regions using deep features before classification. One-stage algorithms such as
YOLO (Redmon, Divvala, Girshick, and Farhadi, 2016), SSD (Liu et al., 2016), RetinaNet (Girshick,
2015), YOLOv3, YOLOv4, etc. skip the region proposal step and predict bounding boxes directly over
the images.
In this paper, we created a CCTV-based application that helps to determine whether any object has been
stolen from an area (room, cabin, etc.) and to recognise the person, thus helping to form a better
security infrastructure. The system can be used in official workspaces where there is a crucial need
for more secure surveillance than before. It analyses frames and finds stolen objects using
structural similarity. We propose a deep learning-based object recognition system coupled with YOLO
for remote surveillance that can accurately and quickly identify the target inside video frames. The
proposed system can send data to a localised remote server after automatically detecting a person or
item. To detect and transmit the information, a thin layer of YOLO was utilised. We applied filters to
the data to keep it clean.
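The frame comparison by structural similarity mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-window SSIM formula, the block size, the threshold of 0.8, and the function names `global_ssim` and `changed_blocks` are all assumptions made for illustration. The idea is to compare a reference frame of the scene with the current frame block by block and to flag blocks whose similarity drops, which are candidate locations of a removed object.

```python
def global_ssim(frame_a, frame_b, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grayscale frames (nested lists)."""
    a = [float(p) for row in frame_a for p in row]
    b = [float(p) for row in frame_b for p in row]
    if not a or len(a) != len(b):
        raise ValueError("frames must be non-empty and of equal size")
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    # Stabilising constants from the standard SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


def changed_blocks(reference, current, block=4, threshold=0.8):
    """Return (top, left) corners of blocks whose SSIM falls below the threshold.

    The block size and threshold are illustrative assumptions, not values from the paper.
    """
    flagged = []
    rows, cols = len(reference), len(reference[0])
    for top in range(0, rows - block + 1, block):
        for left in range(0, cols - block + 1, block):
            ref_patch = [r[left:left + block] for r in reference[top:top + block]]
            cur_patch = [r[left:left + block] for r in current[top:top + block]]
            if global_ssim(ref_patch, cur_patch) < threshold:
                flagged.append((top, left))
    return flagged
```

In a real deployment, a windowed SSIM from an image processing library would normally replace this single-window version, and the flagged region would be passed on to the detector.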
2. LITERATURE REVIEW
Viola and Jones (2001) proposed a machine learning based object detector, popularly known as the
Viola-Jones detector, which calculates new features, learns using AdaBoost, and is able to classify
more complex features. Dalal and Triggs (2005) proposed the HOG (Histograms of Oriented Gradients)
object detection algorithm, which uses a dense overlapping grid and provides better results for
person identification, reducing false positive rates with respect to Haar wavelet based detectors.
Forsyth (2014) extended the above work and proposed the DPM model for object detection. They
modify the SVM algorithm from data mining for the purpose of object detection. The model uses a
divide-and-conquer approach with a root filter and various part filters, which makes it a
multi-instance learning algorithm. All of the above traditional methods far surpassed many other
algorithms in terms of accuracy.
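To make the HOG idea discussed above more concrete, the following is a minimal sketch of the orientation histogram computed for a single cell. It is an illustration under assumptions, not the descriptor of Dalal and Triggs in full: the central-difference gradient, the 9 unsigned orientation bins, and the function name `cell_orientation_histogram` are chosen here for simplicity, and block normalisation over the overlapping grid is omitted.

```python
import math


def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    `cell` is a grayscale patch given as a list of rows. Border pixels are
    skipped because the central difference needs both neighbours.
    """
    rows, cols = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal gradient
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical gradient
            magnitude = math.hypot(gx, gy)
            # Fold the direction into [0, 180) so opposite directions share a bin.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

A patch containing a vertical edge puts its votes in the first bin (orientation near 0 degrees), while a horizontal edge votes for the middle bin (near 90 degrees); concatenating such histograms over a dense grid gives the feature vector fed to the classifier.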
Girshick et al. (2013) proposed the RCNN object detection algorithm, evaluated with mean average
precision. It is called RCNN because it combines local search with a CNN. They combined computer
vision and deep learning to obtain better results compared with traditional methods. He et al. (2014)
proposed the SPPNet model for object detection, which detects objects while varying the image size.
They allow the image size to change because most object detection algorithms use a standard size of
255*255. Girshick (2015) proposed the Fast RCNN detector, an advanced version of R-CNN and SPPNet.
They suggested dense boxes and sparse proposals to accelerate the process, as it is costly.
Ren et al. (2015) proposed the Faster RCNN detector, whose target is to reduce the running time of
RCNN. It is mostly presented through its RPN (Region Proposal Network), as it relies more heavily on
deep learning networks. Redmon et al. (2016) proposed a feature selection framework inside ConvNets,
popularly known as FPN (Feature Pyramid Network), which is based on the Faster RCNN framework. They
address multi-scale problems with deep learning. Redmon et al. (2016) proposed the most popular and
widely used algorithm, You Only Look Once (YOLO), a one-stage detector using deep learning. Instead of
treating the image as a whole, it divides the image into regions and simultaneously predicts the
bounding boxes and probabilities for each region. They also train with a loss function that helps the
model to detect objects in images. It was further improved in YOLOv3 and YOLOv4.
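The region-wise prediction described above can be sketched as a small decoding step. This is a simplified illustration and not the code of any YOLO release: the per-cell tuple layout `(x, y, w, h, objectness, class_probs)`, the score threshold, and the function name `decode_grid` are assumptions. Following the original YOLO formulation, `x` and `y` are taken as offsets within the grid cell and `w` and `h` as fractions of the whole image.

```python
def decode_grid(predictions, image_w, image_h, score_threshold=0.5):
    """Turn a grid of per-cell predictions into absolute bounding boxes.

    predictions[row][col] is a tuple (x, y, w, h, objectness, class_probs).
    Returns a list of (class_index, score, x_min, y_min, x_max, y_max).
    """
    grid_rows = len(predictions)
    detections = []
    for row, cells in enumerate(predictions):
        grid_cols = len(cells)
        for col, (x, y, w, h, objectness, class_probs) in enumerate(cells):
            best_class = max(range(len(class_probs)), key=class_probs.__getitem__)
            score = objectness * class_probs[best_class]
            if score < score_threshold:
                continue  # region is unlikely to contain an object
            # Box centre in pixels: cell index plus the offset inside the cell.
            cx = (col + x) / grid_cols * image_w
            cy = (row + y) / grid_rows * image_h
            bw, bh = w * image_w, h * image_h
            detections.append(
                (best_class, score, cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
            )
    return detections
```

A full detector would additionally apply non-maximum suppression to merge overlapping boxes that describe the same object.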
3. METHODOLOGY
Based on our review of the literature, we apply a CNN with a two-layer architecture, using different
activation functions and filters. The identified process is as follows.
Fig 1: Feature Selection.
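As a rough illustration of a two-layer convolutional feature extractor, the sketch below chains two convolution and activation stages. The paper does not specify the filters or activation functions used, so the ReLU activation, the "valid" convolution without padding, the single channel, and the names `conv2d`, `relu`, and `two_layer_features` are assumptions made only for this example.

```python
def conv2d(image, kernel):
    """Valid (no padding, stride 1) 2-D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise rectified linear activation."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def two_layer_features(image, kernel1, kernel2):
    """Apply two convolution + activation stages in sequence."""
    hidden = relu(conv2d(image, kernel1))
    return relu(conv2d(hidden, kernel2))
```

For example, with a vertical-edge filter in the first layer and an averaging filter in the second, the output map responds most strongly where the input frame contains a vertical edge; in a trained network the filter weights would be learned rather than fixed by hand.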