Manjot Kaur
Bharati Vidyapeeth’s Institute of Computer Applications and Management
(BVICAM). New Delhi (India)
Tanya Garg
Bharati Vidyapeeth’s Institute of Computer Applications and Management
(BVICAM). New Delhi (India)
Ritika Wason
Bharati Vidyapeeth’s Institute of Computer Applications and Management
(BVICAM). New Delhi (India)
Vishal Jain
Bharati Vidyapeeth’s Institute of Computer Applications and Management
(BVICAM). New Delhi (India)
Received: 05/03/2019 Accepted: 09/04/2019 Published: 17/05/2019
Suggested citation:
Kaur, M., Garg, T., Wason, R. & Jain, V. (2019). Novel framework for handwritten digit
recognition through neural networks. 3C Tecnología. Glosas de innovación aplicadas a la pyme.
Special Issue, May 2019, pp. 448–467. doi:
3C Tecnología. Glosas de innovación aplicadas a la pyme. ISSN: 2254–4143
The biggest challenge for natural language processing systems is to accurately
identify and classify handwritten characters. Accurate handwritten character
recognition is a challenging task even for humans, as style, size and other
handwriting parameters may vary from person to person. Though a relatively
straightforward machine vision task, improved accuracy over existing
implementations is still desirable. This manuscript proposes a novel neural
network based framework for handwritten character recognition. The proposed
framework transforms the raw data set into a NumPy array to achieve image
flattening and feeds it into a pixel vector before passing it to the network.
In the neural network, the activation function transfers the resultant value to
the hidden layer, where the error is further minimized through the minimized
mean square and backpropagation algorithms before applying stochastic gradient
descent on the resultant mini-batches. After a detailed study, the optimal
algorithm for effective handwritten character recognition was proposed.
Initially, the framework has been simulated only on digits, using 50,000
training samples, a 10,000-sample validation set and a 10,000-sample test set,
achieving an accuracy of 96.08%.
This manuscript aims to give the reader an insight into how the proposed neural
network based framework has been applied to handwritten digit recognition. It
highlights successful applications of the framework while outlining directions
for possible enhancements.
Natural Language Processing, Handwritten Character Recognition, Neural
Networks, Machine Vision, Digit Recognition.
Edición Especial Special Issue Mayo 2019
Many literate humans effortlessly recognize the decimal digit set (0–9). A sample
of the same is depicted in Figure 1 below. This natural attribute of mankind
is actually due to the seemingly simple human brain. The human brain is a
supercomputer in itself as each of its hemispheres has a visual cortex containing
140 million neurons with approximately tens of billions of connections between
them (LeCun, et al., 1990). So this supercomputer tuned by evolution over
hundreds of millions of years on this earth is superbly adapted to understand this
complex, colourful visual world.
This masterpiece, the human brain, can solve a tough problem like recognizing
any entity in this world in a fraction of a second. The difficulty bubbles up
when we attempt to automate the same task by writing a computer program and
applying computer vision for character/digit recognition (Bottou, et al., 1994;
Nielsen, 2018). Recognizing digits, in particular, is a simple task as the input
is simple black and white pixels with only 10 well-defined outputs. However, the
accurate recognition of handwritten shapes in different styles, fonts, etc. is a
complex task in itself. A simple 6 has a loop at the bottom and a vertical or
curved stroke on top, which can be written in varied styles and is difficult to
express algorithmically to a machine, which starts out knowing no more about
shapes than a newborn baby (Bottou, et al., 1994; Nielsen, 2018; Hegen, Demuth,
& Beale, 1996). Neural
networks solve the above problem in a much simpler way (LeCun, et al., 1990;
Bottou, et al., 1994; Nielsen, 2018; Hegen, et al., 1996; Widrow, Rumelhart, &
Lehr, 1994; Mishra & Singh, 2016). The strategy is to take a huge data set of
black and white handwritten digits from real people and build a neural network
that trains on those data sets and learns to recognize the digits (Shamim, Miah,
Sarker, Rana, & Jobair, 2018; Patel, Patel, & Patel, 2011; Ganea, Brezovan, &
Ganea, 2005).
Figure 1. Sample of handwritten digits.
In neural networks, each node performs some simple computation on the input
and conveys a signal to the next node through a connection having a weight and
a bias associated with it, which amplifies or diminishes the signal (Nielsen,
2018; Shamim, Miah, Sarker, Rana, & Jobair, 2018). Different choices of weights
and biases result in different functions evaluated by the network. An
appropriate learning algorithm must be used to determine the optimal values of
the weights and biases (Widrow, et al., 1994; Knerr, Personnaz, & Dreyfus, 1992).
All neural networks have the following common attributes (Widrow,
et al., 1994; Alwzwazy, Albe–Hadili, Alwan & Islam, 2016; Matan, et al., 1990):
A set of processing units
A set of connections
A computing procedure
A training procedure
The Processing Units
The processing units in a neural network are the smallest units, just like
neurons in a brain. These nodes work in a similar fashion and operate
simultaneously. There is no master procedure to coordinate them all (Cardoso &
Wichert, 2013). These units compute a scalar function of their input and
broadcast the result to the units connected to them as output. The result is
called the activation value and the scalar function is called the activation
function (Widrow, et al., 1994).
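A minimal sketch of one such processing unit, assuming a sigmoid activation function (the weights, bias and input values below are illustrative, not taken from the paper):

```python
import numpy as np

def sigmoid(z):
    """Squash a scalar into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# One processing unit: a scalar activation function applied to the
# weighted sum of its inputs plus a bias.
weights = np.array([0.5, -0.3, 0.8])
bias = 0.1
inputs = np.array([1.0, 0.0, 1.0])

# Weighted sum = 0.5*1.0 + (-0.3)*0.0 + 0.8*1.0 + 0.1 = 1.4
activation_value = sigmoid(np.dot(weights, inputs) + bias)
print(round(activation_value, 4))  # → 0.8022
```

The activation value would then be broadcast along every outgoing connection to the units downstream.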
There are three types of units:
Input unit, which receives data from the environment as input.
The hidden unit, which transforms internal data of network and broadcast
to the next set of units.
Output unit, which represents a decision as the output of the whole system
(Widrow, et al., 1994).
The Connections
The connections are essential to determine the topology of a neural network.
There are three types of topologies (Gattal, Djeddi, Chibani & Siddiqi, 2016):
Unstructured networks for pattern completion.
Layered networks for pattern association.
Modular networks for building complex systems.
The topology used in this paper for the proposed system is layered networks
(Widrow, et al., 1994).
A Computing Procedure
Computation proceeds by feeding input vectors to the processing units of the
input layer (Sakshica & Gupta, 2015). Then the activation values of the
remaining units are computed synchronously or asynchronously. In a layered
network, this is done by the feedforward propagation method. The activation
functions used are mathematical functions; the most common is the sigmoid
function (Mishra & Singh, 2016).
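The feedforward propagation just described can be sketched as follows; the layer sizes and random weights here are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases):
    """Propagate an input vector through each layer in turn."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)  # activation of the whole layer at once
    return a

rng = np.random.default_rng(0)
# Toy layered network: 4 inputs -> 3 hidden units -> 2 outputs.
sizes = [4, 3, 2]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]

output = feedforward(rng.standard_normal(4), weights, biases)
print(output.shape)  # (2,)
```

Because every unit uses the sigmoid, each component of the output vector lies strictly between 0 and 1.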
A Training Procedure
Training a network implies adapting its connections according to the input
environment so the network can exhibit optimized computational behaviour for
all input patterns (Arel, Rose & Karnowski, 2010). The process used in this
paper modifies the weights and biases with respect to the desired output. The
cost of error is calculated using the mean squared error method (Mishra &
Singh, 2016).
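A minimal sketch of the mean squared error cost described above, assuming a 10-way output vector compared against a one-hot desired output (the values are illustrative, not from the paper's experiments):

```python
import numpy as np

def mse_cost(predicted, desired):
    """Mean squared error between the network output and the target."""
    return 0.5 * np.mean(np.sum((predicted - desired) ** 2, axis=-1))

# Illustrative output for a single example whose true label is the digit 3.
predicted = np.full(10, 0.1)
predicted[3] = 0.8
desired = np.zeros(10)   # one-hot target vector
desired[3] = 1.0

cost = mse_cost(predicted, desired)
```

During training, the weights and biases are nudged in the direction that reduces this cost, which is what the backpropagation step computes.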
A handwritten digit data set is vague in nature because the digits may not
always be sharp straight lines of pixels (LeCun, et al., 1990). The main goal
of feature extraction in digit recognition is to remove the ambiguity from the
data (Bottou, et al., 1994; Cireşan, Meier, Gambardella & Schmidhuber, 2010).
It deals with extracting essential information from normalized images of
isolated digits that
serve as raw data in the form of vectors (Cireşan, et al., 2010). The numbers
in the images can be of different sizes, styles and orientations (Patel, et
al., 2011). In this study, a subset of the MNIST dataset is used, which
contains tens of thousands of scanned images of handwritten digits from 250
people. This data is divided into three parts: the first part contains 50,000
images to be used as training data; the second part contains 10,000 images to
be used as testing data; the third part contains 10,000 images for validation
data. The images are grayscale and 28×28 pixels in size. The training,
validation and testing sets are kept distinct to help the neural network learn
from the training set, validate the results on the validation set and generate
output from the test set (LeCun, et al., 1990; Liu, Nakashima, Sako & Fujisawa,
2003; LeCun, et al., 1995). These are examples of MNIST digits collected in
different handwritings. For example, a digit 2 can be written in different
orientations, with or without a loop at the bottom, or with a straight or
curved line at the bottom.
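As a sketch of how each 28×28 grayscale image becomes a flat pixel vector for the input layer (the pixel values below are random placeholders, not actual MNIST data):

```python
import numpy as np

# Stand-in for one 28x28 grayscale image with pixel intensities in [0, 1).
image = np.random.default_rng(1).random((28, 28))

# Flatten each image into a 784-element pixel vector, one value per input neuron.
pixel_vector = image.reshape(784)
print(pixel_vector.shape)  # (784,)

# The samples are partitioned into the three distinct sets described above.
n_train, n_valid, n_test = 50_000, 10_000, 10_000
```

Keeping the three partitions disjoint is what lets the validation and test accuracies measure generalization rather than memorization.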
Figure 2. A sample set of MNIST dataset.
These are 100 examples out of the 60,000 used as input to the neural network
in this paper.
We now discuss the feed–forward neural network that was applied in this work
to achieve the highest possible accuracy in handwritten digit recognition
(LeCun, et al., 1989; Knerr, et al., 1992).
Figure 3. Proposed Algorithm.
The figure above elaborates the steps of the proposed framework used in this
study. The next section details its simulation.
A. Neural Network
Figure 3 describes the architecture of the proposed neural network. It consists of
an input layer, a hidden layer and an output layer. Each layer contains a number
of neurons represented by a sigmoid function, so the output of each neuron lies
in the range [0, 1]. Every neuron's output is determined by the weighted sum
Σⱼ wⱼxⱼ, where wⱼ is the weight of the jth input xⱼ; the sum of this weighted
sum and the bias value determines the output value. The input layer consists of
784 neurons, each representing one pixel value. Since each digit lies between
0 and 9, the output layer consists of 10 neurons represented by a matrix
(LeCun, et al., 1990; Lauer, Suen & Bloch, 2007).
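The 784-input, 10-output architecture described above can be sketched as follows; the hidden-layer size of 30 and the random initialization are assumptions for illustration, since this excerpt does not fix them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
# 784 input neurons (one per pixel of the 28x28 image), one hidden layer,
# and 10 output neurons (one per digit 0-9).
sizes = [784, 30, 10]
weights = [rng.standard_normal((m, n)) * 0.1
           for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = rng.random(784)            # a flattened 28x28 image (placeholder pixels)
a = x
for W, b in zip(weights, biases):
    a = sigmoid(W @ a + b)     # sigmoid of (weighted sum + bias), per layer

# The most strongly activated output neuron is taken as the predicted digit.
digit = int(np.argmax(a))
print(a.shape)  # (10,)
```

Reading the prediction off the 10-neuron output vector with an argmax is one common convention; the paper's matrix representation of the output layer is consistent with this one-neuron-per-digit encoding.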