consistent. Each time a learning phase is completed, a new hypothesis is learned, and the examples
are reweighted so that instances that were correctly classified during that phase receive a lower
weight and the system can focus on the instances that were not. Instances that were incorrectly
classified receive more weight so that they can be correctly classified in the following learning stage.
This procedure continues until the final classifier is built. To arrive at the final prediction, the outputs
of the individual classifiers are finally combined using majority voting. The Boosting method has been
generalised in AdaBoost (Breiman) [12].
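As a concrete illustration of this reweighting scheme, the following is a minimal AdaBoost.M1-style sketch in Python. It assumes binary labels coded as -1/+1 and uses a decision stump from scikit-learn as the base learner; the function names and parameter values are illustrative assumptions, not taken from the source.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=10):
    # y is assumed to contain labels -1 and +1
    n = len(y)
    weights = np.full(n, 1.0 / n)          # start with uniform instance weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=weights)
        pred = stump.predict(X)
        err = weights[pred != y].sum()      # weighted error of this hypothesis
        if err >= 0.5:                      # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / err)
        # correctly classified instances get a lower weight,
        # misclassified instances get a higher weight
        weights *= np.exp(-alpha * y * pred)
        weights /= weights.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    # combine the base classifiers by a weighted vote
    votes = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(votes)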
(3) Random Subspaces
The approach comes in two forms. In the first form, each base learner is trained on a distinct feature
subspace of the original training data set. In the second form, only decision trees may be used as the
base learner (Gopika and Azhagusundari, 2014) [9].
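A minimal sketch of the first form, assuming scikit-learn's BaggingClassifier with bootstrapping of instances disabled so that only the feature subspace varies between learners; the parameter values shown are illustrative assumptions.

from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

random_subspace = BaggingClassifier(
    DecisionTreeClassifier(),  # base learner (a decision tree, as in the second form)
    n_estimators=25,
    max_features=0.5,          # each learner sees a random half of the features
    bootstrap=False,           # keep all instances; only the feature subspace changes
    random_state=0,
)
# usage: random_subspace.fit(X_train, y_train); random_subspace.predict(X_test)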
(4) Random Forest
Random Forest was proposed by Breiman. It can be formulated as bagging combined with the second
form of random subspaces (Breiman) [12]: the bagging and random subspace methods are combined to
induce each tree. It differs from bagging in that each model is a random tree rather than a single model.
Each tree is built from a bootstrap sample of size N drawn from the training set, and each node is split
using a further random step: instead of examining all possible splits, a small subset of features is
randomly selected, and the optimal split is determined from this subset. Across all trees, the final
classification is decided by majority vote [11].
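A minimal sketch of this procedure using scikit-learn's RandomForestClassifier; the hyper-parameter values are illustrative assumptions.

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(
    n_estimators=100,     # number of random trees
    bootstrap=True,       # each tree is grown on a bootstrap sample of the training set
    max_features="sqrt",  # random subset of features examined at each split
    random_state=0,
)
# usage: forest.fit(X_train, y_train); forest.predict(X_test) returns the
# class chosen by majority vote across the trees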
(5) Rotation Forest
Rotation Forest is a relatively new ensemble approach built on Principal Component Analysis (PCA)
and decision trees. To create a training set for a base classifier, the attribute set F is randomly divided
into K subsets, PCA is applied separately to each subset, and the resulting K axis rotations are used to
transform the features. By keeping all of the principal components, Rotation Forest preserves all of the
information in the data. The base classifier in Rotation Forest is the decision tree (Pandey and Taruna,
2014) [11].
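The construction of a single Rotation Forest base classifier can be sketched as follows. This is a simplified illustration that splits the attribute set into K roughly equal subsets, keeps all principal components, and omits details of the full algorithm such as per-tree resampling; all names and values are illustrative assumptions.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def rotation_tree(X, y, K=3, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    subsets = np.array_split(rng.permutation(n_features), K)  # random split of F into K subsets
    rotation = np.zeros((n_features, n_features))              # block-diagonal rotation matrix
    for subset in subsets:
        pca = PCA(n_components=len(subset))                    # keep ALL principal components
        pca.fit(X[:, subset])
        rotation[np.ix_(subset, subset)] = pca.components_.T   # axis rotation for this subset
    X_rotated = X @ rotation                                    # rotated training set
    tree = DecisionTreeClassifier().fit(X_rotated, y)           # decision tree as base classifier
    return tree, rotation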
3. CROSS VALIDATION TECHNIQUES
Cross-validation is a statistical technique for estimating how well a trained model will perform on
unseen data. The model's effectiveness is assessed by training it on one subset of the input data and
testing it on a different subset. Cross-validation helps in building a generalised model. Since modelling
is an iterative process, cross-validation is useful both for performance estimation and for model
selection.
Cross-validation involves the following three steps:
i. Split the dataset into two sections: a training section and a testing section.
ii. Use the training dataset to train the model.
iii. Use the testing set to gauge the model's effectiveness. Check for problems if the model does not
perform well on the testing set (these steps are sketched in the example below).
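A minimal sketch of the three steps above, using scikit-learn and a synthetic dataset; the dataset, classifier, and split proportion are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=200, random_state=0)

# i.   split the dataset into a training section and a testing section
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# ii.  use the training dataset to train the model
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# iii. use the testing set to gauge the model's effectiveness
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))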
A model is stable and consistent if it can predict accurately for a variety of input data and performs
well on unseen data. Cross-validation helps in evaluating the stability of machine learning models.
The dataset has to be divided into three separate sections for training and testing the model:
• Training Data: Using the training data, the model is trained to discover the dataset's hidden
characteristics and patterns. The model continually assesses the data to better understand its
behaviour, and then it modifies itself to achieve its goal. Basically, it is employed to fit the models.
• Validation Data: This is used to confirm that the model's training results were accurate. It aids
in adjusting the hyper-parameters and settings of the model appropriately. The prediction error
for model selection is estimated using the validation data. Validation data helps prevent
over-fitting the model.
• Test Data: Following training, the test data confirms that the trained model is capable of
making accurate predictions. It is used to estimate the generalisation error of the final model
selected (Hulu and Sihombing, 2020) [1], (Jung, 2015) [7-8], (Wu) [13-14]. A sketch of such a
three-way split is shown below.
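A minimal sketch of the three-way split described above, assuming illustrative 60/20/20 proportions (the source does not specify proportions) and a synthetic dataset.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)

# hold out 20% of the data as the final test data
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# split the remainder into training data and validation data (60% / 20% of the total)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# training data fits the model, validation data tunes the hyper-parameters,
# and test data estimates the generalisation error of the final model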