Fig. 10. Feature selection through k-mer counting.
6. CONCLUSION AND FUTURE WORK
In this work, we carried out feature selection and feature extraction on DNA sequence data. We
used k-mer counting, one-hot encoding, and ordinal encoding, implemented with Python libraries,
to derive features from DNA sequences, and we presented the results as matrices, vectors, and
graphs. In future work, we plan to use the extracted k-mers as input features for a classifier.
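The three encodings named above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code used in this study: the ACGT alphabet order, the treatment of ambiguous bases such as N (all-zero one-hot row, ordinal value 0.0), and the 0.25/0.50/0.75/1.00 ordinal values are choices made for the example, and the helper names `kmers`, `kmer_counts`, `one_hot`, and `ordinal` are hypothetical.

```python
from collections import Counter

# Assumed alphabet order for one-hot columns; not taken from the paper.
BASES = "ACGT"


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k (sliding window, step 1)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seq = seq.upper()
    # A sequence of length L yields L - k + 1 k-mers (none if k > L).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_counts(seq: str, k: int) -> Counter:
    """Count how often each k-mer occurs in the sequence."""
    return Counter(kmers(seq, k))


def one_hot(seq: str) -> list[list[int]]:
    """Encode each base as a length-4 indicator vector in ACGT order.

    Characters outside ACGT (e.g. 'N') become an all-zero row (assumed convention).
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def ordinal(seq: str) -> list[float]:
    """Map A, C, G, T to 0.25, 0.50, 0.75, 1.00 and anything else to 0.0.

    The specific values are an illustrative assumption, not the paper's scheme.
    """
    codes = {"A": 0.25, "C": 0.50, "G": 0.75, "T": 1.00}
    return [codes.get(base, 0.0) for base in seq.upper()]


if __name__ == "__main__":
    dna = "ATGATGA"
    print(kmers(dna, 3))
    print(kmer_counts(dna, 3))
    print(one_hot("ACGN"))
    print(ordinal("ACGN"))
```

Overlapping k-mers with a step of one are used here because that is the usual convention for k-mer counting; once the vocabulary of possible k-mers is fixed (4^k entries for the ACGT alphabet), the count table becomes a fixed-length vector that a classifier can consume.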
https://doi.org/10.17993/3ctecno.2022.v11n2e42.59-69
3C Tecnología. Glosas de innovación aplicadas a la pyme. ISSN: 2254-4143
Ed. 42 Vol. 11 N.º 2 August - December 2022