Feature Selection for Serving Medical Datasets Applying Heuristic Algorithms (Scatter Search within Decision Tree Classifier)

Authors

  • Maher Ibraheem Issa, Al-Quds Open University

DOI:

https://doi.org/10.33977/2106-000-008-002

Keywords:

metaheuristic (MH), feature selection (FS), scatter search algorithm (SSA), decision tree (DT), medical datasets

Abstract

Objectives: This research applies feature selection to medical datasets of differing purposes and sizes, using a wrapper approach that pairs a metaheuristic, the Scatter Search Algorithm, with the J48 decision tree classifier as the selection criterion.

Methods: The paper applies Improved Sequential Scatter Search, a modification of the basic Sequential Scatter Search algorithm that follows the procedures of the original algorithm and adds an early improvement mechanism. A decision tree classifier serves as the evaluator in the experiments.
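As a rough illustration of the wrapper idea described in the Methods, the following Python sketch runs a heavily simplified scatter search over binary feature masks. It is not the paper's Improved Sequential Scatter Search. Every function name, parameter, and the toy data are hypothetical. The evaluator is a leave-one-out 1-nearest-neighbour accuracy used as a stand-in for the J48 decision tree. The reference set is chosen by quality only. Applying the improvement step to the initial population before the reference set is built is just one possible reading of "early improvement". Preferring fewer features on equal accuracy is likewise an assumption.

```python
import random
from itertools import combinations


def loo_1nn_accuracy(X, y, mask):
    """Stand-in evaluator (NOT J48): leave-one-out 1-NN accuracy on selected columns."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    hits = 0
    for i, row in enumerate(X):
        # Label of the nearest other sample by squared Euclidean distance.
        _, label = min(
            (sum((row[j] - other[j]) ** 2 for j in cols), y[k])
            for k, other in enumerate(X) if k != i
        )
        hits += label == y[i]
    return hits / len(X)


def score(X, y, mask):
    # Assumed ranking: higher accuracy first, then fewer selected features.
    return (loo_1nn_accuracy(X, y, mask), -sum(mask))


def improve(X, y, mask):
    """Greedy bit-flip local search: accept any single flip that raises the score."""
    best, best_score = list(mask), score(X, y, mask)
    changed = True
    while changed:
        changed = False
        for j in range(len(best)):
            cand = best[:]
            cand[j] = 1 - cand[j]
            cand_score = score(X, y, cand)
            if cand_score > best_score:
                best, best_score, changed = cand, cand_score, True
    return tuple(best)


def combine(a, b, rng):
    # Keep the bits both parents agree on; pick the others at random.
    return tuple(p if p == q else rng.randint(0, 1) for p, q in zip(a, b))


def scatter_search(X, y, pop_size=10, ref_size=4, iterations=5, seed=0):
    rng = random.Random(seed)
    n = len(X[0])

    def rank(masks):
        return sorted(set(masks), key=lambda m: (score(X, y, m), m), reverse=True)

    # Diversification: empty mask, full mask, and random masks.
    pop = [(0,) * n, (1,) * n]
    pop += [tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 2)]
    # Improve every initial solution before building the reference set.
    ref = rank(improve(X, y, m) for m in pop)[:ref_size]
    for _ in range(iterations):
        children = [improve(X, y, combine(a, b, rng)) for a, b in combinations(ref, 2)]
        new_ref = rank(ref + children)[:ref_size]
        if new_ref == ref:  # reference set stopped changing
            break
        ref = new_ref
    return ref[0]


# Toy data (made up): column 0 separates the classes, columns 1 and 2 are noise.
X = [[0.0, 5, 1], [0.1, 1, 7], [0.2, 9, 3], [1.0, 2, 8], [1.1, 8, 2], [1.2, 4, 6]]
y = [0, 0, 0, 1, 1, 1]
best = scatter_search(X, y)
print(best, score(X, y, best))
```

On this toy data the search keeps only the first column, since it alone classifies every sample correctly under the stand-in evaluator. Swapping in a real decision tree would only require replacing `loo_1nn_accuracy` with a function that trains and validates the tree on the masked columns.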

Results: The experimental results show that the approach is competitive with or superior to other metaheuristic algorithms on the same datasets, in terms of both the number of features selected and accuracy.

Conclusion: This research emphasizes the importance of wrapper approaches that use metaheuristic algorithms to select the most dominant attributes in a dataset, which is important for reducing cost and complexity across data analysis areas.

Published

2025-06-02

How to Cite

Issa, M. I. (2025). Feature Selection for Serving Medical Datasets Applying Heuristic Algorithms (Scatter Search within Decision Tree Classifier). Palestinian Journal of Technology and Applied Sciences (PJTAS), 1(8). https://doi.org/10.33977/2106-000-008-002
