Unsupervised Machine Learning Method for Researchers’ Profiles Matching

Thabit Sulaiman Sabbah


Researcher profile matching is an important first step in forming effective research teams. Researchers’ broad, multidisciplinary, and changing research interests complicate profile matching with traditional methods and degrade its performance. This research addresses the problem of profile matching in scientific research and scholarly work by employing unsupervised machine learning methods. The K-means clustering method is used to categorize researcher profiles based on a statistical analysis of their publication titles, and correlation-based similarity is employed to match profiles within each category. The proposed method is implemented, tested, and evaluated on a dataset extracted from Google Scholar. The profile matching results and the clustering quality test show that the designed task was achieved, with high similarity values among publications within the categories and low correlation values among the clusters. Moreover, analysis of the clustering results can reveal interesting and enlightening information about scholarly work, which may help researchers, research management departments, and policy and decision makers in tasks associated with scholarly work.
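The two stages named in the abstract, clustering profiles and then matching by correlation within a cluster, can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the features, so raw term frequencies of title words are assumed here, the sample profiles are invented, and the K-means initialization (the first k profiles in name order) is chosen only to keep the example deterministic. All function names are hypothetical.

```python
import math
from collections import Counter


def title_vectors(profiles):
    """Build one term-frequency vector per researcher from title words.
    Assumption: plain word counts stand in for the paper's statistical features."""
    vocab = sorted({w for titles in profiles.values()
                    for t in titles for w in t.lower().split()})
    vectors = {}
    for name, titles in profiles.items():
        counts = Counter(w for t in titles for w in t.lower().split())
        vectors[name] = [float(counts[w]) for w in vocab]
    return vectors


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def kmeans(vectors, k, iterations=20):
    """Basic K-means with squared Euclidean distance.
    Assumption: centroids start at the first k profiles in name order."""
    names = sorted(vectors)
    centroids = [vectors[n][:] for n in names[:k]]
    labels = {}
    for _ in range(iterations):
        # Assignment step: attach each profile to its nearest centroid.
        for n in names:
            dists = [sum((a - b) ** 2 for a, b in zip(vectors[n], c))
                     for c in centroids]
            labels[n] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [vectors[n] for n in names if labels[n] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def best_match(name, vectors, labels):
    """Return the most correlated profile in the same cluster, or None."""
    peers = [p for p in vectors if p != name and labels[p] == labels[name]]
    if not peers:
        return None
    return max(peers, key=lambda p: pearson(vectors[name], vectors[p]))


# Invented sample data, for illustration only.
profiles = {
    "alice": ["deep learning for image classification",
              "neural network image segmentation"],
    "bob": ["soil erosion in river basins", "river sediment transport"],
    "carol": ["image classification with neural network"],
    "dave": ["sediment transport and soil erosion"],
}
vectors = title_vectors(profiles)
labels = kmeans(vectors, k=2)
print(labels, best_match("alice", vectors, labels))
```

Restricting the correlation search to members of the same cluster is what keeps matching cheap: each profile is compared only against its cluster peers rather than against the whole dataset.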


Researcher Profiles Matching, Unsupervised Machine Learning, Correlation-based Similarity, K-means algorithm, Google Scholar.





DOI: http://dx.doi.org/10.33977/2106-000-005-005



Copyright (c) 2022 Palestinian Journal of Technology and Applied Sciences (PJTAS)

This work is licensed under a Creative Commons Attribution 4.0 International License.