Unsupervised Machine Learning Method for Researchers’ Profiles Matching
DOI:
https://doi.org/10.33977/2106-000-005-005
Keywords:
Researcher Profiles Matching, Unsupervised Machine Learning, Correlation-based Similarity, K-mean algorithm, Google Scholar.
Abstract
Researcher profile matching is an important first step in forming effective research teams. Researchers’ interests are broad, multidisciplinary, and change over time, which complicates profile matching with traditional methods and degrades its performance. This research addresses profile matching in scientific research and scholarly work by employing unsupervised machine learning methods. The K-means clustering method is used to categorize researcher profiles based on a statistical analysis of their publication titles, and correlation-based similarity is then used to match profiles within each category. The proposed method is implemented, tested, and evaluated on a dataset extracted from Google Scholar. The profile matching results and the clustering quality test show that the method achieves its intended task, with high similarity among publications within each category and low correlation between clusters. Moreover, analysis of the clustering results can reveal useful information about scholarly work, which may help researchers, research management departments, and policy- and decision-makers in their related tasks.
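The two-stage idea in the abstract (cluster profiles with K-means, then match profiles inside each cluster by correlation) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy profiles `R1` to `R4`, the small stop-word list, the use of raw term counts over publication titles as the "statistical analysis", the farthest-first centroid initialization, and Pearson correlation as the similarity measure.

```python
import math
import re
from collections import Counter

# Hypothetical toy data (NOT the paper's Google Scholar dataset).
PROFILES = {
    "R1": ["Deep learning for image segmentation",
           "Convolutional networks for image classification"],
    "R2": ["Image classification with deep convolutional networks",
           "Deep learning image segmentation survey"],
    "R3": ["Soil nutrient analysis in wheat crop fields",
           "Wheat crop yield and soil irrigation"],
    "R4": ["Irrigation effects on wheat crop yield",
           "Soil nutrient management for crop fields"],
}
STOP_WORDS = {"for", "with", "in", "and", "on"}  # assumed, deliberately tiny


def tokenize(titles):
    """Lower-case the titles and return their words without stop words."""
    words = (w for t in titles for w in re.findall(r"[a-z]+", t.lower()))
    return [w for w in words if w not in STOP_WORDS]


def term_vector(titles, vocab):
    """Raw term counts over a fixed vocabulary (assumed feature scheme)."""
    counts = Counter(tokenize(titles))
    return [float(counts[w]) for w in vocab]


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if degenerate)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def kmeans(vectors, k, iters=50):
    """Plain K-means with deterministic farthest-first initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                      for v in vectors]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def match_within_clusters(names, vectors, labels):
    """For each profile, pick the most correlated peer in the same cluster."""
    matches = {}
    for i, name in enumerate(names):
        peers = [j for j in range(len(names))
                 if j != i and labels[j] == labels[i]]
        if peers:
            best = max(peers, key=lambda j: pearson(vectors[i], vectors[j]))
            matches[name] = (names[best], pearson(vectors[i], vectors[best]))
    return matches


names = sorted(PROFILES)
vocab = sorted({w for titles in PROFILES.values() for w in tokenize(titles)})
vectors = [term_vector(PROFILES[n], vocab) for n in names]
labels = kmeans(vectors, k=2)
matches = match_within_clusters(names, vectors, labels)

for name in names:
    peer, score = matches[name]
    print(f"{name}: cluster {labels[names.index(name)]}, best match {peer} (r={score:.2f})")
```

Restricting the correlation search to a profile's own cluster is what keeps matching cheap as the number of researchers grows: each profile is compared only against its cluster peers rather than against the whole collection. A real pipeline would likely replace the raw counts with a weighted scheme and choose `k` with a cluster-validity index, but those choices are outside what the abstract specifies.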