Unsupervised Machine Learning Method for Researchers’ Profiles Matching

Thabit Sulaiman Sabbah

doi:10.33977/2106-000-005-005

Authors

Thabit Sulaiman Sabbah Faculty of Technology and Applied Sciences Al-Quds Open University, Palestine http://orcid.org/0000-0001-5770-7339


Keywords:

Researchers' profile matching, unsupervised machine learning, correlation-based similarity, K-means algorithm, Google Scholar.

Abstract

Matching researchers' profiles is an important preliminary step in forming effective research teams. Researchers' interests are broad, multidisciplinary, and changing, which complicates profile matching with traditional methods and degrades its performance. This study addresses the profile matching problem in scientific research by applying unsupervised machine learning. K-means clustering was used to group researchers' profiles based on a statistical analysis of their publication titles, and correlation-based similarity was used to match profiles within each cluster. The proposed method was implemented, tested, and evaluated on a dataset extracted from Google Scholar. The profile matching results and the clustering quality checks indicate that the intended task was accomplished: similarity values were high within clusters, and correlation values were low between clusters. Analysis of the clustering results can also reveal useful information about research activity that may help researchers, research management departments, and policy and decision makers in their research-related tasks.
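The two steps named in the abstract (K-means clustering on title statistics, then correlation-based matching within each cluster) can be illustrated with a minimal sketch. This is not the author's implementation: the raw term-frequency weighting, the naive "first k profiles" initialisation, the use of Pearson correlation as the similarity measure, the toy titles, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

def term_vectors(titles_per_profile):
    """Turn each profile's titles into a raw term-frequency vector (assumed weighting)."""
    counts = [Counter(" ".join(titles).lower().split()) for titles in titles_per_profile]
    vocab = sorted(set().union(*counts))
    return [[float(c[word]) for word in vocab] for c in counts]

def pearson(u, v):
    """Pearson correlation between two equal-length vectors (0.0 if either is constant)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [x - mu for x in u]
    dv = [y - mv for y in v]
    num = sum(a * b for a, b in zip(du, dv))
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return num / den if den else 0.0

def kmeans(vectors, k, iters=20):
    """Plain K-means with squared Euclidean distance.
    Assumption: centers start at the first k vectors (a naive, deterministic choice)."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each profile goes to its nearest center.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(v, centers[j])))
            for v in vectors
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def match_within_clusters(vectors, labels):
    """For each profile, pick the most correlated other profile in the same cluster."""
    best = {}
    for i, vi in enumerate(vectors):
        peers = [j for j in range(len(vectors)) if j != i and labels[j] == labels[i]]
        if peers:
            best[i] = max(peers, key=lambda j: pearson(vi, vectors[j]))
    return best

# Hypothetical toy profiles: each inner list holds one researcher's publication titles.
profiles = [
    ["text clustering with k-means", "document clustering methods"],
    ["tumor image segmentation", "brain image analysis"],
    ["text document clustering survey"],
    ["brain tumor image classification"],
]
vectors = term_vectors(profiles)
labels = kmeans(vectors, k=2)
matches = match_within_clusters(vectors, labels)
print(labels, matches)
```

Restricting the correlation search to profiles that share a cluster keeps the number of pairwise comparisons small, which is one plausible reason for clustering before matching.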

Author Biography

Thabit Sulaiman Sabbah, Faculty of Technology and Applied Sciences, Al-Quds Open University, Palestine

Assistant Professor, Computer Science

Faculty Member, College of Technology and Applied Sciences

Al Quds Open University

Palestine

