Optimizing Support Vector Machine Classification Based on Semantic-Text Knowledge Enrichment

Mr. Shadi Diab, Mr. Nasim Kamal Hamaydeh

Abstract


In this research, we enhanced the performance of the Support Vector Machine (SVM) in text classification by applying semantic-knowledge enrichment. We propose a semantic-knowledge enrichment scheme that injects new concepts into the original content of the text documents. A pre-processing technique is proposed for cleaning the text and extracting features, from which semantic concepts are generated using the WordNet database and the open-source Natural Language Toolkit (NLTK). Additionally, the online variational Bayes algorithm combined with the Latent Dirichlet Allocation (LDA) model is used as a dimensionality reduction technique to generate abstract concepts from the raw text. In our experiment, we describe the process of cleaning and transforming the data and weighting the feature vectors in a multi-dimensional space, as a step toward measuring the performance of SVM before and after applying the proposed approach on two different datasets. K-fold cross-validation is used to validate the proposed approach, and a confusion matrix is used to measure accuracy and the macro-averages of precision, recall and F1. The evaluation showed improvements in accuracy from 94% to 98.3% for dataset-1 and from 88% to 93% for dataset-2. Moreover, the training time of the classifier, measured in seconds, was reduced to 32% and 17% for dataset-1 and dataset-2, respectively, in comparison with the training time on the original data before applying the proposed enrichment scheme.
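The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the usual convention that rows of the confusion matrix are true classes and columns are predicted classes, and that macro-F1 is the unweighted mean of the per-class F1 scores; the abstract does not state which convention was used. The function name `macro_metrics` and the example matrix are hypothetical and are not taken from the paper's data.

```python
from typing import Dict, List


def macro_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy and macro-averaged precision, recall and F1.

    Assumption: cm[i][j] counts documents whose true class is i
    and whose predicted class is j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    return {
        "accuracy": correct / total if total else 0.0,
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Made-up two-class matrix, for illustration only.
example = macro_metrics([[8, 2], [1, 9]])
print(example)
```

Macro-averaging gives every class equal weight regardless of its size, so it is more sensitive than plain accuracy to poor performance on small classes.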

 

Keywords


Support Vector Machine; Semantic Enrichment; Text Classification; Latent Dirichlet Allocation; Semantic Concepts Extraction.


Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.