Title: Extrinsic Plagiarism Detection in Text Combining Vector Space Model and Fuzzy Semantic Similarity Scheme
Abstract: The proposed work combines the Vector Space Model (VSM) with a fuzzy similarity measure to detect plagiarism in documents. Given a suspicious document, the aim is to identify the set of source documents from which it was copied. In the first step, all documents are preprocessed by tokenization, stop-word removal, stemming, etc. In the second step, a subset of documents that may be sources of plagiarism is selected. The VSM can be used for this candidate selection: the similarity between a suspicious document and a source document is computed as the cosine similarity between their tf-idf-weighted document vectors. In the third step, a sentence-wise in-depth analysis using a fuzzy semantic-based approach finds the plagiarized parts of the suspicious document. This can detect similar, yet not necessarily identical, statements based on the degree of similarity between the words in the statements and the fuzzy set. Adjacent sentences regarded as plagiarized are joined together, and the final plagiarism cases are reported. The similarity index (SI) and overall similarity index (OSI) can be used to report the similarity between a source document and the suspicious document. The combined system was evaluated in terms of precision, recall, and F-measure and showed improved performance over the pure vector space model for extrinsic plagiarism detection.
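The candidate-selection step described in the abstract (cosine similarity between tf-idf-weighted document vectors) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the particular tf-idf variant (relative term frequency times log(N/df)), and the 0.1 threshold are all assumptions made for illustration, and the input documents are assumed to be already tokenized, stop-word filtered, and stemmed.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse {term: weight} vector per tokenized document.

    Assumed weighting: relative term frequency * log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_candidates(suspicious, sources, threshold=0.1):
    """Return (source_index, score) pairs scoring at least `threshold`, best first.

    The threshold value is a hypothetical choice, not taken from the article.
    """
    vectors = tfidf_vectors(sources + [suspicious])
    susp_vec = vectors[-1]
    scored = [(i, cosine(susp_vec, vec)) for i, vec in enumerate(vectors[:-1])]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(i, s) for i, s in scored if s >= threshold]


if __name__ == "__main__":
    sources = [
        ["fuzzy", "plagiarism", "detection"],
        ["weather", "forecast", "rain"],
    ]
    suspicious = ["plagiarism", "detection", "fuzzy", "method"]
    print(select_candidates(suspicious, sources))
```

In a pipeline like the one the abstract outlines, only the sources returned by such a filter would be passed on to the more expensive sentence-wise fuzzy semantic comparison.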
Publication Year: 2013
Publication Date: 2013-01-01
Language: en
Type: article
Access and Citation
Cited By Count: 5