Using pysimilar to compute similarity between texts
I recently wrote an article titled "How to detect plagiarism in text using python", in which I showed how to detect plagiarism between documents by manually computing their cosine similarity, as the title says.
I republished that article on multiple platforms, including here on Hashnode and on Hackernoon. It is one of my most viewed articles, and its GitHub repository is the most starred of all my article repositories.
That made me rethink the code and the article, and I decided to refactor them into something easier and friendlier to get started with, even for absolute beginners. The result is pysimilar, a Python library that simplifies the task as much as possible.
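For readers who haven't seen the original article, the cosine similarity idea it relies on can be sketched in a few lines of standard-library Python. This is a minimal illustration, not pysimilar's code: it compares raw word counts, and the tokenization rule (lowercase, words of two or more characters) and the function names are assumptions chosen for the example. The score is the dot product of the two count vectors divided by the product of their lengths, so identical texts score 1 and texts with no words in common score 0.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed rule for illustration: lowercase, keep words of 2+ word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the raw word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product: only words present in both texts contribute.
    dot = sum(counts_a[word] * counts_b[word] for word in counts_a.keys() & counts_b.keys())
    # Euclidean length of each count vector.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm_a * norm_b)


print(cosine_similarity("very light indeed", "how fast is light"))
```

For the two strings used later in this article, this plain-count version gives about 0.289 (1/√12): the texts share one word, "light", out of three and four words respectively.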
Getting started with Pysimilar
To start comparing text documents with pysimilar, you first need to install it, either with pip or directly from GitHub.
Here's how to install pysimilar using pip:
$ pip install pysimilar
Here's how to install it directly from GitHub:
$ git clone https://github.com/Kalebu/pysimilar
$ cd pysimilar
$ python setup.py install
With pysimilar, you can either compare text documents passed directly as strings or specify the paths to the files that contain them.
Comparing strings directly
You can compare strings directly with pysimilar's compare() function, as illustrated below:
>>> from pysimilar import compare
>>> compare('very light indeed', 'how fast is light')
0.17077611319011649
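The score above is a cosine similarity, but raw word counts alone don't explain its exact value. One common refinement is TF-IDF weighting, which down-weights words that appear in every document being compared. The sketch below assumes the smoothed idf that scikit-learn's TfidfVectorizer uses by default, ln((1 + n) / (1 + df)) + 1, followed by L2 normalization of each vector. The names tokenize, tfidf_vectors and tfidf_similarity are hypothetical and are not part of pysimilar's API.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed rule: lowercase, keep tokens of two or more word characters.
    return re.findall(r"\b\w\w+\b", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, each scaled to unit length."""
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    # Document frequency: how many documents each term appears in.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = []
    for tokens in token_lists:
        # Term frequency (raw count) times idf.
        weights = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


def tfidf_similarity(text_a, text_b):
    """Cosine similarity of the two texts' TF-IDF vectors."""
    vec_a, vec_b = tfidf_vectors([text_a, text_b])
    # Both vectors already have unit length, so the dot product is the cosine.
    return float(sum(w * vec_b[t] for t, w in vec_a.items() if t in vec_b))


print(tfidf_similarity("very light indeed", "how fast is light"))
```

Under these assumptions the sketch returns about 0.1708 for the same two strings, in line with the score shown above. Treat that as a sanity check rather than proof of how pysimilar works internally: "light" is the only shared word, and because it appears in both strings its idf drops to 1 while every other word gets about 1.405, so the shared word carries relatively little weight.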
Comparing strings contained in files
To compare strings contained in files, pass the file paths instead and explicitly set the isfile parameter to True, as illustrated below:
>>> compare('README.md', 'LICENSE', isfile=True)
0.25545580376557886
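The isfile=True call above compares exactly two files. For the plagiarism-checking use case from the original article, one way to handle more than two files is to read each one and score every pair. The sketch below does that with the standard library only; compare_files is a hypothetical helper, not part of pysimilar, and it inlines plain word-count cosine similarity (an assumption, not necessarily the weighting pysimilar applies) so the snippet runs on its own.

```python
import itertools
import math
import re
import tempfile
from collections import Counter
from pathlib import Path


def cosine_similarity(text_a, text_b):
    """Cosine similarity of raw word counts (lowercased words of 2+ characters)."""
    counts_a = Counter(re.findall(r"\b\w\w+\b", text_a.lower()))
    counts_b = Counter(re.findall(r"\b\w\w+\b", text_b.lower()))
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compare_files(paths, encoding="utf-8"):
    """Hypothetical helper: score every pair of files, most similar pair first."""
    # Read each file once, keyed by the path it came from.
    texts = {path: Path(path).read_text(encoding=encoding) for path in paths}
    scores = [
        (a, b, cosine_similarity(texts[a], texts[b]))
        for a, b in itertools.combinations(paths, 2)
    ]
    # Highest score first; pairs with equal scores keep their original order.
    return sorted(scores, key=lambda item: item[2], reverse=True)


# Demo on throwaway files so the snippet runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "a.txt": "the quick brown fox jumps over the lazy dog",
        "b.txt": "the quick brown fox leaps over the lazy dog",
        "c.txt": "pysimilar compares text documents",
    }
    paths = []
    for name, text in samples.items():
        path = Path(tmp) / name
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    for a, b, score in compare_files(paths):
        print(f"{a.name} vs {b.name}: {score:.3f}")
```

With the sample texts, the first line printed is "a.txt vs b.txt: 0.909" (10/11), while c.txt shares no words with either file and scores 0.000. In a real check you would pass your own list of paths and inspect the pairs at the top of the list.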
Well, that's all for this article.
Here's the link to the GitHub repository: https://github.com/Kalebu/pysimilar