Programmers Place: December 2016

Program for detecting the language of text from a PDF in Python using NLTK:

In this program I take a sample PDF, extract its text content, and try to detect which language the text is written in. I have used the NLTK library, and the detection is based on stopwords.
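The extraction step can be sketched roughly as follows. This is a minimal sketch, not the code from the repository: it assumes the third-party `pypdf` package (`pip install pypdf`), since the post does not say which PDF library was used, and `extract_pdf_text` is a hypothetical helper name.

```python
def extract_pdf_text(source):
    """Return the text of every page in a PDF, joined with newlines.

    `source` may be a file path or a binary file-like object.
    Assumption: uses the third-party `pypdf` package; the original
    post does not name its PDF library.
    """
    # Imported inside the function so this module still loads without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(source)
    # extract_text() can come back empty for scanned or image-only pages,
    # so fall back to an empty string for those.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Usage would be something like `text = extract_pdf_text("dummy.pdf")`, where `dummy.pdf` is a placeholder path. Scanned PDFs contain images rather than text, so they would need OCR before any language detection can work.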

Basically, stopwords are very common words that exist in every language; removing them from a text does not really change its meaning. For English, stopwords include words such as "the" and "but". Since every language has its own set of stopwords, the language whose stopword list matches the most words in the text is likely to be the language of the text. I have used this approach here. Please use the GitHub link below to download or view the code; instructions for using it are in the README file.
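A rough sketch of the stopword-matching idea is below. It is an illustration under stated assumptions, not the repository's code: the tiny `SAMPLE_STOPWORDS` lists are hypothetical stand-ins for NLTK's full stopword corpus so the example runs without downloading anything, and the helper names are made up for this sketch. The score for each language is simply how many distinct words of the text appear in that language's stopword list.

```python
import re

# Hypothetical, deliberately tiny stopword samples standing in for NLTK's corpus.
SAMPLE_STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "a", "it", "but", "that"},
    "french": {"le", "la", "les", "et", "est", "de", "un", "une", "dans", "que"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "eine", "zu", "mit"},
}


def tokenize(text):
    # Lowercase, then keep runs of letters only (Unicode-aware), dropping digits and punctuation.
    return re.findall(r"[^\W\d_]+", text.lower())


def language_scores(text, stopword_sets):
    # Count how many distinct words of the text are stopwords of each language.
    words = set(tokenize(text))
    return {lang: len(words & stops) for lang, stops in stopword_sets.items()}


def detect_language(text, stopword_sets=SAMPLE_STOPWORDS):
    # Pick the language with the most stopword matches; None if nothing matched.
    scores = language_scores(text, stopword_sets)
    if not scores or max(scores.values()) == 0:
        return None
    return max(scores, key=scores.get)


def nltk_stopword_sets():
    # Optional: build the full per-language sets from NLTK instead of the samples.
    # Requires NLTK plus a one-time nltk.download("stopwords").
    from nltk.corpus import stopwords
    return {lang: set(stopwords.words(lang)) for lang in stopwords.fileids()}
```

For example, `detect_language("The cat is on the mat and it is happy")` returns `"english"`. With NLTK installed, `detect_language(text, nltk_stopword_sets())` scores the text against every language in the corpus. Counting distinct words, rather than every occurrence, keeps one frequently repeated stopword from dominating the score. Very short texts may still match nothing or tie between languages.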

GitHub link for the code

Thanks for reading; I will keep posting interesting things.


Wednesday, 21 December 2016

