Bayesian classifier and filtering
Reverend is a free Bayesian classifier module for Python.
Here's how we train the classifier:
    from reverend.thomas import Bayes
    g = Bayes()  # guesser
    g.train('french', 'La souris est rentrée dans son trou.')
    g.train('english', 'my tailor is rich.')
    g.train('french', 'Je ne sais pas si je viendrai demain.')
    g.train('english', 'I do not plan to update my website soon.')
Then use it to guess the language:
    >>> print g.guess('Jumping out of cliffs it not a good idea.')
    [('english', 0.99990000000000001), ('french', 9.9999999999988987e-005)]
    # 99.99% English

    >>> print g.guess('Demain il fera très probablement chaud.')
    [('french', 0.99990000000000001), ('english', 9.9999999999988987e-005)]
    # 99.99% French
You can train it with more languages. The labels are arbitrary strings, so you can also train it to classify texts by kind (for example by topic) instead of by language.
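To see roughly what goes on behind a train/guess interface like this, here is a minimal sketch of a word-count naive Bayes classifier in plain Python 3, trained on kinds of text rather than languages. This is not Reverend's code and probably not its exact algorithm: the `TinyBayes` class, its tokenizer, and the add-one smoothing are all assumptions made for illustration, so expect different probabilities than Reverend would report.

```python
import math
import re
from collections import Counter, defaultdict


class TinyBayes:
    """Hypothetical stand-in with a train/guess interface (not Reverend)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training texts

    @staticmethod
    def tokenize(text):
        # Lowercase, then split on runs of non-word characters.
        return [w for w in re.split(r"\W+", text.lower()) if w]

    def train(self, label, text):
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def guess(self, text):
        if not self.word_counts:
            return []
        words = self.tokenize(text)
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())

        log_scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior: share of training texts carrying this label.
            score = math.log(self.doc_counts[label] / total_docs)
            for w in words:
                # Add-one smoothing so unseen words never give log(0).
                score += math.log((counts[w] + 1) / (total_words + len(vocab)))
            log_scores[label] = score

        # Turn log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return sorted(((label, w / norm) for label, w in weights.items()),
                      key=lambda pair: pair[1], reverse=True)


g = TinyBayes()
g.train('recipe', 'Mix the flour and sugar, then bake the cake for an hour.')
g.train('recipe', 'Chop the onions and fry them in butter.')
g.train('weather', 'Rain and strong wind are expected tomorrow.')
g.train('weather', 'Sunny in the morning, with clouds and rain later.')

result = g.guess('Bake the onions with butter and flour.')
best_label, best_prob = result[0]
print(best_label, round(best_prob, 2))
```

The result has the same shape as in the session above: a list of (label, probability) pairs, best guess first. With only four training sentences the numbers mean very little, so treat this as a toy.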