pi's Bogofilter page


Best performance (most messages): 17 days without an error in 300 ham (18/d) and more than 9000 spam (529/d) messages, with no change to the database, using training to exhaustion.
Best performance (longest time): 28 days without an error in more than 500 ham (18/d) and 1776 spam (63/d) messages, with no change to the database, using training to exhaustion.

Various Tests

2003/07/31
Repeated training runs (training to exhaustion) and security margins are useful for train-on-error
2003/10/14
Comparing training methods from the FAQ
2003/10/24
As before with security margins for all train-on-error methods
2003/11/18
Comparing training methods with the standard lexer and with my lexer
2003/12/02
Comparing standard and my lexer
2003/12/10
Security margins in training (on error and to exhaustion)
2003/12/10
Testing radically simplified definitions of TOKEN
2004/03/19
Importance of dot in TOKEN
2004/03/19
Importance of IP addresses in lexer
2004/03/24
Importance of dot in TOKEN, revisited

Alternative Lexer

This lexer allows tokens of any length; every character of a token is essentially alphanumeric or above 127. If you want to use this version of the lexer, just replace the original file src/lexer_v3.l in the bogofilter source and recompile. It is strongly recommended to rebuild the database once you start using a different version of the lexer.
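
To make the token rule concrete, here is a rough Python approximation. It is only a sketch of the rule as stated above (runs of alphanumeric bytes or bytes above 127, with no length limit); it does not reproduce the actual flex rules in src/lexer_v3.l, and the names TOKEN and tokenize are chosen for illustration only.

```python
import re
from typing import List

# Assumption: a token is a maximal run of ASCII letters, digits,
# or bytes above 127. Any special cases in the real lexer are ignored.
TOKEN = re.compile(rb"[0-9A-Za-z\x80-\xff]+")


def tokenize(raw: bytes) -> List[bytes]:
    """Split a raw message body into tokens of unrestricted length."""
    return TOKEN.findall(raw)
```

Working on bytes rather than decoded text keeps the "above 127" part of the rule literal: any high byte counts as a token character, whatever the character set of the message.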

bogominitrain.pl (Training to Exhaustion with Bogofilter)
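
The general idea of training to exhaustion (repeated train-on-error runs, here with a security margin) can be sketched in Python as follows. This is not bogominitrain.pl and does not call bogofilter: ToyFilter is a hypothetical stand-in that scores by raw token counts, and the cutoff and margin values are arbitrary assumptions made for the illustration.

```python
from collections import Counter
from typing import List, Tuple

Message = Tuple[str, bool]  # (text, is_spam)


class ToyFilter:
    """Hypothetical stand-in for a real filter; scores by token counts."""

    def __init__(self) -> None:
        self.spam: Counter = Counter()
        self.ham: Counter = Counter()

    def train(self, text: str, is_spam: bool) -> None:
        (self.spam if is_spam else self.ham).update(text.split())

    def score(self, text: str) -> float:
        """Return a spamicity in [0, 1]; 0.5 means no evidence either way."""
        tokens = text.split()
        s = sum(self.spam[t] for t in tokens)
        h = sum(self.ham[t] for t in tokens)
        return 0.5 if s + h == 0 else s / (s + h)


def train_to_exhaustion(
    filt: ToyFilter,
    corpus: List[Message],
    cutoff: float = 0.5,   # assumed decision threshold
    margin: float = 0.2,   # assumed security margin around the threshold
    max_rounds: int = 20,
) -> Tuple[int, bool]:
    """Repeat train-on-error passes until one full pass trains nothing.

    A message is trained on when it is misclassified or when its score
    falls inside the security margin. Returns (passes made, exhausted).
    """
    for round_no in range(1, max_rounds + 1):
        trained = 0
        for text, is_spam in corpus:
            score = filt.score(text)
            if is_spam:
                safe = score >= cutoff + margin
            else:
                safe = score <= cutoff - margin
            if not safe:
                filt.train(text, is_spam)
                trained += 1
        if trained == 0:
            return round_no, True
    return max_rounds, False
```

A single train-on-error pass stops after each message has been seen once; the outer loop is what makes this "to exhaustion", since messages that were classified correctly early in a pass may become errors after later training and are only caught by another pass.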


© Boris 'pi' Piwinger, April 20, 2012