pi's Bogofilter page
Best performance (most messages): 17 days without an error in 300 ham (18/d) and more than 9000 spam (529/d) messages, with no change to the database, using training to exhaustion.
Best performance (longest time): 28 days without an error in more than 500 ham (18/d) and 1776 spam (63/d) messages, with no change to the database, using training to exhaustion.
Links
Various Tests
- 2003/07/31: Repeated training runs (training to exhaustion) and security margins are useful for train-on-error
- 2003/10/14: Comparing training methods from the FAQ
- 2003/10/24: As before, with security margins for all train-on-error methods
- 2003/11/18: Comparing training methods with the standard lexer and with my lexer
- 2003/12/02: Comparing the standard lexer and my lexer
- 2003/12/10: Security margins in training (on error and to exhaustion)
- 2003/12/10: Testing radically simplified definitions of TOKEN
- 2004/03/19: Importance of the dot in TOKEN
- 2004/03/19: Importance of IP addresses in the lexer
- 2004/03/24: Importance of the dot in TOKEN, revisited
Alternative Lexer
This lexer allows tokens of any length; every character of a token is essentially alphanumeric or above 127. If you want to use this version of the lexer, just replace the original file src/lexer_v3.l in the bogofilter source and compile. It is strongly recommended to rebuild the database once you start using a different version of the lexer, since the stored tokens come from the lexer that was in use during training.
- pi's lexer (modifications to get closer to the standard version, a slight improvement for mail server IDs; other changes should only be cosmetic), last modified 2010-08-19, fits 1.2.2
- pi's lexer (modifications to get closer to the standard version, a slight improvement for mail server IDs; other changes should only be cosmetic), last modified 2006-11-26, fits 1.1.1–1.1.7 (probably also down to 0.96.2)
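
To illustrate the token rule described above (tokens of any length, built from characters that are alphanumeric or above 127), here is a minimal Python sketch. It is not the flex source of the lexer: the function name `tokenize` is made up for this example, and everything the real lexer does beyond this one rule (the "essentially" in the description) is left out.

```python
import re

# Assumption for this sketch: a token is a maximal run of bytes that are
# ASCII letters, ASCII digits, or have a value above 127. Every other byte
# separates tokens. There is no upper limit on token length.
TOKEN = re.compile(rb"[0-9A-Za-z\x80-\xff]+")


def tokenize(data: bytes) -> list[bytes]:
    """Return all tokens found in raw message bytes (hypothetical helper)."""
    return TOKEN.findall(data)
```

For example, `tokenize(b"Hello, w\xf6rld! 42")` yields the three tokens `b"Hello"`, `b"w\xf6rld"` and `b"42"`: the Latin-1 byte `0xf6` is above 127 and therefore stays inside its token.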
bogominitrain.pl (Training to Exhaustion with Bogofilter)
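
The idea of training to exhaustion with a security margin can be sketched roughly as follows. This is not bogominitrain.pl and it does not use bogofilter's scoring: `ToyFilter`, `train_to_exhaustion`, the cutoff of 0.5 and the margin of 0.1 are all assumptions made for illustration. The sketch reads "security margin" as: a message also counts as an error when it is classified correctly but scores too close to the cutoff.

```python
from collections import Counter


class ToyFilter:
    """Hypothetical stand-in for a filter database (not bogofilter's algorithm)."""

    def __init__(self):
        self.spam = Counter()
        self.ham = Counter()

    def train(self, tokens, is_spam):
        # Count each distinct token once per trained message.
        (self.spam if is_spam else self.ham).update(set(tokens))

    def score(self, tokens):
        # Crude spamicity in [0, 1]; 0.5 means "no evidence either way".
        s = sum(self.spam[t] for t in set(tokens))
        h = sum(self.ham[t] for t in set(tokens))
        return 0.5 if s + h == 0 else s / (s + h)


def train_to_exhaustion(filt, ham, spam, cutoff=0.5, margin=0.1, max_rounds=50):
    """Repeat train-on-error passes until one full pass needs no training.

    Returns the number of passes, including the final clean one.
    """
    corpus = [(m, False) for m in ham] + [(m, True) for m in spam]
    for round_no in range(1, max_rounds + 1):
        trained = 0
        for tokens, is_spam in corpus:
            score = filt.score(tokens)
            # Security margin: correct classification must clear the cutoff
            # by at least `margin`, otherwise the message is trained on.
            if is_spam:
                ok = score >= cutoff + margin
            else:
                ok = score <= cutoff - margin
            if not ok:
                filt.train(tokens, is_spam)
                trained += 1
        if trained == 0:
            return round_no
    raise RuntimeError("no clean pass within max_rounds")
```

A single train-on-error pass stops after every message has been seen once; the outer loop is what makes it "to exhaustion". The `max_rounds` guard is only there because a toy scorer like this one is not guaranteed to reach a clean pass on arbitrary data.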

© Boris 'pi' Piwinger,
April 20, 2012