bogoutil - Dumps, loads, and maintains bogofilter database files

Author

       Gyepi Sam <gyepi@praxis-sw.com>.

       Matthias Andree <matthias.andree@gmx.de>.

       David Relson <relson@osagesoftware.com>.

       For updates, see the bogofilter project page[1].

Data Format

       Bogoutil reads and writes text files where each nonblank line consists of a word, any amount of
       horizontal whitespace, a numeric word count, more whitespace, and (optionally) a date in form YYYYMMDD.
       Blank lines are skipped.
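
       The format described above can be read with a few lines of code. The following is a minimal
       sketch, not bogoutil's own parser: the names Entry and parse_dump are hypothetical, and it assumes
       exactly one count per line and tokens without embedded whitespace, as the description states.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Entry(NamedTuple):
    word: str
    count: int
    date: Optional[str]  # YYYYMMDD string, or None when the line has no date


def parse_dump(lines: Iterable[str]) -> Iterator[Entry]:
    """Yield one Entry per nonblank line of a bogoutil-style text file."""
    for raw in lines:
        fields = raw.split()  # any run of whitespace separates fields
        if not fields:
            continue  # blank lines are skipped
        if len(fields) not in (2, 3):
            raise ValueError(f"malformed line: {raw!r}")
        word, count = fields[0], int(fields[1])
        date = fields[2] if len(fields) == 3 else None
        if date is not None and not (len(date) == 8 and date.isdigit()):
            raise ValueError(f"date is not in form YYYYMMDD: {raw!r}")
        yield Entry(word, count, date)
```

       Passing an open text file to parse_dump works because iterating a file yields its lines.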

Description

       Bogoutil is part of the bogofilter Bayesian spam filter package.

       It is used to dump and load bogofilter's Berkeley DB databases to and from text files, perform database
       maintenance functions, and to display the values for specific words.

Environment Maintenance

       The --db-checkpoint dir option causes bogoutil to flush the buffer caches and checkpoint the database
       environment.

       The --db-list-logfiles dir option causes bogoutil to list the log files in the environment. Zero or more
       keywords, separated by whitespace, can follow to modify the behavior of this mode. The default behavior
       is to list only inactive log files with relative paths. Add the keyword all to list all log files
       (inactive and active). Add the keyword absolute to switch the listing to absolute paths.

       The --db-prune dir option causes bogoutil to checkpoint the database environment and remove inactive log
       files.

       The --db-recover dir option runs a regular database recovery in the specified database directory. If that
       fails, it will retry with a (usually slower) catastrophic database recovery. If that fails, too, your
       database cannot be repaired and must be rebuilt from scratch. This is only supported when compiled with
       Berkeley DB support with transactions enabled. Trying recovery with QDBM or SQLite3 support will result
       in an error.

       The --db-recover-harder dir option runs a catastrophic database recovery in the specified database
       directory. If that fails, your database cannot be repaired and must be rebuilt from scratch. This is only
       supported when compiled with Berkeley DB support with transactions enabled. Trying recovery with QDBM or
       SQLite3 support will result in an error.

       The --db-remove-environment directory option has no short option equivalent. It runs recovery in the
       given directory and then removes the database environment. Use this before upgrading to a new Berkeley DB
       version if the new version to be installed requires a log file format update.

       The --db-print-leafpage-count file option prints the number of leaf pages in the database file file as a
       decimal number, or UNKNOWN if the database does not support querying this figure.

       The --db-print-pagesize file option prints the size of a database page in file as a decimal number, or
       UNKNOWN for databases with variable page size or databases that do not allow a query of the database page
       size.

       The --db-verify file option requests that bogofilter verifies the database file. It prints only errors,
       unless in verbose mode.

Name

       bogoutil - Dumps, loads, and maintains bogofilter database files

Notes

        1. the bogofilter project page
           http://bogofilter.sourceforge.net/

Bogofilter                                         05/19/2019                                        BOGOUTIL(1)

Options

       The -d file option tells bogoutil to print the contents of the database file to stdout.

       The -H file option tells bogoutil to print a histogram of the database file to stdout. The output is
       similar to bogofilter -vv. Finally, hapaxes (tokens which were only seen once) and pure tokens (tokens
       which were encountered only in ham or only in spam) are counted.

       The -l file option tells bogoutil to load the data from stdin into the database file. If the database
       file exists, stdin data is merged into the database file, with counts added up.
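
       The merge behavior of -l (counts for the same word are added up) can be illustrated as follows.
       This is only a sketch of the arithmetic on in-memory data, not bogoutil's implementation; the
       function name merge_counts is hypothetical, and how dates are merged is not covered because the
       text above does not specify it.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def merge_counts(existing: Dict[str, int],
                 incoming: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Return existing counts with incoming (word, count) pairs added in."""
    merged = Counter(existing)
    for word, count in incoming:
        merged[word] += count  # a word missing from existing starts at 0
    return dict(merged)
```

       For example, merging the pairs ("spam", 3) and ("ham", 1) into {"spam": 2} gives
       {"spam": 5, "ham": 1}.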

       The -m option tells bogoutil to perform maintenance functions on the specified database, i.e. discard
       tokens that are older than desired, have counts that are too small, or sizes (lengths) that are too long
       or too short.

       The -w file option tells bogoutil to display token information from the database file. The option takes
       an argument, which is either the name of the wordlist (usually wordlist.db) or the name of the directory
       containing it. Tokens can be listed on the command line or piped to bogoutil. When there are extra
       arguments on the command line, bogoutil will use them as the tokens to look up. If there are no extra
       arguments, bogoutil will read tokens from stdin.

       The -p file option tells bogoutil to display the database information for one or more tokens. The display
       includes a probability column with the token's spam score (computed using bogofilter's default values).
       Option -p takes the same arguments as option -w.

       The -r file option tells bogoutil to recalculate the ROBX value and print it as a six-digit fraction.

       The -R file option does the same as -r, but saves the result in the training database without printing
       it.

       The -I file option tells bogoutil to read its input from file rather than stdin.

       The -O file option tells bogoutil to write its output to file rather than stdout.

       The -v option produces verbose output on stderr. This option is primarily useful for debugging.

       The -C option inhibits reading configuration files and lets bogoutil go with the defaults.

       The --config-file file option tells bogoutil to read file instead of the standard configuration file.

       The -D option redirects debug output to stdout (it usually goes to stderr).

       The -x flags option sets debugging flags.

       Option -n stands for "replace non-ASCII characters". It replaces characters that have the high bit (0x80)
       set with question marks. This can be useful if a word list has lots of unreadable tokens, for example from
       Asian spam. The "bad" characters will be converted to question marks and matching tokens will be combined
       when used with -m or -l, but not with -d.
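
       The effect of -n can be pictured with the sketch below. It is an illustration under assumptions,
       not bogoutil's code: tokens are treated as raw byte strings, every byte with the high bit set
       becomes one question mark, and the helper names replace_high_bit and combine are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def replace_high_bit(token: bytes) -> bytes:
    """Replace every byte that has bit 0x80 set with an ASCII question mark."""
    return bytes(0x3F if b >= 0x80 else b for b in token)


def combine(entries: Iterable[Tuple[bytes, int]]) -> Dict[bytes, int]:
    """Sum the counts of tokens that become identical after replacement."""
    combined: Dict[bytes, int] = {}
    for token, count in entries:
        key = replace_high_bit(token)
        combined[key] = combined.get(key, 0) + count
    return combined
```

       Two different three-byte tokens made only of high-bit bytes both become "???", so their counts
       end up in a single entry.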

       Option -a age indicates an acceptable token age, with older ones being discarded. The age can be a date
       (in form YYYYMMDD) or a day count, i.e. discard tokens older than age days.

       Option -c value indicates that tokens with counts less than or equal to value are to be discarded.

       Option -s min,max is used to discard tokens based on their size, i.e. length. All tokens shorter than min
       or longer than max will be discarded.
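
       The three discard rules used during maintenance (-a, -c and -s) can be summarized in code. The
       sketch below restates the rules as described here and is not taken from bogoutil's source. The
       names cutoff_from_age and should_discard are hypothetical. It assumes that an 8-digit age is a
       date and anything else a day count, that token length is measured in characters, and that a
       token without a date is never discarded by the age rule.

```python
from datetime import date, datetime, timedelta
from typing import Optional, Tuple


def cutoff_from_age(age: str, today: date) -> date:
    """Turn an -a argument (YYYYMMDD date or day count) into a cutoff date."""
    if len(age) == 8 and age.isdigit():
        return datetime.strptime(age, "%Y%m%d").date()
    return today - timedelta(days=int(age))


def should_discard(word: str, count: int, token_date: Optional[date], *,
                   cutoff: Optional[date] = None,
                   max_discard_count: Optional[int] = None,
                   size: Optional[Tuple[int, int]] = None) -> bool:
    """Return True if any of the enabled rules says the token should go."""
    # -a: tokens older than the cutoff are discarded
    if cutoff is not None and token_date is not None and token_date < cutoff:
        return True
    # -c: counts less than or equal to the given value are discarded
    if max_discard_count is not None and count <= max_discard_count:
        return True
    # -s: tokens shorter than min or longer than max are discarded
    if size is not None:
        shortest, longest = size
        if len(word) < shortest or len(word) > longest:
            return True
    return False
```

       A rule left at None is simply not applied, which mirrors omitting the corresponding option.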

       Option -y date specifies the date to give to tokens that don't have dates. The format is YYYYMMDD.

       The -h option prints the help message and exits.

       The -V option prints the version number and exits.

Return Values

       0 for successful operation. 1 for most errors. 3 for I/O or other errors. Error 3 usually means that
       something is seriously wrong with the database files.
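
       A calling script may want to turn these exit statuses into messages. The mapping below is a small
       sketch based only on the values listed above; the name describe_exit is hypothetical, and any
       other status is reported as undocumented rather than guessed at.

```python
EXIT_MEANINGS = {
    0: "successful operation",
    1: "error",
    3: "I/O or other error; the database files may be seriously damaged",
}


def describe_exit(status: int) -> str:
    """Return a human-readable description of a bogoutil exit status."""
    return EXIT_MEANINGS.get(status, f"undocumented exit status {status}")
```

       The status would typically come from the returncode attribute of the object returned by
       subprocess.run when it launches bogoutil.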

See Also

       bogofilter(1), bogolexer(1), bogotune(1), bogoupgrade(1)

Synopsis

       bogoutil {-h | -V}

       bogoutil [options] {-d file | -H file | -l file | -m file | -w file | -p file}

       bogoutil {-r file | -R file}

       bogoutil {--db-print-leafpage-count file | --db-print-pagesize file | --db-verify file |
                --db-checkpoint directory | --db-list-logfiles directory [flag...] | --db-prune directory |
                --db-recover directory | --db-recover-harder directory | --db-remove-environment directory}

       where options is

       bogoutil [-v] [-n] [-C] [-D] [-a age] [-c count] [-s min,max] [-y date] [-I file] [-O file] [-x flags]
                [--config-file file]
