timbl - Tilburg Memory Based Learner

Authors

       Ko van der Sloot Timbl@uvt.nl

       Antal van den Bosch Timbl@uvt.nl

Bugs

       possibly

Description

       TiMBL is an open source software package implementing several memory‐based learning algorithms, among
       which IB1‐IG, an implementation of k‐nearest neighbor classification with feature weighting suitable for
       symbolic feature spaces, and IGTree, a decision‐tree approximation of IB1‐IG. All implemented algorithms
       have in common that they store some representation of the training set explicitly in memory. During
       testing, new cases are classified by extrapolation from the most similar stored cases.
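       The idea of classifying by extrapolation from stored cases can be sketched in a few lines of Python.
       This is a simplified illustration, not TiMBL's implementation: the function names are hypothetical, the
       feature weights are supplied by hand, and k counts individual stored cases.

```python
from collections import Counter


def overlap_distance(a, b, weights):
    """Weighted overlap: sum the weights of the features whose values differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def classify(train, query, weights, k=1):
    """Return the majority class among the k nearest stored cases.

    train is a list of (features, label) pairs that is kept in memory as-is.
    """
    ranked = sorted(train, key=lambda case: overlap_distance(case[0], query, weights))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy symbolic data (made up for illustration).
train = [
    (("a", "b", "c"), "X"),
    (("a", "d", "c"), "Y"),
    (("e", "b", "c"), "X"),
]
weights = [1.0, 2.0, 0.5]
print(classify(train, ("a", "d", "e"), weights, k=1))
```

       With k=1 the query is matched to the single stored case that differs only in the lowest‐weighted feature.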

Name

       timbl - Tilburg Memory Based Learner

Options

       -a <n> or -a <string>
              determines the classification algorithm.

              Possible values are:

              0 or IB
               the IB1 (k‐NN) algorithm (default)

              1 or IGTREE
               a decision‐tree‐based approximation of IB1

              2 or TRIBL
               a hybrid of IB1 and IGTREE

              3 or IB2
               an incremental editing version of IB1

              4 or TRIBL2
               a non‐parametric version of TRIBL

       -b n
              number of lines used for bootstrapping (IB2 only)

       -B n
              number of bins used for discretization of numeric feature values (Default B=20)

       --Beam=<n>
              limit +v db output to n highest‐vote classes

       --clones=<n>
              number of threads to use for parallel testing

       -c n
              clipping frequency for prestoring MVDM matrices

       +D
              store distributions on all nodes (necessary for using +v db with IGTree, but wastes memory
              otherwise)

       --Diversify
              rescale weight (see docs)

       -d val
              weigh neighbors as a function of their distance:
               Z      : equal weights to all (default)
               ID     : Inverse Distance
               IL     : Inverse Linear
               ED:a   : Exponential Decay with factor a (no whitespace!)
               ED:a:b : Exponential Decay with factor a and b (no whitespace!)
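
              The weighting schemes can be illustrated with a short Python sketch. The formulas are the common
              textbook forms (inverse distance with a small constant, Dudani‐style inverse linear, and
              exp(-a * d^b) for exponential decay); this page does not spell them out, so treat them as
              assumptions. The function name and the eps parameter are hypothetical.

```python
import math


def neighbor_weights(distances, scheme="Z", a=1.0, b=1.0, eps=1e-9):
    """Return one vote weight per neighbor distance."""
    if scheme == "Z":
        # Equal weights: every neighbor gets one vote.
        return [1.0 for _ in distances]
    if scheme == "ID":
        # Inverse distance; eps avoids division by zero for exact matches.
        return [1.0 / (d + eps) for d in distances]
    if scheme == "IL":
        # Inverse linear: nearest neighbor gets 1, farthest gets 0.
        nearest, farthest = min(distances), max(distances)
        if farthest == nearest:
            return [1.0 for _ in distances]
        return [(farthest - d) / (farthest - nearest) for d in distances]
    if scheme == "ED":
        # Exponential decay with factor a and exponent b.
        return [math.exp(-a * d ** b) for d in distances]
    raise ValueError(f"unknown weighting scheme: {scheme!r}")


print(neighbor_weights([0.0, 1.0, 2.0], "IL"))
```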

       -e n
              estimate time until n patterns tested

       -f file
              read from data file 'file' OR use filenames from 'file' for cross validation test

       -F format
              assume the specified input format (Compact, C4.5, ARFF, Columns, Binary, Sparse)

       -G normalization
              normalize distributions (+v db option only)

              Supported normalizations are:

              Probability or 0
               normalize between 0 and 1

              addFactor:<f> or 1:<f>
               add f to all possible targets, then normalize between 0 and 1 (default f=1.0)

              logProbability or 2
               add 1 to the target weight, take the base‐10 logarithm, then normalize between 0 and 1
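
              The three normalizations can be sketched in Python as follows. The sketch assumes that
              "normalize between 0 and 1" means dividing each value by the sum of all values; this page does
              not confirm that reading. All function names are hypothetical.

```python
import math


def _scale(scores):
    """Divide every score by the total so the values sum to 1 (assumed meaning)."""
    total = sum(scores.values())
    if total == 0:
        return dict(scores)
    return {target: value / total for target, value in scores.items()}


def probability(dist):
    """'Probability': scale the observed class weights directly."""
    return _scale(dict(dist))


def add_factor(dist, all_targets, f=1.0):
    """'addFactor:<f>': add f to every possible target, then scale."""
    return _scale({t: dist.get(t, 0.0) + f for t in all_targets})


def log_probability(dist):
    """'logProbability': add 1, take the base-10 logarithm, then scale."""
    return _scale({t: math.log10(v + 1.0) for t, v in dist.items()})


print(add_factor({"X": 3.0}, ["X", "Y"], f=1.0))
```

              Note how addFactor gives the unseen target Y a nonzero share, which plain Probability would not.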

       +H or -H
              write hashed trees (default +H)

       -i file
              read the InstanceBase from 'file' (skips phases 1 and 2)

       -I file
              dump the InstanceBase in 'file'

       -k n
              search 'n' nearest neighbors (default n = 1)

       -L n
              set value frequency threshold to back off from MVDM to Overlap at level n

       -l n
              fixed feature value length (Compact format only)

       -m string
              use feature metrics as specified in 'string':
               The format is : GlobalMetric:MetricRange:MetricRange
                         e.g.: mO:N3:I2,5-7

               C: cosine distance. (Global only. numeric features implied)
               D: dot product. (Global only. numeric features implied)
               DC: Dice coefficient
               O: weighted overlap (default)
               E: Euclidean distance
               L: Levenshtein distance
               M: modified value difference
               J: Jeffrey divergence
               S: Jensen‐Shannon divergence
               N: numeric values
               I: ignore named values
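
              To make the metric string format concrete, the following Python sketch expands a specification
              such as O:N3:I2,5-7 into one metric code per feature. It assumes that features are numbered from
              1 and that the leading "m" in the example above is the option letter itself; the parser and its
              name are hypothetical and do not reproduce TiMBL's own option handling.

```python
import re


def parse_metric_string(spec, n_features):
    """Map each feature index (1-based) to a metric code."""
    parts = spec.split(":")
    global_metric = parts[0]
    # Every feature starts with the global metric.
    metrics = {i: global_metric for i in range(1, n_features + 1)}
    for part in parts[1:]:
        # A range part is a metric code followed by indices, e.g. "I2,5-7".
        match = re.fullmatch(r"([A-Za-z]+)([0-9,\-]+)", part)
        if match is None:
            raise ValueError(f"cannot parse metric range {part!r}")
        metric, ranges = match.groups()
        for item in ranges.split(","):
            low, _, high = item.partition("-")
            for i in range(int(low), int(high or low) + 1):
                metrics[i] = metric
    return metrics


print(parse_metric_string("O:N3:I2,5-7", 8))
```

              Under these assumptions the example selects weighted overlap globally, treats feature 3 as
              numeric, and ignores features 2, 5, 6 and 7.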

       --matrixin=file
              read ValueDifference Matrices from file 'file'

       --matrixout=file
              store ValueDifference Matrices in 'file'

       -n file
              create a C4.5-style names file 'file'

       -M n
              size of MaxBests Array

       -N n
              number of features (default 2500)

       -o s
              use s as output filename

       --occurrences=<value>
              the input file contains occurrence counts (at the last position); value can be one of: train, test
              or both

       -O path
              save output using 'path'

       -p n
              show progress every n lines (default p = 100,000)

       -P path
              read data using 'path'

       -q n
              set TRIBL threshold at level n

       -R n
              solve ties at random with seed n

       -s
              use the exemplar weights from the input file

       -s0
              ignore the exemplar weights from the input file

       -T n
              use feature n as the class label. (default: the last feature)

       -t file
              test using 'file'

       -t leave_one_out
              test with the leave‐one‐out testing regimen (IB1 only). You may add --sloppy to speed up
              leave‐one‐out testing (but see docs)

       -t cross_validate
              perform cross‐validation test (IB1 only)

       -t @file
              test using files and options described in 'file'. Supported options: d e F k m o p q R t u v w x
              % -

       --Treeorder=value n
              ordering of the Tree:
               DO: none
               GRO: using GainRatio
               IGO: using InformationGain
               1/V: using 1/# of Values
               G/V: using GainRatio/# of Values
               I/V: using InfoGain/# of Values
               X2O: using X‐square
               X/V: using X‐square/# of Values
               SVO: using Shared Variance
               S/V: using Shared Variance/# of Values
               GxE: using GainRatio * SplitInfo
               IxE: using InformationGain * SplitInfo
               1/S: using 1/SplitInfo

       -u file
              read value‐class probabilities from 'file'

       -U file
              save value‐class probabilities in 'file'

       -V
              Show VERSION

       +v level or -v level
              set or unset verbosity level, where level is:

               s:  work silently
               o:  show all options set
               b:  show node/branch count and branching factor
               f:  show calculated feature weights (default)
               p:  show value difference matrices
               e:  show exact matches
               as: show advanced statistics (memory consuming)
               cm: show confusion matrix (implies +vas)
               cs: show per‐class statistics (implies +vas)
               cf: add confidence to output file (needs -G)
               di: add distance to output file
               db: add distribution of best matched to output file
               md: add matching depth to output file.
               k:  add a summary for all k neighbors to output file (sets -x)
               n:  add nearest neighbors to output file (sets -x)

                You may combine levels using '+' e.g. +v p+db or -v o+di

       -w n
              weighting
               0 or nw: no weighting
               1 or gr: weigh using gain ratio (default)
               2 or ig: weigh using information gain
               3 or x2: weigh using the chi‐square statistic
               4 or sv: weigh using the shared variance statistic
               5 or sd: weigh using standard deviation. (all features must be numeric)
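
              As an illustration of the two most common weights, the sketch below computes information gain
              and gain ratio for one symbolic feature using the standard definitions (class entropy minus
              expected entropy after splitting on the feature, and that gain divided by the split info). It is
              a generic example with hypothetical function names, not TiMBL's code.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of symbols."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def info_gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) of one feature for the class labels."""
    total = len(labels)
    by_value = {}
    for value, label in zip(values, labels):
        by_value.setdefault(value, []).append(label)
    # Expected class entropy after splitting on the feature value.
    remainder = sum(len(group) / total * entropy(group) for group in by_value.values())
    gain = entropy(labels) - remainder
    # Split info is the entropy of the feature's own value distribution.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


print(info_gain_and_ratio(["a", "a", "b", "b"], ["X", "X", "Y", "Y"]))
```

              A feature that predicts the class perfectly receives the maximum weight, while a feature whose
              values are spread evenly over the classes receives a weight of zero.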

       -w file
              read weights from 'file'

       -w file:n
              read weight n from 'file'

       -W file
              calculate and save all weights in 'file'

       +% or -%
              do or don't save test result (%) to file

       +x or -x
              do or don't use the exact match shortcut
                 (IB1 and IB2 only, default is -x)

       -X file
              dump the InstanceBase as XML in 'file'

Synopsis

       timbl [options]

       timbl -f data-file -t test‐file
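
       When timbl is driven from a script, the second form of the synopsis can be assembled as an argument
       list. The Python sketch below uses only options documented on this page (-f, -t, -a, -k, -o); the
       helper names and the file names are hypothetical, and the command is only run if a timbl executable is
       found on the PATH.

```python
import shutil
import subprocess


def build_timbl_command(train_file, test_file, k=1, algorithm=0, output_file=None):
    """Build the argument list for a train-and-test run."""
    command = ["timbl", "-f", train_file, "-t", test_file,
               "-a", str(algorithm), "-k", str(k)]
    if output_file is not None:
        command += ["-o", output_file]
    return command


def run_timbl(command):
    """Run the command, failing early if timbl is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("timbl executable not found on PATH")
    return subprocess.run(command, check=True)


# Placeholder file names; replace them with real data files.
print(build_timbl_command("train.dat", "test.dat", k=3))
```

       Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.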

See Also