This program implements Large Margin Nearest Neighbors (LMNN), a distance metric learning technique. The
method seeks to improve k-nearest-neighbor classification on a dataset. It does so by reducing the distance
between similarly labeled data points (a.k.a. target neighbors) and increasing the distance between
differently labeled points (a.k.a. impostors), using standard gradient-based optimization techniques on an
objective defined over the distances between data points.
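The sketch below shows the pull/push idea for a fixed linear map L, where the learned squared distance is ||L(a - b)||^2. It is written in pure Python and assumes the Weinberger-Saul form of the objective. That form has a pull term over (point, target neighbor) pairs and a unit-margin hinge push term over (point, target neighbor, impostor) triples, combined as (1 - mu) * pull + mu * push. The function names, the unit margin, and this exact weighting are illustrative assumptions. They are not mlpack's API or its precise internal formula.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def transform(L: Matrix, x: Vector) -> List[float]:
    """Apply the linear map L (r x d) to a d-dimensional point x."""
    return [sum(row[c] * x[c] for c in range(len(x))) for row in L]


def sq_dist(L: Matrix, a: Vector, b: Vector) -> float:
    """Learned squared distance ||L(a - b)||^2."""
    diff = [ai - bi for ai, bi in zip(a, b)]
    return sum(v * v for v in transform(L, diff))


def lmnn_loss(L: Matrix, X: Sequence[Vector],
              targets: Sequence[Tuple[int, int]],
              triples: Sequence[Tuple[int, int, int]],
              mu: float = 0.5) -> float:
    """Hypothetical LMNN-style objective: (1 - mu) * pull + mu * push."""
    # Pull: shrink the distance from each point to its target neighbors.
    pull = sum(sq_dist(L, X[i], X[j]) for i, j in targets)
    # Push: penalize impostors l that come within a unit margin of target j.
    push = sum(max(0.0, 1.0 + sq_dist(L, X[i], X[j]) - sq_dist(L, X[i], X[l]))
               for i, j, l in triples)
    return (1.0 - mu) * pull + mu * push


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.5]]   # labels: 0, 0, 1
targets = [(0, 1)]                          # point 1 is a target neighbor of point 0
triples = [(0, 1, 2)]                       # point 2 is an impostor for that pair
identity = [[1.0, 0.0], [0.0, 1.0]]
stretched = [[0.5, 0.0], [0.0, 4.0]]        # shrinks axis 0, stretches axis 1
print(lmnn_loss(identity, X, targets, triples))   # 1.375
print(lmnn_loss(stretched, X, targets, triples))  # 0.125
```

The stretched map pulls the same-class pair together and pushes the impostor past the margin, so the push term drops to zero.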
To work, this algorithm needs labeled data. The labels can be given as the last row of the input dataset
(specified with '--input_file (-i)'); because mlpack stores each point as a column, this corresponds to
the last column of the data file. Alternatively, they can be given as a separate matrix (specified with
'--labels_file (-l)'). Additionally, a starting point for optimization (specified with '--distance_file
(-d)') can be given, with dimensionality (r x d), where r should satisfy 1 <= r <= d; if r < d, a low-rank
matrix will be optimized. Alternatively, a low-rank distance can be learned by specifying the '--rank
(-A)' parameter (a low-rank matrix with uniformly distributed values will be used as the initial learning
point).
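The following sketch illustrates a low-rank starting point. An r x d matrix L with entries drawn uniformly from [0, 1) maps d-dimensional points to r dimensions. The learned squared distance then equals (a - b)^T M (a - b) with M = L^T L, whose rank is at most r. The [0, 1) range, the seeding, and the helper names are assumptions for illustration. mlpack's actual initialization may differ in scale.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


def random_low_rank_init(r: int, d: int, seed: int = 0) -> Matrix:
    """Hypothetical helper: r x d matrix with entries drawn uniformly from [0, 1)."""
    if not 1 <= r <= d:
        raise ValueError("rank must satisfy 1 <= r <= d")
    rng = random.Random(seed)
    return [[rng.random() for _ in range(d)] for _ in range(r)]


def project(L: Matrix, x: Sequence[float]) -> List[float]:
    """Map a d-dimensional point to r dimensions: L x."""
    return [sum(l * xi for l, xi in zip(row, x)) for row in L]


def mahalanobis_matrix(L: Matrix) -> Matrix:
    """M = L^T L (d x d); the learned squared distance is v^T M v for v = a - b."""
    d = len(L[0])
    return [[sum(row[p] * row[q] for row in L) for q in range(d)] for p in range(d)]


L = random_low_rank_init(2, 4)
v = [1.0, -2.0, 0.5, 3.0]
z = project(L, v)
M = mahalanobis_matrix(L)
quad = sum(v[p] * M[p][q] * v[q] for p in range(4) for q in range(4))
print(len(z), abs(sum(t * t for t in z) - quad) < 1e-9)  # 2 True
```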
The program also requires the number of target neighbors to work with (specified with '--k (-k)'). A
regularization parameter can also be passed; it acts as a trade-off between the pulling and pushing terms
(specified with '--regularization (-r)'). In addition, this implementation of LMNN includes a parameter
that sets the interval after which impostors must be re-calculated (specified with '--update_interval
(-R)').
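Target neighbors and impostors can be sketched as follows. The sketch assumes the points have already been mapped through the current transformation. A point's target neighbors are its k nearest same-label points. An impostor is a differently labeled point lying within a unit margin of some target neighbor. Recomputing impostors only every update_interval iterations (here a plain modulo check) avoids repeating the neighbor search on every step. The function names, the unit margin, and the brute-force search are illustrative assumptions chosen for brevity.

```python
from typing import Dict, List, Sequence, Tuple

Points = Sequence[Sequence[float]]


def sq_euclid(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def target_neighbors(X: Points, y: Sequence[int], k: int) -> Dict[int, List[int]]:
    """For each point i, the indices of its k nearest points with the same label."""
    result = {}
    for i in range(len(X)):
        same = [j for j in range(len(X)) if j != i and y[j] == y[i]]
        same.sort(key=lambda j: sq_euclid(X[i], X[j]))
        result[i] = same[:k]
    return result


def find_impostors(X: Points, y: Sequence[int],
                   targets: Dict[int, List[int]]) -> List[Tuple[int, int, int]]:
    """Triples (i, j, l): l has another label and lies within d(i, j) + 1 of point i."""
    triples = []
    for i, js in targets.items():
        for j in js:
            d_ij = sq_euclid(X[i], X[j])
            for l in range(len(X)):
                if y[l] != y[i] and sq_euclid(X[i], X[l]) <= d_ij + 1.0:
                    triples.append((i, j, l))
    return triples


def recompute_iterations(num_iterations: int, update_interval: int) -> List[int]:
    """Iterations at which impostors would be recomputed."""
    return [t for t in range(num_iterations) if t % update_interval == 0]


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.5], [5.0, 5.0]]
y = [0, 0, 1, 1]
targets = target_neighbors(X, y, k=1)
print(targets)                         # {0: [1], 1: [0], 2: [3], 3: [2]}
print(find_impostors(X, y, targets))   # [(0, 1, 2), (1, 0, 2), (2, 3, 0), (2, 3, 1), (3, 2, 1)]
print(recompute_iterations(10, 4))     # [0, 4, 8]
```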
Output can be the learned distance matrix (specified with '--output_file (-o)'), the transformed dataset
(specified with '--transformed_data_file (-D)'), or both. Additionally, the mean-centered dataset
(specified with '--centered_data_file (-c)') can be obtained if mean-centering (specified with
'--center (-C)') is performed on the dataset. Accuracy on the initial dataset and on the final transformed
dataset can be printed by specifying the '--print_accuracy (-P)' parameter.
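The sketch below illustrates mean-centering and an accuracy score. It assumes the accuracy is a leave-one-out k-nearest-neighbor majority vote, a common way to score a learned metric. The exact evaluation behind '--print_accuracy' is not specified here, so treat this as an approximation. The function names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def mean_center(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Subtract the per-dimension mean from every point."""
    n, d = len(X), len(X[0])
    mean = [sum(row[c] for row in X) / n for c in range(d)]
    return [[row[c] - mean[c] for c in range(d)] for row in X]


def knn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], k: int) -> float:
    """Leave-one-out k-NN accuracy with majority voting (ties go to the label seen first)."""
    correct = 0
    for i in range(len(X)):
        # Every other point, ordered by squared Euclidean distance to point i.
        others = sorted((j for j in range(len(X)) if j != i),
                        key=lambda j: sum((a - b) ** 2 for a, b in zip(X[i], X[j])))
        votes = Counter(y[j] for j in others[:k])
        if votes.most_common(1)[0][0] == y[i]:
            correct += 1
    return correct / len(X)


print(mean_center([[1.0, 2.0], [3.0, 4.0]]))                              # [[-1.0, -1.0], [1.0, 1.0]]
print(knn_accuracy([[0.0], [0.1], [1.0], [1.1]], [0, 0, 1, 1], k=1))      # 1.0
print(knn_accuracy([[0.0], [0.4], [0.5], [1.0]], [0, 1, 0, 1], k=1))      # 0.0
```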
This implementation of LMNN can use the AMSGrad, BigBatch_SGD, stochastic gradient descent (plain or
mini-batch), or L-BFGS optimizer.
AMSGrad, specified by the value 'amsgrad' for the parameter '--optimizer (-O)', uses the maximum of past
squared gradients. It depends primarily on the step size (specified with '--step_size (-a)'), the batch
size (specified with '--batch_size (-b)'), and the maximum number of passes (specified with '--passes
(-p)'). In addition, a normalized starting point can be used by specifying the '--normalize (-N)'
parameter.
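Keeping the running maximum of past squared gradients is the change AMSGrad makes to Adam. The sketch below applies that update to a simple quadratic using full gradients. It assumes common hyperparameter defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) and omits bias correction for brevity. In the mlpack program, the gradients would instead come from mini-batches of '--batch_size' points, repeated for '--passes' passes.

```python
import math
from typing import Callable, List, Sequence


def amsgrad(grad: Callable[[List[float]], List[float]], x0: Sequence[float],
            step_size: float = 0.1, beta1: float = 0.9, beta2: float = 0.999,
            eps: float = 1e-8, iterations: int = 2000) -> List[float]:
    """Minimise a function given its gradient using the AMSGrad update."""
    x = list(x0)
    m = [0.0] * len(x)      # first-moment (momentum) estimate
    v = [0.0] * len(x)      # exponential average of squared gradients
    v_hat = [0.0] * len(x)  # running maximum of v: the AMSGrad change to Adam
    for _ in range(iterations):
        g = grad(x)
        for c in range(len(x)):
            m[c] = beta1 * m[c] + (1.0 - beta1) * g[c]
            v[c] = beta2 * v[c] + (1.0 - beta2) * g[c] ** 2
            v_hat[c] = max(v_hat[c], v[c])
            x[c] -= step_size * m[c] / (math.sqrt(v_hat[c]) + eps)
    return x


def grad_quadratic(x: List[float]) -> List[float]:
    """Gradient of (x0 - 3)^2 + (x1 + 1)^2."""
    return [2.0 * (x[0] - 3.0), 2.0 * (x[1] + 1.0)]


print([round(c, 3) for c in amsgrad(grad_quadratic, [0.0, 0.0])])
```

Because v_hat never decreases, the effective per-coordinate step can only shrink. This is the property AMSGrad relies on to avoid Adam's convergence failures.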
BigBatch_SGD, specified by the value 'bbsgd' for the parameter '--optimizer (-O)', depends primarily on
the step size (specified with '--step_size (-a)'), the batch size (specified with '--batch_size (-b)'),
and the maximum number of passes (specified with '--passes (-p)'). In addition, a normalized starting
point can be used by specifying the '--normalize (-N)' parameter.
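Big-batch SGD grows the batch whenever the mini-batch gradient is too noisy to trust. A common acceptance test, from De et al.'s big-batch SGD work, keeps a batch B when Var / |B| <= theta^2 * ||g_B||^2. Here Var is the sample variance of the per-point gradients. The sketch below applies this test to 1-D least squares. Doubling the batch, theta = 0.5, and the helper name are simplifying assumptions. mlpack's BigBatch_SGD may use a different growth rule and step-size policy. The initial batch size must not exceed the number of samples.

```python
import random
from typing import Sequence, Tuple


def big_batch_sgd(samples: Sequence[float], x0: float, step_size: float = 0.5,
                  batch_size: int = 2, theta: float = 0.5, passes: int = 100,
                  seed: int = 0) -> Tuple[float, int]:
    """Minimise mean((x - a)^2 / 2 for a in samples); the per-sample gradient is x - a."""
    rng = random.Random(seed)
    pool = list(samples)
    n = len(pool)
    x = x0
    evaluations = 0
    while evaluations < passes * n:
        # Draw a batch; if its gradient is too noisy, double the batch and redraw.
        while True:
            batch = rng.sample(pool, batch_size)
            grads = [x - a for a in batch]
            g = sum(grads) / batch_size
            var = sum((gi - g) ** 2 for gi in grads) / max(batch_size - 1, 1)
            evaluations += batch_size
            if var / batch_size <= theta ** 2 * g * g or batch_size == n:
                break
            batch_size = min(2 * batch_size, n)
        x -= step_size * g
    return x, batch_size


x_opt, final_batch = big_batch_sgd([float(i) for i in range(10)], x0=0.0)
print(round(x_opt, 6), final_batch)
```

Far from the optimum, small noisy batches are good enough. Near it, the gradient shrinks while the per-sample variance does not, so the batch grows.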
Stochastic gradient descent, specified by the value 'sgd' for the parameter '--optimizer (-O)', depends
primarily on three parameters: the step size (specified with '--step_size (-a)'), the batch size
(specified with '--batch_size (-b)'), and the maximum number of passes (specified with '--passes (-p)').
In addition, a normalized starting point can be used by specifying the '--normalize (-N)' parameter.
Furthermore, mean-centering can be performed on the dataset by specifying the '--center (-C)' parameter.
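The sketch below shows mini-batch SGD on a noise-free linear fit. Its batch size and number of passes play the roles of '--batch_size' and '--passes'. Each pass shuffles the data and takes one step per batch, using the average per-point gradient of the squared error. The toy model, step size, and function name are illustrative assumptions, not the LMNN objective itself.

```python
import random
from typing import List, Sequence, Tuple


def minibatch_sgd(data: Sequence[Tuple[float, float]], w0: Sequence[float],
                  step_size: float = 0.5, batch_size: int = 4, passes: int = 500,
                  seed: int = 0) -> List[float]:
    """Fit y ~ w[0] * x + w[1] by minimising the mean of (prediction - y)^2 / 2."""
    rng = random.Random(seed)
    w = list(w0)
    order = list(range(len(data)))
    for _ in range(passes):
        rng.shuffle(order)                      # new visiting order each pass
        for start in range(0, len(order), batch_size):
            batch = [data[i] for i in order[start:start + batch_size]]
            g0 = g1 = 0.0
            for x, y in batch:
                residual = w[0] * x + w[1] - y
                g0 += residual * x              # d/dw0 of residual^2 / 2
                g1 += residual                  # d/dw1 of residual^2 / 2
            w[0] -= step_size * g0 / len(batch)
            w[1] -= step_size * g1 / len(batch)
    return w


points = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(10)]
print([round(c, 4) for c in minibatch_sgd(points, [0.0, 0.0])])
```

Setting batch_size to 1 recovers plain SGD. Setting it to len(data) turns each pass into one full-gradient step.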
The L-BFGS optimizer, specified by the value 'lbfgs' for the parameter '--optimizer (-O)', uses a
backtracking line search algorithm to minimize a function. The following parameters are used by L-BFGS:
'--max_iterations (-n)' and '--tolerance (-t)' (the optimization is terminated when the gradient norm is
below this value). For more details on the L-BFGS optimizer, consult either the mlpack L-BFGS
documentation (in lbfgs.hpp) or the vast set of published literature on L-BFGS. In addition, a normalized
starting point can be used by specifying the '--normalize (-N)' parameter.
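The L-BFGS loop can be sketched in three parts. The two-loop recursion produces a quasi-Newton direction, a backtracking line search picks the step, and the loop stops when the gradient norm falls below a tolerance or an iteration cap is reached. The memory size of 5, the Armijo constant 1e-4, and the step-halving factor are common defaults assumed here. mlpack's own L-BFGS (see lbfgs.hpp) has its own parameters and line-search details.

```python
import math
from typing import Callable, List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def lbfgs(f: Callable[[Vec], float], grad: Callable[[Vec], Vec], x0: Sequence[float],
          max_iterations: int = 100, tolerance: float = 1e-8, memory: int = 5) -> Vec:
    x = list(x0)
    g = grad(x)
    s_hist: List[Vec] = []
    y_hist: List[Vec] = []
    for _ in range(max_iterations):
        if math.sqrt(dot(g, g)) < tolerance:      # gradient-norm stopping rule
            break
        # Two-loop recursion: r approximates (inverse Hessian) * g.
        q = list(g)
        alphas = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):   # newest to oldest
            rho = 1.0 / dot(y, s)
            a = rho * dot(s, q)
            alphas.append((rho, a))
            q = [qi - a * yi for qi, yi in zip(q, y)]
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
        r = [gamma * qi for qi in q]
        for s, y, (rho, a) in zip(s_hist, y_hist, reversed(alphas)):  # oldest to newest
            b = rho * dot(y, r)
            r = [ri + (a - b) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]
        # Backtracking line search: halve the step until the Armijo condition holds.
        fx, slope, step = f(x), dot(g, d), 1.0
        while f([xi + step * di for xi, di in zip(x, d)]) > fx + 1e-4 * step * slope:
            step *= 0.5
        x_new = [xi + step * di for xi, di in zip(x, d)]
        g_new = grad(x_new)
        s = [xn - xo for xn, xo in zip(x_new, x)]
        y = [gn - go for gn, go in zip(g_new, g)]
        if dot(s, y) > 1e-12:                     # keep only positive-curvature pairs
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > memory:
                s_hist.pop(0)
                y_hist.pop(0)
        x, g = x_new, g_new
    return x


def f(x: Vec) -> float:
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2 + (x[0] - 1.0) * (x[1] + 2.0)


def grad_f(x: Vec) -> Vec:
    return [2.0 * (x[0] - 1.0) + (x[1] + 2.0), 20.0 * (x[1] + 2.0) + (x[0] - 1.0)]


print([round(v, 6) for v in lbfgs(f, grad_f, [0.0, 0.0])])
```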
By default, the AMSGrad optimizer is used.
Example: to learn a distance on the iris dataset with 3 target neighbors using the BigBatch_SGD optimizer,
a simple call looks like:
$ mlpack_lmnn --input_file iris.csv --labels_file iris_labels.csv --k 3 --optimizer bbsgd --output_file output.csv
Another call, making use of the update interval and regularization parameters with a dataset whose labels
are in the last column, can be made as:
$ mlpack_lmnn --input_file letter_recognition.csv --k 5 --update_interval 10 --regularization 0.4 --output_file output.csv