
mlpack_logistic_regression - L2-regularized logistic regression and prediction

Additional Information

       For  further  information,  including  relevant  papers, citations, and theory, consult the documentation
       found at http://www.mlpack.org or included with your distribution of mlpack.

mlpack-4.5.1                                     29 January 2025                   mlpack_logistic_regression(1)

Description

       An  implementation  of  L2-regularized  logistic  regression  using  either  the  L-BFGS optimizer or SGD
       (stochastic gradient descent). This solves the regression problem

         y = 1 / (1 + e^-(X * b)).

       In this setting, y corresponds to class labels and X corresponds to data.
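
       As a minimal sketch of the formula above (not mlpack's implementation), the following Python computes y
       for a single data point. The names logistic and predict_probability are hypothetical, and the sketch
       uses exactly the terms in the formula, so any intercept handling inside mlpack is not modeled.

```python
import math

def logistic(z):
    """Logistic function 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_probability(x, b):
    """Return y = 1 / (1 + e^-(x . b)) for one data point x and parameters b."""
    z = sum(xi * bi for xi, bi in zip(x, b))
    return logistic(z)
```

       A point whose weighted sum x . b is 0 yields y = 0.5; larger sums move y toward 1 and smaller sums
       toward 0.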

       This program allows loading a logistic regression model (via the '--input_model_file (-m)' parameter),
       training a logistic regression model given training data (specified with the '--training_file (-t)'
       parameter), or both at once. In addition, this program allows classification on a test dataset
       (specified with the '--test_file (-T)' parameter); the classification results may be saved with the
       '--predictions_file (-P)' output parameter. The trained logistic regression model may be saved using
       the '--output_model_file (-M)' output parameter.

       The training data, if specified, may have class labels as its last dimension. Alternatively, the
       '--labels_file (-l)' parameter may be used to specify a separate matrix of labels.
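
       To illustrate the two layouts, the sketch below splits rows that carry the label as their last value
       into a separate feature matrix and label list. It assumes one data point per row with the label in the
       last column; split_labels is a hypothetical helper, not part of mlpack. The 0-or-1 check reflects the
       two-class restriction of this program.

```python
def split_labels(rows):
    """Split rows of [f1, ..., fk, label] into (features, labels)."""
    features, labels = [], []
    for row in rows:
        *point, label = row  # last value is the label, the rest are predictors
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        features.append(point)
        labels.append(int(label))
    return features, labels
```

       The two returned pieces correspond to what would otherwise be passed separately as the training
       matrix and the labels matrix.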

       When a model is being trained, there are many options. L2 regularization (to prevent overfitting) can  be
       specified  with  the  '--lambda  (-L)' option, and the optimizer used to train the model can be specified
       with the '--optimizer (-O)' parameter. Available options are  'sgd'  (stochastic  gradient  descent)  and
       'lbfgs' (the L-BFGS optimizer). There are also various parameters for the optimizer; the
       '--max_iterations  (-n)'  parameter  specifies  the  maximum  number  of  allowed  iterations,  and   the
       '--tolerance  (-e)'  parameter  specifies  the  tolerance  for  convergence.  For  the SGD optimizer, the
       '--step_size (-s)' parameter controls the step size taken at each iteration by the optimizer.  The  batch
       size  for  SGD  is controlled with the '--batch_size (-b)' parameter.  If the objective function for your
       data is oscillating between Inf and 0, the step size is probably too large. There are more parameters for
       the optimizers, but the C++ interface must be used to access these.
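
       The effect of the regularization parameter can be sketched with a textbook L2-regularized logistic
       loss. This is an illustration only: the 0.5 * lambda scaling and the choice to penalize every
       parameter are assumptions, and the exact objective used by mlpack is not stated here. The name
       regularized_loss is hypothetical.

```python
import math

def regularized_loss(points, labels, b, lam):
    """Negative log-likelihood of a logistic model plus an L2 penalty on b."""
    total = 0.0
    for x, y in zip(points, labels):
        z = sum(xi * bi for xi, bi in zip(x, b))
        # log(1 + e^z), computed without overflow for large |z|
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - y * z
    # Assumed penalty form: 0.5 * lambda * ||b||^2
    penalty = 0.5 * lam * sum(bi * bi for bi in b)
    return total + penalty
```

       With lam set to 0 the penalty vanishes; larger values make large parameter values more costly, which
       is how the regularization discourages overfitting.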

       For SGD, an iteration refers to a single point. So to take a single  pass  over  the  dataset  with  SGD,
       '--max_iterations (-n)' should be set to the number of points in the dataset.
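
       Following the statement above, the iteration limit for a given number of passes is the number of
       points times the number of passes. The sketch below counts the rows of a CSV file and builds an
       argument list from options documented on this page. It assumes one point per row and no header line;
       count_points and sgd_train_args are hypothetical helpers, and the command is only constructed, not run.

```python
import csv

def count_points(csv_path):
    """Count non-empty rows in a CSV file (assumes one point per row, no header)."""
    with open(csv_path, newline="") as f:
        return sum(1 for row in csv.reader(f) if row)

def sgd_train_args(training_file, num_points, passes=1):
    """Build an argument list that trains with SGD for the given number of passes."""
    max_iterations = num_points * passes  # one SGD iteration per point
    return [
        "mlpack_logistic_regression",
        "--training_file", training_file,
        "--optimizer", "sgd",
        "--max_iterations", str(max_iterations),
    ]
```

       For a dataset of 1000 points, three passes correspond to '--max_iterations 3000'.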

       Optionally,  the  model  can  be  used  to  predict  the  responses for another matrix of data points, if
       '--test_file (-T)'  is  specified.  The  '--test_file  (-T)'  parameter  can  be  specified  without  the
       '--training_file  (-t)'  parameter,  so  long  as an existing logistic regression model is given with the
       '--input_model_file (-m)' parameter. The output predictions from the logistic regression model may be
       saved with the '--predictions_file (-P)' parameter.

       This implementation of logistic regression does not support the general multi-class case but instead only
       the  two-class  case.  Any  labels  must  be  either 0 or 1. For more classes, see the softmax regression
       implementation.

       As an example, to train a logistic regression model on the data ''data.csv'' with  labels  ''labels.csv''
       with L2 regularization of 0.1, saving the model to ''lr_model.bin'', the following command may be used:

       $ mlpack_logistic_regression --training_file data.csv --labels_file labels.csv --lambda 0.1 \
         --output_model_file lr_model.bin --print_training_accuracy

       Then, to use that model to predict classes for the dataset ''test.csv'', storing the  output  predictions
       in ''predictions.csv'', the following command may be used:

       $ mlpack_logistic_regression --input_model_file lr_model.bin --test_file test.csv \
         --predictions_file predictions.csv

Name

mlpack_logistic_regression - L2-regularized logistic regression and prediction

Optional Input Options

       --batch_size (-b) [int]
              Batch size for SGD. Default value 64.

       --decision_boundary (-d) [double]
              Decision boundary for prediction; if the logistic function for a point is less than the boundary,
              the class is taken to be 0; otherwise, the class is 1. Default value 0.5.

       --help (-h) [bool]
              Default help info.

       --info [string]
              Print help on a specific option. Default value ''.

       --input_model_file (-m) [unknown]
              Existing model (parameters).

       --labels_file (-l) [unknown]
              A matrix containing labels (0 or 1) for the points in the training set (y).

       --lambda (-L) [double]
              L2-regularization parameter for training. Default value 0.

       --max_iterations (-n) [int]
              Maximum iterations for optimizer (0 indicates no limit). Default value 10000.

       --optimizer (-O) [string]
              Optimizer to use for training ('lbfgs' or 'sgd'). Default value 'lbfgs'.

       --print_training_accuracy (-a) [bool]
              If set, then the accuracy of the model on the training set will be printed (verbose must also be
              specified).

       --step_size (-s) [double]
              Step size for SGD optimizer. Default value 0.01.

       --test_file (-T) [unknown]
              Matrix containing test dataset.

       --tolerance (-e) [double]
              Convergence tolerance for optimizer. Default value 1e-10.

       --training_file (-t) [unknown]
              A matrix containing the training set (the matrix of predictors, X).

       --verbose (-v) [bool]
              Display informational messages and the full list of parameters and timers at the end of execution.

       --version (-V) [bool]
              Display the version of mlpack.
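
       The rule described for '--decision_boundary (-d)' above can be sketched as a one-line threshold. The
       function name classify is hypothetical; the comparison follows the option text, so a value exactly
       equal to the boundary is assigned class 1.

```python
def classify(probability, decision_boundary=0.5):
    """Return 0 if the logistic output is below the boundary, otherwise 1."""
    return 0 if probability < decision_boundary else 1
```

       Raising the boundary makes class 1 harder to reach: a logistic output of 0.7 is class 1 with the
       default boundary of 0.5 but class 0 with a boundary of 0.8.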

Optional Output Options

       --output_model_file (-M) [unknown]
              Output for trained logistic regression model.

       --predictions_file (-P) [unknown]
              If test data is specified, this matrix is where the predictions for the test set will be saved.

       --probabilities_file (-p) [unknown]
              If test data is specified, this matrix is where the class probabilities for the test set will be
              saved.

Synopsis

mlpack_logistic_regression [-b int] [-d double] [-m unknown] [-l unknown] [-L double] [-n int] [-O string] [-a bool] [-s double] [-T unknown] [-e double] [-t unknown] [-V bool] [-M unknown] [-P unknown] [-p unknown] [-h -v]

See Also