usage: tpot [-h] [-is INPUT_SEPARATOR] [-target TARGET_NAME]
[-mode {classification,regression}] [-o OUTPUT_FILE] [-g GENERATIONS]
[-p POPULATION_SIZE] [-os OFFSPRING_SIZE] [-mr MUTATION_RATE]
[-xr CROSSOVER_RATE] [-scoring SCORING_FN] [-cv NUM_CV_FOLDS]
[-sub SUBSAMPLE] [-njobs NUM_JOBS] [-maxtime MAX_TIME_MINS]
[-maxeval MAX_EVAL_MINS] [-s RANDOM_STATE] [-config CONFIG_FILE]
[-template TEMPLATE] [-memory MEMORY] [-cf CHECKPOINT_FOLDER]
[-es EARLY_STOP] [-v {0,1,2,3}] [-log LOG] [--version] INPUT_FILE
A Python tool that automatically creates and optimizes machine learning pipelines using genetic
programming.
positional arguments:
INPUT_FILE
Data file to use in the TPOT optimization process. Ensure that the class label column is labeled
as "class".
optional arguments:
-h, --help
Show this help message and exit.
-is INPUT_SEPARATOR
Character used to separate columns in the input file.
-target TARGET_NAME
Name of the target column in the input file.
-mode {classification,regression}
Whether TPOT is being used for a supervised classification or regression problem.
-o OUTPUT_FILE
File to export the code for the final optimized pipeline.
-g GENERATIONS
Number of iterations to run the pipeline optimization process. It must be a positive number or
None. If None, the parameter max_time_mins must be defined as the runtime limit. Generally, TPOT
will work better when you give it more generations (and therefore time) to optimize the pipeline.
TPOT will evaluate POPULATION_SIZE + GENERATIONS x OFFSPRING_SIZE pipelines in total.
-p POPULATION_SIZE
Number of individuals to retain in the GP population every generation. Generally, TPOT will work
better when you give it more individuals (and therefore time) to optimize the pipeline. TPOT will
evaluate POPULATION_SIZE + GENERATIONS x OFFSPRING_SIZE pipelines in total.
-os OFFSPRING_SIZE
Number of offspring to produce in each GP generation. By default, OFFSPRING_SIZE =
POPULATION_SIZE.
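The evaluation budget stated under -g and -p can be estimated before starting a run. The following is a minimal sketch of that arithmetic only; the function name total_pipelines is hypothetical and is not part of TPOT.

```python
def total_pipelines(population_size, generations, offspring_size=None):
    """Estimate how many pipelines a run evaluates in total.

    Applies the formula from this page:
    POPULATION_SIZE + GENERATIONS x OFFSPRING_SIZE.
    """
    # Per the -os description, OFFSPRING_SIZE defaults to POPULATION_SIZE.
    if offspring_size is None:
        offspring_size = population_size
    return population_size + generations * offspring_size


# 100 individuals for 100 generations, default offspring size.
print(total_pipelines(100, 100))
# 50 individuals for 10 generations with 20 offspring per generation.
print(total_pipelines(50, 10, offspring_size=20))
```

Because the total grows with the product of GENERATIONS and OFFSPRING_SIZE, doubling either one roughly doubles the number of pipelines evaluated.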
-mr MUTATION_RATE
GP mutation rate in the range [0.0, 1.0]. This tells the GP algorithm how many pipelines to apply
random changes to every generation. We recommend using the default parameter unless you understand
how the mutation rate affects GP algorithms.
-xr CROSSOVER_RATE
GP crossover rate in the range [0.0, 1.0]. This tells the GP algorithm how many pipelines to
"breed" every generation. We recommend using the default parameter unless you understand how the
crossover rate affects GP algorithms.
-scoring SCORING_FN
Function used to evaluate the quality of a given pipeline for the problem. By default, accuracy is
used for classification problems and mean squared error (mse) is used for regression problems.
Note: If you wrote your own function, set this argument to mymodule.myfunction and TPOT will
import your module and take the function from there. TPOT will assume the module can be imported
from the current workdir. TPOT assumes that any function with "error" or "loss" in the name is
meant to be minimized, whereas any other functions will be maximized. Offers the same options as
cross_val_score: accuracy, adjusted_rand_score, average_precision, f1, f1_macro, f1_micro,
f1_samples, f1_weighted, neg_log_loss, neg_mean_absolute_error, neg_mean_squared_error,
neg_median_absolute_error, precision, precision_macro, precision_micro, precision_samples,
precision_weighted, r2, recall, recall_macro, recall_micro, recall_samples, recall_weighted,
roc_auc.
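The naming rule for custom scoring functions can be illustrated as follows. This is a sketch of the rule as worded above, not TPOT's actual implementation; the helper is_minimized and the example function names are hypothetical, and the case-sensitive match is an assumption.

```python
def is_minimized(function_name):
    """Return True if a custom scoring function would be minimized.

    Assumption: a plain, case-sensitive substring check on the name,
    mirroring the rule that names containing "error" or "loss" are
    minimized and all other functions are maximized.
    """
    return "error" in function_name or "loss" in function_name


# Hypothetical custom function names, as in mymodule.myfunction.
for name in ("my_custom_error", "hinge_loss_variant", "balanced_score"):
    direction = "minimized" if is_minimized(name) else "maximized"
    print(name, "->", direction)
```

Under this rule, the name given to a custom function determines the direction of optimization, so a metric where higher is better should avoid "error" and "loss" in its name.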
-cv NUM_CV_FOLDS
Number of folds to evaluate each pipeline over in stratified k-fold cross-validation during the
TPOT optimization process.
-sub SUBSAMPLE
Subsample ratio of the training instances. Setting it to 0.5 means that TPOT will use a random
subsample of half of the training data for the pipeline optimization process.
-njobs NUM_JOBS
Number of CPUs for evaluating pipelines in parallel during the TPOT optimization process.
Assigning this to -1 will use as many cores as available on the computer. For n_jobs below -1,
(n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
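The arithmetic for negative NUM_JOBS values can be checked with a short sketch. The function effective_jobs is a hypothetical name used for illustration, and the clamp to a minimum of one worker is an assumption that the text above does not state.

```python
def effective_jobs(n_jobs, n_cpus):
    """Number of CPUs used for a given NUM_JOBS value.

    Negative values follow the formula from this page:
    n_cpus + 1 + n_jobs (so -1 means all CPUs, -2 all but one).
    """
    if n_jobs < 0:
        # Assumption: never go below a single worker.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


# On a machine with 8 CPUs:
print(effective_jobs(-1, 8))  # all CPUs
print(effective_jobs(-2, 8))  # all CPUs but one
print(effective_jobs(4, 8))   # exactly the number requested
```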
-maxtime MAX_TIME_MINS
How many minutes TPOT has to optimize the pipeline. If not None, this setting will allow TPOT to
run until max_time_mins minutes have elapsed and then stop. TPOT will stop earlier if generations
is set and all generations are already evaluated.
-maxeval MAX_EVAL_MINS
How many minutes TPOT has to evaluate a single pipeline. Setting this parameter to higher values
will allow TPOT to explore more complex pipelines but will also allow TPOT to run longer.
-s RANDOM_STATE
Random number generator seed for reproducibility. Set this seed if you want your TPOT run to be
reproducible with the same seed and data set in the future.
-config CONFIG_FILE
Configuration file for customizing the operators and parameters that TPOT uses in the optimization
process. Must be a Python module containing a dict export named "tpot_config" or the name of
a built-in configuration.
-template TEMPLATE
Template of a predefined pipeline structure. This option specifies a desired structure for the
machine learning pipeline evaluated in TPOT. So far this option only supports linear pipeline
structures. Each step in the pipeline should be a main class of operators (Selector, Transformer,
Classifier or Regressor) or a specific operator (e.g. SelectPercentile) defined in the TPOT
operator configuration. If one step is a main class, TPOT will randomly assign all subclass
operators (subclasses of SelectorMixin, TransformerMixin, ClassifierMixin or RegressorMixin in
scikit-learn) to that step. Steps in the template are delimited by "-", e.g.
"SelectPercentile-Transformer-Classifier". By default, the value of template is None and TPOT
generates a tree-based pipeline randomly.
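The template syntax can be illustrated by splitting a template string into its steps and labeling each one. This is a sketch of the syntax described above, not TPOT's own parser; the names MAIN_CLASSES and parse_template are hypothetical.

```python
# The four main operator classes named in the -template description.
MAIN_CLASSES = {"Selector", "Transformer", "Classifier", "Regressor"}


def parse_template(template):
    """Split a template on "-" and label each step.

    Returns a list of (step, kind) pairs, where kind is "main class"
    for one of the four main classes and "specific operator" otherwise.
    Returns None when no template is given (the default).
    """
    if template is None:
        return None
    return [
        (step, "main class" if step in MAIN_CLASSES else "specific operator")
        for step in template.split("-")
    ]


for step, kind in parse_template("SelectPercentile-Transformer-Classifier"):
    print(step, "->", kind)
```

In this example the first step is pinned to one operator, while the remaining two steps are left open to any operator of the named class.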
-memory MEMORY
Path of a directory for pipeline caching or "auto" for using a temporary caching directory during
the optimization process. If supplied, pipelines will cache each transformer after fitting it.
This feature avoids repeated computation by transformers within a pipeline when the parameters
and input data are identical to those of another pipeline fitted during the optimization process.
-cf CHECKPOINT_FOLDER
If supplied, a folder in which TPOT will periodically save the best pipeline so far while
optimizing. This is useful in several cases: a sudden crash before TPOT could save an optimized
pipeline, progress tracking, grabbing a pipeline while TPOT is still optimizing, etc.
-es EARLY_STOP
Number of consecutive generations without improvement that TPOT tolerates. The optimization
process ends if there is no improvement within the set number of generations.
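The early-stopping behavior can be sketched as a counter of consecutive generations without improvement. This is a simplified illustration under the assumption that higher scores are better; the function generations_until_stop and the score values are hypothetical and do not reflect TPOT's internals.

```python
def generations_until_stop(best_scores, early_stop):
    """Return the generation at which optimization would end.

    best_scores holds the best score seen in each generation.
    The run stops once early_stop consecutive generations pass
    without a new best score; otherwise it runs to the end.
    """
    best = float("-inf")
    stale = 0  # consecutive generations without improvement
    for generation, score in enumerate(best_scores, start=1):
        if score > best:
            best = score
            stale = 0
        else:
            stale += 1
        if stale >= early_stop:
            return generation
    return len(best_scores)


# Hypothetical per-generation best scores.
scores = [0.80, 0.82, 0.82, 0.82, 0.82, 0.90]
print(generations_until_stop(scores, early_stop=3))
```

With these scores the improvement in the sixth generation is never reached, which shows the trade-off: a small EARLY_STOP saves time but can end a run just before a later gain.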
-v {0,1,2,3}
How much information TPOT communicates while it is running: 0 = none, 1 = minimal, 2 = high, 3 =
all. A setting of 2 or higher will add a progress bar during the optimization procedure.
-log LOG
Save progress content to a file.
--version
Show the TPOT version number and exit.
tpot 0.11.7+dfsg January 2021 TPOT(1)