mlpack_dbscan - dbscan clustering
Additional Information
For further information, including relevant papers, citations, and theory, consult the documentation
found at http://www.mlpack.org or included with your distribution of mlpack.
mlpack-4.5.1 29 January 2025 mlpack_dbscan(1)
Description
This program implements the DBSCAN algorithm for clustering using accelerated tree-based range search.
The type of tree that is used may be parameterized, or brute-force range search may also be used.
The input dataset to be clustered may be specified with the '--input_file (-i)' parameter; the radius of
each range search may be specified with the '--epsilon (-e)' parameter, and the minimum number of points
in a cluster may be specified with the '--min_size (-m)' parameter.
The '--assignments_file (-a)' and '--centroids_file (-C)' output parameters may be used to save the
output of the clustering. '--assignments_file (-a)' contains the cluster assignments of each point, and
'--centroids_file (-C)' contains the centroids of each cluster.
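As a rough sketch of how the two output parameters might be combined in one run: the file names 'assignments.csv' and 'centroids.csv' below are placeholders chosen for this illustration, and 'input.csv', the radius and the minimum size are simply reused from the usage example in this page. The guard around the call is only there so the snippet prints the command instead of failing when mlpack_dbscan or the input file is not available.

```shell
# Arguments for the run; the two output file names are illustrative placeholders.
set -- --input_file input.csv --epsilon 0.5 --min_size 5 \
       --assignments_file assignments.csv --centroids_file centroids.csv

# Run the clustering only if the program and the input file are present;
# otherwise just print the command that would have been executed.
if command -v mlpack_dbscan >/dev/null 2>&1 && [ -f input.csv ]; then
  mlpack_dbscan "$@"
else
  echo "mlpack_dbscan $*"
fi
```

After a successful run, 'assignments.csv' would hold the cluster assignment of each point and 'centroids.csv' the centroid of each cluster, as described above.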
The range search may be controlled with the '--tree_type (-t)', '--single_mode (-S)', and '--naive (-N)'
parameters. '--tree_type (-t)' can control the type of tree used for range search; this can take a
variety of values: 'kd', 'r', 'r-star', 'x', 'hilbert-r', 'r-plus', 'r-plus-plus', 'cover', 'ball'. The
'--single_mode (-S)' parameter will force single-tree search (as opposed to the default dual-tree
search), and '--naive (-N)' will force brute-force range search.
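The three search modes described above could be exercised as in the sketch below. 'run_dbscan' is a hypothetical helper defined only for this example (it is not part of mlpack); it reuses 'input.csv', the radius and the minimum size from the usage example in this page, and it assumes the boolean options are passed as bare flags with no value. When mlpack_dbscan or the input file is missing, the helper prints the command rather than running it.

```shell
# Hypothetical helper: runs mlpack_dbscan with a fixed set of common
# arguments plus whatever search options are passed in.
run_dbscan() {
  if command -v mlpack_dbscan >/dev/null 2>&1 && [ -f input.csv ]; then
    mlpack_dbscan --input_file input.csv --epsilon 0.5 --min_size 5 "$@"
  else
    echo "mlpack_dbscan --input_file input.csv --epsilon 0.5 --min_size 5 $*"
  fi
}

run_dbscan --tree_type ball                  # dual-tree search (the default mode) on a ball tree
run_dbscan --tree_type r-star --single_mode  # single-tree search on an 'r-star' tree
run_dbscan --naive                           # brute-force range search, no tree
```

Adding '--verbose (-v)' to each call displays the timers at the end of execution, which may help when comparing the modes on a particular dataset.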
An example usage to run DBSCAN on the dataset in 'input.csv' with a radius of 0.5 and a minimum cluster
size of 5 is given below:
$ mlpack_dbscan --input_file input.csv --epsilon 0.5 --min_size 5
Name
mlpack_dbscan - dbscan clustering
Optional Input Options
--epsilon (-e) [double]
Radius of each range search. Default value 1.
--help (-h) [bool]
Default help info.
--info [string]
Print help on a specific option. Default value ''.
--min_size (-m) [int]
Minimum number of points for a cluster. Default value 5.
--naive (-N) [bool]
If set, brute-force range search (not tree-based) will be used.
--selection_type (-s) [string]
If using point selection policy, the type of selection to use ('ordered', 'random'). Default value
'ordered'.
--single_mode (-S) [bool]
If set, single-tree range search (not dual-tree) will be used.
--tree_type (-t) [string]
If using single-tree or dual-tree search, the type of tree to use ('kd', 'r', 'r-star', 'x',
'hilbert-r', 'r-plus', 'r-plus-plus', 'cover', 'ball'). Default value 'kd'.
--verbose (-v) [bool]
Display informational messages and the full list of parameters and timers at the end of execution.
--version (-V) [bool]
Display the version of mlpack.
Optional Output Options
--assignments_file (-a) [unknown]
Output matrix for assignments of each point.
--centroids_file (-C) [unknown]
Matrix to save output centroids to.
Required Input Options
--input_file (-i) [unknown]
Input dataset to cluster.
Synopsis
mlpack_dbscan -i unknown [-e double] [-m int] [-N bool] [-s string] [-S bool] [-t string] [-V bool] [-a unknown] [-C unknown] [-h -v]
