The ugrep-indexer utility recursively indexes files to accelerate recursive searching with the ug --index PATTERN commands:
$ ugrep-indexer [-I] [-z]
...
$ ug --index [-I] [-z] [-r|-R] OPTIONS PATTERN
$ ugrep --index [-I] [-z] [-r|-R] OPTIONS PATTERN
where option -I or --ignore-binary ignores binary files, which is recommended to limit indexing storage
overhead and to reduce search time. Option -z or --decompress indexes and searches archives and
compressed files.
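As a minimal sketch of the two-step workflow above, the helper below indexes a directory tree while ignoring binary files and then searches it with the same option -I. The name index_and_search is hypothetical and not part of ugrep; the sketch assumes that ugrep-indexer and ug are installed and on the PATH.

```shell
# index_and_search DIR PATTERN
# Hypothetical helper: index DIR, then search it using the indexes.
index_and_search() {
  (
    cd "$1" || exit 1
    # Build or incrementally update the indexes, ignoring binary files.
    ugrep-indexer -I || exit 1
    # Search with the indexes; -I skips the binary files that were not indexed.
    ug --index -I "$2"
  )
}
```

For example, index_and_search ~/projects 'TODO' indexes ~/projects and then searches it for TODO. The subshell keeps the caller's working directory unchanged.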
Indexing speeds up searching file systems that are large and cold (not recently cached in RAM) and file
systems that are generally slow to search. Note that indexing may not speed up searching only a few files
or recursively searching fast file systems.
Searching with ug --index is safe and never skips modified files that may match after indexing; the
ug --index PATTERN command always searches files and directories that were added or modified after
indexing.
When option --stats is used with ug --index, a search report is produced showing the number of files
skipped because they do not match any indexes and the number of files and directories that were added
or modified after indexing. Note that searching with ug --index may significantly increase the start-up
time when complex regex patterns are specified that contain large Unicode character classes combined
with `*' or `+' repeats, which should be avoided.
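The search report described above can be requested as in the following sketch. The wrapper name search_report is hypothetical; it only prepends --index and --stats to whatever options and pattern are passed. The layout of the report itself is not shown here because it depends on the installed ugrep version.

```shell
# search_report [OPTIONS] PATTERN
# Hypothetical wrapper: index-accelerated search that also prints the
# search report produced by --stats.
search_report() {
  ug --index --stats "$@"
}
```

For example, search_report -I 'main' searches the working directory tree with the indexes, skipping binary files, and appends the report.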
ugrep-indexer stores a hidden index file in each directory indexed. The size of an index file depends on
the number of files indexed and the specified indexing accuracy. Higher accuracy produces larger index
files to improve search performance by reducing false positives (a false positive is a match prediction
for a file when the file does not match the regex pattern).
ugrep-indexer accepts an optional PATH to the root of the directory tree to index. The default is to
index the working directory tree.
ugrep-indexer incrementally updates indexes. To force reindexing, specify option -f or --force. Indexes
are deleted with option -d or --delete.
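The maintenance operations can be collected in one place, as in this sketch. The function name manage_index and its action words are hypothetical; each branch only calls ugrep-indexer with an option documented in this section (-f, -c, -d) and an optional PATH that defaults to the working directory.

```shell
# manage_index ACTION [PATH]
# Hypothetical dispatcher over documented ugrep-indexer options.
manage_index() {
  action=$1
  path=${2:-.}
  case $action in
    update) ugrep-indexer "$path" ;;     # incremental update of the indexes
    force)  ugrep-indexer -f "$path" ;;  # reindex files that are already indexed
    check)  ugrep-indexer -c "$path" ;;  # check and report without reindexing
    delete) ugrep-indexer -d "$path" ;;  # remove the index files
    *)
      echo "usage: manage_index update|force|check|delete [PATH]" >&2
      return 2
      ;;
  esac
}
```

For example, manage_index force ~/src reindexes everything under ~/src, and manage_index delete removes the indexes from the working directory tree.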
ugrep-indexer may be stopped and restarted to continue indexing at any time. Incomplete index files do
not cause errors.
ASCII, UTF-8, UTF-16 and UTF-32 files are indexed and searched as text files unless their UTF encoding is
invalid. Files with other encodings are indexed as binary files and can be searched with non-Unicode
regex patterns using ug --index -U.
When ugrep-indexer option -I or --ignore-binary is specified, binary files are ignored and not indexed.
Avoid searching these non-indexed binary files by also specifying option -I when searching, i.e. use
ug --index -I.
ugrep-indexer option -X or --ignore-files respects gitignore rules. Likewise, avoid searching
non-indexed ignored files by also specifying option --ignore-files when searching, i.e. use
ug --index --ignore-files.
Archives and compressed files are indexed with ugrep-indexer option -z or --decompress. Otherwise,
archives and compressed files are indexed as binary files or are ignored with option -I or
--ignore-binary. Note that once an archive or compressed file is indexed as a binary file, it will not be
reindexed with option -z to index the contents of the archive or compressed file. Only files that are
modified after indexing are reindexed, which is determined by comparing time stamps.
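Because unmodified files are not reindexed, switching an existing index over to -z needs an extra step. One approach that follows from the options documented in this section is to delete the indexes first and then index again with -z, as sketched below. The helper name reindex_archives is hypothetical. Combining -f with -z may achieve the same result in one pass, but that is an assumption this section does not confirm.

```shell
# reindex_archives [PATH]
# Hypothetical helper: drop the existing indexes, then index again with -z
# so that archives previously indexed as binary files get their contents indexed.
reindex_archives() {
  path=${1:-.}
  ugrep-indexer -d "$path" || return 1
  ugrep-indexer -z "$path"
}
```

After reindexing, search with ug --index -z PATTERN so that the search also looks inside archives and compressed files.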
Symlinked files are indexed with ugrep-indexer option -S or --dereference-files. Symlinks to directories
are never followed.
To save a log file of the indexing process, specify option -v or --verbose and redirect standard output
to a log file. All messages and warnings are sent to standard output and captured by the log file.
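A minimal sketch of capturing such a log follows. The helper name index_with_log and its argument order are hypothetical; the only ugrep-indexer feature used is option -v with standard output redirected to a file.

```shell
# index_with_log LOGFILE [PATH]
# Hypothetical helper: index PATH verbosely and save the output in LOGFILE.
index_with_log() {
  log=$1
  path=${2:-.}
  # Messages and warnings go to standard output, so one redirect captures them all.
  ugrep-indexer -v "$path" > "$log"
}
```

For example, index_with_log index.log ~/src writes the verbose indexing output for ~/src to index.log in the working directory.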
A .ugrep-indexer configuration file with configuration options is loaded when present in the working
directory or in the home directory. A configuration option consists of the name of a long option and its
argument when applicable.
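The sketch below writes a small configuration file. The helper name write_indexer_config is hypothetical, and the file layout is an assumption based on the description above: one long option name per line without the leading dashes, with =ARG appended when the option takes an argument. Check the layout against the ugrep-indexer version in use.

```shell
# write_indexer_config DIR
# Hypothetical helper: write a .ugrep-indexer file into DIR.
# ASSUMPTION: one long option name per line, without the leading "--",
# followed by "=ARG" when the option takes an argument.
write_indexer_config() {
  cat > "$1/.ugrep-indexer" <<'EOF'
ignore-binary
accuracy=5
EOF
}
```

For example, write_indexer_config "$HOME" places the file in the home directory, one of the two locations from which it is loaded.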
The following options are available:
-0, -1, -2, -3, ..., -9, --accuracy=DIGIT
Specifies indexing accuracy. A low accuracy reduces the indexing storage overhead at the cost of
a higher rate of false positive pattern matches (more noise). A high accuracy reduces the rate of
false positive regex pattern matches (less noise) at the cost of an increased indexing storage
overhead. An accuracy between 2 and 7 is recommended. The default accuracy is 4.
-., --hidden
Index hidden files and directories.
-?, --help
Display a help message and exit.
-c, --check
Recursively check and report indexes without reindexing files.
-d, --delete
Recursively remove index files.
-f, --force
Force reindexing of files, even those that are already indexed.
-I, --ignore-binary
Do not index binary files.
-q, --quiet, --silent
Quiet mode: do not display indexing statistics.
-S, --dereference-files
Follow symbolic links to files. Symbolic links to directories are never followed.
-s, --no-messages
Silent mode: nonexistent and unreadable files are ignored, i.e. their error messages and warnings
are suppressed.
-V, --version
Display version and exit.
-v, --verbose
Produce verbose output. Files are marked A for archive, C for compressed, and B for binary or I
for ignored binary. Deletions are marked D.
-X, --ignore-files, --ignore-files=FILE
Do not index files and directories matching the globs in FILE encountered during indexing. The
default FILE is `.gitignore'. This option may be repeated to specify additional files.
-z, --decompress
Index the contents of compressed files and archives. Hidden files in archives are ignored unless
option -. or --hidden is specified. Option -I or --ignore-binary ignores compressed binary files.
When used with option --zmax=NUM, indexes the contents of compressed files and archives stored
within archives up to NUM levels deep. Supported compression formats: gzip (.gz), compress (.Z),
zip, 7z, bzip2 (requires suffix .bz, .bz2, .bzip2, .tbz, .tbz2, .tb2, .tz2), lzma and xz (requires
suffix .lzma, .tlz, .xz, .txz), lz4 (requires suffix .lz4), zstd (requires suffix .zst, .zstd,
.tzst), brotli (requires suffix .br), bzip3 (requires suffix .bz3).
--zmax=NUM
When used with option -z (--decompress), indexes the contents of compressed files and archives
stored within archives up to NUM expansion levels deep. The default --zmax=1 only permits
indexing uncompressed files stored in cpio, pax, tar, zip and 7z archives; compressed files and
archives are detected as binary files and are effectively ignored. Specify --zmax=2 to index
compressed files and archives stored in cpio, pax, tar, zip and 7z archives. NUM may range from 1
to 99 for up to 99 decompression and de-archiving steps. Increasing NUM values gradually degrades
performance.
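As an illustration of combining these options, the sketch below indexes compressed files stored inside archives with --zmax=2 and raises the accuracy from the default 4 to 6, which lies within the recommended range of 2 to 7. The helper name index_nested_archives is hypothetical, and the chosen values are examples rather than recommendations.

```shell
# index_nested_archives [PATH]
# Hypothetical helper: index compressed files stored inside archives
# (--zmax=2) with a higher-than-default accuracy (--accuracy=6).
index_nested_archives() {
  ugrep-indexer -z --zmax=2 --accuracy=6 "${1:-.}"
}
```

A higher accuracy and a higher --zmax both trade indexing time and storage for fewer false positives and deeper coverage, so it is worth starting with the defaults and raising them only when searches remain slow.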