Reports and fixes common errors in a CSV file.
Checks
• Reports rows that have a different number of columns than the header row, if the --length-mismatch
option is set.
• Reports columns that are empty, if the --empty-columns option is set.
TIP:
Enable all checks with --enable-all-checks (-a).
Fixes
• If a CSV has unquoted cells that contain line breaks, like:
id,address,country
1,1 Main St
Springfield,US
2,123 Acadia Avenue
London,GB
Use --join-short-rows to attempt to correct the errors by merging short rows into a single row:
id,address,country
1,"1 Main St
Springfield",US
2,"123 Acadia Avenue
London",GB
To change the string used to join the lines, use --separator. For example, with --separator",":
id,address,country
1,"1 Main St, Springfield",US
2,"123 Acadia Avenue, London",GB
• If a CSV has missing delimiters, like:
id,name,country
1,Alice
2,Bob,CA
You can add the missing delimiters with --fill-short-rows:
id,name,country
1,Alice,
2,Bob,CA
TIP:csvcut without options also adds missing delimiters!
To change the value used to fill short rows, use --fillvalue. For example, with --fillvalue"US":
id,name,country
1,Alice,US
2,Bob,CA
SEEALSO:--header-normalize-space under Usage.
NOTE:
Every csvkit tool does the following:
• Removes optional quote characters, unless the --quoting (-u) option is set to change this behavior
• Changes the field delimiter to a comma, if the input delimiter is set with the --delimiter (-d) or
--tabs (-t) options
• Changes the record delimiter to a line feed (LF or \n)
• Changes the quote character to a double-quotation mark, if the character is set with the --quotechar
(-q) option
• Changes the character encoding to UTF-8, if the input encoding is set with the --encoding (-e)
option
Outputcsvclean attempts to make the selected fixes. Then:
• If the --omit-error-rows option is set, only rows that pass the selected checks are written to standard
output. If not, all rows are written to standard output.
• If any checks are enabled, error rows along with line numbers and descriptions are written to standard
error. If there are error rows, the exit code is 1.
Usage
usage: csvclean [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
[-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-S] [-H]
[-K SKIP_LINES] [-v] [-l] [--zero] [-V]
[FILE]
Fix common errors in a CSV file.
positional arguments:
FILE The CSV file to operate on. If omitted, will accept
input as piped data via STDIN.
optional arguments:
-h, --help show this help message and exit
--length-mismatch Report data rows that are shorter or longer than the
header row.
--empty-columns Report empty columns as errors.
-a, --enable-all-checks
Enable all error reporting.
--omit-error-rows Omit data rows that contain errors, from standard
output.
--label LABEL Add a "label" column to standard error. Useful in
automated workflows.
--header-normalize-space
Strip leading and trailing whitespace and replace
sequences of whitespace characters by a single space
in the header.
--join-short-rows Merges short rows into a single row.
--separator SEPARATOR
The string with which to join short rows. Defaults to
a newline.
--fill-short-rows Fill short rows with the missing cells.
--fillvalue FILLVALUE
The value with which to fill short rows. Defaults to
none.
See also: Argumentscommontoalltools.