csvsql - csvsql Documentation
Contents
Copyright
2016, Christopher Groskopf and James McKinney
2.0.1 Jul 12, 2024 CSVSQL(1)
Description
Generate SQL statements for a CSV file or execute those statements directly on a database. In the latter
case supports both creating tables and inserting data:
usage: csvsql [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
[-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-L LOCALE]
[-S] [--blanks] [--null-value NULL_VALUES [NULL_VALUES ...]]
[--date-format DATE_FORMAT] [--datetime-format DATETIME_FORMAT]
[-H] [-K SKIP_LINES] [-v] [-l] [--zero] [-V]
[-i {firebird,mssql,mysql,oracle,postgresql,sqlite,sybase}]
[--db CONNECTION_STRING] [--query QUERIES] [--insert]
[--prefix PREFIX] [--before-insert BEFORE_INSERT]
[--after-insert AFTER_INSERT] [--tables TABLE_NAMES]
[--no-constraints] [--unique-constraint UNIQUE_CONSTRAINT]
[--no-create] [--create-if-not-exists] [--overwrite]
[--db-schema DB_SCHEMA] [-y SNIFF_LIMIT] [-I]
[--chunk-size CHUNK_SIZE]
[FILE [FILE ...]]
Generate SQL statements for one or more CSV files, or execute those statements
directly on a database, and execute one or more SQL queries.
positional arguments:
FILE The CSV file(s) to operate on. If omitted, will accept
input as piped data via STDIN.
optional arguments:
-h, --help show this help message and exit
-i {firebird,mssql,mysql,oracle,postgresql,sqlite,sybase,crate}, --dialect {firebird,mssql,mysql,oracle,postgresql,sqlite,sybase,crate}
Dialect of SQL to generate. Cannot be used with --db.
--db CONNECTION_STRING
If present, a SQLAlchemy connection string to use to
directly execute generated SQL on a database.
--query QUERY Execute one or more SQL queries delimited by ";" and
output the result of the last query as CSV. QUERY may
be a filename.
--insert Insert the data into the table. Requires --db.
--prefix PREFIX Add an expression following the INSERT keyword, like
OR IGNORE or OR REPLACE.
--before-insert BEFORE_INSERT
Execute SQL before the INSERT command. Requires
--insert.
--after-insert AFTER_INSERT
Execute SQL after the INSERT command. Requires
--insert.
--tables TABLE_NAMES A comma-separated list of names of tables to be
created. By default, the tables will be named after
the filenames without extensions or "stdin".
--no-constraints Generate a schema without length limits or null
checks. Useful when sampling big tables.
--unique-constraint UNIQUE_CONSTRAINT
A column-separated list of names of columns to include
in a UNIQUE constraint.
--no-create Skip creating the table. Requires --insert.
--create-if-not-exists
Create the table if it does not exist, otherwise keep
going. Requires --insert.
--overwrite Drop the table if it already exists. Requires
--insert. Cannot be used with --no-create.
--db-schema DB_SCHEMA
Optional name of database schema to create table(s)
in.
-y SNIFF_LIMIT, --snifflimit SNIFF_LIMIT
Limit CSV dialect sniffing to the specified number of
bytes. Specify "0" to disable sniffing.
-I, --no-inference Disable type inference when parsing the input.
--chunk-size CHUNK_SIZE
Chunk size for batch insert into the table. Requires
--insert.
--min-col-len MIN_COL_LEN
The minimum length of text columns.
--col-len-multiplier COL_LEN_MULTIPLIER
Multiply the maximum column length by this multiplier
to accomodate larger values in later runs.
See also: Argumentscommontoalltools.
For information on connection strings and supported dialects refer to the SQLAlchemydocumentation.
If you prefer not to enter your password in the connection string, store the password securely in a
PostgreSQLPasswordFile, a MySQLOptionsFile or similar files for other systems.
NOTE:
Using the --query option may cause rounding (in Python 2) or introduce Pythonfloatingpointissues
(in Python 3).
NOTE:
If the CSV file was created from a JSON file using in2csv, remember to quote SQL columns properly. For
example:
echo '{"a":{"b":"c"},"d":"e"}' | in2csv -f ndjson | csvsql --query 'SELECT "a/b" FROM stdin'
NOTE:
Alternatives to csvsql are q and textql.
Examples
GenerateSQLstatements
Generate a statement in the PostgreSQL dialect:
csvsql -i postgresql examples/realdata/FY09_EDU_Recipients_by_State.csv
InteractwithaSQLdatabase
Create a table and import data from the CSV directly into PostgreSQL:
createdb test
csvsql --db postgresql:///test --tables fy09 --insert examples/realdata/FY09_EDU_Recipients_by_State.csv
For large tables it may not be practical to process the entire table. One solution to this is to analyze
a sample of the table. In this case it can be useful to turn off length limits and null checks with the
--no-constraints option:
head -n 20 examples/realdata/FY09_EDU_Recipients_by_State.csv | csvsql --no-constraints --tables fy09
Create tables for an entire directory of CSVs and import data from those files directly into PostgreSQL:
createdb test
csvsql --db postgresql:///test --insert examples/*_converted.csv
If those CSVs have identical headers, you can import them into the same table by using csvstack first:
createdb test
csvstack examples/dummy?.csv | csvsql --db postgresql:///test --insert
QueryandoutputCSVfilesusingSQL
You can use csvsql to "directly" query one or more CSV files. Please note that this will create an
in-memory SQLite database, so it won't be very fast:
csvsql --query "SELECT m.usda_id, avg(i.sepal_length) AS mean_sepal_length FROM iris AS i JOIN irismeta AS m ON (i.species = m.species) GROUP BY m.species" examples/iris.csv examples/irismeta.csv
Group rows by one column:
csvsql --query "SELECT * FROM 'dummy3' GROUP BY a" examples/dummy3.csv
Concatenate two columns:
csvsql --query "SELECT a || b FROM 'dummy3'" --no-inference examples/dummy3.csv
If a column contains null values, you must COALESCE the column:
csvsql --query "SELECT a || COALESCE(b, '') FROM 'sort_ints_nulls'" --no-inference examples/sort_ints_nulls.csv
The UPDATE SQL statement produces no output. Remember to SELECT the columns and rows you want:
csvsql --query "UPDATE 'dummy3' SET a = 'foo'; SELECT * FROM 'dummy3'" examples/dummy3.csv
Name
csvsql - csvsql Documentation
