This tool accepts additional command-line arguments. Refer to the "SYNOPSIS" and usage information for
details.
--ask-pass
Prompt for a password when connecting to MySQL.
--attribute-aliases
type: array; default: db|Schema
List of attribute|alias,etc.
Certain attributes have multiple names, like db and Schema. If an event does not have the primary
attribute, pt-query-digest looks for an alias attribute. If it finds an alias, it creates the
primary attribute with the alias attribute's value and removes the alias attribute.
If the event has the primary attribute, all alias attributes are deleted.
This helps simplify event attributes so that, for example, there will not be report lines for both db
and Schema.
--attribute-value-limit
type: int; default: 0
A sanity limit for attribute values.
This option deals with bugs in slow logging functionality that causes large values for attributes.
If the attribute's value is bigger than this, the last-seen value for that class of query is used
instead. Disabled by default.
--charset
short form: -A; type: string
Default character set. If the value is utf8, sets Perl's binmode on STDOUT to utf8, passes the
mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other
value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.
--config
type: Array
Read this comma-separated list of config files; if specified, this must be the first option on the
command line.
--[no]continue-on-error
default: yes
Continue parsing even if there is an error. The tool will not continue forever: it stops once any
process causes 100 errors, in which case there is probably a bug in the tool or the input is invalid.
--[no]create-history-table
default: yes
Create the "--history" table if it does not exist.
This option causes the table specified by "--history" to be created with the default structure shown
in the documentation for "--history".
--[no]create-review-table
default: yes
Create the "--review" table if it does not exist.
This option causes the table specified by "--review" to be created with the default structure shown
in the documentation for "--review".
--daemonize
Fork to the background and detach from the shell. POSIX operating systems only.
--database
short form: -D; type: string
Connect to this database.
--defaults-file
short form: -F; type: string
Only read mysql options from the given file. You must give an absolute pathname.
--embedded-attributes
type: array
Two Perl regex patterns to capture pseudo-attributes embedded in queries.
Embedded attributes might be special attribute-value pairs that you've hidden in comments. The first
regex should match the entire set of attributes (in case there are multiple). The second regex
should match and capture attribute-value pairs from the first regex.
For example, suppose your query looks like the following:
SELECT * from users -- file: /login.php, line: 493;
You might run pt-query-digest with the following option:
pt-query-digest --embedded-attributes ' -- .*','(\w+): ([^\,]+)'
The first regular expression captures the whole comment:
" -- file: /login.php, line: 493;"
The second one splits it into attribute-value pairs and adds them to the event:
ATTRIBUTE VALUE
========= ==========
file /login.php
line 493
NOTE: All commas in the regex patterns must be escaped with \ otherwise the pattern will break.
--expected-range
type: array; default: 5,10
Explain items when there are more or fewer than expected.
Defines the number of items expected to be seen in the report given by "--[no]report", as controlled
by "--limit" and "--outliers". If there are more or fewer items in the report, each one will
explain why it was included.
--explain
type: DSN
Run EXPLAIN for the sample query with this DSN and print results.
This works only when "--group-by" includes fingerprint. It causes pt-query-digest to run EXPLAIN and
include the output into the report. For safety, queries that appear to have a subquery that EXPLAIN
will execute won't be EXPLAINed. Those are typically "derived table" queries of the form
select ... from ( select .... ) der;
The EXPLAIN results are printed as a full vertical format in the event report, which appears at the
end of each event report in vertical style ("\G") just like MySQL prints it.
--filter
type: string
Discard events for which this Perl code doesn't return true.
This option is a string of Perl code or a file containing Perl code that gets compiled into a
subroutine with one argument: $event. This is a hashref. If the given value is a readable file,
then pt-query-digest reads the entire file and uses its contents as the code. The file should not
contain a shebang (#!/usr/bin/perl) line.
If the code returns true, the chain of callbacks continues; otherwise it ends. The code is the last
statement in the subroutine other than "return $event". The subroutine template is:
sub { $event = shift; filter && return $event; }
Filters given on the command line are wrapped inside parentheses like like "( filter )". For
complex, multi-line filters, you must put the code inside a file so it will not be wrapped inside
parentheses. Either way, the filter must produce syntactically valid code given the template. For
example, an if-else branch given on the command line would not be valid:
--filter 'if () { } else { }' # WRONG
Since it's given on the command line, the if-else branch would be wrapped inside parentheses which is
not syntactically valid. So to accomplish something more complex like this would require putting the
code in a file, for example filter.txt:
my $event_ok; if (...) { $event_ok=1; } else { $event_ok=0; } $event_ok
Then specify "--filter filter.txt" to read the code from filter.txt.
If the filter code won't compile, pt-query-digest will die with an error. If the filter code does
compile, an error may still occur at runtime if the code tries to do something wrong (like pattern
match an undefined value). pt-query-digest does not provide any safeguards so code carefully!
An example filter that discards everything but SELECT statements:
--filter '$event->{arg} =~ m/^select/i'
This is compiled into a subroutine like the following:
sub { $event = shift; ( $event->{arg} =~ m/^select/i ) && return $event; }
It is permissible for the code to have side effects (to alter $event).
See "ATTRIBUTES REFERENCE" for a list of common and "--type" specific attributes.
Here are more examples of filter code:
Host/IP matches domain.com
--filter '($event->{host} || $event->{ip} || "") =~ m/domain.com/'
Sometimes MySQL logs the host where the IP is expected. Therefore, we check both.
User matches john
--filter '($event->{user} || "") =~ m/john/'
More than 1 warning
--filter '($event->{Warning_count} || 0) > 1'
Query does full table scan or full join
--filter '(($event->{Full_scan} || "") eq "Yes") || (($event->{Full_join} || "") eq "Yes")'
Query was not served from query cache
--filter '($event->{QC_Hit} || "") eq "No"'
Query is 1 MB or larger
--filter '$event->{bytes} >= 1_048_576'
Since "--filter" allows you to alter $event, you can use it to do other things, like create new
attributes. See "ATTRIBUTES" for an example.
--group-by
type: Array; default: fingerprint
Which attribute of the events to group by.
In general, you can group queries into classes based on any attribute of the query, such as "user" or
"db", which will by default show you which users and which databases get the most "Query_time". The
default attribute, "fingerprint", groups similar, abstracted queries into classes; see below and see
also "FINGERPRINTS".
A report is printed for each "--group-by" value (unless "--no-report" is given). Therefore,
"--group-by user,db" means "report on queries with the same user and report on queries with the same
db"; it does not mean "report on queries with the same user and db." See also "OUTPUT".
Every value must have a corresponding value in the same position in "--order-by". However, adding
values to "--group-by" will automatically add values to "--order-by", for your convenience.
There are several magical values that cause some extra data mining to happen before the grouping
takes place:
fingerprint
This causes events to be fingerprinted to abstract queries into a canonical form, which is then
used to group events together into a class. See "FINGERPRINTS" for more about fingerprinting.
tables
This causes events to be inspected for what appear to be tables, and then aggregated by that.
Note that a query that contains two or more tables will be counted as many times as there are
tables; so a join against two tables will count the Query_time against both tables.
distill
This is a sort of super-fingerprint that collapses queries down into a suggestion of what they
do, such as "INSERT SELECT table1 table2".
--help
Show help and exit.
--history
type: DSN
Save metrics for each query class in the given table. pt-query-digest saves query metrics (query
time, lock time, etc.) to this table so you can see how query classes change over time.
The default table is "percona_schema.query_history". Specify database (D) and table (t) DSN options
to override the default. The database and table are automatically created unless
"--no-create-history-table" is specified (see "--[no]create-history-table").
pt-query-digest inspects the columns in the table. The table must have at least the following
columns:
CREATE TABLE query_review_history (
checksum CHAR(32) NOT NULL,
sample TEXT NOT NULL
);
Any columns not mentioned above are inspected to see if they follow a certain naming convention. The
column is special if the name ends with an underscore followed by any of these values:
pct|avg|cnt|sum|min|max|pct_95|stddev|median|rank
If the column ends with one of those values, then the prefix is interpreted as the event attribute to
store in that column, and the suffix is interpreted as the metric to be stored. For example, a
column named "Query_time_min" will be used to store the minimum "Query_time" for the class of events.
The table should also have a primary key, but that is up to you, depending on how you want to store
the historical data. We suggest adding ts_min and ts_max columns and making them part of the primary
key along with the checksum. But you could also just add a ts_min column and make it a DATE type, so
you'd get one row per class of queries per day.
The following table definition is used for "--[no]create-history-table":
CREATE TABLE IF NOT EXISTS query_history (
checksum CHAR(32) NOT NULL,
sample TEXT NOT NULL,
ts_min DATETIME,
ts_max DATETIME,
ts_cnt FLOAT,
Query_time_sum FLOAT,
Query_time_min FLOAT,
Query_time_max FLOAT,
Query_time_pct_95 FLOAT,
Query_time_stddev FLOAT,
Query_time_median FLOAT,
Lock_time_sum FLOAT,
Lock_time_min FLOAT,
Lock_time_max FLOAT,
Lock_time_pct_95 FLOAT,
Lock_time_stddev FLOAT,
Lock_time_median FLOAT,
Rows_sent_sum FLOAT,
Rows_sent_min FLOAT,
Rows_sent_max FLOAT,
Rows_sent_pct_95 FLOAT,
Rows_sent_stddev FLOAT,
Rows_sent_median FLOAT,
Rows_examined_sum FLOAT,
Rows_examined_min FLOAT,
Rows_examined_max FLOAT,
Rows_examined_pct_95 FLOAT,
Rows_examined_stddev FLOAT,
Rows_examined_median FLOAT,
-- Percona extended slowlog attributes
-- http://www.percona.com/docs/wiki/patches:slow_extended
Rows_affected_sum FLOAT,
Rows_affected_min FLOAT,
Rows_affected_max FLOAT,
Rows_affected_pct_95 FLOAT,
Rows_affected_stddev FLOAT,
Rows_affected_median FLOAT,
Rows_read_sum FLOAT,
Rows_read_min FLOAT,
Rows_read_max FLOAT,
Rows_read_pct_95 FLOAT,
Rows_read_stddev FLOAT,
Rows_read_median FLOAT,
Merge_passes_sum FLOAT,
Merge_passes_min FLOAT,
Merge_passes_max FLOAT,
Merge_passes_pct_95 FLOAT,
Merge_passes_stddev FLOAT,
Merge_passes_median FLOAT,
InnoDB_IO_r_ops_min FLOAT,
InnoDB_IO_r_ops_max FLOAT,
InnoDB_IO_r_ops_pct_95 FLOAT,
InnoDB_IO_r_ops_stddev FLOAT,
InnoDB_IO_r_ops_median FLOAT,
InnoDB_IO_r_bytes_min FLOAT,
InnoDB_IO_r_bytes_max FLOAT,
InnoDB_IO_r_bytes_pct_95 FLOAT,
InnoDB_IO_r_bytes_stddev FLOAT,
InnoDB_IO_r_bytes_median FLOAT,
InnoDB_IO_r_wait_min FLOAT,
InnoDB_IO_r_wait_max FLOAT,
InnoDB_IO_r_wait_pct_95 FLOAT,
InnoDB_IO_r_wait_stddev FLOAT,
InnoDB_IO_r_wait_median FLOAT,
InnoDB_rec_lock_wait_min FLOAT,
InnoDB_rec_lock_wait_max FLOAT,
InnoDB_rec_lock_wait_pct_95 FLOAT,
InnoDB_rec_lock_wait_stddev FLOAT,
InnoDB_rec_lock_wait_median FLOAT,
InnoDB_queue_wait_min FLOAT,
InnoDB_queue_wait_max FLOAT,
InnoDB_queue_wait_pct_95 FLOAT,
InnoDB_queue_wait_stddev FLOAT,
InnoDB_queue_wait_median FLOAT,
InnoDB_pages_distinct_min FLOAT,
InnoDB_pages_distinct_max FLOAT,
InnoDB_pages_distinct_pct_95 FLOAT,
InnoDB_pages_distinct_stddev FLOAT,
InnoDB_pages_distinct_median FLOAT,
-- Boolean (Yes/No) attributes. Only the cnt and sum are needed
-- for these. cnt is how many times is attribute was recorded,
-- and sum is how many of those times the value was Yes. So
-- sum/cnt * 100 equals the percentage of recorded times that
-- the value was Yes.
QC_Hit_cnt FLOAT,
QC_Hit_sum FLOAT,
Full_scan_cnt FLOAT,
Full_scan_sum FLOAT,
Full_join_cnt FLOAT,
Full_join_sum FLOAT,
Tmp_table_cnt FLOAT,
Tmp_table_sum FLOAT,
Tmp_table_on_disk_cnt FLOAT,
Tmp_table_on_disk_sum FLOAT,
Filesort_cnt FLOAT,
Filesort_sum FLOAT,
Filesort_on_disk_cnt FLOAT,
Filesort_on_disk_sum FLOAT,
PRIMARY KEY(checksum, ts_min, ts_max)
);
Note that we store the count (cnt) for the ts attribute only; it will be redundant to store this for
other attributes.
Starting from Percona Toolkit 3.0.11, the checksum function has been updated to use 32 chars in the
MD5 sum. This causes the checksum field in the history table will have a different value than in the
previous versions of the tool.
--host
short form: -h; type: string
Connect to host.
--ignore-attributes
type: array; default: arg, cmd, insert_id, ip, port, Thread_id, timestamp, exptime, flags, key, res,
val, server_id, offset, end_log_pos, Xid
Do not aggregate these attributes. Some attributes are not query metrics but metadata which doesn't
need to be (or can't be) aggregated.
--inherit-attributes
type: array; default: db,ts
If missing, inherit these attributes from the last event that had them.
This option sets which attributes are inherited or carried forward to events which do not have them.
For example, if one event has the db attribute equal to "foo", but the next event doesn't have the db
attribute, then it inherits "foo" for its db attribute.
--interval
type: float; default: .1
How frequently to poll the processlist, in seconds.
--iterations
type: int; default: 1
How many times to iterate through the collect-and-report cycle. If 0, iterate to infinity. Each
iteration runs for "--run-time" amount of time. An iteration is usually determined by an amount of
time and a report is printed when that amount of time elapses. With "--run-time-mode" "interval", an
interval is instead determined by the interval time you specify with "--run-time". See "--run-time"
and "--run-time-mode" for more information.
--limit
type: Array; default: 95%:20
Limit output to the given percentage or count.
If the argument is an integer, report only the top N worst queries. If the argument is an integer
followed by the "%" sign, report that percentage of the worst queries. If the percentage is followed
by a colon and another integer, report the top percentage or the number specified by that integer,
whichever comes first.
The value is actually a comma-separated array of values, one for each item in "--group-by". If you
don't specify a value for any of those items, the default is the top 95%.
See also "--outliers".
--log
type: string
Print all output to this file when daemonized.
--max-hostname-length
type: int; default: 10
Trim host names in reports to this length. 0=Do not trim host names.
--max-line-length
type: int; default: 74
Trim lines to this length. 0=Do not trim lines.
--order-by
type: Array; default: Query_time:sum
Sort events by this attribute and aggregate function.
This is a comma-separated list of order-by expressions, one for each "--group-by" attribute. The
default "Query_time:sum" is used for "--group-by" attributes without explicitly given "--order-by"
attributes (that is, if you specify more "--group-by" attributes than corresponding "--order-by"
attributes). The syntax is "attribute:aggregate". See "ATTRIBUTES" for valid attributes. Valid
aggregates are:
Aggregate Meaning
========= ============================
sum Sum/total attribute value
min Minimum attribute value
max Maximum attribute value
cnt Frequency/count of the query
For example, the default "Query_time:sum" means that queries in the query analysis report will be
ordered (sorted) by their total query execution time ("Exec time"). "Query_time:max" orders the
queries by their maximum query execution time, so the query with the single largest "Query_time" will
be list first. "cnt" refers more to the frequency of the query as a whole, how often it appears;
"Count" is its corresponding line in the query analysis report. So any attribute and "cnt" should
yield the same report wherein queries are sorted by the number of times they appear.
When parsing general logs ("--type" "genlog"), the default "--order-by" becomes "Query_time:cnt".
General logs do not report query times so only the "cnt" aggregate makes sense because all query
times are zero.
If you specify an attribute that doesn't exist in the events, then pt-query-digest falls back to the
default "Query_time:sum" and prints a notice at the beginning of the report for each query class.
You can create attributes with "--filter" and order by them; see "ATTRIBUTES" for an example.
--outliers
type: array; default: Query_time:1:10
Report outliers by attribute:percentile:count.
The syntax of this option is a comma-separated list of colon-delimited strings. The first field is
the attribute by which an outlier is defined. The second is a number that is compared to the
attribute's 95th percentile. The third is optional, and is compared to the attribute's cnt
aggregate. Queries that pass this specification are added to the report, regardless of any limits
you specified in "--limit".
For example, to report queries whose 95th percentile Query_time is at least 60 seconds and which are
seen at least 5 times, use the following argument:
--outliers Query_time:60:5
You can specify an --outliers option for each value in "--group-by".
--output
type: string; default: report
How to format and print the query analysis results. Accepted values are:
VALUE FORMAT
======= ==============================
report Standard query analysis report
slowlog MySQL slow log
json JSON, on array per query class
json-anon JSON without example queries
secure-slowlog JSON without example queries
The entire "report" output can be disabled by specifying "--no-report" (see "--[no]report"), and its
sections can be disabled or rearranged by specifying "--report-format".
"json" output was introduced in 2.2.1 and is still in development, so the data structure may change
in future versions.
--password
short form: -p; type: string
Password to use when connecting. If password contains commas they must be escaped with a backslash:
"exam\,ple"
--pid
type: string
Create the given PID file. The tool won't start if the PID file already exists and the PID it
contains is different than the current PID. However, if the PID file exists and the PID it contains
is no longer running, the tool will overwrite the PID file with the current PID. The PID file is
removed automatically when the tool exits.
--port
short form: -P; type: int
Port number to use for connection.
--preserve-embedded-numbers
Preserve numbers in database/table names when fingerprinting queries. The standard fingeprint method
replaces numbers in db/tables names, making a query like 'SELECT * FROM db1.table2' to be
figerprinted as 'SELECT * FROM db?.table?'. This option changes that behaviour and the fingerprint
will become 'SELECT * FROM db1.table2'.
--processlist
type: DSN
Poll this DSN's processlist for queries, with "--interval" sleep between.
If the connection fails, pt-query-digest tries to reopen it once per second.
--progress
type: array; default: time,30
Print progress reports to STDERR. The value is a comma-separated list with two parts. The first
part can be percentage, time, or iterations; the second part specifies how often an update should be
printed, in percentage, seconds, or number of iterations.
--read-timeout
type: time; default: 0
Wait this long for an event from the input; 0 to wait forever.
This option sets the maximum time to wait for an event from the input. It applies to all types of
input except "--processlist". If an event is not received after the specified time, the script stops
reading the input and prints its reports. If "--iterations" is 0 or greater than 1, the next
iteration will begin, else the script will exit.
This option requires the Perl POSIX module.
--[no]report
default: yes
Print query analysis reports for each "--group-by" attribute. This is the standard slow log analysis
functionality. See "OUTPUT" for the description of what this does and what the results look like.
If you don't need a report (for example, when using "--review" or "--history"), it is best to specify
"--no-report" because this allows the tool to skip some expensive operations.
--report-all
Report all queries, even ones that have been reviewed. This only affects the "report" "--output"
when using "--review". Otherwise, all queries are always printed.
--report-format
type: Array; default: rusage,date,hostname,files,header,profile,query_report,prepared
Print these sections of the query analysis report.
SECTION PRINTS
============ ======================================================
rusage CPU times and memory usage reported by ps
date Current local date and time
hostname Hostname of machine on which pt-query-digest was run
files Input files read/parse
header Summary of the entire analysis run
profile Compact table of queries for an overview of the report
query_report Detailed information about each unique query
prepared Prepared statements
The sections are printed in the order specified. The rusage, date, files and header sections are
grouped together if specified together; other sections are separated by blank lines.
See "OUTPUT" for more information on the various parts of the query report.
--report-histogram
type: string; default: Query_time
Chart the distribution of this attribute's values.
The distribution chart is limited to time-based attributes, so charting "Rows_examined", for example,
will produce a useless chart. Charts look like:
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms ###########################
# 100ms ########################################################
# 1s ########
# 10s+
See "OUTPUT" for more information.
--resume
type: string
If specified, the tool writes the last file offset, if there is one, to the given filename. When ran
again with the same value for this option, the tool reads the last file offset from the file, seeks
to that position in the log, and resumes parsing events from that point onward.
--review
type: DSN
Save query classes for later review, and don't report already reviewed classes.
The default table is "percona_schema.query_review". Specify database (D) and table (t) DSN options
to override the default. The database and table are automatically created unless
"--no-create-review-table" is specified (see "--[no]create-review-table").
If the table was created manually, it must have at least the following columns. You can add more
columns for your own special purposes, but they won't be used by pt-query-digest.
CREATE TABLE IF NOT EXISTS query_review (
checksum CHAR(32) NOT NULL PRIMARY KEY,
fingerprint TEXT NOT NULL,
sample TEXT NOT NULL,
first_seen DATETIME,
last_seen DATETIME,
reviewed_by VARCHAR(20),
reviewed_on DATETIME,
comments TEXT
)
The columns are:
COLUMN MEANING
=========== ====================================================
checksum A 64-bit checksum of the query fingerprint
fingerprint The abstracted version of the query; its primary key
sample The query text of a sample of the class of queries
first_seen The smallest timestamp of this class of queries
last_seen The largest timestamp of this class of queries
reviewed_by Initially NULL; if set, query is skipped thereafter
reviewed_on Initially NULL; not assigned any special meaning
comments Initially NULL; not assigned any special meaning
Note that the "fingerprint" column is the true primary key for a class of queries. The "checksum" is
just a cryptographic hash of this value, which provides a shorter value that is very likely to also
be unique.
After parsing and aggregating events, your table should contain a row for each fingerprint. This
option depends on "--group-by fingerprint" (which is the default). It will not work otherwise.
--run-time
type: time
How long to run for each "--iterations". The default is to run forever (you can interrupt with CTRL-
C). Because "--iterations" defaults to 1, if you only specify "--run-time", pt-query-digest runs for
that amount of time and then exits. The two options are specified together to do collect-and-report
cycles. For example, specifying "--iterations" 4 "--run-time" "15m" with a continuous input (like
STDIN or "--processlist") will cause pt-query-digest to run for 1 hour (15 minutes x 4), reporting
four times, once at each 15 minute interval.
--run-time-mode
type: string; default: clock
Set what the value of "--run-time" operates on. Following are the possible values for this option:
clock
"--run-time" specifies an amount of real clock time during which the tool should run for each
"--iterations".
event
"--run-time" specifies an amount of log time. Log time is determined by timestamps in the log.
The first timestamp seen is remembered, and each timestamp after that is compared to the first to
determine how much log time has passed. For example, if the first timestamp seen is "12:00:00"
and the next is "12:01:30", that is 1 minute and 30 seconds of log time. The tool will read
events until the log time is greater than or equal to the specified "--run-time" value.
Since timestamps in logs are not always printed, or not always printed frequently, this mode
varies in accuracy.
interval
"--run-time" specifies interval boundaries of log time into which events are divided and reports
are generated. This mode is different from the others because it doesn't specify how long to
run. The value of "--run-time" must be an interval that divides evenly into minutes, hours or
days. For example, "5m" divides evenly into hours (60/5=12, so 12 5 minutes intervals per hour)
but "7m" does not (60/7=8.6).
Specifying "--run-time-mode interval --run-time 30m --iterations 0" is similar to specifying
"--run-time-mode clock --run-time 30m --iterations 0". In the latter case, pt-query-digest will
run forever, producing reports every 30 minutes, but this only works effectively with continuous
inputs like STDIN and the processlist. For fixed inputs, like log files, the former example
produces multiple reports by dividing the log into 30 minutes intervals based on timestamps.
Intervals are calculated from the zeroth second/minute/hour in which a timestamp occurs, not from
whatever time it specifies. For example, with 30 minute intervals and a timestamp of "12:10:30",
the interval is not "12:10:30" to "12:40:30", it is "12:00:00" to "12:29:59". Or, with 1 hour
intervals, it is "12:00:00" to "12:59:59". When a new timestamp exceeds the interval, a report
is printed, and the next interval is recalculated based on the new timestamp.
Since "--iterations" is 1 by default, you probably want to specify a new value else pt-query-
digest will only get and report on the first interval from the log since 1 interval = 1
iteration. If you want to get and report every interval in a log, specify "--iterations" 0.
--sample
type: int
Filter out all but the first N occurrences of each query. The queries are filtered on the first
value in "--group-by", so by default, this will filter by query fingerprint. For example, "--sample
2" will permit two sample queries for each fingerprint. Useful in conjunction with "--output
slowlog" to print the queries. You probably want to set "--no-report" to avoid the overhead of
aggregating and reporting if you're just using this to print out samples of queries. A complete
example:
pt-query-digest --sample 2 --no-report --output slowlog slow.log
--slave-user
type: string
Sets the user to be used to connect to the slaves. This parameter allows you to have a different
user with less privileges on the slaves but that user must exist on all slaves.
--slave-password
type: string
Sets the password to be used to connect to the slaves. It can be used with --slave-user and the
password for the user must be the same on all slaves.
--set-vars
type: Array
Set the MySQL variables in this comma-separated list of "variable=value" pairs.
By default, the tool sets:
wait_timeout=10000
Variables specified on the command line override these defaults. For example, specifying "--set-vars
wait_timeout=500" overrides the defaultvalue of 10000.
The tool prints a warning and continues if a variable cannot be set.
--show-all
type: Hash
Show all values for these attributes.
By default pt-query-digest only shows as many of an attribute's value that fit on a single line.
This option allows you to specify attributes for which all values will be shown (line width is
ignored). This only works for attributes with string values like user, host, db, etc. Multiple
attributes can be specified, comma-separated.
--since
type: string
Parse only queries newer than this value (parse queries since this date).
This option allows you to ignore queries older than a certain value and parse only those queries
which are more recent than the value. The value can be several types:
* Simple time value N with optional suffix: N[shmd], where
s=seconds, h=hours, m=minutes, d=days (default s if no suffix
given); this is like saying "since N[shmd] ago"
* Full date with optional hours:minutes:seconds:
YYYY-MM-DD [HH:MM:SS]
* Short, MySQL-style date:
YYMMDD [HH:MM:SS]
* Any time expression evaluated by MySQL:
CURRENT_DATE - INTERVAL 7 DAY
If you give a MySQL time expression, and you have not also specified a DSN for "--explain",
"--processlist", or "--review", then you must specify a DSN on the command line so that pt-query-
digest can connect to MySQL to evaluate the expression.
The MySQL time expression is wrapped inside a query like "SELECT UNIX_TIMESTAMP(<expression>)", so be
sure that the expression is valid inside this query. For example, do not use UNIX_TIMESTAMP()
because UNIX_TIMESTAMP(UNIX_TIMESTAMP()) returns 0.
Events are assumed to be in chronological: older events at the beginning of the log and newer events
at the end of the log. "--since" is strict: it ignores all queries until one is found that is new
enough. Therefore, if the query events are not consistently timestamped, some may be ignored which
are actually new enough.
See also "--until".
--socket
short form: -S; type: string
Socket file to use for connection.
--timeline
Show a timeline of events.
This option makes pt-query-digest print another kind of report: a timeline of the events. Each query
is still grouped and aggregate into classes according to "--group-by", but then they are printed in
chronological order. The timeline report prints out the timestamp, interval, count and value of each
classes.
If all you want is the timeline report, then specify "--no-report" to suppress the default query
analysis report. Otherwise, the timeline report will be printed at the end before the response-time
profile (see "--report-format" and "OUTPUT").
For example, this:
pt-query-digest /path/to/log --group-by distill --timeline
will print something like:
# ########################################################
# distill report
# ########################################################
# 2009-07-25 11:19:27 1+00:00:01 2 SELECT foo
# 2009-07-27 11:19:30 00:01 2 SELECT bar
# 2009-07-27 11:30:00 1+06:30:00 2 SELECT foo
--type
type: Array; default: slowlog
The type of input to parse. The permitted types are
binlog
Parse a binary log file that has first been converted to text using mysqlbinlog.
For example:
mysqlbinlog mysql-bin.000441 > mysql-bin.000441.txt
pt-query-digest --type binlog mysql-bin.000441.txt
genlog
Parse a MySQL general log file. General logs lack a lot of "ATTRIBUTES", notably "Query_time".
The default "--order-by" for general logs changes to "Query_time:cnt".
slowlog
Parse a log file in any variation of MySQL slow log format.
tcpdump
Inspect network packets and decode the MySQL client protocol, extracting queries and responses
from it.
pt-query-digest does not actually watch the network (i.e. it does NOT "sniff packets"). Instead,
it's just parsing the output of tcpdump. You are responsible for generating this output; pt-
query-digest does not do it for you. Then you send this to pt-query-digest as you would any log
file: as files on the command line or to STDIN.
The parser expects the input to be formatted with the following options: "-x -n -q -tttt". For
example, if you want to capture output from your local machine, you can do something like the
following (the port must come last on FreeBSD):
tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 \
> mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt
The other tcpdump parameters, such as -s, -c, and -i, are up to you. Just make sure the output
looks like this (there is a line break in the first line to avoid man-page problems):
2009-04-12 09:50:16.804849 IP 127.0.0.1.42167
> 127.0.0.1.3306: tcp 37
0x0000: 4508 0059 6eb2 4000 4006 cde2 7f00 0001
0x0010: ....
Remember tcpdump has a handy -c option to stop after it captures some number of packets! That's
very useful for testing your tcpdump command. Note that tcpdump can't capture traffic on a Unix
socket. Read <http://bugs.mysql.com/bug.php?id=31577> if you're confused about this.
Devananda Van Der Veen explained on the MySQL Performance Blog how to capture traffic without
dropping packets on busy servers. Dropped packets cause pt-query-digest to miss the response to
a request, then see the response to a later request and assign the wrong execution time to the
query. You can change the filter to something like the following to help capture a subset of the
queries. (See <http://www.mysqlperformanceblog.com/?p=6092> for details.)
tcpdump -i any -s 65535 -x -n -q -tttt \
'port 3306 and tcp[1] & 7 == 2 and tcp[3] & 7 == 2'
All MySQL servers running on port 3306 are automatically detected in the tcpdump output.
Therefore, if the tcpdump out contains packets from multiple servers on port 3306 (for example,
10.0.0.1:3306, 10.0.0.2:3306, etc.), all packets/queries from all these servers will be analyzed
together as if they were one server.
If you're analyzing traffic for a MySQL server that is not running on port 3306, see
"--watch-server".
Also note that pt-query-digest may fail to report the database for queries when parsing tcpdump
output. The database is discovered only in the initial connect events for a new client or when
<USE db> is executed. If the tcpdump output contains neither of these, then pt-query-digest
cannot discover the database.
Server-side prepared statements are supported. SSL-encrypted traffic cannot be inspected and
decoded.
rawlog
Raw logs are not MySQL logs but simple text files with one SQL statement per line, like:
SELECT c FROM t WHERE id=1
/* Hello, world! */ SELECT * FROM t2 LIMIT 1
INSERT INTO t (a, b) VALUES ('foo', 'bar')
INSERT INTO t SELECT * FROM monkeys
Since raw logs do not have any metrics, many options and features of pt-query-digest do not work
with them.
One use case for raw logs is ranking queries by count when the only information available is a
list of queries, from polling "SHOW PROCESSLIST" for example.
--until
type: string
Parse only queries older than this value (parse queries until this date).
This option allows you to ignore queries newer than a certain value and parse only those queries
which are older than the value. The value can be one of the same types listed for "--since".
Unlike "--since", "--until" is not strict: all queries are parsed until one has a timestamp that is
equal to or greater than "--until". Then all subsequent queries are ignored.
--user
short form: -u; type: string
User for login if not current user.
--variations
type: Array
Report the number of variations in these attributes' values.
Variations show how many distinct values an attribute had within a class. The usual value for this
option is "arg" which shows how many distinct queries were in the class. This can be useful to
determine a query's cacheability.
Distinct values are determined by CRC32 checksums of the attributes' values. These checksums are
reported in the query report for attributes specified by this option, like:
# arg crc 109 (1/25%), 144 (1/25%)... 2 more
In that class there were 4 distinct queries. The checksums of the first two variations are shown,
and each one occurred once (or, 25% of the time).
The counts of distinct variations is approximate because only 1,000 variations are saved. The mod
(%) 1000 of the full CRC32 checksum is saved, so some distinct checksums are treated as equal.
--version
Show version and exit.
--[no]version-check
default: yes
Check for the latest version of Percona Toolkit, MySQL, and other programs.
This is a standard "check for updates automatically" feature, with two additional features. First,
the tool checks its own version and also the versions of the following software: operating system,
Percona Monitoring and Management (PMM), MySQL, Perl, MySQL driver for Perl (DBD::mysql), and Percona
Toolkit. Second, it checks for and warns about versions with known problems. For example, MySQL
5.5.25 had a critical bug and was re-released as 5.5.25a.
A secure connection to Percona’s Version Check database server is done to perform these checks. Each
request is logged by the server, including software version numbers and unique ID of the checked
system. The ID is generated by the Percona Toolkit installation script or when the Version Check
database call is done for the first time.
Any updates or known problems are printed to STDOUT before the tool's normal output. This feature
should never interfere with the normal operation of the tool.
For more information, visit <https://www.percona.com/doc/percona-toolkit/LATEST/version-check.html>.
--[no]vertical-format
default: yes
Output a trailing "\G" in the reported SQL queries.
This makes the mysql client display the result using vertical format. Non-native MySQL clients like
phpMyAdmin do not support this.
--watch-server
type: string
This option tells pt-query-digest which server IP address and port (like "10.0.0.1:3306") to watch
when parsing tcpdump (for "--type" tcpdump); all other servers are ignored. If you don't specify it,
pt-query-digest watches all servers by looking for any IP address using port 3306 or "mysql". If
you're watching a server with a non-standard port, this won't work, so you must specify the IP
address and port to watch.
If you want to watch a mix of servers, some running on standard port 3306 and some running on non-
standard ports, you need to create separate tcpdump outputs for the non-standard port servers and
then specify this option for each. At present pt-query-digest cannot auto-detect servers on port
3306 and also be told to watch a server on a non-standard port.