logo
Free, unlimited AI code reviews that run on commit
git-lrc git-lrc GitHub Install Now We'd appreciate a star git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt git-lrc - Free, unlimited AI code reviews that run on commit | Product Hunt

pgreplay - PostgreSQL log file replayer for performance tests

Author

       Written by Laurenz Albe <laurenz.albe@wien.gv.at>.

Jun 2011                                                                                             pgreplay(1)

Description

pgreplay  reads  a PostgreSQL log file (not a WAL file), extracts the SQL statements and executes them in
       the same order and relative time against a PostgreSQL database cluster.   A  final  report  gives  you  a
       useful statistical analysis of your workload and its execution.

       In the first form, the log file infile is replayed at the time it is read.

       With  the  -f option, pgreplay will not execute the statements, but write them to a ‘replay file’ outfile
       that can be replayed with the third form.

       With the -r option, pgreplay will execute the statements in the replay file infile that  was  created  by
       the second form.

       If  the  execution of statements gets behind schedule, warning messages are issued that indicate that the
       server cannot handle the load in a timely fashion.  The idea is to replay a real-world database  workload
       as exactly as possible.

       To  create  a  log  file  that  can  be  parsed  by pgreplay, you need to set the following parameters in
       postgresql.conf:

              log_min_messages=error (or more)
              log_min_error_statement=log (or more)
              log_connections=onlog_disconnections=onlog_line_prefix='%m|%u|%d|%c|' (if you don't use CSV logging)
              log_statement='all'lc_messages must be set to English (encoding does not matter)
              bytea_output=escape (from version 9.0 on, only if you want to replay the log on 8.4 or earlier)

       The database cluster against which you replay the SQL statements must be a clone of the database  cluster
       that generated the logs from the time immediatelybefore the logs were generated.

       pgreplay is useful for performance tests, particularly in the following situations:

       *   You want to compare the performance of your PostgreSQL application on different hardware or different
           operating systems.

       *   You want to upgrade your database and want to make sure that the new database version does not suffer
           from performance regressions that affect you.

       Moreover,  pgreplay  can  give you some feeling as to how your application might scale by allowing you to
       try to replay the workload at a higher speed.  Be warned, though, that 500 users working at double  speed
       is not really the same as 1000 users working at normal speed.

Environment

PGHOST Specifies the default value for the -h option.

       PGPORT Specifies the default value for the -p option.

       PGCLIENTENCODING
              Specifies the default value for the -E option.

Limitations

pgreplay can only replay what is logged by PostgreSQL.  This leads to some limitations:

       *   COPY statements will not be replayed, because the copy data are not logged.

       *   Fast-path  API  function calls are not logged and will not be replayed.  Unfortunately, this includes
           the Large Object API.

       *   Since the log file is always in the server encoding (which you can specify  with  the  -E  switch  of
           pgreplay), all SETclient_encoding statements will be ignored.

       *   Since the preparation time of prepared statements is not logged (unless log_min_messages is debug2 or
           more), these statements will be prepared immediately before they are first executed during replay.

       *   Because  the  log  file contains only text, query parameters and return values will always be in text
           and never in binary format. If you use binary mode to, say, transfer large binary data, pgreplay  can
           cause significantly more network traffic than the original run.

       *   Sometimes,  if  a  connection  takes  longer  to complete, the session ID unexpectedly changes in the
           PostgreSQL log file. This causes pgreplay to treat the session as two different ones, resulting in an
           additional connection. This is arguably a bug in PostgreSQL.

Name

       pgreplay - PostgreSQL log file replayer for performance tests

Options

Parseoptions:-c     Specifies that the log file is in 'csvlog' format (highly recommended) and not in 'stderr' format.

       -btimestamp
              Only  log  entries  greater  or  equal to that timestamp will be parsed.  The format is YYYY-MM-DDHH:MM:SS.FFF like in the log file.  An optional time zone part will be ignored.

       -etimestamp
              Only log entries less or equal to that  timestamp  will  be  parsed.   The  format  is  YYYY-MM-DDHH:MM:SS.FFF like in the log file.  An optional time zone part will be ignored.

       -q     Specifies  that  a  backslash  in  a simple string literal will escape the following single quote.
              This depends on configuration options like standard_conforming_strings  and  is  the  default  for
              server version 9.0 and less.

   Replayoptions:-hhostname
              Host  name where the target database cluster is running (or directory where the UNIX socket can be
              found). Defaults to local connections.
              This works just like the -h option of psql.

       -pport
              TCP port where the target database cluster can be reached.

       -Wpassword
              By  default,  pgreplay  assumes  that  the  target  database  cluster  is  configured  for   trust
              authentication.  With  the -W option you can specify a password that will be used for all users in
              the cluster.

       -sfactor
              Speed factor for replay, by default 1. This can be any valid positive  floating  point  number.  A
              factor  less than 1 will replay the workload in ‘slow motion’, while a factor greater than 1 means
              ‘fast forward’.

       -Eencoding
              Specifies the encoding of the log file, which will be used as client encoding during replay. If it
              is omitted, your default client encoding will be used.

       -j     If all connections are idle, jump ahead to the next request instead of sleeping. This  will  speed
              up  replay.  Execution  delays  will  still  be reported correctly, but replay statistics will not
              contain the idle time.

   Outputoptions:-ooutfile
              specifies the replay file where the statements will be written for later replay.

   Debugoptions:-dlevel
              Specifies the trace level (between  1  and  3).  Increasing  levels  will  produce  more  detailed
              information about what pgreplay is doing.

       -v     Prints the program version and exits.

Synopsis

pgreplay [parseoptions] [replayoptions] [-dlevel] [infile]
       pgreplay-f [parseoptions] [-ooutfile] [-dlevel] [infile]
       pgreplay-r [replayoptions] [-dlevel] [infile]

See Also