Prerequisites
There are some prerequisites for pg_createsubscriber to convert the target server into a logical replica.
If these are not met, an error will be reported. The source and target servers must have the same major
version as pg_createsubscriber. The given target data directory must have the same system identifier
as the source data directory. The given database user for the target data directory must have privileges
for creating subscriptions and using pg_replication_origin_advance().
The target server must be used as a physical standby. The target server must have max_replication_slots
and max_logical_replication_workers configured to a value greater than or equal to the number of
specified databases. The target server must have max_worker_processes configured to a value greater than
the number of specified databases. The target server must accept local connections.
The source server must accept connections from the target server. The source server must not be in
recovery. The source server must have wal_level set to logical. The source server must have
max_replication_slots configured to a value greater than or equal to the number of specified databases
plus the number of existing replication slots. The source server must have max_wal_senders configured to
a value greater than or equal to the number of specified databases plus the number of existing WAL sender
processes.
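The numeric requirements above can be written down as a small checker. The following is a minimal Python sketch, not pg_createsubscriber's own code: the class and function names are hypothetical, and the values are assumed to have been collected from each server by some other means (for example with SHOW and a count of existing slots and WAL senders).

```python
from dataclasses import dataclass


@dataclass
class TargetSettings:
    # Values read from the target (physical standby) server.
    max_replication_slots: int
    max_logical_replication_workers: int
    max_worker_processes: int


@dataclass
class SourceSettings:
    # Values read from the source (primary) server.
    wal_level: str
    in_recovery: bool
    max_replication_slots: int
    existing_replication_slots: int
    max_wal_senders: int
    existing_wal_senders: int


def check_prerequisites(num_dbs: int, target: TargetSettings,
                        source: SourceSettings) -> list[str]:
    """Return a list of unmet prerequisites; an empty list means none were found."""
    problems = []

    # Target side: one slot and one logical replication worker per database.
    if target.max_replication_slots < num_dbs:
        problems.append("target: max_replication_slots too low")
    if target.max_logical_replication_workers < num_dbs:
        problems.append("target: max_logical_replication_workers too low")
    # Strictly greater than the number of databases.
    if target.max_worker_processes <= num_dbs:
        problems.append("target: max_worker_processes too low")

    # Source side: must be a primary producing logical WAL.
    if source.in_recovery:
        problems.append("source: server is in recovery")
    if source.wal_level != "logical":
        problems.append("source: wal_level is not logical")
    # New slots and WAL senders come on top of what already exists.
    if source.max_replication_slots < num_dbs + source.existing_replication_slots:
        problems.append("source: max_replication_slots too low")
    if source.max_wal_senders < num_dbs + source.existing_wal_senders:
        problems.append("source: max_wal_senders too low")

    return problems
```

The sketch covers only the settings-based checks; the version, system identifier, privilege and connectivity requirements need access to the servers themselves and are left out.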
Warnings
If pg_createsubscriber fails after the target server was promoted, then the data directory is likely not
in a state that can be recovered. In such a case, creating a new standby server is recommended.
pg_createsubscriber usually starts the target server with different connection settings during
transformation. Hence, connections to the target server should fail.
Since DDL commands are not replicated by logical replication, avoid executing DDL commands that change
the database schema while running pg_createsubscriber. If the target server has already been converted to
a logical replica, the DDL commands might not be replicated, which might cause an error.
If pg_createsubscriber fails while processing, objects (publications, replication slots) created on the
source server are removed. The removal might fail if the target server cannot connect to the source
server. In such a case, a warning message will list the objects that were left behind. If the target server is running,
it will be stopped.
If the replication is using primary_slot_name, the corresponding replication slot will be removed from
the source server after the logical replication setup.
If the target server is a synchronous replica, transaction commits on the primary might wait for
replication while running pg_createsubscriber.
pg_createsubscriber sets up logical replication with two-phase commit disabled. This means that any
prepared transactions will be replicated at the time of COMMIT PREPARED, without advance preparation.
Once setup is complete, you can manually drop and re-create the subscription(s) with the two_phase option
enabled.
pg_createsubscriber changes the system identifier using pg_resetwal. This avoids situations in which
the target server might use WAL files from the source server. If the target server has a standby,
replication will break and a fresh standby should be created.
How It Works
The basic idea is to have a replication start point from the source server and set up a logical
replication to start from this point:
1. Start the target server with the specified command-line options. If the target server is already
running, pg_createsubscriber will terminate with an error.
2. Check if the target server can be converted. There are also a few checks on the source server. If any
of the prerequisites are not met, pg_createsubscriber will terminate with an error.
3. Create a publication and replication slot for each specified database on the source server. Each
publication is created using FOR ALL TABLES. If the --publication option is not specified, the
publication has the following name pattern: “pg_createsubscriber_%u_%x” (parameters: database oid,
random int). If the --replication-slot option is not specified, the replication slot has the
following name pattern: “pg_createsubscriber_%u_%x” (parameters: database oid, random int). These
replication slots will be used by the subscriptions in a future step. The last replication slot LSN
is used as a stopping point in the recovery_target_lsn parameter and by the subscriptions as a
replication start point. It guarantees that no transaction will be lost.
4. Write recovery parameters into the target data directory and restart the target server. It specifies
an LSN (recovery_target_lsn) of the write-ahead log location up to which recovery will proceed. It
also specifies promote as the action that the server should take once the recovery target is reached.
Additional recovery parameters are added to avoid unexpected behavior during the recovery process,
such as ending recovery as soon as a consistent state is reached (WAL should be applied until the
replication start location) or having multiple recovery targets, which can cause a failure. This step
finishes once the server ends standby mode and is accepting read-write transactions. If the
--recovery-timeout option is set, pg_createsubscriber terminates if recovery does not end within the
given number of seconds.
5. Create a subscription for each specified database on the target server. If the --subscription option
is not specified, the subscription has the following name pattern: “pg_createsubscriber_%u_%x”
(parameters: database oid, random int). It does not copy existing data from the source server. It
does not create a replication slot. Instead, it uses the replication slot that was created in a
previous step. The subscription is created but it is not enabled yet. The reason is that the replication
progress must be set to the replication start point before starting the replication.
6. Drop publications on the target server that were replicated because they were created before the
replication start location. They have no use on the subscriber.
7. Set the replication progress to the replication start point for each subscription. When the target
server starts the recovery process, it catches up to the replication start point. This is the exact
LSN to be used as an initial replication location for each subscription. The replication origin name
can be obtained now that the subscription has been created. The replication origin name and the replication start
point are used in pg_replication_origin_advance() to set up the initial replication location.
8. Enable the subscription for each specified database on the target server. The subscription starts
applying transactions from the replication start point.
9. If the standby server was using primary_slot_name, the slot has no further use, so drop it.
10. If the standby server contains failover replication slots, they cannot be synchronized anymore, so
drop them.
11. Update the system identifier on the target server. pg_resetwal(1) is run to modify the system
identifier. The target server is stopped first, because pg_resetwal requires it.
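The naming pattern, the recovery settings from step 4, and the SQL behind steps 3, 5, 7 and 8 can be sketched in Python as text generators. This is an illustration under stated assumptions, not the tool's implementation: all function names are hypothetical, the quoting helpers are simplified, the pgoutput plugin name is an assumption (the text above does not name the plugin), and only the two recovery settings named in the text are emitted even though the tool writes additional ones. The replication origin name is passed in as a parameter because, as step 7 notes, it has to be looked up after the subscription exists.

```python
def object_name(db_oid: int, rand: int) -> str:
    """Default pattern 'pg_createsubscriber_%u_%x' (database OID, random int)."""
    return f"pg_createsubscriber_{db_oid}_{rand:x}"


def format_lsn(lsn: int) -> str:
    """Render a 64-bit WAL position in the 'high/low' hexadecimal text form."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def quote_ident(name: str) -> str:
    # Simplified identifier quoting: wrap in double quotes, double any inside.
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    # Simplified literal quoting: wrap in single quotes, double any inside.
    return "'" + value.replace("'", "''") + "'"


def publisher_statements(pub_name: str, slot_name: str) -> list[str]:
    """Step 3: publication and logical slot on the source server."""
    return [
        f"CREATE PUBLICATION {quote_ident(pub_name)} FOR ALL TABLES",
        # Assumption: the built-in logical replication plugin is used.
        "SELECT pg_create_logical_replication_slot("
        f"{quote_literal(slot_name)}, 'pgoutput')",
    ]


def recovery_parameters(start_lsn: int) -> str:
    """Step 4: only the two settings named in the text."""
    return (
        f"recovery_target_lsn = '{format_lsn(start_lsn)}'\n"
        "recovery_target_action = 'promote'\n"
    )


def subscriber_statements(sub_name: str, pub_name: str, slot_name: str,
                          conninfo: str, origin_name: str,
                          start_lsn: int) -> list[str]:
    """Steps 5, 7 and 8 on the target server, in the order described above."""
    create = (
        f"CREATE SUBSCRIPTION {quote_ident(sub_name)} "
        f"CONNECTION {quote_literal(conninfo)} "
        f"PUBLICATION {quote_ident(pub_name)} "
        # Reuse the existing slot, copy nothing, and stay disabled for now.
        "WITH (create_slot = false, enabled = false, copy_data = false, "
        f"slot_name = {quote_literal(slot_name)})"
    )
    advance = (
        "SELECT pg_replication_origin_advance("
        f"{quote_literal(origin_name)}, {quote_literal(format_lsn(start_lsn))})"
    )
    enable = f"ALTER SUBSCRIPTION {quote_ident(sub_name)} ENABLE"
    return [create, advance, enable]
```

Keeping the subscription disabled until the origin has been advanced mirrors the ordering constraint in steps 5 to 8: replication must not start before its start point is set.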