ceph-bluestore-tool - bluestore administrative tool
Additional ceph.conf Options
Any configuration option accepted by the OSD can also be passed to ceph-bluestore-tool. This is useful for
supplying necessary configuration options when the monitor or ceph.conf cannot be accessed and the -i option
cannot be used.
Availability
ceph-bluestore-tool is part of Ceph, a massively scalable, open-source, distributed storage system.
Please refer to the Ceph documentation at https://docs.ceph.com for more information.
BlueFS Log Rescue
Some versions of BlueStore were susceptible to the BlueFS log growing extremely large, to the point of
making it impossible to boot the OSD. This state is indicated by a boot that takes very long and fails in
the _replay function.
This can be fixed by::
ceph-bluestore-tool fsck --path osdpath --bluefs_replay_recovery=true
It is advised to first check whether the rescue process would be successful::
ceph-bluestore-tool fsck --path osdpath --bluefs_replay_recovery=true --bluefs_replay_recovery_disable_compact=true
If the above fsck is successful, the fix procedure can be applied.
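The check-then-fix order described above can be sketched as a small shell helper. This is only an illustration: the function name bluefs_log_rescue and the CBT variable (an override for the tool name, useful for dry runs) are assumptions introduced here and are not part of ceph-bluestore-tool.

```shell
# Sketch: run the rescue in check mode first, and apply the fix only if
# the check succeeds. CBT is a hypothetical override for the tool name.
bluefs_log_rescue() {
    osdpath=$1
    cbt=${CBT:-ceph-bluestore-tool}

    # Step 1: check whether the rescue would succeed (compaction disabled).
    "$cbt" fsck --path "$osdpath" \
        --bluefs_replay_recovery=true \
        --bluefs_replay_recovery_disable_compact=true || {
        echo "rescue check failed for $osdpath; fix not applied" >&2
        return 1
    }

    # Step 2: the check passed, so run the actual fix.
    "$cbt" fsck --path "$osdpath" --bluefs_replay_recovery=true
}
```

Running it with CBT=echo prints the two commands instead of executing them, which is a convenient way to review them before touching an OSD.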
Commands
help
Show help.
fsck [ --deep ] (on|off) or (yes|no) or (1|0) or (true|false)
Run a consistency check on BlueStore metadata. If --deep is specified, also read all object data and
verify checksums.
repair
Run a consistency check and repair any errors we can.
qfsck
Run a consistency check on BlueStore metadata, comparing allocator data (from the RocksDB CFB when it
exists, otherwise from the allocation-file) with the ONode state.
allocmap
Performs the same check as qfsck and then stores a new allocation-file. (The command is disabled by
default and requires a special build.)
restore_cfb
Reverses the changes made by the new NCB code (either through a ceph restart or by running the allocmap
command) and restores the RocksDB B column family (allocator-map).
bluefs-export
Export the contents of BlueFS (i.e., RocksDB files) to an output directory.
bluefs-bdev-sizes --path osdpath
Print the device sizes, as understood by BlueFS, to stdout.
bluefs-bdev-expand --path osdpath
Instruct BlueFS to check the size of its block devices and, if they have expanded, make use of the
additional space. Please note that only the new files created by BlueFS will be allocated on the
preferred block device if it has enough free space, and the existing files that have spilled over to
the slow device will be gradually removed when RocksDB performs compaction. In other words, if there
is any data spilled over to the slow device, it will be moved to the fast device over time.
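To see the effect of an expansion, the sizes can be printed before and after. The helper below is a minimal sketch using only the two commands documented here; the function name bluefs_expand_report and the CBT override variable are hypothetical.

```shell
# Sketch: print BlueFS device sizes, expand, then print the sizes again.
# CBT is a hypothetical override for the tool name (e.g. CBT=echo for a dry run).
bluefs_expand_report() {
    osdpath=$1
    cbt=${CBT:-ceph-bluestore-tool}

    "$cbt" bluefs-bdev-sizes --path "$osdpath" || return 1    # sizes before
    "$cbt" bluefs-bdev-expand --path "$osdpath" || return 1   # pick up new space
    "$cbt" bluefs-bdev-sizes --path "$osdpath"                # sizes after
}
```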
bluefs-bdev-new-wal --path osdpath --dev-target new-device
Adds a WAL device to BlueFS. Fails if a WAL device already exists.
bluefs-bdev-new-db --path osdpath --dev-target new-device
Adds a DB device to BlueFS. Fails if a DB device already exists.
bluefs-bdev-migrate --path osdpath --dev-target new-device --devs-source device1 [--devs-source device2]
Moves BlueFS data from source device(s) to the target device. Source devices (except the main one) are
removed on success. Expands the target storage (updates the size label), making "bluefs-bdev-expand"
unnecessary. The target device can be either a new device or a device that is already attached. If the
device is a new device, it is added to the OSD replacing one of the source devices. The following
replacement rules apply (in the order of precedence, stop on the first match):
• if the source list has DB volume - the target device replaces it.
• if the source list has WAL volume - the target device replaces it.
• if the source list has slow volume only - the operation isn't permitted and requires explicit
allocation via a new-DB/new-WAL command.
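The precedence of these replacement rules can be illustrated with a small shell function. It is purely a model of the rules as stated above, not code from the tool; the function name migrate_replacement_role and the role keywords db, wal, and slow are assumptions made for this sketch.

```shell
# Sketch: given the roles of the source devices (each argument is one of
# db, wal, slow), print which volume a *new* target device would replace.
migrate_replacement_role() {
    has_db=0
    has_wal=0
    for role in "$@"; do
        case $role in
            db)  has_db=1 ;;
            wal) has_wal=1 ;;
        esac
    done

    if [ "$has_db" -eq 1 ]; then
        echo db     # first match: a DB volume in the source list is replaced
    elif [ "$has_wal" -eq 1 ]; then
        echo wal    # second match: otherwise a WAL volume is replaced
    else
        # slow volume only: the operation is not permitted
        echo "not permitted: add the device with bluefs-bdev-new-db or bluefs-bdev-new-wal" >&2
        return 1
    fi
}
```

For example, a source list containing both a WAL and a DB volume resolves to db, because the DB rule is checked first.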
show-label --dev device [...]
Show device label(s). The label may be printed while an OSD is running.
free-dump --path osdpath [ --allocator block/bluefs-wal/bluefs-db/bluefs-slow ]
Dump all free regions in allocator.
free-score --path osdpath [ --allocator block/bluefs-wal/bluefs-db/bluefs-slow ]
Print a number in the range [0, 1] that rates the fragmentation of the allocator's free space. 0 means
that all free space is in one chunk; 1 represents the worst possible fragmentation.
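A quick way to compare fragmentation across allocators is to query each of the four names accepted by --allocator in turn. The loop below is a sketch; the function name free_score_all and the CBT override variable are hypothetical, and an allocator that does not exist on a given OSD may simply report an error.

```shell
# Sketch: print the free-score of every allocator name listed for --allocator.
# CBT is a hypothetical override for the tool name (e.g. CBT=echo for a dry run).
free_score_all() {
    osdpath=$1
    cbt=${CBT:-ceph-bluestore-tool}

    for allocator in block bluefs-wal bluefs-db bluefs-slow; do
        echo "== $allocator =="
        # A failure for one allocator does not stop the loop.
        "$cbt" free-score --path "$osdpath" --allocator "$allocator"
    done
}
```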
reshard --path osdpath --sharding newsharding [ --resharding-ctrl controlstring ]
Changes the sharding of BlueStore's RocksDB. Sharding is built on top of RocksDB column families. This
option makes it possible to test the performance of a new sharding without redeploying the OSD.
Resharding is usually a long process that involves walking through the entire RocksDB key space and
moving some keys to different column families. The --resharding-ctrl option provides performance
control over the resharding process. An interrupted resharding will prevent the OSD from running, but
it does not corrupt data. It is always possible to continue a previous resharding or to select any
other sharding scheme, including reverting to the original one.
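Because resharding can be resumed or reverted, it helps to record the sharding before and after the change. The wrapper below is a sketch built from the reshard and show-sharding commands; the function name reshard_osd and the CBT override variable are hypothetical, and the sharding definition is passed in by the caller rather than assumed here.

```shell
# Sketch: show the current sharding, apply a new one, then show the result.
# Arguments: osdpath, new sharding definition, optional resharding-ctrl string.
# CBT is a hypothetical override for the tool name (e.g. CBT=echo for a dry run).
reshard_osd() {
    osdpath=$1
    newsharding=$2
    ctrl=${3:-}
    cbt=${CBT:-ceph-bluestore-tool}

    "$cbt" show-sharding --path "$osdpath" || return 1    # sharding before
    if [ -n "$ctrl" ]; then
        "$cbt" reshard --path "$osdpath" --sharding "$newsharding" \
            --resharding-ctrl "$ctrl" || return 1
    else
        "$cbt" reshard --path "$osdpath" --sharding "$newsharding" || return 1
    fi
    "$cbt" show-sharding --path "$osdpath"                # sharding after
}
```

If the reshard step fails or is interrupted, the function stops there; the same command can be run again to continue, as described above.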
show-sharding --path osdpath
Show sharding that is currently applied to BlueStore's RocksDB.
zap-device --dev devpath
Zeros all device label locations. This effectively makes the device appear empty.
Copyright
2010-2014, Inktank Storage, Inc. and contributors. Licensed under Creative Commons Attribution Share
Alike 3.0 (CC-BY-SA-3.0)
dev May 22, 2025 CEPH-BLUESTORE-TOOL(8)
Description
ceph-bluestore-tool is a utility to perform low-level administrative operations on a BlueStore instance.
Device Labels
Every BlueStore block device has a block label at the beginning of the device. You can dump the contents
of the label with:
ceph-bluestore-tool show-label --dev *device*
The main device will have a lot of metadata, including information that used to be stored in small files
in the OSD data directory. The auxiliary devices (db and wal) will only have the minimum required fields
(OSD UUID, size, device type, birth time). The main device contains additional label copies at offsets:
1G, 10G, 100G and 1000G. Corrupted labels are fixed as part of repair:
ceph-bluestore-tool repair --dev *device*
Name
ceph-bluestore-tool - bluestore administrative tool
Options
--dev *device*
Add device to the list of devices to consider
-i *osd_id*
Operate as OSD osd_id. Connect to monitor for OSD specific options. If monitor is unavailable,
add --no-mon-config to read from ceph.conf instead.
--devs-source *device*
Add device to the list of devices to consider as sources for migrate operation
--dev-target *device*
Specify the target device for a migrate operation, or the device to add when adding a new DB/WAL.
--path *osdpath*
Specify an OSD path. In most cases, the device list is inferred from the symlinks present in osdpath.
This is usually simpler than explicitly specifying the device(s) with --dev. Not necessary
if -i osd_id is provided.
--out-dir *dir*
Output directory for bluefs-export
-l, --log-file *logfile*
File to log to
--log-level *num*
Debug log level. Default is 30 (extremely verbose), 20 is very verbose, 10 is verbose, and 1 is
not very verbose.
--deep
Deep scrub/repair (read and validate object data, not just metadata)
--allocator *name*
Useful for free-dump and free-score actions. Selects allocator(s).
--resharding-ctrl *controlstring*
Provides control over the resharding process. Specifies how often to refresh the RocksDB iterator and
how large the commit batch should be before committing to RocksDB. Option format is:
<iterator_refresh_bytes>/<iterator_refresh_keys>/<batch_commit_bytes>/<batch_commit_keys>
Default: 10000000/10000/1000000/1000
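The four slash-separated fields of a control string can be labelled with a few lines of shell, which is handy when checking a value before passing it to --resharding-ctrl. This is a sketch only; the function name describe_resharding_ctrl is hypothetical, and it checks the field count but not that each field is numeric.

```shell
# Sketch: split a resharding control string into its four named fields.
# With no argument, the documented default is used.
# The body runs in a subshell so the IFS change does not leak to the caller.
describe_resharding_ctrl() (
    ctrl=${1:-10000000/10000/1000000/1000}

    IFS=/
    set -- $ctrl    # split on "/" into positional parameters

    if [ "$#" -ne 4 ]; then
        echo "expected 4 fields, got $#" >&2
        exit 1      # exits only the subshell, so this acts as the return status
    fi

    echo "iterator_refresh_bytes=$1"
    echo "iterator_refresh_keys=$2"
    echo "batch_commit_bytes=$3"
    echo "batch_commit_keys=$4"
)
```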
OSD Directory Priming
You can generate the content for an OSD data directory that can start up a BlueStore OSD with the
prime-osd-dir command:
ceph-bluestore-tool prime-osd-dir --dev *main device* --path /var/lib/ceph/osd/ceph-*id*
See Also
ceph-osd(8)
Synopsis
ceph-bluestore-tool command
[ --dev device ... ]
[ -i osd_id ]
[ --path osdpath ]
[ --out-dir dir ]
[ --log-file | -l filename ]
[ --deep ]
ceph-bluestore-tool fsck|repair --path osdpath [ --deep ]
ceph-bluestore-tool qfsck --path osdpath
ceph-bluestore-tool allocmap --path osdpath
ceph-bluestore-tool restore_cfb --path osdpath
ceph-bluestore-tool show-label --dev device ...
ceph-bluestore-tool prime-osd-dir --dev device --path osdpath
ceph-bluestore-tool bluefs-export --path osdpath --out-dir dir
ceph-bluestore-tool bluefs-bdev-new-wal --path osdpath --dev-target new-device
ceph-bluestore-tool bluefs-bdev-new-db --path osdpath --dev-target new-device
ceph-bluestore-tool bluefs-bdev-migrate --path osdpath --dev-target new-device --devs-source device1 [--devs-source device2]
ceph-bluestore-tool free-dump|free-score --path osdpath [ --allocator block/bluefs-wal/bluefs-db/bluefs-slow ]
ceph-bluestore-tool reshard --path osdpath --sharding newsharding [ --resharding-ctrl controlstring ]
ceph-bluestore-tool show-sharding --path osdpath
ceph-bluestore-tool zap-device --dev devpath