Bug#915666: linux: data corruption with blk-mq
Christoph Anton Mitterer
2018-12-05 19:55:06 UTC
Source: linux
Version: 4.18.20-2
Severity: critical
Tags: upstream patch
Justification: causes serious data loss


Hi.

There's a bug in the blk-mq schedulers that may cause serious data
corruption...
See https://bugzilla.kernel.org/show_bug.cgi?id=201685

It seems a patch was made recently... maybe it would make sense to
cherry-pick it now rather than wait for it to arrive via the stable kernels.

AFAIU the discussions, at least 4.18 may be affected as well.


Cheers,
Chris.


-- System Information:
Debian Release: buster/sid
APT prefers unstable-debug
APT policy: (500, 'unstable-debug'), (500, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.18.0-3-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_DE.UTF-8, LC_CTYPE=en_DE.UTF-8 (charmap=UTF-8), LANGUAGE=en_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Christoph Anton Mitterer
2018-12-05 23:37:53 UTC
For those reading along, Jens Axboe gave a summary of how to
check whether you are affected:

Quoting from: https://bugzilla.kernel.org/show_bug.cgi?id=201685#c294
> scsi_mod.use_blk_mq=0 will do the trick, as will just ensuring that you have
>
>     # cat /sys/block/sda/queue/scheduler
>     bfq [mq-deadline] none
>
> As long as that doesn't say [none], you are fine as well. Also note that
> this seems to require a special circumstance of timing and devices to even
> be possible in the first place. But I would recommend ensuring that one of
> the above two conditions are true, and I'd further recommend just using
> mq-deadline (or bfq or kyber, whatever is your preference) instead of
> turning scsi-mq off.
> Once you've ensured that after a fresh boot, I'd double check by running
> fsck on the file systems hosted by a SCSI/SATA device.
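A minimal sketch of the two checks Jens describes, written as POSIX sh functions. It assumes (this is not from the bug report) that SCSI/SATA disks appear as /sys/block/sd*, and that the scsi_mod parameter can be read from /sys/module/scsi_mod/parameters/use_blk_mq as Y or N, as on 4.18-era kernels. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for the two conditions in Jens Axboe's comment.
# Assumptions: SCSI/SATA disks are /sys/block/sd*, and scsi_mod exposes
# its use_blk_mq parameter (Y/N) under /sys/module/scsi_mod/parameters/.

# Print the active scheduler from a queue/scheduler line, i.e. the name
# in square brackets: "bfq [mq-deadline] none" -> "mq-deadline".
active_scheduler() {
    case "$1" in
        *\[*\]*) printf '%s\n' "$1" | sed 's/.*\[\([^]]*\)].*/\1/' ;;
        *) printf '%s\n' "$1" ;;   # no brackets: print the line unchanged
    esac
}

# Succeed unless the active scheduler is "none".
scheduler_ok() {
    [ "$(active_scheduler "$1")" != "none" ]
}

# Return 0 if either condition holds, 1 if any sd* disk reports none.
check_system() {
    param=/sys/module/scsi_mod/parameters/use_blk_mq
    if [ -r "$param" ] && [ "$(cat "$param")" = "N" ]; then
        echo "use_blk_mq is N: legacy (non-blk-mq) SCSI path in use"
        return 0
    fi
    status=0
    for f in /sys/block/sd*/queue/scheduler; do
        [ -r "$f" ] || continue    # glob did not match, or file unreadable
        dev=${f#/sys/block/}       # "sda/queue/scheduler"
        dev=${dev%%/*}             # "sda"
        line=$(cat "$f")
        if scheduler_ok "$line"; then
            echo "$dev: $line (ok)"
        else
            echo "$dev: $line (active scheduler is none)"
            status=1
        fi
    done
    return "$status"
}
```

Source the file and run `check_system`. A non-zero return means at least one sd* disk reports [none]. To switch such a disk to mq-deadline until the next reboot, run as root `echo mq-deadline > /sys/block/sda/queue/scheduler`. Either way, this does not replace the fsck check recommended above.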
It seems that all my own systems (with blk-mq) run with [mq-deadline]
out of the box.
Maybe the Debian maintainers can tell whether this has long been the
default, so people can more easily find out whether they need to check
their data for corruption.



Cheers,
Chris.
