Implement Trusted Path Execution#25
Open
madaidan wants to merge 1 commit into
Open
Conversation
|
Thanks for the pull-request. I've only taken a quick look but can't this be easily bypassed with |
Author
Executable memory mappings are a big issue but I don't think it should be solved in TPE. Other things like PAX_MPROTECT, S.A.R.A. LSM or SELinux's memory protections can solve this by preventing memory mappings from being writable and executable.
I've briefly looked over them. |
anthraxx
pushed a commit
that referenced
this pull request
Jun 22, 2020
[ Upstream commit f6766ff ] We need to check mddev->del_work before flush workqueu since the purpose of flush is to ensure the previous md is disappeared. Otherwise the similar deadlock appeared if LOCKDEP is enabled, it is due to md_open holds the bdev->bd_mutex before flush workqueue. kernel: [ 154.522645] ====================================================== kernel: [ 154.522647] WARNING: possible circular locking dependency detected kernel: [ 154.522650] 5.6.0-rc7-lp151.27-default #25 Tainted: G O kernel: [ 154.522651] ------------------------------------------------------ kernel: [ 154.522653] mdadm/2482 is trying to acquire lock: kernel: [ 154.522655] ffff888078529128 ((wq_completion)md_misc){+.+.}, at: flush_workqueue+0x84/0x4b0 kernel: [ 154.522673] kernel: [ 154.522673] but task is already holding lock: kernel: [ 154.522675] ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522691] kernel: [ 154.522691] which lock already depends on the new lock. kernel: [ 154.522691] kernel: [ 154.522694] kernel: [ 154.522694] the existing dependency chain (in reverse order) is: kernel: [ 154.522696] kernel: [ 154.522696] -> #4 (&bdev->bd_mutex){+.+.}: kernel: [ 154.522704] __mutex_lock+0x87/0x950 kernel: [ 154.522706] __blkdev_get+0x79/0x590 kernel: [ 154.522708] blkdev_get+0x65/0x140 kernel: [ 154.522709] blkdev_get_by_dev+0x2f/0x40 kernel: [ 154.522716] lock_rdev+0x3d/0x90 [md_mod] kernel: [ 154.522719] md_import_device+0xd6/0x1b0 [md_mod] kernel: [ 154.522723] new_dev_store+0x15e/0x210 [md_mod] kernel: [ 154.522728] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522732] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522735] vfs_write+0xad/0x1a0 kernel: [ 154.522737] ksys_write+0xa4/0xe0 kernel: [ 154.522745] do_syscall_64+0x64/0x2b0 kernel: [ 154.522748] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522749] kernel: [ 154.522749] -> #3 (&mddev->reconfig_mutex){+.+.}: kernel: [ 154.522752] __mutex_lock+0x87/0x950 kernel: [ 154.522756] new_dev_store+0xc9/0x210 [md_mod] kernel: [ 154.522759] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522761] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522763] vfs_write+0xad/0x1a0 kernel: [ 154.522765] ksys_write+0xa4/0xe0 kernel: [ 154.522767] do_syscall_64+0x64/0x2b0 kernel: [ 154.522769] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522770] kernel: [ 154.522770] -> #2 (kn->count#253){++++}: kernel: [ 154.522775] __kernfs_remove+0x253/0x2c0 kernel: [ 154.522778] kernfs_remove+0x1f/0x30 kernel: [ 154.522780] kobject_del+0x28/0x60 kernel: [ 154.522783] mddev_delayed_delete+0x24/0x30 [md_mod] kernel: [ 154.522786] process_one_work+0x2a7/0x5f0 kernel: [ 154.522788] worker_thread+0x2d/0x3d0 kernel: [ 154.522793] kthread+0x117/0x130 kernel: [ 154.522795] ret_from_fork+0x3a/0x50 kernel: [ 154.522796] kernel: [ 154.522796] -> #1 ((work_completion)(&mddev->del_work)){+.+.}: kernel: [ 154.522800] process_one_work+0x27e/0x5f0 kernel: [ 154.522802] worker_thread+0x2d/0x3d0 kernel: [ 154.522804] kthread+0x117/0x130 kernel: [ 154.522806] ret_from_fork+0x3a/0x50 kernel: [ 154.522807] kernel: [ 154.522807] -> #0 ((wq_completion)md_misc){+.+.}: kernel: [ 154.522813] __lock_acquire+0x1392/0x1690 kernel: [ 154.522816] lock_acquire+0xb4/0x1a0 kernel: [ 154.522818] flush_workqueue+0xab/0x4b0 kernel: [ 154.522821] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522823] __blkdev_get+0xea/0x590 kernel: [ 154.522825] blkdev_get+0x65/0x140 kernel: [ 154.522828] do_dentry_open+0x1d1/0x380 kernel: [ 154.522831] path_openat+0x567/0xcc0 kernel: [ 154.522834] do_filp_open+0x9b/0x110 kernel: [ 154.522836] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522838] do_sys_open+0x57/0x80 kernel: [ 154.522840] do_syscall_64+0x64/0x2b0 kernel: [ 154.522842] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522844] kernel: [ 154.522844] other info that might help us debug this: kernel: [ 154.522844] kernel: [ 154.522846] Chain exists of: kernel: [ 154.522846] (wq_completion)md_misc --> &mddev->reconfig_mutex --> &bdev->bd_mutex kernel: [ 154.522846] kernel: [ 154.522850] Possible unsafe locking scenario: kernel: [ 154.522850] kernel: [ 154.522852] CPU0 CPU1 kernel: [ 154.522853] ---- ---- kernel: [ 154.522854] lock(&bdev->bd_mutex); kernel: [ 154.522856] lock(&mddev->reconfig_mutex); kernel: [ 154.522858] lock(&bdev->bd_mutex); kernel: [ 154.522860] lock((wq_completion)md_misc); kernel: [ 154.522861] kernel: [ 154.522861] *** DEADLOCK *** kernel: [ 154.522861] kernel: [ 154.522864] 1 lock held by mdadm/2482: kernel: [ 154.522865] #0: ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522868] kernel: [ 154.522868] stack backtrace: kernel: [ 154.522873] CPU: 1 PID: 2482 Comm: mdadm Tainted: G O 5.6.0-rc7-lp151.27-default #25 kernel: [ 154.522875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 kernel: [ 154.522878] Call Trace: kernel: [ 154.522881] dump_stack+0x8f/0xcb kernel: [ 154.522884] check_noncircular+0x194/0x1b0 kernel: [ 154.522888] ? __lock_acquire+0x1392/0x1690 kernel: [ 154.522890] __lock_acquire+0x1392/0x1690 kernel: [ 154.522893] lock_acquire+0xb4/0x1a0 kernel: [ 154.522895] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522898] flush_workqueue+0xab/0x4b0 kernel: [ 154.522900] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522905] ? md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522908] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522910] __blkdev_get+0xea/0x590 kernel: [ 154.522912] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522914] blkdev_get+0x65/0x140 kernel: [ 154.522916] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522918] do_dentry_open+0x1d1/0x380 kernel: [ 154.522921] path_openat+0x567/0xcc0 kernel: [ 154.522923] ? __lock_acquire+0x380/0x1690 kernel: [ 154.522926] do_filp_open+0x9b/0x110 kernel: [ 154.522929] ? __alloc_fd+0xe5/0x1f0 kernel: [ 154.522935] ? kmem_cache_alloc+0x28c/0x630 kernel: [ 154.522939] ? do_sys_openat2+0x201/0x2a0 kernel: [ 154.522941] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522944] do_sys_open+0x57/0x80 kernel: [ 154.522946] do_syscall_64+0x64/0x2b0 kernel: [ 154.522948] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522951] RIP: 0033:0x7f98d279d9ae And md_alloc also flushed the same workqueue, but the thing is different here. Because all the paths call md_alloc don't hold bdev->bd_mutex, and the flush is necessary to avoid race condition, so leave it as it is. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Jun 22, 2020
[ Upstream commit f6766ff ] We need to check mddev->del_work before flush workqueu since the purpose of flush is to ensure the previous md is disappeared. Otherwise the similar deadlock appeared if LOCKDEP is enabled, it is due to md_open holds the bdev->bd_mutex before flush workqueue. kernel: [ 154.522645] ====================================================== kernel: [ 154.522647] WARNING: possible circular locking dependency detected kernel: [ 154.522650] 5.6.0-rc7-lp151.27-default #25 Tainted: G O kernel: [ 154.522651] ------------------------------------------------------ kernel: [ 154.522653] mdadm/2482 is trying to acquire lock: kernel: [ 154.522655] ffff888078529128 ((wq_completion)md_misc){+.+.}, at: flush_workqueue+0x84/0x4b0 kernel: [ 154.522673] kernel: [ 154.522673] but task is already holding lock: kernel: [ 154.522675] ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522691] kernel: [ 154.522691] which lock already depends on the new lock. kernel: [ 154.522691] kernel: [ 154.522694] kernel: [ 154.522694] the existing dependency chain (in reverse order) is: kernel: [ 154.522696] kernel: [ 154.522696] -> #4 (&bdev->bd_mutex){+.+.}: kernel: [ 154.522704] __mutex_lock+0x87/0x950 kernel: [ 154.522706] __blkdev_get+0x79/0x590 kernel: [ 154.522708] blkdev_get+0x65/0x140 kernel: [ 154.522709] blkdev_get_by_dev+0x2f/0x40 kernel: [ 154.522716] lock_rdev+0x3d/0x90 [md_mod] kernel: [ 154.522719] md_import_device+0xd6/0x1b0 [md_mod] kernel: [ 154.522723] new_dev_store+0x15e/0x210 [md_mod] kernel: [ 154.522728] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522732] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522735] vfs_write+0xad/0x1a0 kernel: [ 154.522737] ksys_write+0xa4/0xe0 kernel: [ 154.522745] do_syscall_64+0x64/0x2b0 kernel: [ 154.522748] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522749] kernel: [ 154.522749] -> #3 (&mddev->reconfig_mutex){+.+.}: kernel: [ 154.522752] __mutex_lock+0x87/0x950 kernel: [ 154.522756] new_dev_store+0xc9/0x210 [md_mod] kernel: [ 154.522759] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522761] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522763] vfs_write+0xad/0x1a0 kernel: [ 154.522765] ksys_write+0xa4/0xe0 kernel: [ 154.522767] do_syscall_64+0x64/0x2b0 kernel: [ 154.522769] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522770] kernel: [ 154.522770] -> #2 (kn->count#253){++++}: kernel: [ 154.522775] __kernfs_remove+0x253/0x2c0 kernel: [ 154.522778] kernfs_remove+0x1f/0x30 kernel: [ 154.522780] kobject_del+0x28/0x60 kernel: [ 154.522783] mddev_delayed_delete+0x24/0x30 [md_mod] kernel: [ 154.522786] process_one_work+0x2a7/0x5f0 kernel: [ 154.522788] worker_thread+0x2d/0x3d0 kernel: [ 154.522793] kthread+0x117/0x130 kernel: [ 154.522795] ret_from_fork+0x3a/0x50 kernel: [ 154.522796] kernel: [ 154.522796] -> #1 ((work_completion)(&mddev->del_work)){+.+.}: kernel: [ 154.522800] process_one_work+0x27e/0x5f0 kernel: [ 154.522802] worker_thread+0x2d/0x3d0 kernel: [ 154.522804] kthread+0x117/0x130 kernel: [ 154.522806] ret_from_fork+0x3a/0x50 kernel: [ 154.522807] kernel: [ 154.522807] -> #0 ((wq_completion)md_misc){+.+.}: kernel: [ 154.522813] __lock_acquire+0x1392/0x1690 kernel: [ 154.522816] lock_acquire+0xb4/0x1a0 kernel: [ 154.522818] flush_workqueue+0xab/0x4b0 kernel: [ 154.522821] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522823] __blkdev_get+0xea/0x590 kernel: [ 154.522825] blkdev_get+0x65/0x140 kernel: [ 154.522828] do_dentry_open+0x1d1/0x380 kernel: [ 154.522831] path_openat+0x567/0xcc0 kernel: [ 154.522834] do_filp_open+0x9b/0x110 kernel: [ 154.522836] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522838] do_sys_open+0x57/0x80 kernel: [ 154.522840] do_syscall_64+0x64/0x2b0 kernel: [ 154.522842] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522844] kernel: [ 154.522844] other info that might help us debug this: kernel: [ 154.522844] kernel: [ 154.522846] Chain exists of: kernel: [ 154.522846] (wq_completion)md_misc --> &mddev->reconfig_mutex --> &bdev->bd_mutex kernel: [ 154.522846] kernel: [ 154.522850] Possible unsafe locking scenario: kernel: [ 154.522850] kernel: [ 154.522852] CPU0 CPU1 kernel: [ 154.522853] ---- ---- kernel: [ 154.522854] lock(&bdev->bd_mutex); kernel: [ 154.522856] lock(&mddev->reconfig_mutex); kernel: [ 154.522858] lock(&bdev->bd_mutex); kernel: [ 154.522860] lock((wq_completion)md_misc); kernel: [ 154.522861] kernel: [ 154.522861] *** DEADLOCK *** kernel: [ 154.522861] kernel: [ 154.522864] 1 lock held by mdadm/2482: kernel: [ 154.522865] #0: ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522868] kernel: [ 154.522868] stack backtrace: kernel: [ 154.522873] CPU: 1 PID: 2482 Comm: mdadm Tainted: G O 5.6.0-rc7-lp151.27-default #25 kernel: [ 154.522875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 kernel: [ 154.522878] Call Trace: kernel: [ 154.522881] dump_stack+0x8f/0xcb kernel: [ 154.522884] check_noncircular+0x194/0x1b0 kernel: [ 154.522888] ? __lock_acquire+0x1392/0x1690 kernel: [ 154.522890] __lock_acquire+0x1392/0x1690 kernel: [ 154.522893] lock_acquire+0xb4/0x1a0 kernel: [ 154.522895] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522898] flush_workqueue+0xab/0x4b0 kernel: [ 154.522900] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522905] ? md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522908] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522910] __blkdev_get+0xea/0x590 kernel: [ 154.522912] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522914] blkdev_get+0x65/0x140 kernel: [ 154.522916] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522918] do_dentry_open+0x1d1/0x380 kernel: [ 154.522921] path_openat+0x567/0xcc0 kernel: [ 154.522923] ? __lock_acquire+0x380/0x1690 kernel: [ 154.522926] do_filp_open+0x9b/0x110 kernel: [ 154.522929] ? __alloc_fd+0xe5/0x1f0 kernel: [ 154.522935] ? kmem_cache_alloc+0x28c/0x630 kernel: [ 154.522939] ? do_sys_openat2+0x201/0x2a0 kernel: [ 154.522941] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522944] do_sys_open+0x57/0x80 kernel: [ 154.522946] do_syscall_64+0x64/0x2b0 kernel: [ 154.522948] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522951] RIP: 0033:0x7f98d279d9ae And md_alloc also flushed the same workqueue, but the thing is different here. Because all the paths call md_alloc don't hold bdev->bd_mutex, and the flush is necessary to avoid race condition, so leave it as it is. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Jun 22, 2020
[ Upstream commit f6766ff ] We need to check mddev->del_work before flush workqueu since the purpose of flush is to ensure the previous md is disappeared. Otherwise the similar deadlock appeared if LOCKDEP is enabled, it is due to md_open holds the bdev->bd_mutex before flush workqueue. kernel: [ 154.522645] ====================================================== kernel: [ 154.522647] WARNING: possible circular locking dependency detected kernel: [ 154.522650] 5.6.0-rc7-lp151.27-default #25 Tainted: G O kernel: [ 154.522651] ------------------------------------------------------ kernel: [ 154.522653] mdadm/2482 is trying to acquire lock: kernel: [ 154.522655] ffff888078529128 ((wq_completion)md_misc){+.+.}, at: flush_workqueue+0x84/0x4b0 kernel: [ 154.522673] kernel: [ 154.522673] but task is already holding lock: kernel: [ 154.522675] ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522691] kernel: [ 154.522691] which lock already depends on the new lock. kernel: [ 154.522691] kernel: [ 154.522694] kernel: [ 154.522694] the existing dependency chain (in reverse order) is: kernel: [ 154.522696] kernel: [ 154.522696] -> #4 (&bdev->bd_mutex){+.+.}: kernel: [ 154.522704] __mutex_lock+0x87/0x950 kernel: [ 154.522706] __blkdev_get+0x79/0x590 kernel: [ 154.522708] blkdev_get+0x65/0x140 kernel: [ 154.522709] blkdev_get_by_dev+0x2f/0x40 kernel: [ 154.522716] lock_rdev+0x3d/0x90 [md_mod] kernel: [ 154.522719] md_import_device+0xd6/0x1b0 [md_mod] kernel: [ 154.522723] new_dev_store+0x15e/0x210 [md_mod] kernel: [ 154.522728] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522732] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522735] vfs_write+0xad/0x1a0 kernel: [ 154.522737] ksys_write+0xa4/0xe0 kernel: [ 154.522745] do_syscall_64+0x64/0x2b0 kernel: [ 154.522748] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522749] kernel: [ 154.522749] -> #3 (&mddev->reconfig_mutex){+.+.}: kernel: [ 154.522752] __mutex_lock+0x87/0x950 kernel: [ 154.522756] new_dev_store+0xc9/0x210 [md_mod] kernel: [ 154.522759] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522761] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522763] vfs_write+0xad/0x1a0 kernel: [ 154.522765] ksys_write+0xa4/0xe0 kernel: [ 154.522767] do_syscall_64+0x64/0x2b0 kernel: [ 154.522769] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522770] kernel: [ 154.522770] -> #2 (kn->count#253){++++}: kernel: [ 154.522775] __kernfs_remove+0x253/0x2c0 kernel: [ 154.522778] kernfs_remove+0x1f/0x30 kernel: [ 154.522780] kobject_del+0x28/0x60 kernel: [ 154.522783] mddev_delayed_delete+0x24/0x30 [md_mod] kernel: [ 154.522786] process_one_work+0x2a7/0x5f0 kernel: [ 154.522788] worker_thread+0x2d/0x3d0 kernel: [ 154.522793] kthread+0x117/0x130 kernel: [ 154.522795] ret_from_fork+0x3a/0x50 kernel: [ 154.522796] kernel: [ 154.522796] -> #1 ((work_completion)(&mddev->del_work)){+.+.}: kernel: [ 154.522800] process_one_work+0x27e/0x5f0 kernel: [ 154.522802] worker_thread+0x2d/0x3d0 kernel: [ 154.522804] kthread+0x117/0x130 kernel: [ 154.522806] ret_from_fork+0x3a/0x50 kernel: [ 154.522807] kernel: [ 154.522807] -> #0 ((wq_completion)md_misc){+.+.}: kernel: [ 154.522813] __lock_acquire+0x1392/0x1690 kernel: [ 154.522816] lock_acquire+0xb4/0x1a0 kernel: [ 154.522818] flush_workqueue+0xab/0x4b0 kernel: [ 154.522821] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522823] __blkdev_get+0xea/0x590 kernel: [ 154.522825] blkdev_get+0x65/0x140 kernel: [ 154.522828] do_dentry_open+0x1d1/0x380 kernel: [ 154.522831] path_openat+0x567/0xcc0 kernel: [ 154.522834] do_filp_open+0x9b/0x110 kernel: [ 154.522836] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522838] do_sys_open+0x57/0x80 kernel: [ 154.522840] do_syscall_64+0x64/0x2b0 kernel: [ 154.522842] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522844] kernel: [ 154.522844] other info that might help us debug this: kernel: [ 154.522844] kernel: [ 154.522846] Chain exists of: kernel: [ 154.522846] (wq_completion)md_misc --> &mddev->reconfig_mutex --> &bdev->bd_mutex kernel: [ 154.522846] kernel: [ 154.522850] Possible unsafe locking scenario: kernel: [ 154.522850] kernel: [ 154.522852] CPU0 CPU1 kernel: [ 154.522853] ---- ---- kernel: [ 154.522854] lock(&bdev->bd_mutex); kernel: [ 154.522856] lock(&mddev->reconfig_mutex); kernel: [ 154.522858] lock(&bdev->bd_mutex); kernel: [ 154.522860] lock((wq_completion)md_misc); kernel: [ 154.522861] kernel: [ 154.522861] *** DEADLOCK *** kernel: [ 154.522861] kernel: [ 154.522864] 1 lock held by mdadm/2482: kernel: [ 154.522865] #0: ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522868] kernel: [ 154.522868] stack backtrace: kernel: [ 154.522873] CPU: 1 PID: 2482 Comm: mdadm Tainted: G O 5.6.0-rc7-lp151.27-default #25 kernel: [ 154.522875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 kernel: [ 154.522878] Call Trace: kernel: [ 154.522881] dump_stack+0x8f/0xcb kernel: [ 154.522884] check_noncircular+0x194/0x1b0 kernel: [ 154.522888] ? __lock_acquire+0x1392/0x1690 kernel: [ 154.522890] __lock_acquire+0x1392/0x1690 kernel: [ 154.522893] lock_acquire+0xb4/0x1a0 kernel: [ 154.522895] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522898] flush_workqueue+0xab/0x4b0 kernel: [ 154.522900] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522905] ? md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522908] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522910] __blkdev_get+0xea/0x590 kernel: [ 154.522912] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522914] blkdev_get+0x65/0x140 kernel: [ 154.522916] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522918] do_dentry_open+0x1d1/0x380 kernel: [ 154.522921] path_openat+0x567/0xcc0 kernel: [ 154.522923] ? __lock_acquire+0x380/0x1690 kernel: [ 154.522926] do_filp_open+0x9b/0x110 kernel: [ 154.522929] ? __alloc_fd+0xe5/0x1f0 kernel: [ 154.522935] ? kmem_cache_alloc+0x28c/0x630 kernel: [ 154.522939] ? do_sys_openat2+0x201/0x2a0 kernel: [ 154.522941] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522944] do_sys_open+0x57/0x80 kernel: [ 154.522946] do_syscall_64+0x64/0x2b0 kernel: [ 154.522948] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522951] RIP: 0033:0x7f98d279d9ae And md_alloc also flushed the same workqueue, but the thing is different here. Because all the paths call md_alloc don't hold bdev->bd_mutex, and the flush is necessary to avoid race condition, so leave it as it is. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Jul 1, 2020
[ Upstream commit f6766ff ] We need to check mddev->del_work before flush workqueu since the purpose of flush is to ensure the previous md is disappeared. Otherwise the similar deadlock appeared if LOCKDEP is enabled, it is due to md_open holds the bdev->bd_mutex before flush workqueue. kernel: [ 154.522645] ====================================================== kernel: [ 154.522647] WARNING: possible circular locking dependency detected kernel: [ 154.522650] 5.6.0-rc7-lp151.27-default #25 Tainted: G O kernel: [ 154.522651] ------------------------------------------------------ kernel: [ 154.522653] mdadm/2482 is trying to acquire lock: kernel: [ 154.522655] ffff888078529128 ((wq_completion)md_misc){+.+.}, at: flush_workqueue+0x84/0x4b0 kernel: [ 154.522673] kernel: [ 154.522673] but task is already holding lock: kernel: [ 154.522675] ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522691] kernel: [ 154.522691] which lock already depends on the new lock. kernel: [ 154.522691] kernel: [ 154.522694] kernel: [ 154.522694] the existing dependency chain (in reverse order) is: kernel: [ 154.522696] kernel: [ 154.522696] -> #4 (&bdev->bd_mutex){+.+.}: kernel: [ 154.522704] __mutex_lock+0x87/0x950 kernel: [ 154.522706] __blkdev_get+0x79/0x590 kernel: [ 154.522708] blkdev_get+0x65/0x140 kernel: [ 154.522709] blkdev_get_by_dev+0x2f/0x40 kernel: [ 154.522716] lock_rdev+0x3d/0x90 [md_mod] kernel: [ 154.522719] md_import_device+0xd6/0x1b0 [md_mod] kernel: [ 154.522723] new_dev_store+0x15e/0x210 [md_mod] kernel: [ 154.522728] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522732] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522735] vfs_write+0xad/0x1a0 kernel: [ 154.522737] ksys_write+0xa4/0xe0 kernel: [ 154.522745] do_syscall_64+0x64/0x2b0 kernel: [ 154.522748] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522749] kernel: [ 154.522749] -> #3 (&mddev->reconfig_mutex){+.+.}: kernel: [ 154.522752] __mutex_lock+0x87/0x950 kernel: [ 154.522756] new_dev_store+0xc9/0x210 [md_mod] kernel: [ 154.522759] md_attr_store+0x7a/0xc0 [md_mod] kernel: [ 154.522761] kernfs_fop_write+0x117/0x1b0 kernel: [ 154.522763] vfs_write+0xad/0x1a0 kernel: [ 154.522765] ksys_write+0xa4/0xe0 kernel: [ 154.522767] do_syscall_64+0x64/0x2b0 kernel: [ 154.522769] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522770] kernel: [ 154.522770] -> #2 (kn->count#253){++++}: kernel: [ 154.522775] __kernfs_remove+0x253/0x2c0 kernel: [ 154.522778] kernfs_remove+0x1f/0x30 kernel: [ 154.522780] kobject_del+0x28/0x60 kernel: [ 154.522783] mddev_delayed_delete+0x24/0x30 [md_mod] kernel: [ 154.522786] process_one_work+0x2a7/0x5f0 kernel: [ 154.522788] worker_thread+0x2d/0x3d0 kernel: [ 154.522793] kthread+0x117/0x130 kernel: [ 154.522795] ret_from_fork+0x3a/0x50 kernel: [ 154.522796] kernel: [ 154.522796] -> #1 ((work_completion)(&mddev->del_work)){+.+.}: kernel: [ 154.522800] process_one_work+0x27e/0x5f0 kernel: [ 154.522802] worker_thread+0x2d/0x3d0 kernel: [ 154.522804] kthread+0x117/0x130 kernel: [ 154.522806] ret_from_fork+0x3a/0x50 kernel: [ 154.522807] kernel: [ 154.522807] -> #0 ((wq_completion)md_misc){+.+.}: kernel: [ 154.522813] __lock_acquire+0x1392/0x1690 kernel: [ 154.522816] lock_acquire+0xb4/0x1a0 kernel: [ 154.522818] flush_workqueue+0xab/0x4b0 kernel: [ 154.522821] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522823] __blkdev_get+0xea/0x590 kernel: [ 154.522825] blkdev_get+0x65/0x140 kernel: [ 154.522828] do_dentry_open+0x1d1/0x380 kernel: [ 154.522831] path_openat+0x567/0xcc0 kernel: [ 154.522834] do_filp_open+0x9b/0x110 kernel: [ 154.522836] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522838] do_sys_open+0x57/0x80 kernel: [ 154.522840] do_syscall_64+0x64/0x2b0 kernel: [ 154.522842] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522844] kernel: [ 154.522844] other info that might help us debug this: kernel: [ 154.522844] kernel: [ 154.522846] Chain exists of: kernel: [ 154.522846] (wq_completion)md_misc --> &mddev->reconfig_mutex --> &bdev->bd_mutex kernel: [ 154.522846] kernel: [ 154.522850] Possible unsafe locking scenario: kernel: [ 154.522850] kernel: [ 154.522852] CPU0 CPU1 kernel: [ 154.522853] ---- ---- kernel: [ 154.522854] lock(&bdev->bd_mutex); kernel: [ 154.522856] lock(&mddev->reconfig_mutex); kernel: [ 154.522858] lock(&bdev->bd_mutex); kernel: [ 154.522860] lock((wq_completion)md_misc); kernel: [ 154.522861] kernel: [ 154.522861] *** DEADLOCK *** kernel: [ 154.522861] kernel: [ 154.522864] 1 lock held by mdadm/2482: kernel: [ 154.522865] #0: ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590 kernel: [ 154.522868] kernel: [ 154.522868] stack backtrace: kernel: [ 154.522873] CPU: 1 PID: 2482 Comm: mdadm Tainted: G O 5.6.0-rc7-lp151.27-default #25 kernel: [ 154.522875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 kernel: [ 154.522878] Call Trace: kernel: [ 154.522881] dump_stack+0x8f/0xcb kernel: [ 154.522884] check_noncircular+0x194/0x1b0 kernel: [ 154.522888] ? __lock_acquire+0x1392/0x1690 kernel: [ 154.522890] __lock_acquire+0x1392/0x1690 kernel: [ 154.522893] lock_acquire+0xb4/0x1a0 kernel: [ 154.522895] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522898] flush_workqueue+0xab/0x4b0 kernel: [ 154.522900] ? flush_workqueue+0x84/0x4b0 kernel: [ 154.522905] ? md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522908] md_open+0xb6/0xc0 [md_mod] kernel: [ 154.522910] __blkdev_get+0xea/0x590 kernel: [ 154.522912] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522914] blkdev_get+0x65/0x140 kernel: [ 154.522916] ? bd_acquire+0xc0/0xc0 kernel: [ 154.522918] do_dentry_open+0x1d1/0x380 kernel: [ 154.522921] path_openat+0x567/0xcc0 kernel: [ 154.522923] ? __lock_acquire+0x380/0x1690 kernel: [ 154.522926] do_filp_open+0x9b/0x110 kernel: [ 154.522929] ? __alloc_fd+0xe5/0x1f0 kernel: [ 154.522935] ? kmem_cache_alloc+0x28c/0x630 kernel: [ 154.522939] ? do_sys_openat2+0x201/0x2a0 kernel: [ 154.522941] do_sys_openat2+0x201/0x2a0 kernel: [ 154.522944] do_sys_open+0x57/0x80 kernel: [ 154.522946] do_syscall_64+0x64/0x2b0 kernel: [ 154.522948] entry_SYSCALL_64_after_hwframe+0x49/0xbe kernel: [ 154.522951] RIP: 0033:0x7f98d279d9ae And md_alloc also flushed the same workqueue, but the thing is different here. Because all the paths call md_alloc don't hold bdev->bd_mutex, and the flush is necessary to avoid race condition, so leave it as it is. Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Aug 12, 2020
[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Aug 12, 2020
[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Aug 12, 2020
[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Aug 24, 2020
[ Upstream commit e24c644 ] I compiled with AddressSanitizer and I had these memory leaks while I was using the tep_parse_format function: Direct leak of 28 byte(s) in 4 object(s) allocated from: #0 0x7fb07db49ffe in __interceptor_realloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10dffe) #1 0x7fb07a724228 in extend_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:985 #2 0x7fb07a724c21 in __read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1140 #3 0x7fb07a724f78 in read_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1206 #4 0x7fb07a725191 in __read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1291 #5 0x7fb07a7251df in read_expect_type /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1299 #6 0x7fb07a72e6c8 in process_dynamic_array_len /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:2849 #7 0x7fb07a7304b8 in process_function /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3161 #8 0x7fb07a730900 in process_arg_token /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3207 #9 0x7fb07a727c0b in process_arg /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:1786 #10 0x7fb07a731080 in event_read_print_args /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3285 #11 0x7fb07a731722 in event_read_print /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:3369 #12 0x7fb07a740054 in __tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6335 #13 0x7fb07a74047a in __parse_event /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6389 #14 0x7fb07a740536 in tep_parse_format /home/pduplessis/repo/linux/tools/lib/traceevent/event-parse.c:6431 #15 0x7fb07a785acf in parse_event ../../../src/fs-src/fs.c:251 #16 0x7fb07a785ccd in parse_systems ../../../src/fs-src/fs.c:284 #17 0x7fb07a786fb3 in read_metadata ../../../src/fs-src/fs.c:593 #18 0x7fb07a78760e in ftrace_fs_source_init ../../../src/fs-src/fs.c:727 #19 0x7fb07d90c19c in add_component_with_init_method_data ../../../../src/lib/graph/graph.c:1048 #20 0x7fb07d90c87b in add_source_component_with_initialize_method_data ../../../../src/lib/graph/graph.c:1127 #21 0x7fb07d90c92a in bt_graph_add_source_component ../../../../src/lib/graph/graph.c:1152 #22 0x55db11aa632e in cmd_run_ctx_create_components_from_config_components ../../../src/cli/babeltrace2.c:2252 #23 0x55db11aa6fda in cmd_run_ctx_create_components ../../../src/cli/babeltrace2.c:2347 #24 0x55db11aa780c in cmd_run ../../../src/cli/babeltrace2.c:2461 #25 0x55db11aa8a7d in main ../../../src/cli/babeltrace2.c:2673 #26 0x7fb07d5460b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2) The token variable in the process_dynamic_array_len function is allocated in the read_expect_type function, but is not freed before calling the read_token function. Free the token variable before calling read_token in order to plug the leak. Signed-off-by: Philippe Duplessis-Guindon <pduplessis@efficios.com> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Link: https://lore.kernel.org/linux-trace-devel/20200730150236.5392-1-pduplessis@efficios.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
276f798 to
183f7c2
Compare
anthraxx
pushed a commit
that referenced
this pull request
Sep 26, 2020
Patch series "mm: fix memory to node bad links in sysfs", v3. Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later, one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are two issues here: (a) The sysfs memory and node's layouts are broken due to these multiple links (b) The link errors in link_mem_sections() should not lead to a system panic. To address (a) register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. Issue (b) will be addressed separately. This patch (of 2): The memmap_context enum is used to detect whether a memory operation is due to a hot-add operation or happening at boot time. Make it general to the hotplug operation and rename it as meminit_context. There is no functional change introduced by this patch Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael@kernel.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Sep 26, 2020
In register_mem_sect_under_node() the system_state's value is checked to detect whether the call is made during boot time or during an hot-plug operation. Unfortunately, that check against SYSTEM_BOOTING is wrong because regular memory is registered at SYSTEM_SCHEDULING state. In addition, memory hot-plug operation can be triggered at this system state by the ACPI [1]. So checking against the system state is not enough. The consequence is that on system with interleaved node's ranges like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] This can be seen on PowerPC LPAR after multiple memory hot-plug and hot-unplug operations are done. At the next reboot the node's memory ranges can be interleaved and since the call to link_mem_sections() is made in topology_init() while the system is in the SYSTEM_SCHEDULING state, the node's id is not checked, and the sections registered to multiple nodes: $ ls -l /sys/devices/system/memory/memory21/node* total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 In that case, the system is able to boot but if later one of theses memory blocks is hot-unplugged and then hot-plugged, the sysfs inconsistency is detected and this is triggering a BUG_ON(): kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This patch addresses the root cause by not relying on the system_state value to detect whether the call is due to a hot-plug operation. An extra parameter is added to link_mem_sections() detailing whether the operation is due to a hot-plug operation. [1] According to Oscar Salvador, using this qemu command line, ACPI memory hotplug operations are raised at SYSTEM_SCHEDULING state: $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \ -m size=$MEM,slots=255,maxmem=4294967296k \ -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \ -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \ -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \ -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \ -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \ -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \ -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \ Fixes: 4fbce63 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Oct 8, 2020
[ Upstream commit 96298f6 ] According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state. The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change. === Before === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 #32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65 > HCI Event: Number of Completed Packets (0x13) plen 5 #33 Num handles: 1 Handle: 256 Count: 1 ... === After === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 #32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 #33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #34 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #35 Num handles: 1 Handle: 256 Count: 1 ... Signed-off-by: Howard Chung <howardchung@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Oct 8, 2020
[ Upstream commit 96298f6 ] According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state. The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change. === Before === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 #32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65 > HCI Event: Number of Completed Packets (0x13) plen 5 #33 Num handles: 1 Handle: 256 Count: 1 ... === After === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 #32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 #33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #34 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #35 Num handles: 1 Handle: 256 Count: 1 ... Signed-off-by: Howard Chung <howardchung@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Oct 8, 2020
commit c1d0da8 upstream. Patch series "mm: fix memory to node bad links in sysfs", v3. Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later, one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are two issues here: (a) The sysfs memory and node's layouts are broken due to these multiple links (b) The link errors in link_mem_sections() should not lead to a system panic. To address (a) register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. Issue (b) will be addressed separately. This patch (of 2): The memmap_context enum is used to detect whether a memory operation is due to a hot-add operation or happening at boot time. Make it general to the hotplug operation and rename it as meminit_context. There is no functional change introduced by this patch Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael@kernel.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Oct 8, 2020
commit f85086f upstream. In register_mem_sect_under_node() the system_state's value is checked to detect whether the call is made during boot time or during an hot-plug operation. Unfortunately, that check against SYSTEM_BOOTING is wrong because regular memory is registered at SYSTEM_SCHEDULING state. In addition, memory hot-plug operation can be triggered at this system state by the ACPI [1]. So checking against the system state is not enough. The consequence is that on system with interleaved node's ranges like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] This can be seen on PowerPC LPAR after multiple memory hot-plug and hot-unplug operations are done. At the next reboot the node's memory ranges can be interleaved and since the call to link_mem_sections() is made in topology_init() while the system is in the SYSTEM_SCHEDULING state, the node's id is not checked, and the sections registered to multiple nodes: $ ls -l /sys/devices/system/memory/memory21/node* total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 In that case, the system is able to boot but if later one of theses memory blocks is hot-unplugged and then hot-plugged, the sysfs inconsistency is detected and this is triggering a BUG_ON(): kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This patch addresses the root cause by not relying on the system_state value to detect whether the call is due to a hot-plug operation. An extra parameter is added to link_mem_sections() detailing whether the operation is due to a hot-plug operation. [1] According to Oscar Salvador, using this qemu command line, ACPI memory hotplug operations are raised at SYSTEM_SCHEDULING state: $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \ -m size=$MEM,slots=255,maxmem=4294967296k \ -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \ -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \ -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \ -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \ -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \ -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \ -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \ Fixes: 4fbce63 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Oct 8, 2020
[ Upstream commit 96298f6 ] According to Core Spec Version 5.2 | Vol 3, Part A 6.1.5, the incoming L2CAP_ConfigReq should be handled during OPEN state. The section below shows the btmon trace when running L2CAP/COS/CFD/BV-12-C before and after this change. === Before === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ...... < ACL Data TX: Handle 256 flags 0x00 dlen 14 #32 L2CAP: Command Reject (0x01) ident 3 len 6 Reason: Invalid CID in request (0x0002) Destination CID: 64 Source CID: 65 > HCI Event: Number of Completed Packets (0x13) plen 5 #33 Num handles: 1 Handle: 256 Count: 1 ... === After === ... > ACL Data RX: Handle 256 flags 0x02 dlen 12 #22 L2CAP: Connection Request (0x02) ident 2 len 4 PSM: 1 (0x0001) Source CID: 65 < ACL Data TX: Handle 256 flags 0x00 dlen 16 #23 L2CAP: Connection Response (0x03) ident 2 len 8 Destination CID: 64 Source CID: 65 Result: Connection successful (0x0000) Status: No further information available (0x0000) < ACL Data TX: Handle 256 flags 0x00 dlen 12 #24 L2CAP: Configure Request (0x04) ident 2 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #25 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #26 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 16 #27 L2CAP: Configure Request (0x04) ident 3 len 8 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 .. < ACL Data TX: Handle 256 flags 0x00 dlen 18 #28 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 > HCI Event: Number of Completed Packets (0x13) plen 5 #29 Num handles: 1 Handle: 256 Count: 1 > ACL Data RX: Handle 256 flags 0x02 dlen 14 #30 L2CAP: Configure Response (0x05) ident 2 len 6 Source CID: 64 Flags: 0x0000 Result: Success (0x0000) > ACL Data RX: Handle 256 flags 0x02 dlen 20 #31 L2CAP: Configure Request (0x04) ident 3 len 12 Destination CID: 64 Flags: 0x0000 Option: Unknown (0x10) [hint] 01 00 91 02 11 11 ..... < ACL Data TX: Handle 256 flags 0x00 dlen 18 #32 L2CAP: Configure Response (0x05) ident 3 len 10 Source CID: 65 Flags: 0x0000 Result: Success (0x0000) Option: Maximum Transmission Unit (0x01) [mandatory] MTU: 672 < ACL Data TX: Handle 256 flags 0x00 dlen 12 #33 L2CAP: Configure Request (0x04) ident 3 len 4 Destination CID: 65 Flags: 0x0000 > HCI Event: Number of Completed Packets (0x13) plen 5 #34 Num handles: 1 Handle: 256 Count: 1 > HCI Event: Number of Completed Packets (0x13) plen 5 #35 Num handles: 1 Handle: 256 Count: 1 ... Signed-off-by: Howard Chung <howardchung@google.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Oct 8, 2020
commit c1d0da8 upstream. Patch series "mm: fix memory to node bad links in sysfs", v3. Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later, one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are two issues here: (a) The sysfs memory and node's layouts are broken due to these multiple links (b) The link errors in link_mem_sections() should not lead to a system panic. To address (a) register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. Issue (b) will be addressed separately. This patch (of 2): The memmap_context enum is used to detect whether a memory operation is due to a hot-add operation or happening at boot time. Make it general to the hotplug operation and rename it as meminit_context. There is no functional change introduced by this patch Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael@kernel.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Oct 8, 2020
commit f85086f upstream. In register_mem_sect_under_node() the system_state's value is checked to detect whether the call is made during boot time or during an hot-plug operation. Unfortunately, that check against SYSTEM_BOOTING is wrong because regular memory is registered at SYSTEM_SCHEDULING state. In addition, memory hot-plug operation can be triggered at this system state by the ACPI [1]. So checking against the system state is not enough. The consequence is that on system with interleaved node's ranges like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] This can be seen on PowerPC LPAR after multiple memory hot-plug and hot-unplug operations are done. At the next reboot the node's memory ranges can be interleaved and since the call to link_mem_sections() is made in topology_init() while the system is in the SYSTEM_SCHEDULING state, the node's id is not checked, and the sections registered to multiple nodes: $ ls -l /sys/devices/system/memory/memory21/node* total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 In that case, the system is able to boot but if later one of theses memory blocks is hot-unplugged and then hot-plugged, the sysfs inconsistency is detected and this is triggering a BUG_ON(): kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This patch addresses the root cause by not relying on the system_state value to detect whether the call is due to a hot-plug operation. An extra parameter is added to link_mem_sections() detailing whether the operation is due to a hot-plug operation. [1] According to Oscar Salvador, using this qemu command line, ACPI memory hotplug operations are raised at SYSTEM_SCHEDULING state: $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \ -m size=$MEM,slots=255,maxmem=4294967296k \ -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \ -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \ -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \ -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \ -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \ -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \ -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \ Fixes: 4fbce63 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Oct 8, 2020
commit c1d0da8 upstream. Patch series "mm: fix memory to node bad links in sysfs", v3. Sometimes, firmware may expose interleaved memory layout like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] In that case, we can see memory blocks assigned to multiple nodes in sysfs: $ ls -l /sys/devices/system/memory/memory21 total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 -rw-r--r-- 1 root root 65536 Aug 24 05:27 online -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_device -r--r--r-- 1 root root 65536 Aug 24 05:27 phys_index drwxr-xr-x 2 root root 0 Aug 24 05:27 power -r--r--r-- 1 root root 65536 Aug 24 05:27 removable -rw-r--r-- 1 root root 65536 Aug 24 05:27 state lrwxrwxrwx 1 root root 0 Aug 24 05:25 subsystem -> ../../../../bus/memory -rw-r--r-- 1 root root 65536 Aug 24 05:25 uevent -r--r--r-- 1 root root 65536 Aug 24 05:27 valid_zones The same applies in the node's directory with a memory21 link in both the node1 and node2's directory. This is wrong but doesn't prevent the system to run. However when later, one of these memory blocks is hot-unplugged and then hot-plugged, the system is detecting an inconsistency in the sysfs layout and a BUG_ON() is raised: kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This has been seen on PowerPC LPAR. The root cause of this issue is that when node's memory is registered, the range used can overlap another node's range, thus the memory block is registered to multiple nodes in sysfs. There are two issues here: (a) The sysfs memory and node's layouts are broken due to these multiple links (b) The link errors in link_mem_sections() should not lead to a system panic. To address (a) register_mem_sect_under_node should not rely on the system state to detect whether the link operation is triggered by a hot plug operation or not. This is addressed by the patches 1 and 2 of this series. Issue (b) will be addressed separately. This patch (of 2): The memmap_context enum is used to detect whether a memory operation is due to a hot-add operation or happening at boot time. Make it general to the hotplug operation and rename it as meminit_context. There is no functional change introduced by this patch Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J . Wysocki" <rafael@kernel.org> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-1-ldufour@linux.ibm.com Link: https://lkml.kernel.org/r/20200915132624.9723-1-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Oct 8, 2020
commit f85086f upstream. In register_mem_sect_under_node() the system_state's value is checked to detect whether the call is made during boot time or during an hot-plug operation. Unfortunately, that check against SYSTEM_BOOTING is wrong because regular memory is registered at SYSTEM_SCHEDULING state. In addition, memory hot-plug operation can be triggered at this system state by the ACPI [1]. So checking against the system state is not enough. The consequence is that on system with interleaved node's ranges like this: Early memory node ranges node 1: [mem 0x0000000000000000-0x000000011fffffff] node 2: [mem 0x0000000120000000-0x000000014fffffff] node 1: [mem 0x0000000150000000-0x00000001ffffffff] node 0: [mem 0x0000000200000000-0x000000048fffffff] node 2: [mem 0x0000000490000000-0x00000007ffffffff] This can be seen on PowerPC LPAR after multiple memory hot-plug and hot-unplug operations are done. At the next reboot the node's memory ranges can be interleaved and since the call to link_mem_sections() is made in topology_init() while the system is in the SYSTEM_SCHEDULING state, the node's id is not checked, and the sections registered to multiple nodes: $ ls -l /sys/devices/system/memory/memory21/node* total 0 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node1 -> ../../node/node1 lrwxrwxrwx 1 root root 0 Aug 24 05:27 node2 -> ../../node/node2 In that case, the system is able to boot but if later one of theses memory blocks is hot-unplugged and then hot-plugged, the sysfs inconsistency is detected and this is triggering a BUG_ON(): kernel BUG at /Users/laurent/src/linux-ppc/mm/memory_hotplug.c:1084! Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: rpadlpar_io rpaphp pseries_rng rng_core vmx_crypto gf128mul binfmt_misc ip_tables x_tables xfs libcrc32c crc32c_vpmsum autofs4 CPU: 8 PID: 10256 Comm: drmgr Not tainted 5.9.0-rc1+ #25 Call Trace: add_memory_resource+0x23c/0x340 (unreliable) __add_memory+0x5c/0xf0 dlpar_add_lmb+0x1b4/0x500 dlpar_memory+0x1f8/0xb80 handle_dlpar_errorlog+0xc0/0x190 dlpar_store+0x198/0x4a0 kobj_attr_store+0x30/0x50 sysfs_kf_write+0x64/0x90 kernfs_fop_write+0x1b0/0x290 vfs_write+0xe8/0x290 ksys_write+0xdc/0x130 system_call_exception+0x160/0x270 system_call_common+0xf0/0x27c This patch addresses the root cause by not relying on the system_state value to detect whether the call is due to a hot-plug operation. An extra parameter is added to link_mem_sections() detailing whether the operation is due to a hot-plug operation. [1] According to Oscar Salvador, using this qemu command line, ACPI memory hotplug operations are raised at SYSTEM_SCHEDULING state: $QEMU -enable-kvm -machine pc -smp 4,sockets=4,cores=1,threads=1 -cpu host -monitor pty \ -m size=$MEM,slots=255,maxmem=4294967296k \ -numa node,nodeid=0,cpus=0-3,mem=512 -numa node,nodeid=1,mem=512 \ -object memory-backend-ram,id=memdimm0,size=134217728 -device pc-dimm,node=0,memdev=memdimm0,id=dimm0,slot=0 \ -object memory-backend-ram,id=memdimm1,size=134217728 -device pc-dimm,node=0,memdev=memdimm1,id=dimm1,slot=1 \ -object memory-backend-ram,id=memdimm2,size=134217728 -device pc-dimm,node=0,memdev=memdimm2,id=dimm2,slot=2 \ -object memory-backend-ram,id=memdimm3,size=134217728 -device pc-dimm,node=0,memdev=memdimm3,id=dimm3,slot=3 \ -object memory-backend-ram,id=memdimm4,size=134217728 -device pc-dimm,node=1,memdev=memdimm4,id=dimm4,slot=4 \ -object memory-backend-ram,id=memdimm5,size=134217728 -device pc-dimm,node=1,memdev=memdimm5,id=dimm5,slot=5 \ -object memory-backend-ram,id=memdimm6,size=134217728 -device pc-dimm,node=1,memdev=memdimm6,id=dimm6,slot=6 \ Fixes: 4fbce63 ("mm/memory_hotplug.c: make register_mem_sect_under_node() a callback of walk_memory_range()") Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Nathan Lynch <nathanl@linux.ibm.com> Cc: Scott Cheloha <cheloha@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/20200915094143.79181-3-ldufour@linux.ibm.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 4, 2021
commit 27af8e2 upstream. We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ #25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 4, 2021
commit 27af8e2 upstream. We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ #25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 4, 2021
commit 27af8e2 upstream. We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ #25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 4, 2021
commit 27af8e2 upstream. We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ #25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 7, 2021
We have the following potential deadlock condition: ======================================================== WARNING: possible irq lock inversion dependency detected 5.10.0-rc2+ #25 Not tainted -------------------------------------------------------- swapper/3/0 just changed the state of lock: ffff8880063bd618 (&host->lock){-...}-{2:2}, at: ata_bmdma_interrupt+0x27/0x200 but this lock took another, HARDIRQ-READ-unsafe lock in the past: (&trig->leddev_list_lock){.+.?}-{2:2} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&trig->leddev_list_lock); local_irq_disable(); lock(&host->lock); lock(&trig->leddev_list_lock); <Interrupt> lock(&host->lock); *** DEADLOCK *** no locks held by swapper/3/0. the shortest dependencies between 2nd lock and 1st lock: -> (&trig->leddev_list_lock){.+.?}-{2:2} ops: 46 { HARDIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 IN-SOFTIRQ-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 kbd_bh+0x9e/0xc0 tasklet_action_common.constprop.0+0xe9/0x100 tasklet_action+0x22/0x30 __do_softirq+0xcc/0x46d run_ksoftirqd+0x3f/0x70 smpboot_thread_fn+0x116/0x1f0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 SOFTIRQ-ON-R at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 INITIAL READ USE at: lock_acquire+0x15f/0x420 _raw_read_lock+0x42/0x90 led_trigger_event+0x2b/0x70 rfkill_global_led_trigger_worker+0x94/0xb0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83da4c00>] __key.0+0x0/0x10 ... acquired at: _raw_read_lock+0x42/0x90 led_trigger_blink_oneshot+0x3b/0x90 ledtrig_disk_activity+0x3c/0xa0 ata_qc_complete+0x26/0x450 ata_do_link_abort+0xa3/0xe0 ata_port_freeze+0x2e/0x40 ata_hsm_qc_complete+0x94/0xa0 ata_sff_hsm_move+0x177/0x7a0 ata_sff_pio_task+0xc7/0x1b0 process_one_work+0x240/0x560 worker_thread+0x58/0x3d0 kthread+0x151/0x170 ret_from_fork+0x1f/0x30 -> (&host->lock){-...}-{2:2} ops: 69 { IN-HARDIRQ-W at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab INITIAL USE at: lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_dev_init+0x54/0xe0 ata_link_init+0x8b/0xd0 ata_port_alloc+0x1f1/0x210 ata_host_alloc+0xf1/0x130 ata_host_alloc_pinfo+0x14/0xb0 ata_pci_sff_prepare_host+0x41/0xa0 ata_pci_bmdma_prepare_host+0x14/0x30 piix_init_one+0x21f/0x600 local_pci_probe+0x48/0x80 pci_device_probe+0x105/0x1c0 really_probe+0x221/0x490 driver_probe_device+0xe9/0x160 device_driver_attach+0xb2/0xc0 __driver_attach+0x91/0x150 bus_for_each_dev+0x81/0xc0 driver_attach+0x1e/0x20 bus_add_driver+0x138/0x1f0 driver_register+0x91/0xf0 __pci_register_driver+0x73/0x80 piix_init+0x1e/0x2e do_one_initcall+0x5f/0x2d0 kernel_init_freeable+0x26f/0x2cf kernel_init+0xe/0x113 ret_from_fork+0x1f/0x30 } ... key at: [<ffffffff83d9fdc0>] __key.6+0x0/0x10 ... acquired at: __lock_acquire+0x9da/0x2370 lock_acquire+0x15f/0x420 _raw_spin_lock_irqsave+0x52/0xa0 ata_bmdma_interrupt+0x27/0x200 __handle_irq_event_percpu+0xd5/0x2b0 handle_irq_event+0x57/0xb0 handle_edge_irq+0x8c/0x230 asm_call_irq_on_stack+0xf/0x20 common_interrupt+0x100/0x1c0 asm_common_interrupt+0x1e/0x40 native_safe_halt+0xe/0x10 arch_cpu_idle+0x15/0x20 default_idle_call+0x59/0x1c0 do_idle+0x22c/0x2c0 cpu_startup_entry+0x20/0x30 start_secondary+0x11d/0x150 secondary_startup_64_no_verify+0xa6/0xab This lockdep splat is reported after: commit e918188 ("locking: More accurate annotations for read_lock()") To clarify: - read-locks are recursive only in interrupt context (when in_interrupt() returns true) - after acquiring host->lock in CPU1, another cpu (i.e. CPU2) may call write_lock(&trig->leddev_list_lock) that would be blocked by CPU0 that holds trig->leddev_list_lock in read-mode - when CPU1 (ata_ac_complete()) tries to read-lock trig->leddev_list_lock, it would be blocked by the write-lock waiter on CPU2 (because we are not in interrupt context, so the read-lock is not recursive) - at this point if an interrupt happens on CPU0 and ata_bmdma_interrupt() is executed it will try to acquire host->lock, that is held by CPU1, that is currently blocked by CPU2, so: * CPU0 blocked by CPU1 * CPU1 blocked by CPU2 * CPU2 blocked by CPU0 *** DEADLOCK *** The deadlock scenario is better represented by the following schema (thanks to Boqun Feng <boqun.feng@gmail.com> for the schema and the detailed explanation of the deadlock condition): CPU 0: CPU 1: CPU 2: ----- ----- ----- led_trigger_event(): read_lock(&trig->leddev_list_lock); <workqueue> ata_hsm_qc_complete(): spin_lock_irqsave(&host->lock); write_lock(&trig->leddev_list_lock); ata_port_freeze(): ata_do_link_abort(): ata_qc_complete(): ledtrig_disk_activity(): led_trigger_blink_oneshot(): read_lock(&trig->leddev_list_lock); // ^ not in in_interrupt() context, so could get blocked by CPU 2 <interrupt> ata_bmdma_interrupt(): spin_lock_irqsave(&host->lock); Fix by using read_lock_irqsave/irqrestore() in led_trigger_event(), so that no interrupt can happen in between, preventing the deadlock condition. Apply the same change to led_trigger_blink_setup() as well, since the same deadlock scenario can also happen in power_supply_update_bat_leds() -> led_trigger_blink() -> led_trigger_blink_setup() (workqueue context), and potentially prevent other similar usages. Link: https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/ Fixes: eb25cb9 ("leds: convert IDE trigger to common disk trigger") Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Pavel Machek <pavel@ucw.cz>
e38da9d to
95574b9
Compare
anthraxx
pushed a commit
that referenced
this pull request
May 3, 2021
After commit 997acaf (lockdep: report broken irq restoration), the guest splatting below during boot: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 1 PID: 169 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x26/0x30 Modules linked in: hid_generic usbhid hid CPU: 1 PID: 169 Comm: systemd-udevd Not tainted 5.11.0+ #25 RIP: 0010:warn_bogus_irq_restore+0x26/0x30 Call Trace: kvm_wait+0x76/0x90 __pv_queued_spin_lock_slowpath+0x285/0x2e0 do_raw_spin_lock+0xc9/0xd0 _raw_spin_lock+0x59/0x70 lockref_get_not_dead+0xf/0x50 __legitimize_path+0x31/0x60 legitimize_root+0x37/0x50 try_to_unlazy_next+0x7f/0x1d0 lookup_fast+0xb0/0x170 path_openat+0x165/0x9b0 do_filp_open+0x99/0x110 do_sys_openat2+0x1f1/0x2e0 do_sys_open+0x5c/0x80 __x64_sys_open+0x21/0x30 do_syscall_64+0x32/0x50 entry_SYSCALL_64_after_hwframe+0x44/0xae The new consistency checking, expects local_irq_save() and local_irq_restore() to be paired and sanely nested, and therefore expects local_irq_restore() to be called with irqs disabled. The irqflags handling in kvm_wait() which ends up doing: local_irq_save(flags); safe_halt(); local_irq_restore(flags); instead triggers it. This patch fixes it by using local_irq_disable()/enable() directly. Cc: Thomas Gleixner <tglx@linutronix.de> Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Message-Id: <1615791328-2735-1-git-send-email-wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
anthraxx
pushed a commit
that referenced
this pull request
May 12, 2021
commit f02d408 upstream. Commit a6dcfe0 ("scsi: qla2xxx: Limit interrupt vectors to number of CPUs") lowers the number of allocated MSI-X vectors to the number of CPUs. That breaks vector allocation assumptions in qla83xx_iospace_config(), qla24xx_enable_msix() and qla2x00_iospace_config(). Either of the functions computes maximum number of qpairs as: ha->max_qpairs = ha->msix_count - 1 (MB interrupt) - 1 (default response queue) - 1 (ATIO, in dual or pure target mode) max_qpairs is set to zero in case of two CPUs and initiator mode. The number is then used to allocate ha->queue_pair_map inside qla2x00_alloc_queues(). No allocation happens and ha->queue_pair_map is left NULL but the driver thinks there are queue pairs available. qla2xxx_queuecommand() tries to find a qpair in the map and crashes: if (ha->mqenable) { uint32_t tag; uint16_t hwq; struct qla_qpair *qpair = NULL; tag = blk_mq_unique_tag(cmd->request); hwq = blk_mq_unique_tag_to_hwq(tag); qpair = ha->queue_pair_map[hwq]; # <- HERE if (qpair) return qla2xxx_mqueuecommand(host, cmd, qpair); } BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 72 Comm: kworker/u4:3 Tainted: G W 5.10.0-rc1+ #25 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: scsi_wq_7 fc_scsi_scan_rport [scsi_transport_fc] RIP: 0010:qla2xxx_queuecommand+0x16b/0x3f0 [qla2xxx] Call Trace: scsi_queue_rq+0x58c/0xa60 blk_mq_dispatch_rq_list+0x2b7/0x6f0 ? __sbitmap_get_word+0x2a/0x80 __blk_mq_sched_dispatch_requests+0xb8/0x170 blk_mq_sched_dispatch_requests+0x2b/0x50 __blk_mq_run_hw_queue+0x49/0xb0 __blk_mq_delay_run_hw_queue+0xfb/0x150 blk_mq_sched_insert_request+0xbe/0x110 blk_execute_rq+0x45/0x70 __scsi_execute+0x10e/0x250 scsi_probe_and_add_lun+0x228/0xda0 __scsi_scan_target+0xf4/0x620 ? __pm_runtime_resume+0x4f/0x70 scsi_scan_target+0x100/0x110 fc_scsi_scan_rport+0xa1/0xb0 [scsi_transport_fc] process_one_work+0x1ea/0x3b0 worker_thread+0x28/0x3b0 ? process_one_work+0x3b0/0x3b0 kthread+0x112/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 The driver should allocate enough vectors to provide every CPU it's own HW queue and still handle reserved (MB, RSP, ATIO) interrupts. The change fixes the crash on dual core VM and prevents unbalanced QP allocation where nr_hw_queues is two less than the number of CPUs. Link: https://lore.kernel.org/r/20210412165740.39318-1-r.bolshakov@yadro.com Fixes: a6dcfe0 ("scsi: qla2xxx: Limit interrupt vectors to number of CPUs") Cc: Daniel Wagner <daniel.wagner@suse.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: Quinn Tran <qutran@marvell.com> Cc: Nilesh Javali <njavali@marvell.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: stable@vger.kernel.org # 5.11+ Reported-by: Aleksandr Volkov <a.y.volkov@yadro.com> Reported-by: Aleksandr Miloserdov <a.miloserdov@yadro.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
anthraxx
pushed a commit
that referenced
this pull request
May 25, 2021
commit f02d408 upstream. Commit a6dcfe0 ("scsi: qla2xxx: Limit interrupt vectors to number of CPUs") lowers the number of allocated MSI-X vectors to the number of CPUs. That breaks vector allocation assumptions in qla83xx_iospace_config(), qla24xx_enable_msix() and qla2x00_iospace_config(). Either of the functions computes maximum number of qpairs as: ha->max_qpairs = ha->msix_count - 1 (MB interrupt) - 1 (default response queue) - 1 (ATIO, in dual or pure target mode) max_qpairs is set to zero in case of two CPUs and initiator mode. The number is then used to allocate ha->queue_pair_map inside qla2x00_alloc_queues(). No allocation happens and ha->queue_pair_map is left NULL but the driver thinks there are queue pairs available. qla2xxx_queuecommand() tries to find a qpair in the map and crashes: if (ha->mqenable) { uint32_t tag; uint16_t hwq; struct qla_qpair *qpair = NULL; tag = blk_mq_unique_tag(cmd->request); hwq = blk_mq_unique_tag_to_hwq(tag); qpair = ha->queue_pair_map[hwq]; # <- HERE if (qpair) return qla2xxx_mqueuecommand(host, cmd, qpair); } BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 0 PID: 72 Comm: kworker/u4:3 Tainted: G W 5.10.0-rc1+ #25 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 Workqueue: scsi_wq_7 fc_scsi_scan_rport [scsi_transport_fc] RIP: 0010:qla2xxx_queuecommand+0x16b/0x3f0 [qla2xxx] Call Trace: scsi_queue_rq+0x58c/0xa60 blk_mq_dispatch_rq_list+0x2b7/0x6f0 ? __sbitmap_get_word+0x2a/0x80 __blk_mq_sched_dispatch_requests+0xb8/0x170 blk_mq_sched_dispatch_requests+0x2b/0x50 __blk_mq_run_hw_queue+0x49/0xb0 __blk_mq_delay_run_hw_queue+0xfb/0x150 blk_mq_sched_insert_request+0xbe/0x110 blk_execute_rq+0x45/0x70 __scsi_execute+0x10e/0x250 scsi_probe_and_add_lun+0x228/0xda0 __scsi_scan_target+0xf4/0x620 ? __pm_runtime_resume+0x4f/0x70 scsi_scan_target+0x100/0x110 fc_scsi_scan_rport+0xa1/0xb0 [scsi_transport_fc] process_one_work+0x1ea/0x3b0 worker_thread+0x28/0x3b0 ? process_one_work+0x3b0/0x3b0 kthread+0x112/0x130 ? kthread_park+0x80/0x80 ret_from_fork+0x22/0x30 The driver should allocate enough vectors to provide every CPU it's own HW queue and still handle reserved (MB, RSP, ATIO) interrupts. The change fixes the crash on dual core VM and prevents unbalanced QP allocation where nr_hw_queues is two less than the number of CPUs. Link: https://lore.kernel.org/r/20210412165740.39318-1-r.bolshakov@yadro.com Fixes: a6dcfe0 ("scsi: qla2xxx: Limit interrupt vectors to number of CPUs") Cc: Daniel Wagner <daniel.wagner@suse.com> Cc: Himanshu Madhani <himanshu.madhani@oracle.com> Cc: Quinn Tran <qutran@marvell.com> Cc: Nilesh Javali <njavali@marvell.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: stable@vger.kernel.org # 5.11+ Reported-by: Aleksandr Volkov <a.y.volkov@yadro.com> Reported-by: Aleksandr Miloserdov <a.miloserdov@yadro.com> Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is modified from Brad Spengler/PaX Team's code in the last public patch of grsecurity/PaX based on my understanding of the code. Changes or omissions from the original code are mine and don't reflect the original grsecurity/PaX code. Trusted Path Execution (TPE) will restrict certain users so that they are only able to execute files in root-owned directories, writable only by root. This makes it significantly harder for an attacker to execute their own code.
anthraxx
pushed a commit
that referenced
this pull request
Nov 18, 2021
[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU #24 skipping offline CPU #25 skipping offline CPU #26 skipping offline CPU #27 skipping offline CPU #28 skipping offline CPU #29 skipping offline CPU #30 skipping offline CPU #31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211021114132.8196-2-jolsa@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Nov 18, 2021
[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU #24 skipping offline CPU #25 skipping offline CPU #26 skipping offline CPU #27 skipping offline CPU #28 skipping offline CPU #29 skipping offline CPU #30 skipping offline CPU #31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211021114132.8196-2-jolsa@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Nov 18, 2021
[ Upstream commit d412137 ] The perf_buffer fails on system with offline cpus: # test_progs -t perf_buffer test_perf_buffer:PASS:nr_cpus 0 nsec test_perf_buffer:PASS:nr_on_cpus 0 nsec test_perf_buffer:PASS:skel_load 0 nsec test_perf_buffer:PASS:attach_kprobe 0 nsec test_perf_buffer:PASS:perf_buf__new 0 nsec test_perf_buffer:PASS:epoll_fd 0 nsec skipping offline CPU #24 skipping offline CPU #25 skipping offline CPU #26 skipping offline CPU #27 skipping offline CPU #28 skipping offline CPU #29 skipping offline CPU #30 skipping offline CPU #31 test_perf_buffer:PASS:perf_buffer__poll 0 nsec test_perf_buffer:PASS:seen_cpu_cnt 0 nsec test_perf_buffer:FAIL:buf_cnt got 24, expected 32 Summary: 0/0 PASSED, 0 SKIPPED, 1 FAILED Changing the test to check online cpus instead of possible. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20211021114132.8196-2-jolsa@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Jan 3, 2022
[ Upstream commit 9695b7d ] After commit d3256ef ("veth: allow enabling NAPI even without XDP"), if GRO is enabled on a veth device and TSO is disabled on the peer device, TCP skbs will go through the NAPI callback. If there is no XDP program attached, the veth code does not perform any share check, and shared/cloned skbs could enter the GRO engine. Ignat reported a BUG triggered later-on due to the above condition: [ 53.970529][ C1] kernel BUG at net/core/skbuff.c:3574! [ 53.981755][ C1] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [ 53.982634][ C1] CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc5+ #25 [ 53.982634][ C1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 53.982634][ C1] RIP: 0010:skb_shift+0x13ef/0x23b0 [ 53.982634][ C1] Code: ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 41 0c 00 00 41 80 7f 02 00 4d 8d b5 d0 00 00 00 0f 85 74 f5 ff ff <0f> 0b 4d 8d 77 20 be 04 00 00 00 4c 89 44 24 78 4c 89 f7 4c 89 8c [ 53.982634][ C1] RSP: 0018:ffff8881008f7008 EFLAGS: 00010246 [ 53.982634][ C1] RAX: 0000000000000000 RBX: ffff8881180b4c80 RCX: 0000000000000000 [ 53.982634][ C1] RDX: 0000000000000002 RSI: ffff8881180b4d3c RDI: ffff88810bc9cac2 [ 53.982634][ C1] RBP: ffff8881008f70b8 R08: ffff8881180b4cf4 R09: ffff8881180b4cf0 [ 53.982634][ C1] R10: ffffed1022999e5c R11: 0000000000000002 R12: 0000000000000590 [ 53.982634][ C1] R13: ffff88810f940c80 R14: ffff88810f940d50 R15: ffff88810bc9cac0 [ 53.982634][ C1] FS: 0000000000000000(0000) GS:ffff888235880000(0000) knlGS:0000000000000000 [ 53.982634][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 53.982634][ C1] CR2: 00007ff5f9b86680 CR3: 0000000108ce8004 CR4: 0000000000170ee0 [ 53.982634][ C1] Call Trace: [ 53.982634][ C1] <TASK> [ 53.982634][ C1] tcp_sacktag_walk+0xaba/0x18e0 [ 53.982634][ C1] tcp_sacktag_write_queue+0xe7b/0x3460 [ 53.982634][ C1] tcp_ack+0x2666/0x54b0 [ 53.982634][ C1] tcp_rcv_established+0x4d9/0x20f0 [ 53.982634][ C1] tcp_v4_do_rcv+0x551/0x810 [ 53.982634][ C1] tcp_v4_rcv+0x22ed/0x2ed0 [ 53.982634][ C1] ip_protocol_deliver_rcu+0x96/0xaf0 [ 53.982634][ C1] ip_local_deliver_finish+0x1e0/0x2f0 [ 53.982634][ C1] ip_sublist_rcv_finish+0x211/0x440 [ 53.982634][ C1] ip_list_rcv_finish.constprop.0+0x424/0x660 [ 53.982634][ C1] ip_list_rcv+0x2c8/0x410 [ 53.982634][ C1] __netif_receive_skb_list_core+0x65c/0x910 [ 53.982634][ C1] netif_receive_skb_list_internal+0x5f9/0xcb0 [ 53.982634][ C1] napi_complete_done+0x188/0x6e0 [ 53.982634][ C1] gro_cell_poll+0x10c/0x1d0 [ 53.982634][ C1] __napi_poll+0xa1/0x530 [ 53.982634][ C1] net_rx_action+0x567/0x1270 [ 53.982634][ C1] __do_softirq+0x28a/0x9ba [ 53.982634][ C1] run_ksoftirqd+0x32/0x60 [ 53.982634][ C1] smpboot_thread_fn+0x559/0x8c0 [ 53.982634][ C1] kthread+0x3b9/0x490 [ 53.982634][ C1] ret_from_fork+0x22/0x30 [ 53.982634][ C1] </TASK> Address the issue by skipping the GRO stage for shared or cloned skbs. To reduce the chance of OoO, try to unclone the skbs before giving up. v1 -> v2: - use avoid skb_copy and fallback to netif_receive_skb - Eric Reported-by: Ignat Korchagin <ignat@cloudflare.com> Fixes: d3256ef ("veth: allow enabling NAPI even without XDP") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/b5f61c5602aab01bac8d711d8d1bfab0a4817db7.1640197544.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
tsautereau-anssi
pushed a commit
to tsautereau-anssi/linux-hardened
that referenced
this pull request
Jan 10, 2022
After commit d3256ef ("veth: allow enabling NAPI even without XDP"), if GRO is enabled on a veth device and TSO is disabled on the peer device, TCP skbs will go through the NAPI callback. If there is no XDP program attached, the veth code does not perform any share check, and shared/cloned skbs could enter the GRO engine. Ignat reported a BUG triggered later-on due to the above condition: [ 53.970529][ C1] kernel BUG at net/core/skbuff.c:3574! [ 53.981755][ C1] invalid opcode: 0000 [anthraxx#1] PREEMPT SMP KASAN PTI [ 53.982634][ C1] CPU: 1 PID: 19 Comm: ksoftirqd/1 Not tainted 5.16.0-rc5+ anthraxx#25 [ 53.982634][ C1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 [ 53.982634][ C1] RIP: 0010:skb_shift+0x13ef/0x23b0 [ 53.982634][ C1] Code: ea 03 0f b6 04 02 48 89 fa 83 e2 07 38 d0 7f 08 84 c0 0f 85 41 0c 00 00 41 80 7f 02 00 4d 8d b5 d0 00 00 00 0f 85 74 f5 ff ff <0f> 0b 4d 8d 77 20 be 04 00 00 00 4c 89 44 24 78 4c 89 f7 4c 89 8c [ 53.982634][ C1] RSP: 0018:ffff8881008f7008 EFLAGS: 00010246 [ 53.982634][ C1] RAX: 0000000000000000 RBX: ffff8881180b4c80 RCX: 0000000000000000 [ 53.982634][ C1] RDX: 0000000000000002 RSI: ffff8881180b4d3c RDI: ffff88810bc9cac2 [ 53.982634][ C1] RBP: ffff8881008f70b8 R08: ffff8881180b4cf4 R09: ffff8881180b4cf0 [ 53.982634][ C1] R10: ffffed1022999e5c R11: 0000000000000002 R12: 0000000000000590 [ 53.982634][ C1] R13: ffff88810f940c80 R14: ffff88810f940d50 R15: ffff88810bc9cac0 [ 53.982634][ C1] FS: 0000000000000000(0000) GS:ffff888235880000(0000) knlGS:0000000000000000 [ 53.982634][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 53.982634][ C1] CR2: 00007ff5f9b86680 CR3: 0000000108ce8004 CR4: 0000000000170ee0 [ 53.982634][ C1] Call Trace: [ 53.982634][ C1] <TASK> [ 53.982634][ C1] tcp_sacktag_walk+0xaba/0x18e0 [ 53.982634][ C1] tcp_sacktag_write_queue+0xe7b/0x3460 [ 53.982634][ C1] tcp_ack+0x2666/0x54b0 [ 53.982634][ C1] tcp_rcv_established+0x4d9/0x20f0 [ 53.982634][ C1] tcp_v4_do_rcv+0x551/0x810 [ 53.982634][ C1] tcp_v4_rcv+0x22ed/0x2ed0 [ 53.982634][ C1] ip_protocol_deliver_rcu+0x96/0xaf0 [ 53.982634][ C1] ip_local_deliver_finish+0x1e0/0x2f0 [ 53.982634][ C1] ip_sublist_rcv_finish+0x211/0x440 [ 53.982634][ C1] ip_list_rcv_finish.constprop.0+0x424/0x660 [ 53.982634][ C1] ip_list_rcv+0x2c8/0x410 [ 53.982634][ C1] __netif_receive_skb_list_core+0x65c/0x910 [ 53.982634][ C1] netif_receive_skb_list_internal+0x5f9/0xcb0 [ 53.982634][ C1] napi_complete_done+0x188/0x6e0 [ 53.982634][ C1] gro_cell_poll+0x10c/0x1d0 [ 53.982634][ C1] __napi_poll+0xa1/0x530 [ 53.982634][ C1] net_rx_action+0x567/0x1270 [ 53.982634][ C1] __do_softirq+0x28a/0x9ba [ 53.982634][ C1] run_ksoftirqd+0x32/0x60 [ 53.982634][ C1] smpboot_thread_fn+0x559/0x8c0 [ 53.982634][ C1] kthread+0x3b9/0x490 [ 53.982634][ C1] ret_from_fork+0x22/0x30 [ 53.982634][ C1] </TASK> Address the issue by skipping the GRO stage for shared or cloned skbs. To reduce the chance of OoO, try to unclone the skbs before giving up. v1 -> v2: - use avoid skb_copy and fallback to netif_receive_skb - Eric Reported-by: Ignat Korchagin <ignat@cloudflare.com> Fixes: d3256ef ("veth: allow enabling NAPI even without XDP") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Tested-by: Ignat Korchagin <ignat@cloudflare.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/b5f61c5602aab01bac8d711d8d1bfab0a4817db7.1640197544.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 1, 2022
[ Upstream commit b132030 ] A out-of-bounds bug can be triggered by an interrupt, the reason for this bug is the lack of checking of register values. In flexcop_pci_isr, the driver reads value from a register and uses it as a dma address. Finally, this address will be passed to the count parameter of find_next_packet. If this value is larger than the size of dma, the index of buffer will be out-of-bounds. Fix this by adding a check after reading the value of the register. The following KASAN report reveals it: BUG: KASAN: slab-out-of-bounds in find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] BUG: KASAN: slab-out-of-bounds in _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] BUG: KASAN: slab-out-of-bounds in dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 Read of size 1 at addr ffff8880608c00a0 by task swapper/2/0 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.177-gdba4159c14ef #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xec/0x156 lib/dump_stack.c:118 print_address_description+0x78/0x290 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x25b/0x380 mm/kasan/report.c:412 __asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:430 find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 flexcop_pass_dmx_data+0x2e/0x40 drivers/media/common/b2c2/flexcop.c:167 flexcop_pci_isr+0x3d1/0x5d0 drivers/media/pci/b2c2/flexcop-pci.c:212 __handle_irq_event_percpu+0xfb/0x770 kernel/irq/handle.c:149 handle_irq_event_percpu+0x79/0x150 kernel/irq/handle.c:189 handle_irq_event+0xac/0x140 kernel/irq/handle.c:206 handle_fasteoi_irq+0x232/0x5c0 kernel/irq/chip.c:725 generic_handle_irq_desc include/linux/irqdesc.h:155 [inline] handle_irq+0x230/0x3a0 arch/x86/kernel/irq_64.c:87 do_IRQ+0xa7/0x1e0 arch/x86/kernel/irq.c:247 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670 </IRQ> RIP: 0010:native_safe_halt+0x28/0x30 arch/x86/include/asm/irqflags.h:61 Code: 00 00 55 be 04 00 00 00 48 c7 c7 00 62 2f 8c 48 89 e5 e8 fb 31 e8 f8 8b 05 75 4f 8e 03 85 c0 7e 07 0f 00 2d 8a 61 66 00 fb f4 <5d> c3 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 RSP: 0018:ffff88806b71fcc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 0000000000000000 RBX: ffffffff8bde44c8 RCX: ffffffff88a11285 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8c2f6200 RBP: ffff88806b71fcc8 R08: fffffbfff185ec40 R09: fffffbfff185ec40 R10: 0000000000000001 R11: fffffbfff185ec40 R12: 0000000000000002 R13: ffffffff8be9d6e0 R14: 0000000000000000 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0x6f/0x360 arch/x86/kernel/process.c:557 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:548 default_idle_call+0x3b/0x60 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x2ab/0x3c0 kernel/sched/idle.c:263 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:369 start_secondary+0x3b8/0x4e0 arch/x86/kernel/smpboot.c:271 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 1: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x11/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2741 [inline] slab_alloc mm/slub.c:2749 [inline] kmem_cache_alloc+0xeb/0x280 mm/slub.c:2754 kmem_cache_zalloc include/linux/slab.h:699 [inline] __kernfs_new_node+0xe2/0x6f0 fs/kernfs/dir.c:633 kernfs_new_node+0x9a/0x120 fs/kernfs/dir.c:693 __kernfs_create_file+0x5f/0x340 fs/kernfs/file.c:992 sysfs_add_file_mode_ns+0x22a/0x4e0 fs/sysfs/file.c:306 create_files fs/sysfs/group.c:63 [inline] internal_create_group+0x34e/0xc30 fs/sysfs/group.c:147 sysfs_create_group fs/sysfs/group.c:173 [inline] sysfs_create_groups+0x9c/0x140 fs/sysfs/group.c:200 driver_add_groups+0x3e/0x50 drivers/base/driver.c:129 bus_add_driver+0x3a5/0x790 drivers/base/bus.c:684 driver_register+0x1cd/0x410 drivers/base/driver.c:170 __pci_register_driver+0x197/0x200 drivers/pci/pci-driver.c:1411 cx88_audio_pci_driver_init+0x23/0x25 drivers/media/pci/cx88/cx88-alsa.c: 1017 do_one_initcall+0xe0/0x610 init/main.c:884 do_initcall_level init/main.c:952 [inline] do_initcalls init/main.c:960 [inline] do_basic_setup init/main.c:978 [inline] kernel_init_freeable+0x4d0/0x592 init/main.c:1145 kernel_init+0x18/0x190 init/main.c:1062 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff8880608c0000 which belongs to the cache kernfs_node_cache of size 160 The buggy address is located 0 bytes to the right of 160-byte region [ffff8880608c0000, ffff8880608c00a0) The buggy address belongs to the page: page:ffffea0001823000 count:1 mapcount:0 mapping:ffff88806bed1e00 index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 dead000000000100 dead000000000200 ffff88806bed1e00 raw: 0000000000000000 0000000000240024 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880608bff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880608c0080: 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 ^ ffff8880608c0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0180: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ================================================================== Link: https://lore.kernel.org/linux-media/1620723603-30912-1-git-send-email-zheyuma97@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 1, 2022
[ Upstream commit b132030 ] A out-of-bounds bug can be triggered by an interrupt, the reason for this bug is the lack of checking of register values. In flexcop_pci_isr, the driver reads value from a register and uses it as a dma address. Finally, this address will be passed to the count parameter of find_next_packet. If this value is larger than the size of dma, the index of buffer will be out-of-bounds. Fix this by adding a check after reading the value of the register. The following KASAN report reveals it: BUG: KASAN: slab-out-of-bounds in find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] BUG: KASAN: slab-out-of-bounds in _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] BUG: KASAN: slab-out-of-bounds in dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 Read of size 1 at addr ffff8880608c00a0 by task swapper/2/0 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.177-gdba4159c14ef #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xec/0x156 lib/dump_stack.c:118 print_address_description+0x78/0x290 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x25b/0x380 mm/kasan/report.c:412 __asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:430 find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 flexcop_pass_dmx_data+0x2e/0x40 drivers/media/common/b2c2/flexcop.c:167 flexcop_pci_isr+0x3d1/0x5d0 drivers/media/pci/b2c2/flexcop-pci.c:212 __handle_irq_event_percpu+0xfb/0x770 kernel/irq/handle.c:149 handle_irq_event_percpu+0x79/0x150 kernel/irq/handle.c:189 handle_irq_event+0xac/0x140 kernel/irq/handle.c:206 handle_fasteoi_irq+0x232/0x5c0 kernel/irq/chip.c:725 generic_handle_irq_desc include/linux/irqdesc.h:155 [inline] handle_irq+0x230/0x3a0 arch/x86/kernel/irq_64.c:87 do_IRQ+0xa7/0x1e0 arch/x86/kernel/irq.c:247 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670 </IRQ> RIP: 0010:native_safe_halt+0x28/0x30 arch/x86/include/asm/irqflags.h:61 Code: 00 00 55 be 04 00 00 00 48 c7 c7 00 62 2f 8c 48 89 e5 e8 fb 31 e8 f8 8b 05 75 4f 8e 03 85 c0 7e 07 0f 00 2d 8a 61 66 00 fb f4 <5d> c3 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 RSP: 0018:ffff88806b71fcc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 0000000000000000 RBX: ffffffff8bde44c8 RCX: ffffffff88a11285 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8c2f6200 RBP: ffff88806b71fcc8 R08: fffffbfff185ec40 R09: fffffbfff185ec40 R10: 0000000000000001 R11: fffffbfff185ec40 R12: 0000000000000002 R13: ffffffff8be9d6e0 R14: 0000000000000000 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0x6f/0x360 arch/x86/kernel/process.c:557 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:548 default_idle_call+0x3b/0x60 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x2ab/0x3c0 kernel/sched/idle.c:263 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:369 start_secondary+0x3b8/0x4e0 arch/x86/kernel/smpboot.c:271 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 1: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x11/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2741 [inline] slab_alloc mm/slub.c:2749 [inline] kmem_cache_alloc+0xeb/0x280 mm/slub.c:2754 kmem_cache_zalloc include/linux/slab.h:699 [inline] __kernfs_new_node+0xe2/0x6f0 fs/kernfs/dir.c:633 kernfs_new_node+0x9a/0x120 fs/kernfs/dir.c:693 __kernfs_create_file+0x5f/0x340 fs/kernfs/file.c:992 sysfs_add_file_mode_ns+0x22a/0x4e0 fs/sysfs/file.c:306 create_files fs/sysfs/group.c:63 [inline] internal_create_group+0x34e/0xc30 fs/sysfs/group.c:147 sysfs_create_group fs/sysfs/group.c:173 [inline] sysfs_create_groups+0x9c/0x140 fs/sysfs/group.c:200 driver_add_groups+0x3e/0x50 drivers/base/driver.c:129 bus_add_driver+0x3a5/0x790 drivers/base/bus.c:684 driver_register+0x1cd/0x410 drivers/base/driver.c:170 __pci_register_driver+0x197/0x200 drivers/pci/pci-driver.c:1411 cx88_audio_pci_driver_init+0x23/0x25 drivers/media/pci/cx88/cx88-alsa.c: 1017 do_one_initcall+0xe0/0x610 init/main.c:884 do_initcall_level init/main.c:952 [inline] do_initcalls init/main.c:960 [inline] do_basic_setup init/main.c:978 [inline] kernel_init_freeable+0x4d0/0x592 init/main.c:1145 kernel_init+0x18/0x190 init/main.c:1062 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff8880608c0000 which belongs to the cache kernfs_node_cache of size 160 The buggy address is located 0 bytes to the right of 160-byte region [ffff8880608c0000, ffff8880608c00a0) The buggy address belongs to the page: page:ffffea0001823000 count:1 mapcount:0 mapping:ffff88806bed1e00 index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 dead000000000100 dead000000000200 ffff88806bed1e00 raw: 0000000000000000 0000000000240024 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880608bff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880608c0080: 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 ^ ffff8880608c0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0180: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ================================================================== Link: https://lore.kernel.org/linux-media/1620723603-30912-1-git-send-email-zheyuma97@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 1, 2022
[ Upstream commit b132030 ] A out-of-bounds bug can be triggered by an interrupt, the reason for this bug is the lack of checking of register values. In flexcop_pci_isr, the driver reads value from a register and uses it as a dma address. Finally, this address will be passed to the count parameter of find_next_packet. If this value is larger than the size of dma, the index of buffer will be out-of-bounds. Fix this by adding a check after reading the value of the register. The following KASAN report reveals it: BUG: KASAN: slab-out-of-bounds in find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] BUG: KASAN: slab-out-of-bounds in _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] BUG: KASAN: slab-out-of-bounds in dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 Read of size 1 at addr ffff8880608c00a0 by task swapper/2/0 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.177-gdba4159c14ef #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xec/0x156 lib/dump_stack.c:118 print_address_description+0x78/0x290 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x25b/0x380 mm/kasan/report.c:412 __asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:430 find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 flexcop_pass_dmx_data+0x2e/0x40 drivers/media/common/b2c2/flexcop.c:167 flexcop_pci_isr+0x3d1/0x5d0 drivers/media/pci/b2c2/flexcop-pci.c:212 __handle_irq_event_percpu+0xfb/0x770 kernel/irq/handle.c:149 handle_irq_event_percpu+0x79/0x150 kernel/irq/handle.c:189 handle_irq_event+0xac/0x140 kernel/irq/handle.c:206 handle_fasteoi_irq+0x232/0x5c0 kernel/irq/chip.c:725 generic_handle_irq_desc include/linux/irqdesc.h:155 [inline] handle_irq+0x230/0x3a0 arch/x86/kernel/irq_64.c:87 do_IRQ+0xa7/0x1e0 arch/x86/kernel/irq.c:247 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670 </IRQ> RIP: 0010:native_safe_halt+0x28/0x30 arch/x86/include/asm/irqflags.h:61 Code: 00 00 55 be 04 00 00 00 48 c7 c7 00 62 2f 8c 48 89 e5 e8 fb 31 e8 f8 8b 05 75 4f 8e 03 85 c0 7e 07 0f 00 2d 8a 61 66 00 fb f4 <5d> c3 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 RSP: 0018:ffff88806b71fcc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 0000000000000000 RBX: ffffffff8bde44c8 RCX: ffffffff88a11285 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8c2f6200 RBP: ffff88806b71fcc8 R08: fffffbfff185ec40 R09: fffffbfff185ec40 R10: 0000000000000001 R11: fffffbfff185ec40 R12: 0000000000000002 R13: ffffffff8be9d6e0 R14: 0000000000000000 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0x6f/0x360 arch/x86/kernel/process.c:557 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:548 default_idle_call+0x3b/0x60 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x2ab/0x3c0 kernel/sched/idle.c:263 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:369 start_secondary+0x3b8/0x4e0 arch/x86/kernel/smpboot.c:271 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 1: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x11/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2741 [inline] slab_alloc mm/slub.c:2749 [inline] kmem_cache_alloc+0xeb/0x280 mm/slub.c:2754 kmem_cache_zalloc include/linux/slab.h:699 [inline] __kernfs_new_node+0xe2/0x6f0 fs/kernfs/dir.c:633 kernfs_new_node+0x9a/0x120 fs/kernfs/dir.c:693 __kernfs_create_file+0x5f/0x340 fs/kernfs/file.c:992 sysfs_add_file_mode_ns+0x22a/0x4e0 fs/sysfs/file.c:306 create_files fs/sysfs/group.c:63 [inline] internal_create_group+0x34e/0xc30 fs/sysfs/group.c:147 sysfs_create_group fs/sysfs/group.c:173 [inline] sysfs_create_groups+0x9c/0x140 fs/sysfs/group.c:200 driver_add_groups+0x3e/0x50 drivers/base/driver.c:129 bus_add_driver+0x3a5/0x790 drivers/base/bus.c:684 driver_register+0x1cd/0x410 drivers/base/driver.c:170 __pci_register_driver+0x197/0x200 drivers/pci/pci-driver.c:1411 cx88_audio_pci_driver_init+0x23/0x25 drivers/media/pci/cx88/cx88-alsa.c: 1017 do_one_initcall+0xe0/0x610 init/main.c:884 do_initcall_level init/main.c:952 [inline] do_initcalls init/main.c:960 [inline] do_basic_setup init/main.c:978 [inline] kernel_init_freeable+0x4d0/0x592 init/main.c:1145 kernel_init+0x18/0x190 init/main.c:1062 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff8880608c0000 which belongs to the cache kernfs_node_cache of size 160 The buggy address is located 0 bytes to the right of 160-byte region [ffff8880608c0000, ffff8880608c00a0) The buggy address belongs to the page: page:ffffea0001823000 count:1 mapcount:0 mapping:ffff88806bed1e00 index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 dead000000000100 dead000000000200 ffff88806bed1e00 raw: 0000000000000000 0000000000240024 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880608bff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880608c0080: 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 ^ ffff8880608c0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0180: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ================================================================== Link: https://lore.kernel.org/linux-media/1620723603-30912-1-git-send-email-zheyuma97@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 1, 2022
[ Upstream commit b132030 ] A out-of-bounds bug can be triggered by an interrupt, the reason for this bug is the lack of checking of register values. In flexcop_pci_isr, the driver reads value from a register and uses it as a dma address. Finally, this address will be passed to the count parameter of find_next_packet. If this value is larger than the size of dma, the index of buffer will be out-of-bounds. Fix this by adding a check after reading the value of the register. The following KASAN report reveals it: BUG: KASAN: slab-out-of-bounds in find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] BUG: KASAN: slab-out-of-bounds in _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] BUG: KASAN: slab-out-of-bounds in dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 Read of size 1 at addr ffff8880608c00a0 by task swapper/2/0 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.177-gdba4159c14ef #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xec/0x156 lib/dump_stack.c:118 print_address_description+0x78/0x290 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x25b/0x380 mm/kasan/report.c:412 __asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:430 find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 flexcop_pass_dmx_data+0x2e/0x40 drivers/media/common/b2c2/flexcop.c:167 flexcop_pci_isr+0x3d1/0x5d0 drivers/media/pci/b2c2/flexcop-pci.c:212 __handle_irq_event_percpu+0xfb/0x770 kernel/irq/handle.c:149 handle_irq_event_percpu+0x79/0x150 kernel/irq/handle.c:189 handle_irq_event+0xac/0x140 kernel/irq/handle.c:206 handle_fasteoi_irq+0x232/0x5c0 kernel/irq/chip.c:725 generic_handle_irq_desc include/linux/irqdesc.h:155 [inline] handle_irq+0x230/0x3a0 arch/x86/kernel/irq_64.c:87 do_IRQ+0xa7/0x1e0 arch/x86/kernel/irq.c:247 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670 </IRQ> RIP: 0010:native_safe_halt+0x28/0x30 arch/x86/include/asm/irqflags.h:61 Code: 00 00 55 be 04 00 00 00 48 c7 c7 00 62 2f 8c 48 89 e5 e8 fb 31 e8 f8 8b 05 75 4f 8e 03 85 c0 7e 07 0f 00 2d 8a 61 66 00 fb f4 <5d> c3 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 RSP: 0018:ffff88806b71fcc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 0000000000000000 RBX: ffffffff8bde44c8 RCX: ffffffff88a11285 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8c2f6200 RBP: ffff88806b71fcc8 R08: fffffbfff185ec40 R09: fffffbfff185ec40 R10: 0000000000000001 R11: fffffbfff185ec40 R12: 0000000000000002 R13: ffffffff8be9d6e0 R14: 0000000000000000 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0x6f/0x360 arch/x86/kernel/process.c:557 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:548 default_idle_call+0x3b/0x60 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x2ab/0x3c0 kernel/sched/idle.c:263 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:369 start_secondary+0x3b8/0x4e0 arch/x86/kernel/smpboot.c:271 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 1: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x11/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2741 [inline] slab_alloc mm/slub.c:2749 [inline] kmem_cache_alloc+0xeb/0x280 mm/slub.c:2754 kmem_cache_zalloc include/linux/slab.h:699 [inline] __kernfs_new_node+0xe2/0x6f0 fs/kernfs/dir.c:633 kernfs_new_node+0x9a/0x120 fs/kernfs/dir.c:693 __kernfs_create_file+0x5f/0x340 fs/kernfs/file.c:992 sysfs_add_file_mode_ns+0x22a/0x4e0 fs/sysfs/file.c:306 create_files fs/sysfs/group.c:63 [inline] internal_create_group+0x34e/0xc30 fs/sysfs/group.c:147 sysfs_create_group fs/sysfs/group.c:173 [inline] sysfs_create_groups+0x9c/0x140 fs/sysfs/group.c:200 driver_add_groups+0x3e/0x50 drivers/base/driver.c:129 bus_add_driver+0x3a5/0x790 drivers/base/bus.c:684 driver_register+0x1cd/0x410 drivers/base/driver.c:170 __pci_register_driver+0x197/0x200 drivers/pci/pci-driver.c:1411 cx88_audio_pci_driver_init+0x23/0x25 drivers/media/pci/cx88/cx88-alsa.c: 1017 do_one_initcall+0xe0/0x610 init/main.c:884 do_initcall_level init/main.c:952 [inline] do_initcalls init/main.c:960 [inline] do_basic_setup init/main.c:978 [inline] kernel_init_freeable+0x4d0/0x592 init/main.c:1145 kernel_init+0x18/0x190 init/main.c:1062 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff8880608c0000 which belongs to the cache kernfs_node_cache of size 160 The buggy address is located 0 bytes to the right of 160-byte region [ffff8880608c0000, ffff8880608c00a0) The buggy address belongs to the page: page:ffffea0001823000 count:1 mapcount:0 mapping:ffff88806bed1e00 index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 dead000000000100 dead000000000200 ffff88806bed1e00 raw: 0000000000000000 0000000000240024 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880608bff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880608c0080: 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 ^ ffff8880608c0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0180: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ================================================================== Link: https://lore.kernel.org/linux-media/1620723603-30912-1-git-send-email-zheyuma97@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Feb 1, 2022
[ Upstream commit b132030 ] A out-of-bounds bug can be triggered by an interrupt, the reason for this bug is the lack of checking of register values. In flexcop_pci_isr, the driver reads value from a register and uses it as a dma address. Finally, this address will be passed to the count parameter of find_next_packet. If this value is larger than the size of dma, the index of buffer will be out-of-bounds. Fix this by adding a check after reading the value of the register. The following KASAN report reveals it: BUG: KASAN: slab-out-of-bounds in find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] BUG: KASAN: slab-out-of-bounds in _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] BUG: KASAN: slab-out-of-bounds in dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 Read of size 1 at addr ffff8880608c00a0 by task swapper/2/0 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.177-gdba4159c14ef #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xec/0x156 lib/dump_stack.c:118 print_address_description+0x78/0x290 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x25b/0x380 mm/kasan/report.c:412 __asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:430 find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 flexcop_pass_dmx_data+0x2e/0x40 drivers/media/common/b2c2/flexcop.c:167 flexcop_pci_isr+0x3d1/0x5d0 drivers/media/pci/b2c2/flexcop-pci.c:212 __handle_irq_event_percpu+0xfb/0x770 kernel/irq/handle.c:149 handle_irq_event_percpu+0x79/0x150 kernel/irq/handle.c:189 handle_irq_event+0xac/0x140 kernel/irq/handle.c:206 handle_fasteoi_irq+0x232/0x5c0 kernel/irq/chip.c:725 generic_handle_irq_desc include/linux/irqdesc.h:155 [inline] handle_irq+0x230/0x3a0 arch/x86/kernel/irq_64.c:87 do_IRQ+0xa7/0x1e0 arch/x86/kernel/irq.c:247 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670 </IRQ> RIP: 0010:native_safe_halt+0x28/0x30 arch/x86/include/asm/irqflags.h:61 Code: 00 00 55 be 04 00 00 00 48 c7 c7 00 62 2f 8c 48 89 e5 e8 fb 31 e8 f8 8b 05 75 4f 8e 03 85 c0 7e 07 0f 00 2d 8a 61 66 00 fb f4 <5d> c3 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 RSP: 0018:ffff88806b71fcc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 0000000000000000 RBX: ffffffff8bde44c8 RCX: ffffffff88a11285 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8c2f6200 RBP: ffff88806b71fcc8 R08: fffffbfff185ec40 R09: fffffbfff185ec40 R10: 0000000000000001 R11: fffffbfff185ec40 R12: 0000000000000002 R13: ffffffff8be9d6e0 R14: 0000000000000000 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0x6f/0x360 arch/x86/kernel/process.c:557 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:548 default_idle_call+0x3b/0x60 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x2ab/0x3c0 kernel/sched/idle.c:263 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:369 start_secondary+0x3b8/0x4e0 arch/x86/kernel/smpboot.c:271 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 1: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x11/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2741 [inline] slab_alloc mm/slub.c:2749 [inline] kmem_cache_alloc+0xeb/0x280 mm/slub.c:2754 kmem_cache_zalloc include/linux/slab.h:699 [inline] __kernfs_new_node+0xe2/0x6f0 fs/kernfs/dir.c:633 kernfs_new_node+0x9a/0x120 fs/kernfs/dir.c:693 __kernfs_create_file+0x5f/0x340 fs/kernfs/file.c:992 sysfs_add_file_mode_ns+0x22a/0x4e0 fs/sysfs/file.c:306 create_files fs/sysfs/group.c:63 [inline] internal_create_group+0x34e/0xc30 fs/sysfs/group.c:147 sysfs_create_group fs/sysfs/group.c:173 [inline] sysfs_create_groups+0x9c/0x140 fs/sysfs/group.c:200 driver_add_groups+0x3e/0x50 drivers/base/driver.c:129 bus_add_driver+0x3a5/0x790 drivers/base/bus.c:684 driver_register+0x1cd/0x410 drivers/base/driver.c:170 __pci_register_driver+0x197/0x200 drivers/pci/pci-driver.c:1411 cx88_audio_pci_driver_init+0x23/0x25 drivers/media/pci/cx88/cx88-alsa.c: 1017 do_one_initcall+0xe0/0x610 init/main.c:884 do_initcall_level init/main.c:952 [inline] do_initcalls init/main.c:960 [inline] do_basic_setup init/main.c:978 [inline] kernel_init_freeable+0x4d0/0x592 init/main.c:1145 kernel_init+0x18/0x190 init/main.c:1062 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff8880608c0000 which belongs to the cache kernfs_node_cache of size 160 The buggy address is located 0 bytes to the right of 160-byte region [ffff8880608c0000, ffff8880608c00a0) The buggy address belongs to the page: page:ffffea0001823000 count:1 mapcount:0 mapping:ffff88806bed1e00 index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 dead000000000100 dead000000000200 ffff88806bed1e00 raw: 0000000000000000 0000000000240024 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880608bff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880608c0080: 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 ^ ffff8880608c0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0180: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ================================================================== Link: https://lore.kernel.org/linux-media/1620723603-30912-1-git-send-email-zheyuma97@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Mar 21, 2022
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Mar 21, 2022
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Mar 21, 2022
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Mar 21, 2022
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Mar 21, 2022
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Mar 25, 2022
[ Upstream commit b132030 ] A out-of-bounds bug can be triggered by an interrupt, the reason for this bug is the lack of checking of register values. In flexcop_pci_isr, the driver reads value from a register and uses it as a dma address. Finally, this address will be passed to the count parameter of find_next_packet. If this value is larger than the size of dma, the index of buffer will be out-of-bounds. Fix this by adding a check after reading the value of the register. The following KASAN report reveals it: BUG: KASAN: slab-out-of-bounds in find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] BUG: KASAN: slab-out-of-bounds in _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] BUG: KASAN: slab-out-of-bounds in dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 Read of size 1 at addr ffff8880608c00a0 by task swapper/2/0 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.177-gdba4159c14ef #25 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xec/0x156 lib/dump_stack.c:118 print_address_description+0x78/0x290 mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report+0x25b/0x380 mm/kasan/report.c:412 __asan_report_load1_noabort+0x19/0x20 mm/kasan/report.c:430 find_next_packet drivers/media/dvb-core/dvb_demux.c:528 [inline] _dvb_dmx_swfilter drivers/media/dvb-core/dvb_demux.c:572 [inline] dvb_dmx_swfilter+0x3fa/0x420 drivers/media/dvb-core/dvb_demux.c:603 flexcop_pass_dmx_data+0x2e/0x40 drivers/media/common/b2c2/flexcop.c:167 flexcop_pci_isr+0x3d1/0x5d0 drivers/media/pci/b2c2/flexcop-pci.c:212 __handle_irq_event_percpu+0xfb/0x770 kernel/irq/handle.c:149 handle_irq_event_percpu+0x79/0x150 kernel/irq/handle.c:189 handle_irq_event+0xac/0x140 kernel/irq/handle.c:206 handle_fasteoi_irq+0x232/0x5c0 kernel/irq/chip.c:725 generic_handle_irq_desc include/linux/irqdesc.h:155 [inline] handle_irq+0x230/0x3a0 arch/x86/kernel/irq_64.c:87 do_IRQ+0xa7/0x1e0 arch/x86/kernel/irq.c:247 common_interrupt+0xf/0xf arch/x86/entry/entry_64.S:670 </IRQ> RIP: 0010:native_safe_halt+0x28/0x30 arch/x86/include/asm/irqflags.h:61 Code: 00 00 55 be 04 00 00 00 48 c7 c7 00 62 2f 8c 48 89 e5 e8 fb 31 e8 f8 8b 05 75 4f 8e 03 85 c0 7e 07 0f 00 2d 8a 61 66 00 fb f4 <5d> c3 90 90 90 90 90 90 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 41 RSP: 0018:ffff88806b71fcc8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffde RAX: 0000000000000000 RBX: ffffffff8bde44c8 RCX: ffffffff88a11285 RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffffffff8c2f6200 RBP: ffff88806b71fcc8 R08: fffffbfff185ec40 R09: fffffbfff185ec40 R10: 0000000000000001 R11: fffffbfff185ec40 R12: 0000000000000002 R13: ffffffff8be9d6e0 R14: 0000000000000000 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0x6f/0x360 arch/x86/kernel/process.c:557 arch_cpu_idle+0xf/0x20 arch/x86/kernel/process.c:548 default_idle_call+0x3b/0x60 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x2ab/0x3c0 kernel/sched/idle.c:263 cpu_startup_entry+0xcb/0xe0 kernel/sched/idle.c:369 start_secondary+0x3b8/0x4e0 arch/x86/kernel/smpboot.c:271 secondary_startup_64+0xa4/0xb0 arch/x86/kernel/head_64.S:243 Allocated by task 1: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x11/0x20 mm/kasan/kasan.c:490 slab_post_alloc_hook mm/slab.h:445 [inline] slab_alloc_node mm/slub.c:2741 [inline] slab_alloc mm/slub.c:2749 [inline] kmem_cache_alloc+0xeb/0x280 mm/slub.c:2754 kmem_cache_zalloc include/linux/slab.h:699 [inline] __kernfs_new_node+0xe2/0x6f0 fs/kernfs/dir.c:633 kernfs_new_node+0x9a/0x120 fs/kernfs/dir.c:693 __kernfs_create_file+0x5f/0x340 fs/kernfs/file.c:992 sysfs_add_file_mode_ns+0x22a/0x4e0 fs/sysfs/file.c:306 create_files fs/sysfs/group.c:63 [inline] internal_create_group+0x34e/0xc30 fs/sysfs/group.c:147 sysfs_create_group fs/sysfs/group.c:173 [inline] sysfs_create_groups+0x9c/0x140 fs/sysfs/group.c:200 driver_add_groups+0x3e/0x50 drivers/base/driver.c:129 bus_add_driver+0x3a5/0x790 drivers/base/bus.c:684 driver_register+0x1cd/0x410 drivers/base/driver.c:170 __pci_register_driver+0x197/0x200 drivers/pci/pci-driver.c:1411 cx88_audio_pci_driver_init+0x23/0x25 drivers/media/pci/cx88/cx88-alsa.c: 1017 do_one_initcall+0xe0/0x610 init/main.c:884 do_initcall_level init/main.c:952 [inline] do_initcalls init/main.c:960 [inline] do_basic_setup init/main.c:978 [inline] kernel_init_freeable+0x4d0/0x592 init/main.c:1145 kernel_init+0x18/0x190 init/main.c:1062 ret_from_fork+0x24/0x30 arch/x86/entry/entry_64.S:415 Freed by task 0: (stack is not available) The buggy address belongs to the object at ffff8880608c0000 which belongs to the cache kernfs_node_cache of size 160 The buggy address is located 0 bytes to the right of 160-byte region [ffff8880608c0000, ffff8880608c00a0) The buggy address belongs to the page: page:ffffea0001823000 count:1 mapcount:0 mapping:ffff88806bed1e00 index:0x0 compound_mapcount: 0 flags: 0x100000000008100(slab|head) raw: 0100000000008100 dead000000000100 dead000000000200 ffff88806bed1e00 raw: 0000000000000000 0000000000240024 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8880608bff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8880608c0080: 00 00 00 00 fc fc fc fc fc fc fc fc 00 00 00 00 ^ ffff8880608c0100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8880608c0180: fc fc fc fc fc fc fc fc 00 00 00 00 00 00 00 00 ================================================================== Link: https://lore.kernel.org/linux-media/1620723603-30912-1-git-send-email-zheyuma97@gmail.com Reported-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Zheyu Ma <zheyuma97@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
anthraxx
pushed a commit
that referenced
this pull request
Mar 25, 2022
[ Upstream commit 4224cfd ] When bringing down the netdevice or system shutdown, a panic can be triggered while accessing the sysfs path because the device is already removed. [ 755.549084] mlx5_core 0000:12:00.1: Shutdown was called [ 756.404455] mlx5_core 0000:12:00.0: Shutdown was called ... [ 757.937260] BUG: unable to handle kernel NULL pointer dereference at (null) [ 758.031397] IP: [<ffffffff8ee11acb>] dma_pool_alloc+0x1ab/0x280 crash> bt ... PID: 12649 TASK: ffff8924108f2100 CPU: 1 COMMAND: "amsd" ... #9 [ffff89240e1a38b0] page_fault at ffffffff8f38c778 [exception RIP: dma_pool_alloc+0x1ab] RIP: ffffffff8ee11acb RSP: ffff89240e1a3968 RFLAGS: 00010046 RAX: 0000000000000246 RBX: ffff89243d874100 RCX: 0000000000001000 RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff89243d874090 RBP: ffff89240e1a39c0 R8: 000000000001f080 R9: ffff8905ffc03c00 R10: ffffffffc04680d4 R11: ffffffff8edde9fd R12: 00000000000080d0 R13: ffff89243d874090 R14: ffff89243d874080 R15: 0000000000000000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #10 [ffff89240e1a39c8] mlx5_alloc_cmd_msg at ffffffffc04680f3 [mlx5_core] #11 [ffff89240e1a3a18] cmd_exec at ffffffffc046ad62 [mlx5_core] #12 [ffff89240e1a3ab8] mlx5_cmd_exec at ffffffffc046b4fb [mlx5_core] #13 [ffff89240e1a3ae8] mlx5_core_access_reg at ffffffffc0475434 [mlx5_core] #14 [ffff89240e1a3b40] mlx5e_get_fec_caps at ffffffffc04a7348 [mlx5_core] #15 [ffff89240e1a3bb0] get_fec_supported_advertised at ffffffffc04992bf [mlx5_core] #16 [ffff89240e1a3c08] mlx5e_get_link_ksettings at ffffffffc049ab36 [mlx5_core] #17 [ffff89240e1a3ce8] __ethtool_get_link_ksettings at ffffffff8f25db46 #18 [ffff89240e1a3d48] speed_show at ffffffff8f277208 #19 [ffff89240e1a3dd8] dev_attr_show at ffffffff8f0b70e3 #20 [ffff89240e1a3df8] sysfs_kf_seq_show at ffffffff8eedbedf #21 [ffff89240e1a3e18] kernfs_seq_show at ffffffff8eeda596 #22 [ffff89240e1a3e28] seq_read at ffffffff8ee76d10 #23 [ffff89240e1a3e98] kernfs_fop_read at ffffffff8eedaef5 #24 [ffff89240e1a3ed8] vfs_read at ffffffff8ee4e3ff #25 [ffff89240e1a3f08] sys_read at ffffffff8ee4f27f #26 [ffff89240e1a3f50] system_call_fastpath at ffffffff8f395f92 crash> net_device.state ffff89443b0c0000 state = 0x5 (__LINK_STATE_START| __LINK_STATE_NOCARRIER) To prevent this scenario, we also make sure that the netdevice is present. Signed-off-by: suresh kumar <suresh2514@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Trusted Path Execution (TPE) will restrict certain users so they are only able to execute files in root-owned directories writable only by root. This makes it far harder for an attacker to execute their own code.
By default, only users of the "untrusted" group will be under TPE restrictions.
This adds 4 sysctls.
fs.tpeto enable/disable TPE,fs.tpe_restrict_allto cover all non-root users under a weaker TPE restriction (they will only be allowed to execute files in directories they own that aren't group/world-writable, or in directories owned by root and writable only by root),fs.tpe_invertto turn the "untrusted" group into a "trusted" group (only users in that group are exempt from the restriction and all other non-root users are restricted) andfs.tpe_gidto configure the GID of the trusted/untrusted group. There are kconfig options to set the defaults of all of these.This is disabled by default as it could break many of the user's own programs.
This is based on GRKERNSEC_TPE.