[Android Stability] Part 006 [Problem] hungtask causing panic - deadlock

0. Symptom

dmesg_TZ.txt

(3)[71:khungtaskd]INFO: task Binder:2848_9:3784 tgid:2848 blocked for 120s in whitelist
10707 cpu4
Call trace:
__switch_to+0x244/0x460
__schedule+0x590/0xac4
schedule+0x64/0x188
__mutex_lock+0x444/0x998
__mutex_lock_slowpath+0x14/0x20
regulator_lock_dependent+0x6c/0x378
regulator_enable+0x3c/0x84
arm_smmu_power_on+0xa8/0x298 [arm_smmu]
arm_smmu_runtime_resume+0x18/0x24 [arm_smmu]
pm_generic_runtime_resume+0x44/0x80
__rpm_callback+0x18c/0xb34
rpm_resume+0x8e0/0x11ec
__pm_runtime_resume+0x68/0x148
arm_smmu_flush_iotlb_all+0x44/0x13c [arm_smmu]
_iopgtbl_unmap+0xb4/0x110 [msm_kgsl]
kgsl_iopgtbl_unmap+0x3c/0x48 [msm_kgsl]
kgsl_mmu_unmap+0xa4/0x334 [msm_kgsl]
kgsl_unmap_and_put_gpuaddr+0x40/0xdc [msm_kgsl]
kgsl_mem_entry_detach_process+0x80/0x190 [msm_kgsl]
kgsl_mem_entry_destroy+0x58/0x1a0 [msm_kgsl]
kgsl_ioctl_gpuobj_free+0x200/0x294 [msm_kgsl]
kgsl_ioctl_helper+0x114/0x208 [msm_kgsl]
kgsl_ioctl+0x34/0xe0 [msm_kgsl]
__arm64_sys_ioctl+0x170/0x1ec
el0_svc_common+0xd4/0x26c
el0_svc+0x34/0x94
el0_sync_handler+0x88/0xe8
el0_sync+0x1a4/0x1c0

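1. Analysis

The task is clearly blocked waiting for a lock: its call trace ends in __mutex_lock(), called from regulator_lock_dependent().

In TRACE32, search for 2848_9 in the task.dtask window (the PID that follows is 3784), then right-click -> Display Stack Frame to show the call stack.

The __mutex_lock() frame is the one waiting for the lock: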

lock=0xFFFFFFD19C343EC8;
lock->owner->counter = 0xFFFFFF8050744D01

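This means the task currently holding the lock is 0xFFFFFF8050744D00. The trailing 1 is not part of the address: the low three bits of the mutex owner word hold flag bits, and 0x1 is MUTEX_FLAG_WAITERS, which says that other tasks are queued on this lock.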
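The decoding step above can be sketched in a few lines of Python. This is only an illustration, assuming the flag layout used by recent kernels in kernel/locking/mutex.c (three flag bits in the low bits of the owner word); the helper name decode_mutex_owner is hypothetical and is not part of TRACE32 or the kernel.

```python
# Flag bits kept in the low bits of mutex.owner (assumed layout, see
# kernel/locking/mutex.c in recent kernels).
MUTEX_FLAG_WAITERS = 0x01  # wait list is non-empty, unlock must wake a waiter
MUTEX_FLAG_HANDOFF = 0x02  # unlock must hand the lock to the first waiter
MUTEX_FLAG_PICKUP = 0x04   # handoff done, waiting for the waiter to pick it up
MUTEX_FLAGS = 0x07

FLAG_NAMES = (
    (MUTEX_FLAG_WAITERS, "WAITERS"),
    (MUTEX_FLAG_HANDOFF, "HANDOFF"),
    (MUTEX_FLAG_PICKUP, "PICKUP"),
)


def decode_mutex_owner(owner):
    """Split a raw owner word into (task_struct address, list of flag names)."""
    task = owner & ~MUTEX_FLAGS
    flags = [name for bit, name in FLAG_NAMES if owner & bit]
    return task, flags


# The three owner values read out of the dump in this article.
for raw in (0xFFFFFF8050744D01, 0xFFFFFF804DEE8001, 0xFFFFFF8050081341):
    task, flags = decode_mutex_owner(raw)
    print(f"owner=0x{raw:016X} task=0x{task:016X} flags={flags}")
```

Masking with MUTEX_FLAGS rather than just dropping the last hex digit matters for values such as 0xFFFFFF8050081341, where only the low three bits are flags and the task address is 0xFFFFFF8050081340.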

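There are three ways to look up the task:

  • B:: Frame /locals /Caller /TASK 0xFFFFFF8050744D00 /MODule
  • Enter the address after "Task:" in the Display Stack Frame window
  • Find the address in the address column (the first column) of the task.dtask window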

The task is crtc_commit:160 on cpu6. Right-click it to view its call stack:

Call trace:
__switch_to+0x244/0x460
__schedule+0x590/0xac4
schedule+0x64/0x188
__ww_mutex_lock+0x4a0/0xdb4
__ww_mutex_lock_slowpath+0x18/0x24
ww_mutex_lock+0x4c/0x180
regulator_lock_recursive+0x108/0x5e8
regulator_lock_dependent+0xe4/0x378
regulator_disable+0x3c/0x84
dsi_pwr_enable_vregs+0x130/0x318 [msm_drm]

This task is waiting for a lock too. Repeat the previous steps to inspect the lock:

lock=0xFFFFFF800178A0B0
lock->owner->counter = 0xFFFFFF804DEE8001

task: "vendor.qti.came" cpu6, call stack:

Call trace:
__switch_to+0x244/0x460
__schedule+0x590/0xac4
schedule+0x64/0x188
__mutex_lock+0x444/0x998
__mutex_lock_slowpath+0x14/0x20
clk_prepare+0xd0/0x250
se_geni_clks_on+0x52c/0x648 [msm_geni_se]
se_geni_resources_on+0xac/0x148 [msm_geni_se]
geni_i2c_runtime_resume+0xd0/0x440 [i2c_msm_geni]
pm_generic_runtime_resume+0x44/0x80
__rpm_callback+0x18c/0xb34
rpm_resume+0x8e0/0x11ec
__pm_runtime_resume+0x68/0x148
geni_i2c_xfer+0xb8/0x1254 [i2c_msm_geni]
__i2c_transfer+0x27c/0x8ec
i2c_transfer+0xd4/0x1d4
regmap_i2c_read+0x58/0x94
_regmap_raw_read+0x250/0x488
_regmap_bus_read+0x44/0xb0
_regmap_read+0xac/0x248
regulator_get_voltage_sel_regmap+0x84/0x148
regulator_get_voltage_rdev+0xa8/0x2b4

Waiting for a lock again:

lock=0xFFFFFFD19C342048
lock->owner->counter = 0xFFFFFF8050081341

task: "kworker/u16:7" cpu0, call stack:

Call trace:
__switch_to+0x244/0x460
__schedule+0x590/0xac4
schedule+0x64/0x188
__mutex_lock+0x444/0x998
__mutex_lock_slowpath+0x14/0x20
regulator_lock_dependent+0x6c/0x378
regulator_set_voltage+0x44/0xa0
clk_aggregate_vdd+0xd8/0x1c0 [clk_qcom]
clk_unvote_vdd_level+0x88/0x238 [clk_qcom]
clk_unprepare_regmap+0x20/0x2c [clk_qcom]
clk_core_unprepare+0x94/0x2b0
clk_core_unprepare+0xb4/0x2b0
clk_unprepare+0x118/0x248
clk_bulk_unprepare+0x30/0x4c
gen7_gmu_power_off+0x288/0x334 [msm_kgsl]
gen7_power_off+0xac/0x37c [msm_kgsl]

Waiting for a lock:

lock=0xFFFFFFD19C343EC8
lock->owner->counter = 0xFFFFFF8050744D01

task: "crtc_commit:160" cpu0

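Hm? Isn't this the very lock (0xFFFFFFD19C343EC8) that the hung task is waiting for? So this is a three-way deadlock: crtc_commit:160, vendor.qti.came and kworker/u16:7 are each waiting for a lock held by the next one, and Binder:2848_9 is stuck behind them.

Binder:2848_9 --waits for--> crtc_commit:160 --waits for--> vendor.qti.came --waits for--> kworker/u16:7 --waits for--> crtc_commit:160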
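The manual lock-chasing above amounts to following a wait-for graph until a task shows up twice. A minimal Python sketch of that idea, assuming each blocked task waits on exactly one lock owner; the function name find_wait_cycle and the dictionary are hypothetical and simply hand-copied from the results above:

```python
def find_wait_cycle(waits_for, start):
    """Follow waits_for edges from start; return the cycle as a list, or None."""
    seen = []
    task = start
    while task is not None:
        if task in seen:
            # The task was already visited: everything from its first
            # occurrence onward forms the deadlock cycle.
            return seen[seen.index(task):] + [task]
        seen.append(task)
        task = waits_for.get(task)  # None means the owner is not blocked
    return None


# "blocked task" -> "owner of the lock it waits for", taken from the dump.
waits_for = {
    "Binder:2848_9": "crtc_commit:160",
    "crtc_commit:160": "vendor.qti.came",
    "vendor.qti.came": "kworker/u16:7",
    "kworker/u16:7": "crtc_commit:160",
}

cycle = find_wait_cycle(waits_for, "Binder:2848_9")
print(" -> ".join(cycle))
```

Note that Binder:2848_9, the task reported by khungtaskd, is not part of the cycle itself. It is only a victim queued behind crtc_commit:160.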

As for how to fix it... I don't know. [smirk]