Notes on a KE crash: invalid virtual address access when using the charger


I. Symptoms

Black screens appeared during burn-in (aging) testing.
9/12:
Findings confirmed today:

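  1. With the 9-11 build, 27 of 72 devices went to a black screen. 25 of them were dumps caused by the USB issue (the other 2 were low-battery shutdowns). Log analysis shows the failures happen during the 45-reboot stage of the burn-in test (45 reboots take about an hour and a half). The 9-11 build includes Qualcomm's change that waits for device probing to finish (wait_for_device_probe).
  2. The black screens occur only with the 33 W charger; 5V2A chargers do not trigger them (Shanghai saw no black screens because it used 5V2A chargers).
  3. The battery reaching 100% during burn-in is also charger-related: with the 33 W charger, charging was set to stop but occasionally resumed (recharge). The cause has been found and a fix will be provided today.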

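9/13 verification results:
50 units were verified with the 0912 build. 1 unit went to a black screen, caused by a low-battery shutdown even though the battery level was in the normal range (between 60 and 80%); the adb port was also missing.
Changes in the 0912 temporary build: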

  1. Add a NULL-pointer check that returns early before the crashing function, and add logging (吴超)
  2. Fix the burn-in battery-level control node issue (曾祥源)

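9/14 verification results:
Two builds were verified against the burn-in black screen:
Factory build (added a check on whether the retrieved device value is NULL): 24/50 black screens
Shanghai build (returns early on the bad pointer before the crashing function): 4/16 black screens, confirmed to be the USB black-screen issue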

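9/15 verification results:
Shanghai: 17 units with the USB fix, using an 11V3A charger and the standard N7 charging cable (orange data cable): 5 units went to a black screen (USB issue).
Factory: 37 units on the fused build with no changes, using white 6A data cables: 2 units with a black screen, 2 units at the home screen, 6 units shut down on low battery.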

II. Problem analysis

2.1 dmesg

[   92.045816][  T109] Unable to handle kernel write to read-only memory at virtual address 0000000000000450
[ 92.045820][ T109] Mem abort info:
[ 92.045821][ T109] ESR = 0x96000045
[ 92.045823][ T109] EC = 0x25: DABT (current EL), IL = 32 bits
[ 92.045827][ T109] SET = 0, FnV = 0
[ 92.045829][ T109] EA = 0, S1PTW = 0
[ 92.045831][ T109] FSC = 0x05: level 1 translation fault
[ 92.045833][ T109] Data abort info:
[ 92.045834][ T109] ISV = 0, ISS = 0x00000045
[ 92.045835][ T109] CM = 0, WnR = 1
[ 92.045838][ T109] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000817bd000
[ 92.045841][ T109] [0000000000000450] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 92.045853][ T109] Internal error: Oops: 96000045 [#1] PREEMPT SMP
[ 92.046141][ T109] Skip md ftrace buffer dump for: 0x1609e0
[ 92.046147][ T109] Modules linked in: coresight_tmc(E) usb_f_gsi(E) focaltech_ts_i2c(OE) usb_f_qdss(E) wlan(OE) machine_dlkm(OE) wcd937x_dlkm(OE) rx_macro_dlkm(OE) va_macro_dlkm(OE) msm_video(OE) ipa_clientsm(OE) tx_macro_dlkm(OE) camera(OE) wcd9xxx_dlkm(OE) icnss2(OE) ipanetm(OE) bt_fm_slim(OE) qcrypto_msm_dlkm(OE) adsp_loader_dlkm(OE) rmnet_wlan(OE) swr_ctrl_dlkm(OE) aw87xxx_dlkm(OE) nxp_nci(OE) qcedev_mod_dlkm(OE) pinctrl_lpi_dlkm(OE) audio_pkt_dlkm(OE) rndisipam(OE) audio_prm_dlkm(OE) bolero_cdc_dlkm(OE) spf_core_dlkm(OE) btpower(OE) msm_drm(OE) coresight_hwevent(E) coresight_dummy(E) stm_ftrace(E) qcom_spmi_adc_tm5(E) coresight_stm(E) hdcp_qseecom_dlkm(OE) mbhc_dlkm(OE) msm_mmrm(OE) stm_nfc_i2c(OE) tz_log_dlkm(OE) rmnet_core(OE) xiaomi_touch_game(OE) xm_smart_chg(E) qce50_dlkm(OE) gpr_dlkm(OE) cnss_prealloc(OE) coresight_replicator(E) cnss_nl(OE) q6_notifier_dlkm(OE) wcd937x_slave_dlkm(OE) smcinvoke_dlkm(OE) ipam(OE) cp_qc30(E) wlan_firmware_service(OE) coresight_tgu(E)
[ 92.046250][ T109] stub_dlkm(OE) wsa881x_analog_dlkm(OE) lct_tp(OE) leds_qti_flash(E) coresight_funnel(E) msm_kgsl(OE) audpkt_ion_dlkm(OE) fs1815_dlkm(OE) lct_audio_info_dlkm(OE) wcd_core_dlkm(OE) snd_usb_audio_qmi(E) gsim(OE) auth_battery(E) q6_dlkm(OE) coresight_remote_etm(E) icc_test(E) qrtr_smd(E) stm_p_ost(E) radio_i2c_rtc6226_qca(OE) snd_event_dlkm(OE) cpufreq_ondemand(E) cnss_utils(OE) rmnet_ctl(OE) mi_thermal_interface(E) bcl_pmic5(E) swr_dlkm(OE) cpumaxfreq(E) leds_aw2016(E) qcom_lpm(E) ds28e30_verify(E) lmh_cpu_vdd_cdev(E) qcom_spmi_temp_alarm(E) bcl_soc(E) qti_qmi_cdev(E) stick_verify(E) qcom_q6v5_pas(E) qcom_cpufreq_hw_debug(E) regulator_cdev(E) xiaomi_fp(E) binder_prio(CE) q6_pdr_dlkm(OE) slg_verify(E) qpnp_smb5_main(E) qti_cpufreq_cdev(E) coresight_cti(E) coresight_tpda(E) qcom_qpnp_qg(E) pd_policy_manager(E) coresight_tpdm(E) coresight_csr(E) qcom_i2c_pmic(E) qti_devfreq_cdev(E) msm_qmp(E) qcom_dload_mode(E) qti_userspace_cdev(E) spi_msm_geni(E)
[ 92.046345][ T109] cpufreq_userspace(E) qcom_spss(E) qcom_pil_info(E) bq25960_charger(E) sc8541_charger(E) cx_ipeak_cdev(E) qti_qmi_sensor(E) qcom_q6v5(E) leds_gpio(E) stm_p_basic(E) stm_console(E) cfg80211(E) coresight(E) spidev(E) usb_f_diag(E) rpm_smd_cooling_device(E) qcom_spmi_adc5(E) battery_auth_class(E) stm_core(E) thermal_pause(E) rdbg(E) msm_lmh_dcvs(E) qti_battery_charger(E) msm_tsens_driver(E) mi_wmark(CE) qcom_pmic_voter(E) ipa_fmwk(E) leds_qpnp_vibrator_ldo(E) cpu_hotplug(E) qcom_vadc_common(E) qcom_sysmon(E) leds_qti_tri_led(E) qcom_tsens(E) qcom_pon(E) glink_pkt(E) rtc_pm8xxx(E) qcom_iommu_debug(E) reboot_mode(E) i2c_msm_geni(E) dwc3_msm(E) sg(E) pm8941_pwrkey(E) hung_task_enh(E) cx_ipeak(E) usb_bam(E) qpnp_pdphy(E) phy_msm_ssusb_qmp(E) usb_f_ccid(E) usb_f_cdev(E) pinctrl_spmi_mpp(E) qfprom_sys(E) rpm_master_stat(E) ehset(E) phy_qcom_emu(E) phy_msm_snps_hs(E) altmode_glink(E) f_fs_ipc_log(E) serial_num(E) rpm_smd_debug(E) msm_sharedmem(E) smp2p_sleepstate(E)
[ 92.046443][ T109] guestvm_loader(E) memlat(E) bwmon(E) phy_xgene(E) panel_event_notifier(E) eud(E) qcom_va_minidump(E) cdsprm(E) frpc_adsprpc(E) phy_generic(E) msm_performance(E) sps_drv(E) refgen(E) qcom_dcvs(E) phy_qcom_ufs_qmp_v4_kona(E) soc_sleep_stats(E) msm_geni_serial(E) qcom_pm8008_regulator(E) fsa4480_i2c(E) ucsi_glink(E) core_hang_detect(E) msm_gpi(E) microdump_collector(E) rq_stats(E) qcom_ramdump(E) debugcc_khaje(E) charger_ulog_glink(E) msm_memshare(E) boot_stats(E) cdsp_loader(E) pinctrl_spmi_gpio(E) phy_qcom_ufs_qmp_v4_lahaina(E) qcom_aoss(E) phy_qcom_ufs_qmp_v4(E) pwm_qti_lpg(E) qti_battery_debug(E) pmic_glink_debug(E) gpucc_khaje(E) phy_qcom_ufs_qmp_v4_waipio(E) pmic_glink(E) mdt_loader(E) phy_qcom_ufs_qmp_v4_kalama(E) dispcc_khaje(E) heap_mem_ext_v01(E) phy_qcom_ufs_qmp_v4_crow(E) msm_sysstats(E) swinfo(OE) qseecom_dlkm(OE) qrng_dlkm(OE) bootinfo(OE) perf_helper(E) zram(E) zsmalloc(E) qcom_pmu_lib(E) mi_memory(E) ant_check_div(E) ant_check(E) simtray(E)
[ 92.046535][ T109] bam_dma(E) slim_qcom_ngd_ctrl(E) slimbus(E) pdr_interface(E) qmi_helpers(E) usbpd(E) sdhci_msm(E) mem_offline(E) arm_smmu(E) ufs_qcom(E) ufshcd_crypto_qti(E) stub_regulator(E) socinfo(E) sched_walt(E) qrtr(E) spmi_pmic_arb(E) qcom_spmi_pmic(E) regmap_spmi(E) qti_regmap_debugfs(E) qcom_iommu_util(E) qcom_dma_heaps(E) msm_poweroff(E) qpnp_power_on(E) phy_qcom_ufs_qrbtc_sdm845(E) phy_qcom_ufs_qmp_v4_khaje(E) phy_qcom_ufs(E) nvmem_qcom_spmi_sdam(E) nvmem_qfprom(E) ns(E) qnoc_bengal(E) qnoc_qos_rpm(E) pinctrl_bengal(E) pinctrl_khaje(E) pinctrl_msm(E) msm_dma_iommu_mapping(E) mem_buf(E) mem_buf_dev(E) secure_buffer(E) memory_dump_v2(E) iommu_logger(E) icc_debug(E) icc_rpm(E) dcc_v2(E) cqhci(E) sdhci_msm_scaling(E) crypto_qti_common(E) crypto_qti_tz(E) clk_dummy(E) qcom_cpufreq_hw(E) gcc_khaje(E) clk_smd_rpm(E) clk_qcom(E) gdsc_regulator(E) qcom_cpu_vendor_hooks(E) qcom_logbuf_vh(E) qcom_soc_wdt(E) qcom_wdt_core(E) qcom_scm(E) rpm_smd_regulator(E) rpm_smd(E)
[ 92.046628][ T109] proxy_consumer(E) debug_regulator(E) qcom_glink_rpm(E) glink_probe(E) qcom_glink_spss(E) rproc_qcom_common(E) qcom_smd(E) qcom_glink_smem(E) qcom_glink(E) qcom_mpm(E) qcom_apcs_ipc_mailbox(E) smp2p(E) qcom_ipc_logging(E) minidump(E) smem(E) qcom_hwspinlock(E)
[ 92.046656][ T109] CPU: 5 PID: 109 Comm: kworker/u16:2 Tainted: G WC OE 5.15.94 #1
[ 92.046663][ T109] Hardware name: Qualcomm Technologies, Inc. Khaje IDP sapphire (DT)
[ 92.046666][ T109] Workqueue: dwc3_wq dwc3_resume_work.d3d24a0baa3b4d28944726b453e90dd7.cfi_jt [dwc3_msm]
[ 92.046703][ T109] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 92.046708][ T109] pc : dwc3_msm_set_max_speed+0x8c/0xbc [dwc3_msm]
[ 92.046733][ T109] lr : dwc3_msm_set_max_speed+0x78/0xbc [dwc3_msm]
[ 92.046758][ T109] sp : ffffffc0087abd10
[ 92.046761][ T109] x29: ffffffc0087abd10 x28: 0000000000000402 x27: ffffff8000023020
[ 92.046767][ T109] x26: ffffff8000119b10 x25: 00000000000a0002 x24: ffffffddee847518
[ 92.046772][ T109] x23: ffffff804982fb05 x22: 0000000000000000 x21: 0000000000000000
[ 92.046778][ T109] x20: 0000000000000003 x19: ffffff8049b30080 x18: ffffffc00862d028
[ 92.046782][ T109] x17: 303a7864695f7478 x16: 00000000000000b0 x15: ffffffddf49f50e0
[ 92.046787][ T109] x14: ffffffddf5e11be0 x13: 0000000000000000 x12: 0000000000000000
[ 92.046792][ T109] x11: 00000000ffffffff x10: ffffffddf67213e8 x9 : 114f1214b7c0a100
[ 92.046797][ T109] x8 : 0000000000000000 x7 : 78616d5f7465735f x6 : 6d736d5f33637764
[ 92.046802][ T109] x5 : ffffff81f6c467a9 x4 : ffffffddee855405 x3 : ffffffc0087ab868
[ 92.046806][ T109] x2 : 0000000000000030 x1 : 0000000000000000 x0 : 0000000000000030
[ 92.046812][ T109] Call trace:
[ 92.046814][ T109] dwc3_msm_set_max_speed+0x8c/0xbc [dwc3_msm]
[ 92.046837][ T109] dwc3_resume_work+0x14c/0x410 [dwc3_msm]
[ 92.046861][ T109] process_one_work+0x224/0x4a4
[ 92.046874][ T109] worker_thread+0x29c/0x500
[ 92.046880][ T109] kthread+0x170/0x1dc
[ 92.046885][ T109] ret_from_fork+0x10/0x20
[ 92.046894][ T109] Code: b4000148 b140051f 54000068 f9405508 (b9045114)
[ 92.046897][ T109] ---[ end trace 63bb7f701b7d6a2a ]---

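From the existing dmesg log:

  1. Faulting module: dwc3_msm.ko (the PC is in dwc3_msm_set_max_speed, called from dwc3_resume_work on the dwc3_wq workqueue)
  2. Suspected cause: abnormal memory access (memory corruption)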

2.2 Reconstructing the crash scene with Trace32

The fault is at line 720: dwc->maximum_speed = spd
Looking at the corresponding code:
Trace32 shows the following values:

  1. mdwc->dwc3 = 0xFFFFFF80876B7000
  2. dwc = 0x0
  3. spd = USB_SPEED_HIGH
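From this we can essentially pin down the root cause: dwc is NULL, and spd is then written to a structure member through that NULL pointer.
Next, look at the faulting instruction:
str w20, [x8, #0x450]: stores the value of register w20 to the address x8 + 0x450.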

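x8 is 0 at this point and w20 is 3 (USB_SPEED_HIGH), so the instruction writes 3 to 0 + 0x450 = 0x450. An address this close to zero is the typical signature of a NULL pointer plus a structure-member offset; nothing is mapped there, so dmesg reports the following error: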
Unable to handle kernel write to read-only memory at virtual address 0000000000000450

III. Root cause

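A structure member (dwc->maximum_speed) was written through a NULL pointer (dwc) in dwc3_msm_set_max_speed(), called from dwc3_resume_work().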

IV. Solution

Workaround: add NULL-pointer checks at every place that dereferences dwc.
gerrit: https://gerrit.odm.mioffice.cn/c/kernel/msm-5.15/+/430151