Tech sharing | KE | trace32
Notes on a KE crash: an invalid virtual address access while charging
iliuqi 2024-11-12 2025-03-14
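1. Symptoms
Devices went to a black screen during the aging (burn-in) test.
9/12: findings confirmed today
With the 9-11 build, 27 of 72 units showed a black screen. 25 of them were dumps caused by a USB problem (the other 2 were low-battery shutdowns). Log analysis shows the failure occurs during the 45-reboot aging test (45 reboots take about 1.5 hours). The 9-11 build includes Qualcomm's change that waits for probing to finish, wait_for_device_probe.
The black screen occurs with the 33 W charger. The 5V2A charger has no problem (Shanghai saw no black screens because it uses 5V2A chargers).
The battery charging to 100% during aging is also charger-related: with the 33 W charger, charging occasionally resumes even though a stop-charging limit is set. The cause has been found and a fix will be delivered today.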
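9/13 verification results: 50 units were tested with the 0912 build and 1 unit showed a black screen. The cause was recorded as a low-battery shutdown; the battery level was in the normal range (60 to 80) and the adb port was missing.
Changes in the 0912 temporary build:
Fix: add a NULL-pointer check that returns early before the failing function, and add logging (吴超)
Fixed the aging-test battery-level control node issue (曾祥源)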
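9/14 verification results: two builds were tested against the aging black screen.
Factory: added a check for whether the obtained device is NULL: 24/50 black screens
Shanghai: return early on an error pointer before the failing function: 4/16 black screens
Confirmed to be the USB black-screen problem.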
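9/15 verification results:
Shanghai: 17 units with the USB fix, using an 11V3A charger and the N7 standard charging cable (orange data cable): 5 black screens (USB problem)
Factory: 37 units on the fused build with no modifications, using a white 6A data cable: 2 black screens, 2 at the home screen, 6 low-battery shutdowns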
2. Problem analysis
2.1 dmesg
[ 92.045816][ T109] Unable to handle kernel write to read-only memory at virtual address 0000000000000450
[ 92.045820][ T109] Mem abort info:
[ 92.045821][ T109] ESR = 0x96000045
[ 92.045823][ T109] EC = 0x25: DABT (current EL), IL = 32 bits
[ 92.045827][ T109] SET = 0, FnV = 0
[ 92.045829][ T109] EA = 0, S1PTW = 0
[ 92.045831][ T109] FSC = 0x05: level 1 translation fault
[ 92.045833][ T109] Data abort info:
[ 92.045834][ T109] ISV = 0, ISS = 0x00000045
[ 92.045835][ T109] CM = 0, WnR = 1
[ 92.045838][ T109] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000817bd000
[ 92.045841][ T109] [0000000000000450] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 92.045853][ T109] Internal error: Oops: 96000045 [#1] PREEMPT SMP
[ 92.046141][ T109] Skip md ftrace buffer dump for: 0x1609e0
[ 92.046147][ T109] Modules linked in: coresight_tmc(E) usb_f_gsi(E) focaltech_ts_i2c(OE) usb_f_qdss(E) wlan(OE) machine_dlkm(OE) wcd937x_dlkm(OE) rx_macro_dlkm(OE) va_macro_dlkm(OE) msm_video(OE) ipa_clientsm(OE) tx_macro_dlkm(OE) camera(OE) wcd9xxx_dlkm(OE) icnss2(OE) ipanetm(OE) bt_fm_slim(OE) qcrypto_msm_dlkm(OE) adsp_loader_dlkm(OE) rmnet_wlan(OE) swr_ctrl_dlkm(OE) aw87xxx_dlkm(OE) nxp_nci(OE) qcedev_mod_dlkm(OE) pinctrl_lpi_dlkm(OE) audio_pkt_dlkm(OE) rndisipam(OE) audio_prm_dlkm(OE) bolero_cdc_dlkm(OE) spf_core_dlkm(OE) btpower(OE) msm_drm(OE) coresight_hwevent(E) coresight_dummy(E) stm_ftrace(E) qcom_spmi_adc_tm5(E) coresight_stm(E) hdcp_qseecom_dlkm(OE) mbhc_dlkm(OE) msm_mmrm(OE) stm_nfc_i2c(OE) tz_log_dlkm(OE) rmnet_core(OE) xiaomi_touch_game(OE) xm_smart_chg(E) qce50_dlkm(OE) gpr_dlkm(OE) cnss_prealloc(OE) coresight_replicator(E) cnss_nl(OE) q6_notifier_dlkm(OE) wcd937x_slave_dlkm(OE) smcinvoke_dlkm(OE) ipam(OE) cp_qc30(E) wlan_firmware_service(OE) coresight_tgu(E)
[ 92.046250][ T109] stub_dlkm(OE) wsa881x_analog_dlkm(OE) lct_tp(OE) leds_qti_flash(E) coresight_funnel(E) msm_kgsl(OE) audpkt_ion_dlkm(OE) fs1815_dlkm(OE) lct_audio_info_dlkm(OE) wcd_core_dlkm(OE) snd_usb_audio_qmi(E) gsim(OE) auth_battery(E) q6_dlkm(OE) coresight_remote_etm(E) icc_test(E) qrtr_smd(E) stm_p_ost(E) radio_i2c_rtc6226_qca(OE) snd_event_dlkm(OE) cpufreq_ondemand(E) cnss_utils(OE) rmnet_ctl(OE) mi_thermal_interface(E) bcl_pmic5(E) swr_dlkm(OE) cpumaxfreq(E) leds_aw2016(E) qcom_lpm(E) ds28e30_verify(E) lmh_cpu_vdd_cdev(E) qcom_spmi_temp_alarm(E) bcl_soc(E) qti_qmi_cdev(E) stick_verify(E) qcom_q6v5_pas(E) qcom_cpufreq_hw_debug(E) regulator_cdev(E) xiaomi_fp(E) binder_prio(CE) q6_pdr_dlkm(OE) slg_verify(E) qpnp_smb5_main(E) qti_cpufreq_cdev(E) coresight_cti(E) coresight_tpda(E) qcom_qpnp_qg(E) pd_policy_manager(E) coresight_tpdm(E) coresight_csr(E) qcom_i2c_pmic(E) qti_devfreq_cdev(E) msm_qmp(E) qcom_dload_mode(E) qti_userspace_cdev(E) spi_msm_geni(E)
[ 92.046345][ T109] cpufreq_userspace(E) qcom_spss(E) qcom_pil_info(E) bq25960_charger(E) sc8541_charger(E) cx_ipeak_cdev(E) qti_qmi_sensor(E) qcom_q6v5(E) leds_gpio(E) stm_p_basic(E) stm_console(E) cfg80211(E) coresight(E) spidev(E) usb_f_diag(E) rpm_smd_cooling_device(E) qcom_spmi_adc5(E) battery_auth_class(E) stm_core(E) thermal_pause(E) rdbg(E) msm_lmh_dcvs(E) qti_battery_charger(E) msm_tsens_driver(E) mi_wmark(CE) qcom_pmic_voter(E) ipa_fmwk(E) leds_qpnp_vibrator_ldo(E) cpu_hotplug(E) qcom_vadc_common(E) qcom_sysmon(E) leds_qti_tri_led(E) qcom_tsens(E) qcom_pon(E) glink_pkt(E) rtc_pm8xxx(E) qcom_iommu_debug(E) reboot_mode(E) i2c_msm_geni(E) dwc3_msm(E) sg(E) pm8941_pwrkey(E) hung_task_enh(E) cx_ipeak(E) usb_bam(E) qpnp_pdphy(E) phy_msm_ssusb_qmp(E) usb_f_ccid(E) usb_f_cdev(E) pinctrl_spmi_mpp(E) qfprom_sys(E) rpm_master_stat(E) ehset(E) phy_qcom_emu(E) phy_msm_snps_hs(E) altmode_glink(E) f_fs_ipc_log(E) serial_num(E) rpm_smd_debug(E) msm_sharedmem(E) smp2p_sleepstate(E)
[ 92.046443][ T109] guestvm_loader(E) memlat(E) bwmon(E) phy_xgene(E) panel_event_notifier(E) eud(E) qcom_va_minidump(E) cdsprm(E) frpc_adsprpc(E) phy_generic(E) msm_performance(E) sps_drv(E) refgen(E) qcom_dcvs(E) phy_qcom_ufs_qmp_v4_kona(E) soc_sleep_stats(E) msm_geni_serial(E) qcom_pm8008_regulator(E) fsa4480_i2c(E) ucsi_glink(E) core_hang_detect(E) msm_gpi(E) microdump_collector(E) rq_stats(E) qcom_ramdump(E) debugcc_khaje(E) charger_ulog_glink(E) msm_memshare(E) boot_stats(E) cdsp_loader(E) pinctrl_spmi_gpio(E) phy_qcom_ufs_qmp_v4_lahaina(E) qcom_aoss(E) phy_qcom_ufs_qmp_v4(E) pwm_qti_lpg(E) qti_battery_debug(E) pmic_glink_debug(E) gpucc_khaje(E) phy_qcom_ufs_qmp_v4_waipio(E) pmic_glink(E) mdt_loader(E) phy_qcom_ufs_qmp_v4_kalama(E) dispcc_khaje(E) heap_mem_ext_v01(E) phy_qcom_ufs_qmp_v4_crow(E) msm_sysstats(E) swinfo(OE) qseecom_dlkm(OE) qrng_dlkm(OE) bootinfo(OE) perf_helper(E) zram(E) zsmalloc(E) qcom_pmu_lib(E) mi_memory(E) ant_check_div(E) ant_check(E) simtray(E)
[ 92.046535][ T109] bam_dma(E) slim_qcom_ngd_ctrl(E) slimbus(E) pdr_interface(E) qmi_helpers(E) usbpd(E) sdhci_msm(E) mem_offline(E) arm_smmu(E) ufs_qcom(E) ufshcd_crypto_qti(E) stub_regulator(E) socinfo(E) sched_walt(E) qrtr(E) spmi_pmic_arb(E) qcom_spmi_pmic(E) regmap_spmi(E) qti_regmap_debugfs(E) qcom_iommu_util(E) qcom_dma_heaps(E) msm_poweroff(E) qpnp_power_on(E) phy_qcom_ufs_qrbtc_sdm845(E) phy_qcom_ufs_qmp_v4_khaje(E) phy_qcom_ufs(E) nvmem_qcom_spmi_sdam(E) nvmem_qfprom(E) ns(E) qnoc_bengal(E) qnoc_qos_rpm(E) pinctrl_bengal(E) pinctrl_khaje(E) pinctrl_msm(E) msm_dma_iommu_mapping(E) mem_buf(E) mem_buf_dev(E) secure_buffer(E) memory_dump_v2(E) iommu_logger(E) icc_debug(E) icc_rpm(E) dcc_v2(E) cqhci(E) sdhci_msm_scaling(E) crypto_qti_common(E) crypto_qti_tz(E) clk_dummy(E) qcom_cpufreq_hw(E) gcc_khaje(E) clk_smd_rpm(E) clk_qcom(E) gdsc_regulator(E) qcom_cpu_vendor_hooks(E) qcom_logbuf_vh(E) qcom_soc_wdt(E) qcom_wdt_core(E) qcom_scm(E) rpm_smd_regulator(E) rpm_smd(E)
[ 92.046628][ T109] proxy_consumer(E) debug_regulator(E) qcom_glink_rpm(E) glink_probe(E) qcom_glink_spss(E) rproc_qcom_common(E) qcom_smd(E) qcom_glink_smem(E) qcom_glink(E) qcom_mpm(E) qcom_apcs_ipc_mailbox(E) smp2p(E) qcom_ipc_logging(E) minidump(E) smem(E) qcom_hwspinlock(E)
[ 92.046656][ T109] CPU: 5 PID: 109 Comm: kworker/u16:2 Tainted: G WC OE 5.15.94 #1
[ 92.046663][ T109] Hardware name: Qualcomm Technologies, Inc. Khaje IDP sapphire (DT)
[ 92.046666][ T109] Workqueue: dwc3_wq dwc3_resume_work.d3d24a0baa3b4d28944726b453e90dd7.cfi_jt [dwc3_msm]
[ 92.046703][ T109] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 92.046708][ T109] pc : dwc3_msm_set_max_speed+0x8c/0xbc [dwc3_msm]
[ 92.046733][ T109] lr : dwc3_msm_set_max_speed+0x78/0xbc [dwc3_msm]
[ 92.046758][ T109] sp : ffffffc0087abd10
[ 92.046761][ T109] x29: ffffffc0087abd10 x28: 0000000000000402 x27: ffffff8000023020
[ 92.046767][ T109] x26: ffffff8000119b10 x25: 00000000000a0002 x24: ffffffddee847518
[ 92.046772][ T109] x23: ffffff804982fb05 x22: 0000000000000000 x21: 0000000000000000
[ 92.046778][ T109] x20: 0000000000000003 x19: ffffff8049b30080 x18: ffffffc00862d028
[ 92.046782][ T109] x17: 303a7864695f7478 x16: 00000000000000b0 x15: ffffffddf49f50e0
[ 92.046787][ T109] x14: ffffffddf5e11be0 x13: 0000000000000000 x12: 0000000000000000
[ 92.046792][ T109] x11: 00000000ffffffff x10: ffffffddf67213e8 x9 : 114f1214b7c0a100
[ 92.046797][ T109] x8 : 0000000000000000 x7 : 78616d5f7465735f x6 : 6d736d5f33637764
[ 92.046802][ T109] x5 : ffffff81f6c467a9 x4 : ffffffddee855405 x3 : ffffffc0087ab868
[ 92.046806][ T109] x2 : 0000000000000030 x1 : 0000000000000000 x0 : 0000000000000030
[ 92.046812][ T109] Call trace:
[ 92.046814][ T109] dwc3_msm_set_max_speed+0x8c/0xbc [dwc3_msm]
[ 92.046837][ T109] dwc3_resume_work+0x14c/0x410 [dwc3_msm]
[ 92.046861][ T109] process_one_work+0x224/0x4a4
[ 92.046874][ T109] worker_thread+0x29c/0x500
[ 92.046880][ T109] kthread+0x170/0x1dc
[ 92.046885][ T109] ret_from_fork+0x10/0x20
[ 92.046894][ T109] Code: b4000148 b140051f 54000068 f9405508 (b9045114)
[ 92.046897][ T109] ---[ end trace 63bb7f701b7d6a2a ]---
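The "Mem abort info" fields in the log can be re-derived by hand from the raw ESR value, which is a useful cross-check when only the Oops code is available. The following is a minimal user-space sketch, assuming the standard AArch64 ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and the fault status code in ISS bits 5:0). The helper names are illustrative and are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Field extractors for an AArch64 ESR_ELx value (illustrative names, not kernel APIs). */
static inline uint32_t esr_ec(uint32_t esr)  { return esr >> 26; }          /* exception class  */
static inline uint32_t esr_il(uint32_t esr)  { return (esr >> 25) & 1u; }   /* 1 = 32-bit insn  */
static inline uint32_t esr_iss(uint32_t esr) { return esr & 0x1ffffffu; }   /* syndrome bits    */
static inline uint32_t esr_wnr(uint32_t esr) { return (esr >> 6) & 1u; }    /* 1 = write access */
static inline uint32_t esr_fsc(uint32_t esr) { return esr & 0x3fu; }        /* fault status     */

/* Cross-check the extractors against the values the kernel printed in the log. */
void esr_check_against_log(void)
{
    const uint32_t esr = 0x96000045u;   /* "ESR = 0x96000045" */

    assert(esr_ec(esr)  == 0x25);       /* DABT (current EL)         */
    assert(esr_il(esr)  == 1);          /* IL = 32 bits              */
    assert(esr_iss(esr) == 0x45);       /* ISS = 0x00000045          */
    assert(esr_wnr(esr) == 1);          /* WnR = 1: it was a write   */
    assert(esr_fsc(esr) == 0x05);       /* level 1 translation fault */
}
```

Read together, WnR = 1 and a translation fault at 0x450 say that the kernel tried to write to an unmapped address just above zero.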
Based on the available dmesg log:
Faulting module: dwc3_msm.ko
Preliminary cause: an invalid memory write (memory trampling)
2.2 Restoring the crash scene with trace32
The fault is at line 720: dwc->maximum_speed = spd
Viewing the corresponding code in trace32 gives the following information:
mdwc->dwc3 = 0xFFFFFF80876B7000
dwc = 0x0
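spd = USB_SPEED_HIGH
This essentially gives us the root cause: dwc is NULL, and the code then assigns spd to a member of that NULL pointer.
Look at the faulting instruction:
str w20,[x8, #0x450] : store the value of register w20 to the address formed by adding 0x450 to the value of register x8
At this point x8 is 0 and w20 is 3 (USB_SPEED_HIGH), so the instruction writes 3 to address x8 + 0x450 = 0x450. That is clearly an invalid address, which is why dmesg reports the following error.
Unable to handle kernel write to read-only memory at virtual address 0000000000000450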
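The faulting instruction can also be recovered from the raw word in parentheses on the "Code:" line of the dmesg output, (b9045114), without a debugger. Here is a minimal sketch that decodes only the A64 "STR (immediate, unsigned offset)" 32-bit form, assuming the standard encoding (top ten bits 0b1011100100, then imm12, Rn, Rt, with the offset scaled by 4). The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded fields of "STR Wt, [Xn, #offset]" (illustrative type, not from any library). */
struct str32_uimm {
    unsigned rt;        /* source register number (Wt) */
    unsigned rn;        /* base register number (Xn)   */
    uint32_t offset;    /* byte offset, imm12 * 4      */
};

/* Returns true and fills *out if insn is a 32-bit STR with unsigned immediate offset. */
bool decode_str32_uimm(uint32_t insn, struct str32_uimm *out)
{
    assert(out);
    if ((insn & 0xffc00000u) != 0xb9000000u)
        return false;                           /* some other instruction */
    out->rt = insn & 0x1fu;
    out->rn = (insn >> 5) & 0x1fu;
    out->offset = ((insn >> 10) & 0xfffu) << 2; /* imm12 is scaled by the access size */
    return true;
}

/* Address the store will hit, given the current value of the base register. */
uint64_t str_target_address(uint64_t base_value, const struct str32_uimm *d)
{
    return base_value + d->offset;
}
```

Decoding 0xb9045114 this way yields Rt = 20, Rn = 8 and offset 0x450, matching str w20,[x8, #0x450]. With x8 = 0 from the register dump, the target address is 0x450.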
3. Root cause
A structure member was accessed through a NULL pointer.
4. Solution
Workaround: make every reference to dwc NULL-safe.
gerrit: https://gerrit.odm.mioffice.cn/c/kernel/msm-5.15/+/430151
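The shape of such a guard can be illustrated with a small user-space mock. This is only a sketch of the general pattern (check the pointer, log, return early), not the actual change in the gerrit link. Every type and function name below is hypothetical, and returning -ENODEV is an assumption about what a caller might want.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver structures; not the real dwc3 types. */
struct mock_dwc3     { int maximum_speed; };
struct mock_dwc3_msm { struct mock_dwc3 *dwc; };

enum { MOCK_USB_SPEED_HIGH = 3 };   /* same value as w20 in the crash dump */

/* Set the maximum speed, refusing to touch dwc if it has not been set up yet. */
int mock_set_max_speed(struct mock_dwc3_msm *mdwc, int spd)
{
    if (!mdwc || !mdwc->dwc) {
        fprintf(stderr, "%s: dwc is NULL, skipping\n", __func__);
        return -ENODEV;             /* assumed error code */
    }
    mdwc->dwc->maximum_speed = spd;
    return 0;
}

/* Usage: the NULL case now returns an error instead of writing to 0x450. */
void mock_demo(void)
{
    struct mock_dwc3 core = { 0 };
    struct mock_dwc3_msm not_ready = { NULL };
    struct mock_dwc3_msm ready = { &core };

    assert(mock_set_max_speed(&not_ready, MOCK_USB_SPEED_HIGH) == -ENODEV);
    assert(mock_set_max_speed(&ready, MOCK_USB_SPEED_HIGH) == 0);
    assert(core.maximum_speed == MOCK_USB_SPEED_HIGH);
}
```

A guard like this only prevents the crash. As the workaround label indicates, it does not explain why dwc is still NULL when the resume work runs.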