Android Stability, panic, exception
[Android Stability] Part 022 [Principles] Where the kernel panic death message comes from
iliuqi 2025-01-20 2025-03-14
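0. Preface

Kernel stability problems come in many forms. The most common is the "kernel panic": the kernel is at a loss and does not know how to continue. The system cannot keep running in that state, so it ends its own life and leaves a death message behind, for example:

"Unable to handle kernel XXX at virtual address XXX"
"undefined instruction XXX"
"Bad mode in Error handler detected on CPUX, code 0xbe000011 -- SError"
……

In what state does the system produce these death messages? How are they produced? And how are they handled?

This article covers those three questions. Before reading it, make sure you have read "aarch64 exception model and Linux arm64 interrupt handling".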
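1. Exception handling flow

The example in this section comes from the exception discussed in [Android Stability] Part 015 [Problem] Unable to handle kernel NULL pointer dereference.

The panic log is as follows: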
```
[ 9.188060][ T175] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000102
[ 9.188065][ T175] Mem abort info:
[ 9.188067][ T175] ESR = 0x0000000096000005
[ 9.188069][ T175] EC = 0x25: DABT (current EL), IL = 32 bits
[ 9.188072][ T175] SET = 0, FnV = 0
[ 9.188074][ T175] EA = 0, S1PTW = 0
[ 9.188075][ T175] FSC = 0x05: level 1 translation fault
[ 9.188078][ T175] Data abort info:
[ 9.188079][ T175] ISV = 0, ISS = 0x00000005
[ 9.188080][ T175] CM = 0, WnR = 0
[ 9.188083][ T175] user pgtable: 4k pages, 39-bit VAs, pgdp=00000000c850e000
[ 9.188086][ T175] [0000000000000102] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
[ 9.188095][ T175] Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
[ 9.188188][ T175] Dumping ftrace buffer:
[ 9.188199][ T175] (ftrace buffer empty)
[ 9.188845][ T175] Hardware name: Qualcomm Technologies, Inc. Spring QRD (DT)
[ 9.188849][ T175] Workqueue: events power_supply_changed_work
[ 9.188863][ T175] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 9.188868][ T175] pc : __queue_work+0x28/0x550
[ 9.188876][ T175] lr : queue_work_on+0x3c/0x80
[ 9.188880][ T175] sp : ffffffc00b473ca0
[ 9.188882][ T175] x29: ffffffc00b473ca0 x28: ffffff804531dbc8 x27: ffffff82f2740fa8
[ 9.188890][ T175] x26: ffffff800b791f10 x25: 0000000000000000 x24: 0000000000000007
[ 9.188896][ T175] x23: 0000000000000000 x22: 0000000000000001 x21: 0000000000000000
[ 9.188902][ T175] x20: 0000000000000000 x19: ffffff806d0f9148 x18: ffffffc00ac0d040
[ 9.188908][ T175] x17: 000000002a4cec24 x16: 000000002a4cec24 x15: 0000000000000046
[ 9.188914][ T175] x14: 0000000000000000 x13: 0000000000000ef0 x12: 0000000000000002
[ 9.188920][ T175] x11: 0000000000000000 x10: ffffffffffffd240 x9 : 000000000000001b
[ 9.188926][ T175] x8 : 0000000000000001 x7 : ffffff806baa9380 x6 : 000000161b03f216
[ 9.188932][ T175] x5 : 1672031b16000000 x4 : 0080000000000000 x3 : 1b430b9338000000
[ 9.188939][ T175] x2 : ffffff806d0f9148 x1 : 0000000000000000 x0 : 0000000000000020
[ 9.188946][ T175] Call trace:
[ 9.188948][ T175] __queue_work+0x28/0x550
[ 9.188953][ T175] queue_work_on+0x3c/0x80
[ 9.188957][ T175] fts_power_usb_notifier_callback+0x2c/0x40 [focaltech_spi]
[ 9.189037][ T175] blocking_notifier_call_chain+0x70/0xbc
[ 9.189047][ T175] power_supply_changed_work+0x7c/0xc8
[ 9.189054][ T175] process_one_work+0x1e4/0x43c
[ 9.189060][ T175] worker_thread+0x25c/0x430
[ 9.189065][ T175] kthread+0x104/0x1d4
[ 9.189069][ T175] ret_from_fork+0x10/0x20
[ 9.189079][ T175] Code: a9054ff4 910003fd aa0203f3 aa0103f7 (39440828)
```
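The panic message is: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000102. The rest of this article traces where that line comes from.

From "aarch64 exception model and Linux arm64 interrupt handling" we know that one register records the exception type: ESR, the Exception Syndrome Register. The log above shows that ESR held 0x0000000096000005 when this exception was taken.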
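1.1 ESR register field definitions

The screenshots in this section are from the official Armv8-A manual.

Two fields of this register matter here: EC, bits [31:26], and ISS, bits [24:0], as described in the manual.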
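For the ESR value in this case, 0x0000000096000005, bits [31:26] give EC == 0b100101, which the manual lists as a Data Abort class. The ISS field is 0b101. So we can tentatively conclude that this exception is a Data Abort.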
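For EC == 0b100101, bits [5:0] of ISS are the DFSC (Data Fault Status Code), which encodes the cause of the data abort.

From this we learn the following:

- The exception class 0b100101 means "Data Abort taken without a change in Exception level".
- The status code 0b000101 means "Translation fault, level 1".

This is exactly what the following part of the log reports: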
```
[ 9.188065][ T175] Mem abort info:
[ 9.188067][ T175] ESR = 0x0000000096000005
[ 9.188069][ T175] EC = 0x25: DABT (current EL), IL = 32 bits
[ 9.188072][ T175] SET = 0, FnV = 0
[ 9.188074][ T175] EA = 0, S1PTW = 0
[ 9.188075][ T175] FSC = 0x05: level 1 translation fault
[ 9.188078][ T175] Data abort info:
[ 9.188079][ T175] ISV = 0, ISS = 0x00000005
[ 9.188080][ T175] CM = 0, WnR = 0
```
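The field extraction above can be checked with a few shifts and masks. The following is a minimal sketch, not kernel code: the helper names (`esr_ec`, `esr_il`, `esr_iss`, `esr_dfsc`) are hypothetical and only mirror the bit ranges quoted in this section.

```c
#include <stdint.h>

/*
 * Hypothetical helpers (not kernel APIs) for the ESR_ELx fields used here:
 * EC = bits [31:26], IL = bit [25], ISS = bits [24:0], DFSC = ISS bits [5:0].
 */
static inline unsigned int esr_ec(uint64_t esr)
{
    return (unsigned int)((esr >> 26) & 0x3f);
}

static inline unsigned int esr_il(uint64_t esr)
{
    return (unsigned int)((esr >> 25) & 0x1);
}

static inline uint32_t esr_iss(uint64_t esr)
{
    return (uint32_t)(esr & 0x1ffffff);
}

static inline unsigned int esr_dfsc(uint64_t esr)
{
    return (unsigned int)(esr & 0x3f);
}
```

Feeding in 0x96000005 yields EC 0x25, IL 1 (a 32-bit instruction), ISS 0x5 and DFSC 0x05, matching the "Mem abort info" lines in the log.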
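1.2 Exception entry

Every exception is taken to a specific target exception level. That target level is either programmed by software or determined by the nature of the exception itself. In no case does taking an exception move execution to a lower exception level. On exception entry the processor does the following:

- The processor state is saved to SPSR_ELx of the target exception level.
- The return address is saved to ELR_ELx of the target exception level.
- If the exception is a synchronous exception or an SError interrupt, the syndrome information is saved to ESR_ELx of the target exception level.
- For an Instruction Abort exception, a Data Abort exception, or a PC alignment fault exception, the faulting virtual address is saved to FAR_ELx.
- The stack pointer switches to SP_ELx, the dedicated stack pointer of the target exception level.
- Execution moves to the target exception level and starts at the address defined by the exception vector.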
1.3 Exception vector table

```
SYM_CODE_START(vectors)    // vectors is the exception vector table
    kernel_ventry   1, t, 64, sync      // Synchronous EL1t
    kernel_ventry   1, t, 64, irq       // IRQ EL1t
    kernel_ventry   1, t, 64, fiq       // FIQ EL1t
    kernel_ventry   1, t, 64, error     // Error EL1t

    /// Linux kernel (EL1h) vector entries; for the synchronous exception
    /// here, kernel_ventry expands to el1h_64_sync
    kernel_ventry   1, h, 64, sync      // Synchronous EL1h
    kernel_ventry   1, h, 64, irq       // IRQ EL1h
    kernel_ventry   1, h, 64, fiq       // FIQ EL1h
    kernel_ventry   1, h, 64, error     // Error EL1h

    /// AArch64 EL0 vector entries; kernel_ventry expands to el0t_64_sync
    kernel_ventry   0, t, 64, sync      // Synchronous 64-bit EL0
    kernel_ventry   0, t, 64, irq       // IRQ 64-bit EL0
    kernel_ventry   0, t, 64, fiq       // FIQ 64-bit EL0
    kernel_ventry   0, t, 64, error     // Error 64-bit EL0

    /// AArch32 EL0 vector entries
    kernel_ventry   0, t, 32, sync      // Synchronous 32-bit EL0
    kernel_ventry   0, t, 32, irq       // IRQ 32-bit EL0
    kernel_ventry   0, t, 32, fiq       // FIQ 32-bit EL0
    kernel_ventry   0, t, 32, error     // Error 32-bit EL0
SYM_CODE_END(vectors)
```
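The entries are easier to follow when viewed as a table of offsets from the vector base. The Data Abort in this case is a synchronous exception taken from EL1 while using SP_EL1 (EL1h), so it enters at offset 0x200, and the handler that finally runs is el1h_64_sync_handler.

(The macros that appear along the call path are explained in section 3.1 of "aarch64 exception model and Linux arm64 interrupt handling".)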
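The 0x200 offset follows from the fixed layout of the AArch64 vector table: four groups of four entries, each entry 0x80 bytes long. A small sketch of that arithmetic is shown here; the enum and function names are hypothetical and chosen for illustration only.

```c
/* Hypothetical names; the order matches the vectors table above. */
enum vec_source {
    CUR_EL_SP0,    /* current EL, using SP_EL0 (EL1t) */
    CUR_EL_SPX,    /* current EL, using SP_ELx (EL1h) */
    LOWER_EL_A64,  /* lower EL, AArch64 */
    LOWER_EL_A32   /* lower EL, AArch32 */
};

enum vec_type { VEC_SYNC, VEC_IRQ, VEC_FIQ, VEC_SERROR };

/* Offset of one vector entry from the vector base address. */
static inline unsigned int vector_offset(enum vec_source src, enum vec_type type)
{
    return (unsigned int)src * 0x200u + (unsigned int)type * 0x80u;
}
```

With these definitions, `vector_offset(CUR_EL_SPX, VEC_SYNC)` evaluates to 0x200, the entry used by the Data Abort in this case.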
1.4 el1h_64_sync_handler

```c
asmlinkage void noinstr el1h_64_sync_handler(struct pt_regs *regs)
{
    unsigned long esr = read_sysreg(esr_el1);

    switch (ESR_ELx_EC(esr)) {
    case ESR_ELx_EC_DABT_CUR:
    case ESR_ELx_EC_IABT_CUR:
        el1_abort(regs, esr);
        break;
    case ESR_ELx_EC_PC_ALIGN:
        el1_pc(regs, esr);
        break;
    case ESR_ELx_EC_SYS64:
    case ESR_ELx_EC_UNKNOWN:
        el1_undef(regs);
        break;
    case ESR_ELx_EC_BREAKPT_CUR:
    case ESR_ELx_EC_SOFTSTP_CUR:
    case ESR_ELx_EC_WATCHPT_CUR:
    case ESR_ELx_EC_BRK64:
        el1_dbg(regs, esr);
        break;
    case ESR_ELx_EC_FPAC:
        el1_fpac(regs, esr);
        break;
    default:
        __panic_unhandled(regs, "64-bit el1h sync", esr);
    }
}
```
EC == 0b100101 is 0x25, which corresponds to the macro ESR_ELx_EC_DABT_CUR, so execution enters el1_abort.
1.5 el1_abort

```c
static void noinstr el1_abort(struct pt_regs *regs, unsigned long esr)
{
    unsigned long far = read_sysreg(far_el1);

    enter_from_kernel_mode(regs);
    local_daif_inherit(regs);
    do_mem_abort(far, esr, regs);
    local_daif_mask();
    exit_to_kernel_mode(regs);
}
```
1.6 do_mem_abort

```c
static inline const struct fault_info *esr_to_fault_info(unsigned int esr)
{
    return fault_info + (esr & ESR_ELx_FSC);
}

void do_mem_abort(unsigned long far, unsigned int esr, struct pt_regs *regs)
{
    const struct fault_info *inf = esr_to_fault_info(esr);
    unsigned long addr = untagged_addr(far);

    if (!inf->fn(far, esr, regs))
        return;

    if (!user_mode(regs)) {
        pr_alert("Unhandled fault at 0x%016lx\n", addr);
        mem_abort_decode(esr);
        show_pte(addr);
    }

    arm64_notify_die(inf->name, regs, inf->sig, inf->code, addr, esr);
}
```
fault_info is defined as follows (only the first entries are shown):

```c
static const struct fault_info fault_info[] = {
    { do_bad,               SIGKILL, SI_KERNEL,   "ttbr address size fault"    },
    { do_bad,               SIGKILL, SI_KERNEL,   "level 1 address size fault" },
    { do_bad,               SIGKILL, SI_KERNEL,   "level 2 address size fault" },
    { do_bad,               SIGKILL, SI_KERNEL,   "level 3 address size fault" },
    { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 0 translation fault"  },
    { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 1 translation fault"  },
    { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 2 translation fault"  },
    { do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 3 translation fault"  },
    { do_bad,               SIGKILL, SI_KERNEL,   "unknown 8"                  },
    { do_page_fault,        SIGSEGV, SEGV_ACCERR, "level 1 access flag fault"  },
    { do_page_fault,        SIGSEGV, SEGV_ACCERR, "level 2 access flag fault"  },
    /* ... remaining entries omitted ... */
};
```
ESR_EL1.FSC == 0x5, so the matching entry is:

```c
{ do_translation_fault, SIGSEGV, SEGV_MAPERR, "level 1 translation fault" }
```

Execution therefore continues in do_translation_fault.
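The lookup is a plain array index: the low six bits of ESR (the FSC) select the table row. The sketch below reproduces that idea with a trimmed, hypothetical table (`struct fault_desc`, `fault_table` and `lookup_fault` are illustration-only names, and the handler is stored as a string rather than a function pointer).

```c
#include <stddef.h>

/* Hypothetical, trimmed stand-in for the kernel's struct fault_info. */
struct fault_desc {
    const char *handler;  /* handler name, kept as text for illustration */
    const char *name;     /* human-readable fault description */
};

static const struct fault_desc fault_table[] = {
    { "do_bad",               "ttbr address size fault"    },
    { "do_bad",               "level 1 address size fault" },
    { "do_bad",               "level 2 address size fault" },
    { "do_bad",               "level 3 address size fault" },
    { "do_translation_fault", "level 0 translation fault"  },
    { "do_translation_fault", "level 1 translation fault"  },
    { "do_translation_fault", "level 2 translation fault"  },
    { "do_translation_fault", "level 3 translation fault"  },
};

#define SKETCH_FSC_MASK 0x3fUL  /* low six bits of ESR, like ESR_ELx_FSC */

/* Returns the table row for this ESR, or NULL if the trimmed table lacks it. */
static const struct fault_desc *lookup_fault(unsigned long esr)
{
    size_t idx = (size_t)(esr & SKETCH_FSC_MASK);

    if (idx >= sizeof(fault_table) / sizeof(fault_table[0]))
        return NULL;
    return &fault_table[idx];
}
```

For ESR 0x96000005 the index is 5, which selects the "level 1 translation fault" row and its do_translation_fault handler.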
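1.7 do_translation_fault

```c
static int __kprobes do_translation_fault(unsigned long far,
                                          unsigned int esr,
                                          struct pt_regs *regs)
{
    unsigned long addr = untagged_addr(far);

    if (is_ttbr0_addr(addr))
        return do_page_fault(far, esr, regs);

    do_bad_area(far, esr, regs);
    return 0;
}
```

Note that the faulting address in this case, 0x102, lies in the TTBR0 (user) range, which is why the log prints "user pgtable". For such an address is_ttbr0_addr() is true and the call goes through do_page_fault; because the faulting kworker has no user mm, do_page_fault gives up and also ends in __do_kernel_fault. The do_bad_area path described next is the one taken for kernel-range addresses. Both paths converge on __do_kernel_fault.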
1.8 do_bad_area

```c
static void do_bad_area(unsigned long far, unsigned int esr,
                        struct pt_regs *regs)
{
    unsigned long addr = untagged_addr(far);

    if (user_mode(regs)) {
        const struct fault_info *inf = esr_to_fault_info(esr);

        set_thread_esr(addr, esr);
        arm64_force_sig_fault(inf->sig, inf->code, far, inf->name);
    } else {
        __do_kernel_fault(addr, esr, regs);
    }
}
```
1.9 __do_kernel_fault

```c
static void __do_kernel_fault(unsigned long addr, unsigned int esr,
                              struct pt_regs *regs)
{
    const char *msg;

    if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
        return;

    if (WARN_RATELIMIT(is_spurious_el1_translation_fault(addr, esr, regs),
        "Ignoring spurious kernel translation fault at virtual address %016lx\n", addr))
        return;

    if (is_el1_mte_sync_tag_check_fault(esr)) {
        do_tag_recovery(addr, esr, regs);
        return;
    }

    if (is_el1_permission_fault(addr, esr, regs)) {
        if (esr & ESR_ELx_WNR)
            msg = "write to read-only memory";
        else if (is_el1_instruction_abort(esr))
            msg = "execute from non-executable memory";
        else
            msg = "read from unreadable memory";
    } else if (addr < PAGE_SIZE) {
        msg = "NULL pointer dereference";
    } else {
        if (kfence_handle_page_fault(addr, esr & ESR_ELx_WNR, regs))
            return;

        msg = "paging request";
    }

    die_kernel_fault(msg, addr, esr, regs);
}
```
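The message selection at the end of __do_kernel_fault explains why address 0x102, which is not literally zero, is still reported as a NULL pointer dereference: any non-permission fault below PAGE_SIZE gets that label, which typically covers reading a field at a small offset from a NULL struct pointer. A simplified sketch of that decision follows. The function name and the boolean parameters are hypothetical stand-ins for the kernel's ESR checks, and the 4 KiB page size is an assumption taken from the "4k pages" line in the log.

```c
#include <stdbool.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4 KiB pages, as in the log */

/*
 * Simplified model of the msg selection; the three flags stand in for
 * is_el1_permission_fault(), ESR_ELx_WNR and is_el1_instruction_abort().
 */
static const char *kernel_fault_msg(unsigned long addr, bool permission_fault,
                                    bool is_write, bool is_exec)
{
    if (permission_fault) {
        if (is_write)
            return "write to read-only memory";
        if (is_exec)
            return "execute from non-executable memory";
        return "read from unreadable memory";
    }
    if (addr < SKETCH_PAGE_SIZE)
        return "NULL pointer dereference";
    return "paging request";
}
```

Calling `kernel_fault_msg(0x102, false, false, false)` returns "NULL pointer dereference", the text seen in the log; an unmapped address at or above one page would return "paging request" instead.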
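1.10 die_kernel_fault

This function prints the error message seen in the log. Each step is annotated below.

```c
static void die_kernel_fault(const char *msg, unsigned long addr,
                             unsigned int esr, struct pt_regs *regs)
{
    /* let console output get through even if locks are held */
    bust_spinlocks(1);

    /* the first line of the death message */
    pr_alert("Unable to handle kernel %s at virtual address %016lx\n", msg,
             addr);

    /* decode ESR: the "Mem abort info" / "Data abort info" lines */
    mem_abort_decode(esr);

    /* show the page table entries related to the faulting address addr */
    show_pte(addr);

    /* enter the oops flow, which may end in panic */
    die("Oops", regs, esr);
    bust_spinlocks(0);
    do_exit(SIGKILL);
}
```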
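The first line of the death message is simply this format string filled in with the msg chosen by __do_kernel_fault and the faulting address. A user-space sketch of the same formatting is shown below; `format_fault_line` is a hypothetical helper, and snprintf stands in for pr_alert.

```c
#include <stdio.h>

/* Builds the headline of the oops into buf (no trailing newline). */
static int format_fault_line(char *buf, size_t len,
                             const char *msg, unsigned long addr)
{
    return snprintf(buf, len,
                    "Unable to handle kernel %s at virtual address %016lx",
                    msg, addr);
}
```

With msg set to "NULL pointer dereference" and addr set to 0x102, the buffer holds the exact headline from the log, with the address zero-padded to 16 hex digits by `%016lx`.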
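2. The die function

die may end up calling panic, but it does not always do so. It first runs the oops flow to report the current exception. If the exception happened in interrupt context, it panics. If panic_on_oops is set, it panics regardless of context; the default value of panic_on_oops comes from CONFIG_PANIC_ON_OOPS_VALUE, which is 1 when CONFIG_PANIC_ON_OOPS=y.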
```c
void die(const char *str, struct pt_regs *regs, int err)
{
    int ret;
    unsigned long flags;

    raw_spin_lock_irqsave(&die_lock, flags);

    oops_enter();

    console_verbose();
    bust_spinlocks(1);
    ret = __die(str, err, regs);

    if (regs && kexec_should_crash(current))
        crash_kexec(regs);

    bust_spinlocks(0);
    add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
    oops_exit();

    if (in_interrupt())
        panic("%s: Fatal exception in interrupt", str);
    if (panic_on_oops)
        panic("%s: Fatal exception", str);

    raw_spin_unlock_irqrestore(&die_lock, flags);

    if (ret != NOTIFY_STOP)
        do_exit(SIGSEGV);
}
```
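The tail of die() is a three-way decision that is easy to misread in the full listing. The sketch below models only that decision; the enum and function names are hypothetical, and the three booleans stand in for in_interrupt(), panic_on_oops and `ret == NOTIFY_STOP`.

```c
#include <stdbool.h>

/* Hypothetical labels for what die() does after printing the oops. */
enum oops_outcome {
    OOPS_PANIC_IN_INTERRUPT,  /* panic("... Fatal exception in interrupt") */
    OOPS_PANIC,               /* panic("... Fatal exception") */
    OOPS_KILL_TASK,           /* do_exit(SIGSEGV): only the task dies */
    OOPS_RETURN               /* a die notifier returned NOTIFY_STOP */
};

static enum oops_outcome die_outcome(bool in_interrupt, bool panic_on_oops,
                                     bool notifier_stopped)
{
    if (in_interrupt)
        return OOPS_PANIC_IN_INTERRUPT;
    if (panic_on_oops)
        return OOPS_PANIC;
    if (!notifier_stopped)
        return OOPS_KILL_TASK;
    return OOPS_RETURN;
}
```

For an oops in process context, the model returns OOPS_PANIC when panic_on_oops is 1 and OOPS_KILL_TASK when it is 0.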
2.1 oops_enter

```c
void oops_enter(void)
{
    tracing_off();
    debug_locks_off();
    do_oops_enter_exit();

    if (sysctl_oops_all_cpu_backtrace)
        trigger_all_cpu_backtrace();
}
```
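Note: an important node is involved here, /proc/sys/kernel/oops_all_cpu_backtrace. When it is enabled, an oops also does the following:

- It records the current execution state of every CPU (call stack, registers, and so on).
- On multi-core systems this is very useful for debugging synchronization problems such as deadlocks or race conditions.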
2.2 console_verbose

Switches the console to the most verbose log level when needed.

```c
static bool printk_console_no_auto_verbose;

void console_verbose(void)
{
    if (console_loglevel && !printk_console_no_auto_verbose)
        console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;
}
EXPORT_SYMBOL_GPL(console_verbose);

module_param_named(console_no_auto_verbose, printk_console_no_auto_verbose, bool, 0644);
MODULE_PARM_DESC(console_no_auto_verbose, "Disable console loglevel raise to highest on oops/panic/etc");
```
2.3 __die

```c
static int __die(const char *str, int err, struct pt_regs *regs)
{
    static int die_counter;
    int ret;

    pr_emerg("Internal error: %s: %x [#%d]" S_PREEMPT S_SMP "\n",
             str, err, ++die_counter);

    ret = notify_die(DIE_OOPS, str, regs, err, 0, SIGSEGV);
    if (ret == NOTIFY_STOP)
        return ret;

    print_modules();
    show_regs(regs);

    dump_kernel_instr(KERN_EMERG, regs);

    return ret;
}
```
show_regs is made up of two calls: __show_regs and dump_backtrace.
2.3.1 __show_regs

```c
void __show_regs(struct pt_regs *regs)
{
    int i, top_reg;
    u64 lr, sp;

    if (compat_user_mode(regs)) {
        lr = regs->compat_lr;
        sp = regs->compat_sp;
        top_reg = 12;
    } else {
        lr = regs->regs[30];
        sp = regs->sp;
        top_reg = 29;
    }

    show_regs_print_info(KERN_DEFAULT);
    print_pstate(regs);

    if (!user_mode(regs)) {
        printk("pc : %pS\n", (void *)regs->pc);
        printk("lr : %pS\n", (void *)ptrauth_strip_insn_pac(lr));
    } else {
        printk("pc : %016llx\n", regs->pc);
        printk("lr : %016llx\n", lr);
    }

    printk("sp : %016llx\n", sp);

    if (system_uses_irq_prio_masking())
        printk("pmr_save: %08llx\n", regs->pmr_save);

    i = top_reg;

    while (i >= 0) {
        printk("x%-2d: %016llx", i, regs->regs[i]);

        while (i-- % 3)
            pr_cont(" x%-2d: %016llx", i, regs->regs[i]);

        pr_cont("\n");
    }
}
```
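The nested loop at the end of __show_regs is what produces the three-registers-per-line layout in the log (x29 x28 x27, then x26 x25 x24, and so on). The following sketch isolates that loop in user space, printing only register names; `print_reg_layout` is a hypothetical helper and printf stands in for printk/pr_cont.

```c
#include <stdio.h>

/*
 * Prints register names using the same loop shape as __show_regs and
 * returns the number of output lines produced.
 */
static int print_reg_layout(int top_reg)
{
    int i = top_reg;
    int lines = 0;

    while (i >= 0) {
        printf("x%-2d", i);

        /* keep appending until the index just printed is a multiple of 3 */
        while (i-- % 3)
            printf(" x%-2d", i);

        printf("\n");
        lines++;
    }
    return lines;
}
```

Starting from top_reg 29, as in the AArch64 case, the loop emits ten lines of three registers each, which matches the x29 to x0 dump in the log. The inner condition tests the index before decrementing it, so each line ends on a register whose number is divisible by 3.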
The show_regs_print_info function:

```c
void show_regs_print_info(const char *log_lvl)
{
    dump_stack_print_info(log_lvl);
}

void dump_stack_print_info(const char *log_lvl)
{
    printk("%sCPU: %d PID: %d Comm: %.20s %s%s %s %.*s" BUILD_ID_FMT "\n",
           log_lvl, raw_smp_processor_id(), current->pid, current->comm,
           kexec_crash_loaded() ? "Kdump: loaded " : "",
           print_tainted(),
           init_utsname()->release,
           (int)strcspn(init_utsname()->version, " "),
           init_utsname()->version, BUILD_ID_VAL);

    if (dump_stack_arch_desc_str[0] != '\0')
        printk("%sHardware name: %s\n",
               log_lvl, dump_stack_arch_desc_str);

    print_worker_info(log_lvl, current);
    print_stop_info(log_lvl, current);
}
```
The print_pstate function:

```c
static void print_pstate(struct pt_regs *regs)
{
    u64 pstate = regs->pstate;

    if (compat_user_mode(regs)) {
        printk("pstate: %08llx (%c%c%c%c %c %s %s %c%c%c %cDIT %cSSBS)\n",
               pstate,
               pstate & PSR_AA32_N_BIT ? 'N' : 'n',
               pstate & PSR_AA32_Z_BIT ? 'Z' : 'z',
               pstate & PSR_AA32_C_BIT ? 'C' : 'c',
               pstate & PSR_AA32_V_BIT ? 'V' : 'v',
               pstate & PSR_AA32_Q_BIT ? 'Q' : 'q',
               pstate & PSR_AA32_T_BIT ? "T32" : "A32",
               pstate & PSR_AA32_E_BIT ? "BE" : "LE",
               pstate & PSR_AA32_A_BIT ? 'A' : 'a',
               pstate & PSR_AA32_I_BIT ? 'I' : 'i',
               pstate & PSR_AA32_F_BIT ? 'F' : 'f',
               pstate & PSR_AA32_DIT_BIT ? '+' : '-',
               pstate & PSR_AA32_SSBS_BIT ? '+' : '-');
    } else {
        const char *btype_str = btypes[(pstate & PSR_BTYPE_MASK) >>
                                       PSR_BTYPE_SHIFT];

        printk("pstate: %08llx (%c%c%c%c %c%c%c%c %cPAN %cUAO %cTCO %cDIT %cSSBS BTYPE=%s)\n",
               pstate,
               pstate & PSR_N_BIT ? 'N' : 'n',
               pstate & PSR_Z_BIT ? 'Z' : 'z',
               pstate & PSR_C_BIT ? 'C' : 'c',
               pstate & PSR_V_BIT ? 'V' : 'v',
               pstate & PSR_D_BIT ? 'D' : 'd',
               pstate & PSR_A_BIT ? 'A' : 'a',
               pstate & PSR_I_BIT ? 'I' : 'i',
               pstate & PSR_F_BIT ? 'F' : 'f',
               pstate & PSR_PAN_BIT ? '+' : '-',
               pstate & PSR_UAO_BIT ? '+' : '-',
               pstate & PSR_TCO_BIT ? '+' : '-',
               pstate & PSR_DIT_BIT ? '+' : '-',
               pstate & PSR_SSBS_BIT ? '+' : '-',
               btype_str);
    }
}
```
The corresponding log line is:

```
[ 9.188863][ T175] pstate: 604000c5 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
```
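To see how 0x604000c5 turns into "nZCv daIF +PAN", the AArch64 branch of print_pstate can be reduced to a few bit tests. This is a user-space sketch covering only the condition flags, the DAIF mask bits and PAN; the `P_*` macros and `pstate_flags` are hypothetical names, with bit positions assumed to match the arm64 PSR_*_BIT definitions.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions, mirroring the arm64 PSR_*_BIT masks. */
#define P_N   (1u << 31)
#define P_Z   (1u << 30)
#define P_C   (1u << 29)
#define P_V   (1u << 28)
#define P_PAN (1u << 22)
#define P_D   (1u << 9)
#define P_A   (1u << 8)
#define P_I   (1u << 7)
#define P_F   (1u << 6)

/* Writes e.g. "nZCv daIF +PAN" into buf: upper case / '+' means bit set. */
static void pstate_flags(uint32_t p, char *buf, size_t len)
{
    snprintf(buf, len, "%c%c%c%c %c%c%c%c %cPAN",
             (p & P_N) ? 'N' : 'n',
             (p & P_Z) ? 'Z' : 'z',
             (p & P_C) ? 'C' : 'c',
             (p & P_V) ? 'V' : 'v',
             (p & P_D) ? 'D' : 'd',
             (p & P_A) ? 'A' : 'a',
             (p & P_I) ? 'I' : 'i',
             (p & P_F) ? 'F' : 'f',
             (p & P_PAN) ? '+' : '-');
}
```

For 0x604000c5 the result is "nZCv daIF +PAN": Z and C are set, IRQ and FIQ are masked, and PAN is enabled, which agrees with the log line above.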
2.3.2 dump_backtrace

```c
void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk,
                    const char *loglvl)
{
    struct stackframe frame;
    int skip = 0;

    pr_debug("%s(regs = %p tsk = %p)\n", __func__, regs, tsk);

    if (regs) {
        if (user_mode(regs))
            return;
        skip = 1;
    }

    if (!tsk)
        tsk = current;

    if (!try_get_task_stack(tsk))
        return;

    if (tsk == current) {
        start_backtrace(&frame,
                        (unsigned long)__builtin_frame_address(0),
                        (unsigned long)dump_backtrace);
    } else {
        start_backtrace(&frame,
                        thread_saved_fp(tsk),
                        thread_saved_pc(tsk));
    }

    printk("%sCall trace:\n", loglvl);
    do {
        if (!skip) {
            dump_backtrace_entry(frame.pc, loglvl);
        } else if (frame.fp == regs->regs[29]) {
            skip = 0;
            dump_backtrace_entry(regs->pc, loglvl);
        }
    } while (!unwind_frame(tsk, &frame));

    put_task_stack(tsk);
}
```
This part corresponds to the following lines in the log:

```
[ 9.188946][ T175] Call trace:
[ 9.188948][ T175] __queue_work+0x28/0x550
[ 9.188953][ T175] queue_work_on+0x3c/0x80
[ 9.188957][ T175] fts_power_usb_notifier_callback+0x2c/0x40 [focaltech_spi]
[ 9.189037][ T175] blocking_notifier_call_chain+0x70/0xbc
[ 9.189047][ T175] power_supply_changed_work+0x7c/0xc8
[ 9.189054][ T175] process_one_work+0x1e4/0x43c
[ 9.189060][ T175] worker_thread+0x25c/0x430
[ 9.189065][ T175] kthread+0x104/0x1d4
[ 9.189069][ T175] ret_from_fork+0x10/0x20
```
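The core functions of dump_backtrace are:

- Initialize the unwind state and walk the stack frame by frame.
- Print each frame of the call stack, with its address resolved to symbol+offset as seen in the log.
- Accept either a caller-supplied register context (for example the one captured when the exception was taken) or the call stack of a specified task.
- Handle special cases, such as skipping the exception handler's own frames, so that the recorded call stack starts at the faulting function.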
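The frame-by-frame walk relies on the AArch64 frame record convention: each frame stores the caller's frame pointer followed by the return address, so the records form a linked list. The sketch below walks such a list in user space. It is a toy model under that assumption, not the kernel's unwind_frame, which additionally validates stack bounds and handles more cases; `struct frame_record` and `walk_frames` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of an AArch64 frame record: {previous fp, saved lr}. */
struct frame_record {
    const struct frame_record *prev_fp;  /* caller's frame record, or NULL */
    uintptr_t lr;                        /* return address into the caller */
};

/*
 * Follows the frame-record chain starting at fp and stores up to max
 * return addresses in out. Returns how many entries were stored.
 */
static size_t walk_frames(const struct frame_record *fp,
                          uintptr_t *out, size_t max)
{
    size_t n = 0;

    while (fp != NULL && n < max) {
        out[n++] = fp->lr;
        fp = fp->prev_fp;
    }
    return n;
}
```

Each stored return address corresponds to one line of the "Call trace" output once it has been resolved to a symbol and offset.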
3. The panic function

```c
void die(const char *str, struct pt_regs *regs, int err)
{
    /* ... */
    if (panic_on_oops)
        panic("%s: Fatal exception", str);
    /* ... */
}
```
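A kernel parameter node is involved here: /proc/sys/kernel/panic_on_oops

For an oops in process context, the panic flow is triggered only when this parameter is set to 1. On Android projects, init.rc sets it:

```
on init
    # ...
    write /proc/sys/kernel/panic_on_oops 1
```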
The panic flow itself is not covered in this article. It will be described in a later article dedicated to the panic flow.