[Android稳定性] 第013篇 [问题篇] page allocation failure: order:0内存分配失败的异常报错

0. 问题现象

工厂老化测试中,出现卡视频现象。
插上usb可以识别到adb口。

1. 问题分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
[ 7952.504953] warn_alloc: 427 callbacks suppressed
[ 7952.504959] kworker/7:1H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
[ 7952.524812] CPU: 7 PID: 12247 Comm: kworker/7:1H Tainted: G OE 6.1.90-android14-11-maybe-dirty-qki-consolidate #1
[ 7952.524824] Hardware name: Qualcomm Technologies, Inc. Spring QRD (DT)
[ 7952.524829] Workqueue: kverityd verity_work
[ 7952.524841] Call trace:
[ 7952.524843] dump_backtrace+0xfc/0x150
[ 7952.524851] show_stack+0x18/0x24
[ 7952.524855] dump_stack_lvl+0x8c/0xf8
[ 7952.524860] warn_alloc+0xfc/0x170
[ 7952.524865] __alloc_pages_slowpath+0x13b4/0x1668
[ 7952.524870] __alloc_pages+0x1d8/0x284
[ 7952.524874] __erofs_allocpage+0x88/0xb8
[ 7952.524879] z_erofs_lz4_decompress+0x1f4/0x7b0
[ 7952.524883] z_erofs_decompress_queue+0xc20/0xd24
[ 7952.524888] z_erofs_decompressqueue_work+0x34/0x74
[ 7952.524892] z_erofs_decompress_kickoff+0x164/0x1c8
[ 7952.524896] z_erofs_submissionqueue_endio+0x134/0x15c
[ 7952.524900] bio_endio+0x1d4/0x210
[ 7952.524904] __dm_io_complete+0x228/0x2a0
[ 7952.524910] clone_endio+0xe0/0x238
[ 7952.524914] bio_endio+0x1d4/0x210
[ 7952.524917] verity_work+0x668/0x6b4
[ 7952.524920] process_one_work+0x26c/0x678
[ 7952.524925] worker_thread+0x234/0x490
[ 7952.524929] kthread+0x108/0x148
[ 7952.524932] ret_from_fork+0x10/0x20
[ 7952.524937] Mem-Info:
[ 7952.524942] active_anon:337729 inactive_anon:811584 isolated_anon:204
active_file:40432 inactive_file:39976 isolated_file:0
unevictable:14772 dirty:249 writeback:2
slab_reclaimable:120090 slab_unreclaimable:181283
mapped:60908 shmem:2324 pagetables:22362
sec_pagetables:0 bounce:0
kernel_misc_reclaimable:10068
free:4213 free_pcp:407 free_cma:0
[ 7952.524950] Node 0 active_anon:1350916kB inactive_anon:3246336kB active_file:161728kB inactive_file:159904kB unevictable:59088kB isolated(anon):816kB isolated(file):0kB mapped:243632kB dirty:996kB writeback:8kB shmem:9296kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:40896kB shadow_call_stack:10256kB pagetables:89448kB sec_pagetables:0kB all_unreclaimable? no
[ 7952.524958] Normal free:16852kB boost:0kB min:11584kB low:41940kB high:72296kB reserved_highatomic:4096KB active_anon:1350360kB inactive_anon:3246732kB active_file:162768kB inactive_file:160608kB unevictable:59088kB writepending:1004kB present:7847744kB managed:7589764kB mlocked:59084kB bounce:0kB free_pcp:1628kB local_pcp:244kB free_cma:0kB
[ 7952.524965] lowmem_reserve[]: 0 0 0 0
[ 7952.524983] Normal: 128*4kB (UMEH) 800*8kB (UMH) 297*16kB (UMEH) 98*32kB (UME) 13*64kB (UE) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15632kB
[ 7952.525040] 100864 total pagecache pages
[ 7952.525044] 4124 pages in swap cache
[ 7952.525047] Free swap = 934856kB
[ 7952.525051] Total swap = 4194300kB
[ 7952.525054] 1961936 pages RAM
[ 7952.525058] 0 pages HighMem/MovableOnly
[ 7952.525061] 64495 pages reserved
[ 7952.525064] 69632 pages cma reserved

频繁出现page allocation failure的现象,而且均为”order:0”!

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
...
Line 161359: [ 8771.905523] kworker/7:2H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 161561: [ 8781.967872] CpuMonitorServi: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=foreground,mems_allowed=0
Line 161772: [ 8791.976045] kworker/4:1H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 162001: [ 8801.981072] kworker/6:2H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 162203: [ 8812.071066] kworker/6:2H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 162397: [ 8822.083661] kworker/7:9H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 162616: [ 8832.412856] kworker/6:5H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null)
Line 162842: [ 8842.420648] system_server: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=foreground,mems_allowed=0
Line 163047: [ 8852.594400] binder:1488_A: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=foreground,mems_allowed=0
Line 163248: [ 8862.764379] binder:1704_12: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=foreground,mems_allowed=0
Line 163503: [ 8872.918115] kworker/6:2H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 163723: [ 8883.027154] droid.launcher3: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=foreground,mems_allowed=0
Line 163910: [ 8893.121480] kworker/6:5H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 164105: [ 8903.334318] kworker/7:1H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 164306: [ 8913.360848] kworker/0:1H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 164499: [ 8923.375449] logcat: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 164702: [ 8933.415607] kworker/7:1H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 164936: [ 8943.950999] kworker/6:5H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
Line 165146: [ 8953.959363] kworker/6:5H: page allocation failure: order:0, mode:0x10800(GFP_NOWAIT|__GFP_NORETRY), nodemask=(null),cpuset=/,mems_allowed=0
...

1.1 基本信息

从上述log可以得知如下的信息:

  1. 物理内存大小

    1
    1961936 pages RAM

    1961936 页 × 4KB = 7,847,744 KB(约7.85GB)

  2. 只有一个Node 0和只有一个zone_type:ZONE_NORMAL

    1
    2
    [ 7952.524950] Node 0 active_anon:1350916kB inactive_anon:3246336kB active_file:161728kB inactive_file:159904kB unevictable:59088kB isolated(anon):816kB isolated(file):0kB mapped:243632kB dirty:996kB writeback:8kB shmem:9296kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 0kB writeback_tmp:0kB kernel_stack:40896kB shadow_call_stack:10256kB pagetables:89448kB sec_pagetables:0kB all_unreclaimable? no
    [ 7952.524958] Normal free:16852kB boost:0kB min:11584kB low:41940kB high:72296kB reserved_highatomic:4096KB active_anon:1350360kB inactive_anon:3246732kB active_file:162768kB inactive_file:160608kB unevictable:59088kB writepending:1004kB present:7847744kB managed:7589764kB mlocked:59084kB bounce:0kB free_pcp:1628kB local_pcp:244kB free_cma:0kB
  3. ZONE_NORMAL的保留页数为0

    1
    [ 7952.524965] lowmem_reserve[]: 0 0 0 0
  4. 系统内存情况

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    [ 7952.524958] Normal free:16852kB boost:0kB min:11584kB low:41940kB high:72296kB reserved_highatomic:4096KB active_anon:1350360kB inactive_anon:3246732kB active_file:162768kB inactive_file:160608kB unevictable:59088kB writepending:1004kB present:7847744kB managed:7589764kB mlocked:59084kB bounce:0kB free_pcp:1628kB local_pcp:244kB free_cma:0kB
    [ 7952.524965] lowmem_reserve[]: 0 0 0 0
    [ 7952.524983] Normal: 128*4kB (UMEH) 800*8kB (UMH) 297*16kB (UMEH) 98*32kB (UME) 13*64kB (UE) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15632kB
    [ 7952.525040] 100864 total pagecache pages
    [ 7952.525044] 4124 pages in swap cache
    [ 7952.525047] Free swap = 934856kB
    [ 7952.525051] Total swap = 4194300kB
    [ 7952.525054] 1961936 pages RAM
    [ 7952.525058] 0 pages HighMem/MovableOnly
    [ 7952.525061] 64495 pages reserved
    [ 7952.525064] 69632 pages cma reserved

    • 当前空闲内存:16852kB
    • 最低警戒水位线:11584kB
    • 低水位线:41940kB
    • 高水位线:72296kB
    • 空闲的cma内存:0kb
    • total swap:4194300kB
    • free swap:934856kB
  5. anon内存

    • active_anon: 1350916kB 表示当前有大约 1.35GB 的匿名内存被活跃进程使用。
    • inactive_anon: 3246336kB 表示有大约 3.25GB 的匿名内存不再被频繁使用,可能会被交换到磁盘中。
    • active_file: 161728kB 表示大约 161.7MB 的文件映射内存正在被活跃进程访问。
    • inactive_file: 159904kB 表示大约 159.9MB 的文件映射内存未被频繁访问,可能会被回收或交换出去。

1.2 初步分析

从内存分配的堆栈中的信息(page allocation failure: order:0),我们可以知道这是在分配一页大小的内存空间时出现的失败,当前我们系统的配置一页page为4KB大小。但是从1.1章节中出现内存分配失败时的信息中,当前4kb大小的内存是存在的!(128*4kB (UMEH))。

注释:
UMEH的含义:
U: 表示 UNMAP(未映射),指的是这些页面不是用于映射文件或共享内存的。
M: 表示 MOVABLE(可移动),指的是这些页面是可移动的,意味着内核可以将这些页面从一个物理位置移到另一个位置。
E: 表示 EMPTY(空的),意味着这些页面在当前时刻未被使用或者是空的。
UMEH 和其他标记的解释:
UMEH:这类页面是未映射的、可移动的且为空的页面。
UMH:这类页面是未映射的、不可移动的页面。
UME:这类页面是未映射的、可移动的页面,但可能不为空。
UE:这类页面是不可映射的、空的页面。

GFP_NOWAIT:表示内存分配函数不应阻塞。如果无法立即满足内存分配请求,函数应该返回 NULL,而不是阻塞等待内存释放。因此,内核在分配内存时不会尝试等待或进行等待队列操作。
__GFP_NORETRY:表示即使内存分配失败,内核也不应尝试重试。这通常用于内存分配失败时避免重复的重试操作。

内存分配标志 GFP_NOWAIT|__GFP_NORETRY表示希望进行一个非阻塞的内存分配,这通常会选择那些当前空闲且未被映射的内存区域来分配。而 UMEH 标记的页面恰好符合这些条件,因此 GFP_NOWAIT|__GFP_NORETRY 标志的内存分配请求是可以分配到 UMEH 页面的。

还剩余128个完全可以分配的4kb内存,所以问题就很奇怪,为何内核在内存充足的情况下无法分配内存呢?

2. 问题澄清


erofs文件系统在执行decompress的时候,会根据内存情况使用GFP_NOWAIT标签进行内存申请,减少分配page的压力(查询此change在开源社区引入)。

当使用此标签时,低内存出现无法分配时,会立即返回NULL,不阻塞等待,不允许直接回收内存,只能通过唤醒kswap进程进行异步回收,会存在在分配过程中,系统内存的水位线是不满足要求的,只是到分配内存失败,通过warn_alloc(gfp_mask, ac->nodemask,“page allocation failure: order:%u”, order)打印出相关内存信息时,可能kswap进程已经回收到一部分内存到空闲链表上了,因此打印出来的内存状态会相对好一些。