1 Keywords

Disconnect and reboot; memory exhaustion; reboot; memory leak; DevEco Testing

2 Problem description

Environment:
Product: ****
OpenHarmony version: OpenHarmony 3.2 Release
Symptom: during a 12-hour "app traversal test - settings" run with the DevEco Testing tool, the device hit a REBOOT problem.

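3 Cause

3.1 Normal mechanism

When the system runs out of memory, the OOM killer (out-of-memory killer) selectively kills processes to free some memory. It picks its victim by the value in /proc/<pid>/oom_score. In older kernels this value combined the process's memory consumption, CPU time (utime + stime), run time (uptime - starttime), and oom_adj: the more memory a process used, the higher its score, and the longer it had been running, the lower its score. Current kernels, which print the oom_score_adj column seen in the log below, derive the score mainly from the memory footprint (resident pages, swap entries, and page tables) and then shift it by oom_score_adj. A process whose oom_score_adj is -1000 is never selected. The kernel printed the following process table while the OOM killer was running: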

05-19 05:50:09.249     0     0 I K02600/[  pid  ]:    uid     tgid   total_vm      rss pgtables_bytes swapents oom_score_adj    name
05-19 05:50:09.260     0     0 I K02600/[    231]:   1004      231        918        7          10240       97         -1000    ueventd
05-19 05:50:09.268     0     0 I K02600/[    232]:   1100      232        808        3          12288       54         -1000    watchdog_servic
05-19 05:50:09.277     0     0 I K02600/[    240]:   1036      240       1832      545          16384       55         -1000    hilogd
05-19 05:50:09.308     0     0 I K02600/[    246]:   3044      246       1627        0          16384      102         -1000    hdf_devmgr
05-19 05:50:09.317     0     0 I K02600/[    247]:   5555      247       4313      501          24576      307         -1000    samgr
05-19 05:50:09.325     0     0 I K02600/[    248]:   1090      248       8013       44          43008      524         -1000    storage_manager
05-19 05:50:09.334     0     0 I K02600/[    252]:      0      252      56347      299         100352     2821         -1000    appspawn
05-19 05:50:09.342     0     0 I K02600/[    253]:   3062      253       3209      117          20480      174         -1000    device_manager
05-19 05:50:09.351     0     0 I K02600/[    254]:   1101      254       3185       59          20480      148         -1000    param_watcher
05-19 05:50:09.360     0     0 I K02600/[    255]:      0      255       2684       30          20480      171         -1000    storage_daemon
05-19 05:50:09.375     0     0 I K02600/[    320]:   6696      320        939        5          14336       64         -1000    uinput_inject
05-19 05:50:09.387     0     0 I K02600/[    322]:   6696      322      18169      117          57344      995         -1000    multimodalinput
05-19 05:50:09.396     0     0 I K02600/[    337]:   3333      337       2972        0          18432      223         -1000    deviceauth_serv
05-19 05:50:09.412     0     0 I K02600/[    338]:   1111      338       6410       84          34816      406         -1000    memmgrservice
05-19 05:50:09.440     0     0 I K02600/[    345]:   3049      345      13977        1          43008      766         -1000    devattest_servi
05-19 05:50:09.449     0     0 I K02600/[    346]:      0      346      19150      161          65536      992         -1000    resource_schedu
05-19 05:50:09.458     0     0 I K02600/[    347]:   1021      347      16344       48          51200      728         -1000    locationhub
05-19 05:50:09.467     0     0 I K02600/[    373]:   1041      373      20185       22          22528      286         -1000    pulseaudio
05-19 05:50:09.475     0     0 I K02600/[    380]:   3048      380       6897      233          34816      591         -1000    device_usage_st
05-19 05:50:09.484     0     0 I K02600/[    382]:   3510      382       3802        0          22528      294         -1000    huks_service
05-19 05:50:09.493     0     0 I K02600/[    411]:   1103      411      18085       91          59392      871         -1000    accessibility
05-19 05:50:09.502     0     0 I K02600/[    415]:      0      415       1024        0          10240       77         -1000    modem_control.b
05-19 05:50:09.511     0     0 I K02600/[    422]:      0      422       1302        1          12288      591         -1000    cp_diskserver.b
05-19 05:50:09.520     0     0 I K02600/[    423]:      0      423        902        0          12288       55         -1000    refnotify.bin
05-19 05:50:09.529     0     0 I K02600/[    428]:      0      428        821        0          12288       58         -1000    mlogservice.bin
05-19 05:50:09.539     0     0 I K02600/[    467]:   1099      467       5557       75          30720      535         -1000    netmanager
05-19 05:50:09.551     0     0 I K02600/[    470]:   1001      470      22822      202          77824     1844         -1000    telephony
05-19 05:50:09.561     0     0 I K02600/[    533]:   3818      533       8928        0          43008      719         -1000    screenlock_serv
05-19 05:50:09.570     0     0 I K02600/[    534]:   3051      534      17134       97          57344      787         -1000    bgtaskmgr_servi
05-19 05:50:09.585     0     0 I K02600/[    539]:   6699      539       3292        1          20480      238         -1000    msdp
05-19 05:50:09.596     0     0 I K02600/[    549]:   3817      549       9159        0          43008      760         -1000    wallpaper_servi
05-19 05:50:09.607     0     0 I K02600/[    557]:   3819      557       5338       64          30720      299         -1000    time_service
05-19 05:50:09.616     0     0 I K02600/[    558]:   3057      558       7936        0          38912      565         -1000    edm
05-19 05:50:09.632     0     0 I K02600/[    567]:   1041      567      23830       89          40960      643         -1000    audio_policy
05-19 05:50:09.641     0     0 I K02600/[    572]:   1201      572      11732      228          51200      619         -1000    hiview
05-19 05:50:09.649     0     0 I K02600/[    580]:      0      580       1085       39          14336      226         -1000    udevd
05-19 05:50:09.657     0     0 I K02600/[    581]:   1010      581        947       46          12288       22         -1000    wifi_hal_servic
05-19 05:50:09.666     0     0 I K02600/[    585]:   1088      585       1836        0          16384      125         -1000    user_auth_host
05-19 05:50:09.684     0     0 I K02600/[    592]:   3031      592       2586        0          20480      187         -1000    codec_host
05-19 05:50:09.728     0     0 I K02600/[    602]:   3035      602       1369        0          12288       99         -1000    light_host
05-19 05:50:09.737     0     0 I K02600/[    617]:   3034      617       1369        0          12288       99         -1000    vibrator_host
05-19 05:50:09.746     0     0 I K02600/[    623]:   3033      623       1420        0          12288      101         -1000    sensor_host
05-19 05:50:09.895     0     0 I K02600/[    633]:      0      633       2193        6          18432      120         -1000    riladapter_host
05-19 05:50:09.904     0     0 I K02600/[    634]:   3029      634       1422        0          14336      100         -1000    input_user_host
05-19 05:50:09.914     0     0 I K02600/[    641]:   3028      641       5158       21          32768      479         -1000    camera_host
05-19 05:50:09.923     0     0 I K02600/[    657]:   3027      657       1977        0          16384      250         -1000    audio_hdi_serve
05-19 05:50:09.932     0     0 I K02600/[    659]:   3026      659       1394        0          14336      105         -1000    wifi_host
05-19 05:50:09.941     0     0 I K02600/[    660]:   3025      660       2034       38          16384      122         -1000    power_host
05-19 05:50:09.950     0     0 I K02600/[    669]:   3023      669       1545        0          14336      116         -1000    usb_host
05-19 05:50:09.959     0     0 I K02600/[    678]:   3021      678       1416       42          12288       53         -1000    blue_host
05-19 05:50:09.967     0     0 I K02600/[    679]:   3037      679       2867       38          18432      186         -1000    disp_gralloc_ho
05-19 05:50:09.977     0     0 I K02600/[    707]:   1000      707      16633     1541          88064     1525         -1000    render_service
05-19 05:50:09.986     0     0 I K02600/[    709]:   1047      709       7979        0          40960      630         -1000    camera_service
05-19 05:50:09.996     0     0 I K02600/[    714]:   3020      714       3550      269          22528      175         -1000    accesstoken_ser
05-19 05:50:10.005     0     0 I K02600/[    740]:   3012      740       7668      135          36864      561         -1000    distributeddata
05-19 05:50:10.014     0     0 I K02600/[    752]:   1099      752       3907        5          26624      284         -1000    netsysnative
05-19 05:50:10.023     0     0 I K02600/[    767]:   1013      767       9270       43          51200      988         -1000    media_service
05-19 05:50:10.032     0     0 I K02600/[    768]:   5523      768      29659     1496         110592     2771         -1000    foundation
05-19 05:50:10.041     0     0 I K02600/[    769]:   3020      769       3316       14          20480      222         -1000    privacy_service
05-19 05:50:10.050     0     0 I K02600/[    776]:   6700      776       6087       77          34816      373         -1000    av_session
05-19 05:50:10.059     0     0 I K02600/[    792]:   1009      792       4037        0          26624      280         -1000    distributedfile
05-19 05:50:10.068     0     0 I K02600/[    797]:   1048      797       3346        0          20480      230         -1000    ui_service
05-19 05:50:10.077     0     0 I K02600/[    812]:   1010      812       5654      267          30720      303         -1000    wifi_manager_se
05-19 05:50:10.086     0     0 I K02600/[    817]:   1202      817        981       19          14336       50         -1000    faultloggerd
05-19 05:50:10.095     0     0 I K02600/[    818]:   3816      818       8208       48          40960      540         -1000    pasteboard_serv
05-19 05:50:10.104     0     0 I K02600/[    819]:   6688      819       3213        1          20480      233         -1000    sensors
05-19 05:50:10.119     0     0 I K02600/[    845]:   3060      845       1618       52          14336       90         -1000    installs
05-19 05:50:10.128     0     0 I K02600/[    874]:   3820      874       8355       56          45056      550         -1000    inputmethod_ser
05-19 05:50:10.137     0     0 I K02600/[    875]:   5522      875       4686       38          28672      312         -1000    distributedsche
05-19 05:50:10.146     0     0 I K02600/[    878]:   1212      878       3407      117          20480      165         -1000    hidumper_servic
05-19 05:50:10.155     0     0 I K02600/[    879]:   1088      879       5454        6          28672      398         -1000    useriam
05-19 05:50:10.164     0     0 I K02600/[    881]:   3058      881       5552       45          30720      502         -1000    accountmgr
05-19 05:50:10.172     0     0 I K02600/[    908]:      0      908      34567       87          49152      120         -1000    hdcd
05-19 05:50:10.180     0     0 I K02600/[    949]:   3515      949       3773       17          24576      246         -1000    cert_manager_se
05-19 05:50:10.190     0     0 I K02600/[   1096]:   1088      096       3200        1          22528      228         -1000    pinauth
05-19 05:50:10.198     0     0 I K02600/[   1315]:  10006      315      93920     2097         198656     7636          -800    com.ohos.system
05-19 05:50:10.207     0     0 I K02600/[   1432]:  10005      432      66681      256         100352     3474          -800    com.ohos.settin
05-19 05:50:10.216     0     0 I K02600/[   1557]:  20010009  1557      57247      334          86016     3103          -800    com.ohos.teleph
05-19 05:50:10.226     0     0 I K02600/[   1929]:      0     1929       4008      175          22528      200         -1000    kingkong
05-19 05:50:10.234     0     0 I K02600/[  11238]:   1102    11238       2993       39          22528      165         -1000    deviceinfoservi
05-19 05:50:10.243     0     0 I K02600/[  20222]:      0    20222     453308   154011        1742848    85207         -1000    jiaolong_y
05-19 05:50:10.273     0     0 I K02600/[  23615]:   1002    23615       8703        0          45056      750         -1000    bluetooth_servi
05-19 05:50:10.282     0     0 I K02600/[  23631]:   1024    23631       9401      162          47104      691         -1000    softbus_server
05-19 05:50:10.292     0     0 I K02600/[  12261]:   1010    12261       3244       72          18432      199         -1000    wifi_hal_servic
05-19 05:50:10.308     0     0 I K02600/[  13402]:      0    13402        868       61          10240        0         -1000    sh
05-19 05:50:10.328     0     0 I K02600/[  13417]:      0    13417        868       60          10240        0         -1000    sh
05-19 05:50:10.336     0     0 I K02600/[  13420]:   1010    13420        657      241          10240        0         -1000    dhcp_client_ser
05-19 05:50:10.345     0     0 I K02600/[  13432]:      0    13432        403       20           8192        0         -1000    date
05-19 05:50:10.353     0     0 I K02600/[  13434]:      0    13434        431       26           8192        0         -1000    hidumper
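
A table this long is easier to read once it is sorted. The following is a minimal Python sketch, not part of the original investigation, that parses rows in the format shown above and ranks processes by rss + swapents. The function names and the 4 KiB page size are assumptions made for illustration; the three sample rows are copied from the log above.

```python
import re

# One row of the kernel OOM dump, in the column order printed in the log:
# [pid]: uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]:\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)"
)

PAGE_KIB = 4  # assumption: 4 KiB pages; rss and swapents are counted in pages


def parse_oom_dump(text):
    """Return one dict per process row; header and unrelated lines are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.search(line)
        if match is None:
            continue
        rows.append(
            {key: (val if key == "name" else int(val))
             for key, val in match.groupdict().items()}
        )
    return rows


def top_by_footprint(rows, n=3):
    """Return the n rows with the largest rss + swapents, largest first."""
    return sorted(rows, key=lambda r: r["rss"] + r["swapents"], reverse=True)[:n]


SAMPLE = """\
05-19 05:50:09.977     0     0 I K02600/[    707]:   1000      707      16633     1541          88064     1525         -1000    render_service
05-19 05:50:10.032     0     0 I K02600/[    768]:   5523      768      29659     1496         110592     2771         -1000    foundation
05-19 05:50:10.234     0     0 I K02600/[  20222]:      0    20222     453308   154011        1742848    85207         -1000    jiaolong_y
"""

for row in top_by_footprint(parse_oom_dump(SAMPLE)):
    mib = (row["rss"] + row["swapents"]) * PAGE_KIB / 1024
    print(f'{row["name"]:<16} adj={row["oom_score_adj"]:>5} about {mib:.0f} MiB')
```

Run against the full dump, the same ranking puts jiaolong_y far ahead of every other process, and the oom_score_adj column shows at a glance which entries the OOM killer is not allowed to touch.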

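3.2 Abnormal mechanism

Once memory is exhausted, processes can no longer run normally. The watchdog stops receiving heartbeat signals and times out, which forces the system to reboot.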

05-19 05:49:31.449     0     0 F K02600/kmsg: watchdog0: pretimeout event
...
05-19 05:49:41.449     0     0 F K02600/kmsg: watchdog0: pretimeout event
...
05-19 05:50:14.079     0     0 F K02600/kmsg: watchdog0: pretimeout event
...
05-19 05:51:10.840     0     0 F K02600/kmsg: watchdog0: pretimeout event
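
To see how long the watchdog went unfed, the pretimeout lines can be pulled out of the log and their timestamps compared. The sketch below is an illustration only: the function name is made up, and because the log timestamps carry no year, a placeholder year is supplied purely so the values can be parsed and subtracted.

```python
import re
from datetime import datetime

# Timestamp at the start of the line, followed somewhere by the pretimeout message.
EVENT = re.compile(
    r"^(\d\d-\d\d \d\d:\d\d:\d\d\.\d{3}).*watchdog0: pretimeout event"
)


def pretimeout_times(text, year=2000):
    """Return the timestamps of all watchdog0 pretimeout lines.

    The log has no year field; `year` is a placeholder used only for parsing.
    """
    times = []
    for line in text.splitlines():
        match = EVENT.match(line)
        if match:
            stamp = f"{year}-{match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"))
    return times


LOG = """\
05-19 05:49:31.449     0     0 F K02600/kmsg: watchdog0: pretimeout event
...
05-19 05:49:41.449     0     0 F K02600/kmsg: watchdog0: pretimeout event
...
05-19 05:50:14.079     0     0 F K02600/kmsg: watchdog0: pretimeout event
...
05-19 05:51:10.840     0     0 F K02600/kmsg: watchdog0: pretimeout event
"""

times = pretimeout_times(LOG)
gaps = [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
print(f"{len(times)} pretimeout events, gaps in seconds: {gaps}")
```

The first and last events bracket the window in which the OOM process table of section 3.1 was printed, which ties the two symptoms to the same moment.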

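4 Solution

The problem is caused by a memory leak in the jiaolong_y process of the DevEco Testing tool. The leak exhausts memory, and because jiaolong_y is protected from the OOM killer (its oom_score_adj is -1000), it is never killed. Memory stays exhausted until the system collapses, which produces the REBOOT problem. We cannot modify the DevEco Testing tool ourselves, so we have contacted the DevEco Testing team to push for a fix.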

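5 Locating the problem

1. Find out what called reboot. The logs give an approximate time for the reboot.

2. Check the kernel log around that time. It shows that system memory was exhausted and processes were being killed, and that watchdog0 had already printed several pretimeout events.

3. Check the data collected by the memory-leak detection tool for processes that leak memory or use a large amount of it. This check found no process with a clear leak and no process with an excessive footprint.

4. Check the per-process data that the kernel printed while killing processes. Tallying it shows that jiaolong_y used far too much memory, and the comparison indicates that this process leaks memory. Its oom_score_adj is -1000, so it is never killed. The root cause of the reboot is therefore this process.

5. Find the module that owns jiaolong_y. Nothing related was found in the OHOS code. Searching the device for jiaolong_y showed that it collects CPU, memory, and similar information, which suggested it was brought in by either the memory-leak detection tool or the DevEco Testing tool. It was finally confirmed to be a DevEco Testing process.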
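
When the kernel dump is the only evidence, a leak can also be confirmed on a live device by sampling the suspect process over time. The following is a hedged sketch of that idea, not the tool used in this investigation: it reads VmRSS and VmSwap from /proc/<pid>/status and checks whether the samples keep rising. The function names and the proc_root parameter are assumptions introduced for illustration.

```python
from pathlib import Path


def read_footprint_kib(pid, proc_root="/proc"):
    """Return VmRSS + VmSwap of a process in KiB, read from <proc_root>/<pid>/status."""
    total = 0
    status = Path(proc_root, str(pid), "status").read_text()
    for line in status.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmRSS", "VmSwap"):
            total += int(value.split()[0])  # the value looks like "    1234 kB"
    return total


def grows_steadily(samples):
    """True if there are at least two samples and each is larger than the previous one."""
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )
```

A caller would collect one read_footprint_kib value per interval (for example once a minute) into a list and pass it to grows_steadily. A footprint that only ever rises across a long test run is a strong hint of a leak, although a short rising series can also be normal warm-up.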

 

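6 Knowledge sharing

A non-null pointer returned by malloc does not necessarily mean that the memory it points to is usable. Linux lets programs request more memory than the system actually has, a feature called overcommit. It exists as an optimization: not every program uses memory as soon as it has requested it, and by the time the memory is used the system may have reclaimed resources elsewhere. If the system still has no memory available when the program touches the overcommitted pages, the access cannot return an error to the program. The kernel invokes the OOM killer instead.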
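
Which overcommit policy a Linux system applies is exposed in /proc/sys/vm/overcommit_memory. The sketch below reads that file and names the policy. The function name and the path parameter are assumptions for illustration, and the setting used on the affected device was not recorded in this case.

```python
from pathlib import Path

# Meaning of the values accepted by vm.overcommit_memory.
MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit",
}


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return (value, description) for the overcommit policy stored at `path`."""
    value = int(Path(path).read_text().strip())
    return value, MODES.get(value, "unknown")
```

Calling overcommit_mode() on the device returns the active policy. With modes 0 and 1, a successful malloc says little about whether physical memory will be there when the pages are first written.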

 

 

 
