Analysis of a REBOOT Issue During Stability Stress Testing
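1 Keywords
Disconnection and restart; memory exhaustion; reboot; memory leak; DevEco Testing
2 Problem Description
Environment:
Product: ****
OpenHarmony version: OpenHarmony 3.2 Release
Symptom: a REBOOT occurred during a 12-hour "App Traversal Test - settings" run with the DevEco Testing tool.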
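3 Root Cause
3.1 Normal Mechanism
When system memory is exhausted, the OOM killer (out-of-memory killer) selectively kills processes to free memory. It chooses its victim by the score in /proc/pid/oom_score. Early kernels computed this score from the process's memory consumption, CPU time (utime+stime), run time (uptime - starttime) and oom_adj: the more memory a process used, the higher its score, and the longer it had been running, the lower its score. Since Linux 2.6.36 the score depends only on the memory footprint (resident pages, swap entries and page tables), shifted by oom_score_adj, which ranges from -1000 to 1000. A process with oom_score_adj = -1000 is never killed. When the OOM killer runs, the kernel also prints the list of candidate tasks; the one from this device is shown below. In it, total_vm, rss and swapents are counted in pages and pgtables_bytes in bytes.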
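To make the scoring concrete, here is a minimal Python model of the badness calculation used by recent (5.x) kernels, assuming 4 KiB pages. The name `oom_badness` mirrors the kernel function it loosely imitates. `totalpages` (RAM plus swap, in pages) is an input you must supply, and the value used below is hypothetical. This is a sketch for reasoning about the task dump, not the kernel's exact code.

```python
OOM_SCORE_ADJ_MIN = -1000  # tasks with this adj are never chosen by the OOM killer
PAGE_SIZE = 4096           # assumption: 4 KiB pages


def oom_badness(rss, swapents, pgtables_bytes, oom_score_adj, totalpages):
    """Simplified model of the kernel's oom_badness().

    rss and swapents are in pages, pgtables_bytes is in bytes, and
    totalpages is RAM + swap in pages. Returns None for a task the
    OOM killer will skip.
    """
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        return None
    # Memory footprint: resident pages + swapped-out pages + page-table pages.
    points = rss + swapents + pgtables_bytes // PAGE_SIZE
    # Each unit of oom_score_adj is worth totalpages/1000 pages.
    points += oom_score_adj * (totalpages // 1000)
    return points


# Columns from the jiaolong_y line of the dump; totalpages is hypothetical.
print(oom_badness(154011, 85207, 1742848, 0, 1_000_000))      # 239643
print(oom_badness(154011, 85207, 1742848, -1000, 1_000_000))  # None
```

An app at -800 (for example com.ohos.settin) is pushed far below zero but remains a candidate, whereas a task at -1000 is excluded outright, however large it grows.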
05-19 05:50:09.249 0 0 I K02600/[ pid ]: uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
05-19 05:50:09.260 0 0 I K02600/[ 231]: 1004 231 918 7 10240 97 -1000 ueventd
05-19 05:50:09.268 0 0 I K02600/[ 232]: 1100 232 808 3 12288 54 -1000 watchdog_servic
05-19 05:50:09.277 0 0 I K02600/[ 240]: 1036 240 1832 545 16384 55 -1000 hilogd
05-19 05:50:09.308 0 0 I K02600/[ 246]: 3044 246 1627 0 16384 102 -1000 hdf_devmgr
05-19 05:50:09.317 0 0 I K02600/[ 247]: 5555 247 4313 501 24576 307 -1000 samgr
05-19 05:50:09.325 0 0 I K02600/[ 248]: 1090 248 8013 44 43008 524 -1000 storage_manager
05-19 05:50:09.334 0 0 I K02600/[ 252]: 0 252 56347 299 100352 2821 -1000 appspawn
05-19 05:50:09.342 0 0 I K02600/[ 253]: 3062 253 3209 117 20480 174 -1000 device_manager
05-19 05:50:09.351 0 0 I K02600/[ 254]: 1101 254 3185 59 20480 148 -1000 param_watcher
05-19 05:50:09.360 0 0 I K02600/[ 255]: 0 255 2684 30 20480 171 -1000 storage_daemon
05-19 05:50:09.375 0 0 I K02600/[ 320]: 6696 320 939 5 14336 64 -1000 uinput_inject
05-19 05:50:09.387 0 0 I K02600/[ 322]: 6696 322 18169 117 57344 995 -1000 multimodalinput
05-19 05:50:09.396 0 0 I K02600/[ 337]: 3333 337 2972 0 18432 223 -1000 deviceauth_serv
05-19 05:50:09.412 0 0 I K02600/[ 338]: 1111 338 6410 84 34816 406 -1000 memmgrservice
05-19 05:50:09.440 0 0 I K02600/[ 345]: 3049 345 13977 1 43008 766 -1000 devattest_servi
05-19 05:50:09.449 0 0 I K02600/[ 346]: 0 346 19150 161 65536 992 -1000 resource_schedu
05-19 05:50:09.458 0 0 I K02600/[ 347]: 1021 347 16344 48 51200 728 -1000 locationhub
05-19 05:50:09.467 0 0 I K02600/[ 373]: 1041 373 20185 22 22528 286 -1000 pulseaudio
05-19 05:50:09.475 0 0 I K02600/[ 380]: 3048 380 6897 233 34816 591 -1000 device_usage_st
05-19 05:50:09.484 0 0 I K02600/[ 382]: 3510 382 3802 0 22528 294 -1000 huks_service
05-19 05:50:09.493 0 0 I K02600/[ 411]: 1103 411 18085 91 59392 871 -1000 accessibility
05-19 05:50:09.502 0 0 I K02600/[ 415]: 0 415 1024 0 10240 77 -1000 modem_control.b
05-19 05:50:09.511 0 0 I K02600/[ 422]: 0 422 1302 1 12288 591 -1000 cp_diskserver.b
05-19 05:50:09.520 0 0 I K02600/[ 423]: 0 423 902 0 12288 55 -1000 refnotify.bin
05-19 05:50:09.529 0 0 I K02600/[ 428]: 0 428 821 0 12288 58 -1000 mlogservice.bin
05-19 05:50:09.539 0 0 I K02600/[ 467]: 1099 467 5557 75 30720 535 -1000 netmanager
05-19 05:50:09.551 0 0 I K02600/[ 470]: 1001 470 22822 202 77824 1844 -1000 telephony
05-19 05:50:09.561 0 0 I K02600/[ 533]: 3818 533 8928 0 43008 719 -1000 screenlock_serv
05-19 05:50:09.570 0 0 I K02600/[ 534]: 3051 534 17134 97 57344 787 -1000 bgtaskmgr_servi
05-19 05:50:09.585 0 0 I K02600/[ 539]: 6699 539 3292 1 20480 238 -1000 msdp
05-19 05:50:09.596 0 0 I K02600/[ 549]: 3817 549 9159 0 43008 760 -1000 wallpaper_servi
05-19 05:50:09.607 0 0 I K02600/[ 557]: 3819 557 5338 64 30720 299 -1000 time_service
05-19 05:50:09.616 0 0 I K02600/[ 558]: 3057 558 7936 0 38912 565 -1000 edm
05-19 05:50:09.632 0 0 I K02600/[ 567]: 1041 567 23830 89 40960 643 -1000 audio_policy
05-19 05:50:09.641 0 0 I K02600/[ 572]: 1201 572 11732 228 51200 619 -1000 hiview
05-19 05:50:09.649 0 0 I K02600/[ 580]: 0 580 1085 39 14336 226 -1000 udevd
05-19 05:50:09.657 0 0 I K02600/[ 581]: 1010 581 947 46 12288 22 -1000 wifi_hal_servic
05-19 05:50:09.666 0 0 I K02600/[ 585]: 1088 585 1836 0 16384 125 -1000 user_auth_host
05-19 05:50:09.684 0 0 I K02600/[ 592]: 3031 592 2586 0 20480 187 -1000 codec_host
05-19 05:50:09.728 0 0 I K02600/[ 602]: 3035 602 1369 0 12288 99 -1000 light_host
05-19 05:50:09.737 0 0 I K02600/[ 617]: 3034 617 1369 0 12288 99 -1000 vibrator_host
05-19 05:50:09.746 0 0 I K02600/[ 623]: 3033 623 1420 0 12288 101 -1000 sensor_host
05-19 05:50:09.895 0 0 I K02600/[ 633]: 0 633 2193 6 18432 120 -1000 riladapter_host
05-19 05:50:09.904 0 0 I K02600/[ 634]: 3029 634 1422 0 14336 100 -1000 input_user_host
05-19 05:50:09.914 0 0 I K02600/[ 641]: 3028 641 5158 21 32768 479 -1000 camera_host
05-19 05:50:09.923 0 0 I K02600/[ 657]: 3027 657 1977 0 16384 250 -1000 audio_hdi_serve
05-19 05:50:09.932 0 0 I K02600/[ 659]: 3026 659 1394 0 14336 105 -1000 wifi_host
05-19 05:50:09.941 0 0 I K02600/[ 660]: 3025 660 2034 38 16384 122 -1000 power_host
05-19 05:50:09.950 0 0 I K02600/[ 669]: 3023 669 1545 0 14336 116 -1000 usb_host
05-19 05:50:09.959 0 0 I K02600/[ 678]: 3021 678 1416 42 12288 53 -1000 blue_host
05-19 05:50:09.967 0 0 I K02600/[ 679]: 3037 679 2867 38 18432 186 -1000 disp_gralloc_ho
05-19 05:50:09.977 0 0 I K02600/[ 707]: 1000 707 16633 1541 88064 1525 -1000 render_service
05-19 05:50:09.986 0 0 I K02600/[ 709]: 1047 709 7979 0 40960 630 -1000 camera_service
05-19 05:50:09.996 0 0 I K02600/[ 714]: 3020 714 3550 269 22528 175 -1000 accesstoken_ser
05-19 05:50:10.005 0 0 I K02600/[ 740]: 3012 740 7668 135 36864 561 -1000 distributeddata
05-19 05:50:10.014 0 0 I K02600/[ 752]: 1099 752 3907 5 26624 284 -1000 netsysnative
05-19 05:50:10.023 0 0 I K02600/[ 767]: 1013 767 9270 43 51200 988 -1000 media_service
05-19 05:50:10.032 0 0 I K02600/[ 768]: 5523 768 29659 1496 110592 2771 -1000 foundation
05-19 05:50:10.041 0 0 I K02600/[ 769]: 3020 769 3316 14 20480 222 -1000 privacy_service
05-19 05:50:10.050 0 0 I K02600/[ 776]: 6700 776 6087 77 34816 373 -1000 av_session
05-19 05:50:10.059 0 0 I K02600/[ 792]: 1009 792 4037 0 26624 280 -1000 distributedfile
05-19 05:50:10.068 0 0 I K02600/[ 797]: 1048 797 3346 0 20480 230 -1000 ui_service
05-19 05:50:10.077 0 0 I K02600/[ 812]: 1010 812 5654 267 30720 303 -1000 wifi_manager_se
05-19 05:50:10.086 0 0 I K02600/[ 817]: 1202 817 981 19 14336 50 -1000 faultloggerd
05-19 05:50:10.095 0 0 I K02600/[ 818]: 3816 818 8208 48 40960 540 -1000 pasteboard_serv
05-19 05:50:10.104 0 0 I K02600/[ 819]: 6688 819 3213 1 20480 233 -1000 sensors
05-19 05:50:10.119 0 0 I K02600/[ 845]: 3060 845 1618 52 14336 90 -1000 installs
05-19 05:50:10.128 0 0 I K02600/[ 874]: 3820 874 8355 56 45056 550 -1000 inputmethod_ser
05-19 05:50:10.137 0 0 I K02600/[ 875]: 5522 875 4686 38 28672 312 -1000 distributedsche
05-19 05:50:10.146 0 0 I K02600/[ 878]: 1212 878 3407 117 20480 165 -1000 hidumper_servic
05-19 05:50:10.155 0 0 I K02600/[ 879]: 1088 879 5454 6 28672 398 -1000 useriam
05-19 05:50:10.164 0 0 I K02600/[ 881]: 3058 881 5552 45 30720 502 -1000 accountmgr
05-19 05:50:10.172 0 0 I K02600/[ 908]: 0 908 34567 87 49152 120 -1000 hdcd
05-19 05:50:10.180 0 0 I K02600/[ 949]: 3515 949 3773 17 24576 246 -1000 cert_manager_se
05-19 05:50:10.190 0 0 I K02600/[ 1096]: 1088 096 3200 1 22528 228 -1000 pinauth
05-19 05:50:10.198 0 0 I K02600/[ 1315]: 10006 315 93920 2097 198656 7636 -800 com.ohos.system
05-19 05:50:10.207 0 0 I K02600/[ 1432]: 10005 432 66681 256 100352 3474 -800 com.ohos.settin
05-19 05:50:10.216 0 0 I K02600/[ 1557]: 20010009 1557 57247 334 86016 3103 -800 com.ohos.teleph
05-19 05:50:10.226 0 0 I K02600/[ 1929]: 0 1929 4008 175 22528 200 -1000 kingkong
05-19 05:50:10.234 0 0 I K02600/[ 11238]: 1102 11238 2993 39 22528 165 -1000 deviceinfoservi
05-19 05:50:10.243 0 0 I K02600/[ 20222]: 0 20222 453308 154011 1742848 85207 -1000 jiaolong_y
05-19 05:50:10.273 0 0 I K02600/[ 23615]: 1002 23615 8703 0 45056 750 -1000 bluetooth_servi
05-19 05:50:10.282 0 0 I K02600/[ 23631]: 1024 23631 9401 162 47104 691 -1000 softbus_server
05-19 05:50:10.292 0 0 I K02600/[ 12261]: 1010 12261 3244 72 18432 199 -1000 wifi_hal_servic
05-19 05:50:10.308 0 0 I K02600/[ 13402]: 0 13402 868 61 10240 0 -1000 sh
05-19 05:50:10.328 0 0 I K02600/[ 13417]: 0 13417 868 60 10240 0 -1000 sh
05-19 05:50:10.336 0 0 I K02600/[ 13420]: 1010 13420 657 241 10240 0 -1000 dhcp_client_ser
05-19 05:50:10.345 0 0 I K02600/[ 13432]: 0 13432 403 20 8192 0 -1000 date
05-19 05:50:10.353 0 0 I K02600/[ 13434]: 0 13434 431 26 8192 0 -1000 hidumper
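Because rss and swapents are counted in pages, the dump above can be ranked by memory footprint with a few lines of Python. The sketch below is a hypothetical helper (`rank_tasks` and `TASK_RE` are not part of any OpenHarmony tool) that assumes 4 KiB pages. Given three lines copied from the dump, it puts jiaolong_y first at roughly 934 MiB of resident plus swapped memory. Applied to the whole dump, no other process exceeds about 40 MiB.

```python
import re

# Matches the body of one task-dump line, e.g.
# "[ 20222]: 0 20222 453308 154011 1742848 85207 -1000 jiaolong_y"
TASK_RE = re.compile(
    r"\[\s*(?P<pid>\d+)\]:\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+(?P<total_vm>\d+)\s+"
    r"(?P<rss>\d+)\s+(?P<pgtables_bytes>\d+)\s+(?P<swapents>\d+)\s+"
    r"(?P<oom_score_adj>-?\d+)\s+(?P<name>\S.*?)\s*$"
)
PAGE_KIB = 4  # assumption: 4 KiB pages


def rank_tasks(lines, top=5):
    """Return the `top` tasks by rss + swapents as (name, MiB, unkillable)."""
    tasks = []
    for line in lines:
        m = TASK_RE.search(line)
        if not m:
            continue  # header line or unrelated log output
        pages = int(m["rss"]) + int(m["swapents"])
        mib = pages * PAGE_KIB / 1024
        tasks.append((m["name"], mib, int(m["oom_score_adj"]) == -1000))
    tasks.sort(key=lambda t: t[1], reverse=True)
    return tasks[:top]


dump = [
    "05-19 05:50:09.325 0 0 I K02600/[ 248]: 1090 248 8013 44 43008 524 -1000 storage_manager",
    "05-19 05:50:10.032 0 0 I K02600/[ 768]: 5523 768 29659 1496 110592 2771 -1000 foundation",
    "05-19 05:50:10.243 0 0 I K02600/[ 20222]: 0 20222 453308 154011 1742848 85207 -1000 jiaolong_y",
]
for name, mib, unkillable in rank_tasks(dump):
    print(f"{name:16} {mib:8.1f} MiB  unkillable={unkillable}")
```

Ranking by rss + swapents rather than total_vm matters here, because total_vm counts reserved address space that may never have been backed by memory.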
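3.2 Abnormal Mechanism
After memory was exhausted, processes could no longer run normally, the watchdog stopped receiving heartbeats and timed out, and the system was forcibly rebooted. The kernel log shows repeated watchdog0 pretimeout events before the reboot: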
05-19 05:49:31.449 0 0 F K02600/kmsg: watchdog0: pretimeout event
...
05-19 05:49:41.449 0 0 F K02600/kmsg: watchdog0: pretimeout event
...
05-19 05:50:14.079 0 0 F K02600/kmsg: watchdog0: pretimeout event
...
05-19 05:51:10.840 0 0 F K02600/kmsg: watchdog0: pretimeout event
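The repeated `pretimeout event` lines fit the Linux watchdog model. If user space stops pinging the watchdog, a pretimeout fires a configurable number of seconds before the hard timeout, and the message appears to come from the kernel's noop pretimeout governor. The sketch below models that state machine. The 60 s timeout and 10 s pretimeout are hypothetical, not values read from this device. The process that pings the watchdog is presumably watchdog_servic in the task dump.

```python
def watchdog_state(since_last_ping, timeout, pretimeout=0):
    """Model a Linux-style watchdog: the pretimeout fires `pretimeout`
    seconds before `timeout`; at `timeout` the hardware typically resets."""
    if since_last_ping >= timeout:
        return "reset"
    if pretimeout and since_last_ping >= timeout - pretimeout:
        return "pretimeout"  # kernel logs "watchdogN: pretimeout event"
    return "ok"


# Hypothetical configuration: 60 s timeout, 10 s pretimeout.
for t in (5, 49, 50, 59, 60):
    print(t, watchdog_state(t, timeout=60, pretimeout=10))
```

A pretimeout is only a warning. Several pretimeouts in a row, as in the log above, suggest that the heartbeat was starved but occasionally got through, until memory pressure stopped it entirely.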
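4 Solution
The problem was caused by a memory leak in the jiaolong_y process of the DevEco Testing tool. The leak exhausted system memory. Because jiaolong_y runs with oom_score_adj = -1000, the OOM killer never kills it, so memory ran out completely and the system crashed and rebooted. We cannot modify the DevEco Testing tool ourselves, so we have contacted the DevEco Testing team to get it fixed.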
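5 Troubleshooting Process
1. Find out what triggered the reboot. The log below gives the approximate time of the reboot.
2. Check the kernel log around that time. As the figure below shows, system memory was exhausted and processes were being killed, and watchdog0 had already printed several pretimeout events.
3. Check the data collected by the memory-leak detection tool for processes that leak memory or use a lot of it. No process with a clear leak or excessive memory usage was found.
4. Examine the per-process data that the kernel log prints when it kills processes. The statistics show that jiaolong_y was using far too much memory, and the comparison in the figure below indicates that this process leaks memory. Its oom_score_adj is -1000, so it is never killed, which makes it the root cause of the reboot.
5. Identify the module that owns jiaolong_y. Nothing related was found in the OHOS source code, so we searched the device for information about jiaolong_y and found that it collects CPU, memory and similar statistics. We first suspected it came with either the memory-leak detection tool or DevEco Testing, and finally confirmed that it belongs to DevEco Testing.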
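Step 4 compares the same process across several kernel task dumps. The minimal sketch below assumes each snapshot has already been reduced to a {name: rss_pages} dictionary, for example by parsing the dump format. It flags processes whose RSS never shrinks and grows by a threshold overall. The function `growing_processes`, its threshold and the sample values are all hypothetical.

```python
def growing_processes(snapshots, min_growth_pages=1024):
    """Return {name: growth_in_pages} for processes whose RSS never shrinks
    across the snapshots and grows by at least `min_growth_pages` overall.

    snapshots: list of {process_name: rss_pages}, ordered oldest to newest.
    """
    result = {}
    # Only processes present in every snapshot can be compared.
    common = set(snapshots[0]).intersection(*snapshots[1:])
    for name in common:
        series = [snap[name] for snap in snapshots]
        monotonic = all(b >= a for a, b in zip(series, series[1:]))
        growth = series[-1] - series[0]
        if monotonic and growth >= min_growth_pages:
            result[name] = growth
    return result


# Hypothetical RSS values (pages) from three dumps taken hours apart.
snaps = [
    {"jiaolong_y": 40000, "foundation": 1500, "render_service": 1600},
    {"jiaolong_y": 95000, "foundation": 1480, "render_service": 1550},
    {"jiaolong_y": 150000, "foundation": 1496, "render_service": 1541},
]
print(growing_processes(snaps))  # {'jiaolong_y': 110000}
```

Requiring monotonic growth filters out services whose usage fluctuates normally. A leak is easier to argue from a steady upward trend than from a single large reading.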
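6 Knowledge Sharing
A non-NULL pointer returned by malloc does not guarantee that the memory it points to is actually available. Linux lets programs allocate more memory than the system has, a feature called overcommit. This is an optimization: not every program uses the memory it requests right away, and by the time it does, the system may have reclaimed some resources. Physical pages are assigned only when the memory is first written. If no memory is available at that point, malloc cannot fail after the fact. The kernel instead invokes the OOM killer, which kills a process to free memory.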
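You can check whether the kernel has promised more memory than it can back in /proc/meminfo. Committed_AS is the total address space currently committed, and CommitLimit is a ceiling that is enforced only in strict mode (vm.overcommit_memory = 2). The sketch below parses meminfo-style text and reports the ratio. The sample values are hypothetical. On a device you would pass in the contents of /proc/meminfo.

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (default)",
    1: "always overcommit",
    2: "strict accounting, never overcommit",
}


def overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Describe the current overcommit mode, or return None if unreadable."""
    try:
        with open(path) as f:
            return OVERCOMMIT_MODES.get(int(f.read().strip()))
    except (OSError, ValueError):
        return None


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def commit_ratio(meminfo_text):
    """Committed_AS / CommitLimit; above 1.0 means more is promised
    than strict mode would allow."""
    info = parse_meminfo(meminfo_text)
    return info["Committed_AS"] / info["CommitLimit"]


sample = """MemTotal:        2000000 kB
CommitLimit:     1000000 kB
Committed_AS:    1500000 kB
"""
print(commit_ratio(sample))  # 1.5
print(overcommit_mode())
```

A ratio above 1.0 is normal under the default heuristic mode. It only means that, if every process touched all the memory it was promised, the OOM killer would have to step in.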