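This series presents common techniques and approaches for locating and fixing memory leaks in OpenHarmony applications and subsystems, and is split into several parts. First, we need to master the tools and methods for discovering memory leak problems and for judging whether a leak may exist. Next, we need the tools for locating leaks, and we need to know how to capture and analyze traces to confirm whether a leak exists. If the scenario in which the problem appears is too complex, it has to be simplified by decomposing the problem. Finally, we use the trace to find the offending code and try to fix it.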

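Contents:

  1. Discovering the problem
  2. Locating the problem
  3. Analyzing the trace
  4. Decomposing the problem
  5. Solving the problem (Native)
  6. Solving the problem (NAPI & JavaScript)
  7. Solving the problem (combined)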

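This article presents real memory leak cases from the 3.2 release to illustrate fixes for common leak causes. Common leaks fall into four groups: leaks in Native code, leaks in NAPI code, leaks in JavaScript code, and combined problems. The following are common leak scenarios in Native code.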

Smart pointers

Circular reference between Ability and ContextDeal

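The relevant code was already analyzed in the trace analysis chapter, so it is not repeated here. The most common way to break a circular reference is to change the shared_ptr on one side to a weak_ptr.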

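However, shared_ptr is used in too many places in both Ability and ContextDeal, so switching to weak_ptr would be too much work. Instead, the fix adds a DetachBaseContext function to Ability (in its parent class ContextContainer) that calls reset on the shared_ptr to ContextDeal, decrementing its reference count by one. When AbilityThread (which holds the Ability object) is destroyed, it calls the Ability's DetachBaseContext function.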

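As a result, when AbilityThread is destroyed, the reference count of the ContextDeal held by Ability drops to 0 and ContextDeal is destroyed. That releases the Ability pointer held by ContextDeal, so once AbilityThread also resets currentAbility_, the Ability reference count reaches 0 and Ability is destroyed as well.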

void ContextContainer::DetachBaseContext()
{
    if (baseContext_ != nullptr) {
        baseContext_.reset();
    }
    baseContext_ = nullptr;
}
AbilityThread::~AbilityThread()
{
    currentAbility_->DetachBaseContext();
    currentAbility_.reset();
    ...
}

Reference PR: https://gitee.com/openharmony/ability_ability_runtime/pulls/5130/files
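To make the mechanism concrete, here is a minimal standalone sketch of the same idea. The Owner and Context types are hypothetical stand-ins for Ability and ContextDeal (they are not the real OpenHarmony classes), and the live-object counters exist only so the effect can be observed.

```cpp
#include <cassert>
#include <memory>

struct Owner;  // hypothetical stand-in for Ability

// Hypothetical stand-in for ContextDeal: holds a strong back-reference.
struct Context {
    std::shared_ptr<Owner> owner;
    static int alive;
    Context() { ++alive; }
    ~Context() { --alive; }
};

struct Owner {
    std::shared_ptr<Context> baseContext;
    static int alive;
    Owner() { ++alive; }
    ~Owner() { --alive; }
    // Same idea as DetachBaseContext: drop our strong reference to the context.
    void DetachBaseContext() { baseContext.reset(); }
};

int Context::alive = 0;
int Owner::alive = 0;

// Builds the cycle, optionally detaches, and returns how many objects are
// still alive after every external shared_ptr has gone out of scope.
int LiveObjectsAfterRelease(bool detach)
{
    const int before = Owner::alive + Context::alive;
    std::weak_ptr<Owner> observer;
    {
        auto owner = std::make_shared<Owner>();
        auto context = std::make_shared<Context>();
        owner->baseContext = context;
        context->owner = owner;  // the cycle is now closed
        observer = owner;
        if (detach) {
            owner->DetachBaseContext();
        }
    }
    const int stillAlive = Owner::alive + Context::alive - before;
    // Break any remaining cycle so that this demo does not leak itself.
    if (auto owner = observer.lock()) {
        owner->DetachBaseContext();
    }
    return stillAlive;
}
```

Calling LiveObjectsAfterRelease(false) reports two objects kept alive by the cycle, while LiveObjectsAfterRelease(true) reports zero, which is the behavior the fix relies on.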

Circular reference in Modifier

The fix for this case is simply to change the shared_ptr on one side to a weak_ptr. See the code directly: https://gitee.com/openharmony/ability_ability_runtime/pulls/5130/files
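The weak_ptr approach can be sketched in a few lines. Parent and Child below are hypothetical types chosen for illustration, not the classes involved in the Modifier case. The back-reference is a weak_ptr, so it does not contribute to the reference count and has to be promoted with lock() before use.

```cpp
#include <cassert>
#include <memory>

struct Parent;  // hypothetical type for illustration

struct Child {
    std::weak_ptr<Parent> parent;  // weak back-reference: no ownership
    // Returns true only while the parent object still exists.
    bool HasParent() const { return parent.lock() != nullptr; }
};

struct Parent {
    std::shared_ptr<Child> child;  // strong forward reference
};

// Returns true if both objects are destroyed once the external
// shared_ptr instances go out of scope.
bool BothReleasedWithWeakBackReference()
{
    std::weak_ptr<Parent> parentObserver;
    std::weak_ptr<Child> childObserver;
    {
        auto parent = std::make_shared<Parent>();
        auto child = std::make_shared<Child>();
        parent->child = child;
        child->parent = parent;
        parentObserver = parent;
        childObserver = child;
        assert(child->HasParent());
    }
    return parentObserver.expired() && childObserver.expired();
}
```

The trade-off is that every use of the back-reference must handle the case where lock() returns nullptr, which is why this change is cheap only when the pointer is used in a small number of places.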

AbilityLocalRecord

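While an Ability is being created, a shared_ptr to the AbilityLocalRecord object is stored in a member map of AbilityStage. It is never removed from that map when the Ability is destroyed, so the AbilityLocalRecord is never destroyed: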

std::shared_ptr<AbilityRuntime::Context> OHOSApplication::AddAbilityStage(
    const std::shared_ptr<AbilityLocalRecord> &abilityRecord)
{
    ...
    abilityStage->AddAbility(token, abilityRecord);
    return abilityStage->GetContext();
}
void AbilityStage::AddAbility(const sptr<IRemoteObject> &token,
    const std::shared_ptr<AppExecFwk::AbilityLocalRecord> &abilityRecord)
{
    if (token == nullptr) {
        HILOG_ERROR("AbilityStage::AddAbility failed, token is nullptr");
        return;
    }

    if (abilityRecord == nullptr) {
        HILOG_ERROR("AbilityStage::AddAbility failed, abilityRecord is nullptr");
        return;
    }

    abilityRecords_[token] = abilityRecord;
}

So in this case we need to find a suitable moment to remove the AbilityLocalRecord from the map. The fix calls OHOSApplication::CleanAbilityStage from MainThread::HandleCleanAbility to clean up the map in AbilityStage.

void MainThread::HandleCleanAbility(const sptr<IRemoteObject> &token)
{
    ...
    abilityRecordMgr_->RemoveAbilityRecord(token);
    application_->CleanAbilityStage(token, abilityInfo);
    ...
}
void OHOSApplication::CleanAbilityStage(const sptr<IRemoteObject> &token,
    const std::shared_ptr<AbilityInfo> &abilityInfo)
{
    ...
    std::string moduleName = abilityInfo->moduleName;
    auto iterator = abilityStages_.find(moduleName);
    if (iterator != abilityStages_.end()) {
        auto abilityStage = iterator->second;
        abilityStage->RemoveAbility(token);
        if (!abilityStage->ContainsAbility()) {
            abilityStage->OnDestroy();
            abilityStages_.erase(moduleName);
        }
    }
}
void AbilityStage::RemoveAbility(const sptr<IRemoteObject> &token)
{
    if (token == nullptr) {
        HILOG_ERROR("AbilityStage::RemoveAbility failed, token is nullptr");
        return;
    }
    abilityRecords_.erase(token);
}

Reference code: https://gitee.com/openharmony/ability_ability_runtime/pulls/5130/files
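The pattern behind this leak, a container that keeps a strong reference after the owner is gone, can be reproduced in isolation. The sketch below uses hypothetical Record and Stage types and an int token in place of sptr<IRemoteObject>; it only illustrates why the erase call matters.

```cpp
#include <cassert>
#include <map>
#include <memory>

struct Record {};  // hypothetical stand-in for AbilityLocalRecord

// Hypothetical stand-in for AbilityStage: a registry keyed by token.
class Stage {
public:
    void AddRecord(int token, const std::shared_ptr<Record>& record)
    {
        if (record != nullptr) {
            records_[token] = record;  // the map now shares ownership
        }
    }
    void RemoveRecord(int token) { records_.erase(token); }
    bool ContainsRecord() const { return !records_.empty(); }

private:
    std::map<int, std::shared_ptr<Record>> records_;
};

// Returns true if the record is destroyed while the stage is still alive.
bool RecordReleased(bool removeOnClean)
{
    Stage stage;
    std::weak_ptr<Record> observer;
    {
        auto record = std::make_shared<Record>();
        observer = record;
        stage.AddRecord(1, record);
    }  // the creator's reference is gone here
    if (removeOnClean) {
        stage.RemoveRecord(1);
    }
    return observer.expired();
}
```

Without the removal step the record lives as long as the stage does, which in a long-running process is effectively a leak.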

Summary

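The mechanics of smart pointer problems are not complicated, but when the business logic is complex, the codebase is large, and the pointer is referenced in many places, the only option is to check, one by one, every piece of code that could increment the reference count. Take particular care when using smart pointers.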

Mismatched new/delete

LocationNapiAdapter

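The trace shows that the leaked object is the one created by the statement auto asyncContext = new (std::nothrow) SwitchAsyncContext(env);. Because the object is created with the new keyword, we have to check whether it is deleted in every scenario.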

napi_value IsLocationEnabled(napi_env env, napi_callback_info info)
{
    ...

    auto asyncContext = new (std::nothrow) SwitchAsyncContext(env);
    NAPI_ASSERT(env, asyncContext != nullptr, "asyncContext is null.");
    napi_create_string_latin1(env, "isLocationEnabled", NAPI_AUTO_LENGTH, &asyncContext->resourceName);
#ifdef ENABLE_NAPI_MANAGER
    napi_value res;
    bool isEnabled = false;
    LocationErrCode errorCode = g_locatorClient->IsLocationEnabledV9(isEnabled);
    if (errorCode != ERRCODE_SUCCESS) {
        HandleSyncErrCode(env, errorCode);
        return UndefinedNapiValue(env);
    }
    NAPI_CALL(env, napi_get_boolean(env, isEnabled, &res));
    return res;
#else
    asyncContext->executeFunc = [&](void* data) -> void {
        auto context = static_cast<SwitchAsyncContext*>(data);
        context->enable = g_locatorClient->IsLocationEnabled();
        context->errCode = SUCCESS;
    };

    asyncContext->completeFunc = [&](void* data) -> void {
        auto context = static_cast<SwitchAsyncContext*>(data);
        NAPI_CALL_RETURN_VOID(context->env, napi_get_boolean(context->env, context->enable, &context->result[PARAM1]));
        LBSLOGI(LOCATOR_STANDARD, "Push IsLocationEnabled result to client");
    };

    size_t objectArgsNum = 0;
    return DoAsyncWork(env, asyncContext, argc, argv, objectArgsNum);
#endif
}

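As we can see, there are two branches here: one under ENABLE_NAPI_MANAGER and one under else. In the ENABLE_NAPI_MANAGER branch, the asyncContext that was created is never used before the function returns, so it definitely leaks. Does the else branch leak as well?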

The else branch ends by returning DoAsyncWork(env, asyncContext, argc, argv, objectArgsNum). Let's look at its code:

napi_value DoAsyncWork(const napi_env& env, AsyncContext* asyncContext,
    const size_t argc, const napi_value* argv, const size_t objectArgsNum)
{
    if (asyncContext == nullptr || argv == nullptr) {
        return UndefinedNapiValue(env);
    }
    if (argc > objectArgsNum) {
        InitAsyncCallBackEnv(env, asyncContext, argc, argv, objectArgsNum);
        return CreateAsyncWork(env, asyncContext);
    } else {
        napi_value promise;
        InitAsyncPromiseEnv(env, asyncContext, promise);
        CreateAsyncWork(env, asyncContext);
        return promise;
    }
}
static napi_value CreateAsyncWork(const napi_env& env, AsyncContext* asyncContext)
{
    ...
    NAPI_CALL(env, napi_create_async_work(
        env, nullptr, asyncContext->resourceName,
        [](napi_env env, void* data) {
            ...
        },
        [](napi_env env, napi_status status, void* data) {
            ...
            AsyncContext* context = static_cast<AsyncContext *>(data);
            ...
            MemoryReclamation(env, context);
        }, static_cast<void*>(asyncContext), &asyncContext->work));
    NAPI_CALL(env, napi_queue_async_work(env, asyncContext->work));
    return UndefinedNapiValue(env);
}
void MemoryReclamation(const napi_env& env, AsyncContext* context)
{
    ...
    delete context;
}

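As we can see, in the DoAsyncWork call chain asyncContext is deleted when the asynchronous task completes, so the else branch does not leak. Which branch does our program actually take at run time?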

ohos_static_library("geolocation_static") {
  include_dirs = [
    "//base/location/frameworks/js/napi/include",
    "//base/location/interfaces/inner_api/include",
  ]

  sources = [
    "//base/location/frameworks/js/napi/source/location_napi_adapter.cpp",
    "//base/location/frameworks/js/napi/source/location_napi_entry.cpp",
    "//base/location/frameworks/js/napi/source/location_napi_event.cpp",
    "//base/location/frameworks/js/napi/source/location_napi_system.cpp",
  ]

  ...

  defines = [ "ENABLE_NAPI_MANAGER" ]

  ...

  relative_install_dir = "module"
  part_name = "location"
  subsystem_name = "location"
}

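The gn file that builds location_napi_adapter.cpp defines ENABLE_NAPI_MANAGER, so the code takes the leaking branch. The fix is simple: move the statement that creates the object with new into the else branch.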

Reference code: https://gitee.com/openharmony/base_location/pulls/387/files
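As a simplified illustration of that fix, the sketch below allocates the context only on the path that hands it to the asynchronous machinery. All names here (AsyncContext, RunAsyncAndReclaim, IsEnabled) are hypothetical stand-ins rather than the real location NAPI code, and the use of std::unique_ptr with release() is an additional assumption that makes the ownership hand-off explicit.

```cpp
#include <cassert>
#include <memory>

// Hypothetical context type with a live-object counter for observation.
struct AsyncContext {
    static int alive;
    AsyncContext() { ++alive; }
    ~AsyncContext() { --alive; }
};
int AsyncContext::alive = 0;

// Hypothetical stand-in for the async call chain: it takes ownership of
// the context and deletes it when the work completes.
void RunAsyncAndReclaim(AsyncContext* context)
{
    delete context;
}

// The synchronous path returns without allocating anything; only the
// asynchronous path creates a context and passes ownership on.
bool IsEnabled(bool syncPath)
{
    if (syncPath) {
        return true;  // nothing was allocated, so nothing can leak
    }
    auto context = std::make_unique<AsyncContext>();
    RunAsyncAndReclaim(context.release());
    return true;
}

// Returns the number of contexts left alive by one call.
int LiveContextsAfterCall(bool syncPath)
{
    const int before = AsyncContext::alive;
    IsEnabled(syncPath);
    return AsyncContext::alive - before;
}
```

Holding the object in a unique_ptr until the hand-off also protects any early return added later between the allocation and the call that takes ownership.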

SystemConfig

The trace shows that a SystemConfig object is leaked. The code is as follows:

WMError WindowManagerProxy::GetSystemConfig(SystemConfig& systemConfig)
{
    MessageParcel data;
    MessageParcel reply;
    MessageOption option;
    if (!data.WriteInterfaceToken(GetDescriptor())) {
        WLOGFE("WriteInterfaceToken failed");
        return WMError::WM_ERROR_IPC_FAILED;
    }
    if (Remote()->SendRequest(static_cast<uint32_t>(WindowManagerMessage::TRANS_ID_GET_SYSTEM_CONFIG),
        data, reply, option) != ERR_NONE) {
        return WMError::WM_ERROR_IPC_FAILED;
    }
    systemConfig = *(reply.ReadParcelable<SystemConfig>());
    int32_t ret = reply.ReadInt32();
    return static_cast<WMError>(ret);
}

There is no obvious new statement in this code. The key is reply.ReadParcelable<SystemConfig>(): ReadParcelable calls SystemConfig's Unmarshalling function, shown below:

static SystemConfig* Unmarshalling(Parcel& parcel)
    {
        SystemConfig* config = new SystemConfig();
        config->isSystemDecorEnable_ = parcel.ReadBool();
        config->isStretchable_ = parcel.ReadBool();
        config->defaultWindowMode_ = static_cast<WindowMode>(parcel.ReadUint32());
        config->effectConfig_.fullScreenCornerRadius_ = parcel.ReadFloat();
        config->effectConfig_.splitCornerRadius_ = parcel.ReadFloat();
        config->effectConfig_.floatCornerRadius_ = parcel.ReadFloat();
        config->effectConfig_.focusedShadow_.elevation_ = parcel.ReadFloat();
        config->effectConfig_.focusedShadow_.color_ = parcel.ReadString();
        config->effectConfig_.focusedShadow_.offsetX_ = parcel.ReadFloat();
        config->effectConfig_.focusedShadow_.offsetY_ = parcel.ReadFloat();
        config->effectConfig_.focusedShadow_.alpha_ = parcel.ReadFloat();
        config->effectConfig_.unfocusedShadow_.elevation_ = parcel.ReadFloat();
        config->effectConfig_.unfocusedShadow_.color_ = parcel.ReadString();
        config->effectConfig_.unfocusedShadow_.offsetX_ = parcel.ReadFloat();
        config->effectConfig_.unfocusedShadow_.offsetY_ = parcel.ReadFloat();
        config->effectConfig_.unfocusedShadow_.alpha_ = parcel.ReadFloat();
        return config;
    }

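Here a SystemConfig object is created with new and its pointer is returned. The calling function, however, does not manage that object, which causes the memory leak. One fix is to let a smart pointer manage the object automatically in the calling function, as follows: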

sptr<SystemConfig> config = reply.ReadParcelable<SystemConfig>();
systemConfig = *config;

Reference PR: https://gitee.com/openharmony/window_window_manager/pulls/2507/files
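The same ownership rule can be shown with standard C++ only. In the sketch below, Config and Unmarshal are hypothetical stand-ins for SystemConfig and its Unmarshalling function, and std::unique_ptr is used in place of sptr, which is specific to OpenHarmony. The null check is a defensive assumption added for the example.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for SystemConfig, with a live-object counter.
struct Config {
    int mode = 0;
    static int alive;
    Config() { ++alive; }
    Config(const Config& other) : mode(other.mode) { ++alive; }
    Config& operator=(const Config&) = default;
    ~Config() { --alive; }
};
int Config::alive = 0;

// Hypothetical factory in the style of Unmarshalling: the caller owns
// the returned raw pointer.
Config* Unmarshal(int mode)
{
    auto* config = new Config();
    config->mode = mode;
    return config;
}

// Wraps the raw pointer immediately so it is freed on every return path.
bool ReadConfig(Config& out, int mode)
{
    std::unique_ptr<Config> parsed(Unmarshal(mode));
    if (parsed == nullptr) {
        return false;
    }
    out = *parsed;  // copy the value; the temporary is deleted on return
    return true;
}

// Returns the number of Config objects left alive by one read.
int LiveConfigsAfterRead()
{
    const int before = Config::alive;
    {
        Config out;
        const bool ok = ReadConfig(out, 3);
        assert(ok && out.mode == 3);
    }
    return Config::alive - before;
}
```

Whichever smart pointer is used, the point is that the pointer returned by the factory is owned from the first line that receives it.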
