Fixing a memory leak in Microsoft USD

Microsoft USD (Unified Service Desk) is a pretty cool framework, but when requirements get complex because of technical constraints, that can change quickly.

I’ve been working on a project for one of my customers, helping them deliver an integrated desktop built with Microsoft USD. I had delivered similar solutions in the past, when the product was called CCF and, later, CCA.

The .NET GC is really effective at what it does, but it is non-deterministic: we cannot tell when it will collect unreferenced objects or compact the heap. Because Internet Explorer relies heavily on COM, the GC relocating objects can cause unexpected behavior in the application, so browser instances must be pinned to keep the GC from moving them. This is fine, and required, for Internet Explorer, but after analyzing and diagnosing the memory leak we learned that pinning an instance of the Chromium browser (CefSharpBrowser) is a bad idea.
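To make the pinning idea concrete, here is a minimal sketch of the usual pin-then-free pattern with GCHandle. The class and method names are hypothetical and are not taken from the USD project. It pins a byte[] rather than a browser object, because GCHandleType.Pinned only accepts blittable data; the point is only that every allocated handle is paired with a Free.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration only; not part of the project described here.
public static class PinningSketch
{
    // Pins a managed buffer while a callback uses its address, then always frees the handle.
    public static long WithPinnedBuffer(byte[] buffer, Func<IntPtr, long> useAddress)
    {
        // While this handle is allocated, the GC can neither move nor collect the buffer.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            return useAddress(address);
        }
        finally
        {
            // Without this call the handle stays in the handle table and keeps the buffer rooted.
            handle.Free();
        }
    }
}
```

For example, PinningSketch.WithPinnedBuffer(new byte[] { 1, 2, 3 }, p => Marshal.ReadByte(p, 1)) reads the second byte through the pinned address and returns 2.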

Below are the steps taken to diagnose and fix it:

  • We launched USD and created the maximum number of sessions allowed (5 in the current configuration).
  • From USD we triggered actions (i.e., ActionCalls) that launched Apttus. Apttus and Erply are the only two hosted Web applications that require ChromiumBrowser. ChromiumBrowser is implemented in CefSharp.BrowserSubprocess.exe.
  • We took full memory dumps of the USD process for later analysis with WinDbg and the SOS extension: one while USD had all sessions open and another after we closed those sessions. The one we analyzed was the latter (taken after the sessions were closed).
  • Once we had the dump we ran commands to determine whether we had a leak. We also watched performance counters and monitored Process Explorer closely.
  • The first thing to get our attention was some “zombie threads” on the managed heap. These are threads that have finished running but whose managed thread objects are still referenced on the managed heap.

ZombieThreads

  • Then we checked the memory usage and allocation of USD. We observed a relatively small managed heap compared with the memory the process had committed (allocated). This is usually a symptom of a memory leak and memory fragmentation.

MemorySummary
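The same comparison can be made from inside a running process, without a dump. The sketch below is only an illustration (the MemorySnapshot class is a hypothetical name, and this is not how the figures above were obtained): it reads the managed heap size reported by the GC next to the private bytes committed by the process. A small managed figure alongside a much larger, steadily growing private figure is the pattern described above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper for illustration only.
public static class MemorySnapshot
{
    // Returns the managed heap size and the private bytes of the current process.
    public static (long ManagedBytes, long PrivateBytes) Capture()
    {
        // Bytes the GC currently believes are allocated on the managed heap.
        long managed = GC.GetTotalMemory(forceFullCollection: false);

        using (Process self = Process.GetCurrentProcess())
        {
            // Private memory of the whole process, managed and native.
            return (managed, self.PrivateMemorySize64);
        }
    }

    // Formats a snapshot for logging.
    public static string Describe()
    {
        var (managed, privateBytes) = Capture();
        return $"managed={managed / 1024} KB, private={privateBytes / 1024} KB";
    }
}
```

Logging MemorySnapshot.Describe() before opening sessions and again after closing them gives a rough trend that can be set against the performance counters.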

  • When this issue occurs, it is most likely caused by pinned references that have effectively turned into leaked GCHandles.

GCHandles-1    GCHandles-2
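Why a leaked handle shows up as a leak can be demonstrated in a few lines. This is a sketch with hypothetical names, using a byte[] as a stand-in for the browser object: as long as the GCHandle is allocated, a forced collection cannot reclaim the target, and once the handle is freed the target becomes collectable.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical demo for illustration only.
public static class HandleLeakSketch
{
    // Allocates an object whose only root is a GCHandle; the weak reference just observes it.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static GCHandle AllocateRooted(out WeakReference tracker)
    {
        var payload = new byte[1024];
        tracker = new WeakReference(payload);
        return GCHandle.Alloc(payload, GCHandleType.Pinned);
    }

    // Forces a full collection and reports whether the tracked object survived it.
    public static bool IsAliveAfterFullGc(WeakReference tracker)
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return tracker.IsAlive;
    }

    // Returns (alive while the handle is held, alive after the handle is freed).
    public static (bool WhileHeld, bool AfterFree) Run()
    {
        GCHandle handle = AllocateRooted(out WeakReference tracker);
        bool whileHeld = IsAliveAfterFullGc(tracker);
        handle.Free();
        bool afterFree = IsAliveAfterFullGc(tracker);
        return (whileHeld, afterFree);
    }
}
```

While the handle is held the object survives every collection; after Free the object should no longer be reported as alive. A handle that is never freed behaves like the first case for the lifetime of the process.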

  • At this point we had found that CefSharpBrowser (Chromium) was not releasing resources because of pinned handles. Next we needed to see whether byte arrays were being leaked, along with System.Threading.OverlappedData structures (usually left behind when a zombie thread is created). We dumped the managed heap and the managed threads and found both: byte arrays (memory allocations) and structures of that type.

Dumpheap-stat   DumpHeap-Threads

  • With that information we could confirm and narrow down the leak, so the next step was to check the FinalizeQueue (objects with finalizers that the garbage collector is tracking and that, in our case, had not been collected, most likely because of pinned references). In the FinalizeQueue we saw some instances of CefSharpBrowser.

FinalizeQueue 

  • The next step was to find where in our code we were pinning these objects.
  • Since we needed to integrate disparate Web applications using different browsers, we came up with a seamless programming model to do so. We created a generic class called OpaqueBrowser<T>, an abstraction for using and handling Web browsers regardless of their type (IE or Chrome), so that the developer can focus on the business requirements instead of the plumbing or internals of each browser. Because the GC (garbage collector) can move objects in a non-deterministic way, which can cause odd behavior, we had to let the GC know that certain objects must not be moved, and we implemented that for both IE and Chromium. IE works fine with this, as expected. Chromium, however, runs as an external application, and with its reference pinned the GC was unable to collect that memory at destruction time, even though we were releasing those handles, so those resources (memory) leaked.
  • Once we amended our code to not use a GCHandle for Chromium (CefSharpBrowser), we could see memory allocation and deallocation working as expected. With that tiny change we fixed the memory leak and also improved the performance of the application, because its resource management is now done properly.
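The shape of the fix can be sketched as follows. This is not the actual OpaqueBrowser<T> from the project, whose code is not shown here; the constructor parameter, the HoldsHandle property and the choice of GCHandleType.Normal are all assumptions made for illustration (Normal is used because GCHandleType.Pinned rejects non-blittable objects). The idea it illustrates is the one above: take a handle only for the browser type that needs it, and free it deterministically on disposal.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical sketch; the real OpaqueBrowser<T> may look quite different.
public sealed class OpaqueBrowser<T> : IDisposable where T : class
{
    // default(GCHandle) means no handle is held.
    private GCHandle _handle;

    public T Browser { get; private set; }

    // holdHandle is an assumed switch: true for the IE-based control, false for Chromium.
    public OpaqueBrowser(T browser, bool holdHandle)
    {
        Browser = browser ?? throw new ArgumentNullException(nameof(browser));
        if (holdHandle)
        {
            // Roots the browser object until Dispose frees the handle.
            _handle = GCHandle.Alloc(browser, GCHandleType.Normal);
        }
    }

    // True while this wrapper owns an allocated handle.
    public bool HoldsHandle => _handle.IsAllocated;

    public void Dispose()
    {
        // Free the handle first so the browser object is no longer rooted by it.
        if (_handle.IsAllocated)
        {
            _handle.Free();
        }

        // Dispose the wrapped browser if it supports it, then drop the reference.
        (Browser as IDisposable)?.Dispose();
        Browser = null;
    }
}
```

With this shape, a Chromium wrapper created with holdHandle set to false never allocates a handle, so nothing is left in the handle table when the session closes.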
