Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XmlLayoutSerialer memory leak #356

Closed
pkindruk opened this issue May 25, 2022 · 5 comments
Closed

XmlLayoutSerialer memory leak #356

pkindruk opened this issue May 25, 2022 · 5 comments

Comments

@pkindruk
Copy link
Contributor

Pull request #308 introduced memory leak each time you serialize/deserialize layout. Affected versions: 4.60.1 and above.
This behavior is persistent across multiple runtimes net48, netcoreapp3.1, net 5.0, net 6.0.
Our projects has auto-save layout feature so memory grows pretty quickly.

Plain C# repro:

using System;
using System.IO;
using System.Xml.Serialization;

namespace Net5Sandbox
{
    public class XmlRoot
    {
        public int Id { get; set; }
    }

    class Program
    {
        static string Serialize(XmlRoot dataObj)
        {
            //var serializer = new XmlSerializer(typeof(XmlRoot)); //doesn't leak
            var serializer = XmlSerializer.FromTypes(new[] { typeof(XmlRoot) })[0]; //leaks
            using (var stream = new StringWriter())
            {
                serializer.Serialize(stream, dataObj);
                return stream.ToString();
            }
        }


        static void Main(string[] args)
        {
            const int repeatCount = 10000;

            var totalLength = 0L;
            var dataObj = new XmlRoot { Id = 1 };
            for (int i = 0; i < repeatCount; i++)
            {
                totalLength += Serialize(dataObj).Length;
            }

            Console.WriteLine($"Done total length = {totalLength}");

            Console.WriteLine(GC.GetTotalMemory(false));
            Console.WriteLine("Running GC");
            for (int i = 0; i < 10; i++)
            {
                GC.Collect(2, GCCollectionMode.Forced, true, true);
            }
            Console.WriteLine("Done");
            Console.WriteLine(GC.GetTotalMemory(false));
            Console.ReadKey();
        }
    }
}

Final console app memory consumption is around 500MiB depending on runtime. However GC.GetTotalMemory() shows that it is not a object allocation issue. Console output:

Done total length = 1670000
13077480
Running GC
Done
4912128

Debug output with XmlSerializer.FromTypes(Type[]) contains:

... ommited for clarity
'Net5Sandbox.exe' (CoreCLR: clrhost): Loaded 'Microsoft.GeneratedCode'. 
... (10000 times)
'Net5Sandbox.exe' (CoreCLR: clrhost): Loaded 'Microsoft.GeneratedCode'. 
... ommited for clarity

While debug output with XmlSerializer.ctor(Type) contains:

... ommited for clarity
'Net5Sandbox.exe' (CoreCLR: clrhost): Loaded 'Microsoft.GeneratedCode'.  // exactly once 
... ommited for clarity

Looks like XmlSerializer.FromTypes(Type[]) generates dynamic assembly or type with serialization logic every time we call it in runtime. Which stays there until app restart and causes memory leak.

I see a few solution to this issue:

  1. Get back to XmlSerializer.ctor(Type). However this will get us back to System.IO.FileNotFoundException which is still better than memory leak.
  2. Cache XmlSerializer.FromTypes(Type[]) result. However this requires changing serialization logic.
  3. Use Xml Serializer Generator. Never done that, especially with nuget package, might require manual .nuspec.
@pkindruk
Copy link
Contributor Author

@RadvileSaveraiteFemtika

@audryste
Copy link

audryste commented Aug 9, 2022

Well, we are no longer on that project with @RadvileSaveraiteFemtika. But @pkindruk tells true.
https://github.com/microsoft/referencesource/blob/master/System.Xml/System/Xml/Serialization/XmlSerializer.cs line 628
It is using some kind of reflection and it is slow. Even documentation tells results should be cached without explanation why... I guess now it is clear why...
the first one works fast because results are internally cached by Microsoft.

@Dirkster99 he suggested three approaches. I could make changes for the second one (caching) or you can simply do the first one: revert the changes. Waiting for you answer.

@romen-h
Copy link

romen-h commented Sep 13, 2022

It is using some kind of reflection and it is slow. Even documentation tells results should be cached without explanation why... I guess now it is clear why...

I am finding that having a debugger attached while serializing layout causes it to take several seconds to complete. Is that related to the heavy reflection and leaking? I didn't notice the memory leak myself but it would be affecting me because I do auto layout saving very often. Maybe there is an earlier package version you could suggest without the performance issue?

@audryste
Copy link

It is all written if first post @romen-h Affected versions: 4.60.1 and above. Use anything below < 4.60.1.

@Dirkster99
Copy link
Owner

@audryste If you could prepare a PR to implement the recommended implementation that comes without memory leaks, I'd be more than happy to apply it :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants