Compact memory data (#101) #127
Conversation
Thanks for your PR, Francesco. I will evaluate your changes soon and let you know whether I want to merge them or not. Please be a little patient. My todo list is growing faster than I expected (which is a good thing :).
Happy to hear that, @pemistahl.
Hello! I realize I'm dropping in from nowhere, but I'm working on a project that involves language detection and saw this work was evolving in real time, had a little bit of spare time on a Sunday afternoon, and thought I might be able to add some value. So here I am! I was a software performance analyst in a past life, and helping think through changes like this was an important part of that role.

When optimizing programs, you quickly learn that the first thing you need is data. It turns out that the Pareto principle applies to software performance: 90% of time is spent in 10% of code. It also turns out that humans -- even experts who have made a career out of optimizing programs all day long -- are very bad at guessing which 10% matters! So if we don't measure, it's really hard to know if the work we do is having the desired effect.

To that end, I put together this very simple benchmark of lingua performance using JMH. There's a little bit of test harness to unpack in that repository, but most of the work is here:
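(The linked harness isn't reproduced in this excerpt, so here is a minimal sketch of the shape such a JMH benchmark takes. The class name, the corpus path, and the setup details are illustrative assumptions, not the actual benchmark code.)

```kotlin
import com.github.pemistahl.lingua.api.LanguageDetector
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder
import org.openjdk.jmh.annotations.Benchmark
import org.openjdk.jmh.annotations.Scope
import org.openjdk.jmh.annotations.Setup
import org.openjdk.jmh.annotations.State
import org.openjdk.jmh.infra.Blackhole
import java.io.File

@State(Scope.Benchmark)
open class DetectTweetsBenchmark {
    private lateinit var detector: LanguageDetector
    private lateinit var tweets: List<String>

    @Setup
    fun setup() {
        // All languages, with models loaded eagerly so model I/O is not measured.
        detector = LanguageDetectorBuilder.fromAllLanguages()
            .withPreloadedLanguageModels()
            .build()
        // ~20K tweets, one per line; "tweets.txt" is a placeholder path.
        tweets = File("tweets.txt").readLines()
    }

    @Benchmark
    fun detectAllTweets(blackhole: Blackhole) {
        // Detect the language of every tweet; Blackhole prevents dead-code elimination.
        for (tweet in tweets) blackhole.consume(detector.detectLanguageOf(tweet))
    }
}
```

Running a benchmark like this under JMH's GC profiler (`-prof gc`) is what yields allocation-rate numbers, which becomes relevant below.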
Essentially, the benchmark initializes by creating a language detector for all languages with preloaded models and loading about 20K tweets pulled from social media. The benchmark itself then detects language for each tweet, over and over, as fast as it can. The JMH framework then reports performance characteristics for the benchmark. I ran the benchmark on the current code, and then on the proposed change. To be fully transparent, I did have to make a couple of (cosmetic) changes to the PR to get the benchmark to work, and you can review those changes here. (@fvasco, if you got any notifications about that work, then I apologize for the distraction, it is entirely because I have fat fingers.) Here is the performance of the current code:
Here is the performance of the current code with the proposed change:
The net results here are:
Software optimization is hard. Everything is connected to everything else, often in unpredictable ways. In this case, it looks like the change has made a nice reduction in memory usage as well as an unfortunate reduction in wall clock performance. I suspect that someone could quickly figure out what is going on with some light profiling in a nice, free tool like VisualVM and make some quick performance improvements. I suspect the tool could also inform any memory usage planning. I would be happy to help, if that's useful.

I do apologize for popping in here from nowhere. I hope it's not rude. I certainly did not pop in to criticize the work (to the contrary, this is good stuff, @fvasco!), or to tell anyone what to do or not to do (@pemistahl, it's your show!). However, it's rare that I find my background to be relevant to work going on in real time, and I couldn't resist the urge to poke my head in. If this is unwelcome, then please do tell me to buzz off, and I shall!
Hi @sigpwned, I suspect that you measured the memory allocation rate for each run. My goal is to reduce the memory requirements: unfortunately, this library requires too much RAM, so we chose another one. I expected a performance drop, but 23% is a considerable drop, so I have decided to rework this PR to try to improve performance. If your environment is still set up, can you evaluate this PR again? Thank you in advance.
Ah! You are correct, and that is an important difference. I think I can help there too. Hopefully I'll come back in a bit with some more information on that.
I would be very happy to! If it's useful, I'd also be happy to spend a little time in a profiler to see if I can't identify where we're losing performance. If performance (whether wall clock or memory) is becoming a major focus, then I think it might also make sense to integrate some basic benchmarks into PR checks. @pemistahl, does that sound interesting or useful? If so, I'd be happy to take a swing at it in a separate PR!
OK! Using jol (Java Object Layout), I was able to pull object graph memory footprints. Short version here, long version below. For an object:
The object graph memory footprints are as follows:

v1.1.1 in maven -- 2,281,277,024 B ≈ 2.28 GB

So: a substantially reduced memory footprint, roughly 380 MB as advertised. Huge improvement, @fvasco! I've updated the code in the benchmark repo in case anyone is curious. The memory measurement is a little ticklish to run because it wants elevated permissions, but it should at least be pretty clear what's going on. Full dumps are included below. The original code has the following footprint:
The proposed code change has the following footprint:
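For the curious, a minimal sketch of how such a measurement looks with jol. This is illustrative rather than the benchmark repo's actual code; in particular, treating the detector instance as the root of the full model graph is an assumption:

```kotlin
// Requires org.openjdk.jol:jol-core on the classpath; jol may also need extra
// JVM flags (e.g. -Djdk.attach.allowAttachSelf=true), hence "elevated permissions".
import com.github.pemistahl.lingua.api.LanguageDetectorBuilder
import org.openjdk.jol.info.GraphLayout

fun main() {
    // Build the detector with every model preloaded so the full data set is resident.
    val detector = LanguageDetectorBuilder.fromAllLanguages()
        .withPreloadedLanguageModels()
        .build()

    // Walk the object graph reachable from the detector and sum the retained bytes.
    val layout = GraphLayout.parseInstance(detector)
    println("total footprint: %,d B".format(layout.totalSize()))

    // Per-class breakdown -- the "long version" of the dump.
    println(layout.toFootprint())
}
```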
(Resolved review comments on src/main/kotlin/com/github/pemistahl/lingua/internal/TrainingDataLanguageModel.kt)
@sigpwned, @Marcono1234, I changed my code to try to gain back some performance; I fear that this version requires a bit more memory.
I redesigned the frequencies' data as an n-ary search tree; the memory requirement is roughly 1.2 GB as of version `v1.1.1-5-g2d80e26`.
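As an aside for readers, here is a minimal sketch of what an n-ary search tree over ngram frequencies might look like -- one reading of the approach, not the PR's actual data structure:

```kotlin
// Ngrams are stored in a character-keyed n-ary tree instead of a flat
// HashMap<String, Double>: common prefixes share nodes, and the boxed
// Double values become primitive Floats.
class NgramFrequencyTrie {
    private class Node {
        var frequency = 0f                   // set when an ngram ends at this node
        val children = HashMap<Char, Node>() // n-ary fan-out on the next character
    }

    private val root = Node()

    fun put(ngram: String, frequency: Float) {
        var node = root
        for (ch in ngram) node = node.children.getOrPut(ch) { Node() }
        node.frequency = frequency
    }

    fun get(ngram: String): Float {
        var node = root
        for (ch in ngram) node = node.children[ch] ?: return 0f
        return node.frequency
    }
}
```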
I reduced the memory requirement to 440 MB; this version should be faster than the original in both preload and execution time.
Thank you for your consideration, @pemistahl. I replaced the language cache with a thread-safe one and removed the thread pool, because it does not really improve performance but requires special care, such as an explicit destructor. The big open problem remains the data set loading: I suppose that using the ugly Java serialization format instead of JSON may really improve the startup time without any downside (all memory optimization can be performed at learning time). Please contact me if you need further info.
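A minimal sketch of that idea, assuming the model boils down to an ngram-to-frequency map (the function names and the map shape here are hypothetical):

```kotlin
import java.io.ObjectInputStream
import java.io.ObjectOutputStream
import java.nio.file.Files
import java.nio.file.Path

// Serialize the parsed model once, at learning time...
fun writeModel(path: Path, frequencies: HashMap<String, Float>) {
    ObjectOutputStream(Files.newOutputStream(path).buffered()).use { out ->
        out.writeObject(frequencies)
    }
}

// ...then load it at startup with no JSON parsing at all.
@Suppress("UNCHECKED_CAST")
fun readModel(path: Path): HashMap<String, Float> =
    ObjectInputStream(Files.newInputStream(path).buffered()).use { input ->
        input.readObject() as HashMap<String, Float>
    }
```

Whether this actually beats JSON parsing would need measuring with the benchmark above, and it ties the on-disk format to the JVM -- perhaps why it is called "ugly" here.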
Hi @fvasco and @sigpwned, I'm very sorry for my late response. Thanks a lot for your hard work to reduce the memory footprint of the library. I've been busy doing other things recently, so I could not evaluate your changes yet. Despite all your efforts, I want to be honest with you. Especially, your code in the file

The JVM has always been very memory-hungry; this is no surprise. If you have such special requirements that memory is an issue for you, you should perhaps switch to a more efficient language such as Go or Rust. You've certainly discovered my implementations of Lingua in these two other languages. But if you are bound to the JVM, perhaps it's better to use your modified version of the library exclusively in your own projects.

If you plan to continue working on this PR, can you please use the branch

Thanks again for your work. I really appreciate it.
Thank you for the suggestions, @pemistahl. Accuracy is comparable to your version, so I will consider using my fork. By the way, how much memory can we recover using the Rust or Go implementations?
I changed the runtime memory model: the original JSON is translated to a dense map. This reduces memory requirements at the cost of speed (frequency lookup should be slower). Frequencies are stored as `Float` instead of `Double`; this introduces a 0.001% error in the calculations, and tests are updated accordingly. The `fastutil` dependency has been removed.

All changes are confined to internal classes, so this pull request is compatible with version 1.1, and I hope that the merge will be considered soon.