Description
Why?
The discussions and ideas in code generators and constexpr (#16503 and #15079) always leave me with a bland aftertaste... I always had the feeling that there is a LOT more that can be done to drastically improve performance. At first I thought that code generators would be the silver bullet to save us from bad performance, but it quickly became apparent to me that we have not reached the core of the problem yet.
Fortunately I have figured out some good examples.
What?
Most of the time as programmers we are perfectly aware of what code is time-critical and will be the bottleneck in our applications, but improving that can be hard, really hard actually.
I propose a combination of compiler and language features that have the potential to increase performance drastically.
By telling the compiler and runtime that some arguments to a function will remain constant for some time, the compiler and/or JIT compiler can recompile a method (and all the methods it calls in turn) with those values baked in.
Example 1 - Serialization
Here we'll serialize some type from/to JSON.
Traditionally we'd use Newtonsoft.Json or some other library and then call JsonConvert.SerializeObject(obj).
This is bad. On every SerializeObject call the program executes the same generic code path again even though there is no reason to.
Tons of if comparisons to see whether some options are set, null checks, ... all sorts of things that we as humans would instantly know to be always true or always false in a given context.
If there were a way to "specialize" code, it could improve performance a lot.
For example, if it were possible to do the following instead of the JsonConvert call above:
// We want to serialize the following class
class Person { public string Name; public int Age; }
var specializedSerializeMethod = Specialize(
method: JsonConvert.SerializeObject,
genericTypeArgs: new[] { typeof(Person) },
methodArgs: null,
hints: new[] { new NeverNullHint(".Name") });
var jsonText = specializedSerializeMethod(somePersonObject);
Now why is it not possible for the code inside specializedSerializeMethod to be something like
return "{\"Name\":\"" + obj.Name + "\",\"Age\":" + obj.Age + "}";
The NeverNullHint tells the Specialize method that the Name member will never be null, so null checks such as if(... == null) can be safely replaced with if(false), which can be completely optimized away.
I imagine that 90% of the code inside SerializeObject could be removed just because the Type is known beforehand (plus some hints).
The "targeted const" part from the title of this issue would be the genericTypeArgs, methodArgs and hints in the Specialize method.
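For comparison, here is a rough sketch of how such a specialized serializer can be written by hand today with expression trees. SpecializedJson.Build is a hypothetical helper, not part of Newtonsoft.Json or of the proposed Specialize API. It assumes public string and int fields only, does no string escaping, and orders fields by name so the output is deterministic. The point is that this generator has to be written separately from the general serializer, which is exactly the work the proposal would like the compiler or runtime to do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public class Person { public string Name; public int Age; }

// Hypothetical helper, for illustration only.
public static class SpecializedJson
{
    // Inspects T once and returns a delegate that is a single string.Concat call.
    public static Func<T, string> Build<T>()
    {
        ParameterExpression obj = Expression.Parameter(typeof(T), "obj");
        var parts = new List<Expression> { Expression.Constant("{") };

        // Sort by name so the field order does not depend on reflection order.
        FieldInfo[] fields = typeof(T)
            .GetFields(BindingFlags.Public | BindingFlags.Instance)
            .OrderBy(f => f.Name, StringComparer.Ordinal)
            .ToArray();

        for (int i = 0; i < fields.Length; i++)
        {
            FieldInfo field = fields[i];
            if (i > 0) parts.Add(Expression.Constant(","));
            parts.Add(Expression.Constant("\"" + field.Name + "\":"));

            Expression value = Expression.Field(obj, field);
            if (field.FieldType == typeof(string))
            {
                // Assumption: the value is never null and needs no escaping.
                parts.Add(Expression.Constant("\""));
                parts.Add(value);
                parts.Add(Expression.Constant("\""));
            }
            else if (field.FieldType == typeof(int))
            {
                MethodInfo toString = typeof(int).GetMethod(nameof(int.ToString), Type.EmptyTypes);
                parts.Add(Expression.Call(value, toString));
            }
            else
            {
                throw new NotSupportedException("Sketch only handles string and int fields.");
            }
        }
        parts.Add(Expression.Constant("}"));

        MethodInfo concat = typeof(string).GetMethod(nameof(string.Concat), new[] { typeof(string[]) });
        Expression body = Expression.Call(concat, Expression.NewArrayInit(typeof(string), parts));
        return Expression.Lambda<Func<T, string>>(body, obj).Compile();
    }
}
```

Calling SpecializedJson.Build<Person>() once and reusing the returned delegate means the reflection and branching happen a single time; each later call only reads the fields and concatenates strings.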
Example 2 - Regex
More general than just serialization would be regex, where you have patterns.
System.Text.RegularExpressions.Regex already does something just like that, exactly for the reason that interpreted code is too slow.
You provide 'code' in the form of a regex string, then pass RegexOptions.Compiled and it will use dynamic methods to generate a specialized version.
The only difference is that the specialized version comes from a separate, hand-written code generator; it is not derived automatically from the regex interpreter plus some "assumed to be constant" parameter.
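As a small illustration of what exists today, the following sketch uses RegexOptions.Compiled with a pattern that is fixed for the lifetime of the program. The class name IsoDateMatcher and the date pattern are made up for this example; only Regex, RegexOptions and Match are real framework APIs.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Illustrative wrapper; the name and the pattern are assumptions for this example.
public static class IsoDateMatcher
{
    // The pattern string is the "constant" input. With RegexOptions.Compiled the
    // runtime generates code for this pattern once, where dynamic code is supported.
    private static readonly Regex Pattern = new Regex(
        @"^([0-9]{4})-([0-9]{2})-([0-9]{2})$",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    public static bool TryGetYear(string text, out int year)
    {
        Match match = Pattern.Match(text);
        year = match.Success
            ? int.Parse(match.Groups[1].Value, CultureInfo.InvariantCulture)
            : 0;
        return match.Success;
    }
}
```

Keeping the Regex in a static readonly field matters here: the cost of generating the specialized code is paid once and then amortized over every call.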
Example 3 - Search and advanced pattern matching on data
Even more general than Regex in Example 2 would be matching all sorts of patterns in all sorts of data!
Just like Regex searches for patterns in text, there are tons of other search patterns that people use every day.
For example searching for binary patterns when patching software by using delta-patches.
Also when compressing/decompressing data!
The same is true for image recognition.
To give an example without going into too much detail:
In classical image recognition and feature detection algorithms you often have loops that iterate over thousands or millions of points (pixels) and try to match patterns there.
Now if you could just pre-compile a known pattern into specialized code, you'd get enormous performance benefits. Just like in regex.
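To make "pre-compile a known pattern" a bit more concrete, here is a minimal sketch for binary data, assuming a pattern made of fixed bytes and wildcards (null entries). BytePattern and its methods are hypothetical names. The generated delegate contains only comparisons for the non-wildcard positions, so wildcards cost nothing at match time; a real implementation would also want a smarter scan than the naive loop in IndexOf.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper, for illustration only.
public static class BytePattern
{
    // Builds a matcher equivalent to:
    //   offset >= 0 && offset <= data.Length - N && data[offset + i] == p[i] && ...
    // where only the non-null pattern positions produce a comparison.
    public static Func<byte[], int, bool> Compile(byte?[] pattern)
    {
        ParameterExpression data = Expression.Parameter(typeof(byte[]), "data");
        ParameterExpression offset = Expression.Parameter(typeof(int), "offset");

        // The bounds check comes first, so the short-circuiting AndAlso chain
        // never indexes outside the array.
        Expression body = Expression.AndAlso(
            Expression.GreaterThanOrEqual(offset, Expression.Constant(0)),
            Expression.LessThanOrEqual(
                offset,
                Expression.Subtract(Expression.ArrayLength(data), Expression.Constant(pattern.Length))));

        for (int i = 0; i < pattern.Length; i++)
        {
            if (!pattern[i].HasValue) continue; // wildcard: no code is generated

            int expected = pattern[i].Value;
            Expression element = Expression.Convert(
                Expression.ArrayIndex(data, Expression.Add(offset, Expression.Constant(i))),
                typeof(int));
            body = Expression.AndAlso(body, Expression.Equal(element, Expression.Constant(expected)));
        }

        return Expression.Lambda<Func<byte[], int, bool>>(body, data, offset).Compile();
    }

    // Naive scan that tries the compiled matcher at every offset.
    public static int IndexOf(byte[] data, Func<byte[], int, bool> matcher)
    {
        for (int i = 0; i < data.Length; i++)
        {
            if (matcher(data, i)) return i;
        }
        return -1;
    }
}
```

The same shape would apply to pixel patterns: the known pattern becomes straight-line comparisons, and the per-element loop over the pattern disappears from the hot path.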
Example 4 - All sorts of interpreters!
The most general case I can think of would be interpreting code, not just patterns (like regex).
Emulators for game consoles do it all the time (emulators for the Game Boy, N64, PlayStation 1/2, Wii, and many, many more).
They call it "dynamic recompilation".
But the same can be done for JavaScript and other classical programming languages.
And surprise, surprise: all major JS engines are doing just that; they generate code from known inputs.
The input is a string in JavaScript syntax, and the output is code instead of data (a pointer to code that you can call).
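A toy version of the interpreter-versus-compiled distinction can be sketched with a tiny postfix (RPN) language. Everything here is made up for illustration: the Rpn class, the language itself (numbers, the variable x, and the operators +, - and *), and the space-separated token format. Interpret parses and dispatches on every call, while Compile walks the same program once and returns a delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq.Expressions;

// Hypothetical toy language, for illustration only.
public static class Rpn
{
    // Interpreter: splits the program and dispatches on each token on every call.
    public static double Interpret(string program, double x)
    {
        var stack = new Stack<double>();
        foreach (string token in program.Split(' '))
        {
            switch (token)
            {
                case "x": stack.Push(x); break;
                case "+": { double b = stack.Pop(), a = stack.Pop(); stack.Push(a + b); break; }
                case "-": { double b = stack.Pop(), a = stack.Pop(); stack.Push(a - b); break; }
                case "*": { double b = stack.Pop(), a = stack.Pop(); stack.Push(a * b); break; }
                default: stack.Push(double.Parse(token, CultureInfo.InvariantCulture)); break;
            }
        }
        return stack.Pop();
    }

    // "Compiler": the same walk, but it pushes expression nodes instead of values,
    // so parsing and dispatch happen once and the result is callable code.
    public static Func<double, double> Compile(string program)
    {
        ParameterExpression x = Expression.Parameter(typeof(double), "x");
        var stack = new Stack<Expression>();
        foreach (string token in program.Split(' '))
        {
            switch (token)
            {
                case "x": stack.Push(x); break;
                case "+": { Expression b = stack.Pop(), a = stack.Pop(); stack.Push(Expression.Add(a, b)); break; }
                case "-": { Expression b = stack.Pop(), a = stack.Pop(); stack.Push(Expression.Subtract(a, b)); break; }
                case "*": { Expression b = stack.Pop(), a = stack.Pop(); stack.Push(Expression.Multiply(a, b)); break; }
                default:
                    stack.Push(Expression.Constant(double.Parse(token, CultureInfo.InvariantCulture)));
                    break;
            }
        }
        return Expression.Lambda<Func<double, double>>(stack.Pop(), x).Compile();
    }
}
```

Note that Compile is a second implementation of the language that has to be kept in sync with Interpret by hand. The idea in this issue is that the second one could be derived from the first by treating the program string as constant.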
How?
Just like generics generate new "specialized" code for different generic types, the .NET runtime could generate new optimized/specialized delegates when some parameters are known.
Or even when only parts of the parameters are known, for example new ConstantValueHint(".Age", 123).
The API would be pretty difficult to design if it has to allow for all sorts of hints.
When the created delegate is not in use anymore, it would eventually be collected by the GC, which would then free the JIT-compiled code as well...
Disclaimer
- I'm not saying this is a good idea! I want your opinions, so maybe we can figure out together whether it's a good idea or a bad one.
- I don't have all the answers: It's quite possible that I missed some situations that would make awesome examples. And it is just as possible that I missed some critical points that would render the whole idea moot.
The only thing I do know for a fact is that there are a number of situations (some of which I listed above) where recompiling code with assumptions would help performance enormously, especially in the pattern matching and image recognition cases. And I know that people get performance gains of 1000% and more by "compiling" code in emulators, or in JavaScript.
All things that require interpreters in some form can profit from this.