-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance C Parser to allow named array bounds #754
Comments
Excellent write-up, thank you! One variation I thought of when reading through it -- probably not worth it, but I wanted to record it anyway just so we can decide about it consciously. Variation/Extension to Idea 2:There is a special case where the parser might not have to make up a new name for the bound: if the dimension was declared using an PROS:
CONS:
Overall, there is a connection to the new Maybe that code could be used in the parser (or adapted for it) to generate the same things for array sizes. This would not only give us a symbolic name for the guard, but also a symbolic name for the array size that we can use in the state relation and other places. The The |
With the V/E 2, you still need to resolve the issue of storing an extra expression along with a variable to use for bounds checking in lieu of its type, but you can indeed get away with not generating a new theorem when the entire size expression is a unat of an enum member. |
Agreed
Correct
If we only do |
Problem Statement
When trying to generalise proofs across multiple platforms or configurations, we do OK on the abstract/design side, but encounter a significant issue with
ArrayBounds
during CRefine: they show up as numbers, then get mangled viasimp
, which makes it very difficult to write a generic proof.Let's use this C code as an example:
In the above, we clearly have an enum for numDomains. Normally when used in an expression, unlike preprocessor macros, an enum will show up wrapped in its name:
This is useful for abstracting parameters, because we can bind a preprocessor macro to an enum and as long as it fits in an
int
, we can refer to it in proofs. However, we don't get to see the enum in the bounds checking. Here'supdateDom_body_def
:Note the
Guard ArrayBounds
using0x10
. There's knowledge lost between the expression used to define the array and the calculated number. There's no way to link the bound to some more abstract value/condition. Simplification rules can dig in and generate "interesting" numbers like2147483632
, or discover thati < 1
can be simplified toi = 0
making the proof specific to the number, instead of being generic innumDomains
, meaning we now need to removeWord.word_less_1
and hope for the best, etc.General idea and complication
Obviously the C parser needs to calculate the actual size of the array bounds in order to declare its type. No matter what we do, the above
domainMap
is going to end up a32 signed word[16]
(for 32-bit ints).For verification and links to other abstraction levels, we need to get hold of the expression used to calculate the size of the array in some manner.
I'm pretty sure that the C parser generates the bounds check expression from the array type rather than its name, so we run into the issue that conceptually one
'a word[16]
is not the same as another'a word[16]
when it comes to bounds calculations (which is plausible given that 16 could be anything really). This means we need to interfere with theArrayBounds
generation and make it somehow related to the specific variable name.Idea 1: Heavy annotation
This was Raf's first idea in this direction. Add an AUX annotation of some kind that we put in the C code to specify what expression we want to use directly, something like:
PROS:
CONS:
ctcb_size_bits
Idea 2: Automaic named annotation
After discussion with Raf, Gerwin proposed this: have the C parser generate a constant for that array only, and use that constant for bounds checking. E.g. for the above, something like
domainMap_array_size ≡ 0x10
.We can then separately create a
..._def'
lemma that shows the link to abstract concepts, and unfold with that when we want to be generic.PROS:
CONS:
Idea 3: Keep and throw in entire expression
This is like
Idea 2
(and might need to be a transitional stage or prototype to idea 2), but instead of generating the name, we keep the original expression, and whenever we see a bounds check, we dump the entire expression in there.The downside is that instead of 0x10 or whatever, you'll get the whole calculation, even if it's
((1 << 16) - 1) && 0xF
or something.PROS:
CONS:
The text was updated successfully, but these errors were encountered: