TMI: When in doubt...
Jun. 30th, 2012 12:35 am
tim
All I did today was work on issue 2734, but happily, I fixed it!
My initial hypothesis about the bug turned out to be wrong, and that led me to spend too much time barking up the wrong tree. I'd thought the wrong tydesc (type descriptor) was getting generated for the call to perform_hax at type @int. The compiler automatically generates header fields containing tydescs to handle operations like incrementing and decrementing reference counts and freeing memory when the refcount goes to zero (the cycle collector, part of the RTS, needs to know about these headers as well). It avoids generating multiple tydescs with the same values at different addresses, since that would just waste memory. I'd assumed that in the case of either an iface type or an opaque box type (the latter isn't visible to Rust programmers but gets used internally to represent ifaces), a tydesc was getting re-used when it shouldn't have been, and that was why I was seeing behavior that looked like the str drop glue getting called on an int. (The drop glue is the bit of code that lowers the refcount for a given thing.)
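To make the tydesc idea a bit more concrete, here's a toy sketch in modern Rust (not the 2012 dialect, and not the real runtime layout, which this post doesn't show). Every name in it (TyDesc, OpaqueBox, Payload, release) is made up for illustration, and it splits the work a little differently than the real drop glue: here release lowers the refcount and the glue only does the type-specific cleanup. The point is just that an opaque box has forgotten its static type, so the tydesc in its header is the only thing that says how to free it.

```rust
// Toy model only. All names are hypothetical, not taken from the real runtime.

/// What a box can hold in this toy model.
#[derive(Debug, PartialEq)]
enum Payload {
    Int(i64),
    Str(String),
    Freed,
}

/// A "type descriptor": per-type information consulted at run time.
struct TyDesc {
    name: &'static str,
    /// Type-specific cleanup code; returns a description of what it did.
    drop_glue: fn(&mut Payload) -> &'static str,
}

fn drop_int(p: &mut Payload) -> &'static str {
    *p = Payload::Freed;
    "int: nothing on the heap to free"
}

fn drop_str(p: &mut Payload) -> &'static str {
    *p = Payload::Freed;
    "str: freed the character buffer"
}

static INT_TYDESC: TyDesc = TyDesc { name: "int", drop_glue: drop_int };
static STR_TYDESC: TyDesc = TyDesc { name: "str", drop_glue: drop_str };

/// An opaque box: the static type is erased, so the header is all we have.
struct OpaqueBox {
    refcount: usize,
    tydesc: &'static TyDesc,
    payload: Payload,
}

/// Lower the refcount; when it hits zero, run whatever glue the header names.
/// Assumes the refcount is at least 1 on entry.
fn release(b: &mut OpaqueBox) -> Option<&'static str> {
    b.refcount -= 1;
    if b.refcount == 0 {
        Some((b.tydesc.drop_glue)(&mut b.payload))
    } else {
        None
    }
}
```

If a box holding Payload::Int ends up with &STR_TYDESC in its header, release will happily run the str glue on it. That is the kind of symptom I was chasing, even though a mismatched header turned out not to be the cause.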
But that wasn't it; commenting out the code that results in tydescs getting re-used didn't fix the bug. I'd been working on this for over a day, and I finally got frustrated enough to ask Patrick and Brian, "can you explain how, at runtime, the code knows how to free the contents of an opaque box?" Patrick said, "by pulling the tydesc out of the object", and that was more or less what I'd figured out by staring at the code, but it all looked okay to me. I started explaining the bug to Brian in the hopes that he would be able to explain the opaque box code, and he asked me whether the bug was in code not getting monomorphized correctly, or in how one of the monomorphized instances got generated. I said I'd assumed the latter, but that I wasn't totally sure. He said the person I really wanted to talk to was the one who's on sabbatical right now, to which I said that I was going to go drinking. And I did!
...While the friend I was drinking with was in the bathroom, I pulled out my laptop, like you do, and Brian's question came back to me. Based on some print statements I'd inserted, I'd assumed that two *different* monomorphized instances of perform_hax were getting created, one for int and one for str, but when I modified the print statement, I realized that wasn't so. For some reason, perform_hax wasn't getting monomorphized at all, and both callers were just calling the original version. Why was that?
...Thanks to printing out the monomorphic function's hashed ID, I saw it was the same as the original function's def ID (defining ID), with something attached called mono_any... well, what's that? I saw that mono_any was introduced in only one place, in trans::base::make_mono_id. It gets introduced if the type parameter that a function is being monomorphized on has zero "uses", meaning the function body doesn't depend on it in any way. But why would that be? perform_hax has one type parameter, T, and the function body clearly depends on it: it has to return a record pointing to a tydesc that's correct for the contents of the opaque box (which is to say, for whatever type T gets instantiated with).
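Here's roughly how I picture that keying scheme, as a toy sketch in modern Rust. This is not the real trans::base::make_mono_id; the MonoParam and MonoId types, the count_instances helper, and the use of plain strings for types are all assumptions made up for illustration. The idea it shows: a type argument with zero recorded uses is replaced by a wildcard in the cache key, so instantiations that differ only in that argument share one compiled instance.

```rust
use std::collections::HashSet;

// Toy model only. All names are hypothetical.

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
enum MonoParam {
    /// The body depends on this type argument, so it stays in the key.
    Precise(&'static str),
    /// Zero recorded uses: any type will do (the post's mono_any).
    Any,
}

#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct MonoId {
    def_id: u32,
    params: Vec<MonoParam>,
}

/// Build the cache key for one instantiation of function `def_id`.
/// `uses[i]` is the number of uses recorded for type parameter `i`.
fn make_mono_id(def_id: u32, type_args: &[&'static str], uses: &[u32]) -> MonoId {
    let params = type_args
        .iter()
        .zip(uses)
        .map(|(&ty, &n)| if n == 0 { MonoParam::Any } else { MonoParam::Precise(ty) })
        .collect();
    MonoId { def_id, params }
}

/// How many distinct instances would be generated for these calls,
/// assuming a function with a single type parameter.
fn count_instances(def_id: u32, calls: &[&'static str], uses: &[u32]) -> usize {
    let ids: HashSet<MonoId> = calls
        .iter()
        .map(|&ty| make_mono_id(def_id, &[ty], uses))
        .collect();
    ids.len()
}
```

With uses = [0], calling count_instances for "int" and "str" gives a single shared instance; with uses = [1] it gives two. So a use count that is wrongly zero is enough to make both callers land on the same code.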
The type_uses analysis is what determines the number of uses of each type parameter, that is, how the function body depends on each of the type parameters. It had two bugs. First, it wasn't looking under @s. Second, when the body contains a cast like e as frobbable, where frobbable is an iface type, it wasn't noting that the whole function body depends on e's actual static type. In that case, you have to return a boxed record containing a tydesc that's *specific* to whatever e's type really is, so you really do depend on e's type. When I fixed these bugs, the test worked. Splendid!
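A toy version of the analysis, again in modern Rust, shows how the two omissions combine. Nothing here is the real type_uses code: the Ty and Expr types, the Fixes switches, and the assumption that perform_hax takes an @T and casts it to an iface are all mine, for illustration only.

```rust
// Toy model only. All names are hypothetical.

#[derive(Debug, Clone)]
enum Ty {
    Iface,           // some iface type, e.g. frobbable
    Param(usize),    // the i-th type parameter, e.g. T
    Boxed(Box<Ty>),  // @T
}

#[derive(Debug, Clone)]
enum Expr {
    Var(Ty),                 // a variable with the given static type
    CastToIface(Box<Expr>),  // e as frobbable
}

/// Which of the two fixes are switched on.
#[derive(Clone, Copy)]
struct Fixes {
    look_under_boxes: bool,
    note_iface_casts: bool,
}

/// Record a use for each type parameter mentioned in `ty`.
fn note_ty(ty: &Ty, uses: &mut [u32], look_under_boxes: bool) {
    match ty {
        Ty::Iface => {}
        Ty::Param(i) => uses[*i] += 1,
        Ty::Boxed(inner) => {
            if look_under_boxes {
                note_ty(inner, uses, look_under_boxes);
            }
        }
    }
}

fn static_ty(e: &Expr) -> Ty {
    match e {
        Expr::Var(ty) => ty.clone(),
        Expr::CastToIface(_) => Ty::Iface,
    }
}

fn walk(e: &Expr, uses: &mut [u32], fixes: Fixes) {
    match e {
        // Merely mentioning a variable records nothing in this toy model.
        Expr::Var(_) => {}
        Expr::CastToIface(inner) => {
            walk(inner, uses, fixes);
            if fixes.note_iface_casts {
                // The cast builds a box carrying the tydesc of the
                // operand's static type, so that type is a real dependency.
                note_ty(&static_ty(inner), uses, fixes.look_under_boxes);
            }
        }
    }
}

/// Count uses of each of `n_params` type parameters in `body`.
fn type_uses(body: &Expr, n_params: usize, fixes: Fixes) -> Vec<u32> {
    let mut uses = vec![0; n_params];
    walk(body, &mut uses, fixes);
    uses
}
```

For a body that casts a variable of type @T to an iface, this toy reports zero uses of T with both switches off, still zero with only note_iface_casts on (the T is hidden under the @), and one use with both on. At least in this simplified model, either omission alone is enough to end up with mono_any.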
And this is why I love debugging: that moment when you go from knowing nothing to complete understanding. At least complete understanding of a fix. I still don't claim to understand type_uses all that well, but in my defense, there is a comment at the beginning saying "This unfortunately depends on quite a bit of knowledge about the details of the language semantics, and is likely to accidentally go out of sync when something is changed." I don't think casting to ifaces is newer than type_uses itself, but it is an infrequently used feature and not well-tested.
So the moral of the story is: if you think going drinking might help, go drinking :-D