-
-
Notifications
You must be signed in to change notification settings - Fork 10.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reimplement non-asm OPENSSL_cleanse() #455
Conversation
it works, it avoids possible clever compilers optimizing it away, doubtful we'll take this. |
@richsalz it's not a question of whether it works, but whether it could be better. Right now the non-asm OpenSSL_cleanse implementation is pointlessly complicated and slow. With a proper implementation the asm implementations would become superfluous, and removing them would make the OpenSSL code overall better. If you are worried about whether this works or not, I think the fact that OpenSSH uses a similar implementation to the one proposed here (try OS provided functions like memset_s or SecureZeroMemory first, and fallback to the volatile pointer implementation if those are not present) is a good sign that this works as well. |
* Use memset_s() if the user says so * Use SecureZeroMemory() on Windows * Use a volatile function pointer to memset() everywhere else
I've changed my mind. :) @dot-asm what do you think? |
@richsalz, you're sending mixed signals. First you say you'll endorse this and then close it. Anyway, what do I think about what? About scraping assembly OPENSSL_cleanse? I said what I think about it in RT#4116, very end of first paragraph. Assembly OPENSSL_cleanse is about not having this endless discussions about what compilers can do. On related note, references to what others do are not helpful in this context, because others still have to sleep bad at night asking themselves this question over and over. As for replacing C implementation with volatile reference to memset, I think it's appropriate. As for adhering to SecureZeroMemory() on Windows, not fan. If given choice between #ifdef-spaghetti and something that is specified to achieve the goal by following letter of standard on either platform, I prefer latter. Besides, suggesting to scrap assembly OPENSSL_cleanse and suggest to favour SecureZeroMemory() on Windows arguing that former "adds complexity" is not really coherent argument. How does line of argument goes? Here is assembly OPENSSL_cleanse that appears complex to me, and here is SecureZeroMemory that I know nothing about, hence latter is preferable? One can argue that vendor makes guarantees about SecureZeroMemory. But they make this same guarantee for any opaque function. So what's the difference really? P.S. Consider even that amount of brain power spend on this is higher than writing assembly subroutines. |
So, I rebased my "cleanse" branch and removed the second commit as per @dot-asm comment, though since the PR is closed it's not showing up here. I'm still of the opinion that the asm implementations should be removed, but whatever. Also, I don't really care about SecureZeroMemory all that much, so if you say I have to remove it, I'll do it, but I don't quite understand your argument against it.
Well, we remove 99% of the code required to implement this from the OpenSSL codebase and move it to the Windows standard library (or another libc library). One could also make the point that, while the asm implementations aren't complex, they aren't exactly "optimal" either. That is, using a standard library function (being it memset_s, SecureZeroMemory or memset with volatile pointer) we would get a more optimised implementation for free on pretty much every existing platform. Though I guess the performance gain would be pretty small in the grand scheme of things.
Not sure I follow, what's the difference between what? |
That question was specifically about favouring SecureZeroMemory over assembly OPENSSL_cleanse, not in general what is the argument for scraping OPENSSL_cleanse (and looking for alternative).
Provided that it's there. Think vxworks for example... As already said in RT#4116 is there anything one can do to be excused to deal with all the questions on 20 different systems and 20 different compilers? Yes, implement OPENSSL_cleanse in assembly and go sleep like a baby.
What I'm saying is that compiler sees no difference between SecureZeroMemory and assembly OPENSSL_cleanse. In other words SecureZeroMemory is not more magic. |
Yes @dot-asm, I had second thoughts and wanted your opinion. Thanks. |
@richsalz wouldn't merging https://github.com/ghedo/openssl/commit/6e6e17d404f134ce28fe263c456ee3ea09b97d31 be enough? I removed the SecureZeroMemory call now FWIW. |
To clarify. Personally I basically vote for volatile function pointer alone. Because memset_s implies knowledge about its availability, i.e. elaborate #ifdef or something, which also has side effect that rare platforms are effectively put aside. Less special cases is better [as long as common case is specified to work, in sense that if it doesn't, then it can be treated as compiler bug]. Of course modulo fact that we are talking about platforms without assembly support. |
Yes, no memset_s. See https://gitlab.openssl.org/openssl/openssl/merge_requests/2125 (team internal folks) |
See also the discussion in #455 Reviewed-by: Andy Polyakov <[email protected]>
see commit 104ce8a. Closed ticket 4116 also. |
My apologies for jumping in late. You need the elements to be volatile to ensure the write; and not the pointer. So you need something like the following:
I believe you can also perform the following. The address of the element is taken to avoid the "lvalue casts are not supported" error when the GNU extension is not engaged:
This is the desired remediation with the best chance of success. That's because of the intersection with defects in the C language (our inability to succinctly express what we want) modulo what the compiler writers do (they say "... but you wanted optimizations..."). The C code above should be a fallback when specialized code is not available.
On Windows 2000 and above, you should use Using Crypto++ got burned with a volatile pointer. GCC responded by only removing some of the writes (and not all of them). Crypto++ then changed its code to run the wipe in reverse (i.e., from However, the underlying problem still existed - the pointer was volatile, and not the elements. Crypto++ recently changed its zeroizer such that it uses volatile elements, and not a volatile pointer. I've talked to Robert Seacord and Aaron Ballman from CERT, and everyone agrees that CERT's example at MSC06-C. Beware of compiler optimizations is wrong. Its wrong because it uses a volatile pointer. Also note that under GCC, using Also see Ian Lance Taylor's blog at volatile qualifier. Taylor is a GCC dev. From his blog:
Finally, the Glibc folks will probably never provide |
This comment makes no sense given the implementation that was being discussed: |
Sorry, for being so harsh :/ What I meant was that we don't clear the array manually with a loop like in your example, but we use memset to do it, using a volatile function pointer to avoid having the compiler optimize the call away. |
That's OK. I try to avoid taking terseness for rudeness. Plus, I did not do a good job of emphasizing the writes to the elements in the buffer need to survive optimization.
I'm not sure the language and the compiler makes those guarantees. I'm fairly certain this still applies: "That it happens to work is coincidence....". |
You contradict yourself, because what you suggest makes pointer volatile, not elements. I.e. compiler would interpret it as vptr[i] and vptr[i-1] can refer to completely different memory regions, not that these elements are volatile. Well, one can argue that it does the trick, but one can't argue that it's not really just a trick. Because it's a private variable as assigned a line above. If anything, this suggestion is the stretching of the letter of the standard (which makes it the trick). Indeed, what compiler would have to do given the above code? It would have to allocate place for vptr on the stack[!], copy ptr to it, and de-reference the location on stack in each iteration. Once again, spot for vptr is allocated on the stack. So that you kind of tell compiler to assume that purely private variable it allocates at run-time stack can be changed by hardware or another thread. Well, I'd also call this "Think carefully about what volatile means and about what it does not mean." Now consider what we've got, i.e. volatile pointer to function. What assumption compiler can do about it? Can it assume that we won't place that variable by some linker magic in hardware-mapped area? No. Can it assume that another thread can't change it? Since it's global variable, no. And all this according to letter of standard. And given that it can't make the above assumptions, can it make assumption that pointer is always memset? No. In other words compiler will be obliged to call a sequence of instructions. |
Once again, if given a choice, I'd prefer unified solution that works on arbitrary platform. Keyword is "if given a choice". But even if you say that one should adhere to SecureZeroMemory, because it's "part of SDLC", I'd probably say "so is compiler". Meaning that Microsoft compiler is also bound not to make assumptions mentioned in my previous message. |
... and not doing it would have to be qualified as compiler bug. |
I think if you ask the compiler folks, they will tell you its not a bug because you asked for optimizations. For what its worth, I don't disagree with you on many of these points. Like OpenSSL, I knew what the pain points were. I asked the GCC devs what I should do, and these are the things I learned. GCC's position of "...but you asked for optimizations..." is especially frustrating to me. That's when I came to realization the C language does not allow us to express what we want/need to do in the real world (and not the abstract state machine). |
Standard specifies references to volatile objects as side effects, which does mean that compiler is obliged not to optimize away such references [and even keep them in original program order]. One can argue that compiler may relax when it comes to pure private variables, but it can't formally do that when it comes to global variables irregardless optimization level. |
This construct does not actually work and is rather easily optimized away, there was a discussion about it on HN following a blog post by Colin Percival two years ago. The basic point of it was that the volatile only applies to accessing but not making a call via that pointer. |
I know everyone likes to play language lawyer, BUT...
Thanks. |
目标是将MD5算法编译成build-in不依赖外部动态库
Note that there was a comment from @levitte in a previous version of the patch:
and my answer was:
Also, do the asm implementations really provide that much of a speed-up to justify the added complexity? Maybe it would make sense to drop them.