Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reimplement non-asm OPENSSL_cleanse() #455

Closed
wants to merge 2 commits into from
Closed

Reimplement non-asm OPENSSL_cleanse() #455

wants to merge 2 commits into from

Conversation

ghedo
Copy link
Contributor

@ghedo ghedo commented Oct 29, 2015

  • Use SecureZeroMemory() on Windows
  • Use a volatile function pointer to memset() everywhere else

Note that there was a comment from @levitte in a previous version of the patch:

Wasn't there a point in making the cleansing something other than zeros?

and my answer was:

The CHANGES entry for OPENSSL_cleanse() entry says:

*) New function OPENSSL_cleanse(), which is used to cleanse a section of
memory from it's contents. This is done with a counter that will
place alternating values in each byte. This can be used to solve
two issues: 1) the removal of calls to memset() by highly optimizing
compilers, and 2) cleansing with other values than 0, since those can
be read through on certain media, for example a swap space on disk.
[Geoff Thorpe]

However AFAICT the asm implementations don't do this, and both LibreSSL and BoringSSL changed to some equivalent of my patch (with BoringSSL using memset() and a memory barrier), so I don't quite see the point in complicating things.

Also, do the asm implementations really provide that much of a speed-up to justify the added complexity? Maybe it would make sense to drop them.

@richsalz
Copy link
Contributor

richsalz commented Jan 6, 2016

it works, it avoids possible clever compilers optimizing it away, doubtful we'll take this.

@ghedo
Copy link
Contributor Author

ghedo commented Jan 7, 2016

@richsalz it's not a question of whether it works, but whether it could be better. Right now the non-asm OpenSSL_cleanse implementation is pointlessly complicated and slow. With a proper implementation the asm implementations would become superfluous, and removing them would make the OpenSSL code overall better.

If you are worried about whether this works or not, I think the fact that OpenSSH uses a similar implementation to the one proposed here (try OS provided functions like memset_s or SecureZeroMemory first, and fallback to the volatile pointer implementation if those are not present) is a good sign that this works as well.

* Use memset_s() if the user says so
* Use SecureZeroMemory() on Windows
* Use a volatile function pointer to memset() everywhere else
@richsalz richsalz self-assigned this Feb 25, 2016
@richsalz
Copy link
Contributor

I've changed my mind. :) @dot-asm what do you think?
+1 i'll merge once reviewed.

@richsalz richsalz closed this Feb 25, 2016
@dot-asm
Copy link
Contributor

dot-asm commented Feb 25, 2016

@richsalz, you're sending mixed signals. First you say you'll endorse this and then close it. Anyway, what do I think about what? About scraping assembly OPENSSL_cleanse? I said what I think about it in RT#4116, very end of first paragraph. Assembly OPENSSL_cleanse is about not having this endless discussions about what compilers can do. On related note, references to what others do are not helpful in this context, because others still have to sleep bad at night asking themselves this question over and over.

As for replacing C implementation with volatile reference to memset, I think it's appropriate.

As for adhering to SecureZeroMemory() on Windows, not fan. If given choice between #ifdef-spaghetti and something that is specified to achieve the goal by following letter of standard on either platform, I prefer latter. Besides, suggesting to scrap assembly OPENSSL_cleanse and suggest to favour SecureZeroMemory() on Windows arguing that former "adds complexity" is not really coherent argument. How does line of argument goes? Here is assembly OPENSSL_cleanse that appears complex to me, and here is SecureZeroMemory that I know nothing about, hence latter is preferable? One can argue that vendor makes guarantees about SecureZeroMemory. But they make this same guarantee for any opaque function. So what's the difference really?

P.S. Consider even that amount of brain power spend on this is higher than writing assembly subroutines.

@ghedo
Copy link
Contributor Author

ghedo commented Feb 25, 2016

So, I rebased my "cleanse" branch and removed the second commit as per @dot-asm comment, though since the PR is closed it's not showing up here. I'm still of the opinion that the asm implementations should be removed, but whatever.

Also, I don't really care about SecureZeroMemory all that much, so if you say I have to remove it, I'll do it, but I don't quite understand your argument against it.

How does line of argument goes?

Well, we remove 99% of the code required to implement this from the OpenSSL codebase and move it to the Windows standard library (or another libc library).

One could also make the point that, while the asm implementations aren't complex, they aren't exactly "optimal" either. That is, using a standard library function (being it memset_s, SecureZeroMemory or memset with volatile pointer) we would get a more optimised implementation for free on pretty much every existing platform. Though I guess the performance gain would be pretty small in the grand scheme of things.

But they make this same guarantee for any opaque function. So what's the difference really?

Not sure I follow, what's the difference between what?

@dot-asm
Copy link
Contributor

dot-asm commented Feb 25, 2016

How does line of argument goes?

That question was specifically about favouring SecureZeroMemory over assembly OPENSSL_cleanse, not in general what is the argument for scraping OPENSSL_cleanse (and looking for alternative).

move it to the Windows standard library (or another libc library).

Provided that it's there. Think vxworks for example... As already said in RT#4116 is there anything one can do to be excused to deal with all the questions on 20 different systems and 20 different compilers? Yes, implement OPENSSL_cleanse in assembly and go sleep like a baby.

But they make this same guarantee for any opaque function. So what's the difference really?

Not sure I follow, what's the difference between what?

What I'm saying is that compiler sees no difference between SecureZeroMemory and assembly OPENSSL_cleanse. In other words SecureZeroMemory is not more magic.

@richsalz
Copy link
Contributor

Yes @dot-asm, I had second thoughts and wanted your opinion. Thanks.
I am going to take the volatile function pointer/memset_s thing and do a PR for that and then close RT4116.
This stays closed.
Thanks for the efforts here, folks!

@ghedo
Copy link
Contributor Author

ghedo commented Feb 25, 2016

@richsalz wouldn't merging https://github.com/ghedo/openssl/commit/6e6e17d404f134ce28fe263c456ee3ea09b97d31 be enough?

I removed the SecureZeroMemory call now FWIW.

@dot-asm
Copy link
Contributor

dot-asm commented Feb 25, 2016

volatile function pointer/memset_s

To clarify. Personally I basically vote for volatile function pointer alone. Because memset_s implies knowledge about its availability, i.e. elaborate #ifdef or something, which also has side effect that rare platforms are effectively put aside. Less special cases is better [as long as common case is specified to work, in sense that if it doesn't, then it can be treated as compiler bug]. Of course modulo fact that we are talking about platforms without assembly support.

@richsalz
Copy link
Contributor

Yes, no memset_s. See https://gitlab.openssl.org/openssl/openssl/merge_requests/2125 (team internal folks)

levitte pushed a commit that referenced this pull request Feb 25, 2016
See also the discussion in #455

Reviewed-by: Andy Polyakov <[email protected]>
@richsalz
Copy link
Contributor

see commit 104ce8a. Closed ticket 4116 also.
Thanks again, everyone.

@noloader
Copy link
Contributor

Use a volatile function pointer to memset() everywhere else

My apologies for jumping in late.

You need the elements to be volatile to ensure the write; and not the pointer. So you need something like the following:

void cleanse(unsigned char* ptr, size_t size)
{
  if(!size) return;

  unsigned char * volatile vptr = ptr;
  do { vptr[--size] = 0; } while (size);
}

I believe you can also perform the following. The address of the element is taken to avoid the "lvalue casts are not supported" error when the GNU extension is not engaged:

void cleanse(unsigned char* ptr, size_t size)
{
  if(!size) return;

  do { *(unsigned char * volatile) &ptr[--size] = 0; } while (size);
}

... implement OPENSSL_cleanse in assembly and go sleep like a baby.

This is the desired remediation with the best chance of success. That's because of the intersection with defects in the C language (our inability to succinctly express what we want) modulo what the compiler writers do (they say "... but you wanted optimizations...").

The C code above should be a fallback when specialized code is not available.


As for adhering to SecureZeroMemory() on Windows, not fan. If given choice between #ifdef-spaghetti and something that is specified to achieve the goal by following letter of standard on either platform, I prefer latter.

On Windows 2000 and above, you should use SecureZeroMemory. SecureZeroMemory is part of Microsoft's SDLC and its a best practice for the platform.

Using SecureZeroMemory on the Windows platform is also a compliance item. I regularly send code back to the developers during C&A audits. Its a security gate, and developers must use SecureZeroMemory or the code won't pass.


Crypto++ got burned with a volatile pointer. GCC responded by only removing some of the writes (and not all of them). Crypto++ then changed its code to run the wipe in reverse (i.e., from n-1 to 0) to tame the optimizer.

However, the underlying problem still existed - the pointer was volatile, and not the elements.

Crypto++ recently changed its zeroizer such that it uses volatile elements, and not a volatile pointer.


I've talked to Robert Seacord and Aaron Ballman from CERT, and everyone agrees that CERT's example at MSC06-C. Beware of compiler optimizations is wrong. Its wrong because it uses a volatile pointer.


Also note that under GCC, using volatile qualifier to tame the optimizer is an abuse because GCC (unlike Clang and MSVC) interprets the qualifier to mean memory that can be changed by hardware, like an I/O range mapped into memory. GCC does not recognize the case that software threads may be changing memory or the optimizer needs taming. That it happens to work is coincidence.

Also see Ian Lance Taylor's blog at volatile qualifier. Taylor is a GCC dev. From his blog:

In summary, if you are using volatile for anything other than manipulating memory mapped hardware, or for very limited communication between threads, it is very likely that you are making a mistake. Think carefully about what volatile means and about what it does not mean.


... because memset_s implies knowledge about its availability, i.e. elaborate #ifdef or something, which also has side effect that rare platforms are effectively put aside.

Finally, the Glibc folks will probably never provide memset_s, so Linux will likely never have it. I filled the bug report myself some time ago: Issue 17879: Library is missing memset_s.

@ghedo
Copy link
Contributor Author

ghedo commented Feb 27, 2016

You need the elements to be volatile to ensure the write; and not the pointer. So you need something like the following:

This comment makes no sense given the implementation that was being discussed:
https://github.com/openssl/openssl/blob/master/crypto/mem_clr.c#L62

@ghedo
Copy link
Contributor Author

ghedo commented Feb 27, 2016

This comment makes no sense given the implementation that was being discussed:

Sorry, for being so harsh :/

What I meant was that we don't clear the array manually with a loop like in your example, but we use memset to do it, using a volatile function pointer to avoid having the compiler optimize the call away.

@noloader
Copy link
Contributor

Sorry, for being so harsh :/

That's OK. I try to avoid taking terseness for rudeness. Plus, I did not do a good job of emphasizing the writes to the elements in the buffer need to survive optimization.

using a volatile function pointer to avoid having the compiler optimize the call away...

I'm not sure the language and the compiler makes those guarantees. I'm fairly certain this still applies: "That it happens to work is coincidence....".

@dot-asm
Copy link
Contributor

dot-asm commented Feb 27, 2016

You need the elements to be volatile to ensure the write; and not the pointer. [You need] unsigned char * volatile vptr = ptr;

You contradict yourself, because what you suggest makes pointer volatile, not elements. I.e. compiler would interpret it as vptr[i] and vptr[i-1] can refer to completely different memory regions, not that these elements are volatile. Well, one can argue that it does the trick, but one can't argue that it's not really just a trick. Because it's a private variable as assigned a line above. If anything, this suggestion is the stretching of the letter of the standard (which makes it the trick). Indeed, what compiler would have to do given the above code? It would have to allocate place for vptr on the stack[!], copy ptr to it, and de-reference the location on stack in each iteration. Once again, spot for vptr is allocated on the stack. So that you kind of tell compiler to assume that purely private variable it allocates at run-time stack can be changed by hardware or another thread. Well, I'd also call this "Think carefully about what volatile means and about what it does not mean."

Now consider what we've got, i.e. volatile pointer to function. What assumption compiler can do about it? Can it assume that we won't place that variable by some linker magic in hardware-mapped area? No. Can it assume that another thread can't change it? Since it's global variable, no. And all this according to letter of standard. And given that it can't make the above assumptions, can it make assumption that pointer is always memset? No. In other words compiler will be obliged to call a sequence of instructions.

@dot-asm
Copy link
Contributor

dot-asm commented Feb 27, 2016

you should use SecureZeroMemory.

Once again, if given a choice, I'd prefer unified solution that works on arbitrary platform. Keyword is "if given a choice". But even if you say that one should adhere to SecureZeroMemory, because it's "part of SDLC", I'd probably say "so is compiler". Meaning that Microsoft compiler is also bound not to make assumptions mentioned in my previous message.

@dot-asm
Copy link
Contributor

dot-asm commented Feb 27, 2016

compiler will be obliged to call a sequence of instructions.

... and not doing it would have to be qualified as compiler bug.

@noloader
Copy link
Contributor

compiler will be obliged to call a sequence of instructions.

... and not doing it would have to be qualified as compiler bug.

I think if you ask the compiler folks, they will tell you its not a bug because you asked for optimizations.

For what its worth, I don't disagree with you on many of these points. Like OpenSSL, I knew what the pain points were. I asked the GCC devs what I should do, and these are the things I learned.

GCC's position of "...but you asked for optimizations..." is especially frustrating to me. That's when I came to realization the C language does not allow us to express what we want/need to do in the real world (and not the abstract state machine).

@dot-asm
Copy link
Contributor

dot-asm commented Feb 27, 2016

I think if you ask the compiler folks, they will tell you its not a bug because you asked for optimizations.

Standard specifies references to volatile objects as side effects, which does mean that compiler is obliged not to optimize away such references [and even keep them in original program order]. One can argue that compiler may relax when it comes to pure private variables, but it can't formally do that when it comes to global variables irregardless optimization level.

@gber
Copy link

gber commented Mar 9, 2016

Now consider what we've got, i.e. volatile pointer to function. What
assumption compiler can do about it? Can it assume that we won't place that
variable by some linker magic in hardware-mapped area? No. Can it assume that
another thread can't change it? Since it's global variable, no. And all this
according to letter of standard. And given that it can't make the above
assumptions, can it make assumption that pointer is always memset? No. In
other words compiler will be obliged to call a sequence of instructions.

This construct does not actually work and is rather easily optimized away, there was a discussion about it on HN following a blog post by Colin Percival two years ago. The basic point of it was that the volatile only applies to accessing but not making a call via that pointer.

@richsalz
Copy link
Contributor

richsalz commented Mar 9, 2016

I know everyone likes to play language lawyer, BUT...

  1. This issue is closed; open a new one if you have a platform where it's broken.
  2. The links talk about non-volatile objects which isn't related.
  3. Most of our platforms have hand-written assembler.

Thanks.

liuqun referenced this pull request in liuqun/njit8021xclient Feb 26, 2017
目标是将MD5算法编译成build-in不依赖外部动态库
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants