Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

integer-openssl proposal #183

Closed
wants to merge 7 commits into from
Closed

integer-openssl proposal #183

wants to merge 7 commits into from

Conversation

ch1bo
Copy link

@ch1bo ch1bo commented Nov 18, 2018

Proposal to add a fast, BSD-licensed alternative for GHC's Integer type, next to currently available integer-gmp (default, fast and LPGL3-licensed) and integer-simple (slow, but pure and BSD3-licensed) implementations. Concretely, this would use OpenSSLs "BIGNUM" arbitrary size integer library contained in libcrypto.

Rendered

@cartazio
Copy link

so this would be as a better "bsd" default than current integer simple?

  1. thats great! (i'm not sure if the openssl one is current best of breed amongh applicable c libs, but having more options isnt a bad thing)

  2. one thing thats not discussed in this is what the static vs dynamic linking options here would be. Would this essentially mean that applicable ghc installs would need libssl and friends as dylibs? Or would that subset just be static linked in?

  3. so it looks like openssl uses karastuba for large numbers, BUT, it seems the thresholds haven't been measured / tuned since i was in first or second grade 20-22 years ago!
    https://github.com/openssl/openssl/blob/1212818eb07add297fe562eba80ac46a9893781e/crypto/bn/bn_lcl.h#L332-L338

  4. could you share some information about how on various representative algorithms libopen ssl fairs vs integer-simple and integer gmp?

theres a LOT of surface area for a good implementation, and among open source implementations, openssl may not be the best, though its omni presence is a boon.

for those watching from home
a) integer simple uses the grad school algorithm for multiplication
b) open ssl uses karatsuba https://en.wikipedia.org/wiki/Karatsuba_algorithm for things with more than 16 words (or limbs) of bits
c) gmp has a HUGE set of families of algorithms which run at various asymptotic / finite thresholds to handle all the trade offs in constant factors vs asymptotics.

  • in GMP, for multiplication its grade school, then the toom algorithms (about 5-6 of them?), of which karatsuba is the simplest and slowest asymptotically), then " Schönhage–Strassen algorithm", which is pretty darn fast (theres some even faster asymptotic algorithms, but i've not seen any experimental data that shows them winning out on numbers you can fit into a modern computer).

point being: would be great to see some benchmarking data for mult/ sqrt, etc on rationals, naturals, integers, for showing how these all compare on the differently naive / sophisticated algorithms therein. :)

@cartazio
Copy link

https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/gmp-mparam.h is the current example thresholds of gmp. i think karatsuba corresponds to toom22

i guess i'm just concerned that the openssl tuning parameters are based on performance data older than some contributors to ghc and llvm :)

@ch1bo
Copy link
Author

ch1bo commented Nov 19, 2018

@cartazio Seems you have a far better understanding on the maths involved - thanks for the overview.

As far as linking is concerned, the current PoC is linked dynamically against the libcrypto existing on the system. Static linking, however, is mentioned a lot and should be possible.

Some benchmarks of multiplication and divison of small and big (4k bit) integers against the currently builtin (for most users thats integer-gmp) can be run with the PoC. Thr Readme page also shows (older) benchmarks agaonst integer-openssl. I will update them with some graphs soon.

There is also a some prior work on estimating performance of different open source integer libraries in the ghc wiki

@simonpj
Copy link
Contributor

simonpj commented Nov 19, 2018

Sounds like a good work to me.

But to an external observer it's very mysterious that you propose to call it integer-openssl. That makes it sounds as if this library is something to do with crypto etc. But I assume it's not. I'm guessing it's because the impl uses code from openssl?

Anyway, some less confusing name might be worth considering. Or not. Just raising it because it took me some time to realise that this proposal was not related to crypto

@cartazio
Copy link

cartazio commented Nov 19, 2018 via email

@cartazio
Copy link

cartazio commented Nov 19, 2018 via email

@piyush-kurur
Copy link

One of the suggested reasons for inclusion is to "allow public key algorithms". Of course you can
use that in the class room but not for serious crypto.

  1. Haskell requires a integer library because it provides large arbitrary integer multiplication. But
    it would be dangerous to implement pki (I am assuming RSA) based on that. A better motivation
    could be representing integers in compilers.

  2. For crypto purposes it would be sufficient (and better) to implement say 4026 or some fixed size integers carefully done so that there are no side channels etc.

  3. More importantly if ghc has to depend on openssl that would be a pity. I do not think it is a good code base to depend on plus it bring out a lot of other useless (for ghc) stuff.

@cartazio
Copy link

cartazio commented Nov 19, 2018 via email

@ch1bo
Copy link
Author

ch1bo commented Nov 19, 2018

@piyush-kurur The cryptography example was given as the haskell package cryptonite implements public key cryptography primitives in haskell. And IIRC it is using Integer and operations on it. These were particularly slow in a real world scenario using integer-simple.

@cartazio Comparing the performance is definitely important and I tried to get early estimates how much faster than integer-simple and also how much slower than integer-gmp the prototypical implementation of multiplication and division is, I'll add these preliminary results to the PoC and link them in the proposal.
The improvements outlined in https://ghc.haskell.org/trac/ghc/wiki/Design/IntegerGmp2 are already incorporated into integer-gmp and were also considered when creating the current proof of concept of integer-openssl. Sorry just reread your comment, and you seem to refer to the link time selection section.

In general I'd like to say that I took the BIGNUM implementation of openssl as it was a) BSD licensed, b) complete, c) correct and d) likely available on most systems. If we find better alternatives, I am happy to update the proposal, name and implementation to use something else :)

@bgamari
Copy link
Contributor

bgamari commented Nov 19, 2018

My apologies if this was already clear: I think it would make the most sense to consider this as a more efficient alternative to integer-simple, not a replacement of integer-gmp. GMP seems to be the gold-standard of bignum libraries and I see no reason not to use it unless licensing concerns preclude its use.

I believe this is what the proposal as-written is proposing. Consequently, I don't think the concern that GHC will gain a dependency on OpenSSL is well-founded.

Sebastian Nagel and others added 3 commits November 19, 2018 23:04
Added a section on why openssl was chosen for a proof of concept and add criterion benchmarks from munihac.
@23Skidoo
Copy link

23Skidoo commented Nov 21, 2018

There's also https://github.com/libtom/libtommath (edit: and https://github.com/libtom/tomsfastmath now as well), I wonder how they compare to OpenSSL's bignums.

@cartazio
Copy link

cartazio commented Nov 21, 2018 via email

@23Skidoo
Copy link

It seems to be dually licensed now, the first one is the asinine license you mention, and the other one is public domain (like SQLite).

@cartazio
Copy link

cartazio commented Nov 21, 2018 via email

@ch1bo
Copy link
Author

ch1bo commented Nov 21, 2018

libtommath looks promising as well. After a quick glance over the header, all required ops should be there.. all but one I could not find in openssl/bn so far either: integer <-> double conversion. I am not too good in maths, but in a quick chat with @hvr on monday, he mentioned that one probably does not want to do that by hand.

Other than that, also got updated to what the original intention by @hvr with integer-gmp and link time selection of an integer library was. I'll try to incorporate that into the proposal

Also, the https://github.com/libtom/libtommath/blob/develop/LICENSE is kinda hilarious!

@minad
Copy link

minad commented Nov 21, 2018

@ch1bo I also suggest libtommath, in particular since it can be vendored easily into an integer-tommath package or into GHC by copying a single c file. Concerning integer<->double conversion, I made libtom/libtommath#123 a while ago, which is not yet part of tommath unfortunately.

@piyush-kurur
Copy link

@ch1bo thanks for the response. Also @bgamari thanks for clarifying that it is not
to replace gmp. Rereading the proposal the misunderstanding was entirely my fault.
However I am uncomfortable with having an indirect suggestion of using the multi-precision
Integer at the haskell level for cryptography. Even when the underlying library is designed for crypto
I think it is tricky as I am not sure how ghc moves around large integers in its memory during GC.

@cartazio
Copy link

cartazio commented Nov 21, 2018 via email

@cartazio
Copy link

@minad if you could help shepard that license clarification into being in master that'd be lovely :)

@cartazio
Copy link

(i'm a little bit too spread thin to do more than be a resource on the sidelines atm :) )

@minad
Copy link

minad commented Nov 21, 2018

@cartazio I will see what I can do regarding libtommath

toonn and others added 2 commits November 23, 2018 13:15
 17: LPGL3 -> LGPL3
 44: it's BIGNUM -> its BIGNUM
 46: ticks all boxes -> tick all the boxes
 61: should be also possiblen -> should also be possible
 77: like they do with work currently with -> like they currently are to work with
100: Cost do exist -> Costs do exist
101: depends however -> depends, however,
103: Interface changes is -> Interface changes are
130: library, where implementation is tested and benchmarked against the builtin one. -> library. Test and benchmark the implementation against the built-in one.
132: half are -> half is
135: both, -> both
141: working in progress -> work in progress
Caught a couple typos while reading
@Ericson2314
Copy link
Contributor

Ericson2314 commented Dec 3, 2018

We really ought to us backpack for integer-*. I'd hope with 3 implementations it would finally be worth someone's time to do that.

@cartazio
Copy link

cartazio commented Dec 3, 2018 via email

@bravit bravit removed the Proposal label Dec 3, 2018
@ch1bo
Copy link
Author

ch1bo commented Dec 3, 2018

@Ericson2314 yes, this should be a perfect fit for backpack. It was also brought up by @hvr as a suggestion.

I am pretty new to backpack, but I should incorporate it into the proposal..

@minad
Copy link

minad commented Jan 3, 2019

Tommath changed the license to Unlicense, which is essentially public domain and similar to MIT in jurisdictions which do not recognize the concept of public domain.

Furthermore additional functions were added (float conversion,...), which are also useful for a hypothetical integer-tommath.

https://www.libtom.net/news/LTM_1.1.0RC1/

@harpocrates
Copy link
Contributor

@Ericson2314 yes, this should be a perfect fit for backpack. It was also brought up by @hvr as a suggestion.

I am pretty new to backpack, but I should incorporate it into the proposal..

Since I tried this recently, here's some problems that will have to be solved:

  • there's a bunch of CPP'ed optimizations in base (as well as a couple in text and bytestring I think?) which require intimate knowledge of Integer's particular constructors

  • there's a some GHC code that peers into the internals of Integer's particular constructors

    • there's a handy core-prep optimization for turning small integers directly into calls to S# instead of going via mkInteger
    • there's some GHCi code for cleverly printing integers with :sp
    • ????
  • integer-simple and integer-gmp don't expose quite the same functions (I'm working on that)

Ideally, we'd have GHC/Integer.hsig as a signature in base (and maybe even GHC/Natural.hsig?) and downstream consumers could instantiate base with integer-*, but it feels like there will be some work to get there.

@piyush-kurur
Copy link

piyush-kurur commented Jan 8, 2019 via email

@minad
Copy link

minad commented Jan 9, 2019

Would this project make sense in the context of the GSOC? See https://www.reddit.com/r/haskell/comments/ae1i6z/reminder_please_submit_gsoc_ideas/edmpgit

@jaspervdj
Copy link

Happy to have the implementation part of this as a GSoC project. If anyone here is interested in mentoring, please let me know.

@piyush-kurur
Copy link

piyush-kurur commented Jan 22, 2019 via email

@nomeata
Copy link
Contributor

nomeata commented May 1, 2019

Little activity recently, marking this as dormant.

@nomeata nomeata force-pushed the master branch 3 times, most recently from 6b33e58 to e7fdbc7 Compare June 11, 2019 12:04
@nomeata nomeata added the Dormant The proposal is not being actively discussed, but can be revived label Jun 11, 2019
@ch1bo
Copy link
Author

ch1bo commented Jun 15, 2019

Since I tried this recently, here's some problems that will have to be solved:

  • there's a bunch of CPP'ed optimizations in base (as well as a couple in text and bytestring I think?) which require intimate knowledge of Integer's particular constructors

  • there's a some GHC code that peers into the internals of Integer's particular constructors

    • there's a handy core-prep optimization for turning small integers directly into calls to S# instead of going via mkInteger
    • there's some GHCi code for cleverly printing integers with :sp
    • ????
  • integer-simple and integer-gmp don't expose quite the same functions (I'm working on that)

@harpocrates Thanks for that - this is already merged right?

Ideally, we'd have GHC/Integer.hsig as a signature in base (and maybe even GHC/Natural.hsig?) and downstream consumers could instantiate base with integer-*, but it feels like there will be some work to get there.

Although I am a backpack noob, I gave this a try (also a while back) in here https://github.com/ch1bo/backpack-integer

However with the current state of backpack adoption a first iteration without backpack sounds more realistic?

@nomeata
Copy link
Contributor

nomeata commented Aug 20, 2019

Closing this due to inactivity, but it can be reopened when necessary.

@nomeata nomeata closed this Aug 20, 2019
@ch1bo
Copy link
Author

ch1bo commented Sep 9, 2019

Finally created a MR for GHC here: https://gitlab.haskell.org/ghc/ghc/merge_requests/1677. This adds integer-openssl in the same way as integer-gmp - some cleanup and possibly DRYing with integer-gmp could/should be done - feedback welcome.

@hsyl20
Copy link
Contributor

hsyl20 commented Mar 11, 2020

I have been working on a new ghc-bignum package replacing both integer-gmp and integer-simple and hopefully available in GHC 8.12. Among other things, it makes the implementation of newer Integer/Natural backends way easier as each backend only has to provide a few functions: https://gitlab.haskell.org/ghc/ghc/merge_requests/2231

@ch1bo if you could try to add libcrypto as a backend for ghc-bignum, that would be great. You don't have to change anything in bytestring, text and so on anymore. I'd be glad to help as needed.

@ch1bo
Copy link
Author

ch1bo commented Mar 14, 2020

@hsyl20 Your work on ghc-bignum looks very promising, nice! It should be quite trivial to adapt https://github.com/ch1bo/integer-openssl to a backend of your ghc-bignum package. Is the package also available "outside" of GHC? i.e. with a test-suite or benchmarks

@hsyl20
Copy link
Contributor

hsyl20 commented Mar 16, 2020

Is the package also available "outside" of GHC?

Sadly not anymore. It was initially in https://github.com/haskus/packages/tree/master/haskus-integer with some tests but a lot of things have changed since then (bug fixes, etc.).

with a test-suite or benchmarks

I have some benchmarks which use criterion-measurements from here (to avoid any dependency on non boot packages).

I still have to integrate the QuickCheck tests into GHC's testsuite. Last time I tried, installing QuickCheck for the tests seemed to be a pain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dormant The proposal is not being actively discussed, but can be revived
Development

Successfully merging this pull request may close these issues.