Skip to content

Conversation

@spinute
Copy link

@spinute spinute commented Sep 5, 2016

I fixed compilation error on Clang (probably for -Wshorten-64-to-32).
It may be inappropriate fixes since I have not read through the implementation yet.
I embedded XXX comments into codes. Check them please.

funny-falcon and others added 9 commits July 11, 2016 19:45
add more hash benchmarks to test alternatives
(patch author is Vladimir Makarov)
Previous implementation had an issues:
- macros murmur1 assumes murmur_step takes rotation value
  as a second argument
- but murmur_step second argument is "next block"
- this makes st_hash_uint and st_hash_end to not mix high bits of
  hash value into lower bits
- this leads to pure hash behaviour on doubles and other.
  It didn't matter when bins amount were prime numbers, but it hurts
  when bins are powers of two.

Mistake were created when implementor tries to co-exist Murmur1 and Murmur2
in a same code.

Change it to single "Murmur3" implementation.
It is not exact implementation:
- 64bit version is an extrapolation of 32bit version instead of
  separate algorithm
- 64bit finalizer is taken from
  http://zimbry.blogspot.ru/2011/09/better-bit-mixing-improving-on.html
- on final block call murmur_step (do not shortcut last rotation of h
  and multiplication) instead of custom block.

Also remove ST_USE_FNV1: it lack many function, and looks to be abandoned
Main Murmur weakness is permutation of a block without seed.
Just adding a random seed to each block and finalization we
greatly increase security of Murmur.
SipHash is secure PseudoRandom Function, and could be used
in a counter mode to produce strong pseudorandom stream of bytes.
So use it to securely initialize both hashseed and sipseed.
(so no way to recover sipseed from hashseed)
Note, that ident becomes a bit slower, cause hash value is more "random".
New versions calculates fair hash value, and so provides statistically
fair collision rate.
Previous version relies on internal pattern of symbols and "usually" provides
lesser collisions. Though it could be compromised easely.
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
While for cryptography it has halved security, it is just enough
for hash tables.
See Jean-Philipee Aumasson answeres to Ocaml pull request:
ocaml/ocaml#24 (comment)
ocaml/ocaml#24 (comment)
- instead of allocating st_table_entry per element with malloc,
  store all elements in one array
- use order in the array as an insertion order instead of list
-- after array's last element is filled, hash is rebuilt and grow/shrunk if needed.
- reduce sizeof(st_table_entry) to 24byte on 64bit platform by default
-- use 32bit next index instead of pointer and 32bit hash
-- there is compile time option to use 64bit index and 64bit hash
   to have ability to use huge hashes.
- reduce sizeof(st_table) to 32bytes (on x86_64) by couple of tricks:
-- alloc bins together with entries, so only 1 pointer needed
-- pack allocated size into 8bit and rebuilt count into 24bit
- fix test_st_foreach
-- disable test_st_foreach(:unpack_delete) cause it behaves differently
   on already unpacked table (it may lead to segfault in previous st
   implementation)
-- remove checks for packed/unpacked
- bins are dinamically sized:
-- if array's size is less than 256, then bins are 8bit indices
-- if array is less than 65536 elements, then bins are 16bit indices
-- otherwise bins are indices of maximum size (32bit or 64bit depending
   on architecture and compile time option)
@funny-falcon
Copy link
Owner

Answering first comment-question: good catch. But, it is better just change type of new_sz argument, then check will be done in a loop of new_sz. (and, probably add assert when compiled outside of ruby instead of return -1).

Answering second question: no, it never overflows. Check is in st_rehash after comment 'else grow table'.

Thank you for catch. Would you mind if i embed fixes into my commit instead of merging?

@funny-falcon
Copy link
Owner

btw, has you gain any benefits from patch?

@spinute
Copy link
Author

spinute commented Sep 5, 2016

Thank you for replying.
It’s okay just to fix your codes instead of merging.
I leave the way to fix to you, as long as resolving compilation error.

btw, has you gain any benefits from patch?

Sorry, I can’t get it. What benefit are you mentioning?
If you are saying about the improvement by your patch, I posted benchmark results on my environment to https://bugs.ruby-lang.org/issues/12142.

funny-falcon pushed a commit that referenced this pull request Jan 31, 2017
17+ years passed since standardized in ISO, 8 years since we added
AC_HEADER_STDBOOL to configure.in.  I'm quite confident that it's
already safe to use <stdbool.h>.

I understand that when we introduced AC_HEADER_STDBOOL, <stdbool.h>
was remain not included because C standard and SVR4 curses conflicted
miserably back then (#1).  Though I believe such situation has been
fixed already(ruby#2), I'm afraid of your operating system might ship a
proprietary curses that still conflicts with the standard. So to avoid
potential problem, we limit the inclusion to our internal use only.

#1 : 1997 version of SUSv2 said bool is "defined though typedef" in
     <curses.h>, while C99 said bool is a macro, plus in C++ bool is a
     keyword.  AFASIK the curses library has never been a part of
     POSIX.

ruby#2 : In reality ncurses and NetBSD curses both just follow C99 to
     include <stdbool.h> from <curses.h>.  I think C99 is now widely
     adopted.

----

	* internal.h: #include <stdbool.h> if present.  That is
	  believed to be the case for 99.9% systems that lives today.
	  Non-C99, non-C++ situations are intentionally left
	  undefined, advised by Motohiro Kosaki.  If you have such
	  compiler, please fill the definition appropriately.


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57460 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants