Hi,
I am working on implementing a simplified swisstable inside llvmlibc and then use it as the backend of hsearch. As this would grow into a rather large patch, I hope to know if this is a good choice.

Background

Swisstable

Swisstable comes from abseil library and its rust port is now a part of rust’s standard library. Its stability, memory footprint, and performance is well-examined.

There is a rather comprehensive comparison of hashtable implementations at Hashmaps Benchmarks - Conclusion. The results are quite positive toward swisstable.

hsearch

Both musl and glibc provides a search library inside which there are a set of functions related to hashtable management. However, their implementations are relatively simple compared with swisstable.

Questions

  • I have not considered the hash functions yet. Both glibc and musl uses the canonical hash functions that proceeds strings byte by byte. Is it good enough?

  • Swisstable itself is not that complicated; but it is way more complicated than the implementations in glibc and musl. So, generally, is this a good choice?

  • The interface specified in POSIX standards require users to specify the maximum number of elements during hashtable construction, while in the same time, allowing implementations to adjust the size to achieve better performance. In actual implementations, MUSL will do resize/rehash. Should we also consider rehashing? (or leave an compilation option for this?)

  • I am little bit confused about the naming convention inside llvmlibc. It seems that the __support module does not follow the convention in the main part of LLVM project. Also, directories like threads and StringUtil are using different style. Is there any guidance on the proper way of naming?

Looking forward to suggestions.

Best,
Yifan

Are you aware of cwisstable? It’s a simplified C-implementation of swisstable – which is intended to be imported as a single C11 header file. Check in particular the usage example, which demonstrates almost exactly the functionality you need. I think it would not be much work to write a few wrapper functions which expose the hsearch API on top of that.

cwisstable is not appropriately licensed for use in LLVM. Its contributor agreement says that contributors retain copyright for their contributions, but it looks like all the contributions so far are from Google employees (but I’ve only done a cursory look, I haven’t verified that point). If so, and cwisstable were deemed useful, a Google employee could do an initial code import into LLVM.

It can be a good choice; but there are several things to consider:

  1. Llvmlibc is a cpp project despite the fact it is a c library.
  2. There’s some places we want tailor for the interface of libc.
  3. Notice that we will not have a complete C standard library inside the implementation of C library. I am afraid that many parts including macro detection, memory management, header inclusion need to be adjusted.

Notice that we will not have a complete C standard library inside the implementation of C library.

Although I’m not an expert in llvm libc, it feels like this really shouldn’t be an issue, given that hsearch is practically a user-level library feature. That is: it’s leaf functionality which nothing else in libc depends on, and with no special platform magic involved. It seems like it really ought to be able to depend on a fully functional libc.

cwisstable is not appropriately licensed for use in LLVM

All contributions to cwisstable are covered by a Google CLA. So Google has the right to distribute under any license it desired, even if there are external contributors. If this code is deemed useful for llvm-libc, I’m sure it will not be a problem to get the LLVM exceptions added to its Apache license.

The LLVM libc is written in C++. The goal is to be a complete replacement of e.g. glibc or musl. It is an odd design if cwisstable would depend on C standard facilities.

So I almost finished a patch at https://github.com/llvm/llvm-project/compare/main...SchrodingerZhu:llvm-project:libc-hashtable (still need to do a lot of clean ups and create headers and specs). It turns out that the libc version is indeed quite different from the normal swisstable.

FYI, I am getting back to this and in the process of writing an updated patch series. Based on previous unsuccessful experiences:

  1. I will use license-compatible hash functions.
  2. I will largely simplify the hashtable implementation to fit the exact need of hsearch (fixed sized, incremental-only hashtable)

Hopefully, I will give a short and clean implementation very soon.

I’d be a interested in hearing a more detail about what made the attempt from last year unsuccessful. Stories of failure can often be more interesting than successes. :slight_smile:

The major issue was unlicense. :smiley: