The following two files contain implementation of DJB2 hash algorithm in both Aarch64 and x86-64 Assembly languages.
DJB2 is a simple hash function. It is defined as:
unsigned long hash(char *str) {
unsigned long hash = 5381;
int c;
while ((c = *str++))
hash = ((hash << 5) + hash) + c; /* hash * 33 + c */
return hash;
}
It's a non-crypto hash by Daniel Bernstein. DJB2 is similar to a Linear congruential generator and uses prime numbers 5381 and 33 to generate a psuedo-random unsigned quad given a sequence of signed bytes. In most implementations of DJB2, for the sake of speed, instead of multiplication by 33, shifting and adding is used. In the following implementations I did the same. You may say that's an extra instruction, but remember that instruction != cycle. A mul
instruction takes more cycles than a shift, plus and an add instruction.
Both the assembly files have an entry point called jdb2_ccc
. ccc
, of course, stands for C Calling Convention. But not any calling convention as C has several calling conventions across platforms. Plus I did not adhere to the convention fully. I did save all the callee-save registers, and the function yields no results, rather, saves the result in the address carried by the second argument. The following table shows all the clobbered, saved, argument and result registers.
Arch | Register(s) | Status | Usage |
---|---|---|---|
x86-64 | R12 | Callee-Saved | A the result of shift |
Aarch64 | X5 & X6 | Callee-Saved | As holder of loaded byte and address to byte string |
x86-64 | R11 | Clobbered | As holder of final hash |
Aarch64 | X0 | Clobbered | As holder of final hash |
x86-64 | RSI | Modified | Arg 2, Dereferenced and final hash put into it |
Aarch64 | X1 | Modified | Arg 2, Dereferenced and final hash put into it |
x86-64 | RDI | Preserved | Arg 1, Pointer to byte string |
Aarch64 | X0 | Clobbered | Arg 1, Pointer to byte string |
Judging by this table, you can decide if it's safe to use the implementations.
I believe it's much safer to pass a pointer to the location which the user needs result to be saved at. The concept of 'variable' is a very volatile thing! I was experiencing issues because C, in main
function, was doing bit extension and ruining the result that I was returning in X0/RAX. This made sure the result is preserved as it should. In general it's much more solid and grounded anyways. If you have any idea why the bit extension was happening, please tell me.
Just make a C file like this and call it djb2.c
:
#include <stdio.h>
int main()
{
unsigned long hash = 0;
djb2_ccc("my_null_terminated_str", &hash);
printf("%lu\n", hash);
}
And then pass the assembly file for the arch you want AFTER the C file name to GCC or Clang:
gcc djb2.c djb2.<arch>.s -o djb2 && ./djb2
And it will compile and run the file.
First, create a C file like this:
unsigned long djb2_hash(char *stream)
{
unsigned long hash = 0;
djb2_ccc(stream, &hash);
return hash;
}
First create a shared object like so:
gcc -c -shared -fPIC djb2.c djb2.<arch>.s -o djb2.so
And then you can use ctypes
to import and use it:
from ctypes import *
djb2 = CDLL("/abs/path/to/djb2.so")
string = create_string_buffer(b"my_null_terminated_str\0")
hash = djb2.djb2_hash(byref(string))
print(hash)
Do the previous steps to generate the shared library. Then:
use std::ffi::{c_char, c_ulong};
extern "C" {
fn djb2_hash(message: *const c_char) -> c_ulong;
}
fn djb2_hash() -> u64 {
unsafe {
const MSG_PTR: *const c_char = {
const BYTES: &[u8] = b"my_null_terminated_message\0";
BYTES.as_ptr().cast()
};
let hash = djb2_hash(MSG_PTR);
hash as u64
}
}
Notice that Python and Rust codes are untested! You may need to make changes to it to work.
You need to use QEMU's Userspace emulators to run the Aarch64 code in non-ARM machines. There are several guides out there for this you can consult.
These implementations are provided with NO WARRANTY that they will work, funcion as intended, work properly, do no clobber your registers, that the convention won't harm your flow, ZILCH. If they break, do not hold me responsible.
This is my first working assembly code. After this one I am going to implemented fixed-point integers in 6502 assembly. I am seeking comments to improve my skills. Please contact me on Discord at Chubak#7400.
Have a good day.