The following two files, newton.c and newton.S, are mainly for demonstration; they served their purpose as a warm-up to get me back into x86-64 Assembly. I did not test them because I have other stuff brewing. I am sharing them only so that we can discuss the code, and I want your opinions.
The Assembly subroutine, newton_rhapson, only has 6 instructions (not counting ret), and 3 of them are loading the arrays. Let's go through it together:
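.global newton_rhapson
.text
newton_rhapson:
#define NUMBERS %rdi
#define JACOBIAN %rsi
#define CURRENTX %rdx
#define DESTINATION %rcx
/* VPDNUMBERS, VPDJACOBIAN and VPDCURRENTX are the vector-register macros defined at the top of the file. */
/* The arguments are pointers held in registers, so dereference them with parentheses. */
vmovupd (NUMBERS), VPDNUMBERS
vmovupd (JACOBIAN), VPDJACOBIAN
vmovupd (CURRENTX), VPDCURRENTX
/* AT&T syntax for VEX instructions takes three operands: src2, src1, dest (dest = src1 op src2). */
/* Assuming NUMBERS holds the function values and JACOBIAN the derivatives: quotient = numbers / jacobian. */
vdivpd VPDJACOBIAN, VPDNUMBERS, VPDNUMBERS
/* next x = current x - quotient */
vsubpd VPDNUMBERS, VPDCURRENTX, VPDCURRENTX
vmovupd VPDCURRENTX, (DESTINATION)
ret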
I first declare the aforementioned label as global, so it can be seen by C as an extern symbol. Then I use CPP to define several macros that make the code much easier to read. It's kinda Lisp-like, ain't it?
I am going to explain the instructions now:
- vmovupd -> This instruction loads an unaligned block of 64 bytes (ZMM registers, AVX512), 32 bytes (YMM registers, AVX) or 16 bytes (XMM registers; the v-prefixed encoding still requires AVX) from memory into an extended register, where the bytes are treated as packed double floats: 8, 4 or 2 of them. Notice that at the beginning of the file we have defined these extended registers to match the right extension, e.g. ZMM registers for AVX512.
- vdivpd and vsubpd -> These instructions do a SIMD division and a SIMD subtraction, SIMD being "single instruction, multiple data": one instruction operates on every double in the vector at once.
- vmovupd -> This time we do the reverse: we copy the vector from the register into the memory address held in DESTINATION, which is the general-purpose register RCX.
In the C code, we declare this function as extern:
extern void newton_rhapson(DOUBLE *numbers, DOUBLE * jacobian, DOUBLE *currentx, DOUBLE *dest);
We must compile with:
gcc newton.S newton.c -o newton
We then accept the input from a redirected STDIN, in the form of echo 'N=[...] J=[...] X=[...]' | ./newton.
So, as I said, this code most probably does not work; I just wanted to share it with you, since the aim is to seek general advice and critique.
Please visit my GitHub profile, where I store a looot of goodies that you may even find useful: https://github.com/Chubek
Thanks.
NOTE: Remember that the extension of the Assembly file must be a capital S, because only then will GCC run it through CPP and expand the macros. You could, however, expand them yourself beforehand and save the result, via cpp or gcc -E.
PS: I'm ashamed to say it so I say it backwards -> (: buj na ydeen em