Assembly support for GMP on AMD64
Purpose
This is a patch for gmp-4.1.4 that adds some assembly loops for the
AMD64 architecture. According to the GMP webpage, assembly
support is planned only for version 5.0. For those (like me) who can
hardly wait, this patch gives decent timings in the meantime. However,
it is still far from the estimated optimal performance given on the
GMPBench web page (see the Performance section below).
Only a few functions have been written:
add_n
sub_n
addmul_1
submul_1
mul_basecase
sqr_basecase
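For readers unfamiliar with these routines, here is a minimal portable C
sketch of what add_n and addmul_1 compute on 64-bit limbs. It is only a
reference for the semantics, not the assembly from this patch. The names
limb_t, ref_add_n and ref_addmul_1 are made up for this illustration (GMP
itself uses mp_limb_t, mpn_add_n and mpn_addmul_1), and unsigned __int128
is a GCC/Clang extension assumed to be available.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t;  /* hypothetical name; GMP calls this mp_limb_t */

/* rp[0..n-1] = up[0..n-1] + vp[0..n-1]; returns the carry out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = up[i] + carry;
        limb_t c1 = s < carry;   /* wrapped while adding the carry in */
        limb_t r = s + vp[i];
        limb_t c2 = r < s;       /* wrapped while adding vp[i] */
        rp[i] = r;
        carry = c1 | c2;         /* at most one of the two can be set */
    }
    return carry;
}

/* rp[0..n-1] += up[0..n-1] * v; returns the high limb carried out. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128 bit product plus two limbs never overflows 128 bits */
        unsigned __int128 t = (unsigned __int128)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 64);
    }
    return carry;
}
```

sub_n and submul_1 follow the same pattern with borrows instead of carries.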
The assembly code is mostly a 64-bit translation of the k7 assembly code
that is available in GMP. The main modifications are:
The ABI for function calls is different: up to 6 parameters
are passed in registers, not on the stack.
movl becomes movq, eax becomes rax, etc. That's the easy part.
In an unrolled loop, the size of the unrolled code is different, so
the computation of the jump has to be adjusted.
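The last point can be illustrated in C. The sketch below is a rough
analogue, not code from this patch: it enters a 4-way unrolled loop at a
position that depends on n mod 4, using a switch (the classic "Duff's
device" layout). In assembly, the equivalent entry address is typically
derived from the byte size of one unrolled step, which is why it has to
be recomputed when the instructions change size. The function name
sum_unrolled4 and the choice of a plain wrapping sum are assumptions made
for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Wrapping sum of p[0..n-1], with the loop body unrolled 4 times.
   Under the System V AMD64 ABI, p arrives in rdi and n in rsi. */
uint64_t sum_unrolled4(const uint64_t *p, size_t n)
{
    uint64_t acc = 0;
    if (n == 0)
        return 0;

    size_t passes = (n + 3) / 4;   /* number of trips through the body */

    /* Jump into the middle of the unrolled body so that the first,
       partial pass handles exactly n % 4 elements (or 4 if n % 4 == 0). */
    switch (n % 4) {
    case 0: do { acc += *p++;
    case 3:      acc += *p++;
    case 2:      acc += *p++;
    case 1:      acc += *p++;
            } while (--passes > 0);
    }
    return acc;
}
```

Here the compiler works out the entry points. In hand-written assembly
the offset is computed explicitly, so it has to be redone whenever the
encoded size of an unrolled step changes.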
Disclaimer
The code has been reasonably well tested. I used the program tests/devel/try,
which checks for quite a few kinds of bugs. Nonetheless, since there are
fewer users and only one developer (!), this code is less likely to be
correct than any official GMP code.
Bugs
Please send comments and bug reports to gaudry@lix.polytechnique.fr and *not* to
the official GMP developers: they have nothing to do with this code.
Performance
I get a GMPBench multiply score of around 48000 on a 2.4 GHz Opteron (27000
without the asm functions, so roughly a 1.8x speedup). The overall gmpbench
score is about 8850 (5700 without asm; the estimated optimum is 11000).
Options: CFLAGS = "-O2 -fomit-frame-pointer -funroll-loops -mcpu=k8"
Download / install
Get the gmp-4.1.4 archive and unpack it, which creates a
directory /path_to_gmp/gmp-4.1.4/
Download the mpn_amd64 archive and
unpack it. In the mpn_amd64 directory, run:
    ./install /path_to_gmp/gmp-4.1.4
    cd /path_to_gmp/gmp-4.1.4
    ./configure [your favorite options]
    make ; make check ; make install