Assembly support for GMP on AMD64
Purpose
This is a patch to gmp-4.2.x for the AMD64 architecture. The 4.2.x version
comes with basic assembly support. This patch gives a substantial speed-up.
Only a few functions have been written:
- add_n
- sub_n
- addmul_1
- submul_1
- mul_basecase
- sqr_basecase
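For readers unfamiliar with these routines, here is a minimal portable C
sketch of what three of them compute. It is not the code of this patch and
not GMP's own code: the names ref_add_n, ref_addmul_1, ref_mul_basecase and
the limb_t type are made up for illustration, a limb is assumed to be 64 bits,
and unsigned __int128 is a GCC/Clang extension. sub_n and submul_1 are the
same with subtraction and a borrow; sqr_basecase is the special case of
mul_basecase where both operands are equal.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* assumption: one limb is 64 bits on AMD64 */

/* rp[0..n-1] = ap[0..n-1] + bp[0..n-1]; returns the carry out (0 or 1). */
limb_t ref_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        limb_t s = ap[i] + carry;
        limb_t c1 = s < carry;   /* wrapped while adding the carry */
        limb_t r = s + bp[i];
        limb_t c2 = r < s;       /* wrapped while adding bp[i] */
        rp[i] = r;
        carry = c1 | c2;         /* at most one of the two can be set */
    }
    return carry;
}

/* rp[0..n-1] += ap[0..n-1] * b; returns the high limb that did not fit. */
limb_t ref_addmul_1(limb_t *rp, const limb_t *ap, size_t n, limb_t b)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* 64x64 -> 128 bit product plus two limbs cannot overflow 128 bits */
        unsigned __int128 t = (unsigned __int128)ap[i] * b + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 64);
    }
    return carry;
}

/* Schoolbook product: rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1].
   rp must not overlap the inputs. */
void ref_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                      const limb_t *vp, size_t vn)
{
    for (size_t i = 0; i < un + vn; i++)
        rp[i] = 0;
    for (size_t j = 0; j < vn; j++)
        rp[un + j] = ref_addmul_1(rp + j, up, un, vp[j]);
}
```

The sketch shows why addmul_1 matters so much: in this formulation the
basecase multiplication is just one addmul_1 call per limb of the second
operand.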
The assembly code is mostly a 64-bit translation of the k7 assembly code
that is available in GMP. The main modifications are:
- The ABI for function calls is not the same: up to 6 integer parameters
are passed in registers (rdi, rsi, rdx, rcx, r8, r9 in the System V AMD64
ABI), not on the stack.
- Change movl to movq, eax to rax, etc. That's the easy part.
- In an unrolled loop, the size of the unrolled code is not the same, so
the computation of the jump is different.
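The jump into an unrolled loop can be sketched in C with a Duff's-device
style switch. This is only an illustration of the idea, not the patch's
code: the name unrolled_add_n, the ADD_STEP macro and the unroll factor of 4
are all assumptions. In assembly the entry address is computed by hand as
something like "start of loop + (number of skipped steps) * (bytes of code
per step)", and the bytes-per-step figure changes in the 64-bit translation
(64-bit operands typically need an extra prefix byte), which is why the
computation had to be redone. In C the compiler does that arithmetic for us.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t limb_t; /* assumption: one limb is 64 bits */

/* One limb of add-with-carry: the unit that is replicated when unrolling. */
#define ADD_STEP                      \
    do {                              \
        limb_t s = *ap++ + carry;     \
        limb_t c = s < carry;         \
        limb_t r = s + *bp++;         \
        carry = c | (limb_t)(r < s);  \
        *rp++ = r;                    \
    } while (0)

/* rp[0..n-1] = ap[0..n-1] + bp[0..n-1], unrolled 4 times; returns carry. */
limb_t unrolled_add_n(limb_t *rp, const limb_t *ap, const limb_t *bp,
                      size_t n)
{
    limb_t carry = 0;
    if (n == 0)
        return 0;

    size_t passes = (n + 3) / 4;  /* number of trips through the loop body */

    /* Enter the unrolled body in the middle so that the first, partial
       pass handles n % 4 limbs and every later pass handles 4. */
    switch (n % 4) {
    case 0: do { ADD_STEP; /* fall through */
    case 3:      ADD_STEP; /* fall through */
    case 2:      ADD_STEP; /* fall through */
    case 1:      ADD_STEP;
            } while (--passes > 0);
    }
    return carry;
}
```

For example, with n = 5 the switch enters at "case 1", does one step, and
then the loop runs one full pass of four steps.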
Changes
There is almost no change compared to the patch for 4.1.4. The
multiplication has been slightly improved (around 3.15 cyc/limb) but most
of the improvement in the gmpbench score comes from modifications in the
C code of GMP between the two versions.
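To give a feeling for what a cyc/limb figure means, here is a rough
back-of-the-envelope model in C. It is a sketch under strong assumptions,
not a measurement: it supposes that a schoolbook multiplication of un by vn
limbs costs a constant number of cycles per limb product and it ignores all
call and loop overhead. The function names are made up for illustration.

```c
#include <stddef.h>

/* Rough cycle estimate for a schoolbook un x vn limb multiplication,
   assuming a constant cost per limb product and no overhead. */
double est_mul_cycles(size_t un, size_t vn, double cyc_per_limb)
{
    return (double)un * (double)vn * cyc_per_limb;
}

/* Convert a cycle count to seconds for a clock frequency given in Hz. */
double cycles_to_seconds(double cycles, double hz)
{
    return cycles / hz;
}
```

Under this model, a 10 x 10 limb product at 3.15 cyc/limb costs about 315
cycles, which is roughly 0.13 microseconds on a 2.4 GHz machine.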
Disclaimer
The code has been reasonably well tested. I used the program
tests/devel/try, which checks for quite a few possible bugs. Nonetheless,
since there are fewer users, this code is less likely to be correct
than any official GMP code.
Bugs
Please send comments and bug reports to gaudry@lix.polytechnique.fr and
*not* to the official GMP developers: they have nothing to do with this code.
Performance
On a 2.4 GHz Opteron, I get a multiply benchmark score of around 55000 (was
41500 with the plain 4.2). The whole gmpbench score is about 10000 (was
8200 before the patch).
Download / install
- Get the gmp-4.2.x archive and unpack it, thus creating a
directory /path_to_gmp/gmp-4.2.x/
- Download the mpn_amd64.42 archive and unpack it. In the directory of
mpn_amd64.42, run
./install /path_to_gmp/gmp-4.2.x
- cd /path_to_gmp/gmp-4.2.x
- Run ./configure with your favorite options
- make ; make check ; make install