Download TomsFastMath User Manual v0.12

Transcript
TomsFastMath User Manual
v0.12
Tom St Denis
[email protected]
March 14, 2007
This text and library are all hereby placed in the public domain. This book
has been formatted for B5 [176x250] paper using the LATEX book macro package.
This project was sponsored in part by
Secure Science Corporation http://www.securescience.net.
Contents
1 Introduction
1.1 What is TomsFastMath? . . . . . .
1.2 License . . . . . . . . . . . . . . . .
1.3 Building . . . . . . . . . . . . . . .
1.3.1 Intel CC . . . . . . . . . . .
1.3.2 MSVC . . . . . . . . . . . .
1.3.3 Build Limitations . . . . . .
1.3.4 Optimization Configuration
1.3.5 Build Configurations . . . .
1.3.6 Precision Configuration . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
1
1
2
2
2
2
3
3
5
5
2 Getting Started
2.1 Data Types . . . . . . . . . . . .
2.2 Initialization . . . . . . . . . . .
2.2.1 Simple Initialization . . .
2.2.2 Initialize Small Constants
2.2.3 Initialize Copy . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
7
7
8
8
8
8
3 Arithmetic Operations
3.1 Odds and Evens . . . .
3.2 Sign Manipulation . . .
3.3 Comparisons . . . . . .
3.4 Shifting . . . . . . . . .
3.5 Basic Algebra . . . . . .
3.6 Modular Exponentiation
3.7 Number Theoretic . . .
3.8 Prime Numbers . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
9
9
9
10
10
11
11
11
12
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
iii
.
.
.
.
.
.
.
.
4 Porting TomsFastMath
4.1 Getting Started . . . . . .
4.2 Multiply with Comba . .
4.3 Squaring with Comba . .
4.4 Montgomery with Comba
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
13
13
13
15
17
List of Figures
1.1
Recommended Build Modes . . . . . . . . . . . . . . . . . . . . .
v
5
Chapter 1
Introduction
1.1
What is TomsFastMath?
TomsFastMath is meant to be a very fast yet still fairly portable and easy to
port large integer arithmetic library written in ISO C. The goal specifically
is to be able to perform very fast modular exponentiations and other related
functions required for ECC, DH and RSA cryptosystems.
Most of the library is pure ISO C portable source code while a small portion (three files) contain a mixture of ISO C and assembler inline fragments.
Compared to LibTomMath this new library is meant to be much faster while
sacrificing flexibiltiy. This is accomplished through several means.
1. The new code is slightly messier and contains asm blocks.
2. This uses fixed not multiple precision integers.
3. It is designed only for fast modular exponentiations [e.g. less flexibility].
To mitigate some of the problems that arise from using assembler it has
been carefully and appropriately used where it would make the most gain in
performance. Also we use macro’s for assembler code which allows new ports
to be inserted easily.
The new code uses fixed precision arithmetic which means at compile time
you choose a maximum precision and all numbers are limited to that. This has
the benefit of not requiring any memory heap operations (which are slow) in
1
2
CHAPTER 1. INTRODUCTION
any of the functions. It has the downside that integers that are too large are
truncated.
The goal of this library is to be able to perform modular exponentiations
(with an odd modulus) very fast. This is what takes the most time in systems
such as RSA and DH. This also requires fast multiplication and squaring and
has the side effect of speeding up ECC operations as well.
1.2
License
TomsFastMath is public domain.
1.3
Building
To build the library simply type “make”. Or to install in typical *unix like
directories use “make install”. Similarly a shared library can be built with
“make -f makefile.shared install”.
You can build the test program with “make test”. To perform simple static
testing (useful to test out new assembly ports) use the stest program. Type
“make stest” and run it on your target. The program will perform three multiplications, squarings and montgomery reductions. Likely if your assembly code
is invalid this code will exhibit the bug.
1.3.1
Intel CC
In theory you should be able to build the library with
CFLAGS="-O3 -ip" CC=icc make IGNORE_SPEED=1
However, Intels inline assembler is way less advanced than GCCs. As a result
it doesn’t compile. Fortunately it doesn’t really matter.
1.3.2
MSVC
The library doesn’t build with MSVC. Imagine that.
1.3. BUILDING
1.3.3
3
Build Limitations
TomsFastMath has the following build requirements which are non–portable but
under most circumstances not problematic.
1. “CHAR BIT” must be eight.
2. The “fp digit” type must be a multiple of eight bits long.
3. The “fp word” must be at least twice the length of fp digit.
1.3.4
Optimization Configuration
By default TFM is configured for 32–bit digits using ISO C source code. This
mode while portable is not very efficient. While building the library (from
scratch) you can define one of several “CFLAGS” defines.
For example, to build with with SSE2 optimizations type
CFLAGS=-DTFM_SSE2 make clean libtfm.a
x86–32
The “x86–32” mode is defined by “TFM X86” and covers all i386 and beyond
processors. It requires GCC to build and only works with 32–bit digits. In this
mode fp digit is 32–bits and fp word is 64–bits. This mode will be autodetected
when building with GCC to an “i386” target. You can override this behaviour
by defining TFM NO ASM or another optimization mode (such as SSE2).
SSE2
The “SSE2” mode is defined by “TFM SSE2” and requires a Pentium 4, Pentium M or Athlon64 processor. It requires GCC to build. Note that you
shouldn’t define both TFM X86 and TFM SSE2 at the same time. This mode
only works with 32–bit digits. In this mode fp digit is 32–bits and fp word is
64–bits. While this mode will work on the AMD Athlon64 series of processors
it is less efficient than the native “x86–64” mode and not recommended.
There is an additional “TFM PRESCOTT” flag that you can define for P4
Prescott processors. This causes the mul/sqr functions to use x86 32 and the
montgomery reduction to use SSE2 which is (so far) the fastest combination. If
you are using an older (e.g. Northwood) generation P4 don’t define this.
4
CHAPTER 1. INTRODUCTION
x86–64
The “x86–64” mode is defined by “TFM X86 64” and requires a “x86–64” capable processor (Athlon64 and future Pentium processors). It requires GCC to
build and only works with 64–bit digits. Note that by enabling this mode it will
automatically enable 64–bit digits. In this mode fp digit is 64–bits and fp word
is 128–bits. This mode will be autodetected when building with GCC to an
“x86–64” target. You can override this behaviour by defining TFM NO ASM.
ARM
The “ARM” mode is defined by “TFM ARM” and requires a ARMv4 with the
M instructions (enhanced multipliers) or higher processor. It requires GCC and
works with 32–bit digits. In this mode fp digit is 32–bits and fp word is 64–bits.
PPC32
The “PPC32” mode is defined by “TFM PPC32” and requires a standard PPC
processor. It doesn’t use altivec or other extensions so it should work on all
compliant implementations of PPC. It requires GCC and works with 32–bit
digits. In this mode fp digit is 32–bits and fp word is 64–bits.
PPC64
The “PPC64” mode is defined by “TFM PPC64” and requires a 64–bit PPC
processor.
AVR32
The “AVR32” mode is defined by “TFM AVR32” and requires an Atmel AVR32
processor.
Future Releases
Future releases will support additional platform optimizations. Developers of
MIPS and SPARC platforms are encouraged to submit GCC asm inline patches
(see chapter 4 for more information).
5
1.3. BUILDING
Processor
All 32–bit x86 platforms
Pentium 4
Pentium 4 Prescott
Athlon64
ARMv4 or higher with M
G3/G4 (32-bit PPC)
G5 (64-bit PPC)
Atmel AVR32
Recommended Mode
TFM X86
TFM SSE2
TFM SSE2 + TFM PRESCOTT
TFM X86 64
TFM ARM
TFM PPC32
TFM PPC64
TFM AVR32
x86–32 or x86–64 (with GCC)
Leave blank and let autodetect work
Figure 1.1: Recommended Build Modes
1.3.5
Build Configurations
TomsFastMath is configurable in terms of which unrolled code (if any) is included. By default, the majority of the code is included which results in large
binaries. The first flag to try out is TFM ALREADY SET which tells TFM to
turn off all unrolled code. This will result in a smaller library but also a much
slower library.
From this clean state, you can start enabling unrolled code for given cryptographic tasks at hand. A series of TFM MULXYZ and TFM SQRXYZ macros
exist to enable specific unrolled code. For instance, TFM MUL32 will enable a
32 digit unrolled multiplier. For a complete list see the tfm.h header file. Keep
in mind this is for digits not bits. For example, you should enable TFM MUL16
if you are doing 1024-bit exptmods on a 64–bit platform, enable TFM MUL32
on 32–bit platforms.
To help developers use ECC there are a set of defines for the five NIST
curve sizes. They are named TFM ECCXYZ where XYZ is one of 192, 224,
256, 384, or 521. These enable the multipliers and squaring code for a given
curve, autodetecting 64–bit platforms as well.
1.3.6
Precision Configuration
The precision of all integers in this library are fixed to a limited precision.
Essentially the rule of setting the precision is if you plan on doing modular
exponentiation with k–bit numbers than the precision must be fixed to 2k–bits
plus four digits.
6
CHAPTER 1. INTRODUCTION
This is changed by altering the value of “FP MAX SIZE” in tfm.h to your
desired size. By default, the library is configured to handle upto 2048–bit inputs
to the modular exponentiator.
Chapter 2
Getting Started
2.1
Data Types
TomsFastMath is a large fixed precision integer library. It provides the functionality to manipulate large signed integers through a relatively trivial api and
a single data type.
The “fp int” or fixed precision integer is the data type that the functions
operate with.
typedef struct {
fp_digit dp[FP_SIZE];
int
used,
sign;
} fp_int;
The dp member is the array of digits that forms the number. It must always
be zero padded. The used member is the count of digits used in the array.
Although the precision is fixed the algorithms are still tuned to not process the
entire array if it does not have to. The sign indicates the sign of the integer. It
is FP ZPOS (0) if the integer is zero or positive and FP NEG (1) otherwise.
7
8
CHAPTER 2. GETTING STARTED
2.2
2.2.1
Initialization
Simple Initialization
To initialize an integer to the default state of zero use the fp init() function.
void fp_init(fp_int *a);
This will initialize the fp int a to zero. Note that the function fp zero() is
an alias for fp init().
2.2.2
Initialize Small Constants
To initialize an integer with a small single digit value use the fp set() function.
void fp_set(fp_int *a, fp_digit b);
This will initialize a and set it equal to the digit b.
2.2.3
Initialize Copy
To initialize an integer with a copy of another integer use the fp init copy()
function.
void fp_init_copy(fp_int *a, fp_int *b)
This will initialize a as a copy of b. Note that for compatibility with LibTomMath the function fp copy() is also provided.
Chapter 3
Arithmetic Operations
3.1
Odds and Evens
To quickly and easily tell if an integer is zero, odd or even use the following
functions.
int fp_iszero(fp_int *a);
int fp_iseven(fp_int *a);
int fp_isodd(fp_int *a);
These will return FP YES if the answer to their respective questions is yes.
Otherwise they return FP NO. Note that these are implemented as macros and
as such you should avoid using ++ or – – operators on the input operand.
3.2
Sign Manipulation
To negate or compute the absolute of an integer use the following functions.
void fp_neg(fp_int *a, fp_int *b);
void fp_abs(fp_int *a, fp_int *b);
This will compute the negation (or absolute) of a and store the result in b. Note
that these are implemented as macros and as such you should avoid using ++
or – – operators on the input operand.
9
10
CHAPTER 3. ARITHMETIC OPERATIONS
3.3
Comparisons
To perform signed or unsigned comparisons use following functions.
int fp_cmp(fp_int *a, fp_int *b);
int fp_cmp_mag(fp_int *a, fp_int *b);
These will compare a to b. They will return FP GT if a is larger than b,
FP EQ if they are equal and FP LT if a is less than b.
The function fp cmp performs signed comparisons while the other performs
unsigned comparisons.
3.4
Shifting
To shift the digits of an fp int left or right use the following functions.
void fp_lshd(fp_int *a, int x);
void fp_rshd(fp_int *a, int x);
These will shift the digits of a left (or right respectively) x digits.
To shift individual bits of an fp int use the following functions.
void
void
void
void
void
void
fp_div_2d(fp_int *a, int b, fp_int *c, fp_int *d);
fp_mod_2d(fp_int *a, int b, fp_int *c);
fp_mul_2d(fp_int *a, int b, fp_int *c);
fp_mul_2(fp_int *a, fp_int *c);
fp_div_2(fp_int *a, fp_int *c);
fp_2expt(fp_int *a, int b);
fp div 2d() will divide a by 2b and store the quotient in c and remainder in d.
Either of c or d can be NULL if their value is not required. fp mod 2d() is a
shortcut to compute the remainder directly. fp mul 2d() will multiply a by 2b
and store the result in c.
The fp mul 2() and fp div 2() functions are optimized multiplication and
divisions by two. The function fp 2expt() will compute a = 2b quickly.
To quickly count the number of least significant bits that are zero use the
following function.
int fp_cnt_lsb(fp_int *a);
This will return the number of adjacent least significant bits that are zero. This
is equivalent to the number of times two evenly divides a.
3.5. BASIC ALGEBRA
3.5
11
Basic Algebra
The following functions round out the basic algebraic functionality of the library.
void fp_add(fp_int *a, fp_int *b, fp_int *c);
void fp_sub(fp_int *a, fp_int *b, fp_int *c);
void fp_mul(fp_int *a, fp_int *b, fp_int *c);
void fp_sqr(fp_int *a, fp_int *b);
int fp_div(fp_int *a, fp_int *b, fp_int *c, fp_int *d);
int fp_mod(fp_int *a, fp_int *b, fp_int *c);
The functions fp add(), fp sub() and fp mul() perform their respective operations on a and b and store the result in c. The function fp sqr() computes
b = a2 and is faster than using fp mul() to perform the same operation.
The function fp div() divides a by b and stores the quotient in c and remainder in d. Either of c and d can be NULL if the result is not required. The
function fp mod() is a simple shortcut to find the remainder.
3.6
Modular Exponentiation
To compute a modular exponentiation use the following function.
int fp_exptmod(fp_int *a, fp_int *b, fp_int *c, fp_int *d);
This computes d ≡ ab (mod c) for any odd c and b. b may be negative so long
as a−1 (mod c) exists. The initial value of a may be larger than c. The size of
c must be half of the maximum precision used during the build of the library.
For example, by default c must be less than 22048 .
3.7
Number Theoretic
To perform modular inverses, greatest common divisor or least common multiples use the following functions.
int fp_invmod(fp_int *a, fp_int *b, fp_int *c);
void fp_gcd(fp_int *a, fp_int *b, fp_int *c);
void fp_lcm(fp_int *a, fp_int *b, fp_int *c);
12
CHAPTER 3. ARITHMETIC OPERATIONS
The fp invmod() function will find the modular inverse of a modulo an odd
modulus b and store it in c (provided it exists). The function fp gcd() will
compute the greatest common divisor of a and b and store it in c. Similarly the
fp lcm() function will compute the least common multiple of a and b and store
it in c.
3.8
Prime Numbers
To quickly test a number for primality call this function.
int fp_isprime(fp_int *a);
This will return FP YES if a is probably prime. It uses 256 trial divisions and
eight rounds of Rabin-Miller testing. Note that this routine performs modular
exponentiations which means that a must be in a valid range of precision.
Chapter 4
Porting TomsFastMath
4.1
Getting Started
Porting TomsFastMath to a given processor target is usually a simple procedure.
For the most part assembly is used to get around the lack of a “add with carry”
operation in the C language. To make matters simpler the use of assembler is
through macro blocks.
Each “port” is defined by a block of code that re-defines the portable ISO
C macros with assembler inline blocks. To add a new port you must designate
a TFM XXX define that will enable your port when built.
4.2
Multiply with Comba
The file “fp mul comba.c” is responsible for providing the fast multiplication
within the library. This comba multiplication is fairly simple. It uses a sliding
three digit carry system with the variables c0, c1, c2. For every digit of output
c0 is the what will be that digit, c1 will carry into the next digit and c2 will be
the “c1” carry for the next digit. For every “next” digit effectively c0 is stored
as output, c1 moves into c0, c2 into c1 and zero into c2.
The following macros define the assmebler interface to the code.
#define COMBA_START
This is issued at the beginning of the multiplication function. This is in
place to allow you to initialize any registers or machine words required. You
13
14
CHAPTER 4. PORTING TOMSFASTMATH
can leave it blank if you do not need it.
#define COMBA_CLEAR \
c0 = c1 = c2 = 0;
This clears the three comba carries. If you are going to place carries in
registers then zero the appropriate registers. Note that the functions do not use
c0, c1 or c2 directly so you are free to ignore these varibles and use registers
directly.
#define COMBA_FORWARD \
c0 = c1; c1 = c2; c2 = 0;
This propagates the carries after a digit has been produced.
#define COMBA_STORE(x) \
x = c0;
This stores the c0 digit in the memory location specified by x. Note that if
you manually aliased c0 with a register than just store that register in x.
#define COMBA_STORE2(x) \
x = c1;
This stores the c1 digit in the memory location specified by x. Note that if
you manually aliased c1 with a register than just store that register in x.
#define COMBA_FINI
If at the end of the function you need to perform some action fill this macro
in.
#define
t =
c0 =
c1 =
MULADD(i, j)
\
((fp_word)i) * ((fp_word)j);
\
(c0 + t);
if (c0 < ((fp_digit)t)) ++c1; \
(c1 + (t>>DIGIT_BIT)); if (c1 < (t>>DIGIT_BIT)) ++c2;
This macro performs the “multiply and add” step that is central to the
comba multiplier. It multiplies the fp digits i and j to produce a fp word result.
Effectively the double–digit value is added to the three-digit carry formed by
c0, c1, c2 where c0 is the least significant digit.
4.3. SQUARING WITH COMBA
4.3
15
Squaring with Comba
Squaring is similar to multiplication except that it uses a special “multiply and
add twice” macro that replaces multiplications that are not required.
#define COMBA_START
This allows for any initialization code you might have.
#define CLEAR_CARRY \
c0 = c1 = c2 = 0;
This will clear the carries. Like multiplication you can safely alias the three
carry variables to registers if you can/want to.
#define COMBA_STORE(x) \
x = c0;
Store the c0 carry to a given memory location.
#define COMBA_STORE2(x) \
x = c1;
Store the c1 carry to a given memory location.
#define CARRY_FORWARD \
c0 = c1; c1 = c2; c2 = 0;
Forward propagate all three carry variables.
#define COMBA_FINI
If you need to clean up at the end of the function.
/* multiplies point i and j, updates carry "c1" and digit c2 */
#define SQRADD(i, j)
\
t = ((fp_word)i) * ((fp_word)j);
\
c0 = (c0 + t);
if (c0 < ((fp_digit)t)) ++c1; \
c1 = (c1 + (t>>DIGIT_BIT)); if (c1 < (t>>DIGIT_BIT)) ++c2;
This is essentially the MULADD macro from the multiplication code.
16
CHAPTER 4. PORTING TOMSFASTMATH
/* for squaring some of the terms are
#define SQRADD2(i, j)
t = ((fp_word)i) * ((fp_word)j);
c0 = (c0 + t);
if (c0
c1 = (c1 + (t>>DIGIT_BIT)); if (c1
c0 = (c0 + t);
if (c0
c1 = (c1 + (t>>DIGIT_BIT)); if (c1
doubled... */
\
\
< ((fp_digit)t))
< (t>>DIGIT_BIT))
< ((fp_digit)t))
< (t>>DIGIT_BIT))
++c1; \
++c2; \
++c1; \
++c2;
This is like SQRADD except it adds the produce twice. It’s similar to
computing SQRADD(i, j*2).
To further make things interesting the squaring code also has “doubles” (see
my LTM book chapter five...) which are handled with these macros.
#define SQRADDSC(i, j)
do { fp_word t;
t = ((fp_word)i) * ((fp_word)j);
sc0 = (fp_digit)t; sc1 = (t >> DIGIT_BIT); sc2 = 0;
} while (0);
\
\
\
\
This computes a product and stores it in the “secondary” carry registers hsc0, sc1, sc2i.
#define SQRADDAC(i, j)
do { fp_word t;
t = sc0 + ((fp_word)i) * ((fp_word)j); sc0 = t;
t = sc1 + (t >> DIGIT_BIT);
sc1 = t; sc2 += t >> DIGIT_BIT;
} while (0);
\
\
\
\
This computes a product and adds it to the “secondary” carry registers.
#define SQRADDDB
do { fp_word t;
t = ((fp_word)sc0) + ((fp_word)sc0) + c0; c0 = t;
t = ((fp_word)sc1) + ((fp_word)sc1) + c1 + (t >> DIGIT_BIT); c1 = t;
c2 = c2 + ((fp_word)sc2) + ((fp_word)sc2) + (t >> DIGIT_BIT);
} while (0);
This doubles the “secondary” carry registers and adds the sum to the main
carry registers. Really complicated.
\
\
17
4.4. MONTGOMERY WITH COMBA
4.4
Montgomery with Comba
Montgomery reduction is used in modular exponentiation and is most called
function during that operation. It’s important to make sure this routine is very
fast or all is lost.
Unlike the two other comba routines this one does not use a single three–
digit carry system. It does have three–digit carries except that the routine steps
through them in the inner loop. This means you cannot alias them to registers
(at all).
To make matters simple though the three arrays of carries are stored in one
array. The “c0” array resides in c[0 . . . OF F 1−1], “c1” in c[OF F 1 . . . OF F 2−1]
and “c2” in c[OF F 2 . . . OF F 2 + F P SIZE − 1].
#define MONT_START
This allows you to insert anything at the start that you need.
#define MONT_FINI
This allows you to insert anything at the end that you need.
#define LOOP_START \
mu = c[x] * mp;
This computes the µ value for the inner loop. You can safely alias mu and
mp to a register if you want.
#define INNERMUL
do { fp_word t;
_c[0] = t = ((fp_word)_c[0] + (fp_word)cy) +
(((fp_word)mu) * ((fp_word)*tmpm++));
cy = (t >> DIGIT_BIT);
} while (0)
\
\
\
\
\
This computes the inner product and adds it to the destination and carry
variable cy. This uses the mu value computed above (can be in a register
already) and the cy which is a chaining carry. Inside the INNERMUL loop the
cy value can be kept inside a register (hint: it always starts as cy = 0 in the
first iteration).
Upon completion of the inner loop the macro LOOP END is called which is
used to fetch cy into the variable the C program can see. This is where, if you
cached cy in a register you would copy it to the locally accessible C variable.
18
CHAPTER 4. PORTING TOMSFASTMATH
#define PROPCARRY \
do { fp_digit t = _c[0] += cy; cy = (t < cy); } while (0)
This propagates the carry upwards by one digit.
Index
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
fp
abs, 9
add, 11
cmp, 10
cmp mag, 10
cnt lsb, 10
div, 11
div 2, 10
div 2d, 10
exptmod, 11
gcd, 11
init, 8
init copy, 8
invmod, 11
iseven, 9
isodd, 9
isprime, 12
iszero, 9
lcm, 11
lshd, 10
mod, 11
mod 2d, 10
mul, 11
mul 2, 10
mul 2d, 10
neg, 9
rshd, 10
set, 8
sqr, 11
sub, 11
19