AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
/* Function XForm performs a fully generalized 3D transform on an array
of vertices pointed to by "v" and stores the transformed vertices in
the location pointed to by "res". Each vertex consists of four floats.
The 4x4 transform matrix is pointed to by "m". The matrix elements are
also floats. The argument "numverts" indicates how many vertices have
to be transformed. The computation performed for each vertex is:
res->x = v->x*m[0][0] + v->y*m[1][0] + v->z*m[2][0] + v->w*m[3][0]
res->y = v->x*m[0][1] + v->y*m[1][1] + v->z*m[2][1] + v->w*m[3][1]
res->z = v->x*m[0][2] + v->y*m[1][2] + v->z*m[2][2] + v->w*m[3][2]
res->w = v->x*m[0][3] + v->y*m[1][3] + v->z*m[2][3] + v->w*m[3][3]
*/
#define M00  0
#define M01  4
#define M02  8
#define M03 12
#define M10 16
#define M11 20
#define M12 24
#define M13 28
#define M20 32
#define M21 36
#define M22 40
#define M23 44
#define M30 48
#define M31 52
#define M32 56
#define M33 60
void XForm (float *res, const float *v, const float *m, int numverts)
{
_asm {
MOV EDX, [V]        ;EDX = source vector ptr
MOV EAX, [M]        ;EAX = matrix ptr
MOV EBX, [RES]      ;EBX = destination vector ptr
MOV ECX, [NUMVERTS] ;ECX = number of vertices to transform
;3DNow! version of fully general 3D vertex transformation.
;Optimal for AMD Athlon (completes in 16 cycles)
FEMMS               ;clear MMX state
ALIGN 16            ;for optimal branch alignment
Optimized Matrix Multiplication
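For comparison, the per-vertex computation given in the XForm comment can be written in plain C. This is a minimal sketch, not part of the guide's listing. The name XFormRef is hypothetical. Treating "m" as a row-major array of 16 floats is an assumption, inferred from the m[row][col] notation in the comment and from the M00..M33 byte offsets, where Mrc = 4*(4*r + c). A scalar version like this may be useful as a correctness check against the 3DNow! routine.

```c
/* Hypothetical scalar reference for the transform described in the
   XForm comment. Assumes:
   - each vertex is four consecutive floats (x, y, z, w);
   - m holds a 4x4 matrix in row-major order, so m[r][c] is m[4*r + c];
   - res and v do not overlap. */
void XFormRef(float *res, const float *v, const float *m, int numverts)
{
    int i, col;

    for (i = 0; i < numverts; i++) {
        const float *in  = v   + 4 * i;  /* current source vertex      */
        float       *out = res + 4 * i;  /* current destination vertex */

        /* out[col] = x*m[0][col] + y*m[1][col] + z*m[2][col] + w*m[3][col] */
        for (col = 0; col < 4; col++) {
            out[col] = in[0] * m[0 * 4 + col]
                     + in[1] * m[1 * 4 + col]
                     + in[2] * m[2 * 4 + col]
                     + in[3] * m[3 * 4 + col];
        }
    }
}
```

With this layout, a translation sits in the last row of the matrix (m[3][0..2]), because each vertex is multiplied as a row vector on the left.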