Modular addition/subtraction

Obscurans October 6 2008 3:36 AM EDT

I'm trying to think of a fast bit-twiddly method for modular addition or subtraction (obviously not modulo 2^n):

s = a + b; s -= -(s >= n) & n;
s = a - b; s += -(s < 0) & n;

Using the old trick of instant propagation on negation and knowing that only one loop-around will ever be needed.

Weirdly enough subtraction is faster since (s >= n) definitely becomes (cmp s n; setae) while (s < 0) under any decent compiler would just copy off the sign flag (sets).

Reminds me of my algorithms prof saying in the old days division was faster than multiplication and they went and did a /= (1/b).

Little Anthony October 6 2008 3:56 AM EDT

subtraction is never faster than addition.

Obscurans October 6 2008 4:15 AM EDT

What I see from the two code bits, clobbering two temporary registers:

mov t1 a
add t1 b
cmp t1 n
setae t2
neg t2
and t2 n
sub t1 t2

versus

mov t1 a
sub t1 b
sets t2
neg t2
and t2 n
add t1 t2

Since I've already computed (a - b) prior, I can set on sign immediately. Although this does look like a 6/7-linear chain of 1-ops, it does however avoid the horrendous 10+ cycles latency of an IDIV though.

Obscurans October 6 2008 5:21 AM EDT

Timings for 2.1 billion sequential iterations on my K8

a + b, mod: 16.441s
a - b, mod: 16.773s (had to actually do (a - b + n) % n)
a + b, hack: 10.611s
a - b, hack: 9.758s (I said so)

And here's the funny part:

a - (n - b), hack: 10.280s (lulz ensue)

OK I'm not done yet, time for my own cooked up assembly:

__asm__("\
movl %4, %3\n\t\
movl %1, %0\n\t\
movl $0, %%edx\n\t\
addl %2, %0\n\t\
idivl %3\n\t\
movl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "D"(u), "n"(N) : "edx");
Runs in 32.776s since I screwed the latency head-on

__asm__("movl %4, %3\n\t\
movl %1, %0\n\t\
movl $0, %%edx\n\t\
subl %2, %0\n\t\
addl %3, %0\n\t\
idivl %3\n\t\
movl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "D"(u), "n"(N) : "edx");
33.747s, extra +n hurts as it should

__asm__("\
movl %1, %0\n\t\
addl %2, %0\n\t\
cmp %3, %0\n\t\
setae %%dl\n\t\
negl %%edx\n\t\
andl %3, %%edx\n\t\
subl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "n"(N) : "edx");
Requires only 7.088s, I beat -O0 yay

__asm__("\
movl %1, %0\n\t\
subl %2, %0\n\t\
sets %%dl\n\t\
negl %%edx\n\t\
andl %3, %%edx\n\t\
addl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "n"(N) : "edx");
Only 6.272s, way faster here

__asm__("\
movl %3, %0\n\t\
addl %1, %0\n\t\
subl %2, %0\n\t\
sets %%dl\n\t\
negl %%edx\n\t\
andl %3, %%edx\n\t\
addl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "n"(N) : "edx");
8.078s, oh well.

Did I mention spellcheck is throwing a hissy fit?

Mirick October 6 2008 6:05 AM EDT

crikey :|

Obscurans October 6 2008 6:37 AM EDT

Sheesh, I can't believe I literally typed the a - (-b) out:

__asm__("\
leal -N(%1), %0\n\t\
addl %2, %0\n\t\
sets %%dl\n\t\
negl %%edx\n\t\
andl %3, %%edx\n\t\
addl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "n"(N) : "edx");

-N must be actually written in, but now this thing runs in 6.280s :)

PearsonTritonRaveshaw October 6 2008 8:53 AM EDT

?_?

English in the forums please. =P