Obscurans
October 6 2008 3:36 AM EDT
I'm trying to think of a fast bit-twiddly method for modular addition or subtraction (obviously not modulo 2^n):
s = a + b; s -= -(s >= n) & n;
s = a - b; s += -(s < 0) & n;
Using the old trick of instant propagation on negation and knowing that only one loop-around will ever be needed.
Weirdly enough subtraction is faster since (s >= n) definitely becomes (cmp s n; setae) while (s < 0) under any decent compiler would just copy off the sign flag (sets).
Reminds me of my algorithms prof saying in the old days division was faster than multiplication and they went and did a /= (1/b).
subtraction is never faster than addition.
Obscurans
October 6 2008 4:15 AM EDT
What I see from the two code bits, clobbering two temporary registers:
mov t1 a
add t1 b
cmp t1 n
setae t2
neg t2
and t2 n
sub t1 t2
versus
mov t1 a
sub t1 b
sets t2
neg t2
and t2 n
add t1 t2
Since I've already computed (a - b) prior, I can set on sign immediately. Although this does look like a 6/7-linear chain of 1-ops, it does however avoid the horrendous 10+ cycles latency of an IDIV though.
Obscurans
October 6 2008 5:21 AM EDT
Timings for 2.1 billion sequential iterations on my K8
a + b, mod: 16.441s
a - b, mod: 16.773s (had to actually do (a - b + n) % n)
a + b, hack: 10.611s
a - b, hack: 9.758s (I said so)
And here's the funny part:
a - (n - b), hack: 10.280s (lulz ensue)
OK I'm not done yet, time for my own cooked up assembly:
__asm__("\
movl %4, %3\n\t\
movl %1, %0\n\t\
movl $0, %%edx\n\t\
addl %2, %0\n\t\
idivl %3\n\t\
movl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "D"(u), "n"(N) : "edx");
Runs in 32.776s since I screwed the latency head-on
__asm__("movl %4, %3\n\t\
movl %1, %0\n\t\
movl $0, %%edx\n\t\
subl %2, %0\n\t\
addl %3, %0\n\t\
idivl %3\n\t\
movl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "D"(u), "n"(N) : "edx");
33.747s, extra +n hurts as it should
__asm__("\
movl %1, %0\n\t\
addl %2, %0\n\t\
cmp %3, %0\n\t\
setae %%dl\n\t\
negl %%edx\n\t\
andl %3, %%edx\n\t\
subl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "n"(N) : "edx");
Requires only 7.088s, I beat -O0 yay
__asm__("\
movl %1, %0\n\t\
subl %2, %0\n\t\
sets %%dl\n\t\
negl %%edx\n\t\
andl %3, %%edx\n\t\
addl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "n"(N) : "edx");
Only 6.272s, way faster here
__asm__("\
movl %3, %0\n\t\
addl %1, %0\n\t\
subl %2, %0\n\t\
sets %%dl\n\t\
negl %%edx\n\t\
andl %3, %%edx\n\t\
addl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "n"(N) : "edx");
8.078s, oh well.
Did I mention spellcheck is throwing a hissy fit?
Obscurans
October 6 2008 6:37 AM EDT
Sheesh, I can't believe I literally typed the a - (-b) out:
__asm__("\
leal -N(%1), %0\n\t\
addl %2, %0\n\t\
sets %%dl\n\t\
negl %%edx\n\t\
andl %3, %%edx\n\t\
addl %%edx, %0"
: "=a"(s) : "b"(a), "c"(b), "n"(N) : "edx");
-N must be actually written in, but now this thing runs in 6.280s :)
?_?
English in the forums please. =P
This thread is closed to new posts.
However, you are welcome to reference it
from a new thread; link this with the html
<a href="/bboard/q-and-a-fetch-msg.tcl?msg_id=002Ypy">Modular addition/subtraction</a>