
Re: Rep:[f-cpu] A simple SIMD extension for C(++)



On Fri, May 17, 2002 at 07:16:03PM +0200, nico wrote:
> There is another potential improvement when several operations are
> performed on many arrays.

As I already said, the number of arrays shall not be limited to 2.
This also means that the mapped function can be arbitrarily complex.

> Imagine a mul, then an add, then an accumulation (arrays added together). It
> could speed things up if the 3 ops were performed one after another for
> each chunk instead of traversing the array 3 times (to stay inside the cache!)

You have two choices: put all operations into a single function and
`call' __map__ once, or use individual functions for each operation and
map them one after another. The former, however, allows the compiler to
perform additional optimization.  Since it *knows* that the individual
function calls are independent, it can perform them in any order,
and it's permitted to do inline substitution, interleave instructions
from different invocations (which is equivalent to loop unrolling)
and the like.
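
To make the two choices concrete, here is a minimal sketch in plain C.
Since no existing compiler provides __map__, the helpers map2 and map3
below are hypothetical serial stand-ins that simply loop. A real
implementation would be free to reorder, interleave or vectorize the
calls. All function names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical serial stand-ins for __map__; a real compiler
 * could run the calls in any order or in parallel. */
typedef float (*fn2)(float, float);
typedef float (*fn3)(float, float, float);

static void map2(float *dst, size_t n, fn2 f,
		const float *a, const float *b) {
	for (size_t i = 0; i < n; i++)
		dst[i] = f(a[i], b[i]);
}

static void map3(float *dst, size_t n, fn3 f,
		const float *a, const float *b, const float *c) {
	for (size_t i = 0; i < n; i++)
		dst[i] = f(a[i], b[i], c[i]);
}

static float fmul(float a, float b) { return a * b; }
static float fadd(float a, float b) { return a + b; }

/* Both operations in one function: each element is touched once. */
static float fmuladd(float a, float b, float c) { return a * b + c; }

/* Choice 1: a single mapped function, one pass over the data. */
void muladd_fused(float *dst, size_t n,
		const float *a, const float *b, const float *c) {
	map3(dst, n, fmuladd, a, b, c);
}

/* Choice 2: one map per operation; dst is traversed twice. */
void muladd_split(float *dst, size_t n,
		const float *a, const float *b, const float *c) {
	map2(dst, n, fmul, a, b);
	map2(dst, n, fadd, dst, c);
}
```

Both functions compute dst[i] = a[i] * b[i] + c[i]. The split version
walks dst twice, which is where the cache concern comes from. The
results may differ in the last bit if the compiler contracts
a * b + c into a fused multiply-add.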

Accumulation is slightly more difficult, though. You might be tempted
to try something like

	float y = 0.0;

	void sum(float a) {
		y += a;
	}

	float accumulate(int count, float va[]) {
		__map__(sum, count, va);
		return y;
	}

but that's not correct because `y += a' is not guaranteed to be an
atomic operation, and the calls to `sum' may be performed in parallel.
The solution should look more like

	float fadd(float a, float b) {
		return a + b;
	}

	float accumulate(int count, float va[]) {
		// assume count >= 0 and MINCOUNT >= 2 (else the recursion never ends)
		if (count < MINCOUNT) {
			float y = 0.0;
			for (int i = 0; i < count; i++) {
				y = fadd(y, va[i]);
			}
			return y;
		}
		else {
			int ncount = (count + 1) / 2;
			float vy[ncount];
			__map__(vy, count/2, fadd, va, va + count/2);
			if (count % 2) {
				vy[count/2] = va[count-1];
			}
			return accumulate(ncount, vy);
		}
	}

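This version is also more efficient: given enough parallel execution
units, it needs only O(log(n)) steps, although the total number of
additions is still O(n).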
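
The logarithmic figure can be checked with a small serial emulation.
The sketch below is only an illustration: reduce_level and tree_sum are
hypothetical names, the loop stands in for one __map__ call, and the
array is reduced in place instead of using a temporary.

```c
#include <stddef.h>

/* Hypothetical helper: one reduction level, done serially here.
 * The iterations are independent of each other, so a real __map__
 * could perform all of them at once. Returns the new element count. */
static size_t reduce_level(float *v, size_t count) {
	size_t half = count / 2;
	for (size_t i = 0; i < half; i++)
		v[i] = v[i] + v[i + half];
	if (count % 2) {
		/* odd count: carry the unpaired last element over */
		v[half] = v[count - 1];
		return half + 1;
	}
	return half;
}

/* Sums v in place (v is overwritten) and stores the number of
 * levels, i.e. the number of parallel steps, in *levels. */
float tree_sum(float *v, size_t count, unsigned *levels) {
	*levels = 0;
	if (count == 0)
		return 0.0f;
	while (count > 1) {
		count = reduce_level(v, count);
		(*levels)++;
	}
	return v[0];
}
```

For 8 elements this takes 3 levels, and for 5 elements it also takes 3,
while the number of additions stays at count - 1 in both cases.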

We could define a special `__map2__' primitive that applies a
binary function to a single array in the same way.
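
One possible shape for such a primitive is sketched below as an
ordinary C function. The name map2_single, its argument order and the
handling of an odd element count are assumptions made for illustration,
not a defined interface. The loop is a serial stand-in for what would
be independent, possibly parallel, calls.

```c
#include <stddef.h>

typedef float (*binop)(float, float);

/* Hypothetical serial stand-in for the proposed __map2__:
 * dst[i] = f(src[i], src[i + count/2]) for i < count/2. With an odd
 * count, the unpaired last element is copied through unchanged.
 * dst needs room for (count + 1) / 2 elements; returns that number.
 * dst may be the same array as src. */
size_t map2_single(float *dst, size_t count, binop f, const float *src) {
	size_t half = count / 2;
	for (size_t i = 0; i < half; i++)
		dst[i] = f(src[i], src[i + half]);
	if (count % 2)
		dst[half] = src[count - 1];
	return (count + 1) / 2;
}

/* Reduces v in place with an associative f; requires count >= 1. */
float reduce(float *v, size_t count, binop f) {
	while (count > 1)
		count = map2_single(v, count, f, v);
	return v[0];
}

/* Example binary function: maximum of two values. */
float fmax2(float a, float b) { return a > b ? a : b; }
```

With this, reduce(v, n, fmax2) yields the largest element of v, and
passing an addition function instead gives the accumulation discussed
above. The function must be associative for the pairing order not to
matter.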

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"