
Re: [f-cpu] GCC and jmpz vs. jmpl

To: f-cpu@seul.org
Subject: Re: [f-cpu] GCC and jmpz vs. jmpl
From: Michael Riepe <michael@stud.uni-hannover.de>
Date: Tue, 7 Jan 2003 23:47:05 +0100
Delivery-date: Tue, 07 Jan 2003 17:52:21 -0500
In-reply-to: <Pine.LNX.4.33.0301071323500.520-100000@devix>; from devik on Tue, Jan 07, 2003 at 01:57:04PM +0100
References: <Pine.LNX.4.33.0301071323500.520-100000@devix>
Reply-to: f-cpu@seul.org
Sender: owner-f-cpu@seul.org

On Tue, Jan 07, 2003 at 01:57:04PM +0100, devik wrote:
> In light of my previous mail about zero-extending, there is one other
> thing to discuss.
> When we do compares (<, >, >=, <=) we get results of 1 or -1.

No, we get -1 for true and 0 for false, both truncated to the chunk size
(unless we change the ISA, but I see no reason to do so).
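A rough C model of that convention may help (this is not F-CPU code). It assumes 64-bit registers, unsigned comparisons, and an operand order inferred from the examples in this post (`cmpg a, b, temp' sets temp when a < b, i.e. b > a). The names `chunk_mask' and `cmpg_chunk' are hypothetical helpers for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones mask for a chunk of `bits` bits (8, 16, 32 or 64). */
static uint64_t chunk_mask(unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
    return (bits == 64) ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Hypothetical model of `cmpg.<size> a, b, dest`:
   dest = (b > a) ? -1 : 0, with -1 truncated to the chunk size.
   Only the low `bits` bits of each operand take part in the compare. */
static uint64_t cmpg_chunk(uint64_t a, uint64_t b, unsigned bits)
{
    uint64_t m = chunk_mask(bits);
    return ((b & m) > (a & m)) ? m : 0;
}
```

So a true byte-sized compare leaves 0xFF in the register, not a full-width -1 and not 1.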

> It is very nice, as gcc can then often eliminate a jump.
> I set 1/-1 as the scc value and it now knows that:
> if (a>b) b++; else b--;
> can be compiled as:
> cmpg r1,r2,r3
> add r2,r3,r2

That doesn't work, because false is 0, not 1: the add can only subtract
one or leave b unchanged, never increment it. What you can do is something
like this:

	// if (a < b) a--;
	cmpg a, b, temp
	add temp, a, a		// a -= (a < b)

	// if (a < b) a++;
	cmpg a, b, temp
	sub temp, a, a		// a += (a < b)
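The two sequences above can be modelled in C as follows, assuming 64-bit registers with wrap-around arithmetic and an unsigned `cmpg'. The function names are hypothetical; the point is only that adding the -1/0 mask decrements and subtracting it increments.

```c
#include <assert.h>
#include <stdint.h>

/* Compare result as described in this post: all ones for true, 0 for false. */
static uint64_t cmpg_lt(uint64_t a, uint64_t b)
{
    return (a < b) ? UINT64_MAX : 0;
}

/* if (a < b) a--;    cmpg a, b, temp ; add temp, a, a */
static uint64_t dec_if_less(uint64_t a, uint64_t b)
{
    uint64_t temp = cmpg_lt(a, b);
    return a + temp;   /* a + (2^64 - 1) wraps to a - 1 */
}

/* if (a < b) a++;    cmpg a, b, temp ; sub temp, a, a */
static uint64_t inc_if_less(uint64_t a, uint64_t b)
{
    uint64_t temp = cmpg_lt(a, b);
    return a - temp;   /* a - (2^64 - 1) wraps to a + 1 */
}
```

No branch is needed in either case, which is what lets gcc drop the jump.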

> and "return a>b" becomes
> cmpg r1,r2,r3
> neg r3,r1.

Since `neg' is rather slow (two cycles), it's probably better to use
`andi $1, r3, r1' to isolate the LSB.
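A small C comparison of the two ways to turn the compare result into 0/1, assuming 64-bit registers (not F-CPU code; function names are hypothetical). Both agree on a full-width -1, but only the `andi' version also yields 1 for a chunk-truncated true such as 0xFF; a full-width `neg' would not.

```c
#include <assert.h>
#include <stdint.h>

/* Model of `neg r3, r1` at full register width: two's-complement negation
   (two cycles, according to this post). */
static uint64_t bool_via_neg(uint64_t r3)
{
    return (uint64_t)0 - r3;
}

/* Model of `andi $1, r3, r1`: keep only bit 0 of the compare result. */
static uint64_t bool_via_andi(uint64_t r3)
{
    return r3 & 1;
}
```

This is why a `neg' has to be done at the compare's own chunk size, as in the `neg.32' and `neg.8' examples that follow, while `andi $1' works for any size.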

> For different modes we can add truncation, like this
> for "return (long)a > (long)b":
> cmpg.64 r1,r2,r3
> neg.32 r3,r1
> 
> or extension for "return (char)a > (char)b":
> cmpg.8 r1,r2,r3
> neg.8 r3,r1
> widen.8 r1,r1 // what does this widen to? to 64 bits?

Full register size. But it isn't necessary here because r1 will be 0 or 1,
and will already be zero-extended by the `neg' (or `andi') operation.
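A C sketch of why the extra `widen' is unnecessary. It assumes 64-bit registers and that chunk-size operations clear the register bits above the chunk, which is how "truncated to the chunk size" is read here; `neg_chunk' is a hypothetical helper, not an F-CPU mnemonic.

```c
#include <assert.h>
#include <stdint.h>

/* All-ones mask for a chunk of `bits` bits (8, 16, 32 or 64). */
static uint64_t chunk_mask(unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
    return (bits == 64) ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Hypothetical model of `neg.<size>`: negate within the chunk and leave
   zeros above it (assumed zero-extension of chunk-size results). */
static uint64_t neg_chunk(uint64_t x, unsigned bits)
{
    return ((uint64_t)0 - x) & chunk_mask(bits);
}
```

For a true byte compare, `neg_chunk(0xFF, 8)' gives 1 with all upper bits clear, so the register already holds a full-width 0 or 1.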

> For these cases we could teach gcc's combiner that
> if (a > b)
> can use jmpl, because we know that an inequality (ordering)
> operator stores its result in bit 0 regardless of operand
> sizes, and it is possibly faster than jmpz on F-CPUs
> with wider data types, where zero-flag computation
> can take a long time. It also relieves us of the problems
> with zero-extending all results.

That will work fine if the compare and jmpl instructions are paired.
If the condition comes from somewhere else (e.g. as a function parameter),
you'll have to compare with zero explicitly (usually, a zero extend
operation will be sufficient).
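A C model of the difference, assuming `jmpl' branches on bit 0 and `jmpnz' on the whole register (as discussed in this thread). It also assumes, for illustration, that a narrow parameter may arrive with garbage in the upper register bits. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* jmpl: taken when bit 0 of the register is set. */
static int jmpl_taken(uint64_t r) { return (int)(r & 1); }

/* jmpnz: taken when the whole register is nonzero. */
static int jmpnz_taken(uint64_t r) { return r != 0; }

/* Condition passed as a `bits`-wide parameter: zero-extend first,
   then test the whole register against zero. */
static int param_cond_taken(uint64_t r, unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
    uint64_t m = (bits == 64) ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
    return jmpnz_taken(r & m);
}
```

A compare result (0 or all ones) gives the same answer under both branches, but an arbitrary C truth value such as 2 does not: `jmpl' would fall through while the condition is true.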

> On other side there is a big problem with == and !=.

Yep, I know...

> Just now I use xor and the jmpz/nz. If I want to use scc
> I need to emit "cmple 0" to convert it to 1/-1 notation.
> If we would like to use jmpl it is the same problem.
> So I'd like to ask: is it a big problem to perform
> "cmpe" in the increment unit too? I know it is done by
> xor.and., but it is still not sure whether it will be there,
> whether it will support more than 8 bits, and even if so,
> it will take 2 cycles.

One solution would be a `chunk-size' logical operation that zero-extends
the result. If we really had `xor.b', you could just write

	// beq r1, r2, r4
	xor.b r1, r2, r3
	jmpz r3, r4

	// bne r1, r2, r4
	xor.b r1, r2, r3
	jmpnz r3, r4

because the high part of r3 would be guaranteed to be zero.
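A C model of the hypothetical `xor.b' branch sequences, assuming 64-bit registers and that `xor.b' clears the upper 56 bits (the property stated above). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical `xor.b`: XOR of the low bytes, upper 56 bits cleared. */
static uint64_t xor_b(uint64_t r1, uint64_t r2)
{
    return (r1 ^ r2) & 0xFF;
}

/* beq: xor.b r1, r2, r3 ; jmpz r3, r4 */
static int beq_taken(uint64_t r1, uint64_t r2) { return xor_b(r1, r2) == 0; }

/* bne: xor.b r1, r2, r3 ; jmpnz r3, r4 */
static int bne_taken(uint64_t r1, uint64_t r2) { return xor_b(r1, r2) != 0; }
```

With a full-width `xor', 0x1234 and 0x5634 would compare unequal even though their low bytes match; the cleared high part is what makes `jmpz'/`jmpnz' correct for byte operands.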

But if you want to compile something like `return (int)a == (int)b',
you must use

	// return a == b
	xnor.and.q r1, r2, r3	// -1 if they're equal
	andi $1, r3, r1

	// return a != b
	xor.or.q r1, r2, r3		// -1 if they're different
	andi $1, r3, r1
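A C model of the combined operations, assuming the `.q' suffix means a 32-bit chunk (matching the `(int)' casts) and that the reduction leaves all ones in the chunk for true. Both are assumptions, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define Q_MASK UINT64_C(0xFFFFFFFF)   /* assumed 32-bit `.q` chunk */

/* xnor.and.q: XNOR bitwise, then AND-reduce: all ones if every bit agrees. */
static uint64_t xnor_and_q(uint64_t a, uint64_t b)
{
    uint64_t agree = ~(a ^ b) & Q_MASK;
    return (agree == Q_MASK) ? Q_MASK : 0;
}

/* xor.or.q: XOR bitwise, then OR-reduce: all ones if any bit differs. */
static uint64_t xor_or_q(uint64_t a, uint64_t b)
{
    return ((a ^ b) & Q_MASK) ? Q_MASK : 0;
}

/* return a == b  /  return a != b, each followed by `andi $1`. */
static uint64_t ret_eq(uint64_t a, uint64_t b) { return xnor_and_q(a, b) & 1; }
static uint64_t ret_ne(uint64_t a, uint64_t b) { return xor_or_q(a, b) & 1; }
```

Bits above the 32-bit chunk are ignored, so values that differ only in their upper half still compare equal as `int's.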

or, if the combined `xnor.and.q' and `xor.or.q' operations are not available,

	// return a == b
	xor r1, r2, r3
	cmple.q r0, r3, r3
	andi $1, r3, r1

	// return a != b
	xor r1, r2, r3
	cmpg.q r0, r3, r3
	andi $1, r3, r1

but that will stall for three cycles (one for xor, two for the compare).
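A C model of the fallback sequences, assuming 64-bit registers, unsigned comparisons and a 32-bit `.q' chunk (assumptions, not confirmed here). Against zero, unsigned "<= 0" is the same test as "== 0", and "> 0" the same as "!= 0", which is why `cmple'/`cmpg' with r0 work as equality tests. Function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define Q_MASK UINT64_C(0xFFFFFFFF)   /* assumed 32-bit `.q` chunk */

/* return a == b:  xor r1, r2, r3 ; cmple.q r0, r3, r3 ; andi $1, r3, r1 */
static uint64_t ret_eq_fallback(uint64_t r1, uint64_t r2)
{
    uint64_t r3 = r1 ^ r2;                 /* full-width xor          */
    r3 = ((r3 & Q_MASK) <= 0) ? Q_MASK : 0; /* unsigned: <= 0 is == 0 */
    return r3 & 1;
}

/* return a != b:  xor r1, r2, r3 ; cmpg.q r0, r3, r3 ; andi $1, r3, r1 */
static uint64_t ret_ne_fallback(uint64_t r1, uint64_t r2)
{
    uint64_t r3 = r1 ^ r2;
    r3 = ((r3 & Q_MASK) > 0) ? Q_MASK : 0;  /* unsigned: > 0 is != 0  */
    return r3 & 1;
}
```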

> It could save cycles if the zero flag is slow (if it is
> possible at all!).
> 
> devik
> 
> PS: Will a stall occur here (due to zero-flag computation)?
> cmplei 2,r1,r2
> nop
> movez r2,r0,r1

Yes, a stall will occur, because cmplei takes two cycles, and zero-flag
computation may take another cycle. If you use `movel' instead, you
should be on the safe side.

-- 
 Michael "Tired" Riepe <Michael.Riepe@stud.uni-hannover.de>
 "All I wanna do is have a little fun before I die"