Re: Re: [f-cpu] reg. rotation [Was: New suggestion about call convention]
- To: email@example.com
- From: firstname.lastname@example.org
- Date: Tue, 26 Nov 2002 12:56:09 CET
- Delivered-To: email@example.com
- Delivered-To: firstname.lastname@example.org
- Delivered-To: email@example.com
- Delivery-Date: Tue, 26 Nov 2002 06:56:10 -0500
- Reply-To: firstname.lastname@example.org
- Sender: email@example.com
>> --- HARDWARE NEEDED:
>> Add register rotation.
>> I'll try to explain my idea even though I know
>> that rotation/renaming will probably cost an extra
>> pipeline stage before decode.
it's not probable, it's certain.
Currently, the register set is probably the slowest
part of the core. If you add any more logic to this,
you have to slow down the general clock.
>> Maybe the performance gain from it will be greater than the losses
>> from the additional 1-cycle latency of jumps.
jump latency can be harmful.
Currently a taken branch costs 1 cycle,
and this is too little to justify the implementation
of a branch predictor. But two cycles of latency
would double this penalty.
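To put rough numbers on this, here is a minimal sketch. It assumes a simple model where every taken branch adds a fixed number of stall cycles; the function name and the 10% taken-branch fraction are hypothetical illustration values, not measurements of F-CPU.

```python
def effective_cpi(base_cpi, taken_branch_fraction, penalty_cycles):
    """Average cycles per instruction when each taken branch
    stalls the pipeline for `penalty_cycles` extra cycles."""
    return base_cpi + taken_branch_fraction * penalty_cycles


# Hypothetical workload: 10% of executed instructions are taken branches.
TAKEN_FRACTION = 0.10

cpi_now = effective_cpi(1.0, TAKEN_FRACTION, 1)        # current 1-cycle penalty
cpi_with_stage = effective_cpi(1.0, TAKEN_FRACTION, 2)  # with one extra stage

# The overhead above the base CPI doubles, whatever the branch fraction is.
overhead_now = cpi_now - 1.0
overhead_with_stage = cpi_with_stage - 1.0
```

In this model the overhead scales linearly with the penalty, so the extra stage costs as much again as all taken branches cost today.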
>> And maybe someone clever will find a way to implement the
>> idea without an extra stage.
AFAIK, this is not technically possible.
>> Suppose that r32...r63 (for now) can be rotated with a granularity
>> of 2 regs (because of register pairing) by adding a 5-bit constant ROFF
>> somewhere in fetch, decode or a new stage.
>> We could manipulate the constant ROFF with the instruction
>> circ n, where n is an even int in 0...30; the instruction performs
>> ROFF=ROFF+n in unsigned unsaturated (wrapped) arithmetic.
>> Note that bit 0 of ROFF is always 0, so the adder is 4-bit in
>> reality. I think (however, I'm not a HW expert) that the HW needed is trivial
>> and without impact on other parts of f-cpu.
there IS an impact. it's half of a pipeline stage !
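
For readers following the proposal, here is a minimal software sketch of the ROFF mechanism as the quoted text describes it. The names `circ` and `rename`, and the direction of the mapping (adding ROFF to the register index), are assumptions for illustration; the proposal does not pin them down, and this models only the renaming arithmetic, not its timing cost.

```python
ROTATING_BASE = 32  # r32 is the first rotating register (per the proposal)
INDEX_MASK = 0x1F   # 5-bit index within r32..r63


def circ(roff, n):
    """Model of `circ n`: ROFF = ROFF + n with 5-bit wraparound.
    n must be an even integer in 0..30, so bit 0 of ROFF stays 0."""
    if n % 2 != 0 or not 0 <= n <= 30:
        raise ValueError("n must be an even integer in 0..30")
    return (roff + n) & INDEX_MASK


def rename(reg, roff):
    """Map an architectural register number (0..63) to a physical one.
    r0..r31 pass through; r32..r63 are rotated by ROFF (assumed direction)."""
    if not 0 <= reg <= 63:
        raise ValueError("register number must be in 0..63")
    if reg < ROTATING_BASE:
        return reg
    return ROTATING_BASE + ((reg - ROTATING_BASE + roff) & INDEX_MASK)


roff = circ(0, 30)    # ROFF = 30
roff = circ(roff, 4)  # 34 wraps around to 2
```

Because ROFF is always even, an even/odd register pair stays a pair after renaming, which is the point of the 2-register granularity. Note that `rename` is an extra addition on every register field before the register set can be read, which is where the half pipeline stage comes from.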
>> It is like renaming registers in the instruction
>> stream before it hits the f-cpu as we currently know it.
>It looks like a complex instruction that would
> have a lot of effects on the CPU, I think.
it hurts because it adds a new "hidden state"
and it might even have consequences on the scheduler
because it is running in parallel with the register
set reading cycle.
> But I currently don't see where you can expect
> better performance.
>PS: A macro that does a call will certainly look like this:
> .macro call name
> loadaddri name, t0
> store [sp], ra
> jmp [t0], ra
nice but how do you manage the stack
(increment/decrement the pointer) ?...