[f-cpu] LSU immediate offset idea
- To: f-cpu@xxxxxxxx
- Subject: [f-cpu] LSU immediate offset idea
- From: Martin Devera <devik@xxxxxx>
- Date: Sat, 25 Feb 2006 12:06:16 +0100
Hi,
following up on the last discussion with Yann, I was thinking about how
to solve the problem that we need immediate offsets in load/store - they
are simply too common. Here is the solution.
Let's assume that the (smallest) page size is 8 kB. The load/store insns
will allow a +-4 kB offset from the base register. Then, depending on
bit 12 of the base register (i.e. which half of the page it points
into), the offset can "overflow" either into the preceding or into the
following page, but not both.
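
To make the "not both" claim concrete, here is a minimal sketch. It assumes a signed 13-bit displacement (-4096..+4095) and uses hypothetical helper names; it is only a model of the arithmetic, not of any existing F-CPU code.

```python
PAGE_SHIFT = 13               # 8 kB pages, as assumed in the text
PAGE_SIZE = 1 << PAGE_SHIFT   # 8192 bytes
MAX_DISP = PAGE_SIZE // 2     # assumed displacement range: -4096 .. +4095


def closer_neighbour(base: int) -> int:
    """+1 if only the following page is reachable, -1 if only the preceding one.

    Bit 12 of the base tells which half of its 8 kB page the base points into.
    """
    return 1 if (base >> (PAGE_SHIFT - 1)) & 1 else -1


def page_delta(base: int, disp: int) -> int:
    """Page number of base+disp relative to the page number of base."""
    assert -MAX_DISP <= disp < MAX_DISP
    return ((base + disp) >> PAGE_SHIFT) - (base >> PAGE_SHIFT)
```

For any base, `page_delta` only ever returns 0 or `closer_neighbour(base)`, so two shadow entries per base register are enough.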
While the base register is being filled (by a special addp, for example)
we will init two shadow registers (in the background - no stall) via TLB
lookups: one for the page the resulting pointer points into and the
other for the neighbouring page on the closer side. We would also fill
small registers with protection info for both pages - whether they can
be read and/or written (execute is irrelevant for the LSU).
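
A rough software model of that shadow fill, under my own assumptions: the TLB is just a dict from virtual page number to an entry, and `TlbEntry`, `Shadow` and `fill_shadows` are hypothetical names made up for this sketch. A missing entry stands for "page not mapped".

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

PAGE_SHIFT = 13  # 8 kB pages, as assumed in the text


@dataclass(frozen=True)
class TlbEntry:
    frame: int    # physical page number (hypothetical field)
    read: bool    # page may be read
    write: bool   # page may be written


@dataclass(frozen=True)
class Shadow:
    page: int                   # virtual page number covered by this shadow
    entry: Optional[TlbEntry]   # None means no valid translation


def fill_shadows(tlb: Dict[int, TlbEntry], pointer: int) -> Tuple[Shadow, Shadow]:
    """Return (own, near) shadows for a freshly written base register."""
    own_page = pointer >> PAGE_SHIFT
    # Bit 12 selects the neighbour a +-4 kB displacement could reach.
    step = 1 if pointer & (1 << (PAGE_SHIFT - 1)) else -1
    near_page = own_page + step
    return (Shadow(own_page, tlb.get(own_page)),
            Shadow(near_page, tlb.get(near_page)))
```

In hardware both lookups would run in the background after the base register write; the model only shows which two pages get looked up.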
Now, it is common that a pointer points into a page which has valid
pages on both sides (except for the stack, but see below). Let's
optimize for the common case.
When we fetch an insn, then during the decode stage (where we also read
the RF in FC0) we look up the protection bits for the given base
register (when an LSU insn is being dispatched). When both are OK (both
pages writable for a store, both readable for a load) we do nothing
special here, because we know the access will be OK for every possible
displacement.
If both are bad (page/protection fault) then we can trap right away.
If one is OK and one is bad then we stall this (decode & fetch) stage
and wait one cycle while we sum the displacement with the low 13 bits of
the base and check the carry - the carry then selects which shadow
protection bit set is used to decide between pass and trap.
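
The three cases above can be sketched as follows. This is only an illustration with hypothetical names; `own_ok` and `near_ok` are assumed to be the shadow protection bits already selected for the access type (write bit for a store, read bit for a load), and a signed displacement is assumed, so "carry" here means carry or borrow out of bit 12.

```python
PAGE_SHIFT = 13
PAGE_MASK = (1 << PAGE_SHIFT) - 1   # low 13 bits of the base


def decode_check(base: int, disp: int, own_ok: bool, near_ok: bool):
    """Return (outcome, stall_cycles) for one LSU insn in the decode stage."""
    if own_ok and near_ok:
        return "pass", 0          # OK for every possible displacement
    if not own_ok and not near_ok:
        return "trap", 0          # bad for every possible displacement
    # Mixed case: spend one cycle adding the displacement to the low
    # 13 bits of the base and look at the carry/borrow out of bit 12.
    total = (base & PAGE_MASK) + disp
    crossed = total < 0 or total > PAGE_MASK
    ok = near_ok if crossed else own_ok
    return ("pass" if ok else "trap"), 1
```

For example, `decode_check(0x11F00, 0x200, True, False)` crosses into the bad following page and traps after the one-cycle stall, while the same call with displacement `0x10` stays inside the base's own page and passes.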
The point is that this stall will not occur often. For the stack, which
the kernel typically grows in the page fault handler, I'd add an SR bit
that makes us trap as soon as at least one shadow indicates "bad" (the
setting applies to selected register(s) used as SP/BP). Thus we will
keep half a page preallocated on average.
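
A tiny sketch of how such an SR bit could change the decode decision. The `eager_trap` flag and the function name are hypothetical; the flag stands for the per-register SR bit.

```python
def decode_action(own_ok: bool, near_ok: bool, eager_trap: bool) -> str:
    """Decide what decode does with an LSU insn, given both shadow bits."""
    if own_ok and near_ok:
        return "pass"
    if not own_ok and not near_ok:
        return "trap"
    # Exactly one shadow is bad: with the SR bit set (e.g. for SP/BP) we
    # trap at once so the kernel can map the missing neighbour page;
    # otherwise we take the one-cycle stall and check the carry.
    return "trap" if eager_trap else "stall"
```

With the bit set for SP/BP, the stack neighbour gets mapped before it is actually touched, so stack accesses would normally not hit the stall case.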
This has other interesting properties too. The TLB can be slower because
we "cache" it in the shadows, so it will scale better for an SMT design.
I still need to solve the coherency issue, but I already have an idea :)
Yann, can I stall only fetch/decode in FC0 like I described?
Martin