
[f-cpu] LSU immediate offset idea



Hi,

Following the last discussion with Yann, I was thinking about how to solve the problem that we need immediate offsets in load/store: they are simply too common. Here is the solution.
Let's assume that the (smallest) page size is 8kB. The load/store insns will allow a +-4kB offset from the base register. Then, depending on bit 12 of the base register (which half of the page the base lies in), the offset can "overflow" either into the preceding or into the following page, but not both.
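
To make the bit 12 argument concrete, here is a small Python sketch (a model only, not hardware). It assumes a signed displacement in the range -4096..+4095; the exact encoding is my assumption, and the names neighbour_side and page_delta are made up for illustration.

```python
PAGE_BITS = 13                  # 8 kB page, as assumed in the text
PAGE_SIZE = 1 << PAGE_BITS
MAX_DISP = PAGE_SIZE // 2       # assumed displacement range: -4096 .. +4095

def neighbour_side(base):
    """Return +1 if bit 12 of base is set (upper half of the page, so only
    the following page is reachable), else -1 (only the preceding page)."""
    return 1 if (base >> (PAGE_BITS - 1)) & 1 else -1

def page_delta(base, disp):
    """Page number of (base + disp) minus page number of base."""
    assert -MAX_DISP <= disp < MAX_DISP
    return ((base + disp) >> PAGE_BITS) - (base >> PAGE_BITS)
```

For any base and any legal displacement, page_delta() is either 0 or neighbour_side(base), never the opposite side, which is why one extra shadow per base register should be enough.
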
During the base register fill (by a special addp, for example) we will init two shadow registers (in the background, no stall) via TLB lookup: one for the page pointed to by the resulting pointer and the other for the neighbouring page on the closer side. We would also fill small registers with protection info for both pages, i.e. whether they can be read and/or written (execute is irrelevant for the LSU).
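
A rough Python model of the shadow fill, just to pin down what gets stored. The TLB is modelled as a plain dict from virtual page number to (readable, writable), and an unmapped page is treated as neither readable nor writable; the Shadow type, fill_shadows and that dict layout are illustrative assumptions, not anything that exists in FC0.

```python
from dataclasses import dataclass

PAGE_BITS = 13  # 8 kB page, as assumed in the text

@dataclass
class Shadow:
    vpn: int         # virtual page number this shadow describes
    readable: bool
    writable: bool

def fill_shadows(base, tlb):
    """Return (own, near) shadows for a freshly written base register.

    tlb: dict mapping virtual page number -> (readable, writable);
    a missing entry stands for an unmapped page.
    """
    vpn = base >> PAGE_BITS
    # Bit 12 picks the closer neighbour: following page if set, else preceding.
    side = 1 if (base >> (PAGE_BITS - 1)) & 1 else -1

    def lookup(v):
        readable, writable = tlb.get(v, (False, False))
        return Shadow(v, readable, writable)

    return lookup(vpn), lookup(vpn + side)
```

For example, with tlb = {1: (True, True), 2: (True, False)} a base of 0x3000 gets shadows for pages 1 and 2, while a base of 0x2000 gets pages 1 and 0 (the latter unmapped).
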


Now it is common that a pointer will point to a page which has valid pages on both sides (except for the stack, but see later). Let's optimize for the common case.
When we fetch an insn, then during the decode stage (where we also read the RF in FC0) we will look at the protection bits for the given source register (when an LSU insn is being dispatched). When both are ok (write-write for a store, r/w-r/w for a load) we do nothing special here, because we know that the access will be ok for every possible displacement.
If both are bad (page/protection fault) then we can trap right away.


If one is ok and one is bad then we will stall this (decode & fetch) stage and wait one cycle while we sum the displacement with the low 13 bits of the base and check the carry. The carry then selects which shadow protection bit set is used to pass or trap.
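
The three decode cases can be sketched in Python like this. It is only a behavioural model: own_ok and near_ok stand for the shadow protection bits already reduced to "ok for this kind of access", and decode_check with its (action, stall_cycles) return value is a made-up interface.

```python
PAGE_BITS = 13
PAGE_MASK = (1 << PAGE_BITS) - 1    # low 13 bits = offset within an 8 kB page

def decode_check(base, disp, own_ok, near_ok):
    """Return (action, stall_cycles) for one load/store at decode time."""
    if own_ok and near_ok:
        return "pass", 0            # common case: any displacement is fine
    if not own_ok and not near_ok:
        return "trap", 0            # both pages bad: trap right away
    # Mixed case: spend one cycle adding disp to the low 13 bits of base.
    total = (base & PAGE_MASK) + disp
    # Carry (or borrow) out of the 13-bit add means we left the base page.
    crossed = total < 0 or total > PAGE_MASK
    ok = near_ok if crossed else own_ok
    return ("pass" if ok else "trap"), 1
```

So decode_check(0x3ff0, 0x20, True, False) stalls one cycle and then traps, because the access lands in the bad following page, while decode_check(0x3ff0, 0x08, True, False) stalls one cycle and passes.
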
The point is that this stall will not occur often. For the stack, which the kernel typically grows in the pg fault handler, I'd add an SR bit to control trapping when at least one shadow indicates "bad" (the setting is for selected register(s) used as SP/BP). Thus we will keep one half pg preallocated on average.
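
One possible reading of the SR bit, as a Python sketch: a per-register mask (eager_mask, a hypothetical name and layout) marks the registers used as SP/BP, and for those registers a single bad shadow is already enough to trap, so the kernel can map the next stack page before it is really touched. Both the mask layout and the function name are assumptions for illustration.

```python
def should_trap_early(reg, own_ok, near_ok, eager_mask):
    """True if register `reg` is marked in the hypothetical SR mask and at
    least one of its two shadows is bad."""
    eager = (eager_mask >> reg) & 1
    return bool(eager) and not (own_ok and near_ok)
```

With the bit clear, the mixed case falls back to the one cycle stall and carry check described above.
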


This has other interesting properties too. The TLB can be slower because we will "cache" it in the shadows, so it will scale better for an SMT design.

I need to solve the coherency issue but I already have an idea :)

Yann, can I stall only fetch/decode in FC0 as I described?

Martin
*************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxx with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/