
Re: [f-cpu] Re: FC0 XBAR



hi and welcome to the list,

Juergen Goeritz wrote:
> On Wed, 1 Aug 2001, Yann Guidon wrote:
> > would it be possible to move this discussion to the main
> > English f-cpu list?
> Sure!
cool :-)
other people will be able to participate and help.

> > Juergen Goeritz wrote:
> > > On Wed, 1 Aug 2001, Yann Guidon wrote:
> > >
> > > Hi !
> > > > > Don't you do a strictly synchronous design of all the control
> > > > > signals?
> > > >
> > > > it's fully synchronous, but i have no safe evidence that the
> > > > critical datapath is short enough.
> > >
> > > Evidence will only come together with a real synthesis and
> > > the layout. It depends where you locate the blocks on the
> > > chip. Newer technologies like 0.18 micron may introduce additional
> > > delays on signals when the path between source and drain gets
> > > longer.
> >
> > i am aware of this. I have no idea of how the synthesiser will
> > modify the datapath, it could end up worse than what is written.
> > i know that there are some nasty surprises awaiting us.
> 
> You will get different surprises with every technology.
> FPGAs, for example, require completely different synthesis
> compared to ASIC technologies. Thus you may get
> problems where you don't expect them to arise. If you
> look at LEON, a fast address generator was added for
> FPGAs so that it still completes in a single cycle.

my worry is that compilers sometimes want to play the "smart ass"
and optimize things that should be left alone.

> > > > > > However, i don't know if it is possible to read and write the
> > > > > > same register in a single clock cycle.
> > > > > That should work, because LEON has a similar strategy for read
> > > > > and write of the registers, one read port and one write port.
> > > > ?
> > >
> > > The SPARC architecture describes a register set that provides
> > > read and write to the register bank in a single cycle since
> > > it's a pipelined architecture capable of executing an instruction
> > > each cycle.
> >
> > now the problem is when the written register is the same as the read register.
> > gut feeling tells me that the signal couldn't propagate fast enough.
> > some kind of bypass could become necessary.
> 
> Yes, but write usually is at least one cycle delayed, isn't it?
it depends, but it doesn't solve the problem : the delay only "moves"
the problem from one cycle to another...

> So the compiler could take care not to use a register as a
> source in the next instruction(s) when it must be written to
> first. But this implies that the compiler knows exactly how
> the pipeline is constructed and how it works.
this is not being considered: the compiler and the binaries should
be independent of the microarchitecture (or at least keep a minimal
compatibility). introducing such a constraint on the compiler is
not desirable and not easy either...

we CAN detect when the bank is accessed both for read and write.
we can even delay the instruction that does that (but it's not desirable).
however, in some cases the hardware might not need
such a measure. it depends too much on the silicon characteristics...
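
a minimal behavioural sketch of that detection, in Python. the function
names, the tuple of read ports and the bank_supports_rw flag are all
hypothetical, chosen only for illustration. it assumes a single write port
and shows the stall as a fallback, not as the actual F-CPU issue logic:

```python
def same_cycle_conflict(read_regs, write_reg):
    """Return True when the register written this cycle is also being read.

    read_regs: register numbers on the read ports this cycle (hypothetical layout).
    write_reg: register number on the write port, or None if nothing is written.
    """
    return write_reg is not None and write_reg in read_regs


def issue(read_regs, write_reg, bank_supports_rw):
    """Decide whether the reading instruction may issue this cycle.

    bank_supports_rw models the silicon-dependent case where the register
    bank tolerates a read and a write of the same register in one cycle.
    """
    if same_cycle_conflict(read_regs, write_reg) and not bank_supports_rw:
        return "stall"   # delay the instruction by one cycle
    return "issue"


if __name__ == "__main__":
    print(issue((1, 2), 2, bank_supports_rw=False))
    print(issue((1, 2), 2, bank_supports_rw=True))
```

the binaries stay unchanged in both cases: only the hardware decides
whether the conflicting instruction is delayed.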

> > i'll analyse this issue when the C code is transformed into VHDL.
> Do you simulate the register accesses as well?
i start from the idea that the hardware CAN read and write the same data.
it is easier to handle because there is no special case.
i could then add a 1-cycle write latency feature, in the future,
but it makes the instruction decode/issue more complex...
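
a small behavioural model of the two cases, as a sketch only. the class
name, the bypass flag and the default of 64 registers are assumptions made
for this example, and it models one read port and one write port. with
bypass enabled, a read of the register being written sees the new data in
the same cycle; without it, the read sees the old content and the new
value only shows up on the following cycle:

```python
class RegisterBank:
    """Behavioural model of a register bank with one read and one write port."""

    def __init__(self, n_regs=64, bypass=True):
        self.regs = [0] * n_regs   # storage array, all registers start at 0
        self.bypass = bypass       # True: forward write data to the read port

    def cycle(self, read_idx, write=None):
        """Simulate one clock cycle and return the value seen on the read port.

        write is None, or an (index, value) pair presented on the write port.
        """
        old = self.regs[read_idx]          # content of the array before the write
        if write is not None:
            w_idx, w_val = write
            self.regs[w_idx] = w_val       # stored for all later cycles
            if self.bypass and w_idx == read_idx:
                return w_val               # same register: bypass the array
        return old


if __name__ == "__main__":
    with_bypass = RegisterBank(bypass=True)
    without_bypass = RegisterBank(bypass=False)
    print(with_bypass.cycle(3, write=(3, 42)))     # new data, same cycle
    print(without_bypass.cycle(3, write=(3, 42)))  # old data this cycle
    print(without_bypass.cycle(3))                 # new data one cycle later
```

the non-bypass case is the one where the decode/issue logic has to know
about the extra cycle, which is the added complexity mentioned above.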

> > > > > > i think that the register
> > > > > > bank will be "generated" with some design tools, but there might
> > > > > > be a problem here.
> > > > > Usually yes. Some macro of the library of the target chip.
> > > > can you explain ?
> > >
> > > You take some macros from the vendor libraries that are
> > > optimized for speed and area. You do not synthesize
> > > regular structures because it will waste a lot of space.
> > > You could also add some (S)(D)RAM structures onto the chip
> > > (but dynamic RAM is not available with all technologies). Some
> > > Ethernet switch devices have up to several megabytes directly
> > > on-chip as packet buffers. LEON is an example of how you do it.
> >
> > usually, this kind of "hard macro" is specified and characterised.
> > but this changes from vendor to vendor, so the question is still
> > annoying.
> 
> The question?
the question whether the macro will allow simultaneous read and write
to the same register(s).

> If you look at the plot of an ASIC, the areas
> with synthesized structures are not as densely filled as the
> hard macro areas.
that's almost obvious.

> A lot of overhead is introduced. That's
> why every vendor tries to provide a library of preoptimized
> blocks/modules that you can use. It's the same in software:
> when you rely on libc, for example, it may be different on
> other operating systems that also provide a libc, but it will
> not be so easy to switch to an operating system that uses DLLs.
> ;-)
no comment :-)

> JG
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~