
[f-cpu] LSM2002 : Proceedings for the F-CPU topic



hello,

I am back from Bordeaux. Here are some notes about this
meeting; other material will be provided later for the proceedings'
CD-ROM.

Despite some technical and social problems (it seems I attracted them all),
there are some interesting points:

 - The schedule collision between the F-CPU conference and RMS's talk
   was an issue for a number of people. What started as pure laziness
   and coincidence reached some unexpected (trolling) proportions.
   The fact that they had to make an exclusive choice created an
   artificial conflict, at least in their minds.
   I promise I won't give trolls a chance if there is a next time.
   The F-CPU presentation went well and some interested
   people showed up, despite the small audience in the room.

 - One reason I chose the 11th was that some people from outside Paris
   wanted to come that day from the Toulouse area. So we had to choose
   a single day so that as many F-CPU team members as possible
   could be in the same place at the same time and meet each other IRL
   for the first time. It was a pleasure to finally meet people
   with whom I have communicated for two (?) years. For this at least,
   the trip to Bordeaux was worth the effort.

 - I am particularly disappointed that only the F-CPU project was represented.
   Peter brought his NEC/MIPS boards, but that is a different matter.
   I touched on this in the conference's introduction: the fragmentation
   of the projects is damaging to the whole community.
   I have no idea how this could change. I have to code, too; social
   engineering is not my full-time job. If someone can convince the FreeHDL
   team and others to participate in common meetings, that would be cool, no?

 - On top of that, it was a good occasion to see how the F-CPU project
   is perceived by others. The simultaneous RMS/F-CPU/Prelude conferences
   showed who wants what. I also met Abdoulaye and
   a lot of other enthusiastic people who did not troll about the difficulty
   of the project. Despite the design/coding problems, there are also good contacts
   with the OS teams (mainly Hurd but also *BSD), as you can see at
   http://perso.linuxfr.org/penso/photos/lsm2002/kif_1901.jpg
   and http://plouc.net/lsm/kilobug/IMG_0616.JPG for example.

 - I will keep away from APRIL and other such organisations for a while.
   Their unacceptable behaviour during the evening of July 13th upset me.
   I'll also stay away from geek social events, as I now know they are potentially
   dangerous. Furthermore, I don't see much value in their work: only theory
   and no results. I prefer to code; that's the most efficient and tangible way to
   change things. Finally, I'll think twice or more before agreeing to give
   a presentation in an unfamiliar place. That's just as well, as my budget is rather
   limited...

Read you soon,
WHYGEE
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
file : yg_lsm.txt
written in July 2002 by Yann Guidon (whygee@f-cpu)
version Sat Jul 13 06:06:48 CEST 2002

This file contains the original slide text for the
Freedom CPU conference, held on Thursday, July 11th 2002 at ENSEIRB,
on the campus of the University of Bordeaux, for the Libre Software Meeting.

Notes :
 * Some personal "papers" can be found at http://f-cpu.seul.org/whygee,
   for example an architectural presentation of pipeline
   scheduling techniques (in French) and a step-by-step DCT optimization.
 * The most recent design files can be found at http://f-cpu.seul.org/new
   The most important files are the VHDL source snapshots and the
   latest updates to the manual.

This is the text, commented and explained by Yann Guidon.
Before him, Nicolas Boulay explained some basics of general
CPU design and of the industry to the audience.
After YG came Cédric Bail, who spoke about some software aspects.

But before commenting on the slides, there was a short
speech about more general, non-technical matters. Here is
a brief account, recalled from memory.

-----------------------FOREWORDS-----------------------

First, I have often heard people say "Free CPU" when
speaking about F-CPU. The "F" means "Freedom", as that is
what it is all about. F-CPU was first called "The Freedom
project", then "Freedom CPU project", and "F-CPU" for short.
This will probably sound like RMS (who is giving
his presentation at the same time in the Grand Amphithéâtre),
but please say "Freedom CPU", not "Free CPU". It is less
confusing to the audience, as others might understand it
differently than you do (even if you mean it strongly).

Second point I want to make clear: participating actively
in this project is proof of deep despair. One has to be
really depressed to join such
a crusade and reinvent the wheel every day (even if this one
grants you more freedom). It is only because I see no other durable,
ethical and technical solution that I joined and contributed
to this project, which was at first a pure acid-induced utopia.
The DMCA, the EUCD and all the other copyright or security enforcement
laws, as well as the "murder" of the Alpha CPU, slowly made
this project more important than it first seemed.
Knowing how difficult this project is, you can understand
that I am particularly hopeless. Emphasizing this difficulty
will not solve F-CPU's problems, so please make
constructive remarks. All contributors are perfectly
aware of the problems, but they try to solve the small,
easy ones first.

Third, F-CPU is not the only project that provides its
design files on the Internet under a "Free" license.
Several significant projects of this kind were invited, but none
could come, which we regret. This shows that the
"Open source hardware" field is very fragmented and
does not work the same way as software does.

Fourth, "Open hardware" is not the same
as "Free hardware": the former publishes some code
on the Internet under whatever "free" license exists,
even the GNU GPL, though its meaning is hard to pin down
in the HW/EDA world, where tools are
extremely expensive and restrictive (not to mention
proprietary). F-CPU aims at Freedom and understands
(as most contributors can't access the expensive tools)
that copylefted sources are not enough.

A proprietary-free workflow does not exist yet because
the question was never considered serious enough to justify efforts
in this domain. As noted in the third point, the fragmentation and
partial overlap of the projects that can be
found on http://www.opencollector.org lead to inefficient
workflows, non-standard formats or simply non-functioning
sources. In a world where proprietary software has ruled
for more than a decade, this is particularly
frustrating.

As the question was never raised before (or at least not
loudly enough), there is no clear dividing
line between "Open hardware" and "Free hardware". Most projects
have roughly the same wish for freedom, but each one
takes it more or less seriously. Fashions and buzzwords
also influence people's behaviour: in the last 5 years we have seen
a blossoming number of projects with "IP", "Open", "Free", "Core", etc.
and all their combinations in their names
(for example: "OpenIP", "OpenCores", "OpenIPCores"...).

One reason why F-CPU is not further along is that the
free toolchains cannot do serious work that can
match proprietary tools. One can say that Linux came to exist
thanks to the GNU tools and larger hard disks.
However, there is no decent non-proprietary workflow for hardware. We
don't ask for extensive feature support, only for
a tool that works decently and is compatible with commercial
solutions. The current solution is to use proprietary
software, in the hope that a serious free solution will become
available soon.

In order to keep the F-CPU project free from commercial
and industrial pressure, the strategy is to perform
intensive tests with as many tools as we can legally
get, either through trial periods or partnerships.
Cadence's ncsim is one of those usually
extremely expensive tools, but one Cadence employee believes
in "Free hardware" and provides personal licenses
to the contributors of F-CPU or Opencores. Other tools
are less restrictive and don't require the user to
apply or fill in any form. There are several toolsuites
that an individual user can run on a personal workstation
with more or less freedom, ease and performance
(F-CPU contributors also maintain a "VHDL Howto"
that describes each known tool and explains its installation
and use).

In case a tool breaks or a partnership ends, compatibility
tests with the other tools ensure that the sources
remain free from ties to any single vendor.
Similarly, if the sources compile cleanly with all the
available tools, a major problem is unlikely to
appear when yet another tool is used. This makes porting
to unsupported tools much easier! For example, a lot
of problems appeared when going from single-tool support
to dual-tool support, and far fewer were found when a third
tool was added. However, every tool has its own
interpretation of the VHDL Language Reference Manual,
and error-free compilation is not guaranteed on the first try.
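The cross-tool testing strategy described above can be sketched as a small regression script. This is a minimal Python sketch, not the project's actual test setup: the TOOLS table is hypothetical and only lists GHDL's analysis command ("ghdl -a"); other simulators you are licensed for would be added with their own command lines. The runner parameter exists so the logic can be exercised without any tool installed.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# Hypothetical tool table: tool name -> command prefix that analyzes one
# VHDL file. Only GHDL is listed; add the other tools you can legally use.
TOOLS: Dict[str, List[str]] = {
    "ghdl": ["ghdl", "-a"],
}


def run_tool(cmd: Sequence[str]) -> bool:
    """Run one analysis command; True if it exits with status 0."""
    try:
        return subprocess.run(list(cmd), capture_output=True).returncode == 0
    except FileNotFoundError:
        return False  # a missing tool is reported as a failure


def cross_check(sources: Sequence[str],
                tools: Dict[str, List[str]],
                runner: Callable[[Sequence[str]], bool] = run_tool
                ) -> Dict[str, List[str]]:
    """Return {source: [tools that rejected it]} for every failing source."""
    failures: Dict[str, List[str]] = {}
    for src in sources:
        rejected = [name for name, prefix in tools.items()
                    if not runner(prefix + [src])]
        if rejected:
            failures[src] = rejected
    return failures
```

Calling cross_check(["alu.vhdl"], TOOLS) returns an empty dictionary when every listed tool accepts every file; any entry points at a source that relies on one tool's reading of the LRM.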

Unlike the Linux kernel and most GNU projects,
the freedom of the F-CPU sources is ensured not only
by the copyleft, but also by adherence to the minimal
language standards, so everyone can REALLY use them.
This "least common feature" approach slows the project's
progress but makes it more attractive and serious...
It's like an OS kernel designed to be compiled
with Intel's, Alpha's and IBM's compilers: performance
and portability would be terrific.

Why spend so much personal effort on F-CPU?
 - I could have been hired by a high-tech company and
done wonderful things... However, everyone knows that such a
position does not let one steer a project as one
wishes. It is frustrating to see a project stop because
corporate priorities change. There is often a lot
of wasted work, or wrong decisions that an individual
cannot change. Shareholder profit is not always
compatible with professional ethics, or at least with personal
beliefs.
 - I could found a start-up. Well, I'm an artist, not
a businessman. Forget about that.
 - I could play the old role of the lonely inventor in
his house... But patent laws would crush me.

The solution is to abandon the idea of individual profit
and work with other people who believe in the same things,
so the project really remains free. From this point on,
it becomes possible to find a win-win deal in our society
which is eager to spend money in order to spend less money...

Finally, the F-CPU project has recently gained some interest
from the EDA (Electronic Design Automation) community and
its users: not only has the "Linux" wave become successful
(Linux is not only a bit cheaper, it's also much easier to port
old software from the SUN and HP platforms to Linux than to Windows),
but the last economic recession has made everybody wonder
where they are heading...


-----------------------SLIDES-----------------------


Title :
 The Freedom CPU specification
 and its implementation

      (logo)

   "Design and let Design"

 Yann Guidon (whygee@f-cpu.org) @LSM2002


F-CPU history :
  - the origin in 1998 : a slashdotted site by
      Andrew D. Balsa, Richard Gooch, Raphael Reilova... (linux hackers)
  - M2M : memory-mapped registers -> never caught on
  - AlphaRISC's TTA -> too many problems
  - RISC -> looks classical and should work


F-CPU goals:

   The Freedom CPU Project Constitution:
          voted in early 1999

  "
  To develop and make freely available an architecture, and all other intellectual
  property necessary to fabricate one or more implementations of that architecture, with the
  following priorities, in decreasing order of importance:

       1. Versatility and usefulness in as wide a range of applications as possible
       2. Performance, emphasizing user-level parallelism and derived through intelligent
             architecture rather than advanced silicon process
       3. Architecture lifespan and forward compatibility
       4. Cost, including monetary and thermal considerations
  "

      Important note : "To develop (... IP...) necessary to fabricate"
      does NOT mean "To fabricate". It just means "To design". Fabrication
      is not our goal.


Some constraints :
  - must be easily understood and manageable by a small team
  - no tools, no methodology, no "Free Hardware" model to copy
      --> we are reinventing the wheel all the time...
  - no structure/organisation
       (only a mailing list and a lot of abandoned websites)
  - "yet another open IP core" but done by software guyz....


Freedom as utopia
  * when nothing exists, we are free to imagine whatever comes to mind
  * when it starts to work, the real problems start and nobody
      wants to solve them, preferring the comfy dreams...


F-CPU overview :
  - A new architecture that builds upon decades of
      computer families --> F-CPU learns from
      the errors of the past and wipes the compatibility
      problems away
  - inspired by DEC's Alpha and all the frustrations accumulated
     with other architectures : it shouldn't su>< too much.
  - looks like a MIPS at first sight (from very far)
       --> should not frighten the average CS student
           who was raised on P&H's (Patterson & Hennessy's) book ("bible")
  - The "Execution Pipeline" is inspired by the CDC6600's,
      but the memory interface is new and ground-breaking.


Instructions :

  - unified register set with 63 usable SIMD registers (R0 always reads 0)
  - fixed-size 32-bit instruction word with 3-address operations

   (image : ISA2.gif)

  - extensions with 3r1w (store+postinc: 3 register reads, 1 write) and
    2r2w (load+postinc: 2 register reads, 2 writes), and some other nice things.
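To make the "fixed-size 32-bit word with 3 addresses" point concrete: with 64 registers, each operand field needs 6 bits, so three register operands use 18 of the 32 bits. The split of the remaining 14 bits into an 8-bit opcode and 6 flag bits, and the field order, are assumptions for this sketch; the F-CPU manual defines the real layout.

```python
REG_BITS = 6                       # 64 registers -> 6 bits per register field
REG_MASK = (1 << REG_BITS) - 1


def encode(opcode: int, flags: int, r3: int, r2: int, r1: int) -> int:
    """Pack opcode(8) | flags(6) | r3(6) | r2(6) | r1(6) into one 32-bit word.
    The layout is hypothetical, chosen only to show that everything fits."""
    if not (0 <= opcode < 1 << 8 and 0 <= flags < 1 << 6):
        raise ValueError("opcode or flags out of range")
    if any(not 0 <= r <= REG_MASK for r in (r3, r2, r1)):
        raise ValueError("register number out of range")
    return (opcode << 24) | (flags << 18) | (r3 << 12) | (r2 << 6) | r1


def decode(word: int):
    """Inverse of encode(): return (opcode, flags, r3, r2, r1)."""
    return ((word >> 24) & 0xFF, (word >> 18) & 0x3F,
            (word >> 12) & REG_MASK, (word >> 6) & REG_MASK, word & REG_MASK)
```

Because every field sits at a fixed position, a decoder can extract all three register numbers in parallel, which is what keeps the decode stage simple.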


Resources :

  - Unified register set (64 regs, R0 = 0) (SIMD or scalar data, pointers, FP...)
  - Execution Units
  - Physical memory address space (private + public)
  - Virtual Memory address space (mapped to Physical memory by a TLB)
  - Special Registers
     (to perform whatever can't be scheduled cleanly in the pipeline)

 BUT

  - no stack
  - no architecture-visible "status register"
       -> no known bottleneck (except instruction decoding,
           prefetching, dumb compilers...)
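A minimal software model of the unified register set: one array of 64 general registers that may hold any kind of data (integers, pointers, FP bit patterns, packed SIMD lanes), with R0 hardwired to zero. The class name and the 64-bit width are assumptions; the slide only states that the set is unified and that R0 = 0. Since there is no status register, a condition is modelled here as a test on a register's contents.

```python
class RegisterFile:
    """Unified register set sketch: 64 registers, R0 always reads as zero."""

    def __init__(self, count: int = 64, width: int = 64):
        self.mask = (1 << width) - 1
        self.regs = [0] * count

    def read(self, idx: int) -> int:
        return 0 if idx == 0 else self.regs[idx]

    def write(self, idx: int, value: int) -> None:
        if idx != 0:                      # writes to R0 are discarded
            self.regs[idx] = value & self.mask

    def is_zero(self, idx: int) -> bool:
        """Condition test on a register's contents (no status register)."""
        return self.read(idx) == 0
```

Keeping conditions in ordinary registers means two independent tests never compete for a single flags register, which is one less serializing resource in the pipeline.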


FC0 : the F-CPU Core #0
  - simple static scheduling
      in-order issue, OOOC : "Out Of Order Completion" (not OOOE like PPC or P6 !)
  - VSPS : Very Short Pipeline Stages
      (also known as "Superpipeline", but leaves the pipeline depth open)
  - separation between the "memory interface" (speculative) and
    the "Execution pipeline" (completely deterministic)

  (image : FC0 split in two)


Pipeline latency :

  FC0 is a "Carpaccio CPU"

  FC0 has an average complexity of 6 gates between 2 pipeline D-latches
  --> Complex operations will take more time than simple ones

  ex.:  MOVE, LOADCONS : 0 cycles
        OR, AND        : 1 cycle
        ADD, SUB, SHL  : 2 cycles
        Multiply, MAC  : 2-4-6-8 cycles
        Division       : very long (if ever implemented...)
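The effect of these variable latencies can be shown with a toy in-order issue model: one instruction issues per cycle, but an instruction that needs a register waits until the producing operation has finished. The latencies come from the slide; the program format and the issue_times function are invented for this sketch, and the fixed decode/Xbar/writeback stages are ignored.

```python
# Operation latencies (cycles) taken from the slide; multiply/divide omitted.
LATENCY = {"MOVE": 0, "LOADCONS": 0, "OR": 1, "AND": 1,
           "ADD": 2, "SUB": 2, "SHL": 2}


def issue_times(program):
    """program: list of (op, dest, sources).
    In-order issue, at most one instruction per cycle; an instruction waits
    until all its source registers are ready. Returns (issue cycles,
    cycle at which the last result is ready)."""
    ready = {}        # register -> cycle its new value becomes available
    next_slot = 0     # earliest cycle for the next issue (in order, 1/cycle)
    issues = []
    for op, dest, sources in program:
        start = max([next_slot] + [ready.get(r, 0) for r in sources])
        issues.append(start)
        ready[dest] = start + LATENCY[op]
        next_slot = start + 1
    return issues, max(ready.values(), default=0)


# Dependent chain: the second ADD must wait for the first one's result.
print(issue_times([("ADD", "R1", ["R2", "R3"]), ("ADD", "R4", ["R1", "R5"])]))
# -> ([0, 2], 4)
# Independent ORs issue back to back.
print(issue_times([("OR", "R1", ["R2"]), ("OR", "R4", ["R5"])]))
# -> ([0, 1], 2)
```

The gap between the two ADDs is exactly the kind of slot a compiler should fill with independent work.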


The FC0 Execution Pipeline
  
  GOLDEN RULE : IF AN INSTRUCTION ENTERS THE PIPELINE,
  IT WILL NOT BE STOPPED

  A short instruction can complete before a longer one
    --> the pipeline cannot be kept coherent if
        it must be flushed, unless temporary/renamed registers
        are added, which would be REALLY too complex :-(
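Out-of-order completion with in-order issue fits in a few lines: a long multiply issued first finishes after a short OR issued one cycle later. If the multiply then had to be cancelled, the OR's result would already be in the register set, which is why every accepted instruction is allowed to run to completion. The 6-cycle multiply is one of the values from the latency slide, picked arbitrarily; the function is hypothetical.

```python
def completion_order(program, latency):
    """program: list of (name, op), issued in order, one per cycle, with no
    register dependencies. Returns names sorted by completion cycle."""
    finished = []
    for issue_cycle, (name, op) in enumerate(program):
        finished.append((issue_cycle + latency[op], issue_cycle, name))
    return [name for _, _, name in sorted(finished)]


print(completion_order([("mul", "MUL"), ("or", "OR")], {"MUL": 6, "OR": 1}))
# -> ['or', 'mul']
```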


FC0 pipeline "flush" :

  When a trap/interrupt/exception occurs, the pipeline "flushes"
  itself by completing ALL accepted (valid) operations.
  A tag (one bit) per register designates which task ("new" or "old")
  owns each register (in order to avoid overwriting registers that
  are not yet saved)

  All the exception conditions must be caught at DECODE TIME !
    --> all the instructions are designed to be trap-safe
        INSIDE the execution units.
    --> TLB miss, div/0, jump... are detected during decode
        so nothing harmful is injected inside the pipeline
    --> some instructions take forms not seen elsewhere!
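The one-bit ownership tag can be sketched as follows. After a trap, the old task's accepted instructions keep completing, and the new task may overwrite a register only once the old value has been saved. The class, the save/write methods and the stall-by-returning-False convention are assumptions made for this illustration; it also ignores corner cases such as a late write from the old task to an already saved register.

```python
OLD, NEW = 0, 1   # value of the per-register ownership bit


class TaggedRegisters:
    """Per-register ownership bit (sketch). After a trap, instructions of
    the old task keep completing; the new task may overwrite a register
    only once the old task's value has been saved."""

    def __init__(self, count=64):
        self.value = [0] * count
        self.tag = [OLD] * count

    def save(self, idx, backing):
        """Copy the old task's value out and hand the register over."""
        backing[idx] = self.value[idx]
        self.tag[idx] = NEW

    def write(self, idx, val, task):
        """Return False (the writer must wait) if the new task targets a
        register that still holds an unsaved value of the old task."""
        if task == NEW and self.tag[idx] == OLD:
            return False
        self.value[idx] = val
        return True
```

A single bit per register is enough because only two tasks ("old" and "new") are ever in flight during the switch.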


F-CPU's unusual instructions :

 * GET and PUT stall the instruction decode as long as the
  SR unit is not "ready"
    --> used to perform "unsafe" operations and enforce
     resource protection (a bit like the Pentium's MSRs)
 * Load and Store only use post-increment addressing mode
    --> pointer update and data access are parallel, keeping
        the pipeline short
    --> the next pointer is computed and checked speculatively
    --> the decode stage then knows whether the pointer is OK
        when the same pointing register is reused
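A behavioural sketch of a load with post-increment: the access uses the current pointer while the next pointer is computed alongside it, and a validity check on that next pointer is returned, so a later decode step could know in advance whether reusing the pointer register is safe. Word-addressed memory, the bounds check standing in for the real protection check, the increment passed as a plain number, and the function name are all assumptions.

```python
def load_postinc(mem, regs, dest, ptr, stride):
    """dest <- mem[regs[ptr]]; regs[ptr] <- regs[ptr] + stride.
    Returns True if the updated pointer is still inside mem
    (a stand-in for the speculative pointer check)."""
    if dest == ptr:
        raise ValueError("destination and pointer must be different registers")
    addr = regs[ptr]
    next_addr = addr + stride            # computed in parallel with the access
    regs[dest] = mem[addr]
    regs[ptr] = next_addr
    return 0 <= next_addr < len(mem)


# Summing an array: each load also advances the pointer, so no separate ADD.
mem = [10, 20, 30]
regs = {1: 0, 2: 0}
total, ok = 0, True
while ok:
    ok = load_postinc(mem, regs, dest=2, ptr=1, stride=1)
    total += regs[2]
print(total)  # -> 60
```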

FC0 scheduling :
 * The Fetcher tries to prefetch instructions and hands them to the
    pipeline in order.
 * All instructions start at "decode", where a lot of things
    are done in parallel
    --> gather the resource status (register ready, execution unit ready...)
    --> read the register set
    --> check the memory buffers (with the register number)
 * Xbar Read stage :
    --> sends the data to the units (long wires) or bypass
    --> accepts the instruction or not
          (it's the last place where we can 'stop the pipeline')
    --> records the instruction in the scoreboard
 * OPERATION (where applicable)
    --> move or modify the data
    --> 0 to X cycles
 * Xbar Write stage :
    --> gets the results from the units and computes the ZERO flag
    --> performs bypass
 * Register Writeback
    --> there is one cycle before data can be available
        for reading again  --> 2nd level bypass

 images can be found in http://f-cpu.seul.org/whygee/conf_parinux.zip
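The decode-time check and the scoreboard update from the stages above can be sketched as a small class. The register/unit bookkeeping, the method names and the rule that a non-pipelined unit stays busy for its whole latency are assumptions for this illustration, not the FC0 design files.

```python
class Scoreboard:
    """Decode-time resource check (sketch). An instruction may be accepted
    only if none of its registers has a pending write and its execution
    unit can take a new operation in this cycle."""

    def __init__(self):
        self.reg_ready = {}   # register -> first cycle its value is readable
        self.unit_free = {}   # unit name -> first cycle it accepts new work

    def can_issue(self, cycle, unit, sources, dest):
        # dest is checked too, so a later write cannot overtake an earlier one
        regs_ok = all(self.reg_ready.get(r, 0) <= cycle
                      for r in (*sources, dest))
        return regs_ok and self.unit_free.get(unit, 0) <= cycle

    def issue(self, cycle, unit, dest, latency, pipelined=True):
        """Record an accepted instruction (the Xbar Read step on the slide)."""
        self.reg_ready[dest] = cycle + latency
        # a pipelined unit takes a new operation every cycle
        self.unit_free[unit] = cycle + (1 if pipelined else latency)
```

With a pipelined adder, an independent ADD can be accepted on the very next cycle, while an ADD that reads the previous result waits until that result is ready.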


FC0 throughput :

 NO REGISTER RENAMING
 ==> MINIMAL OVERHEAD

 Minimum : 4 cycles of latency for an instruction whose operation takes
           0 cycles (one instruction still enters the pipeline every cycle)
 Maximum : depends on the latency/throughput of the instruction
   (avoid cache misses and divisions...)

 Most units are pipelined :
    Multiply, shift, additions, load and stores
 ==> high sustained throughput with correctly scheduled code

 1 instruction per cycle peak
 >>1 Arithmetic Operation per cycle with SIMD and very wide registers
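The ">>1 operation per cycle" point comes from SIMD: one instruction operates on every lane of a wide register. The sketch below emulates in software what a packed 8-bit ADD does on a 64-bit register (eight additions, each wrapping modulo 256, with no carry crossing lanes) using the classic SWAR masking trick. It illustrates the idea only and is not F-CPU's instruction definition; with 256-bit registers the same instruction would cover 32 byte lanes.

```python
MASK64 = (1 << 64) - 1
HIGH = 0x8080808080808080   # top bit of each 8-bit lane
LOW = 0x7F7F7F7F7F7F7F7F    # low 7 bits of each 8-bit lane


def add_bytes(a: int, b: int) -> int:
    """Lane-wise 8-bit addition of two 64-bit words, each lane mod 256."""
    partial = (a & LOW) + (b & LOW)   # at most 254 per lane: no cross-lane carry
    return (partial ^ ((a ^ b) & HIGH)) & MASK64  # fix each lane's top bit


def pack(lanes):
    """Eight byte values -> one 64-bit word (lane 0 in the low byte)."""
    return int.from_bytes(bytes(lanes), "little")


def unpack(word):
    return list(word.to_bytes(8, "little"))


print(unpack(add_bytes(pack([1, 2, 3, 4, 5, 6, 7, 255]), pack([1] * 8))))
# -> [2, 3, 4, 5, 6, 7, 8, 0]
```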


FC0 pipeline hazards

  the "Superpipeline myth" (100s of stages and deadly bubbles)
  does not apply to FC0.

  * 1 cycle bubble for a taken jump

  * no "hard flush", no branch prediction (not worth the effort)

  * FC0 can be programmed like a 2 or 3-issue superscalar CPU
    (most instructions take 1 or 2 cycles to complete when no
     register dependency is detected)
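The cost of the 1-cycle taken-jump bubble is easy to estimate. Assuming an ideal loop (one instruction issued per cycle, no other stalls) whose last instruction is a conditional jump taken on every iteration except the last, the issue cycles are iterations times body length plus one bubble per taken jump. This is a back-of-the-envelope model with a hypothetical function name, not a cycle-accurate simulation.

```python
def loop_issue_cycles(iterations: int, body_len: int, bubble: int = 1) -> int:
    """Issue cycles for an ideal loop: body_len instructions per iteration
    (including the closing jump), one cycle of bubble per taken jump."""
    taken_jumps = max(iterations - 1, 0)
    return iterations * body_len + taken_jumps * bubble


print(loop_issue_cycles(100, 4))   # -> 499
```

For a 4-instruction body the bubbles cost roughly 20% of the cycles; longer loop bodies, for example after unrolling, dilute that overhead.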


Development plan :

  1) Execution Pipeline (All the execution units + Xbar)
       ==> well advanced

  2) Instruction Decoding and Scheduling
       --> once all the Execution Units work properly

  3) Memory interface, virtual memory

  4) Interrupts, exceptions, protection.

  5) Opcode map optimization and definition

--------------------------------------------------------

This is the end of this small introduction to the F-CPU
and the FC0's general architecture. More information can
be found on the 'net.


Other planning information :

  No schedule is determined: we know by experience that
  a schedule exists only to be missed. This has proved
  remarkably true for F-CPU in recent years :o)

  Work on F-CPU v1.0 will start once everything is tested
  and the opcode map is optimally defined. F-CPU will
  not be tagged "1.0" until everything is absolutely OK,
  even though v0.7 or 0.8 will seem to work and will
  probably even be implemented: no software compatibility will
  be ensured before v1.0.

  Fabrication is not yet planned. It may happen in the distant
  future, or a desperate industrial company, a government body
  or a non-governmental organisation may want to help
  this project to cut their own development costs, in which case
  development could go faster than expected.

  A 64-bit version will be tried first, then a 256-bit
  version will follow to demonstrate the design's validity
  and scalability.