
Re: [f-cpu] EU precision, use of Q1,Q2,Q4, number inside the core

To: f-cpu@seul.org
Subject: Re: [f-cpu] EU precision, use of Q1,Q2,Q4, number inside the core
From: Yann Guidon <whygee@f-cpu.org>
Date: Mon, 24 Dec 2001 20:48:12 +0100
Organization: http://www.f-cpu.org
References: <3C23F93C.4F316937@f-cpu.org> <20011222150530.23102@thrai.stud.uni-hannover.de> <3C26755E.A1ED6C1C@f-cpu.org> <3C2786C7.F92702@ifrance.com>
Reply-To: f-cpu@seul.org
Sender: owner-f-cpu@seul.org

hi !

nicO wrote:
> 
> An idea came to me a few days ago; I hope it's not too late. I have
> been following the ffmpeg development a little: it's a video compression suite.
>
> 40% of the time is spent inside an IDCT routine written in MMX asm. There
> was a flame war because the current IDCT has a rounding problem and the image
> gets darker. Another routine has been written, but it was slower.

can't they use dithering to offset the rounding loss ?
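One plausible mechanism for the darkening is that each descaling step truncates (floors) instead of rounding, so every dropped fraction is lost in the same direction. Below is a minimal sketch comparing plain truncation with additive dithering. It assumes a 16-bit signed intermediate and a 4-bit descale, both chosen for illustration and not taken from ffmpeg's code. The function names are hypothetical.

```python
import random

def truncate_shift(x: int, shift: int, rng: random.Random) -> int:
    """Drop the low `shift` bits with an arithmetic shift (a floor).
    `rng` is unused; it is only there so both functions share a signature."""
    return x >> shift

def dithered_shift(x: int, shift: int, rng: random.Random) -> int:
    """Add uniform noise in [0, 2**shift) before the floor, so the result is
    rounded up with probability equal to the discarded fraction."""
    return (x + rng.randrange(1 << shift)) >> shift

def mean_error(rounding, shift: int = 4, n: int = 100_000, seed: int = 1) -> float:
    """Average (rounded - exact) over random 16-bit signed intermediates."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.randrange(-(1 << 15), 1 << 15)
        total += rounding(x, shift, rng) - x / (1 << shift)
    return total / n
```

With a 4-bit shift, truncation loses about 7.5/16 of an LSB on average, always downwards. The dithered version is unbiased: it only adds zero-mean noise.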

> My idea is to do what DSPs do: create registers with more bits than
> needed. For example, we could use 1 extra bit for 8-bit operations, 2
> for 16-bit (signal processing) and 4 for 32-bit. They become Q1, Q2 and Q4
> numbers. The extra bits are discarded when writing to memory.
> 
> Comments ?

yep :-)

DCT requires some minimal amount of precision, sure.
the little problem is that adding one bit, two or even four
might not be enough. Plus, the necessary precision depends on the
number of operations (the depth of the dataflow).
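The depth argument can be checked with a minimal sketch. It accumulates `depth` products and truncates to `frac_bits` fractional bits after every step. The random coefficients stand in for one row of a transform; this is not an actual DCT. `chain_error` and `extra_bits_for` are illustrative names.

```python
import math
import random

def truncate(x: float, frac_bits: int) -> float:
    """Floor x onto a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return math.floor(x * scale) / scale

def chain_error(depth: int, frac_bits: int, seed: int = 0) -> float:
    """Accumulate `depth` products, truncating after every step, and return
    the absolute error against the same sum kept in double precision.
    Each step adds an error in [0, 2**-frac_bits), so the error never shrinks."""
    rng = random.Random(seed)
    exact = acc = 0.0
    for _ in range(depth):
        v, c = rng.uniform(-1, 1), rng.uniform(-1, 1)
        exact += v * c
        acc = truncate(acc + v * c, frac_bits)
    return abs(exact - acc)

def extra_bits_for(depth: int) -> int:
    """Worst-case error grows as depth * 2**-frac_bits, so keeping it within
    one LSB of the output needs about log2(depth) extra fractional bits."""
    return math.ceil(math.log2(depth))
```

With 64 dependent accumulation steps, this worst-case bound asks for 6 extra bits. That is already more than the 1, 2 or 4 hidden bits proposed above.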

"the way i would do it" is :
 - use larger registers. good news: the f-cpu instruction set is much
handier and more useful than MMX, so the small loss in performance will be
balanced by the ease of coding. Imagine: not 8 but 64 registers! :-)
you can nearly fit a whole 8*8 DCT inside the registers. the time
that MMX spends loading/storing data is comparable to the performance
lost when larger data elements are used (32 bits instead of 16 bits).
 - use one of the alternate data representations: either LNS (logarithmic
number system) or fractional. LNS is not yet easy to implement, but fractional
data support boils down to a MUX that shifts the result by one bit (or two,
depending on the operation). Usually the shift is performed by the operator
itself, but on FC0 it could perhaps be performed by the Xbar (?).
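The "MUX that shifts by one bit" can be sketched for the common 16-bit Q15 case (value = n / 2**15). A signed 16x16 product has two sign bits (Q2.30). Shifting left by one gives Q1.31, and the high 16 bits are the Q15 result. The `fractional` flag below is hypothetical. It only models a result MUX in front of an ordinary integer multiplier and does not describe an actual FC0 opcode or unit.

```python
def mul16(a: int, b: int, fractional: bool) -> int:
    """16x16-bit signed multiply with a hypothetical result MUX.

    Integer mode: the low 16 bits of the product, wrapping like C int16_t.
    Fractional (Q15) mode: shift the Q2.30 product left by one bit (Q1.31),
    keep the high 16 bits (truncating), and saturate the one overflow case."""
    for v in (a, b):
        if not -32768 <= v <= 32767:
            raise ValueError("operands must be 16-bit signed integers")
    product = a * b
    if not fractional:
        low = product & 0xFFFF
        return low - 0x10000 if low & 0x8000 else low
    result = (product << 1) >> 16
    return min(result, 32767)  # only -1.0 * -1.0 = +1.0 is unrepresentable
```

For example, 0.5 * 0.5 is `mul16(16384, 16384, True)`, which gives 8192 (0.25 in Q15). The same operands in integer mode give only the wrapped low half of 2**28, which is 0.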

We still have one reserved bit in the opcodes, so implementing the fractional
(1Qx) data format is not a problem as far as the whole F-CPU ISA is concerned.
In fact, i reserved this bit from the beginning, as a "souvenir"
from the days i hacked DSPs. i knew it would be useful :-)

However, adding "hidden bits" is a big no-no for the simple reason that
it requires special handling and it breaks the symmetry of the ISA.
Using large registers and fractional format is enough IMHO.

Unless there's a detail i missed...
But this summer, i worked on FFT/DCT so i should not be completely wrong.
If in doubt, nicO can read the CD i gave him; it contains a lot
of implementation notes on FFT, DCT, MMX, ...

merry Xmas eve !

> nicO
WHYGEE