[f-cpu] DCT in VHDL
- To: <f-cpu@seul.org>
- Subject: [f-cpu] DCT in VHDL
- From: devik <devik@cdi.cz>
- Date: Thu, 10 Apr 2003 15:41:26 +0200 (CEST)
Hi,
I wanted to build a video compressor in an FPGA, so that
4 cameras can be connected to a cheap box which streams
them over Ethernet. Because one video stream is about
60 Mbit/s, I wanted to do cheap compression.
I know there is a DCT (and other parts of MPEG) at
opencodes.org, but it is too large.
I tried computing the 6-point DCT matrix and it looks
nice: you need only the constants 1/2, 1/(2 sqrt(2)),
1/sqrt(3) and 1/sqrt(6).
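For reference, here is a short derivation of where these constants come from. It is only a sketch and assumes the standard orthonormal 6-point DCT-II, which the post does not state explicitly; the symbols x_n and X_k are introduced here for illustration.

```latex
\begin{align*}
X_0 &= \frac{1}{\sqrt{6}} \sum_{n=0}^{5} x_n, \qquad
X_k = \frac{1}{\sqrt{3}} \sum_{n=0}^{5} x_n \cos\frac{(2n+1)k\pi}{12},
\quad k = 1,\dots,5 \\
\frac{1}{\sqrt{3}}\cos\frac{\pi}{6} &= \frac{1}{2}, \qquad
\frac{1}{\sqrt{3}}\cos\frac{\pi}{4} = \frac{1}{\sqrt{6}}, \qquad
\frac{1}{\sqrt{3}}\cos\frac{\pi}{3} = \frac{1}{2}\cdot\frac{1}{\sqrt{3}} \\
\frac{1}{\sqrt{3}}\cos\frac{\pi}{12} &= \frac{\sqrt{3}+1}{2\sqrt{6}}
  = \frac{1}{2\sqrt{2}} + \frac{1}{2}\cdot\frac{1}{\sqrt{6}}, \qquad
\frac{1}{\sqrt{3}}\cos\frac{5\pi}{12} = \frac{\sqrt{3}-1}{2\sqrt{6}}
  = \frac{1}{2\sqrt{2}} - \frac{1}{2}\cdot\frac{1}{\sqrt{6}}
\end{align*}
```

Every other matrix entry is one of these values, zero, or a sign flip, so four multiplications plus halving and add/subtract cover the whole matrix.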
So I did an exhaustive search for a good coefficient
scale. I found that with a scale of 14037 these
constants become (in hex) 1b6a, 1362, 1fa8 and 1662.
If you look at them closely, you will find that only
15 adds are needed to implement the multiplies for all
of them (I was searching for constants which have
only a few bits set and a large overlap between them).
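To illustrate the kind of adder sharing meant here, below is a minimal sketch for just two of the constants. It is not the decomposition behind the 15-add figure; the entity name shared_cmul and its ports are hypothetical, and it uses ieee.numeric_std rather than the packages in the attached design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: multiply an 8-bit sample by 0x1362 and 0x1662
-- using shifts and adds only, sharing the bits both constants have set.
--   0x1362 = 4962 = 2^12 + 2^9 + 2^8  + 2^6 + 2^5 + 2^1
--   0x1662 = 5730 = 2^12 + 2^10 + 2^9 + 2^6 + 2^5 + 2^1
--   common part   = 2^12 + 2^9 + 2^6 + 2^5 + 2^1 = 4706
entity shared_cmul is
  port (
    x     : in  unsigned(7 downto 0);
    p1362 : out unsigned(20 downto 0);  -- x * 4962
    p1662 : out unsigned(20 downto 0)   -- x * 5730
  );
end entity shared_cmul;

architecture rtl of shared_cmul is
  signal xw     : unsigned(20 downto 0);  -- x widened to the product width
  signal common : unsigned(20 downto 0);  -- x * 4706, shared by both outputs
begin
  xw <= resize(x, 21);

  -- four adders build the shared partial product
  common <= shift_left(xw, 12) + shift_left(xw, 9)
          + shift_left(xw, 6)  + shift_left(xw, 5)
          + shift_left(xw, 1);

  -- one extra adder per constant for the bit that differs
  p1362 <= common + shift_left(xw, 8);
  p1662 <= common + shift_left(xw, 10);
end architecture rtl;
```

This takes six adders in total, where two independent shift-and-add multipliers would take ten. The largest product, 255 * 5730, still fits in 21 bits.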
Thus I was able to synthesize the whole pipelined 1D
6-point DCT into 342 Spartan slices, and the tracer
estimates 60 MHz for it.
What I need next is a second pass to form the 2D transform.
It will need 21-bit inputs instead of 8-bit ones, so it
will be much more complex :-\
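The second pass follows from the separability of the 2D DCT. A short worked form, assuming C is the 6x6 matrix of the 1D transform and X a 6x6 pixel block (these symbols are introduced here and are not from the design itself):

```latex
\begin{align*}
Y &= C\,X\,C^{T} = C\,Z, \qquad Z = X\,C^{T} \\
Z_{ij} &= \sum_{n=0}^{5} X_{in}\,C_{jn}, \qquad
Y_{kj} = \sum_{i=0}^{5} C_{ki}\,Z_{ij}
\end{align*}
```

Z is the output of the first (row) pass, so the second pass runs the same 1D transform down the columns of Z. Its inputs are the signed 21-bit sums, and a 21-bit value times a 13-bit constant can need up to 34 bits.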
I'm just interested whether someone has some ideas about
the design (attached). It is the first VHDL I have written ;).
-------------------------------
Martin Devera aka devik
Linux kernel QoS/HTB maintainer
http://luxik.cdi.cz/~devik/
-- devik@cdi.cz; 6 point 1D DCT
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
entity test is
  port (
    XOUT : out std_logic_vector(20 downto 0);
    XIN  : in  std_logic_vector(7 downto 0);
    clk  : in  std_logic
  );
end test;
architecture Behavioral of test is
  type matrix is array (natural range <>)
    of std_logic_vector(20 downto 0);
  -- logical shift right by one bit (unsigned divide by 2)
  function xsrl (A : in std_logic_vector) return std_logic_vector is
    constant w : natural := A'length;
  begin
    -- assumes A is declared (w-1 downto 0), so the top bit is A(w-1)
    return '0' & A(w-1 downto 1);
  end xsrl;
  signal b : std_logic_vector(7 downto 0);
  -- current DCT point (0..5)
  signal cnt : std_logic_vector(2 downto 0);
  -- output of multiplier stage; one result vector per clock
  signal c0 : matrix(3 downto 0);
  -- output sums; valid at end of cycle where cnt=5
  signal s0 : std_logic_vector(20 downto 0);
  signal s1 : std_logic_vector(20 downto 0);
  signal s2 : std_logic_vector(20 downto 0);
  signal s3 : std_logic_vector(20 downto 0);
  signal s4 : std_logic_vector(20 downto 0);
  signal s5 : std_logic_vector(20 downto 0);
begin
  -- purely clocked process, so only clk is needed in the sensitivity list
  stage_1 : process (clk)
    variable t,q,e1,e2,e5 : std_logic_vector(20 downto 0);
    variable n1,n2,n5 : boolean;
  begin
    if rising_edge(clk) then
      -- feed next image point
      b <= XIN;
      -- multiplier stage; these carefully crafted constants
      -- result in small and simple multipliers which can
      -- share many adders (all scaled by 14037)
      c0(0) <= b*conv_std_logic_vector(7018,13); -- 1/2
      c0(1) <= b*conv_std_logic_vector(8104,13); -- 1/sqrt(3)
      -- c0(2) and c0(3) were swapped; every use below needs
      -- c0(3) = 1/sqrt(6) and c0(2) = 1/(2 sqrt(2))
      c0(2) <= b*conv_std_logic_vector(4962,13); -- 1/(2 sqrt(2))
      c0(3) <= b*conv_std_logic_vector(5730,13); -- 1/sqrt(6)
      -- DC term (average), weight 1/sqrt(6)
      s0 <= s0 + c0(3);
      -- prepare the (sqrt(3)+-1)/(2 sqrt(6)) terms,
      -- i.e. 1/(2 sqrt(2)) +- (1/sqrt(6))/2
      t := c0(2)+xsrl(c0(3));
      q := c0(2)-xsrl(c0(3));
      -- do MACs
      case cnt is
        when O"1"|O"2"|O"5" => s3 <= s3 - c0(3);
        when others => s3 <= s3 + c0(3);
      end case;
      case cnt is
        when O"0"|O"5" => e2 := c0(0); n2 := false;
        when O"2"|O"3" => e2 := c0(0); n2 := true;
        when others => e2 := (others=>'0'); n2 := false;
      end case;
      -- trick needed to force bidirectional accumulator inference in XST
      if n2 then
        s2 <= s2 - e2;
      else
        s2 <= s2 + e2;
      end if;
      case cnt is
        when O"1"|O"4" => s4 <= s4 - c0(1);
        when others => s4 <= s4 + xsrl(c0(1));
      end case;
      n1 := false;
      n5 := false;
      case cnt is
        when O"0" => e1 := t; e5 := q;
        when O"1" => e1 := c0(3); e5 := c0(3); n5 := true;
        when O"2" => e1 := q; e5 := t;
        when O"3" => e1 := q; e5 := t; n1 := true; n5 := true;
        when O"4" => e1 := c0(3); e5 := c0(3); n1 := true;
        when O"5" => e1 := t; e5 := q; n1 := true; n5 := true;
        when others => e1 := (others=>'X'); e5 := (others=>'X');
      end case;
      if n1 then
        s1 <= s1 - e1;
      else
        s1 <= s1 + e1;
      end if;
      if n5 then
        s5 <= s5 - e5;
      else
        s5 <= s5 + e5;
      end if;
      -- advance the point counter, wrapping 5 -> 0
      case cnt is
        when O"5" => cnt <= "000";
        when others => cnt <= cnt + 1;
      end case;
      -- dummy output to keep XST from optimizing the design away
      XOUT <= s0 xor s1 xor s2 xor s3 xor s4 xor s5;
    end if;
  end process;
end Behavioral;