[f-cpu] DCT in VHDL



Hi,

I wanted to build a video compressor in an FPGA, with 4 cameras
connected to a cheap box that would stream the video over
Ethernet. Because one video stream is about 60 Mbit/s, I wanted
some cheap compression.

I know there is a DCT core (and other parts of MPEG) at
opencores.org, but it is too large.

I tried computing the 6-point DCT matrix, and it looks nice:
you need only the constants 1/2, 1/(2 sqrt(2)), 1/sqrt(3) and
1/sqrt(6), plus halving and a few sums of them.
So I did an exhaustive search for a good coefficient scale.
With a scale of 14037 these constants become (truncated, in hex)
1/2 = 1b6a, 1/(2 sqrt(2)) = 1362, 1/sqrt(3) = 1fa8 and
1/sqrt(6) = 1662. If you look at them closely, you find that
multiplying by all of them needs only 15 additions (I was searching
for constants that have few bits set and large overlap between them).
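
The two claims in the paragraph above are easy to check numerically. The sketch below is in Python rather than VHDL only so it runs on a desktop, and it assumes the orthonormal 6-point DCT-II (the normalization these constants suggest). It checks that every matrix entry is 0 or, up to sign, one of the constants, half of one, or 1/(2 sqrt(2)) plus or minus half of 1/sqrt(6), and that truncating each constant times 14037 gives the four hex values from the post. The names `dct_matrix`, `decompose`, `BUILDABLE` and `HEX_CONSTS` are illustrative. It does not check the 15-addition count, which depends on how the adder network is shared.

```python
import math

N = 6            # transform length
SCALE = 14037    # coefficient scale found by the exhaustive search

# The four base constants named in the post.
HALF = 1 / 2
R8 = 1 / (2 * math.sqrt(2))   # 1/(2 sqrt 2)
R3 = 1 / math.sqrt(3)
R6 = 1 / math.sqrt(6)


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix, entry [k][m] for frequency k and sample m."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
             for m in range(n)] for k in range(n)]


# Magnitudes obtainable from the base constants with at most one halving and
# at most one add or subtract (the VHDL halves with a 1-bit shift, xsrl).
BUILDABLE = {
    "0": 0.0,
    "1/2": HALF,
    "1/sqrt(6)": R6,
    "1/sqrt(3)": R3,
    "1/sqrt(3) / 2": R3 / 2,
    "1/(2 sqrt(2)) + 1/sqrt(6) / 2": R8 + R6 / 2,
    "1/(2 sqrt(2)) - 1/sqrt(6) / 2": R8 - R6 / 2,
}


def decompose(value):
    """Name of the buildable magnitude equal to |value|, or None if none fits."""
    for name, mag in BUILDABLE.items():
        if math.isclose(abs(value), mag, abs_tol=1e-12):
            return name
    return None


# Integer constants at the chosen scale; int() truncates toward zero,
# which reproduces the hex values given in the post.
HEX_CONSTS = {name: format(int(SCALE * c), "x")
              for name, c in [("1/2", HALF), ("1/(2 sqrt(2))", R8),
                              ("1/sqrt(3)", R3), ("1/sqrt(6)", R6)]}

if __name__ == "__main__":
    for k, row in enumerate(dct_matrix()):
        print(k, [decompose(v) for v in row])
    print(HEX_CONSTS)
```

Printing the decomposition row by row shows which combination each MAC in the VHDL has to produce at each point.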

As a result, I was able to synthesize the whole pipelined 1D 6-point
DCT into 342 Spartan slices, and TRACE estimates 60 MHz for it.

What I need next is a second pass to form the 2D transform.
It will need 21-bit inputs instead of 8-bit ones, so it
will be much more complex :-\
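
Before sizing the second pass it may help to bound the first pass's outputs. The Python sketch below assumes unsigned 8-bit samples (0..255), the orthonormal 6-point DCT-II and the scale of 14037, and it uses the exact rather than the truncated constants, so the figures are approximate. `worst_case_range`, `signed_bits`, `dct_1d` and `dct_2d` are hypothetical helper names. It also shows the usual row-column way of building the 2D transform: apply the 1D DCT to each row of a 6x6 block, then to each column of the result.

```python
import math

N = 6
SCALE = 14037


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix, entry [k][m] for frequency k and sample m."""
    return [[(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
             * math.cos(math.pi * (2 * m + 1) * k / (2 * n))
             for m in range(n)] for k in range(n)]


def worst_case_range(row, max_in=255, scale=SCALE):
    """(min, max) of scale * sum(row[m] * x[m]) for 0 <= x[m] <= max_in."""
    hi = sum(c for c in row if c > 0) * max_in * scale
    lo = sum(c for c in row if c < 0) * max_in * scale
    return lo, hi


def signed_bits(lo, hi):
    """Smallest two's-complement width that holds every value in [lo, hi]."""
    bits = 1
    while not (-(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1):
        bits += 1
    return bits


def dct_1d(x):
    """Unscaled orthonormal 1D DCT of a list of samples."""
    m = dct_matrix(len(x))
    return [sum(c * v for c, v in zip(row, x)) for row in m]


def dct_2d(block):
    """Row-column 2D DCT; result[u][v] has vertical frequency u, horizontal v."""
    rows = [dct_1d(r) for r in block]                                   # first pass
    cols = [dct_1d([r[j] for r in rows]) for j in range(len(rows[0]))]  # second pass
    return [[cols[j][k] for j in range(len(cols))] for k in range(len(cols[0]))]


if __name__ == "__main__":
    for k, row in enumerate(dct_matrix()):
        lo, hi = worst_case_range(row)
        print(f"row {k}: {lo:12.0f} .. {hi:12.0f} -> {signed_bits(lo, hi)} bits signed")
```

Under these assumptions the DC output can reach about 8.77 million (sqrt(6) * 255 * 14037), which needs 24 bits unsigned, and the AC outputs need 23 to 24 bits signed, so 21 bits may not be enough for the first-pass results that the second pass consumes. The figures are worth re-checking against the truncated constants before fixing the second pass's widths.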

I'd be interested if anyone has ideas about the design
(attached). It is the first VHDL I have written ;).

-------------------------------
    Martin Devera aka devik
Linux kernel QoS/HTB maintainer
  http://luxik.cdi.cz/~devik/

-- devik@cdi.cz; 6 point 1D DCT

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;

entity test is
    Port ( XOUT : out std_logic_vector(20 downto 0);
           XIN  : in  std_logic_vector(7 downto 0);
           clk  : in  std_logic );
end test;

architecture Behavioral of test is

	type matrix is array (natural range <>)
	        of std_logic_vector(20 downto 0);

	-- logical shift right by one bit (halves an unsigned value);
	-- assumes a descending (downto) range, as in all callers here
	function xsrl (A : in std_logic_vector) return std_logic_vector is
	begin
		return '0' & A(A'high downto A'low + 1);
	end xsrl;

	signal b : std_logic_vector(7 downto 0);

	-- current DCT point (0..5) of the product held in c0; the initial value
	-- keeps simulation from starting at "UUU"
	signal cnt : std_logic_vector(2 downto 0) := "000";

	-- output of the multiplier stage; one result vector per clock
	signal c0 : matrix(3 downto 0);

	-- output sums; valid at the end of the cycle where cnt=5
	-- note: with 8-bit unsigned input the worst-case sums do not fit in
	-- 21 bits (six samples of 255 give a DC sum above 7 million, while
	-- 2**21 is about 2.1 million), so they can wrap for bright input
	signal s0, s1, s2, s3, s4, s5 : std_logic_vector(20 downto 0) := (others => '0');

begin

	-- clocked process: only clk is needed in the sensitivity list
	stage_1 : process (clk)
		variable t, q, e1, e2, e5 : std_logic_vector(20 downto 0);
		variable n1, n2, n5 : boolean;
	begin
		if rising_edge(clk) then
			-- feed next image point
			b <= XIN;

			-- multiplier stage: these carefully chosen constants give small,
			-- simple multipliers that can share many adders.
			-- t and q below need c0(2) = 1/(2 sqrt(2)) and c0(3) = 1/sqrt(6)
			c0(0) <= b*conv_std_logic_vector(7018,13);  -- 1/2
			c0(1) <= b*conv_std_logic_vector(8104,13);  -- 1/sqrt(3)
			c0(2) <= b*conv_std_logic_vector(4962,13);  -- 1/(2 sqrt(2))
			c0(3) <= b*conv_std_logic_vector(5730,13);  -- 1/sqrt(6)

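			-- s0: DC term (scaled sum of the six points)
			s0 <= s0 + c0(3);

			-- prepare the (sqrt(3)+-1)/(2 sqrt(6)) constants, using
			-- (sqrt(3)+-1)/(2 sqrt(6)) = 1/(2 sqrt(2)) +- 1/(2 sqrt(6))
			t := c0(2)+xsrl(c0(3));
			q := c0(2)-xsrl(c0(3));
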
			-- do MACs: s1..s5 accumulate rows 1..5 of the DCT matrix
			case cnt is
			    when O"1"|O"2"|O"5"  => s3 <= s3 - c0(3);
			    when others =>          s3 <= s3 + c0(3);
			end case;

			case cnt is
			    when O"0"|O"5"  => e2 := c0(0); n2 := false;
			    when O"2"|O"3"  => e2 := c0(0); n2 := true;
			    when others     => e2 := (others=>'0'); n2 := false;
			end case;
			-- trick needed to force bidirectional accumulator inference in XST
			if n2 then
				s2 <= s2 - e2;
			else
				s2 <= s2 + e2;
			end if;

			case cnt is
			    when O"1"|O"4"  => s4 <= s4 - c0(1);
			    when others     => s4 <= s4 + xsrl(c0(1));
			end case;

			n1 := false;
			n5 := false;
			case cnt is
			    when O"0" =>   e1 := t;     e5 := q;
			    when O"1" =>   e1 := c0(3); e5 := c0(3); n5 := true;
			    when O"2" =>   e1 := q;     e5 := t;
			    when O"3" =>   e1 := q;     e5 := t; n1 := true; n5 := true;
			    when O"4" =>   e1 := c0(3); e5 := c0(3); n1 := true;
			    when O"5" =>   e1 := t;     e5 := q; n1 := true; n5 := true;
			    when others => e1 := (others=>'X'); e5 := (others=>'X');
			end case;
			if n1 then
				s1 <= s1 - e1;
			else
				s1 <= s1 + e1;
			end if;
			if n5 then
				s5 <= s5 - e5;
			else
				s5 <= s5 + e5;
			end if;

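			case cnt is
			    when O"5"   => cnt <= "000";
			    when others => cnt <= cnt + 1;
			end case;

			-- cnt=0 is the first point of a new block: load the sums with
			-- this point's terms instead of adding to the previous block.
			-- In a process the last signal assignment wins, so these
			-- override the accumulate statements above.
			case cnt is
			    when O"0" =>
				s0 <= c0(3);
				s1 <= e1;
				s2 <= e2;
				s3 <= c0(3);
				s4 <= xsrl(c0(1));
				s5 <= e5;
			    when others => null;
			end case;
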
			-- dummy output that keeps XST from optimizing the design away
			XOUT <= s0 xor s1 xor s2 xor s3 xor s4 xor s5;

		end if;
	end process;
end Behavioral;
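
To compare the hardware against a reference, the MAC schedule of the attached `stage_1` process can be mirrored in Python. This is a sketch under several assumptions: one block of six unsigned 8-bit samples; the 1/(2 sqrt(2)) product (4962) feeds `t` and `q` and the 1/sqrt(6) product (5730) feeds the DC row and rows 1, 3 and 5, which is what the (sqrt(3)+-1)/2sqrt(6) comment requires; `xsrl` acts as floor division by 2; and every sum wraps at 21 bits like the `std_logic_vector(20 downto 0)` accumulators. Pipeline latency is ignored, and `hw_dct6`, `ref_dct6` and `to_signed` are hypothetical names.

```python
import math

W = 21                                  # accumulator width in the VHDL
MASK = (1 << W) - 1
# Constants at scale 14037 (assumed mapping, see lead-in):
K_HALF, K_R3, K_R8, K_R6 = 7018, 8104, 4962, 5730  # 1/2, 1/sqrt3, 1/(2sqrt2), 1/sqrt6

# Term sign per point n = 0..5 for rows 1..5, taken from the case statements
# (0 means the row adds nothing at that point).
SIGN = {
    1: [+1, +1, +1, -1, -1, -1],
    2: [+1, 0, -1, -1, 0, +1],
    3: [+1, -1, -1, +1, +1, -1],
    4: [+1, -1, +1, +1, -1, +1],
    5: [+1, -1, +1, -1, +1, -1],
}


def to_signed(v, w=W):
    """Interpret a w-bit two's-complement pattern as a signed int."""
    v &= (1 << w) - 1
    return v - (1 << w) if v >> (w - 1) else v


def hw_dct6(x):
    """Mirror of the six accumulators after one block; returns raw 21-bit patterns."""
    s = [0] * 6
    for n, b in enumerate(x):
        p_half, p_r3, p_r8, p_r6 = b * K_HALF, b * K_R3, b * K_R8, b * K_R6
        t = p_r8 + p_r6 // 2            # ~ (sqrt3 + 1) / (2 sqrt6)
        q = p_r8 - p_r6 // 2            # ~ (sqrt3 - 1) / (2 sqrt6)
        row1 = [t, p_r6, q, q, p_r6, t][n]
        row4 = [p_r3 // 2, p_r3, p_r3 // 2, p_r3 // 2, p_r3, p_r3 // 2][n]
        row5 = [q, p_r6, t, t, p_r6, q][n]
        terms = [p_r6,
                 SIGN[1][n] * row1,
                 SIGN[2][n] * p_half,
                 SIGN[3][n] * p_r6,
                 SIGN[4][n] * row4,
                 SIGN[5][n] * row5]
        s = [(acc + term) & MASK for acc, term in zip(s, terms)]
    return s


def ref_dct6(x, scale=14037):
    """Floating-point orthonormal 6-point DCT-II, scaled like the hardware."""
    return [scale * (math.sqrt(1 / 6) if k == 0 else math.sqrt(2 / 6))
            * sum(v * math.cos(math.pi * (2 * m + 1) * k / 12) for m, v in enumerate(x))
            for k in range(6)]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    print([to_signed(v) for v in hw_dct6(x)])
    print([round(r, 1) for r in ref_dct6(x)])
```

For the ramp 1..6 the two models agree to within the constant truncation error (at most about 1.1 per unit of input), while a flat block of 255s makes the DC accumulator wrap: it ends at 6*255*5730 mod 2**21 instead of 6*255*5730.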