Hi,

today I've written part of my own emulator. Why? Well, the simulator is better, but it is not complete yet and is probably slow. I wanted an emulator that can emit statistics on pipeline stalls and is fast and simple enough to be finished in a few days. I also want to use it for gcc performance tests and to run Linux on it later.

It is basically a loop with a switch(), and each opcode is defined by a few lines (the operation plus pipeline data for the scheduler). It is incomplete, but on a loop of independent 64-bit ANDs it runs at 15 MIPS on my PII/375. SIMD ADD.W gives 12 MIPS, and ADD.W with saturation and carry store gives 7 MIPS. It can still be optimized a bit. It uses MMX where possible (this helps especially with the .B and .W SIMD ops). SSE would help, but far fewer machines have SSE than MMX (mine, for example, has only MMX). Adding more ops is simple and needs no scheduling code: just specify the operand type (if it is not 2r1w, i.e. two reads and one write) and the latency.

I'll continue working on it (it is only 200 lines so far), but I wanted to share the ideas with you :) It timestamps register writes and uses a circular FIFO, so there are almost no loops on the critical path. As an example, see the attached output (out.txt) for this sequence:

add r1,r2,r5
and r1,r2,r6
move r3,r7
and r1,r2,r6
xor r1,r6,r6

It detects both RAW and write-port stalls.

devik

PS: The spec should state whether ADD saturation is signed or unsigned!
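The PS matters because the two readings of saturation give different results, and MMX itself has separate instructions for them (PADDSW for signed and PADDUSW for unsigned 16-bit saturating adds). The portable C sketch below models ADD.W on the four 16-bit lanes of a 64-bit register in three flavours. It is an illustration only, not code from the emulator; the names `add_w`, `sat_mode` and `sext16` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flavours of a lane-wise 16-bit add (ADD.W). */
enum sat_mode { WRAP, SAT_UNSIGNED, SAT_SIGNED };

/* Reinterpret a 16-bit lane as two's complement without implementation-defined casts. */
static int32_t sext16(uint16_t v)
{
    return v >= 0x8000 ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Add the four 16-bit lanes of a and b independently (roughly what MMX
   PADDW, PADDUSW and PADDSW each do in a single instruction). */
static uint64_t add_w(uint64_t a, uint64_t b, enum sat_mode mode)
{
    uint64_t r = 0;
    for (int shift = 0; shift < 64; shift += 16) {
        uint16_t x = (uint16_t)(a >> shift);
        uint16_t y = (uint16_t)(b >> shift);
        uint16_t z;
        if (mode == SAT_UNSIGNED) {
            uint32_t s = (uint32_t)x + y;             /* 0 .. 0x1FFFE */
            z = s > 0xFFFF ? 0xFFFF : (uint16_t)s;    /* clamp at 0xFFFF */
        } else if (mode == SAT_SIGNED) {
            int32_t s = sext16(x) + sext16(y);        /* -65536 .. 65534 */
            if (s > INT16_MAX) s = INT16_MAX;
            if (s < INT16_MIN) s = INT16_MIN;
            z = (uint16_t)s;                          /* back to a raw lane */
        } else {
            z = (uint16_t)(x + y);                    /* modulo 2^16 */
        }
        r |= (uint64_t)z << shift;
    }
    return r;
}
```

With a lane holding 0x7FFF + 0x0001, unsigned saturation yields 0x8000 while signed saturation clamps to 0x7FFF; with 0xFFFF + 0x0001 it is the other way round (0xFFFF unsigned versus 0x0000 signed).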
Attachment:
devik-fcpu-emul-021207.tar.gz
cc -mmmx -O3 -c -o main.o main.c
cc main.o -o main
time ./main
Now: 0
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 0]: 0x0000000000000000
r6 [ 0]: 0x0000000000000000
r7 [ 0]: 0x0000000000000000
FIFO wports assignments:
| | | | | |
0000000000000000000000000000000000000000000000000000000000000000
Now: 1
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 0]: 0x0000000000000000
r7 [ 0]: 0x0000000000000000
FIFO wports assignments:
| | | | | |
0010000000000000000000000000000000000000000000000000000000000000
Now: 2
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 4]: 0x0300000500000001
r7 [ 0]: 0x0000000000000000
FIFO wports assignments:
| | | | | |
0200000000000000000000000000000000000000000000000000000000000000
Write port stall 1 cycles !
Now: 4
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 4]: 0x0300000500000001
r7 [ 5]: 0xFFFFFF00F000FFF0
FIFO wports assignments:
| | | | | |
1000000000000000000000000000000000000000000000000000000000000000
Now: 5
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 7]: 0x0300000500000001
r7 [ 5]: 0xFFFFFF00F000FFF0
FIFO wports assignments:
| | | | | |
0100000000000000000000000000000000000000000000000000000000000000
RAW/WAW stall 1 cycles !
Now: 7
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 9]: 0xFC100000FFFFFFFE
r7 [ 5]: 0xFFFFFF00F000FFF0
FIFO wports assignments:
| | | | | |
0100000000000000000000000000000000000000000000000000000000000000
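The timing scheme described in the message (a switch() per opcode, a latency per op, timestamped register writes and a circular FIFO counting write-port use per cycle) might look roughly like the sketch below. It is not devik's main.c: the latencies (ADD 3, AND/XOR 2, MOVE 1), the two write ports and the 64-slot FIFO are assumptions inferred from the timestamps in out.txt, and every name in it is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define NREGS  8
#define FIFO   64  /* slots in the circular write-port FIFO (assumed size) */
#define WPORTS 2   /* register-file write ports per cycle (inferred from out.txt) */

enum op { OP_ADD, OP_AND, OP_XOR, OP_MOVE };

/* Latencies inferred from the timestamps in out.txt, not taken from the F-CPU spec. */
static const int latency[] = { [OP_ADD] = 3, [OP_AND] = 2, [OP_XOR] = 2, [OP_MOVE] = 1 };

struct insn { enum op op; int s1, s2, d; };  /* "op s1,s2,d"; move uses s1 only */

static uint64_t reg[NREGS];
static uint64_t ready[NREGS];      /* cycle at which the last write to each register lands */
static unsigned char wport[FIFO];  /* writes scheduled for cycle c, stored at c % FIFO */
static uint64_t now, raw_stalls, port_stalls;

/* Advance the clock one cycle; the slot left behind is recycled for cycle now + FIFO. */
static void tick(void)
{
    wport[now % FIFO] = 0;
    now++;
}

static void issue(const struct insn *in)
{
    int lat = latency[in->op];
    assert(lat < FIFO);  /* the result slot must fit in the circular FIFO */
    tick();              /* at most one instruction issues per cycle */

    uint64_t t = now;    /* earliest cycle this instruction may issue */
    if (ready[in->s1] > t) t = ready[in->s1];                       /* RAW */
    if (in->op != OP_MOVE && ready[in->s2] > t) t = ready[in->s2];  /* RAW */
    if (t + lat <= ready[in->d]) t = ready[in->d] - lat + 1;        /* WAW */
    uint64_t t_raw = t;
    while (wport[(t + lat) % FIFO] >= WPORTS) t++;                  /* write port */

    if (t_raw > now) printf("RAW/WAW stall %" PRIu64 " cycles !\n", t_raw - now);
    if (t > t_raw)   printf("Write port stall %" PRIu64 " cycles !\n", t - t_raw);
    raw_stalls  += t_raw - now;
    port_stalls += t - t_raw;
    while (now < t) tick();

    wport[(now + lat) % FIFO]++;  /* reserve a write port in the result cycle */
    ready[in->d] = now + lat;     /* timestamp the destination register */

    /* The value is computed immediately; only its timestamp models the pipeline. */
    switch (in->op) {
    case OP_ADD:  reg[in->d] = reg[in->s1] + reg[in->s2]; break;
    case OP_AND:  reg[in->d] = reg[in->s1] & reg[in->s2]; break;
    case OP_XOR:  reg[in->d] = reg[in->s1] ^ reg[in->s2]; break;
    case OP_MOVE: reg[in->d] = reg[in->s1];               break;
    }
    printf("Now: %" PRIu64 "  r%d [%" PRIu64 "]: 0x%016" PRIX64 "\n",
           now, in->d, ready[in->d], reg[in->d]);
}

/* Replay the five-instruction sequence from the message. */
static void run_sequence(void)
{
    static const struct insn prog[] = {
        { OP_ADD, 1, 2, 5 }, { OP_AND, 1, 2, 6 }, { OP_MOVE, 3, 0, 7 },
        { OP_AND, 1, 2, 6 }, { OP_XOR, 1, 6, 6 },
    };
    reg[1] = 0xFF100005FFFFFFFFULL;
    reg[2] = 0x0301000500000001ULL;
    reg[3] = 0xFFFFFF00F000FFF0ULL;
    for (size_t i = 0; i < sizeof prog / sizeof prog[0]; i++)
        issue(&prog[i]);
}
```

Called from a main(), run_sequence() issues the five instructions at cycles 1, 2, 4, 5 and 7, with the same write-port stall after the second AND and the same RAW stall before the XOR as in out.txt. The RAW check is a single comparison against each source's timestamp, which is why the critical path needs almost no loops; only the write-port search can iterate, and it does so only while a port conflict exists.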