Hi,

today I've written part of my own emulator. Why? Well, the simulator is better, but it is not complete yet and is probably slow. I wanted an emulator that can emit stats on pipeline stalls, is fast, and is simple enough to be completed in a few days. I also want it for gcc performance tests, and to run Linux on it later.

It is basically a loop with a switch(), with each opcode defined by a few lines (the operation plus pipeline data for the scheduler). It is incomplete, but when tried with a loop of independent 64-bit ANDs it gives 15 MIPS on my PII/375. SIMD ADD.W gives 12 MIPS, and ADD.W with saturation and carry store gives 7 MIPS. It can still be optimized a bit.

It uses MMX where possible (this helps especially with .B and .W SIMD ops). SSE would help too, but far fewer machines have SSE than MMX (mine, for example, has only MMX).

Adding further ops is simple and they need no manual scheduling: just specify the type (if not 2r1w) and the latency. I'll continue working on it (it has only 200 lines right now) but wanted to share the ideas with you :)

It uses timestamping of register writes and a circular FIFO, so that there are almost no loops in the critical path. As an example, see the attached output (out.txt) of this sequence:

add r1,r2,r5
and r1,r2,r6
move r3,r7
and r1,r2,r6
xor r1,r6,r6

It detects both RAW and write port stalls.

devik

PS: It should be specified whether ADD saturation is signed or unsigned!
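To illustrate why the question in the PS matters, here is a minimal portable sketch of a saturating ADD.W over four 16-bit lanes of a 64-bit register, once unsigned and once signed. The function names are hypothetical and this is not code from the emulator, which uses MMX; MMX itself offers both variants (PADDUSW and PADDSW), so the choice has to come from the instruction set definition.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only, not from the emulator source. */

/* Add four 16-bit lanes, clamping each lane to 0..0xFFFF. */
uint64_t add_w_sat_unsigned(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        uint32_t x = (uint32_t)((a >> (16 * lane)) & 0xFFFF);
        uint32_t y = (uint32_t)((b >> (16 * lane)) & 0xFFFF);
        uint32_t s = x + y;
        if (s > 0xFFFF)
            s = 0xFFFF;
        r |= (uint64_t)s << (16 * lane);
    }
    return r;
}

/* Add four 16-bit lanes, clamping each lane to -32768..32767. */
uint64_t add_w_sat_signed(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int lane = 0; lane < 4; lane++) {
        /* Reinterpret the lane bits as a two's complement 16-bit value. */
        int32_t x = (int16_t)(uint16_t)(a >> (16 * lane));
        int32_t y = (int16_t)(uint16_t)(b >> (16 * lane));
        int32_t s = x + y;
        if (s > INT16_MAX)
            s = INT16_MAX;
        if (s < INT16_MIN)
            s = INT16_MIN;
        r |= (uint64_t)(uint16_t)s << (16 * lane);
    }
    return r;
}
```

With a = 0x00000000FFFF7FFF and b = 0x0000000000010001 the unsigned version returns 0x00000000FFFF8000 and the signed version returns 0x0000000000007FFF, so the same bit patterns give different results in both affected lanes.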
Attachment:
devik-fcpu-emul-021207.tar.gz
cc -mmmx -O3 -c -o main.o main.c
cc main.o -o main
time ./main
Now: 0
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 0]: 0x0000000000000000
r6 [ 0]: 0x0000000000000000
r7 [ 0]: 0x0000000000000000
FIFO wports assignments:
| | | | | |
0000000000000000000000000000000000000000000000000000000000000000
Now: 1
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 0]: 0x0000000000000000
r7 [ 0]: 0x0000000000000000
FIFO wports assignments:
| | | | | |
0010000000000000000000000000000000000000000000000000000000000000
Now: 2
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 4]: 0x0300000500000001
r7 [ 0]: 0x0000000000000000
FIFO wports assignments:
| | | | | |
0200000000000000000000000000000000000000000000000000000000000000
Write port stall 1 cycles !
Now: 4
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 4]: 0x0300000500000001
r7 [ 5]: 0xFFFFFF00F000FFF0
FIFO wports assignments:
| | | | | |
1000000000000000000000000000000000000000000000000000000000000000
Now: 5
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 7]: 0x0300000500000001
r7 [ 5]: 0xFFFFFF00F000FFF0
FIFO wports assignments:
| | | | | |
0100000000000000000000000000000000000000000000000000000000000000
RAW/WAW stall 1 cycles !
Now: 7
r0 [ 0]: 0x0000000000000000
r1 [ 0]: 0xFF100005FFFFFFFF
r2 [ 0]: 0x0301000500000001
r3 [ 0]: 0xFFFFFF00F000FFF0
r4 [ 0]: 0x0000000000000000
r5 [ 4]: 0x0211000B00000000
r6 [ 9]: 0xFC100000FFFFFFFE
r7 [ 5]: 0xFFFFFF00F000FFF0
FIFO wports assignments:
| | | | | |
0100000000000000000000000000000000000000000000000000000000000000
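
A minimal sketch of how a trace like the one above could come out of register timestamps plus a circular write-port FIFO. This is not the code from the tarball. The names (`struct cpu`, `cpu_step`, `run_demo`), the limit of two register writes per cycle, the latencies (ADD 3, AND/XOR 2, MOVE 1) and the rule that operands are read one cycle after issue are all assumptions, inferred only because they give the same "Now" values, timestamps and stalls as the trace.

```c
#include <stdint.h>

#define NREGS    8
#define FIFO_LEN 64  /* the trace prints 64 FIFO slots */
#define WPORTS   2   /* assumed: at most two register writes per cycle */

enum opcode { OP_ADD, OP_AND, OP_XOR, OP_MOVE };

struct insn { enum opcode op; int src1, src2, dst; };

struct cpu {
    uint64_t reg[NREGS];
    unsigned ready[NREGS];          /* cycle from which the last write is readable */
    unsigned char wfifo[FIFO_LEN];  /* write-port bookings, indexed by cycle % FIFO_LEN */
    unsigned now;
    unsigned raw_stalls, wport_stalls;
};

/* Assumed latencies, inferred from the timestamps in the trace. */
static unsigned latency(enum opcode op)
{
    switch (op) {
    case OP_ADD:  return 3;
    case OP_MOVE: return 1;
    default:      return 2;  /* OP_AND, OP_XOR */
    }
}

void cpu_step(struct cpu *c, struct insn in)
{
    unsigned lat = latency(in.op);
    int s2 = (in.op == OP_MOVE) ? in.src1 : in.src2;  /* MOVE has one source */
    unsigned t = c->now;                              /* candidate issue cycle */

    /* RAW/WAW: sources must be readable at t + 1, and the new write must
       not become visible before the previous write to the same register. */
    while (c->ready[in.src1] > t + 1 || c->ready[s2] > t + 1 ||
           c->ready[in.dst] > t + lat + 1) {
        t++;
        c->raw_stalls++;
    }
    /* Write port: delay issue until the write-back cycle has a free port. */
    while (c->wfifo[(t + lat) % FIFO_LEN] >= WPORTS) {
        t++;
        c->wport_stalls++;
    }
    c->wfifo[(t + lat) % FIFO_LEN]++;

    /* Functional part: the switch() with one case per opcode. */
    uint64_t a = c->reg[in.src1], b = c->reg[s2], r;
    switch (in.op) {
    case OP_ADD: r = a + b; break;
    case OP_AND: r = a & b; break;
    case OP_XOR: r = a ^ b; break;
    default:     r = a;     break;  /* OP_MOVE */
    }
    c->reg[in.dst] = r;
    c->ready[in.dst] = t + lat + 1;  /* timestamp of this register write */

    /* Free the FIFO slots of cycles that are now in the past. */
    for (unsigned cyc = c->now; cyc <= t; cyc++)
        c->wfifo[cyc % FIFO_LEN] = 0;
    c->now = t + 1;
}

/* Runs the five-instruction sequence from the post on the trace's initial registers. */
struct cpu run_demo(void)
{
    struct cpu c = {0};
    const struct insn prog[] = {
        { OP_ADD,  1, 2, 5 },  /* add  r1,r2,r5 */
        { OP_AND,  1, 2, 6 },  /* and  r1,r2,r6 */
        { OP_MOVE, 3, 0, 7 },  /* move r3,r7    */
        { OP_AND,  1, 2, 6 },  /* and  r1,r2,r6 */
        { OP_XOR,  1, 6, 6 },  /* xor  r1,r6,r6 */
    };
    c.reg[1] = 0xFF100005FFFFFFFFULL;
    c.reg[2] = 0x0301000500000001ULL;
    c.reg[3] = 0xFFFFFF00F000FFF0ULL;
    for (unsigned i = 0; i < sizeof prog / sizeof prog[0]; i++)
        cpu_step(&c, prog[i]);
    return c;
}
```

Under these assumptions `run_demo()` ends with `now` at 7, one write-port stall (on the move) and one RAW stall (on the xor), and r6 = 0xFC100000FFFFFFFE with timestamp 9, as in the last block of the trace. The two `while` loops are kept for readability; they only iterate when a stall actually occurs.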