Re: [f-cpu] TLB resume
I'll try to explain for everyone.
The TLB is a small buffer used to translate virtual addresses to
physical memory addresses. It is used to cleanly separate the address
spaces of processes. It's possible to have 2 different TLBs: one for
data and one for code.
This piece of hardware is critical because it sits on the critical
path whenever we access memory. Sometimes the L1 cache is addressed
with virtual addresses to avoid accessing the TLB too often.
The TLB is critical on task switches, too. On x86, it must be almost
entirely flushed before it is usable for the next task.
The TLB is in fact a kind of cache. It mainly uses CAM (content
addressable memory; the more entries, the slower it gets). It's a
memory where you send in a piece of data and it gives you back an
address (which can then be used to access another, normal memory bank;
it's a key-to-content association). What makes it critical is its size
(the number of entries) and the size of the pages used.
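To make the key-to-content association concrete, here is a minimal
sketch of a fully associative lookup. It is only a software model: the
class name, the 4 KB page size and the FIFO replacement are my
assumptions, not anything decided for F-CPU. In real hardware all the
comparisons happen in parallel, which is why more entries cost speed.

```python
PAGE_BITS = 12  # assumption: 4 KB pages


class FullyAssociativeTLB:
    """Toy model of a CAM-based TLB: the key is compared with every entry."""

    def __init__(self, num_entries):
        self.num_entries = num_entries
        self.entries = []  # list of (virtual page number, physical page number)

    def insert(self, vpn, ppn):
        # FIFO replacement, chosen only to keep the sketch short.
        if len(self.entries) >= self.num_entries:
            self.entries.pop(0)
        self.entries.append((vpn, ppn))

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        offset = vaddr & ((1 << PAGE_BITS) - 1)
        # Hardware compares all entries at once; software has to loop.
        for entry_vpn, ppn in self.entries:
            if entry_vpn == vpn:
                return (ppn << PAGE_BITS) | offset
        return None  # TLB miss: the miss handler would have to refill


tlb = FullyAssociativeTLB(num_entries=4)
tlb.insert(vpn=0x12345, ppn=0x00ABC)
hit = tlb.translate(0x12345678)   # page 0x12345, offset 0x678
miss = tlb.translate(0x54321000)  # page not present
```

Here `hit` is the physical address built from the stored page number and
the untouched page offset, and `miss` is `None`.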
x86 has a very weak protection model, using only 2 bits per page
(user/supervisor and read/write, with no separate execute right);
that's why buffer overflows are so easy to exploit in the x86 world.
cedric wrote:
>
> Here is an extract from the TLB discussion that happened one month ago.
>
> Michael proposed this:
> "
> I've been thinking about the TLB before; IMHO, we need at least the
> following (assuming <n> bits for the page offset):
>
> - virtual address (64-<n> bits)
The input.
> - physical address (64-<n> bits)
The main output.
> - address space identifier (ASI; 8 bits was suggested)
Could be an output that we then check against an SR. But if it's an
input, we don't need to flush the TLB during a task switch!
> - user access rights (RWX, 3 bits)
Output; we check what the access is doing and raise the appropriate
trap if needed.
> - supervisor access rights (RWX, 3 bits)
I don't really understand the use of this one.
When a process makes a system call, it uses a specific call that
changes the mode (user -> supervisor). The supervisor (the kernel) can
access anything. But the strangest thing is that the supervisor is the
one that can change the TLB content...
> - valid bit (indicating that the entry is valid)
> - dirty bit (indicating that the page has been written to)
> - used bit (indicating that the page has been accessed)
These are for the TLB miss handler, to speed up some choices and avoid
redundant rewrites and so on...
>
> - page size (4K << size, 6 bits)
Page size is critical. How much memory can be accessed without a TLB
miss? In the x86 world, pages are 4 KB. The TLB has between 32 and 128
entries: 128 KB to 512 KB addressed. That's not much!
Intel introduced big pages (4 MB), for framebuffers for example. Linux
uses them for the kernel code (so there are no TLB misses inside the
kernel code).
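The arithmetic behind those numbers is just entries times page size. A
small sketch (the function name is mine, and reusing the same 128
entries for 4 MB pages is an assumption made only for comparison):

```python
KB = 1024
MB = 1024 * KB


def tlb_reach(num_entries, page_size):
    """Memory that can be touched without a TLB miss, in bytes."""
    return num_entries * page_size


small_32 = tlb_reach(32, 4 * KB)    # 128 KB
small_128 = tlb_reach(128, 4 * KB)  # 512 KB
large_128 = tlb_reach(128, 4 * MB)  # 512 MB

print(small_32 // KB, "KB,", small_128 // KB, "KB,", large_128 // MB, "MB")
```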
If the pages are bigger you can address more memory with the same
number of entries. But it can become hard for the paging system to
find holes aligned on power-of-2 address boundaries. A process needs
at least 3 pages (code, data, stack).
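Putting Michael's fields together, and taking the ASI as an input to
the match as I suggested, a lookup could look roughly like the sketch
below. Everything here is hypothetical (names, the fixed 4 KB page, the
string encoding of the RWX bits); it only shows that two tasks can keep
entries for the same virtual page without a flush, and where the rights
check and the used/dirty updates would happen.

```python
from dataclasses import dataclass

PAGE_BITS = 12  # assumption: fixed 4 KB pages to keep the sketch short


@dataclass
class TLBEntry:
    vpn: int             # virtual page number (input)
    ppn: int             # physical page number (main output)
    asi: int             # address space identifier (matched as an input)
    user_rwx: str        # e.g. "r-x"
    supervisor_rwx: str  # e.g. "rwx"
    valid: bool = True
    dirty: bool = False
    used: bool = False


class AccessFault(Exception):
    """Stands in for the trap raised on a rights violation."""


def lookup(entries, current_asi, vaddr, access, supervisor=False):
    """Translate vaddr for an access of kind 'r', 'w' or 'x'."""
    vpn = vaddr >> PAGE_BITS
    for e in entries:
        if e.valid and e.vpn == vpn and e.asi == current_asi:
            rights = e.supervisor_rwx if supervisor else e.user_rwx
            if access not in rights:
                raise AccessFault(f"{access!r} not allowed on page {vpn:#x}")
            e.used = True
            if access == "w":
                e.dirty = True
            return (e.ppn << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))
    return None  # TLB miss


entries = [
    TLBEntry(vpn=0x10, ppn=0x200, asi=1, user_rwx="r-x", supervisor_rwx="rwx"),
    TLBEntry(vpn=0x10, ppn=0x300, asi=2, user_rwx="rw-", supervisor_rwx="rw-"),
]
task1 = lookup(entries, current_asi=1, vaddr=0x10004, access="r")
task2 = lookup(entries, current_asi=2, vaddr=0x10004, access="w")
other = lookup(entries, current_asi=3, vaddr=0x10004, access="r")
```

The same virtual address gives a different physical address for ASI 1
and ASI 2, and a miss for ASI 3, with nothing flushed in between.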
Here is an extract from top:
  PID USER  PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM  TIME COMMAND
10402 nico    9   0 23516  22M 12628 S     0,0 24,8  0:00 mozilla-bin
10402 nico    9   0 23516  22M 12628 S     0,0 24,8  0:00 mozilla-bin
10402 nico    9   0 23516  22M 12628 S     0,0 24,8  0:00 mozilla-bin
10403 nico    9   0 23516  22M 12628 S     0,0 24,8  0:00 mozilla-bin
10409 nico    9   0 20920  20M  9620 S     0,0 21,9  0:24 netscape-commun
 1439 root    9   0 59432 7608  1216 S     0,0  8,0  2:16 X
10427 nico    8   0  3832 3752  3272 S     0,0  3,9  0:00 netscape-commun
 1351 xfs     9   0  5080 3440   696 S     0,0  3,6  0:13 xfs
 1656 nico    9   0  1812 1480  1016 S     0,0  1,5  0:15 wmaker
 1739 nico    9   0  1476 1280  1080 S     0,0  1,3  0:00 bash
10440 nico   12   0  1056 1056   844 R     0,7  1,1  0:00 top
 1730 nico    9   0  1108  792   628 S     0,0  0,8  0:00 bash
 1731 nico    9   0   940  448   424 S     0,0  0,4  0:00 bash
 8913 root    8   0   760  428   236 S     0,0  0,4  0:00 bash
 1723 nico    9   0   492  412   400 S     0,0  0,4  0:00 aterm
10249 root    8   0   476  412   320 S     0,0  0,4  0:00 pppd
 1724 nico    9   0   496  396   372 R     0,1  0,4  0:00 aterm
 1725 nico    9   0   468  364   364 S     0,0  0,3  0:00 aterm
 1726 nico    9   0   384  288   288 S     0,0  0,3  0:00 aterm
 1727 nico    9   0   412  184   184 S     0,0  0,1  0:00 wmCalClock-Wind
  878 root    9   0   236  180   180 S     0,0  0,1  0:00 syslogd
  886 root    9   0   828  172   172 S     0,0  0,1  0:00 klogd
22 MB for mozilla, 60 MB for X (but only 7.5 MB currently resident, so
almost 90% of its size is swapped out), a few utilities use ~1 MB and a
lot of them use around 512 KB.
So around 1 MB for most of them and many MB for 2 or 3 applications.
The Hurd guys advise having 4 sizes: tiny for message passing (4 KB?
Whatever the size, a message is a message and most of them are small,
so each will use one page), medium for code (64 KB was the first idea),
big for framebuffers and a fat kernel (4-8 MB, 16 MB) and very big for
memory-hungry applications (databases, scientific tools, ...) (256 MB!).
So 4 sizes! In the IA-64 arch, Intel put in 11 sizes!
My current problem is how to manage different page sizes inside the
same CAM. Maybe you could manipulate the bit mask, but I don't see how,
or split the address in 2 or 4.
Using as many tables as there are sizes doesn't seem too realistic.
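One way the bit-mask idea might work, as a sketch only: each entry
keeps a mask derived from its own page size (4K << size, as in
Michael's field), and the comparison ignores the address bits that fall
inside the page. The class and function names are invented for the
example, and a real CAM would do the masked compare in hardware.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit addresses


class VarSizeEntry:
    def __init__(self, vbase, pbase, size):
        page_bytes = 4096 << size               # "4K << size" encoding
        self.offset_mask = page_bytes - 1       # bits inside the page
        self.mask = MASK64 ^ self.offset_mask   # bits that must match
        self.vbase = vbase & self.mask
        self.pbase = pbase & self.mask


def translate(entries, vaddr):
    for e in entries:
        # Masked compare: low bits are ignored according to the page size.
        if (vaddr & e.mask) == e.vbase:
            return e.pbase | (vaddr & e.offset_mask)
    return None  # TLB miss


entries = [
    VarSizeEntry(vbase=0x00001000, pbase=0x00008000, size=0),   # 4 KB page
    VarSizeEntry(vbase=0x00400000, pbase=0x10000000, size=10),  # 4 MB page
]
in_small = translate(entries, 0x00001ABC)
in_large = translate(entries, 0x004FFFFF)
outside = translate(entries, 0x00800000)
```

Both page sizes live in the same table; the cost is one mask per entry
and a wider comparator.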
Or we can use a faster cache. Instead of full associativity we can use
a 4-way cache. It's smaller and faster, but you add dependencies between
the addresses that can be held in the TLB at the same time.
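To show what I mean by dependencies, here is a toy 4-way set-associative
model. The number of sets, the FIFO replacement and the names are
assumptions for illustration. Pages whose numbers share the same low
bits compete for the same 4 ways, even when the rest of the TLB is
empty.

```python
PAGE_BITS = 12  # assumption: 4 KB pages
NUM_SETS = 16   # assumption: 16 sets x 4 ways = 64 entries
WAYS = 4


class SetAssociativeTLB:
    def __init__(self):
        self.sets = [[] for _ in range(NUM_SETS)]

    def insert(self, vpn, ppn):
        ways = self.sets[vpn % NUM_SETS]  # low VPN bits select the set
        if len(ways) >= WAYS:
            ways.pop(0)                   # FIFO replacement within the set
        ways.append((vpn, ppn))

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_BITS
        # Only the 4 ways of one set are compared, not all 64 entries.
        for tag, ppn in self.sets[vpn % NUM_SETS]:
            if tag == vpn:
                return (ppn << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))
        return None  # TLB miss


tlb = SetAssociativeTLB()
# Five pages that all land in set 0: the fifth evicts the first,
# although 60 of the 64 entries are still free.
for vpn in (0, 16, 32, 48, 64):
    tlb.insert(vpn, ppn=vpn + 0x100)

evicted = tlb.translate(0 << PAGE_BITS)
kept = tlb.translate(16 << PAGE_BITS)
```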
Hope this helps.
nicO
> "
>
> "present bits" has been removed because, after the discussion, it's only
> needed by a HW TLB. I hope I have made a good summary, but if someone wants
> to add something, please do so before I add this to the manual (perhaps we
> must clarify how to access it first).
>
> Cedric
>
> *************************************************************
> To unsubscribe, send an e-mail to majordomo@seul.org with
> unsubscribe f-cpu in the body. http://f-cpu.seul.org/