Re: [tor-bugs] #30857 [Internal Services/Services Admin Team]: migrate (some projects? everything?) from trac to gitlab
#30857: migrate (some projects? everything?) from trac to gitlab
-------------------------------------------------+-------------------------
Reporter: anarcat | Owner: (none)
Type: project | Status: new
Priority: Medium | Milestone:
Component: Internal Services/Services Admin Team | Version:
Severity: Normal | Resolution:
Keywords: | Actual Points:
Parent ID: #29400 | Points:
Reviewer: | Sponsor:
-------------------------------------------------+-------------------------
Comment (by anarcat):
10GB was a low-end estimate. The final crawl will be much, much larger.
To give you a sense of scale, the crawl of the ticket pages alone has
completed and is available here:
https://archive.fart.website/archivebot/viewer/job/5vytc
It's around 700MB, compressed. That might seem like a lot, but that's just
for the tickets. The crawl job for the entire Trac site is still ongoing:
it is currently at 40GB, with 160,000 URLs crawled and 500,000 more still
queued. We can assume it will be at least 200GB, but we won't really know
until the crawl is finished, because each new page can yield new links.
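As a rough sanity check on those numbers, here is a minimal sketch of a
linear extrapolation. It assumes the remaining URLs average the same size
as the ones already fetched, which is only a guess, and the function name
is made up for illustration. Because the queue keeps growing as new links
are discovered, the result is best read as a floor, not a prediction.

```python
def extrapolate_crawl_size(size_so_far_gb, urls_done, urls_queued):
    """Linear estimate of the final crawl size in GB.

    Assumption (not measured): URLs still in the queue will average
    the same size as the URLs already crawled.
    """
    if urls_done <= 0:
        raise ValueError("urls_done must be positive")
    avg_gb_per_url = size_so_far_gb / urls_done
    return avg_gb_per_url * (urls_done + urls_queued)


# Figures from the ongoing full-site crawl quoted above.
estimate = extrapolate_crawl_size(40, 160_000, 500_000)
print(f"{estimate:.0f} GB")  # prints "165 GB"
```

With the current queue that gives about 165GB, so "at least 200GB" is
plausible once newly discovered links are counted.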
The problem is not 10GB, it's the 200GB or 500GB or more. :) Maybe it's
fine to keep such a large dataset around forever, but from past
experience, we have trouble holding on to that kind of data (see for
example the problems we have with archive.tpo now, in #29697).
So, TL;DR: it's not 10GB. It will be closer to 200GB, maybe a terabyte,
for the full crawl.
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/30857#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online