[tor-bugs] #2921 [Metrics]: Improve bulk import of relay descriptors into metrics database
#2921: Improve bulk import of relay descriptors into metrics database
-------------------------+--------------------------------------------------
Reporter: karsten | Owner: karsten
Type: enhancement | Status: new
Priority: normal | Milestone:
Component: Metrics | Version:
Keywords: | Parent:
Points: | Actualpoints:
-------------------------+--------------------------------------------------
We currently have two ways to import relay descriptors into the metrics
database:
- JDBC import: A Java importer connects to the metrics database via JDBC.
It already uses a few tweaks, such as committing in batches of up to 500
rows, but importing months of data is still time-consuming.
- psql \copy: The Java importer can instead be configured to parse relay
descriptor files and write input files for psql's \copy command. The
disadvantage is that \copy does not handle duplicates well, so we have to
pre-process the bulk import files first.
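To make the pre-processing step for \copy concrete, here is a minimal sketch of dropping duplicate rows from a bulk import file before it is handed to psql. It is not the existing importer code. The class name, the tab-separated line layout, and the choice of the first field as the unique key are all assumptions made for illustration; the real key columns depend on the target table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the existing metrics importer.
public class CopyFileDeduplicator {

    /**
     * Returns the input lines with later duplicates dropped, keeping the
     * first line seen for each key. ASSUMPTION: the key is the first
     * tab-separated field of each line.
     */
    public static List<String> dropDuplicateKeys(List<String> lines) {
        Set<String> seenKeys = new HashSet<>();
        List<String> unique = new ArrayList<>();
        for (String line : lines) {
            int tab = line.indexOf('\t');
            String key = tab < 0 ? line : line.substring(0, tab);
            // Set.add returns false if the key was already present.
            if (seenKeys.add(key)) {
                unique.add(line);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        // Made-up sample rows: descriptor identifier, then a timestamp.
        List<String> lines = Arrays.asList(
            "descA\t2011-01-01 00:00:00",
            "descB\t2011-01-01 01:00:00",
            "descA\t2011-01-01 00:00:00");
        for (String line : dropDuplicateKeys(lines)) {
            System.out.println(line);
        }
    }
}
```

This only removes duplicates within the file itself. Rows that already exist in the database would still need separate handling, for example by loading into a staging table first.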
I wonder if there are better approaches than these two, or ways to improve
how we implement them. It would be good to compare the performance of both
approaches, and of any improvements to them, when importing 1, 12, and 24
months of data.
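One possible way to set up such a comparison is a small timing harness that runs each import variant once per data set size and records the wall-clock time. The sketch below is only an illustration: the class name and task labels are hypothetical, and the import tasks are empty placeholders that would have to be replaced with the real JDBC and \copy imports.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical timing harness, not part of the existing metrics code.
public class ImportBenchmark {

    /** Runs each named task once and returns its wall-clock time in milliseconds. */
    public static Map<String, Long> timeAll(Map<String, Runnable> tasks) {
        // LinkedHashMap keeps the results in the order the tasks were added.
        Map<String, Long> elapsedMillis = new LinkedHashMap<>();
        for (Map.Entry<String, Runnable> task : tasks.entrySet()) {
            long start = System.nanoTime();
            task.getValue().run();
            long elapsedNanos = System.nanoTime() - start;
            elapsedMillis.put(task.getKey(), elapsedNanos / 1_000_000L);
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        Map<String, Runnable> tasks = new LinkedHashMap<>();
        for (int months : new int[] {1, 12, 24}) {
            // Placeholders: replace the empty bodies with the real imports.
            tasks.put("jdbc-" + months + "-months", () -> { });
            tasks.put("copy-" + months + "-months", () -> { });
        }
        for (Map.Entry<String, Long> result : timeAll(tasks).entrySet()) {
            System.out.println(result.getKey() + ": " + result.getValue() + " ms");
        }
    }
}
```

For a fair comparison, each run should probably start from the same database state, and the time spent pre-processing the \copy files should be counted as part of that approach.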
--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/2921>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online