Be smarter about re-importing consensuses.
A recent analysis of Metrics' back-end performance has revealed that importing a consensus into the database can take anywhere from a few seconds to a few *hours*. More precisely, importing a consensus for the first time takes seconds, whereas re-importing a consensus that is already (partially) contained in the database can take hours. The reason for the latter is that we check for every status entry whether it's already contained in the database before inserting it, and these 7k queries are crazy expensive.

What we do now instead is request and store the list of fingerprints of the status entries already contained for a given consensus, and only insert a status entry if its fingerprint is not in that list. This avoids the 7k queries and lets us re-import a consensus within seconds.

There were two situations in which we re-imported one or more consensuses, which took hours or more: whenever the host was rebooted during the database import and we lost the import history, and whenever CollecTor fetched an outdated consensus from a directory authority that it had already received the hour before and that Metrics had already imported in its previous run.
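
The idea can be sketched roughly as follows. This is only an illustration, not the actual Metrics code: it uses Python with an in-memory SQLite database, and the table name `statusentry`, its columns, the function `import_consensus`, and the sample data are all hypothetical. It shows one query per consensus to fetch the known fingerprints, followed by inserts for the missing status entries only.

```python
import sqlite3


def import_consensus(conn, valid_after, entries):
    """Insert only the status entries not yet stored for this consensus.

    entries is an iterable of (fingerprint, nickname) tuples; this shape
    is an assumption made for the sketch. Returns the number of new rows.
    """
    # One query per consensus instead of one query per status entry.
    known = {
        row[0]
        for row in conn.execute(
            "SELECT fingerprint FROM statusentry WHERE validafter = ?",
            (valid_after,),
        )
    }
    new_rows = []
    for fingerprint, nickname in entries:
        if fingerprint in known:
            continue  # already imported, skip without touching the database
        known.add(fingerprint)  # also guards against duplicates in the input
        new_rows.append((valid_after, fingerprint, nickname))
    conn.executemany(
        "INSERT INTO statusentry (validafter, fingerprint, nickname) "
        "VALUES (?, ?, ?)",
        new_rows,
    )
    conn.commit()
    return len(new_rows)


# Hypothetical schema and made-up sample data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statusentry ("
    "validafter TEXT NOT NULL, fingerprint TEXT NOT NULL, nickname TEXT, "
    "PRIMARY KEY (validafter, fingerprint))"
)
entries = [("AAAA", "relay1"), ("BBBB", "relay2"), ("CCCC", "relay3")]
valid_after = "2011-01-01 12:00:00"

# Simulate an import that was interrupted after the first status entry.
first = import_consensus(conn, valid_after, entries[:1])
# Re-import the full consensus: only the two missing entries are inserted.
second = import_consensus(conn, valid_after, entries)
# Re-importing a fully contained consensus inserts nothing.
third = import_consensus(conn, valid_after, entries)
```

Keeping the fingerprints in an in-memory set makes each membership check cheap, so the cost of a re-import is dominated by a single SELECT rather than by thousands of per-entry lookups.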
parent
5611ef45