Avoid reprocessing webstats files.
Web servers typically provide us with the last 14 days of request logs. We shouldn't process the whole 14 days over and over. Instead, we should only process new log files and any other log files containing log lines from newly written dates.

In some cases web servers stop serving a given virtual host or stop acting as a web server at all. In these cases, however, we're left with 14 days of logs per virtual host. Ideally, these logs would get cleaned up, but until that's the case, we should at least not reprocess these files over and over.

In order to avoid reprocessing webstats files, we need a new state file that records which log dates are contained in which input files. We use that state file to determine which of the previously processed webstats files to re-process, so that we can write complete daily logs.
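The selection rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in SanitizeWeblogs.java: the class name `ReprocessPlanner`, the method `filesToReprocess`, and the in-memory representation of the state file as a map from input file name to contained log dates are all hypothetical.

```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of the actual commit. */
final class ReprocessPlanner {

  private ReprocessPlanner() {
  }

  /**
   * Returns the names of previously processed files that need to be read
   * again because they contain log lines from a date that is also covered
   * by at least one new file.
   *
   * @param previousState assumed content of the state file: input file name
   *     mapped to the log dates found in that file during an earlier run
   * @param newFiles input files not seen before, mapped to their log dates
   */
  static SortedSet<String> filesToReprocess(
      Map<String, Set<LocalDate>> previousState,
      Map<String, Set<LocalDate>> newFiles) {
    // Collect all dates for which new log lines have arrived.
    Set<LocalDate> newlyWrittenDates = new TreeSet<>();
    for (Set<LocalDate> dates : newFiles.values()) {
      newlyWrittenDates.addAll(dates);
    }
    SortedSet<String> result = new TreeSet<>();
    for (Map.Entry<String, Set<LocalDate>> entry : previousState.entrySet()) {
      if (newFiles.containsKey(entry.getKey())) {
        continue; // Treated as new; it gets processed anyway.
      }
      // Re-process only if the old file shares a date with the new files.
      if (!Collections.disjoint(entry.getValue(), newlyWrittenDates)) {
        result.add(entry.getKey());
      }
    }
    return result;
  }
}
```

With this rule, old files left behind by a host that no longer serves requests are never selected, because no new file shares a date with them.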
Showing 3 changed files:
- CHANGELOG.md: 1 addition, 0 deletions
- src/main/java/org/torproject/metrics/collector/webstats/LogMetadata.java: 17 additions, 0 deletions
- src/main/java/org/torproject/metrics/collector/webstats/SanitizeWeblogs.java: 163 additions, 39 deletions