<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/karsten/collector, branch collector-1.14.0</title>
<subtitle>Karsten's collector repository</subtitle>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/'/>
<entry>
<title>Prepare for 1.14.0 release.</title>
<updated>2020-01-15T22:07:02+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2020-01-15T22:07:02+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=3a9f05e01f1abe8315d25c780b3aab72376412f5'/>
<id>3a9f05e01f1abe8315d25c780b3aab72376412f5</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Update to metrics-lib 2.10.0.</title>
<updated>2020-01-15T21:59:26+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2020-01-15T21:58:17+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=27e41ea7393c24bc72429915dd74c309c2583fa2'/>
<id>27e41ea7393c24bc72429915dd74c309c2583fa2</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Remember processed files between module runs.</title>
<updated>2020-01-15T21:56:30+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2020-01-07T12:30:28+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=741401a0daffda52fd1de81b29b276ed9e939ba5'/>
<id>741401a0daffda52fd1de81b29b276ed9e939ba5</id>
<content type='text'>
The three recently added modules to archive Snowflake statistics,
bridge pool assignments, and BridgeDB metrics have in common that they
process any input files regardless of whether they have already
processed them.

The problem is that the input files processed by these modules are
either never removed (Snowflake statistics) or only removed manually
by the operator (bridge pool assignments and BridgeDB metrics).

The effect is that non-recent BridgeDB metrics and bridge pool
assignments are being placed in the indexed/recent/ directory in the
next execution after they are deleted for being older than 72 hours.
The same would happen with Snowflake statistics after the operator
removes them from the out/ directory.

The fix is to use a state file containing file names of previously
processed files and only process a file not found in there. This is
the same approach as taken for bridge descriptor tarballs.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The three recently added modules to archive Snowflake statistics,
bridge pool assignments, and BridgeDB metrics have in common that they
process any input files regardless of whether they have already
processed them.

The problem is that the input files processed by these modules are
either never removed (Snowflake statistics) or only removed manually
by the operator (bridge pool assignments and BridgeDB metrics).

The effect is that non-recent BridgeDB metrics and bridge pool
assignments are being placed in the indexed/recent/ directory in the
next execution after they are deleted for being older than 72 hours.
The same would happen with Snowflake statistics after the operator
removes them from the out/ directory.

The fix is to use a state file containing file names of previously
processed files and only process a file not found in there. This is
the same approach as taken for bridge descriptor tarballs.
</pre>
</div>
</content>
</entry>
<entry>
<title>Update copyright to 2020.</title>
<updated>2020-01-15T20:36:34+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2020-01-15T20:36:34+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=d2a74b676a0d1c8563638ca3607a866b95877949'/>
<id>d2a74b676a0d1c8563638ca3607a866b95877949</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Avoid reprocessing webstats files.</title>
<updated>2020-01-14T16:03:24+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-12-11T11:22:40+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=d48163379c2604626a62da775aafe68b5be62186'/>
<id>d48163379c2604626a62da775aafe68b5be62186</id>
<content type='text'>
Web servers typically provide us with the last 14 days of request
logs. We shouldn't process the whole 14 days over and over. Instead we
should only process new log files and any other log files containing
log lines from newly written dates.

In some cases web servers stop serving a given virtual host or stop
acting as a web server at all. However, in these cases we're left with
14 days of logs per virtual host. Ideally, these logs would get
cleaned up, but until that's the case, we should at least not
reprocess these files over and over.

In order to avoid reprocessing webstats files, we need a new state
file with log dates contained in given input files. We use that state
file to determine which of the previously processed webstats files to
re-process, so that we can write complete daily logs.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Web servers typically provide us with the last 14 days of request
logs. We shouldn't process the whole 14 days over and over. Instead we
should only process new log files and any other log files containing
log lines from newly written dates.

In some cases web servers stop serving a given virtual host or stop
acting as a web server at all. However, in these cases we're left with
14 days of logs per virtual host. Ideally, these logs would get
cleaned up, but until that's the case, we should at least not
reprocess these files over and over.

In order to avoid reprocessing webstats files, we need a new state
file with log dates contained in given input files. We use that state
file to determine which of the previously processed webstats files to
re-process, so that we can write complete daily logs.
</pre>
</div>
</content>
</entry>
<entry>
<title>Add some real tests for the webstats module.</title>
<updated>2020-01-14T16:03:17+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-12-11T11:16:05+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=3002d6bc6b6bf84953cf842cbf6b3b18dc944879'/>
<id>3002d6bc6b6bf84953cf842cbf6b3b18dc944879</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on metrics-lib's log package (4/4).</title>
<updated>2019-11-25T16:02:07+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-23T17:07:41+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=8263cc7bdbb0a632f12a84fb2051dd9a25c28142'/>
<id>8263cc7bdbb0a632f12a84fb2051dd9a25c28142</id>
<content type='text'>
 - Remove package-internal abstract class.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Remove package-internal abstract class.
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on metrics-lib's log package (3/4).</title>
<updated>2019-11-25T16:01:09+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-23T16:55:17+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=c11b61465a644940559b97c93a769fda84287970'/>
<id>c11b61465a644940559b97c93a769fda84287970</id>
<content type='text'>
 - Remove package-internal interfaces InternalLogDescriptor and
   InternalWebServerAccessLog.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Remove package-internal interfaces InternalLogDescriptor and
   InternalWebServerAccessLog.
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on metrics-lib's log package (2/4).</title>
<updated>2019-11-25T16:01:00+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-23T16:43:45+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=ea1b1b4f6ab11e7ac933b0e00d5f8c040e4cc11e'/>
<id>ea1b1b4f6ab11e7ac933b0e00d5f8c040e4cc11e</id>
<content type='text'>
 - Remove unused code.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Remove unused code.
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on metrics-lib's log package (1/4).</title>
<updated>2019-11-25T16:00:44+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-23T16:13:22+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=859476ecaec2164e0d84bbba4377da11c90034b2'/>
<id>859476ecaec2164e0d84bbba4377da11c90034b2</id>
<content type='text'>
 - Copy types from metrics-lib to this code base.
 - Update package and import statements.
 - Copy remaining parts of metrics-lib's FileType.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Copy types from metrics-lib to this code base.
 - Update package and import statements.
 - Copy remaining parts of metrics-lib's FileType.
</pre>
</div>
</content>
</entry>
</feed>
