<feed xmlns='http://www.w3.org/2005/Atom'>
<title>user/karsten/collector, branch task-32747</title>
<subtitle>Karsten's collector repository</subtitle>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/'/>
<entry>
<title>squash! Avoid reprocessing webstats files.</title>
<updated>2020-01-09T11:46:09+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2020-01-09T11:46:09+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=76db49572f55b4058f9d8c29f983a007f159dd4e'/>
<id>76db49572f55b4058f9d8c29f983a007f159dd4e</id>
<content type='text'>
 - Overwrite the state file using the default options CREATE,
   TRUNCATE_EXISTING, and WRITE rather than just CREATE. With CREATE
   alone, the existing file was not truncated, so we kept lines from
   before. That was not a big issue, but also not what we wanted.
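
A minimal sketch of the difference, assuming Java 11's
Files.writeString() and a hypothetical temporary state file: without
TRUNCATE_EXISTING, an existing file is opened without being truncated,
so any bytes beyond the new content survive. Passing CREATE,
TRUNCATE_EXISTING, and WRITE (or no options at all) truncates first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch: overwriting a (hypothetical) state file with and without truncation. */
public class StateFileOverwrite {

  /** Opens with CREATE only: an existing file is not truncated first. */
  static void writeCreateOnly(Path stateFile, String content) throws IOException {
    Files.writeString(stateFile, content, StandardOpenOption.CREATE);
  }

  /** Opens with CREATE, TRUNCATE_EXISTING, and WRITE, the same as passing no options. */
  static void writeTruncating(Path stateFile, String content) throws IOException {
    Files.writeString(stateFile, content, StandardOpenOption.CREATE,
        StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
  }

  public static void main(String[] args) throws IOException {
    Path stateFile = Files.createTempFile("webstats-state", ".txt");
    try {
      Files.writeString(stateFile, "old-line-1\nold-line-2\n");
      writeCreateOnly(stateFile, "new-line\n");
      // Bytes past the new content are left over from the old file.
      System.out.print(Files.readString(stateFile));
      writeTruncating(stateFile, "new-line\n");
      // Only the new content remains.
      System.out.print(Files.readString(stateFile));
    } finally {
      Files.deleteIfExists(stateFile);
    }
  }
}
```

In the CREATE-only case, the shorter new content overwrites the start
of the old file, and the remainder of the old lines is still there.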
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Overwrite the state file using the default options CREATE,
   TRUNCATE_EXISTING, and WRITE rather than just CREATE. With CREATE
   alone, the existing file was not truncated, so we kept lines from
   before. That was not a big issue, but also not what we wanted.
</pre>
</div>
</content>
</entry>
<entry>
<title>squash! Avoid reprocessing webstats files.</title>
<updated>2020-01-09T10:43:04+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2020-01-09T10:28:36+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=94b87099197890c1614b301a3c493f47c8003f02'/>
<id>94b87099197890c1614b301a3c493f47c8003f02</id>
<content type='text'>
 - Avoid duplicating code from WebServerAccessLogPersistence for
   calculating storage paths.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Avoid duplicating code from WebServerAccessLogPersistence for
   calculating storage paths.
</pre>
</div>
</content>
</entry>
<entry>
<title>Avoid reprocessing webstats files.</title>
<updated>2019-12-13T09:47:57+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-12-11T11:22:40+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=d7117f8c8ee946748eea4d2f2741195d4dbfe056'/>
<id>d7117f8c8ee946748eea4d2f2741195d4dbfe056</id>
<content type='text'>
Web servers typically provide us with the last 14 days of request
logs. We shouldn't process the whole 14 days over and over. Instead, we
should only process new log files and any other log files containing
log lines from newly written dates.

In some cases web servers stop serving a given virtual host or stop
acting as a web server altogether. In these cases, however, we're left with
14 days of logs per virtual host. Ideally, these logs would get
cleaned up, but until that's the case, we should at least not
reprocess these files over and over.

To avoid reprocessing webstats files, we need a new state file that
records which log dates each input file contains. We use that state
file to determine which of the previously processed webstats files to
re-process, so that we can write complete daily logs.
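
A minimal sketch of the selection step, assuming the state file has
already been parsed into a map from each previously processed file
name to the log dates it contains, and that the dates in each new file
are known. The class, method, and file names are hypothetical, and for
simplicity all files are treated as belonging to a single virtual
host. New files are always processed; an old file is re-processed only
if it shares a date with new input.
<![CDATA[
```java
import java.time.LocalDate;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

/** Sketch: pick webstats log files to (re-)process from a date state map. */
public class ReprocessSelector {

  static SortedSet<String> filesToProcess(
      Map<String, Set<LocalDate>> processedDates,
      Map<String, Set<LocalDate>> newFileDates) {
    // Collect all dates that appear in new input files.
    Set<LocalDate> newlyWrittenDates = new HashSet<>();
    for (Set<LocalDate> dates : newFileDates.values()) {
      newlyWrittenDates.addAll(dates);
    }
    // New files are always processed.
    SortedSet<String> result = new TreeSet<>(newFileDates.keySet());
    // Old files are re-processed only if they contain one of those dates,
    // so that each rewritten daily log is complete.
    for (Map.Entry<String, Set<LocalDate>> entry : processedDates.entrySet()) {
      if (!Collections.disjoint(entry.getValue(), newlyWrittenDates)) {
        result.add(entry.getKey());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    LocalDate d1 = LocalDate.parse("2019-12-09");
    LocalDate d2 = LocalDate.parse("2019-12-10");
    LocalDate d3 = LocalDate.parse("2019-12-11");
    Map<String, Set<LocalDate>> processed = Map.of(
        "access.log-20191210", Set.of(d1, d2),
        "access.log-20191209", Set.of(d1));
    Map<String, Set<LocalDate>> fresh = Map.of(
        "access.log-20191211", Set.of(d2, d3));
    System.out.println(filesToProcess(processed, fresh));
  }
}
```
]]>
Running main() prints [access.log-20191210, access.log-20191211]: the
file containing only 2019-12-09 is skipped because no new input touches
that date.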
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Web servers typically provide us with the last 14 days of request
logs. We shouldn't process the whole 14 days over and over. Instead, we
should only process new log files and any other log files containing
log lines from newly written dates.

In some cases web servers stop serving a given virtual host or stop
acting as a web server altogether. In these cases, however, we're left with
14 days of logs per virtual host. Ideally, these logs would get
cleaned up, but until that's the case, we should at least not
reprocess these files over and over.

To avoid reprocessing webstats files, we need a new state file that
records which log dates each input file contains. We use that state
file to determine which of the previously processed webstats files to
re-process, so that we can write complete daily logs.
</pre>
</div>
</content>
</entry>
<entry>
<title>Add some real tests for the webstats module.</title>
<updated>2019-12-12T15:48:38+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-12-11T11:16:05+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=d5ca95a2bb74410004f5c4c93270f3fd90475068'/>
<id>d5ca95a2bb74410004f5c4c93270f3fd90475068</id>
<content type='text'>
Temporary commit: requires an update to metrics-base commit 264e498 as
soon as that commit is merged into master.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Temporary commit: requires an update to metrics-base commit 264e498 as
soon as that commit is merged into master.
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on metrics-lib's log package (4/4).</title>
<updated>2019-11-25T16:02:07+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-23T17:07:41+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=8263cc7bdbb0a632f12a84fb2051dd9a25c28142'/>
<id>8263cc7bdbb0a632f12a84fb2051dd9a25c28142</id>
<content type='text'>
 - Remove package-internal abstract class.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Remove package-internal abstract class.
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on metrics-lib's log package (3/4).</title>
<updated>2019-11-25T16:01:09+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-23T16:55:17+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=c11b61465a644940559b97c93a769fda84287970'/>
<id>c11b61465a644940559b97c93a769fda84287970</id>
<content type='text'>
 - Remove package-internal interfaces InternalLogDescriptor and
   InternalWebServerAccessLog.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Remove package-internal interfaces InternalLogDescriptor and
   InternalWebServerAccessLog.
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on metrics-lib's log package (2/4).</title>
<updated>2019-11-25T16:01:00+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-23T16:43:45+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=ea1b1b4f6ab11e7ac933b0e00d5f8c040e4cc11e'/>
<id>ea1b1b4f6ab11e7ac933b0e00d5f8c040e4cc11e</id>
<content type='text'>
 - Remove unused code.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Remove unused code.
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on metrics-lib's log package (1/4).</title>
<updated>2019-11-25T16:00:44+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-23T16:13:22+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=859476ecaec2164e0d84bbba4377da11c90034b2'/>
<id>859476ecaec2164e0d84bbba4377da11c90034b2</id>
<content type='text'>
 - Copy types from metrics-lib to this code base.
 - Update package and import statements.
 - Copy remaining parts of metrics-lib's FileType.
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - Copy types from metrics-lib to this code base.
 - Update package and import statements.
 - Copy remaining parts of metrics-lib's FileType.
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on DescriptorIndexCollector.</title>
<updated>2019-11-22T17:49:46+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-22T17:49:46+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=cc3aa57e57cdcd632a6a8a360f08880f0ec57242'/>
<id>cc3aa57e57cdcd632a6a8a360f08880f0ec57242</id>
<content type='text'>
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
</pre>
</div>
</content>
</entry>
<entry>
<title>Remove dependency on metrics-lib's internal package.</title>
<updated>2019-11-22T17:01:11+00:00</updated>
<author>
<name>Karsten Loesing</name>
<email>karsten.loesing@gmx.net</email>
</author>
<published>2019-11-22T16:54:26+00:00</published>
<link rel='alternate' type='text/html' href='https://gitweb.torproject.org/user/karsten/collector.git/commit/?id=5a0e6be21c2de4b35e4111364b3ecd7caf424164'/>
<id>5a0e6be21c2de4b35e4111364b3ecd7caf424164</id>
<content type='text'>
The only functionality contained in metrics-lib's internal package is
file (de-)compression, which in turn uses a third-party library that
we're using anyway. This is a weak reason for depending on our own
library for this functionality. Removing this dependency will make it
easier to change our library in the future.

The new FileType class is based on a copy of the same enum type in
metrics-lib without @since tags and without methods that we don't use.
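
A minimal sketch of what such an enum can look like, for illustration
only: FileTypeSketch is a hypothetical name, only PLAIN and GZIP are
shown, and it uses the JDK's java.util.zip streams rather than the
third-party library mentioned above, so it is not the copied FileType
class itself. Each constant carries its file extension and knows how
to wrap input and output streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical, trimmed-down file type enum with (de-)compression. */
public enum FileTypeSketch {
  PLAIN("") {
    @Override
    InputStream inputStream(InputStream in) {
      return in;
    }

    @Override
    OutputStream outputStream(OutputStream out) {
      return out;
    }
  },
  GZIP(".gz") {
    @Override
    InputStream inputStream(InputStream in) throws IOException {
      return new GZIPInputStream(in);
    }

    @Override
    OutputStream outputStream(OutputStream out) throws IOException {
      return new GZIPOutputStream(out);
    }
  };

  /** File name extension for this type, e.g. ".gz". */
  final String extension;

  FileTypeSketch(String extension) {
    this.extension = extension;
  }

  abstract InputStream inputStream(InputStream in) throws IOException;

  abstract OutputStream outputStream(OutputStream out) throws IOException;

  /** Compresses bytes; closing the stream flushes any trailer. */
  byte[] compress(byte[] bytes) throws IOException {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (OutputStream out = outputStream(buffer)) {
      out.write(bytes);
    }
    return buffer.toByteArray();
  }

  /** Decompresses bytes produced by compress(). */
  byte[] decompress(byte[] bytes) throws IOException {
    try (InputStream in = inputStream(new ByteArrayInputStream(bytes))) {
      return in.readAllBytes();
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] original = "GET / HTTP/1.1\n".getBytes(StandardCharsets.UTF_8);
    for (FileTypeSketch type : values()) {
      byte[] roundTrip = type.decompress(type.compress(original));
      System.out.println(type + " round trip ok: "
          + Arrays.equals(original, roundTrip));
    }
  }
}
```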
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The only functionality contained in metrics-lib's internal package is
file (de-)compression, which in turn uses a third-party library that
we're using anyway. This is a weak reason for depending on our own
library for this functionality. Removing this dependency will make it
easier to change our library in the future.

The new FileType class is based on a copy of the same enum type in
metrics-lib without @since tags and without methods that we don't use.
</pre>
</div>
</content>
</entry>
</feed>
