Discussion:
[arch-mirrors] Mirror Stats Project - Using weblogs to gather useful information
Tyler Dence
2017-08-22 02:53:08 UTC
I've made a post about this topic on the arch forums, but I thought I'd
bring it to the attention of the mailing list here.

So I run a mirror over at https://arlm.tyzoid.com/, and I decided it would
be cool to see if I can't get some interesting information back from the
logs generated.

Currently, I have a list of the most downloaded packages (compiled nightly):

- https://stats.arlm.tyzoid.com/pkgstats.json
- https://stats.arlm.tyzoid.com/pkgstats.txt
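
For anyone who wants to produce a similar list from their own mirror, here is a rough sketch of how such a count could be compiled. It is not the script behind pkgstats.json. It assumes the web server writes the common "combined" log format and that package files are named name-pkgver-pkgrel-arch.pkg.tar.xz; the function name count_downloads is made up for illustration.

```python
import re
from collections import Counter

# Assumption: "combined" log format (client IP, ident, user, [time],
# "request", status, size, "referer", "user agent").
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# Assumption: package files are named name-pkgver-pkgrel-arch.pkg.tar.xz.
# The trailing $ keeps .sig requests out of the count.
PKG_RE = re.compile(r'/(?P<name>[^/]+)-[^-/]+-[^-/]+-[^-/]+\.pkg\.tar\.xz$')


def count_downloads(lines):
    """Count successful GET requests per package name."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line.strip())
        if not m or m.group("method") != "GET" or m.group("status") != "200":
            continue
        pkg = PKG_RE.search(m.group("path"))
        if pkg:
            counts[pkg.group("name")] += 1
    return counts
```

Passing an open log file (for example access.log) to count_downloads and calling most_common(10) on the result would give a top-ten list. Note that this counts requests, not distinct users, so a single client fetching the same file repeatedly inflates the numbers.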


And a graph of network traffic:

- https://stats.arlm.tyzoid.com/


I'm wondering if there's any other information that might be interesting to
make available, or if anyone else here is interested in contributing some
of their own collected data for the stats project. If so, we should get a
project up on GitHub sometime to have a unified method/web viewer.

Some caveats that I impose for now, to protect privacy:

- IP addresses will not be made available, nor any IP prefix.
- Geographic data will be made available at no finer than weekly
granularity (with an exception for summary data that spans multiple weeks).
- Geographic data will not be made available at finer granularity than
state/province.
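
To make these rules concrete, here is a minimal sketch of an aggregation step that would respect them, assuming timestamps in the usual access-log form. The names week_bucket and aggregate are made up, and region_of stands in for whatever IP-to-state/province lookup a mirror admin already has; no particular GeoIP library is implied.

```python
from collections import Counter
from datetime import datetime


def week_bucket(timestamp):
    """Coarsen an access-log timestamp to its ISO year and week."""
    dt = datetime.strptime(timestamp, "%d/%b/%Y:%H:%M:%S %z")
    year, week, _ = dt.isocalendar()
    return f"{year}-W{week:02d}"


def aggregate(records, region_of):
    """Count requests per (week, region).

    records is an iterable of (ip, timestamp) pairs. region_of is a
    caller-supplied function mapping an IP to a state/province label
    (hypothetical; bring your own lookup). The IP is used only for the
    lookup and never appears in the result.
    """
    counts = Counter()
    for ip, timestamp in records:
        counts[(week_bucket(timestamp), region_of(ip))] += 1
    return counts
```

Only the (week, region) keys and their counts would be published, so neither addresses nor prefixes leave the mirror. Regions with very few requests in a week might still deserve suppression, which this sketch does not attempt.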

Perhaps there should also be some discussion regarding what logs are
kept/deleted?
Miłosz Tyborowski
2017-08-22 04:33:05 UTC
The mere fact that cryptsetup has been downloaded over 12 times more often
than the linux package is interesting, to say the least.
Tyler Dence
2017-08-22 11:23:07 UTC
My best guess is that it's some monitoring software package. Many of these
requests carry the user agent Python-urllib/3.6, which differs from the
user agent that pacman sends:

$ grep cryptsetup access.log |tail -n 5| cut -f4- -d' '
[22/Aug/2017:07:14:01 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"
[22/Aug/2017:07:17:34 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"
[22/Aug/2017:07:18:15 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"
[22/Aug/2017:07:18:43 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"
[22/Aug/2017:07:19:30 -0400] "GET /core/os/x86_64/cryptsetup-1.7.5-1-x86_64.pkg.tar.xz HTTP/1.1" 200 246880 "-" "Python-urllib/3.6"

Not shown is the IP of each request, but they are all unique.

vs

$ grep archlinux-keyring access.log |tail -n 5| cut -f4- -d' '
[21/Aug/2017:15:02:03 -0400] "GET /core/os/x86_64/archlinux-keyring-20170611-1-any.pkg.tar.xz HTTP/1.1" 200 677725 "-" "python-requests/2.18.1"
[21/Aug/2017:16:12:31 -0400] "GET /core/os/x86_64/archlinux-keyring-20170611-1-any.pkg.tar.xz HTTP/1.1" 200 677725 "-" "python-requests/2.18.1"
[21/Aug/2017:17:21:07 -0400] "GET /core/os/x86_64/archlinux-keyring-20170611-1-any.pkg.tar.xz HTTP/1.1" 200 677669 "-" "pacman/5.0.2 (Linux x86_64) libalpm/10.0.2"
[22/Aug/2017:03:48:54 -0400] "GET /core/os/i686/archlinux-keyring-20170611-1-any.pkg.tar.xz HTTP/1.1" 200 677669 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
[22/Aug/2017:03:53:22 -0400] "GET /core/os/i686/archlinux-keyring-20170611-1-any.pkg.tar.xz.sig HTTP/1.1" 200 554 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

That said, I'm still very much interested in finding the root cause.
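
In case other mirror admins want to check their own logs for the same pattern, here is a small sketch that tallies user agents and distinct client IPs for requests matching a package name, roughly the grep above taken one step further. It assumes the "combined" log format (which matches the three leading fields that cut removes above); the function name agents_for is made up.

```python
import re
from collections import Counter

# Assumption: "combined" log format; only GET requests are considered.
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"GET (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def agents_for(lines, needle):
    """Tally user agents for GET requests whose path contains needle.

    Returns (requests per agent, distinct client IPs per agent).
    """
    requests = Counter()
    ips_by_agent = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or needle not in m.group("path"):
            continue
        agent = m.group("agent")
        requests[agent] += 1
        ips_by_agent.setdefault(agent, set()).add(m.group("ip"))
    distinct_ips = {agent: len(ips) for agent, ips in ips_by_agent.items()}
    return requests, distinct_ips
```

If the request count and the distinct-IP count for an agent are close to equal, as with the Python-urllib requests here, the traffic is probably many separate clients rather than one host in a loop.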