From effa01f8560800a78aad5835c09b4f61f3547a59 Mon Sep 17 00:00:00 2001
From: Yarmo Mackenbach
Date: Mon, 13 Jul 2020 23:20:19 +0200
Subject: [PATCH] Add note

---
 .../07/2020-07-13--plausible-versus-logs.md   | 82 +++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100644 content/notes/2020/07/2020-07-13--plausible-versus-logs.md

diff --git a/content/notes/2020/07/2020-07-13--plausible-versus-logs.md b/content/notes/2020/07/2020-07-13--plausible-versus-logs.md
new file mode 100644
index 0000000..08e2464
--- /dev/null
+++ b/content/notes/2020/07/2020-07-13--plausible-versus-logs.md
@@ -0,0 +1,82 @@
+---
+title: "Quick comparison: Plausible vs logs"
+author: Yarmo Mackenbach
+slug: plausible-versus-logs
+date: "2020-07-13 23:19:39"
+published: true
+---
+
+About a month ago, I started collecting website usage data using both [Plausible.io](https://plausible.io) and logs generated by [Caddyserver](https://caddyserver.com), my reverse proxy. The goal was to compare the data sources, just like [Marko Saric](https://markosaric.com/) did in a [post on the Plausible blog](https://plausible.io/blog/server-log-analysis).
+
+Here's a quick overview of the results. For more details, read the post mentioned above: the results are nearly identical, and Marko does a great job explaining them.
+
+## Results
+
+### Quantitative data
+
+The table below summarizes key metrics computed by both Plausible and [GoAccess](https://goaccess.io) (based on Caddyserver logs). The data was collected between June 13th and July 13th.
+
+| Metric     | Plausible.io | Logs + GoAccess | Δ factor       |
+| :--------- | :----------- | :-------------- | :------------- |
+| Visitors   | 32.1k        | 76.9k           | x2.4           |
+| Pageviews  | 44.5k        | 468.6k          | x10.5          |
+| Bandwidth  | -            | 16.6 GiB        | -              |
+
+Just as Marko noticed, logs show much higher numbers of visitors and pageviews, likely due to crawlers and bots that show up in the logs but do not run JavaScript and therefore are not picked up by Plausible.
+
+I could compare other metrics like referrers and top pages, but again, I suggest you read the [post on the Plausible blog](https://plausible.io/blog/server-log-analysis).
+
+I'd like to add that the logs can provide some information about bandwidth usage and which files are downloaded the most. This would allow you to make informed decisions when optimizing caching and file loading. Plausible can't help you with this data; you need logs for it.
+
+### Qualitative data
+
+The experience with Plausible was more convenient than with GoAccess, as the website of the former loads in seconds whilst the latter took 3 minutes to process the logs and generate the results.
+
+## Conclusion
+
+Both methods have advantages and disadvantages. Plausible gives fast and precise results but potentially impacts page load (although minimally). Server logs don't impact page load and can provide bandwidth stats, but they inflate numbers due to traffic noise generated by search engines, crawlers and bots. Personally, I will continue using both for the foreseeable future.
+
+## Methodology
+
+### Plausible
+
+Visit the [Plausible.io](https://plausible.io) website and simply look at the website's stats.
+
+### Caddy logs
+
+Logs were collected using the following snippet in the Caddyfile:
+
+```
+log {
+  output file /var/log/caddy/access.log {
+    roll_size 100MiB
+    roll_keep 10
+    roll_keep_for 2160h
+  }
+}
+```
+
+### GoAccess
+
+As GoAccess cannot read Caddy logs directly, a small bash script is needed to convert the JSON entries to CSV, skip selected IP addresses, and keep only the last 30 days:
+
+```
+today_date=$(date -u +"%Y-%m-%d")
+start_date=$(date -u --date="$today_date -30 day" +"%Y-%m-%d") # 30 days ago
+start_ts=$(date -u -d "$start_date" +%s) # cutoff as a Unix timestamp
+
+goaccess <(zcat -f logs/access* | jq --raw-output '
+  .request.remote_addr |= .[:-6] |
+  select(.request.remote_addr != "1.1.1.1") |
+  select(.request.remote_addr != "2.2.2.2") |
+  select(.ts >= '$start_ts') |
+  [
+    .common_log,
+    .request.headers.Referer[0] // "-",
+    .request.headers."User-Agent"[0],
+    .duration
+  ] | @csv') \
+  --log-format='"%h - - [%d:%t %^] ""%m %r %H"" %s %b","%R","%u",%T' --time-format='%H:%M:%S' --date-format='%d/%b/%Y'
+```
+
+This was adapted from the bash script described by [Alessandro](https://fosstodon.org/@AlexMV12) in this [blog post](https://alexmv12.xyz/blog/goaccess_caddy/).
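
For readers who find the jq filter in the GoAccess script hard to follow, here is a minimal Python sketch of the same transformation. It is not part of the note or of the script it was adapted from. The field names (`ts`, `request.remote_addr`, `request.headers`, `common_log`, `duration`) are taken from the jq filter above; the function names `caddy_entry_to_row` and `entries_to_csv` are hypothetical. One deliberate difference: the port is removed by splitting on the last colon rather than by dropping a fixed six characters, so it does not assume a five-digit port.

```python
import csv
import io
import json


def caddy_entry_to_row(entry, cutoff_ts, excluded_ips=()):
    """Return the four CSV fields for one Caddy JSON log entry, or None to skip it."""
    # remote_addr looks like "203.0.113.7:54321"; keep everything before the last colon.
    ip = entry["request"]["remote_addr"].rpartition(":")[0]
    # Mirror the select() filters: drop excluded addresses and entries older than the cutoff.
    if ip in excluded_ips or entry["ts"] < cutoff_ts:
        return None
    headers = entry["request"].get("headers", {})
    referer = headers.get("Referer", ["-"])[0]       # jq: .Referer[0] // "-"
    user_agent = headers.get("User-Agent", [""])[0]  # jq's @csv renders null as an empty field
    return [entry["common_log"], referer, user_agent, entry["duration"]]


def entries_to_csv(lines, cutoff_ts, excluded_ips=()):
    """Convert an iterable of JSON log lines to CSV text, one row per kept entry."""
    out = io.StringIO()
    # Quote strings and leave numbers bare, which is the shape jq's @csv produces.
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for line in lines:
        row = caddy_entry_to_row(json.loads(line), cutoff_ts, excluded_ips)
        if row is not None:
            writer.writerow(row)
    return out.getvalue()
```

Each row has the quoted-string and bare-number layout that the `--log-format` string in the script describes, so the output should be usable in place of the jq step, although that has not been tested against GoAccess here.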