How to Set up Auto-Archiving of Your Reports
If your website has more than a few hundred visits per day (bravo!), waiting for Matomo to process your data may take a few minutes. The best way to avoid these waiting times is to set up a cron job on your server so that your data is automatically processed every hour.
If you are using Matomo for WordPress, you don’t need to do this as it utilises the WP Cron. If you are on the Matomo Cloud, this is automatically taken care of for you as well.
To automatically trigger the Matomo archives, you can set up a script that will execute every hour.
There are instructions below for Linux/Unix systems using a crontab, for Windows users with the Windows Task Scheduler, and for tools such as CPanel. If you don’t have access to the server, you can also set up a web cron.
Linux/Unix: How to Set up a Crontab to Automatically Archive the Reports
A crontab is a time-based scheduling service in a Unix-like server. The crontab requires php-cli or php-cgi to be installed. You will also need SSH access to your server in order to set it up. Let’s create a new crontab with the text editor nano:
nano /etc/cron.d/matomo-archive
and then add the lines:
MAILTO="[email protected]"
5 * * * * www-data /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /home/example/matomo-archive.log
The Matomo archive script will run every hour (at 5 minutes past). Generally, it completes in less than one minute. On larger websites (10,000 visits and more), Matomo archiving can take up to 30 minutes.
Breakdown of the parameters:
- MAILTO="[email protected]": if there is an error during the script execution, the script output and error messages will be sent to the [email protected] address.
- www-data is the user that the cron job will be executed by. The user is sometimes “apache”. It is recommended to run your crontab as the same user as your web server user (to avoid file permission mismatches).
- /usr/bin/php is the path to your PHP executable. It varies depending on your server configuration and operating system. You can execute the command “which php” in a Linux shell to find out the path of your PHP executable. If you don’t know the path, ask your web host or sysadmin.
- /path/to/matomo/console is the path to your Matomo app on your server. For example, it may be /var/www/matomo/console.
- --url=http://example.org/matomo/ is the only required parameter in the script. It must be set to your Matomo base URL, e.g. http://analytics.example.org/ or http://example.org/matomo/
- > /home/example/matomo-archive.log is the path where the script will write the output. You can replace this path with /dev/null if you prefer not to log the last Matomo cron output text. The script output contains useful information such as which websites are archived, how long it takes to process each date and website, etc. This log file should be written to a location outside your web server's document root so that people cannot view it via their browser (because this log file will contain some sensitive information about your Matomo installation). You can also replace > with >> in order to append the script output to the log file, rather than overwrite it on each run (but then we recommend you rotate this log file or delete it, e.g. once a week).
- 2> /home/example/matomo-archive-errors.log is the optional path where the script will write the error messages. If you omit this from the crontab, errors will be emailed to your MAILTO address. If you include it in the crontab, errors will be logged in the specified error log file. This log file should also be written to a location outside your web server's document root so that people cannot view it via their browser (because it will contain some sensitive information about your Matomo installation).
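The advice above to rotate the log when appending with >> can be sketched as a small shell helper. This is only an illustration under assumptions: the function name rotate_if_large and the size limit are hypothetical, and a tool such as logrotate may suit your server better.

```shell
#!/bin/sh
# Hypothetical helper (not part of Matomo): rotate a log file once it grows
# past a size limit, keeping a single previous copy as "<logfile>.1".
rotate_if_large() (
    logfile=$1
    max_bytes=$2
    [ -f "$logfile" ] || exit 0                 # nothing to rotate yet
    size=$(( $(wc -c < "$logfile") ))           # current size in bytes
    if [ "$size" -gt "$max_bytes" ]; then
        mv -f "$logfile" "$logfile.1"           # replaces any older copy
    fi
)

# Example use from a wrapper script called by cron (paths are placeholders):
#   rotate_if_large /home/example/matomo-archive.log 10485760
#   /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ >> /home/example/matomo-archive.log
```

Calling the helper before each run keeps the appended log bounded without losing the most recent history.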
Description of the ‘linux cron’ utility: The cron utility uses two different types of configuration files: the system crontab and user crontabs. The only difference between these two formats is the sixth field.
- In the system crontab, the sixth field is the name of a user for the command to run as. This gives the system crontab the ability to run commands as any user.
- In a user crontab, the sixth field is the command to run, and all commands run as the user who created the crontab; this is an important security feature.
If you set up your crontab as a user crontab, you would instead write:
5 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
This cron job will trigger the day/week/month/year archiving process at 5 minutes past every hour. This will make sure that when you visit your Matomo dashboard, the data has already been processed; Matomo will load quickly.
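If you prefer to install the user crontab entry without opening an editor, the following sketch shows one way to do it. The function name add_cron_entry is hypothetical; it only reads a crontab from standard input and prints the merged result, so nothing changes until you pipe it back into crontab yourself.

```shell
#!/bin/sh
# Hypothetical helper: read an existing crontab on stdin and print it with
# the given entry appended, unless that exact line is already present.
add_cron_entry() (
    entry=$1
    existing=$(cat)
    if printf '%s\n' "$existing" | grep -Fqx -e "$entry"; then
        printf '%s\n' "$existing"               # already installed, keep as-is
    elif [ -z "$existing" ]; then
        printf '%s\n' "$entry"                  # empty crontab, entry only
    else
        printf '%s\n%s\n' "$existing" "$entry"  # append to existing entries
    fi
)

# Example (run as the web server user; the paths are placeholders):
#   entry='5 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null'
#   crontab -l 2>/dev/null | add_cron_entry "$entry" | crontab -
```

Because the helper skips a line that is already present, running it twice should not create a duplicate job.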
Test the cron command
Make sure the crontab will actually work by running the script as the crontab user in the shell:
su www-data -s /bin/bash -c "/usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/"
You should see the script output with the list of websites being archived, and a summary at the end stating that there was no error.
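When testing, it can help to keep the output in a file and look at the exit status as well. The sketch below is a generic wrapper with a hypothetical name, run_and_report. It assumes the command signals failure through a non-zero exit status, which is the usual shell convention but is not stated in this guide, so still read the summary in the saved output.

```shell
#!/bin/sh
# Hypothetical helper: run a command, save stdout and stderr to a file,
# and report the exit status of the command.
run_and_report() (
    log=$1
    shift
    status=0
    "$@" > "$log" 2>&1 || status=$?
    if [ "$status" -eq 0 ]; then
        echo "OK: command exited with status 0 (output in $log)"
    else
        echo "FAILED: command exited with status $status (output in $log)"
    fi
    exit "$status"
)

# Example (same test command as above, paths are placeholders):
#   run_and_report /tmp/matomo-archive-test.log \
#       su www-data -s /bin/bash -c "/usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/"
```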
Launching multiple archivers at once
If you have multiple sites, you may be interested in running multiple archivers in parallel for faster archiving. We recommend not starting them at the same time, but launching them a few seconds or minutes apart to avoid concurrency issues. For example:
5 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
6 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
In the above example, one archiver will start at minute 5 of each hour and the other will start one minute later. Alternatively, you can start multiple archivers from a single script which you then execute regularly through a cron job:
CONCURRENT_ARCHIVERS=2
for i in $(seq 1 $CONCURRENT_ARCHIVERS)
do
(sleep $i && /path/to/matomo/console core:archive & )
done
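The script above starts each archiver in the background and returns immediately. As a variation, here is a minimal sketch that staggers the starts and then waits for every archiver to finish, which can be convenient if the wrapper should log a completion message. The function name launch_staggered and its arguments are hypothetical.

```shell
#!/bin/sh
# Sketch: start COUNT copies of a command DELAY seconds apart, then wait
# until all of them have exited.
launch_staggered() (
    count=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        # The first copy starts immediately, each later copy DELAY seconds after the previous one.
        ( sleep $(( (i - 1) * delay )); "$@" ) &
        i=$((i + 1))
    done
    wait    # returns once every background copy has exited
)

# Example (paths are placeholders): two archivers, 30 seconds apart
#   launch_staggered 2 30 /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/
```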
Windows: How to Set up Auto-Archiving Using Windows Scheduler
-> Please see our dedicated FAQ for setting up a scheduled task in Windows.
Plesk: How to Set up the Cron Script using Plesk
Learn more about installing Matomo on Plesk and configuring the archiving crontab in the Plesk Matomo guide.
CPanel: How to Set up the Cron Script Using CPanel
It is easy to set up automatic archiving if you use a user interface such as CPanel, Webmin or Plesk. Here are the instructions for CPanel:
- Log in to CPanel for the domain with the Matomo installation
- Click on “Cron Jobs”
- Leave email blank
- In ‘Minutes’ put 00 and leave the rest blank.
- Paste in the path to the PHP executable, then the path to the Matomo /console script, then the parameter with your Matomo base URL: --url=matomo.example.org/
Here is an example for a Hostgator install (in this example you would need to change ‘yourcpanelsitename’ to your domain's CPanel username):
/usr/local/bin/php -f /home/yourcpanelsitename/public_html/matomo/console core:archive --url=example.org/matomo/ > /home/example/matomo-archive-output.log
“yourcpanelsitename” tends to be the first eight letters of your domain (unless you changed it when you set up your CPanel account).
- Click “Add New Cron Job”
Matomo will process your reports automatically at the hour.
Web Cron When Your Web Host Does Not Support Cron Tasks
If possible, we highly recommend that you run a cron or scheduled task. However, on some shared hosting, or on particular server configurations, running a cron or scheduled task may not be easy or possible.
Some web hosts let you set up a web cron, which is a simple URL that the host will automatically visit at a scheduled time. If your web host lets you create a web cron, you can input the following URL in their hosting interface:
https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php?token_auth=XYZ
Replace XYZ with the super user's 32-character token_auth. To find the token_auth, log in as a super user in Matomo, click the Administration link in the top menu, go to Personal and click Security. Scroll down to find where you can create a new token_auth.
Notes:
- For security, if possible we recommend you POST the token_auth parameter to the URL https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php (instead of sending the token_auth as a GET parameter).
- You can test the web cron by pasting the URL in your browser, waiting a few minutes for processing to finish and then checking the output.
- The web cron should be triggered at least once per hour. You may also use a ‘Website Monitoring’ service (free or paid) to automatically request this page every hour.
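If you trigger the web cron yourself (for example from another machine), the POST recommendation in the notes above could look like the following curl sketch. The function name trigger_web_cron and the CURL_BIN override are hypothetical conveniences, and XYZ stands for your real token. Keep in mind that a token passed on the command line may be visible to other users of the same machine.

```shell
#!/bin/sh
# Sketch: request the web cron URL with token_auth sent in the POST body.
# CURL_BIN can be overridden (for example with "echo") for a dry run.
trigger_web_cron() (
    url=$1
    token=$2
    # --data-urlencode makes curl send a POST request with the encoded field.
    "${CURL_BIN:-curl}" --silent --show-error --fail \
        --data-urlencode "token_auth=$token" "$url"
)

# Example:
#   trigger_web_cron "https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php" "XYZ"
```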
Important Tips for Medium to High Traffic Websites
Disable browser triggers for Matomo archiving and limit Matomo reports to updating every hour
After you have set up the automatic archive script as explained above, you can set up Matomo so that requests in the user interface do not trigger archiving, but instead read the pre-archived reports. Log in as the super user, click on Administration > System > General Settings, and select:
- Archive reports when viewed from the browser: No
- Archive reports at most every X seconds : 3600 seconds
Click save to save your changes. Now that you have set up the archiving cron and changed these two settings, you can enjoy fast pre-processed near real-time reports in Matomo!
Today’s statistics will have a one-hour lifetime, which ensures the reports are processed every hour (near real time).
Increase PHP Memory Limit
If you receive this error:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate X bytes)
you must increase the memory allocated to PHP. To give Matomo enough memory to process your web analytics reports, increase the memory limit in your php.ini file. Sites with less data or fewer features enabled can use 512M or 2G. (If the problem persists, we recommend increasing the setting further. 8G is a common size for a medium to large Matomo instance.)
memory_limit = 512M
To find where your php.ini file is on your server, create a test.php file and add the following code:
<?php phpinfo(); ?>
Then open it in your browser. It will show the php.ini file that is actually being read by PHP running on your web server. It will also show your currently set max_execution_time value.
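Because the cron job runs PHP from the command line, the limit that applies to archiving may come from a different php.ini than the one your web server reads. The sketch below shows two small checks; the function names are hypothetical, and the php commands used (php -r and php --ini) are standard PHP CLI options.

```shell
#!/bin/sh
# Convert a byte count (as printed in the PHP error message) to whole mebibytes.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Print the memory_limit seen by the PHP command-line binary, if php is installed.
cli_memory_limit() {
    if command -v php >/dev/null 2>&1; then
        php -r 'echo ini_get("memory_limit"), PHP_EOL;'
    else
        echo "php not found in PATH" >&2
        return 1
    fi
}

# Examples:
#   bytes_to_mib 16777216     # prints 16, i.e. the error above means a 16M limit
#   cli_memory_limit
#   php --ini                 # lists the php.ini files loaded by the CLI binary
```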
More High Traffic Server Tips!
It is possible to track millions of pages per month on hundreds or thousands of websites using Matomo. Once you have set up cron archiving as explained above, there are other important and easy steps to improve Matomo performance.
For more information, see How to configure Matomo for speed.
More Information About Matomo Archiving
- If you run archiving several times per day, it will re-archive today’s reports, as well as any reports for a date range which includes today: current week, current month, etc.
- Your Matomo database size will grow over time; this is normal. Matomo will delete archives that were processed for incomplete periods (e.g. when you archived a week in the middle of that week), but will not delete other archives. This means that you will have archives for every day, every week, every month and every year in the MySQL tables. This ensures a very fast UI response and data access, but does require disk space.
- Matomo archiving for today’s reports is not incremental: running the archiving several times per day will not lower the memory requirement for weeks, months or yearly archives. Matomo will read all logs for the full day to process a report for that day.
- Once a day/week/month/year is complete and has been processed, it will be cached and not re-processed by Matomo.
- If you don’t set up archiving to run automatically, archiving will occur when a user requests a Matomo report. This can be slow and provide a bad user experience (users would have to wait while the report is processed). This is why we recommend that you set up auto-archiving for medium to large websites as explained above.
- By default, when you disable browser triggers for Matomo archiving, it does not completely disable the trigger of archiving as you might expect. Users browsing Matomo will still be able to trigger processing of archives in one particular case: when a Custom segment is used. To ensure that users of your Matomo will never trigger any data processing, in your config.ini.php file you must add the following setting below the [General] category:
; disable browser trigger archiving for all requests (even those with a segment)
browser_archiving_disabled_enforce = 1
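To confirm the setting is in place after editing, a quick grep can help. This is a rough sketch with a hypothetical function name. It only checks that an uncommented line sets the value to 1 and does not verify that the line sits under [General]. The example path assumes the config file lives in the config directory of your Matomo installation.

```shell
#!/bin/sh
# Hypothetical check: succeed if an uncommented line in the given file sets
# browser_archiving_disabled_enforce to 1 (section placement is not checked).
enforce_setting_enabled() {
    grep -Eq '^[[:space:]]*browser_archiving_disabled_enforce[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

# Example (path is a placeholder):
#   enforce_setting_enabled /path/to/matomo/config/config.ini.php && echo "enforced"
```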
Help for core:archive command
Here is the help output for this command:
$ ./console help core:archive
Usage:
core:archive [--url="..."] [--skip-idsites[="..."]] [--skip-all-segments] [--force-idsites[="..."]] [--skip-segments-today] [--force-periods[="..."]] [--force-date-last-n[="..."]] [--force-date-range[="..."]] [--force-idsegments="..."] [--concurrent-requests-per-website[="..."]] [--concurrent-archivers[="..."]] [--max-websites-to-process="..."] [--max-archives-to-process="..."] [--disable-scheduled-tasks] [--accept-invalid-ssl-certificate] [--php-cli-options[="..."]] [--force-all-websites] [--force-report[="..."]]
Options:
--url Forces the value of this option to be used as the URL to Matomo.
If your system does not support archiving with CLI processes, you may need to set this in order for the archiving HTTP requests to use the desired URLs.
--skip-idsites If specified, archiving will be skipped for these websites (in case these website ids would have been archived).
--skip-all-segments If specified, all segments will be skipped during archiving.
--force-idsites If specified, archiving will be processed only for these Sites Ids (comma separated)
--skip-segments-today If specified, segments will be only archived for yesterday, but not today. If the segment was created or changed recently, then it will still be archived for today and the setting will be ignored for this segment.
--force-periods If specified, archiving will be processed only for these Periods (comma separated eg. day,week,month,year,range)
--force-date-last-n Deprecated. Please use the "process_new_segments_from" INI configuration option instead.
--force-date-range If specified, archiving will be processed only for periods included in this date range. Format: YYYY-MM-DD,YYYY-MM-DD
--force-idsegments If specified, only these segments will be processed (if the segment should be applied to a site in the first place).
Specify stored segment IDs, not the segments themselves, eg, 1,2,3.
Note: if identical segments exist w/ different IDs, they will both be skipped, even if you only supply one ID.
--concurrent-requests-per-website When processing a website and its segments, number of requests to process in parallel (default: 3)
--concurrent-archivers The number of max archivers to run in parallel. Depending on how you start the archiver as a cronjob, you may need to double the amount of archivers allowed if the same process appears twice in the `ps ex` output. (default: false)
--max-websites-to-process Maximum number of websites to process during a single execution of the archiver. Can be used to limit the process lifetime e.g. to avoid increasing memory usage.
--max-archives-to-process Maximum number of archives to process during a single execution of the archiver. Can be used to limit the process lifetime e.g. to avoid increasing memory usage.
--disable-scheduled-tasks Skips executing Scheduled tasks (sending scheduled reports, db optimization, etc.).
--accept-invalid-ssl-certificate It is _NOT_ recommended to use this argument. Instead, you should use a valid SSL certificate!
It can be useful if you specified --url=https://... or if you are using Matomo with force_ssl=1
--php-cli-options Forwards the PHP configuration options to the PHP CLI command. For example "-d memory_limit=8G". Note: These options are only applied if the archiver actually uses CLI and not HTTP. (default: "")
--force-all-websites Force archiving all websites.
--force-report If specified, only processes invalidations for a specific report in a specific plugin. Value must be in the format of "MyPlugin.myReport".
--help (-h) Display this help message
--quiet (-q) Do not output any message
--verbose (-v|vv|vvv) Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
--version (-V) Display this application version
--ansi Force ANSI output
--no-ansi Disable ANSI output
--no-interaction (-n) Do not ask any interactive question
--matomo-domain Matomo URL (protocol and domain) eg. "http://matomo.example.org"
--xhprof Enable profiling with XHProf
Help:
* It is recommended to run the script without any option.
* This script should be executed every hour via crontab, or as a daemon.
* You can also run it via http:// by specifying the Super User &token_auth=XYZ as a parameter ('Web Cron'),
but it is recommended to run it via command line/CLI instead.
* If you have any suggestion about this script, please let the team know at [email protected]
* Enjoy!
Making sense of the core:archive output
The core:archive output log displays useful information about the archiver process, in particular which websites and segments are being processed. The output shows:
- which website ID is currently being archived:
INFO [2020-03-31 21:16:29] 23146 Will pre-process for website id = 1, period = month, date = last3
- how many segments there are for this website (in this example there are 25 segments):
INFO [2020-03-31 21:16:29] 23146 - pre-processing segment 1/25 countryName!=Algeria March 29, 2022
- how many websites are left to be processed from this archiver’s queue of websites (in this example it has finished processing 2 out of 3 websites):
INFO [2020-03-31 21:17:07] 23146 Archived website id = 3, 4 API requests, Time elapsed: 18.622s [2/3 done]
- if you’re running multiple core:archive processes using --concurrent-archivers, you can tell the different concurrent archivers from each other by looking at the number after the timestamp:
INFO [2020-03-31 21:17:07] 23146 [...]
Each concurrent archiver run will have a different number, so if you grep for this number across your logs you can find the output for this particular core:archive thread. You can also set --concurrent-archivers to -1, which indicates an unlimited number of concurrent archivers.
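The grep approach described above can be sketched with two small awk helpers. The function names are hypothetical, and the sketch assumes every log line has the shape shown in the examples: INFO, a bracketed date and time, then the archiver number.

```shell
#!/bin/sh
# List the distinct archiver numbers found in a core:archive log.
# Assumes lines shaped like: INFO [2020-03-31 21:17:07] 23146 message...
list_archivers() {
    awk '$1 == "INFO" && $4 ~ /^[0-9]+$/ { print $4 }' "$1" | sort -u
}

# Print only the lines written by one archiver number.
archiver_lines() {
    awk -v id="$2" '$1 == "INFO" && $4 == id' "$1"
}

# Examples (log path is a placeholder):
#   list_archivers /home/example/matomo-archive.log
#   archiver_lines /home/example/matomo-archive.log 23146
```

Matching on the fourth field rather than grepping for the bare number avoids false hits when the same digits appear elsewhere in a line, such as in a website ID or a timing.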
If you have any questions or feedback, please use the feedback button below and we’ll do our best to get back to you.