When encountering a problem always make sure:
- The extension is of the latest version and the Kolab packages are up-to-date.
- The problem is reproducible on the same system: If it’s not reproducible at all, there is nothing that can be done. We need clear steps to reproduce the issue to be able to investigate.
- The problem can be reproduced on a second system: Unless a problem is reproducible on a second system it is not a software defect, but a configuration issue.
Once a problem has been reproduced on a second system with an up-to-date extension and packages, it can be fixed.
Determining the faulty component
To isolate which component has a defect you can work your way from the client (where you see the symptoms) inwards.
For an issue with an IMAP client, the chain is guam first, then dovecot. You could, for instance, bypass guam by connecting to dovecot directly, thus making sure that a specific problem indeed only exists with guam in the middle.
For an activesync/iRony issue there is NGINX -> Apache -> Syncroton/iRony. First make sure that activesync is reachable on the outside before investigating if there is a problem in syncroton.
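To make "reachable on the outside" concrete, the response to an OPTIONS request can be classified before looking at Syncroton itself. The following is a minimal sketch and not part of the Kolab tooling: the function name classify_activesync_response is hypothetical, and it assumes that a working ActiveSync endpoint answers with status 200 and an MS-ASProtocolVersions header.

```shell
# Classify the headers of an ActiveSync OPTIONS response read from stdin.
# Hypothetical helper; assumes a healthy endpoint returns HTTP 200 together
# with an MS-ASProtocolVersions header.
classify_activesync_response() {
    awk '
        NR == 1 { status = $2 }    # status line, e.g. "HTTP/1.1 200 OK"
        tolower($0) ~ /^ms-asprotocolversions:/ { versions = 1 }
        END {
            if (status == "")                   print "unreachable"
            else if (status == "401")           print "auth-failed"
            else if (status == "200" && versions) print "ok"
            else if (status == "200")           print "no-activesync-headers"
            else                                print "unexpected-status " status
        }'
}

# Usage (credentials and host are placeholders):
#   curl -s -i -u 'user@our.domain.tld:pass' -X OPTIONS \
#       https://our.domain.tld/Microsoft-Server-ActiveSync | classify_activesync_response
```

A result other than "ok" suggests looking at NGINX and Apache first; only an "ok" result makes Syncroton itself the likely place to investigate.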
Common scenarios
How to recognize responsible component(s) via symptoms.
- If Outlook cannot connect to the email account, cannot sync messages, or the number of messages in the Outlook inbox differs from that in Roundcube, then the most likely culprit is Syncroton (if connected via ActiveSync). Check this chain of services: NGINX -> Apache -> PHP -> Syncroton -> IMAP.
- Webmail is not reachable: NGINX -> Apache -> PHP -> IMAP
- Premium email is enabled but you end up on regular roundcube: Check for /etc/roundcubemail/$domain/freemium.inc.php (should be absent).
Capturing debug output
If a problem can be reproduced, debug output can help figure out what exactly is going wrong.
Capture debug output of all components that are involved with a problem.
Debug output should always be captured for the duration of one reproduction of the issue only:
- Prepare everything to reproduce the problem
- Clear existing logfiles
- Record the timestamp when you reproduce the problem
- Reproduce the problem
- Immediately capture the logfiles (so we don’t end up with unnecessary extra information).
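The "clear" and "record the timestamp" steps above could be sketched as two small helpers. This is an illustration only: the function names are hypothetical, and truncating files in place (rather than deleting them) is a design choice made here so that running services keep writing to the same file.

```shell
# Truncate every regular file below the given log directories without
# deleting it. Directories that do not exist are skipped.
clear_logs() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -type f -exec sh -c ': > "$1"' _ {} \;
    done
}

# Print the current time including seconds, for use in the report.
reproduction_timestamp() {
    date '+%d.%m.%Y %H:%M:%S'
}

# Usage:
#   clear_logs /var/log/kolab-syncroton /var/log/roundcubemail
#   reproduction_timestamp
```

Note that rotated or compressed logfiles in the same directories are emptied as well, so only point clear_logs at directories whose history you no longer need.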
Instructions on how to enable debug output can be found here: https://kb.kolabenterprise.com/documentation/capturing-debug-logs
The following log-files/directories are relevant:
- /var/log/maillog (only relevant when troubleshooting IMAP/SMTP)
- /var/log/httpd or /var/log/apache2
- /var/log/nginx
- /var/log/roundcubemail (only relevant when troubleshooting webmail)
- /var/log/kolab-syncroton (only relevant when troubleshooting activesync)
- /var/log/guam (only relevant when troubleshooting imap)
- /var/log/kolab-autoconf (only relevant when troubleshooting autoconf)
- /var/log/iRony (only relevant when troubleshooting DAV)
- /var/log/plesk/panel.log (only relevant when troubleshooting the plesk extension)
- /var/log/plesk-php$VERSION-fpm
To generate a tarball use a command like this (include the relevant directories according to what you’re troubleshooting):
tar -czf logfiles.tar.gz /var/log/plesk-php82-fpm/ /var/log/kolab-syncroton/
This ensures all files are bundled together in a single tarball and retain their paths, which is important to e.g. distinguish multiple “error.log” files.
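Since not every directory exists on every system, a small wrapper can pass only the existing ones to tar. This is a minimal sketch under that assumption; the function name bundle_logs is hypothetical.

```shell
# Bundle only those log directories that exist on this system.
bundle_logs() {
    archive=$1
    shift
    # Rebuild the argument list from the directories that are present.
    for dir in "$@"; do
        shift
        [ -d "$dir" ] && set -- "$@" "$dir"
    done
    if [ "$#" -eq 0 ]; then
        echo "no log directories found" >&2
        return 1
    fi
    tar -czf "$archive" "$@"
}

# Usage:
#   bundle_logs logfiles.tar.gz /var/log/plesk-php82-fpm /var/log/kolab-syncroton
#   tar -tzf logfiles.tar.gz    # list the archive to confirm the paths are retained
```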
Use the following template to report the logfiles (every time with new log files):
These logfiles were collected while reproducing issue XXX.
The following steps were executed to reproduce the issue:
* Connected outlook for user doe@example.com (deviceid XXXX)
* Triggered synchronization for folder INBOX on 12.04.2024 11:33:11 (including seconds, a lot can happen in a minute)
* Waited for synchronization to complete
Expected outcome, what happened:
* Expected read flag to be changed
* The read flag was not changed.
It is crucial that logfiles are always collected in the same manner, that it is clear which timeframe they cover, and that it is clear what they are supposed to document: the actions that were executed during the timeframe (including timestamps) and the expectations that were not met (i.e. the issue).
Common issues & Troubleshooting techniques
- When starting the investigation, always check that the extension is the latest version and the Kolab packages are up-to-date.
- If a component that has an issue runs in apache, look out for php / fcgid errors. Memory exhaustion or request timeouts can leave very few traces otherwise.
- Check if activesync is in principle available:
curl -i -u 'user@our.domain.tld:pass' -X OPTIONS https://our.domain.tld/Microsoft-Server-ActiveSync
- Check if SNI is working:
echo 'Q' | openssl s_client -connect localhost:993 -servername domain.tld -showcerts 2>&1 | grep -Eo 'CN=[^/]+' | uniq
- To deal with roundcube cache issues, such as disappearing calendar events or not updating address books:
/usr/share/roundcubemail/plugins/libkolab/bin/modcache.sh clear -u user@domain.tld -h imap.host.name
- If you experience problems with the extension UI in Plesk, always check Plesk’s panel.log. Enable debug mode if necessary: https://support.plesk.com/hc/en-us/articles/213408889-How-to-enable-disable-Plesk-debug-mode
- You can check for missing php-modules using:
/opt/plesk/php/7.3/bin/php -m
The following modules should be available: kolabcalendaring, kolabformat, kolabicalendar, kolabobject, kolabshared.
- If no logfiles are generated (e.g. by roundcube), ensure the log-directories have been created with sufficient permissions for writing and that SELinux (see /var/log/audit/audit.log) is not interfering. If in doubt, temporarily disable SELinux and give world-writable permissions to the log-directory.
- When a user has a “.” in the name, sharing breaks. This is because we use “.” as namespace separator in dovecot (as required by the spam filter).
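The check for missing php-modules mentioned in the list above can be automated by comparing the output of php -m against the five module names. A minimal sketch; the function name missing_kolab_modules is hypothetical.

```shell
# Read the output of `php -m` on stdin and print each required Kolab module
# that is not listed. Prints nothing if all modules are loaded.
missing_kolab_modules() {
    loaded=$(cat)
    for module in kolabcalendaring kolabformat kolabicalendar kolabobject kolabshared; do
        # -x matches whole lines only, -i ignores case, -q suppresses output
        printf '%s\n' "$loaded" | grep -qix "$module" || printf '%s\n' "$module"
    done
}

# Usage:
#   /opt/plesk/php/7.3/bin/php -m | missing_kolab_modules
```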
Dovecot
- Folders that have *only* the list right, but no read right break GETMETADATA commands that include the folder, resulting in an unexpected NO result to the command (which in turn breaks the roundcube folder list). Ensure that folders either have no list right, or always combine list with read rights. See also https://bifrost.kolabsystems.com/T802351
doveadm can be used to inspect the imap store:
list access rights:
doveadm acl get -u user@example.org Calendar
debug access rights to a folder:
doveadm acl debug -u user@example.org Calendar
list folders:
doveadm mailbox list -u user@example.org
get a folder-type annotation:
doveadm mailbox metadata get -u $user $mailbox /shared/vendor/kolab/folder-type
set a folder-type annotation:
doveadm mailbox metadata set -u $user $mailbox /shared/vendor/kolab/folder-type contact
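To spot the problematic ACL combination described at the top of this section (list right without read right), the output of doveadm acl get can be filtered. This is a sketch under assumptions: the function name is hypothetical, the list right is taken to appear as the word "lookup" (Dovecot's name for it), and the output is assumed to have a header line followed by one entry per line with the identifier first and the rights as space-separated words. Verify this against the actual output on your system.

```shell
# Read `doveadm acl get` output on stdin and print every ACL identifier that
# has the lookup (list) right but not the read right.
# Assumed format: header line first, then "<id> [global] <right> <right> ...".
acl_lookup_without_read() {
    awk 'NR > 1 {
        has_lookup = 0; has_read = 0
        for (i = 2; i <= NF; i++) {
            if ($i == "lookup") has_lookup = 1
            if ($i == "read")   has_read = 1
        }
        if (has_lookup && !has_read) print $1
    }'
}

# Usage:
#   doveadm acl get -u user@example.org Calendar | acl_lookup_without_read
```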
Guam
- A “crash” in guam is not necessarily problematic. If e.g. an ssl handshake fails you will get a crash report, because the language this module is written in defaults to “crashing” as part of the normal error handling.
- Guam’s debug logging logs the complete imap traffic which will overwhelm a production system in use. Do not enable unless you need the output, and disable afterwards.
- If the guam service does not start, you can attempt to start the process directly, which may give you additional debug output:
/usr/sbin/guam foreground
SNI
If SNI is enabled guam needs special configuration for SNI to work (because guam needs to terminate ssl). The guam configuration is updated on installation, and a troubleshooting item with a “Fix” button will appear if the configuration is out of sync.
You should have a file with a name like /etc/dovecot/conf.d/14-plesk-sni-$domain.conf (from the SNI config for the domain). In that file there is a line starting with “ssl_cert =” with a path pointing to the certificate.
This path should then be in the guam config at /etc/guam/sys.config, in a line like
{ sni_hosts, [{ "$domain", [{ certfile, "$pathToCertificate" }]}]},
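Whether the two files agree can be checked mechanically. The following is a minimal sketch with hypothetical function names; it assumes the certificate path follows “ssl_cert =” on the same line, optionally prefixed with dovecot's "<" marker for reading a value from a file.

```shell
# Print the certificate path from the first "ssl_cert =" line of a dovecot
# SNI config file. A leading "<" is stripped if present.
dovecot_cert_path() {
    sed -n 's/^[[:space:]]*ssl_cert[[:space:]]*=[[:space:]]*<\{0,1\}[[:space:]]*//p' "$1" | head -n 1
}

# Succeed if the guam config mentions the certificate path used by dovecot.
# Arguments: dovecot SNI config file, guam sys.config
guam_knows_cert() {
    cert=$(dovecot_cert_path "$1")
    [ -n "$cert" ] && grep -qF -- "$cert" "$2"
}

# Usage ($domain as in the file name above):
#   guam_knows_cert "/etc/dovecot/conf.d/14-plesk-sni-$domain.conf" /etc/guam/sys.config \
#       && echo "in sync" || echo "out of sync"
```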
To verify if SNI is working use
echo 'Q' | openssl s_client -connect localhost:993 -servername domain.tld -showcerts 2>&1 | grep -Eo 'CN=[^/]+' | uniq
and compare to the output you get when connecting to port 9993 (dovecot).
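The comparison between the two ports can be scripted as follows. This is a sketch only: the function names are hypothetical, and it reuses the pipeline above unchanged, with 993 as the guam port and 9993 as the dovecot port.

```shell
# Print the certificate CNs presented on a local port for a given server name.
# Arguments: port, server name
presented_cns() {
    echo 'Q' | openssl s_client -connect "localhost:$1" -servername "$2" -showcerts 2>&1 \
        | grep -Eo 'CN=[^/]+' | uniq
}

# Compare two CN listings. An empty listing counts as a mismatch, because it
# means no certificate information was captured at all.
# Arguments: CNs seen via guam, CNs seen via dovecot
compare_cns() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "match"
    else
        echo "mismatch"
    fi
}

# Usage:
#   compare_cns "$(presented_cns 993 domain.tld)" "$(presented_cns 9993 domain.tld)"
```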
Webserver
For issues with web server configuration (e.g., when some service like Mattermost is not available, but also Roundcube/ActiveSync/DAV issues), follow these steps:
- Run plesk repair web for the affected domain
- Check the custom configuration template in /usr/local/psa/admin/conf/templates/custom/webmail/roundcube.php. Make sure it exists, is readable, and contains PHP code.
- Compare the web server configuration with the configuration of a similar test server on which everything works.
Process limit
Plesk has a low process limit configured by default (FcgidMaxProcesses). Because ActiveSync clients (e.g. Outlook) hold persistent connections that each take up a process slot, multiple clients can exceed the process limit, after which the webserver fails to process any further requests. This results in the following error in /var/log/httpd/error_log:
mod_fcgid: can't apply process slot for /var/www/cgi-bin/cgi_wrapper/cgi_wrapper
To fix this, increase the following values in /etc/httpd/conf.d/fcgid.conf:
FcgidMaxProcessesPerClass 50
FcgidMaxProcesses 150
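Before and after changing the limits, it can help to see how often the limit was hit and which values are currently configured. A minimal sketch with hypothetical function names:

```shell
# Count how often the process limit was hit according to an Apache error log.
count_process_slot_errors() {
    grep -c "mod_fcgid: can't apply process slot" "$1"
}

# Show the process limits currently set in an fcgid configuration file.
show_fcgid_limits() {
    grep -E '^[[:space:]]*FcgidMaxProcesses(PerClass)?[[:space:]]' "$1"
}

# Usage:
#   count_process_slot_errors /var/log/httpd/error_log
#   show_fcgid_limits /etc/httpd/conf.d/fcgid.conf
```

If the count keeps growing after the limits have been raised and the webserver restarted, the new values are probably still too low for the number of connected clients.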
Debug output
Debug output can be enabled in /etc/httpd/conf/httpd.conf by setting:
LogLevel debug
Debug output will appear in /var/log/httpd/error_log.
Sieve
Sieve functionality is provided by pigeonhole.
Verify the sieve script is enabled:
$ doveadm sieve list -u admin@kolab-customer.maipo.wht.pxts.ch
roundcube ACTIVE
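When checking several accounts, the listing can be filtered for the active script. This sketch assumes the output format shown above (script name followed by ACTIVE on the same line); the function name is hypothetical.

```shell
# Read `doveadm sieve list` output on stdin and print the name of each script
# marked ACTIVE. Prints nothing if no script is active.
active_sieve_scripts() {
    awk 'NF >= 2 && $NF == "ACTIVE" { print $1 }'
}

# Usage:
#   doveadm sieve list -u user@example.org | active_sieve_scripts
```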
Activate sieve debug logging in /etc/dovecot/conf.d/90-plesk-sieve.conf:
plugin {
...
# You have to create the log directory first
sieve_trace_dir = /var/log/sieve
sieve_trace_level = tests
}
troubleshoot.php
The kolab extension comes with a troubleshooting script to detect various common issues. It can be executed like so to generate a report:
plesk bin extension -e kolab troubleshoot.php > report.txt
analyzelogs.sh
The following script can be used to grep through logfiles to surface various common issues. Please note that the paths are currently geared toward centos7, and the script is WIP. Suggestions are welcome.
#!/usr/bin/env bash
# This is a script to grep through logs to surface common problems quickly.
# If a message matches this "might" point to an error, but will have to be judged on a case-by-case basis.
echo "==> /var/log/kolab-syncroton/errors.log"
grep -E "(PHP Fatal error|PHP Error|SMTP Error)" /var/log/kolab-syncroton/errors.log
echo
echo "==> /var/log/maillog"
grep -E "(milter-reject)" /var/log/maillog
echo
echo "==> /var/log/roundcubemail/errors.log"
grep -E "(PHP Fatal error|PHP Error|SMTP Error|IMAP Error)" /var/log/roundcubemail/errors.log
echo
echo "==> /var/log/httpd/error_log"
grep -E "(PHP Warning|mod_fcgid: read data timeout|End of script output|mod_fcgid: can't apply process slot)" /var/log/apache2/error.log /var/log/httpd/error_log /var/log/httpd/ssl_error_log
echo
echo "==> /var/log/guam/"
grep -r -E "(Fatal error)" /var/log/guam/
echo