The code for Sieve filtering in Exim was contributed by Michael Haardt, and most of the content of this chapter is taken from the notes he provided. Since Sieve is an extensible language, it is important to understand that "Sieve" in this context means the specific implementation of Sieve for Exim.
This chapter does not contain a description of Sieve, since that can be found in RFC 3028, which should be read in conjunction with these notes.
The Exim Sieve implementation offers the core as defined by RFC 3028, together with the envelope and fileinto extensions, but not the reject extension. Exim does not support message disposition notifications (MDNs), so adding them just for the Sieve filter (as reject requires) makes little sense.
In order for Sieve to work properly in Exim, the system administrator needs to make some adjustments to the Exim configuration. These are described in the chapter on the redirect router in the full Exim specification.
A filter file is interpreted as a Sieve filter if its first line is
# Sieve filter
This is what distinguishes it from a conventional .forward file or an Exim filter file.
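As an illustration, a minimal filter file might look like the following sketch. The header value and the folder name lists/exim-users are hypothetical; only the first line is required by Exim.

```sieve
# Sieve filter
require ["fileinto"];

# File mailing list traffic into a separate folder. Everything else
# falls through to the implicit keep and is delivered to inbox.
if header :contains "List-Id" "exim-users" {
    fileinto "lists/exim-users";
}
```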
If the system administrator has set things up as suggested in the Exim specification, and you use keep or fileinto to save a message into a folder, absolute file names are used as specified, relative file names are interpreted relative to $home, and inbox goes to the standard mailbox location.
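The three cases can be sketched as follows, assuming the administrator has configured Exim as suggested. The paths, the address, and the subject strings are made up for the example.

```sieve
# Sieve filter
require ["fileinto"];

if header :contains "subject" "[backup]" {
    # Absolute name: stored exactly where specified.
    fileinto "/home/jane/archive/backup";
} elsif address :is "from" "boss@example.com" {
    # Relative name: stored relative to $home.
    fileinto "mail/boss";
} else {
    # The special name inbox: the standard mailbox location.
    fileinto "inbox";
}
```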
RFC 3028 does not specify what happens if a string denoting a header field does not contain a valid header name, for example, because it contains a colon. This implementation generates an error instead of ignoring the header field. This makes scripts easier to debug and fits the general approach of Sieve.
The exists test succeeds only if all specified headers exist. RFC 3028 does not explicitly specify what happens on an empty list of headers. This implementation evaluates that condition as true, interpreting the RFC in a strict sense.
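A short sketch of the difference between "all headers exist" and "any header exists". The folder names are hypothetical, and the second branch uses the core anyof test to express the weaker condition.

```sieve
# Sieve filter
require ["fileinto"];

# Succeeds only if the message has BOTH header fields.
if exists ["List-Id", "List-Unsubscribe"] {
    fileinto "lists";
}
# To accept a message that has at least one of them, combine
# single-header exists tests with anyof.
elsif anyof (exists "List-Id", exists "List-Unsubscribe") {
    fileinto "maybe-lists";
}
```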
Some MUAs process invalid base64 encoded data, generating junk. Others ignore junk after seeing an equals sign in base64 encoded data. RFC 2047 does not specify how to react in this case, other than stating that a client must not refuse to process a message for that reason. RFC 2045 specifies that invalid data should be ignored (apparently looking at end of line characters). It also specifies that invalid data may lead to rejecting messages containing them (and there it appears to talk about true encoding violations), which clearly contradicts ignoring them.
RFC 3028 does not specify how to process incorrect MIME words. This implementation treats them literally, as it does if the word is correct but its character set cannot be converted to UTF-8.
A header may contain multiple addresses. RFC 3028 does not explicitly specify how to deal with them, but since the address test checks if anything matches anything else, matching one address suffices to satisfy the condition. That makes it impossible to test if a header contains a certain set of addresses and no more, but it is more logical than letting the test fail if the header contains an additional address besides the one the test checks for.
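For example, the following sketch (with made-up addresses and a hypothetical folder name) matches a message whose To: header lists several recipients, because one matching address is enough.

```sieve
# Sieve filter
require ["fileinto"];

# Matches "To: alice@example.com, bob@example.com" as well as
# "To: bob@example.com": a single matching address satisfies the test.
if address :is "to" "bob@example.com" {
    fileinto "bob";
}
```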
The keep command is equivalent to
fileinto "inbox";
It saves the message and resets the implicit keep flag. It does not set the implicit keep flag; there is no command to set it once it has been reset.
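Because there is no way to set the implicit keep flag again, a script that wants both a filed copy and a copy in inbox has to say so explicitly. A minimal sketch, with a hypothetical folder name:

```sieve
# Sieve filter
require ["fileinto"];

if header :contains "subject" "invoice" {
    # fileinto cancels the implicit keep, so without the explicit
    # keep below the message would end up only in "invoices".
    fileinto "invoices";
    keep;
}
```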
RFC 3028 does not specify whether fileinto should try to create a mail folder if it does not exist. This implementation allows the sysadmin to configure that aspect using the appendfile transport options create_directory, create_file, and file_must_exist. See the appendfile transport in the Exim specification for details.
Sieve scripts are supposed to be interoperable between servers, so this implementation does not allow mail to be redirected to unqualified addresses, because the domain would depend on the system being used. On systems with virtual mail domains, the default domain is probably not what the user expects it to be.
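In practice this means that every redirect target must include a domain. The address below is a placeholder.

```sieve
# Sieve filter

if header :contains "subject" "urgent" {
    # Fully qualified address: accepted.
    redirect "jane@example.org";
    # An unqualified address such as "jane" is not allowed in this
    # implementation and causes an error.
}
```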
There has been confusion about whether the string arguments to require are to be matched case-sensitively or not. This implementation matches them with the match type :is (the default, see section 2.7.1 of RFC 3028) and the comparator i;ascii-casemap (the default, see section 2.7.3). The RFC defines the command defaults clearly, so any implementation that behaves differently violates RFC 3028. The same applies to comparator names, which are also specified as strings.
There is a mistake in RFC 3028: the suffix G denotes gibi-, not tebi-. The mistake is obvious, because RFC 3028 specifies G to denote 2^30 (which is gibi, not tebi), and that is what this implementation uses as the scaling factor for the suffix G.
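As a worked example of the scaling, the threshold in the following sketch is 1 x 2^30 = 1073741824 octets. The choice of discard is only for illustration.

```sieve
# Sieve filter

# 1G is scaled by 2^30, so this compares the message size
# against 1073741824 octets.
if size :over 1G {
    discard;
}
```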
Exim requires the first line of a Sieve filter to be
# Sieve filter
Of course the RFC does not specify that line. Do not expect examples to work without adding it, though.
RFC 3028 requires the use of CRLF to terminate a line. The rationale was that CRLF is universally used in network protocols to mark the end of the line. This implementation does not embed Sieve in a network protocol, but uses Sieve scripts as part of the Exim MTA. Since all parts of Exim use LF as the newline character, this implementation does, too, by default, though the system administrator may choose (at Exim compile time) to use CRLF instead.
Exim violates RFC 2822, section 3.6.8, by accepting 8-bit header names, so this implementation repeats this violation to stay consistent with Exim. This is in preparation for UTF-8 data.
Sieve scripts cannot contain NUL characters in strings, but mail headers could contain MIME encoded NUL characters, which could never be matched by Sieve scripts using exact comparisons. For that reason, this implementation extends the Sieve quoted string syntax with \0 to describe a NUL character. This violates RFC 3028, in which \0 is the same as 0. Even without using \0, the following tests are all true in this implementation. Implementations that use C-style strings will only evaluate the first test as true.
Subject: =?iso-8859-1?q?abc=00def

header :contains "Subject" ["abc"]
header :contains "Subject" ["def"]
header :matches "Subject" ["abc?def"]
Note that by considering Sieve to be an MUA, RFC 2047 can be interpreted in such a way that Sieve implementations are allowed to let NUL characters truncate strings, although this is not recommended. It is further allowed to use encoded NUL characters in headers, but that is not recommended either. The above example shows why.
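With the \0 extension, an exact comparison against a header that contains an encoded NUL becomes possible. The following is a sketch only: the folder name is hypothetical, and because \0 is specific to this implementation, such a script is not portable to other Sieve servers.

```sieve
# Sieve filter
require ["fileinto"];

# \0 in a quoted string stands for a NUL character here, so this
# matches a Subject that decodes to "abc", NUL, "def".
if header :is "Subject" "abc\0def" {
    fileinto "suspicious";
}
```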
RFC 3028 states that if an implementation fails to convert a character set to UTF-8, two strings cannot be equal if one contains octets greater than 127. Assuming that all unknown character sets are one-byte character sets with the lower 128 octets being US-ASCII is not sound, so this implementation violates RFC 3028 and treats such MIME words literally. That way at least something could be matched.
The folder specified by fileinto must not contain the character sequence .. to avoid security problems. RFC 3028 does not specify the syntax of folders apart from keep being equivalent to
fileinto "INBOX";
This implementation uses inbox instead.
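The folder rules can be summarized in a short sketch. The folder names other than inbox are made up for the example.

```sieve
# Sieve filter
require ["fileinto"];

if header :contains "subject" "report" {
    # Accepted: the folder name does not contain "..".
    fileinto "reports/current";
    # A name such as "../other/reports" would be rejected,
    # because it contains the sequence "..".
} else {
    # Lower-case inbox is the name this implementation uses
    # for the standard mailbox; it has the same effect as keep.
    fileinto "inbox";
}
```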
Sieve script errors currently cause messages to be silently filed into inbox. RFC 3028 requires that the user be notified of that condition. This may be implemented in the future by adding a header line to messages that are filed into inbox because of an error in the filter.