When body contains Windows-1251 move to database spamtrap.nsf
1. Chris LeRoy 13/08/2004 18:12:25
Homepage: http://www.brainbent.com
I am not familiar with how you do your antispam. It sounds like you are using a Domino-based solution?
If so, this wouldn't be very difficult to do. I have a LotusScript that could easily be modified for this task.
2. Chris Linfoot 13/08/2004 20:31:57
Needs to work as a server mail rule. I believe rules are stored as simple @functions.
3. Gretchen 14/08/2004 12:17:14
Homepage: http://www.flick.com/~gretchen/
Hmmm. It will be ironic (if you use email notifications for your blog) if this message gets dumped into your spam filter. :)
I'm neither a Lotus Notes admin nor a real programmer, but is there a way to tell your filters to match across newlines or to otherwise insert regexps?
Matching on:
--<anything>
Content-Type: <anything>charset=<nasty charsets>
will give you an extremely high probability of hitting a MIME header without actually having to parse it out.
If you can't match across lines, then just the second line is still pretty indicative, especially if you can tell Lotus Notes 'make sure Content-Type is at the beginning of the line.' I.e.:
^Content-Type:.*charset=Windows-1251
or, for all the example nasty charsets, in extended regexp mode:
^Content-Type:.*charset=(\"?Windows-1251\"?|koi8-r)
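For anyone who wants to try the pattern outside Notes, here is a minimal sketch in Python. It assumes you have the raw message source as a string, which Domino mail rules do not give you. The function name and the case-insensitive flag are my own additions.

```python
import re

# The extended-regexp pattern from above. MULTILINE makes ^ match at the
# start of every line; IGNORECASE is an added assumption, since MIME header
# names and charset labels are case-insensitive.
NASTY_CHARSET = re.compile(
    r'^Content-Type:.*charset=("?Windows-1251"?|koi8-r)',
    re.MULTILINE | re.IGNORECASE,
)

def has_nasty_charset(raw_message: str) -> bool:
    """Return True if any line is a Content-Type header naming a listed charset."""
    return NASTY_CHARSET.search(raw_message) is not None

sample = (
    "------=_NextPart_001\n"
    "Content-Type: text/plain; charset=Windows-1251\n"
    "Content-Transfer-Encoding: 8bit\n"
)
print(has_nasty_charset(sample))  # True
# The charset is mentioned, but not on a Content-Type line, so no match.
print(has_nasty_charset("I read about charset=Windows-1251 today\n"))  # False
```

Anchoring on the start of the line is what keeps ordinary body text that merely mentions the charset from matching. As written, the pattern allows optional quotes only around Windows-1251, not around koi8-r.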
The really strict way to do it would be to parse out the MIME headers: start from Content-Type in the message header to find the boundary, then search the Content-Type field after each boundary for either the charset string or another boundary field (in the case of nested multipart MIME messages). I have a good example of a nested multipart Russian spam here (relevant message headers only, but message included in full up to the first chunk of payload):
From <yadda yadda>
MIME-Version: 1.0
Content-Type: multipart/related;
    type="multipart/alternative";
    boundary="----=_NextPart_000_0000_8B729DDD.714F0406"

This is a multi-part message in MIME format.

------=_NextPart_000_0000_8B729DDD.714F0406
Content-Type: multipart/alternative;
    boundary="----=_NextPart_001_0001_C5DC58C2.B16F4A87"

------=_NextPart_001_0001_C5DC58C2.B16F4A87
Content-Type: text/plain; charset=Windows-1251
Content-Transfer-Encoding: 8bit

<snip>
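The strict approach can be sketched with Python's standard email package, which follows the boundaries for you. This is only an illustration, not something a Notes mail rule can do. The closing boundary lines and the placeholder payload in the sample are assumptions, since the original message is snipped, and the function name is hypothetical.

```python
from email import message_from_string

NASTY_CHARSETS = {"windows-1251", "koi8-r"}  # the charsets named in this thread

def nasty_parts(raw_message: str) -> list[str]:
    """Walk every MIME part, nested multiparts included, and collect listed charsets."""
    found = []
    for part in message_from_string(raw_message).walk():
        charset = part.get_content_charset()  # lower-cased, or None if absent
        if charset in NASTY_CHARSETS:
            found.append(charset)
    return found

# Headers copied from the example above; the payload line and the two
# closing boundaries are assumed, because the original was snipped.
raw = "\n".join([
    "MIME-Version: 1.0",
    "Content-Type: multipart/related;",
    ' type="multipart/alternative";',
    ' boundary="----=_NextPart_000_0000_8B729DDD.714F0406"',
    "",
    "This is a multi-part message in MIME format.",
    "",
    "------=_NextPart_000_0000_8B729DDD.714F0406",
    "Content-Type: multipart/alternative;",
    ' boundary="----=_NextPart_001_0001_C5DC58C2.B16F4A87"',
    "",
    "------=_NextPart_001_0001_C5DC58C2.B16F4A87",
    "Content-Type: text/plain; charset=Windows-1251",
    "Content-Transfer-Encoding: 8bit",
    "",
    "(payload snipped)",
    "------=_NextPart_001_0001_C5DC58C2.B16F4A87--",
    "------=_NextPart_000_0000_8B729DDD.714F0406--",
    "",
])
print(nasty_parts(raw))  # ['windows-1251']
```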
In my case, if spamprobe thinks it's a decent match for spam, I check the content-type for being html or multipart and chuck it into a 'I haven't seen a false positive in here in ten thousand messages, so I don't have to get nitpicky about verifying this' folder. But I don't get a lot of valid html or multipart to begin with.
4. Nathan T. Freeman 14/08/2004 20:05:02
I'm pretty sure you're out of luck on this one, Chris. You could make several rules to check for exact matches in each one, or set up an OR condition if you can for just one rule. But you can't hack up a formula for it, because there are no @functions for MIME decoding. The BODY contents are inaccessible to @function evaluation, except perhaps for length counts.
5. Chris Linfoot 16/08/2004 09:31:24
Nathan, that's what I thought too, but then I had another brainwave.
If Domino insists on protecting us from the raw MIME source of the message (and it does - the hack of adding rules like when body contains Windows-1251 only works when the MIME is slightly broken), then why not operate on the decoded text, not the MIME source?
I have created a rule that checks for the presence of what appears to be one fairly common Cyrillic character in subject or body and moves matching messages to a trap, and that seems to work.
Problem solved, I think.
6. Chris Linfoot 16/08/2004 17:13:08
Oh and sorry Gretchen - I'm not ignoring you. Can't use regexp in Notes/Domino mail rules - that would solve an awful lot of problems if we could though...
7. Chris LeRoy 16/08/2004 21:41:12
Homepage: http://www.brainbent.com
I wonder how bad the server overhead would be to write an agent that runs against mail as it is deposited to the mailbox and calls the java.util.regex package to perform regex lookups. Or, thinking back to the antispamagent that was in the sandbox a couple years back, which was essentially basic black and whitelisting... could it be improved with this?
Am I overthinking this? Underthinking it perhaps?
8. Gretchen 17/08/2004 07:51:24
Homepage: http://www.flick.com/~gretchen/
Hmm, if you need to do something else like this, can you invoke helper apps or spawn external scripts that can interact with the message? That might be pretty vile on server overhead, though. It's a shame about the no built-in regexps, but it's good that you found out a definite identifier. I love the Cyrillic character trick!
9. 12/12/2004 21:51:51
10. erik 13/05/2005 00:19:28
Can you set the filter with a set of words in English (or whatever language you DO want) that would be in almost any e-mail, and have that rule hit first? Example: "if And OR The OR Is OR You OR Am in body then ---> send to inbox" (or at least not to the "Russian Spam" folder). Does this make sense?
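One reading of this idea, sketched in Python rather than as a Notes rule. The word list is the one from the example; the function name and the whole-word tokenizing are assumptions for illustration only.

```python
import re

COMMON_WORDS = {"and", "the", "is", "you", "am"}  # the example list above

def looks_english(body: str) -> bool:
    """Return True if the body contains any of the common English words."""
    # Lower-case the body and keep only runs of ASCII letters as whole words.
    words = set(re.findall(r"[a-z]+", body.lower()))
    return not COMMON_WORDS.isdisjoint(words)

print(looks_english("Are you free on Friday?"))             # True
print(looks_english("Купите наши товары"))                  # False
print(looks_english("Купите наши товары. Click the link"))  # True
```

The third call shows a mostly Russian body that still passes because it contains one listed English word, which may be one way such a rule lets spam through.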
11. Chris Linfoot 13/05/2005 08:26:26
@Erik - looks risky. The rule we settled on, which traps at least 99.99% of Russian spam (I would say 100% but there may be one exception somewhere), is this:
When subject contains и OR body contains и [action]
Simple really.
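For readers outside Notes, the same check on decoded text might look like this in Python. This is a sketch only: the function name is hypothetical, and the sample text is encoded and decoded as cp1251 (Python's name for Windows-1251) just to stand in for whatever decoding the mail server does before the rule runs.

```python
def is_russian_spam(subject: str, body: str, marker: str = "и") -> bool:
    """Mirror 'subject contains и OR body contains и' on already-decoded text."""
    return marker in subject or marker in body

# Stand-in for an 8bit Windows-1251 payload as it arrives on the wire.
wire_bytes = "Привет, купите наши товары".encode("cp1251")
decoded_body = wire_bytes.decode("cp1251")

print(is_russian_spam("Hello", decoded_body))             # True
print(is_russian_spam("Meeting at 10", "See you there"))  # False
# The charset name itself never has to appear in the message text.
print("Windows-1251" in decoded_body)                     # False
```

Because the test runs on the decoded text, it does not depend on being able to see the MIME headers at all.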
12. 08/12/2005 06:45:16
13. Prabha 08/12/2005 06:51:01
I uploaded a Chinese file. The file uploaded successfully. After uploading, the file has to be modified with some contents. I opened the file in read mode, but the characters are not displayed properly; instead they are displayed like ??????. I want to modify the files. Can anyone help me, please?
Thanks in advance
14. Chris Linfoot 08/12/2005 08:49:26
You need to load Chinese character set support in your OS. Which character set you need I don't know. It may be Big5 or GB2312.
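As a rough illustration of why the character set matters, here is a small Python sketch. The sample text and function name are made up, and nothing here is known about the poster's actual platform. Text pushed through a character set that lacks Chinese characters comes out as question marks, while a Chinese character set round-trips it intact.

```python
def to_ascii_lossy(text: str) -> bytes:
    """Encode to ASCII, substituting '?' for every character ASCII cannot hold."""
    return text.encode("ascii", errors="replace")

text = "中文文件"  # hypothetical file contents

print(to_ascii_lossy(text))  # b'????'

# Either Chinese charset mentioned above can round-trip this sample.
for charset in ("gb2312", "big5"):
    raw = text.encode(charset)
    print(charset, raw.decode(charset) == text)
```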