SpamAndNonSpam



 

The previous project is now extended to the project SpamAndNonSpam.ttp so that spam words are recognized as well. Because the IMP filter has a high priority, these should be only a few well-chosen words that indicate spam with very high probability.

 

To keep the project clear, two additional rules (productions) are created first. The first, NonSpam, covers the non-spam words:

 

  "Spamihilator"

| "TextTransformer"

| "tetra"

 

The second additional rule may be called Spam and contains some spam words:

 

  "Viagra"

| "Casino" 

| "Rolex"

| "Watch"

| "Watches"

 

In the start rule of SpamAndNonSpam, a loop runs until the complete text has been processed. The star '*' after the closing parenthesis denotes the loop.

 

{{

int iResult = 0;

}}

(

    SKIP

  | NonSpam  {{ iResult = 1; }}

  | Spam     {{ if(iResult == 0) iResult = -1; }}

)*

{{

out << iResult;

}}  

 

 

The use of the loop is a little trick: the alternatives to SKIP can occur both before and after it. In the start rule, SKIP jumps either to the next spam word, to the next non-spam word, or to the end of the text. If a word is found, a value is assigned to the result.

However, the mail is classified as spam only if it doesn't contain a non-spam word. This is guaranteed by the instruction:

 

if(iResult == 0) iResult = -1;

 

That is, the result is set to -1 only while it is still undecided (0). If a non-spam word follows, the result is set to 1 and the mail as a whole is classified as not spam. Thus the IMP filter can still be used with a high priority.
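
The decision logic of the start rule can also be sketched in plain C++, outside of TextTransformer. This is only an illustration under stated assumptions: the helper names matchAt and classify are hypothetical, words are matched as case-sensitive substrings without regard to word boundaries, and advancing by one character stands in for SKIP. The actual token matching in TextTransformer may differ.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Word lists copied from the NonSpam and Spam rules above.
constexpr std::array<std::string_view, 3> kNonSpam = {
    "Spamihilator", "TextTransformer", "tetra"};
constexpr std::array<std::string_view, 5> kSpam = {
    "Viagra", "Casino", "Rolex", "Watch", "Watches"};

// Hypothetical helper: length of the longest word from `words` that
// starts at text[pos], or 0 if none of them starts there.
template <std::size_t N>
std::size_t matchAt(std::string_view text, std::size_t pos,
                    const std::array<std::string_view, N>& words) {
    std::size_t best = 0;
    for (std::string_view w : words) {
        if (w.size() > best && text.substr(pos, w.size()) == w) {
            best = w.size();
        }
    }
    return best;
}

// Hypothetical stand-in for the start rule.
// Returns 1 (not spam), -1 (spam) or 0 (undecided).
int classify(std::string_view text) {
    int iResult = 0;
    std::size_t pos = 0;
    while (pos < text.size()) {
        const std::size_t good = matchAt(text, pos, kNonSpam);
        const std::size_t bad = matchAt(text, pos, kSpam);
        if (good > 0) {
            iResult = 1;                     // a non-spam word always wins
            pos += good;
        } else if (bad > 0) {
            if (iResult == 0) iResult = -1;  // only while still undecided
            pos += bad;
        } else {
            ++pos;                           // assumption: plays the role of SKIP
        }
    }
    return iResult;
}
```

Under these assumptions, a text that contains only "Rolex" yields -1, while a text that contains both "Rolex" and "Spamihilator" yields 1 regardless of the order in which the two words appear.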