Import mail attachment in DB based on mail subject filter

Hi all,

I’m not sure how to approach this exactly. And this post didn’t answer my question.

I want to:

  1. Import a CSV mail attachment into a database (this is easy)
  2. The mail should be filtered from the rest of the mail on a pattern in the subject

I thought this approach would work best:

  1. Use EmailReader to read out all the new mail in the mail account, make sure it doesn’t set the mail to “READ”
  2. Filter on the subject line and get the MessageID
  3. Store it somewhere for later use
  4. Re-use the EmailReader to get all the attachments, filter it on the MessageID you got in step 2, now set the email to “READ”
  5. Store the attachment locally
  6. Import it with the UniversalDataReader and DBTableOutput

Is this the best approach? Especially accessing the mail account twice gives me the feeling this could be done a whole lot easier

Any advice?

Thanks in advance!

Hi,

Provided approach can be shorten into one “phase”. Since EmailReader has two output ports you may use them at once – read subjects and emails at the same moment. Of course you need to enrich these records with MessageID using which you will join the streams. On the first output port (where the subjects are stored), you can apply Filter component using which you will get only the “right” records (based on a pattern/rule defined in the Filter component). Now you have two edges (one with filtered subjects and one with all attachment records) that you want to join. Since you want to join only those that have a matching record on both ports, you may use “INNER JOIN” join type on the MessageID attribute in ExtMergeJoin component in order to get this. The output from the ExtMergeJoin will contain only those records (with attachment path and other required information) that have been filtered by the “subject pattern/rule” filter. These records can be directly sent to a reader component, and further processed (their data transformed as required and inserted into database).

So again in short:

  • EmailReader with both output ports used.

  • Attach Filter component to the first output port in order to filter only the valid messages out.

  • Add ExtMergeJoin (with default INNER JOIN join type) and join these two streams using MessageId retrieved from EmailReader

  • Output port from ExtMergeJoin will contain only the data you required (attachment information filtered by the subject filter).

For more information about EmailReader, please refer to our documentation: http://doc.cloveretl.com/documentation/ … eader.html

Hope this helps.
Jan

Thanks. Using ExtMergeJoin was a great idea.

I did get this error though from CloverETL:

Data input 0 is not sorted in ascending order. Record #3: Key field=“MessageID”. Current=“MessageID:<CAOPFz2Sw7CYnCXTG4EG2cdjV1n=SR4+dcW52M6abY-45e2rcqg@mail.gmail.com>”; Previous=“MessageID:1686222707.184840.1348632498203.JavaMail.root@li202-140”.

It looks like it expets the MessageID’s to come in either descending or asceding. I have no control over this.

Any ideas?

Hi,

it’s simply because ExtMergeJoin works with sorted data only. In order to sort the incoming records use ExtSort (see our documentation for more information) component on both input edges.

Awesome! This works splendidly.

BTW I used FastSort instead of your tip. It seems to be the fastest Sorting Transformer you have.