The approach provided can be shortened into a single “phase”. Since EmailReader has two output ports, you can use both at once and read the subjects and the emails at the same time. Of course, you need to enrich these records with the MessageID, which you will use to join the streams. On the first output port (where the subjects are stored), you can apply a Filter component to keep only the “right” records (based on a pattern/rule defined in the Filter component). Now you have two edges (one with filtered subjects and one with all attachment records) that you want to join. Since you want only the records that have a match on both ports, use the “INNER JOIN” join type on the MessageID attribute in the ExtMergeJoin component. The output of ExtMergeJoin will contain only those records (with the attachment path and other required information) whose messages passed the “subject pattern/rule” filter. These records can be sent directly to a reader component and processed further (their data transformed as required and inserted into the database).
So again in short:
Use EmailReader with both output ports connected.
Attach a Filter component to the first output port to keep only the valid messages.
Add ExtMergeJoin (with the default INNER JOIN join type) and join the two streams on the MessageID retrieved from EmailReader.
The output port of ExtMergeJoin will contain only the data you require (attachment information filtered by the subject filter).
It’s simply because ExtMergeJoin works with sorted data only. To sort the incoming records, use the ExtSort component (see our documentation for more information) on both input edges.
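
To make the data flow easier to picture, here is a minimal Python sketch of the same logic: filter the subjects, sort both streams by MessageID, then inner-join them with a merge join. It is only an illustration of the technique, not how the CloverETL components are implemented. The sample records, the "subject" and "path" field names, the "^Invoice" pattern, and the merge_inner_join helper are all assumptions made up for this example.

```python
import re
from operator import itemgetter


def merge_inner_join(left, right, key):
    """Inner-join two lists of dicts that are BOTH already sorted by `key`.

    Hypothetical helper for illustration only: a single forward pass over
    both inputs, which is why sorted input is a hard requirement.
    """
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1  # no partner for this left record
        elif left_key > right_key:
            j += 1  # no partner for this right record
        else:
            # One message may carry several attachments: emit every
            # right record that shares the current key.
            k = j
            while k < len(right) and right[k][key] == left_key:
                joined.append({**left[i], **right[k]})
                k += 1
            i += 1
    return joined


# Made-up records standing in for the two EmailReader output ports.
subjects = [
    {"MessageID": "m3", "subject": "Newsletter"},
    {"MessageID": "m1", "subject": "Invoice 2041"},
    {"MessageID": "m2", "subject": "Invoice 2042"},
]
attachments = [
    {"MessageID": "m2", "path": "/tmp/b.pdf"},
    {"MessageID": "m3", "path": "/tmp/c.png"},
    {"MessageID": "m1", "path": "/tmp/a1.pdf"},
    {"MessageID": "m1", "path": "/tmp/a2.pdf"},
]

# Filter step: keep only subjects matching the (assumed) rule.
pattern = re.compile(r"^Invoice")
filtered = [s for s in subjects if pattern.search(s["subject"])]

# Sort step on both edges, then the inner merge join on MessageID.
by_id = itemgetter("MessageID")
result = merge_inner_join(
    sorted(filtered, key=by_id), sorted(attachments, key=by_id), "MessageID"
)

# Same join without the sort step, to show what goes wrong.
unsorted_result = merge_inner_join(filtered, attachments, "MessageID")
```

With this sample data, result holds the three attachments of the two matching messages, while unsorted_result misses both attachments of message m1 because the single-pass join has already moved past that key. That is the kind of silent loss the sort step on both input edges prevents.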