CloverETL EmailSender component incompatible with AWS Simple Email Service (SES)

Hi,

All our DW systems, including CloverETL Servers, are deployed on AWS EC2. We have many CloverETL jobflows that need to send status emails based on data available in the jobflows. To do so, we use the EmailSender component in the jobflow. It seems to work for the quickest and simplest jobs that don’t do much work and hence don’t take much time.

However, whenever one of these jobflows runs for a longer time or does substantial work, we always get a failure from the EmailSender component. The error comes from AWS SES and is always:

-------------------------------------------------- Error details ----------------------------------------------------
Component [JobSuccessEmail:JOB_SUCCESS_EMAIL] finished with status ERROR. (In0: 1 recs, Out0: 0 recs, Out1: 0 recs)
Message couldn’t be sent
421 Timeout waiting for data from client.
---------------------------------------------------------------------------------------------------------------------

After working through this error with AWS Customer Support, we discovered that the cause is that SES times out idle SMTP server connections. We determined empirically, and AWS Customer Support confirmed, that in the us-east-1 region this timeout is 90 sec. This is hardly enough time to execute any jobflow that does substantial work. From AWS’s point of view, SES is an SMTP-as-a-service used by a very large number of AWS customers in the region, so it has to aggressively time out SMTP connections in order to recycle ports and prevent a denial of service from customers holding inactive connections open for long periods of time (i.e. > 90 sec).

From our empirical diagnosis, it looks like the CloverETL Server opens the connection to the SMTP server for the component at jobflow initialization time and does not actually use it until the EmailSender component executes. The SMTP connection is therefore open but idle while all the components between initialization and the EmailSender do other work. Can you please confirm or deny this diagnosis? If it is wrong, can you please explain the SMTP connection lifetime during jobflow and dataflow graph execution?

We believe this SMTP connection design (if confirmed) is incompatible with any SMTP service that does not tolerate connections held open but idle for long periods. Is there any way, in the EmailSender component or the jobflow configuration, to change the connection lifetime and/or management so that the connection is only created and held open while a component is actively using it?

I speculate that it may be an overall jobflow design goal to validate all dependencies of the jobflow during initialization and effectively “bind” the jobflow to those dependent resources before execution starts, in order to reduce the probability of a component failing in the middle of jobflow execution. However, there has to be a design that accomplishes most if not all of the validation goals, if not the binding goals, and in this case supports a shorter SMTP connection lifetime to be more compatible with high-workload services like AWS SES.

Assuming my diagnosis of the SMTP connection lifetime and my speculation about the jobflow design are correct, I would suggest considering a connection lifetime management scheme in which connections are validated at initialization time with a short-duration connection open/close, followed by late binding: reopening the connection when the component executes. An alternative workaround might be for the component to support a retry mechanism, possibly controlled by a configurable component property for backward compatibility, where the component reopens connections after certain well-known SMTP errors, such as a connection timeout.
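
To make the suggestion concrete, here is a minimal Python sketch of both ideas: a short open/close check at initialization, and a send that opens a fresh connection per attempt and retries on a timeout. It is illustrative only and is not CloverETL code. The names validate_connection, send_with_reconnect, RETRYABLE_CODES and open_connection are hypothetical, and treating reply code 421 as retryable is an assumption based on the error above. Only standard smtplib methods and exceptions are used.

```python
import smtplib
from typing import Callable, List

# Assumption: 421 (the code in the error above) means the server dropped an
# idle connection, so reconnecting and resending is safe.
RETRYABLE_CODES = {421}


def validate_connection(open_connection: Callable[[], object]) -> None:
    """Initialization-time check: open, send NOOP, and close right away."""
    conn = open_connection()
    try:
        conn.noop()
    finally:
        conn.quit()


def send_with_reconnect(open_connection: Callable[[], object],
                        sender: str, recipients: List[str], message: str,
                        max_attempts: int = 2) -> int:
    """Send using a fresh connection per attempt; return the attempt that succeeded."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        conn = open_connection()  # late binding: connect only when ready to send
        try:
            conn.sendmail(sender, recipients, message)
            return attempt
        except smtplib.SMTPServerDisconnected as exc:
            last_error = exc  # connection already gone; try again
        except smtplib.SMTPResponseException as exc:
            if exc.smtp_code not in RETRYABLE_CODES:
                raise  # a real rejection, not a stale connection
            last_error = exc
        finally:
            try:
                conn.quit()
            except (smtplib.SMTPException, OSError):
                pass  # the connection may already be closed
    raise last_error
```

Here open_connection would be a factory such as a function that builds an smtplib.SMTP object for the server and performs whatever starttls() and login() calls that server requires. Passing a factory rather than an open connection is what keeps the connection from sitting idle between initialization and the actual send.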

Thanks!

Hi,

According to our developers, there is no SMTP connection validation at the start of a job. However, the issue you are describing may be caused by the fact that the connection for the EmailSender is created at the beginning of the phase in which the EmailSender is located. So if the whole job has only one phase, or if the phase containing the EmailSender runs long enough (90 seconds) before the email is sent, the connection will fail (time out). Please try putting the EmailSender into its own phase; the timeout issue should then disappear. For more information about phases, please see our documentation.
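
As a rough illustration of the behaviour described above, the following Python sketch models it. It is a toy model under stated assumptions, not CloverETL code: the Phase class, its fields and the function names are hypothetical, and the 90-second threshold is the value reported earlier in this thread. The model assumes the connection is opened when the EmailSender's phase starts and sits idle until the EmailSender actually sends.

```python
from dataclasses import dataclass
from typing import List, Optional

# Idle timeout reported in this thread for SES in us-east-1.
SMTP_IDLE_TIMEOUT_SEC = 90.0


@dataclass
class Phase:
    number: int
    # Elapsed time in this phase before the EmailSender gets to send.
    upstream_work_sec: float
    has_email_sender: bool = False


def idle_seconds_before_send(phases: List[Phase]) -> Optional[float]:
    """Idle time of the SMTP connection; only the sender's own phase counts."""
    for phase in sorted(phases, key=lambda p: p.number):
        if phase.has_email_sender:
            return phase.upstream_work_sec
    return None  # no EmailSender in this job


def would_time_out(phases: List[Phase],
                   timeout_sec: float = SMTP_IDLE_TIMEOUT_SEC) -> bool:
    """True if the connection idles for at least the server's timeout."""
    idle = idle_seconds_before_send(phases)
    return idle is not None and idle >= timeout_sec


# One phase: 300 s of work happens while the connection is already open.
single_phase = [Phase(0, 300.0, has_email_sender=True)]

# Two phases: the work runs in phase 0, the EmailSender alone in phase 1.
split_phases = [Phase(0, 300.0), Phase(1, 0.0, has_email_sender=True)]
```

In this model the single-phase layout exceeds the timeout, while the split layout does not, because the earlier phase's work finishes before the connection is opened.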

Best regards,

Hi Lukas,

Yes, the jobflow only has one phase, phase 0. We’ll split the jobflow into 2 phases, with the EmailSender in the final phase at the end of the jobflow. I’ll report the results back here.

Thanks for the suggestion.

Hi Lukas,

After making the phase modification you suggested, the connections to the AWS SES SMTP server no longer time out, provided that the phase containing the EmailSender component (not the whole jobflow) does little work, in terms of elapsed time, between phase initialization and the EmailSender executing. As long as the EmailSender is at or close to the beginning of a phase, and in particular before any components in the phase that may take measurable time (in the SES case >= 90 sec), the EmailSender is no longer prone to SMTP connection timeouts.

This issue is resolved for us.

Many thanks!

Hi Lukas. I am having a similar issue, but I am using the EmailSender component in a subgraph called by a graph that is in turn called by a jobflow. I made the suggested modifications in the subgraph, but I still get the error on the fourth file. Three files process successfully, then the fourth fails with the SMTP timeout error. I am attaching my log file. Please let me know if you have any suggestions to resolve this.

Thanks,
Heather

Attachment: run_log_4467.zip

Hi Heather,

Could you please share the whole execution history? I would like to see run logs for all the graphs and subgraphs in the cascade too, if possible.

Also, is there any limit on the number of open SMTP connections on the AWS side?

Thanks.

Hi Lubos. Thank you for looking into this. I have checked with our AWS admin, and he confirmed there is no limit on open SMTP connections on the AWS side. I am also attaching the requested execution history log files that go with the original file I attached: all_run_logs.zip.

Thanks,
Heather

Heather,

Please try omitting the first three successful records and start with the one causing this error. Is the job cascade failing even in this configuration?

And could you please also send me the jobflow, graph and subgraph files? Feel free to leave out any sensitive information within them.

Thanks again.