Hi,
All our DW systems, including CloverETL Servers, are deployed on AWS EC2. We have many CloverETL jobflows that need to send status emails based on data available in the jobflows. To do so, we are using the EmailSender component in the jobflow. It seems to work for the quickest and simplest jobs that don’t do much work and hence don’t take much time.
However, whenever one of these jobflows runs for a longer time or does substantial work, we always get a failure from the EmailSender component. The error comes from AWS SES and is always:
-------------------------------------------------- Error details ----------------------------------------------------
Component [JobSuccessEmail:JOB_SUCCESS_EMAIL] finished with status ERROR. (In0: 1 recs, Out0: 0 recs, Out1: 0 recs)
Message couldn’t be sent
421 Timeout waiting for data from client.
---------------------------------------------------------------------------------------------------------------------
After working through this error with AWS Customer Support, we discovered that the cause is that SES times out idle SMTP connections. We determined empirically, and AWS Customer Support confirmed, that in the us-east-1 region this timeout is 90 seconds. That is hardly enough time to execute any jobflow that does substantial work. From AWS’s point of view, SES is an SMTP-as-a-service used by a very large number of AWS customers in the region, so it has to time out SMTP connections aggressively in order to recycle ports and prevent a denial of service caused by customers holding inactive connections open for long periods (i.e. > 90 sec).
From our empirical diagnosis, it looks like when the CloverETL Server executes a jobflow, it opens the connection to the SMTP server for the EmailSender component at jobflow initialization time but does not actually use it until the component executes. The SMTP connection is therefore left open but idle while all the components between initialization and the EmailSender component do their work. Can you please confirm or deny this diagnosis? If it is not correct, can you please explain the SMTP connection lifetime during jobflow and dataflow graph execution?
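For anyone who wants to check the idle-timeout behaviour outside CloverETL, here is a rough Python sketch of the kind of probe we have in mind. It is only an illustration, not CloverETL code: the function name probe_idle_timeout and its parameters are hypothetical, the host is a placeholder, and it only reports whatever the server answers after the idle period.

```python
import smtplib
import time


def probe_idle_timeout(host, port, idle_seconds,
                       smtp_factory=smtplib.SMTP, sleep=time.sleep):
    """Open an SMTP session, stay idle, then report how the server answers NOOP.

    Returns (reply_code, text); reply_code is None if the server had already
    dropped the connection without a readable reply.
    """
    conn = smtp_factory(host, port, timeout=30)
    try:
        conn.ehlo()
        # Stand-in for the work the jobflow does before EmailSender runs.
        sleep(idle_seconds)
        try:
            code, message = conn.noop()
        except smtplib.SMTPServerDisconnected as exc:
            return None, str(exc)
        if isinstance(message, bytes):
            message = message.decode(errors="replace")
        return code, message
    finally:
        conn.close()
```

Called as probe_idle_timeout("smtp.example.com", 587, 120) against a server with a 90 second idle limit, we would expect either a 421 reply or a dropped connection, but that expectation is an assumption until it is run against the real service.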
We believe this SMTP connection design (if confirmed) is incompatible with any SMTP service that does not tolerate connections held open but idle for long periods. Is there any way, in the EmailSender component or the jobflow configuration, to change the connection lifetime and/or management so that the connection is only created and held open while a component is actively using it?
I speculate that it may be an overall jobflow design goal to validate all dependencies of the jobflow during initialization and effectively “bind” the jobflow to those dependent resources before starting execution, in order to reduce the probability of a component failing in the middle of jobflow execution. However, there should be a design that accomplishes most if not all of the validation goals, if not the binding goals, and in this case supports a shorter SMTP connection lifetime that is more compatible with high-workload services like AWS SES.
Assuming my diagnosis of the SMTP connection lifetime and my speculation about the jobflow design are correct, I would suggest considering a connection lifetime management scheme in which connections are validated at initialization time with a short-duration connection open/close, followed by late binding that reopens the connection when the component executes. An alternative workaround might be for the component to support a retry mechanism, possibly controlled by a configurable component property to preserve backward compatibility, where the component reopens the connection after certain well-known SMTP errors, such as a connection timeout.
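To make the suggestion concrete, here is a minimal Python sketch of both ideas: a short validate-and-close at initialization, then a late-bound send that reconnects on a stale-connection error. This is not how EmailSender is implemented, as far as I know. The names validate_smtp, send_late_bound, RETRYABLE_CODES and max_attempts are all made up for illustration, and treating only reply code 421 as retryable is my assumption.

```python
import smtplib

# Reply codes treated as "connection went stale, safe to reconnect".
# Limiting this to 421 is an assumption made for the sketch.
RETRYABLE_CODES = {421}


def validate_smtp(connect):
    """Initialization-time check: open a session, confirm it answers, close it."""
    conn = connect()
    try:
        code, _ = conn.noop()
        return code == 250
    finally:
        conn.close()


def send_late_bound(connect, sender, recipients, message, max_attempts=3):
    """Open the connection only when sending; reconnect on retryable failures.

    Returns the number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        conn = None
        try:
            conn = connect()  # fresh connection for every attempt
            conn.sendmail(sender, recipients, message)
            return attempt
        except smtplib.SMTPServerDisconnected as exc:
            last_error = exc  # server dropped us; try again
        except smtplib.SMTPResponseException as exc:
            if exc.smtp_code not in RETRYABLE_CODES:
                raise  # a genuine rejection, retrying would not help
            last_error = exc
        finally:
            if conn is not None:
                conn.close()
    raise last_error
```

Here connect is any zero-argument callable that returns a ready session, for example lambda: smtplib.SMTP("smtp.example.com", 587, timeout=30) with a placeholder host. TLS and login are left out and would belong inside that callable. Because the connection is opened immediately before the send and closed right after, it is never idle for longer than one delivery takes.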
Thanks!