Best practices for health checks against Clover Server / Execution History filters question

Greetings,

We recently had an interesting situation where the Tomcat server hosting our Clover server was up, but Clover itself had gone down. Our internal job error notifications didn't fire because, of course, Clover wasn't running, and our AWS health checks on VM accessibility didn't fire either because Tomcat was still running.

So, we need a health check to catch if Clover goes down when Tomcat is still running.

I created a trivial jobflow that can be called over HTTP on a frequent schedule to make sure Clover is responsive. However, because health checks need to run frequently, this is gumming up the Execution History, and as far as I can tell our current version (4.4.1.7) has no way to filter a specific jobflow out of (or into) the list.

So my questions are twofold:

  1. Is there a better way to implement health checks for the Clover server itself (as opposed to the VM it runs on) than what I'm doing?

  2. Do later versions of Clover Server have more flexible ways of filtering the Execution History, so that we can either ignore a particular jobflow/graph or limit the query to a particular jobflow/graph?

We are planning to upgrade when we can, but the timing has been challenging. We need to keep supporting our production instance while we test in our staging environment to make sure everything still works, and we can't have two versions of Designer on the same personal machine. So I either need two separate machines, or I can't support production while testing the new version on staging, which is a problem.

Thanks in advance

Addendum to the question: if the health check jobflow is the best practice, is there a way to make it not require a username and password when it is executed over HTTP? My devops guru says Route 53 health checks can't supply credentials, so we would need to put up a proxy to attach the auth header, but I'm wondering if there is another way.

If I'm not mistaken, you can disable writing to the execution history for both Launch Services and scheduled jobs (for exactly this reason: frequently running jobs).

But in your case I wouldn't bother; I'd just check the server's HTTP API, more specifically cluster_status. If that becomes unresponsive, your server is down. See the documentation for more info. Despite its name, it works for a single node too.
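A minimal sketch of such a check in Python, using only the standard library. It assumes the Simple HTTP API is served under `/clover/request_processor/` (confirm the exact path in the documentation for your version) and that you have a dedicated monitoring account; the base URL and credentials below are placeholders. A 200 response counts as "up"; any error status, refused connection, or timeout counts as "down".

```python
import base64
import urllib.error
import urllib.request

# Placeholders: adjust to your installation. The request_processor path is an
# assumption; check it against the Simple HTTP API docs for your version.
BASE_URL = "http://localhost:8080/clover"
USERNAME = "monitor"
PASSWORD = "secret"


def basic_auth(username: str, password: str) -> str:
    """Return the value of an HTTP Basic Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def cluster_status_url(base_url: str) -> str:
    """Build the cluster_status URL (works on a single-node server too)."""
    return base_url.rstrip("/") + "/request_processor/cluster_status"


def server_is_up(base_url: str, username: str, password: str, timeout: float = 10.0) -> bool:
    """True if cluster_status answers 200 within the timeout, False otherwise."""
    request = urllib.request.Request(
        cluster_status_url(base_url),
        headers={"Authorization": basic_auth(username, password)},
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # HTTPError (4xx/5xx), URLError (refused, DNS failure) and timeouts
        # all derive from OSError, so every failure mode reads as "down".
        return False
```

Calling `server_is_up(BASE_URL, USERNAME, PASSWORD)` from a cron job or monitoring agent gives a simple yes/no. Note that wrong credentials (a 401) also read as "down", which is usually what you want for alerting but worth remembering when debugging.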

Come to think of it, you'll still need to authenticate. However, if you upgrade to 4.7 or later, you will be able to create a Data Service (the successor of Launch Service), which can be opened to any caller without authentication.

Hi,

You can check whether the CloverETL Server is up at the following URL: `http://<host>:<port>/clover/accessibilityTest.jsp`. This page does not require any form of authentication and can return one of three status codes: 200 (OK), 500, or 503. It is not documented, as it is usually used only for our internal testing; however, it should help you out in your scenario.
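A sketch of a probe for that page, again using only the Python standard library. `http://localhost:8080` stands in for your own host and port. Since the page is undocumented, the exact difference between 500 and 503 isn't spelled out here, so the sketch treats anything other than 200 as unhealthy and returns the raw code so it can be logged.

```python
import urllib.error
import urllib.request
from typing import Optional

# Placeholder host and port; the path is the one given in the reply.
ACCESSIBILITY_URL = "http://localhost:8080/clover/accessibilityTest.jsp"


def probe(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code of url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status, e.g. 500 or 503
    except OSError:
        return None  # refused connection, DNS failure, or timeout


def is_healthy(status: Optional[int]) -> bool:
    """Only 200 counts as healthy; 500, 503, and no response all count as down."""
    return status == 200
```

Because no credentials are involved, the same URL should also work directly as the target of a Route 53 HTTP health check (which treats 2xx and 3xx responses as healthy), with no proxy in between.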

Hope this helps,