Hi - we’re using a system execute command to execute a Python script (Linux server) with parameters passed to it, synchronously, 10 at a time. This works OK for a while but eventually locks up and gets a connection refused error. We think the graph that calls the script is waiting for a response from Python saying that it has finished, and just stays open. The script calls an API to fetch some data and is typically done in less than 2 seconds.
Can anyone tell me how the system execute command works in a Linux environment? Does it run the script from the Clover sandbox directory, or does it copy it and run it somewhere else? It’s running on a WebLogic server. We’re pretty certain that the issue is with Clover or WebLogic thinking there are too many open connections and not accepting any more, because when we restart the CloverETL Server instance it works again.
If anyone has any thoughts or configuration settings to check, please let me know.
thanks
Hi pintail,
I’ve seen that you have started two very similar threads, so I will use only this one to, hopefully, help you with this issue.
The script is indeed saved into a file in a temp folder. The exact path can be seen in the run log of the graph when you switch the logging level to DEBUG; there might be other useful information there as well. The logging level can be changed in the CloverETL Server console → Sandboxes → <sandbox_name> → Config properties.
When the connection is made from a Python script, it is not using any Java-related connection, so I cannot see how it could be affected by CloverETL Server or WebLogic. I’d suggest using strace or a similar tool to debug the Python process.
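As a complement to strace (not a replacement for it), the script itself could report why a connection attempt failed. The following is only a minimal sketch: the `probe` helper, the host name and the port are hypothetical and not taken from your script. It relies on the fact that "connection refused" (`ECONNREFUSED`) normally means the target host answered and rejected the connection, whereas running out of file descriptors or local ports on the calling machine shows up as a different errno.

```python
import errno
import socket


def probe(host, port, timeout_s=3.0, connect=socket.create_connection):
    """Try one TCP connection and classify the outcome (hypothetical helper)."""
    try:
        sock = connect((host, port), timeout=timeout_s)
    except socket.timeout:
        # No answer at all within timeout_s.
        return "timeout"
    except OSError as exc:
        if exc.errno == errno.ECONNREFUSED:
            # The remote end rejected the connection (nothing listening,
            # or the listener is overloaded / blocked by a firewall).
            return "refused"
        if exc.errno == errno.EMFILE:
            # This process has too many open file descriptors.
            return "too many open files"
        if exc.errno == errno.EADDRNOTAVAIL:
            # The local machine has no free address/port for the connection.
            return "no local address available"
        return "error: %s" % exc
    else:
        sock.close()
        return "ok"


if __name__ == "__main__":
    # Placeholder endpoint; replace with the API host the script calls.
    print(probe("api.example.invalid", 443))
```

If this reports "refused" while the server is in the failing state, the rejection is coming from the endpoint the script connects to, which would be worth comparing against what strace shows for the `connect` call.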
I’m also not sure whether I understand your current setup correctly. Is it a SystemExecute component running the Python script on a CloverETL Server, while the script fetches data from some remote web service running on a non-CloverETL server?
Best regards,
OK, that’s what I thought about the file getting written out and executed separately. Thanks.
It’s a little odd what’s happening. Essentially, at a high level, here is the process:
- A jobflow is executed which reads a file and sends about 500 input parameters. The jobflow calls a single graph for each new parameter, which is passed to the system execute component, which executes a py script.
- When we run this job, every hour or so, it will work fine for a while. Eventually we’ll get the error which says “connection refused”. This error is generated in the py script.
- We’ve tried running synchronously and asynchronously and it doesn’t make a difference.
- Architecture is Linux OS, CloverETL Server, WebLogic.
- If we restart the Clover service this fixes all issues, so it seems like stuck threads to me, as that’s consistent with the connection refused error message.
- When I look in the WebLogic logs and Clover logs, and use several Linux commands to check the number of used/available threads, stuck files, etc., nothing really jumps out. There look to be more than enough free threads, available files, etc. That said, it’s definitely a networking issue, and a restart of the Clover service fixes it (Clover and WebLogic, I should say).
So, for now we’ve stabilized it by just writing a cron job to restart the service every night. Sort of a sledgehammer approach, but it is working for us. It seems to me that the Python script/Clover/WebLogic combination is not shutting down properly, potentially on an error, and that’s getting caught up somewhere (but I don’t see it in the logs of any of the architecture components). That’s my best guess. We gave up hunting it down when the brute-force restart fixed the issue and things stabilized. It would be good to know if you have any other thoughts, though.
thanks!
Hi pintail,
I’ve got two ideas on how to avoid this behavior.
1. Set the timeout property on the SystemExecute component. The command is executed as a separate process, so if, for example, it keeps waiting for a response, the timeout should kill the process after the selected time.
2. Try using Execute Script instead of SystemExecute. This is a jobflow component and, as the name implies, is designed specifically for running scripts. It is also much more configurable via its input port.
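To illustrate the idea behind the first suggestion, here is a rough Python analogy of running a command as a separate process with a time limit. This is not how SystemExecute is implemented; it only sketches the general pattern using the standard `subprocess` module, and `run_with_timeout` is a hypothetical helper name.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd as a child process; return (returncode, timed_out)."""
    try:
        completed = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before
        # raising, so no hung process is left behind.
        return None, True
    return completed.returncode, False


if __name__ == "__main__":
    # A stand-in for a script that hangs while waiting for a response.
    slow = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeout(slow, 1.0))
```

The point of the pattern is that a hung script is terminated after a bounded time instead of holding its process (and any sockets it opened) indefinitely.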
Please let me know about your findings.
Best regards,