Issues reading XML files out of a zip file

Greetings,
I have a zip file that contains two XML files that I’ll call the catalog file and the price file.
In my jobflow I move the zip file from a remote location to the DATAIN_DIR and then attempt to read each of the XML files using XML Extract in a couple of different places. According to this documentation link:

http://doc.cloveretl.com/documentation/ … aders.html

I should be able to access the files inside the zip archive and use wildcards, e.g.:
zip:(/path/file??.zip)#innerfolder?/filename.*

In my case, the paths I am trying to resolve contain wildcards. I will know the exact name of the zip file because it is passed in from a File Event Listener, and I will know the name of the inner folder because I can parse it from the zip file name, but I won't know the exact inner file names at runtime; I only know they match catalog*.xml and price*.xml. So I am trying to set the paths as, for example:
zip:(${ZIP_FILE})#${INNER_FOLDER}/catalog*.xml
zip:(${ZIP_FILE})#${INNER_FOLDER}/price*.xml
(where the parameters are set properly)

Unfortunately, when I do this, XML Extract does not report an error, but it doesn't find anything either.
If I use the same catalog*.xml file unzipped, it extracts properly. (Basically, I had a fully working jobflow with XML files and am now trying to retrofit it to work with the zip files, to no avail.)

To simplify the scenario, I tried hard-coding the actual file name for testing, following the documented format:
zip:(${DATAIN_DIR}/catalog_5_123.zip)#catalog_5_123/catalog_5_abc.xml
(where the inner folder is named catalog_5_123 and the catalog XML file inside it is called catalog_5_abc.xml)
and it still does not extract anything. The jobflow does not fail either, whereas it does fail when I give it an invalid zip file name.
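
One quick way to check what an inner path such as catalog_5_123/catalog*.xml can actually match is to list the entries stored in the archive and test them against the pattern. The sketch below does that in Python's standard zipfile module; the helper name matching_entries is hypothetical, and fnmatch is only an approximation of CloverETL's own wildcard rules (an assumption; for instance, fnmatch's `*` also matches `/`). Pointing it at the real zip file in DATAIN_DIR shows at a glance whether the inner folder exists at all.

```python
import fnmatch
import io
import zipfile


def matching_entries(zip_source, inner_pattern):
    """Return the entry names inside the archive that match inner_pattern.

    zip_source can be a file path or a binary file-like object.
    fnmatch stands in for CloverETL's wildcard rules (an assumption);
    note that its '*' also matches '/'.
    """
    with zipfile.ZipFile(zip_source) as zf:
        return [name for name in zf.namelist()
                if fnmatch.fnmatchcase(name, inner_pattern)]


if __name__ == "__main__":
    # Hypothetical archive with the XML files at the root (no inner folder).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("catalog_5_abc.xml", "<catalog/>")
        zf.writestr("price_5_abc.xml", "<prices/>")

    # A pattern that assumes an inner folder finds nothing...
    print(matching_entries(buf, "catalog_5_123/catalog*.xml"))  # []
    # ...while a root-level pattern finds the catalog file.
    print(matching_entries(buf, "catalog*.xml"))  # ['catalog_5_abc.xml']
```

An empty list for the folder-qualified pattern, with no error raised, matches the "runs but finds nothing" symptom described above.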

I am open to unzipping the file first if I have to, but my understanding is that this should work, so my first choice would be to get it working. If I do have to unzip the file, I would need to do that from my jobflow after FTP'ing the file, and I don't see a jobflow component that does this.
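
If unzipping first turns out to be necessary, the extraction step itself is small. The following is a minimal sketch using Python's standard zipfile module, not a CloverETL component; how (or whether) such a script can be invoked from the jobflow is not covered here and is an assumption on the reader's side. The function name unzip_to and the file names are hypothetical.

```python
import pathlib
import tempfile
import zipfile


def unzip_to(zip_path, target_dir):
    """Extract all entries of zip_path into target_dir (created if missing)
    and return the paths of the extracted files, skipping folder entries."""
    target = pathlib.Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
        return [target / name for name in zf.namelist()
                if not name.endswith("/")]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a throwaway archive standing in for the downloaded zip file.
        archive = pathlib.Path(tmp) / "catalog_5_123.zip"
        with zipfile.ZipFile(archive, "w") as zf:
            zf.writestr("catalog_5_abc.xml", "<catalog/>")
            zf.writestr("price_5_abc.xml", "<prices/>")
        # Print the names of the files that were extracted.
        for path in unzip_to(archive, pathlib.Path(tmp) / "unzipped"):
            print(path.name)
```

The extracted XML files could then be read with the plain file URLs that already worked before the zip retrofit. The Python documentation still advises inspecting archives from untrusted sources before extracting them.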

Does anyone know how to get the XMLExtract component to read the files inside the zip file or, failing that, how I can unzip the file from the jobflow after FTPing it locally?

Thanks in advance!

Hi Anye,

I have built a test graph for this situation, but I am unable to reproduce the issue you are facing. From what I can see, your URL definition is correct and it should work exactly the way you expect.

I would like to take a closer look at this issue and it would be very helpful if you could provide me with some more data:
1. Your graph with its externalised dependencies (metadata, parameters, etc.).
2. A sample of your data (e.g. the zipped folder).
3. Detailed information from your File Event Listener setup.
4. The log file from the last run of your graph (from the Console in the Designer or, if you have already used the parameters passed from the Event Listener, the log file on the Server: Executions History → select the appropriate job → Log File tab on the right side of the screen).
5. Your CloverETL version.

Please feel free to remove any sensitive information. If you are not comfortable posting your data on the forum, please send it to support@cloveretl.com.

Thanks,
Eva

Thanks. I had moved on to other work on these jobs and graphs, so once I have merged the branch with the attempted zip file processing into the latest code and collected the latest logs, I'll send you all the requested info. I don't have a problem with posting here, since it's not very sensitive, but there are a lot of pieces involved, so it's probably easier to email you a zip file.

Hi Anye,

I have reviewed the jobflow and the data you provided, and I am a little confused. This is probably why the current file cannot be processed: the zip file doesn't contain a nested folder named catalog_5_20170330_123 (the XML files sit directly at the root of the zip archive). Therefore you don't need the "inner_folder" parameter at all, and the URL would look as follows:

zip:(${DATAIN_DIR}/catalog_5_20170330_123.zip)#catalog_5_20170330_123.xml

(The name of the file goes directly after the # sign.)
Please let me know if this was not the original issue and I will take a closer look again.

However, to use the file path and zip file name provided by the File Event Listener, you might want to use the following syntax:

zip:(${EVENT_FILE_PATH}${EVENT_FILE_NAME})#catalog*.*

EVENT_FILE_PATH is the path to the watched folder, including the trailing slash, so you can append EVENT_FILE_NAME directly to it to get the full path of the file that has appeared in the folder. Please note that there is no need to set up these parameters in the Designer or in the File Event Listener: a graph or jobflow can use them without defining them explicitly, because they are passed in from the Server.
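
To make the concatenation concrete, the sketch below builds the same string the URL above resolves to once the Server substitutes the two parameters. It is purely illustrative (the Server does this substitution itself); the function name build_zip_url and the sample path are hypothetical, and the defensive slash handling is an addition in case a path arrives without its trailing separator.

```python
def build_zip_url(event_file_path, event_file_name, inner_pattern):
    """Compose an archive URL of the form zip:(<archive path>)#<inner entry>.

    event_file_path is expected to end with a separator, as described for
    EVENT_FILE_PATH; a '/' is appended defensively if it does not.
    """
    if not event_file_path.endswith(("/", "\\")):
        event_file_path += "/"
    return f"zip:({event_file_path}{event_file_name})#{inner_pattern}"


if __name__ == "__main__":
    print(build_zip_url("/data/in/", "catalog_5_20170330_123.zip", "catalog*.*"))
```

With the hypothetical values shown, this prints zip:(/data/in/catalog_5_20170330_123.zip)#catalog*.*, which has the same shape as the hard-coded URL suggested earlier.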

Please let me know if I have misunderstood the situation or if there is anything else I can help you with.

Thanks,
Eva