ListFiles component not listing directory contents on AWS S3

I am trying to use the ListFiles component to list the files in a sub-directory of an AWS S3 bucket. Regardless of how I build the URL to the sub-directory in the ListFiles “FileURL” property using the supplied editor, all I get is a single output record for the sub-directory itself. The bucket sub-directory contains a modest number (<100) of text files named like <file_name>.txt.

The forms of the “FileURL” property that I’ve tried are:
http://${access_key}:${secret_key}@/bucket_name/sub1/sub2/sub3/sub4/sub5/sub6/
http://${access_key}:${secret_key}@/bucket_name/sub1/sub2/sub3/sub4/sub5/sub6
http://${access_key}:${secret_key}@/bucket_name/sub1/sub2/sub3/sub4/sub5/sub6/*
http://${access_key}:${secret_key}@/bucket_name/sub1/sub2/sub3/sub4/sub5/sub6/*.txt

For all four of the above URLs, the ListFiles component execution succeeds without error, but each returns only one output record. Using the ListFiles component on the local filesystem(s) works just fine.

Hi,

I would recommend that you change the http:// to s3://; that should take care of your issue. Also, you can take a look at the supported CloverETL URL formats here.

When I change the protocol from HTTP:// to S3://, I get the following error (note: actual keys and path changed for security purposes):
------------------------------------------------------------------------------------------ Error details --------------------------------------------------------------------------------------------
Component [ ListFiles in S3:LIST_FILES_IN_S3] finished with status ERROR. (Out0: 0 recs)
Failed to list s3://${access_key}:${secret_key}@/bucket_name/sub1/sub2/sub3/sub4/sub5/sub6/
Directory listing failed
Connection failed
unknown protocol: s3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

I should probably mention that I’m on CloverETL Designer and Server v. 3.5.1 and can’t upgrade either for the time being.

Hi,

Unfortunately, it seems that in that version of CloverETL wildcards are not supported in the ListFiles component. Alternatively, depending on what you want to do with the files, you can use a UniversalDataReader with this URL: http://${ACCESS_KEY}:${SECRET_KEY}@cloveretl.engine.test.s3.amazonaws.com/t7285457/zip/*.zip

Also note that in this case, the wildcard will also match files in the nested subdirectories.
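
To make the nested-subdirectory remark concrete, here is a small Python sketch of the matching behaviour described above. It is only an illustration, not CloverETL's implementation: the helper name match_keys and the sample keys are hypothetical, and Python's fnmatchcase is used as a stand-in because its * also matches across "/" separators.

```python
from fnmatch import fnmatchcase


def match_keys(keys, pattern, recursive=True):
    """Return the keys that match a wildcard pattern.

    With recursive=True, '*' may span '/' separators, so files in nested
    "subdirectories" also match (the behaviour described in the reply).
    With recursive=False, only keys at the same depth as the pattern are kept.
    """
    matched = [k for k in keys if fnmatchcase(k, pattern)]
    if recursive:
        return matched
    depth = pattern.count("/")
    return [k for k in matched if k.count("/") == depth]


# Hypothetical object keys, for illustration only.
keys = [
    "t7285457/zip/a.zip",
    "t7285457/zip/nested/b.zip",
    "t7285457/zip/readme.txt",
]

all_zips = match_keys(keys, "t7285457/zip/*.zip")
top_level_zips = match_keys(keys, "t7285457/zip/*.zip", recursive=False)
```

If only the files directly under the directory are wanted, a depth filter like the recursive=False branch could be applied downstream of the reader.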

Can you confirm or deny that this works in the current CloverETL version? That would help motivate us to upgrade soon, as we’ve been putting it off for quite a while.

In the meantime, I’ve coded a workaround using the SystemExecute component to execute “s3cmd ls …” in a Bash shell. It’s a pain to parse the output and handle the shell return codes, and it’s slow, but it works for now.
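
For anyone attempting a similar workaround, here is a rough Python sketch of the parsing and return-code handling. It is not the script I actually use, and it rests on assumptions: that plain "s3cmd ls" prints file rows as date, time, size and URL, and prefix rows as "DIR" followed by the URL (options that add columns or human-readable sizes would need extra handling). The names S3Entry, parse_s3cmd_ls and list_s3 are made up for this example.

```python
import subprocess
from typing import List, NamedTuple, Optional


class S3Entry(NamedTuple):
    url: str
    is_dir: bool
    size: Optional[int]      # bytes; None for DIR rows
    modified: Optional[str]  # "YYYY-MM-DD HH:MM"; None for DIR rows


def parse_s3cmd_ls(output: str) -> List[S3Entry]:
    """Parse the text printed by `s3cmd ls` (assumed default format)."""
    entries = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The URL is the last column; split on the first "s3://" so that
        # object names containing spaces stay in one piece.
        head, sep, tail = line.partition("s3://")
        if not sep:
            continue  # not a listing row
        url = sep + tail
        fields = head.split()
        if fields == ["DIR"]:
            entries.append(S3Entry(url, True, None, None))
        elif len(fields) == 3:
            date, time, size = fields
            entries.append(S3Entry(url, False, int(size), f"{date} {time}"))
        # Rows in any other shape are skipped rather than guessed at.
    return entries


def list_s3(url: str) -> List[S3Entry]:
    """Run `s3cmd ls <url>`; a non-zero exit raises CalledProcessError."""
    result = subprocess.run(
        ["s3cmd", "ls", url], capture_output=True, text=True, check=True
    )
    return parse_s3cmd_ls(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems, and check=True turns a failed s3cmd call into an exception instead of a silently empty listing.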

Hi,

I would highly recommend that you upgrade. I can safely say that you will be able to use AWS S3 in our ListFiles component. If you are curious to see which other URL formats are supported in the latest version, you can take a look here.