- Remove a set of invalid characters from a gzipped data file (CSV)
- Read/parse the CSV data
- Add fields to the data stream using Reformat
- Write the data stream to a database table
I'm stuck at step 1. It seems that the only ways to do this are (a) run a system command (sed, perl, etc.) to create a fixed copy of the file and then read that copy normally, or (b) read the CSV and then iterate through all of the fields, removing the invalid characters from each.
I'm not wildly crazy about either. What I'd like to do is the following, but I don't see a combination of components and/or functions that will allow me to do it:
- Read the file so that each line is one record
- Remove invalid characters from each record with replace()
- Parse each record as a CSV into the necessary fields
- Do the balance of the transformation...
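
To make the intended flow concrete, here is a rough sketch of the same steps outside the ETL tool, written in Python purely for illustration. It is not the tool's own API. The function names (`clean_lines`, `read_clean_csv`), the UTF-8 encoding, and the definition of "invalid" as ASCII control characters other than tab, CR and LF are all assumptions made for the example.

```python
import csv
import gzip
import re

# Assumption: "invalid" means ASCII control characters other than tab, CR, LF.
# Replace this pattern with whatever set of characters is actually unwanted.
INVALID_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def clean_lines(lines):
    """Yield each raw line with the invalid characters removed."""
    for line in lines:
        yield INVALID_CHARS.sub("", line)


def read_clean_csv(path, encoding="utf-8"):
    """Read a gzipped CSV line by line, clean each line, then parse it as CSV."""
    # newline="" leaves line endings untouched so the csv module can handle them.
    with gzip.open(path, mode="rt", encoding=encoding,
                   errors="replace", newline="") as handle:
        # csv.reader accepts any iterable of strings, so the cleaning step
        # sits between the raw line reader and the CSV parser.
        yield from csv.reader(clean_lines(handle))
```

The point of the sketch is the ordering: the cleanup runs on whole lines before any CSV parsing happens, so it is done once per record rather than once per field, and no fixed copy of the file is written to disk.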
Perhaps I'm making this too hard and should just accept having to do the preprocessing using a shell command.
Thanks in advance,