Hadoop Reader - Reading log file (JSON format)

Hello,

I have a problem with reading a log file from HDFS. I have successfully connected to HDFS, but while reading with the Hadoop reader I get the error "This is not a sequential file".

My log file is in JSON format and contains the following data:
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1113}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1112}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1114}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1167}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1116}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1100}
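For reference, each line above is one self-contained JSON object (a "JSON lines" log). A minimal Python sketch, outside of CloverETL, shows what a parsed record looks like; note that "timestamp" and "total" arrive as strings while "items" is a number, so the reader's metadata should convert them accordingly:

```python
import json

# One record from the log above. Note: the curly quotes that appear in a
# copy-pasted forum post are not valid JSON; the file itself must use
# straight double quotes.
line = '{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1113}'

record = json.loads(line)

# "timestamp" and "total" are JSON strings and need explicit conversion;
# "items" is already a JSON number.
ts = int(record["timestamp"])
total = float(record["total"])

print(record["shop_id"], ts, total, record["items"])
```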

How do I process this log file with the Hadoop reader? It would be great if someone could share a sample graph processing the same log file.

Thanks
Bhavin
Ypoint Analytics

Hi

The HadoopReader is designed to read sequence files; for more information, please refer to its documentation. If you want to read a JSON file, please use the JSONReader. Set the File URL in the JSONReader like this: 'hdfs://HADOOP1/', where HADOOP1 is the Hadoop connection ID, which in general is the uppercase form of the connection name you entered when creating it. Just to be sure, you can open the edit dialog of the HadoopReader and click the downward arrow on the Hadoop connection line; you should be able to see the connection ID there.

If the solution above doesn't work, please send me the following information:

1. Is this part of a CloverETL Server project? If so, what version of CloverETL Server is it?
2. What version of the Designer are you using?
3. What version of the Hadoop server is used?
4. If you now get a different error message, please send me a screenshot and/or the log where the error is visible.

Thank you.

You may also use JSONExtract if you are processing large (>>100 kB) files. JSONReader uses XPath queries to extract data, so it needs to build a DOM representation of the whole document in memory first. JSONExtract uses SAX-style (event-based) parsing, which allows stream-based processing.
Which one to use depends on your particular case. Both (in fact, all CloverETL readers) can be set to read data directly from HDFS, as described above.
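To illustrate the DOM-versus-streaming trade-off outside of CloverETL, here is a minimal Python sketch using only the standard library (the in-memory file and field names mirror the sample log above and are illustrative, not part of either component's API). The DOM-style approach materialises every record before processing; the stream-style approach handles one record at a time in constant memory, which is what makes JSONExtract preferable for very large files:

```python
import io
import json

# Stand-in for the log file on HDFS (JSON lines format).
log = io.StringIO(
    '{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1113}\n'
    '{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1112}\n'
)

# DOM-style (analogous to JSONReader): load everything into memory first.
records = [json.loads(line) for line in log if line.strip()]

# Stream-style (analogous to JSONExtract): yield records one at a time.
def stream_records(fh):
    for line in fh:
        if line.strip():
            yield json.loads(line)

log.seek(0)
total_items = sum(r["items"] for r in stream_records(log))
print(len(records), total_items)  # 2 records, 2225 items in total
```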