Hadoop Reader - Reading log file (JSON format)

Hello,

I have a problem with reading a log file from HDFS. I have successfully connected to HDFS, but while reading with the Hadoop reader I get the error "This is not a sequential file".

My log file is in JSON format and contains the following data:
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1113}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1112}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1114}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1167}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1116}
{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1100}
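For reference, each line above is one self-contained JSON object (a "JSON lines" log). A minimal Python sketch, outside of CloverETL, shows what a parsed record looks like; note that "timestamp" and "total" arrive as strings while "items" is a number, so the reader's metadata should convert them accordingly:

```python
import json

# One record from the log above. Note: the curly quotes that appear in a
# copy-pasted forum post are not valid JSON; the file itself must use
# straight double quotes.
line = '{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1113}'

record = json.loads(line)

# "timestamp" and "total" are JSON strings and need explicit conversion;
# "items" is already a JSON number.
ts = int(record["timestamp"])
total = float(record["total"])

print(record["shop_id"], ts, total, record["items"])
```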

How do I process this log file with the Hadoop reader? It would be great if someone could share a sample graph processing the same log file.

Thanks
Bhavin
Ypoint Analytics

Hi

The HadoopReader is designed to read sequence files; for more information, please refer to its documentation. If you want to read a JSON file, please use the JSONReader. Set the File URL in the JSONReader like this: 'hdfs://HADOOP1/', where HADOOP1 is the Hadoop connection ID, which in general is the uppercase form of the connection name you entered when creating it. Just to be sure, you can open the edit dialog of the HadoopReader and click the downward arrow on the Hadoop connection line; you should be able to see the connection ID there.

If the solution above doesn't work, please send me the following information:

1. Is this part of a CloverETL Server project? If so, what version of CloverETL Server is it?
2. What version of the Designer are you using?
3. What version of the Hadoop server is used?
4. If you now get a different error message, please send me a screenshot and/or the log where the error is visible.

Thank you.

You may also use JSONExtract if you are processing large (>>100 kB) files. JSONReader uses XPath queries to extract data, so it needs to build a DOM representation of the whole document in memory first. JSONExtract uses SAX-style (event-based) parsing, which allows stream-based processing.
Which one to use depends on your particular case. Both (in fact, all CloverETL readers) can be set to read data directly from HDFS, as described above.
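To illustrate the DOM-versus-streaming trade-off outside of CloverETL, here is a minimal Python sketch using only the standard library (the in-memory file and field names mirror the sample log above and are illustrative, not part of either component's API). The DOM-style approach materialises every record before processing; the stream-style approach handles one record at a time in constant memory, which is what makes JSONExtract preferable for very large files:

```python
import io
import json

# Stand-in for the log file on HDFS (JSON lines format).
log = io.StringIO(
    '{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1113}\n'
    '{"timestamp":"1445938229176","total":"340.0","shop_id":"6a7619eb30e81be","items":1112}\n'
)

# DOM-style (analogous to JSONReader): load everything into memory first.
records = [json.loads(line) for line in log if line.strip()]

# Stream-style (analogous to JSONExtract): yield records one at a time.
def stream_records(fh):
    for line in fh:
        if line.strip():
            yield json.loads(line)

log.seek(0)
total_items = sum(r["items"] for r in stream_records(log))
print(len(records), total_items)  # 2 records, 2225 items in total
```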