JSONReader vs. JSONExtract mappings

Hello,

I just installed the CloverETL Designer (Version 4.0.0.030M2) demo and am testing these components with a file containing numerous JSON objects.

JSONReader
I’ve configured a JSONReader component for Implicit Mapping and have run into two problems:

  1. Even after setting my heap size to -Xmx1500m, I get a Java heap error if I try to read a file of 50,000 objects (25,000 objects works OK). Is there a way to get past this heap issue?
  2. When an element's value is an array, only the first value is returned to the output record. How do I configure this to return all the array values?

JSONExtract
I’ve configured the Mapping as:
<![CDATA[



/Mapping>


]]>

  1. No heap problems
  2. Each input row produces multiple output rows, one row per value. For example, {"group_id":"1","city":"Paris","city_codes":["FR","MZ"]} produces 4 output rows: the first with just the group_id populated, the second with only the city populated, and so on. Can I get these to appear in a single row w/o using additional components (e.g. Combine)?

Thanks!

Hello,

Let me answer step by step:

JSONReader

  1. This is unfortunately expected. JSONReader builds a DOM tree to parse the input, and this tree can grow quite fast with 50,000 records, especially if the data has any complex structure. It is therefore recommended to use JSONExtract wherever possible.
  2. All array values can be returned, for example, with a mapping like this (you need two output ports):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Context xpath="/root/object" outPort="0">
  <Mapping cloverField="group_id" xpath="group_x005fid"/>
  <Mapping cloverField="city" xpath="city"/>
  <Context xpath="city_x005fcodes" outPort="1">
    <Mapping cloverField="city_codes" xpath="."/>
  </Context>
</Context>
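
To make the effect of the two-port mapping concrete, here is a small Python sketch (not CloverETL code) of what I would expect it to produce for the sample record from the question. The function name `split_ports` is made up for illustration, and the behaviour is my reading of the mapping: the outer Context emits one record per object on port 0, and the nested Context emits one record per array value on port 1.

```python
import json


def split_ports(obj):
    """Mimic the mapping: parent fields go to port 0, one record per array value to port 1.

    Hypothetical helper for illustration only; this is not a CloverETL API.
    """
    # Outer Context: one record per JSON object, holding the scalar fields.
    port0 = [{"group_id": obj["group_id"], "city": obj["city"]}]
    # Nested Context: one record per element of the city_codes array.
    port1 = [{"city_codes": code} for code in obj.get("city_codes", [])]
    return port0, port1


sample = json.loads('{"group_id": "1", "city": "Paris", "city_codes": ["FR", "MZ"]}')
port0, port1 = split_ports(sample)
print(port0)
print(port1)
```

Note that in this sketch the port 1 records carry only the array value, because the nested Context above maps only `city_codes`.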

JSONExtract

  1. This is because JSONExtract uses a SAX (streaming) parser instead of DOM, so its memory requirements are much lower.
  2. I don't think this is possible, or even desirable. An array can contain any number of items, so you cannot prepare the metadata in advance. You could read all of them into one string, but this is better done afterwards with, for example, a Denormalizer, where you can define delimiters, quotation marks and so on for the values. There is, however, a plan to support direct extraction of arrays and maps: they should be extracted into a list or map value in a single field (“Container type” property set to “List” or “Map” on the metadata field). For more details, see https://bug.javlin.eu/browse/CLO-2054
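
As a rough illustration of what a denormalizing step does with the sparse rows described in the question, here is a plain Python sketch (again, not CloverETL code). It assumes the rows arrive in the order reported, with the group_id row first in each group; the function name `denormalize` and the `;` delimiter are arbitrary choices for the example.

```python
def denormalize(rows, delimiter=";"):
    """Fold sparse single-field rows into one record per group.

    Assumption: a row with group_id populated marks the start of a new group.
    """
    records = []
    current = None
    for row in rows:
        if row.get("group_id") is not None:
            # Start a new output record whenever a group_id row appears.
            current = {"group_id": row["group_id"], "city": None, "city_codes": []}
            records.append(current)
        if current is None:
            continue  # ignore rows that appear before the first group_id
        if row.get("city") is not None:
            current["city"] = row["city"]
        if row.get("city_codes") is not None:
            current["city_codes"].append(row["city_codes"])
    # Join the collected array values into a single delimited string field.
    for rec in records:
        rec["city_codes"] = delimiter.join(rec["city_codes"])
    return records


rows = [
    {"group_id": "1"},
    {"city": "Paris"},
    {"city_codes": "FR"},
    {"city_codes": "MZ"},
]
result = denormalize(rows)
print(result)
```

Joining into one string sidesteps the metadata problem mentioned above, since the output always has the same three fields no matter how many array items there are.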

I hope this helps.