How to use Json Extractor when records don't have all the same attributes

I have a json file that has 9 possible combinations of columns (based on web site actions). I’m trying to use Json Extractor to load this file into a sql database. I was able to get it working so that it reads all the records based on the generated xsd schema, but that is based on the first row in the file which doesn’t contain all the possible combinations. It is processing all the rows correctly for this smaller set, and I’ve gotten it also to map subtrees of the json data. But there are other pieces of data that are falling off.

I looked at the generated xsd schema and it has 61 rows total. I tried to edit the xsd to contain all the columns I wanted (has 115 rows now), but when I try to change the JsonExtractor to use this xsd, I get an error message in Clover: Error occured during schema reading. Reason: com.cloveretl.gui.dT:Error while initializing tree.
Clicking on details gives: com.cloveretl.gui.dT: Error while initializing tree
Caused by: org.apache.xmlbeans.XmlException: sandbox://Guruse/meta/stats14_json_serena.xsd:0: error: cvc-complex-type.2.4a: Expected elements ‘attribute@http://www.w3.org/2001/XMLSchema attributeGroup@http://www.w3.org/2001/XMLSchema anyAttribute@http://www.w3.org/2001/XMLSchema’ instead of ‘complexType@http://www.w3.org/2001/XMLSchema’ here in element complexType@http://www.w3.org/2001/XMLSchema

And then when I click on OK it brings up the same blank window where I need to generate tree structure before I can get started.

What can I do? How can I get Clover to recognize that each row in the file has the potential to need more or less mappings?

Hi,

I’m afraid that I don’t fully understand what you want to achieve. I understand that the JSON files may have different structure. If the files contain some arrays that may have different number of fields, the solution will be quite easy. You just need to set the corresponding metadata container type to list and map it as an usual element (in the JSONExtract mapping property). For more detailed information see our blog, please.

If the files have different count and names of elements, they cannot be parsed by a single JSONExtract. You will need to use multiple JSONExtract components, one for each file structure. You need to create a deciding structure capable of sending the file to the correct JSONExtract. You can do this by:

1. Subgraph. Put each JSONExtract component into separated subgraph and create the deciding structure in the parent graph. (http://www.cloveretl.com/blog/improving-subgraphs-cloveretl-4-1/)
2. Jobflow. Put the JSONExtract components into single graph and make them “enabled with condition” (via context menu) by a graph parameter. Create the deciding structure in the jobflow and pass the parameter to the graph.

If you want me to create a sample graph for you, send me the following, please.

1. Some example input JSON files (2 or 3), so I can see what exactly is different.
2. Your graph with its externalized dependencies (metadata, etc.), so I can see how you proceeded. Feel free to remove any sensitive information.
3. The edited *.xsd file (with 115 rows), so I can see what went wrong.

Best regards,

I have similar issue, I think.

Here is my sample data.

{
“total_results_count”: 2,
“code”: “OK”,
“offset”: 0,
“results_count”: 2,
“results”: [
{
“sem3_id”: “2IFH5VS98y0gG22kIQi0i2”,
“name”: “Sea Gull Lighting 8753-34 Single-Light Outdoor Wall Lantern with Smooth White Globe, Black”,
“model”: “8753 34”,
“brand”: “Sea Gull Lighting”,
“category”: “Porch & Patio Lights”,
“color”: “Smooth White”,
“created_at”: 1347510294,
“ean”: “0785652875335”,
“features”: {
“Recommended Light Bulb Shape”: “A19”,
“Number of Bulbs Required”: “1”,
“Bulbs Included”: “No”,
“Manufacturer Color/Finish”: “Black”,
“Style”: “Casual/Transitional”,
“Weather Resistant/Weatherproof”: “Weatherproof”,
“Glass Color”: “White”,
“Damp Rated”: “Yes”,
“Hardware Included”: “Yes”,
“Glass Style”: “Flat”,
“Material”: “Polycarbonate”,
“Dark Sky”: “No”,
“Package Quantity”: “1”,
“Bulb Type”: “Incandescent”,
“Fixture Height (Inches)”: “7.25”,
“Light Bulb Base Type”: “Medium base (E-26)”,
“UL Safety Listing”: “Yes”,
“Collection Name”: “N/A”,
“ENERGY STAR Qualified”: “No”,
“Maximum Bulb Wattage”: “60”,
“Color/Finish Family”: “Black”,
“CSA Safety Listing”: “Yes”,
“Fixture Width (Inches)”: “6”,
“Power Source”: “Hardwired”,
“Motion Activated”: “No”,
“Fixture Depth (Inches)”: “7”,
“Size Classification”: “Small (width of product 6-in below)”,
“ETL Safety Listing”: “No”
},
“gtins”: [
“00785652875335”
],
“geo”: [
“usa”
],
“height”: “184.15”,
“images_total”: 0,
“length”: “177.80”,
“manufacturer”: “Sea Gull Lighting”,
“mpn”: “8753-34”,
“upc”: “785652875335”,
“updated_at”: 1466435233,
“width”: “152.40”
},
{
“sem3_id”: “0HukkZv3LTEGuMg6UQSaGY”,
“name”: “Sea Gull Lighting 8753-34 One Light Outdoor Wall Fixture - Black Finish”,
“brand”: “Sea Gull Lighting”,
“category”: “Outdoor Lighting”,
“created_at”: 1407272933,
“features”: {
“Origin of Components”: “USA and/or Imported”,
“Shipping Weight (in pounds)”: “2.18”
},
“geo”: [
“usa”
],
“images_total”: 0,
“manufacturer”: “Sea Gull Lighting”,
“updated_at”: 1464091362
}
]
}

He is my jsonExtractor source mapping. In Bold, is the mapping I manually created, since I could not get the visual mapper to recognize the “features” data. I am trying to transform the features into a set of name/value pairs.

For example,

name:“Recommended Light Bulb Shape”/value:A19

Is this possible?

_****_ _****_

Hi mcomsto,

You can parse the ‘features’ by a component called Normalizer and dynamic field access CTL functions. I attached a simple example graph which utilizes this approach.

You mentioned that you cannot see the ‘features’ in the visual mapper of the JSONExtract component. I just copied your sample data into a file (with no editing) and was able to map them (see them in the mapping editor). I don’t know what version of CloverETL you are using but you can try the newest one (4.2.0). You should be able to map everything via the mapping editor there.

Also, the example graph should be viewed in the 4.2.0 Designer as it was created in that version. It uses sample data from your post.

json_read.grf
Best regards,