XMLExtract: Reading elements and attributes with the same name

Hi,

I can't figure out how to define an XMLExtract mapping that reads elements and attributes which have the same name but sit under different parents in the hierarchy. Below is an empty XML instance for reference.

<us:SyncItemMaster xmlns:us="" xmlns:oa="" releaseID=""
	systemEnvironmentCode="" languageCode="">
	<oa:ApplicationArea>
		<oa:Sender>
			<oa:LogicalID></oa:LogicalID>
			<oa:ComponentID></oa:ComponentID>
			<oa:TaskID></oa:TaskID>
			<oa:ReferenceID></oa:ReferenceID>
		</oa:Sender>
		<oa:CreationDateTime></oa:CreationDateTime>
	</oa:ApplicationArea>
	<us:DataArea>
		<oa:Sync></oa:Sync>
		<us:ItemMaster>
			<us:ItemMasterHeader>
				<oa:ItemID agencyRole="">
					<oa:ID></oa:ID>
				</oa:ItemID>
				<oa:ManufacturerItemID>
					<oa:ID></oa:ID>
				</oa:ManufacturerItemID>
				<us:Classification type="">
					<us:Codes>
						<oa:Code sequence="" name=""></oa:Code>
					</us:Codes>
					<oa:Note status=""></oa:Note>
					<us:SkuGroupImage></us:SkuGroupImage>
					<us:SkuGroupAlternateImage></us:SkuGroupAlternateImage>
					<us:SkuGroupIcons></us:SkuGroupIcons>
				</us:Classification>
				<oa:Specification>
					<oa:Property sequence="">
						<oa:NameValue name=""></oa:NameValue>
					</oa:Property>
				</oa:Specification>
				<oa:Description type=""></oa:Description>
				<us:Packaging>
					<oa:PerPackageQuantity></oa:PerPackageQuantity>
				</us:Packaging>
				<oa:ItemStatus></oa:ItemStatus>
				<oa:DrawingAttachment>
					<oa:FileName></oa:FileName>
				</oa:DrawingAttachment>
				<oa:Attachment></oa:Attachment>
				<us:FreightClassification>
					<oa:Codes></oa:Codes>
				</us:FreightClassification>
				<us:Keywords></us:Keywords>
				<us:BrandId></us:BrandId>
			</us:ItemMasterHeader>
			<us:ItemLocation>
				<us:Dimensions>
					<us:Weight unitCode=""></us:Weight>
				</us:Dimensions>
				<us:UnitPackaging></us:UnitPackaging>
			</us:ItemLocation>
			<us:GlobalItem>
				<us:InventoryUnitCode></us:InventoryUnitCode>
				<us:ItemWeight unitCode=""></us:ItemWeight>
				<us:GTINItem></us:GTINItem>
				<us:GTINCarton></us:GTINCarton>
				<us:GTINBox></us:GTINBox>
				<us:GTINPallet></us:GTINPallet>
				<us:UPCRetail></us:UPCRetail>
				<us:UPCCarton></us:UPCCarton>
				<us:CountryOriginCode></us:CountryOriginCode>
			</us:GlobalItem>
			<us:ReferenceItem>
				<us:UsableCode></us:UsableCode>
				<us:ReferToVendorPrefix></us:ReferToVendorPrefix>
				<us:ReferToStockNumber></us:ReferToStockNumber>
				<us:ReferToUnbuttedNumber></us:ReferToUnbuttedNumber>
				<us:ChangeEffectiveDate></us:ChangeEffectiveDate>
			</us:ReferenceItem>
			<us:ConsumerPricing>
				<us:PricePlan>
					<us:PricePlanId></us:PricePlanId>
					<us:Column1Amount currencyID=""></us:Column1Amount>
					<us:Column2Amount currencyID=""></us:Column2Amount>
					<us:Column3Amount currencyID=""></us:Column3Amount>
					<us:Column4Amount currencyID=""></us:Column4Amount>
				</us:PricePlan>
			</us:ConsumerPricing>
			<us:ItemList>
				<us:CountryCode></us:CountryCode>
				<us:ListStartDate></us:ListStartDate>
				<us:ListEndDate></us:ListEndDate>
				<us:FacilityNumber></us:FacilityNumber>
				<us:VendorNumber></us:VendorNumber>
				<us:VendorShortName></us:VendorShortName>
				<us:ListAmount currencyID=""></us:ListAmount>
				<us:ListUnitCode></us:ListUnitCode>
			</us:ItemList>
			<us:WarrantyInfo>
				<us:WarrantyIndicator></us:WarrantyIndicator>
			</us:WarrantyInfo>
		</us:ItemMaster>
	</us:DataArea>
</us:SyncItemMaster>

If we look at the XML above, elements such as “oa:ID” and attributes such as “name” occur more than once, but under different parents in the hierarchy.

Let's say one record is represented by the us:ItemMaster element, and I have to build a record from the XML above such that:

Clover field1 → oa:ID (under oa:ItemID)
Clover field2 → oa:ID (under oa:ManufacturerItemID)
Clover field3 → name (attribute of oa:Code)
Clover field4 → name (attribute of oa:NameValue)

The reason I cannot use other XML readers that support XPath is that the files to be processed are huge, on the order of 1 GB, and evaluating XPath over a DOM loaded into memory may lead to out-of-memory errors.
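To illustrate why a streaming reader is enough here: the parser only has to remember the chain of currently open elements to know which parent an oa:ID or a name attribute belongs to, so nothing like a full DOM is needed. The following Python sketch uses the standard library's xml.etree.ElementTree.iterparse as a stand-in for a streaming reader; it is not CloverETL code. The namespace URIs, sample values and function names are hypothetical, and the field1 to field4 keys simply mirror the Clover fields listed above.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the empty instance above declares empty ones.
US = "urn:example:us"
OA = "urn:example:oa"

SAMPLE = f"""<us:SyncItemMaster xmlns:us="{US}" xmlns:oa="{OA}">
  <us:DataArea>
    <us:ItemMaster>
      <us:ItemMasterHeader>
        <oa:ItemID agencyRole="x"><oa:ID>ITEM-1</oa:ID></oa:ItemID>
        <oa:ManufacturerItemID><oa:ID>MFR-9</oa:ID></oa:ManufacturerItemID>
        <us:Classification type="t">
          <us:Codes><oa:Code sequence="1" name="color">red</oa:Code></us:Codes>
        </us:Classification>
        <oa:Specification>
          <oa:Property sequence="1"><oa:NameValue name="size">L</oa:NameValue></oa:Property>
        </oa:Specification>
      </us:ItemMasterHeader>
    </us:ItemMaster>
  </us:DataArea>
</us:SyncItemMaster>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_items(source):
    """Yield one dict per ItemMaster, telling same-named nodes apart by their parent."""
    stack = []  # local names of the currently open elements
    record = {}
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local(elem.tag)
        if event == "start":
            parent = stack[-1] if stack else None
            # Attributes are already available on the start event.
            if name == "Code" and parent == "Codes":
                record["field3"] = elem.get("name")
            elif name == "NameValue" and parent == "Property":
                record["field4"] = elem.get("name")
            stack.append(name)
            continue
        stack.pop()
        parent = stack[-1] if stack else None
        # Element text is only guaranteed to be complete on the end event.
        if name == "ID" and parent == "ItemID":
            record["field1"] = elem.text
        elif name == "ID" and parent == "ManufacturerItemID":
            record["field2"] = elem.text
        elif name == "ItemMaster":
            yield record
            record = {}
            elem.clear()  # drop the finished ItemMaster's children and text


for rec in extract_items(io.BytesIO(SAMPLE.encode("utf-8"))):
    print(rec)
```

Memory stays roughly proportional to one us:ItemMaster subtree plus the element stack, which is the property that matters for 1 GB inputs.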

Please suggest what the mapping should look like to build a record as shown above. Thanks for your support; I look forward to a solution.

Regards
Pankaj

Hello,

The mapping will look like this:


<Mappings>
<Mapping element="oa:ItemID">
  <Mapping element="oa:ID" outPort="0" xmlFields="oa:ID" cloverFields="oaID" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="oa:ManufacturerItemID">
  <Mapping element="oa:ID" outPort="1" xmlFields="oa:ID" cloverFields="oaID" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="us:Codes">
  <Mapping element="oa:Code" outPort="2" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
<Mapping element="oa:Property">
  <Mapping element="oa:NameValue" outPort="3" sequenceField="count" sequenceID="Sequence0" />
</Mapping>
</Mappings>

XMLExtract will have four output ports.
Ports 0 and 1 use the following metadata:


<Metadata id="Metadata0" previewAttachmentCharset="ISO-8859-1">
<Record fieldDelimiter="|" name="recordName1" previewAttachmentCharset="ISO-8859-1" recordDelimiter="\r\n" type="delimited">
<Field name="count" type="integer"/>
<Field name="oaID" type="string"/>
</Record>
</Metadata>

Ports 2 and 3 use the following metadata:


<Metadata id="Metadata2">
<Record fieldDelimiter="|" name="recordName3" recordDelimiter="\r\n" type="delimited">
<Field name="count" type="integer"/>
<Field name="name" type="string"/>
</Record>
</Metadata>

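The four output edges are then joined by an ExtMergeJoin component, with the “count” field of each edge as the join key. ExtMergeJoin requires its inputs to be sorted on the join key; the sequence-generated counts are emitted in increasing order, so no extra sort should be needed.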

The transformation will be:


function transform() {
	// Left of := is the output record; right side reads input ports 0 to 3 of ExtMergeJoin.
	$0.field1 := $0.oaID;
	$0.field2 := $1.oaID;
	$0.field3 := $2.name;
	$0.field4 := $3.name;
}
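The join and transform steps can be modelled outside CloverETL as an inner merge join on count over four inputs that are already sorted by that key. This Python sketch is a hypothetical model, not ExtMergeJoin's implementation: it assumes each port emits exactly one record per us:ItemMaster and that the k-th record on every port carries the same count, which is what joining on count relies on. Duplicate keys are not handled. The record layouts follow the port metadata above (oaID, name), and the port contents are made-up sample data.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def merge_join(inputs: List[Iterable[Record]], key: str = "count") -> Iterator[List[Record]]:
    """Inner merge join over inputs sorted ascending by `key` (unique keys assumed)."""
    iters = [iter(i) for i in inputs]
    heads = [next(it, None) for it in iters]
    while all(h is not None for h in heads):
        top = max(h[key] for h in heads)
        if all(h[key] == top for h in heads):
            yield heads  # one matching record from every input
            heads = [next(it, None) for it in iters]
        else:
            # Advance only the inputs whose current key is behind the largest one.
            heads = [next(it, None) if h[key] < top else h for it, h in zip(iters, heads)]


def transform(rec: List[Record]) -> Record:
    """Python mirror of the CTL transform: rec[0]..rec[3] play the role of $0..$3."""
    return {
        "field1": rec[0]["oaID"],
        "field2": rec[1]["oaID"],
        "field3": rec[2]["name"],
        "field4": rec[3]["name"],
    }


# Hypothetical port contents for two us:ItemMaster records.
port0 = [{"count": 0, "oaID": "ITEM-1"}, {"count": 1, "oaID": "ITEM-2"}]
port1 = [{"count": 0, "oaID": "MFR-9"}, {"count": 1, "oaID": "MFR-8"}]
port2 = [{"count": 0, "name": "color"}, {"count": 1, "name": "finish"}]
port3 = [{"count": 0, "name": "size"}, {"count": 1, "name": "weight"}]

for joined in merge_join([port0, port1, port2, port3]):
    print(transform(joined))
```

Because the join is inner, an ItemMaster that lacks, say, an oa:Code produces no output row in this model; a different join type would be needed to keep such items.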

The output edge of ExtMergeJoin has the following metadata:


<Metadata id="Metadata1" previewAttachmentCharset="ISO-8859-1">
<Record fieldDelimiter="|" name="recordName2" previewAttachmentCharset="ISO-8859-1" recordDelimiter="\r\n" type="delimited">
<Field name="field1" type="string"/>
<Field name="field2" type="string"/>
<Field name="field3" type="string"/>
<Field name="field4" type="string"/>
</Record>
</Metadata>

The graph contains a sequence with the ID “Sequence0”, which fills the “count” fields.

Note that mapping elements like this:


<Mapping element="oa:ItemID">
 ...
</Mapping>

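serve to specify the parent whose child element is sent out. This nesting is what tells the oa:ID under oa:ItemID apart from the oa:ID under oa:ManufacturerItemID, and likewise the two “name” attributes.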
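The effect of the nesting can be sketched as a routing table keyed by (parent, child) element names. In this hypothetical Python model, each entry stands for one inner Mapping from the answer and sends the child's text or its name attribute to an output port. A per-port counter stands in for the count field; that is an assumption of the sketch, not a description of how Sequence0 assigns values. Only local names are compared, so the namespace URIs in the sample are placeholders.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical routing table modelled on the <Mappings> in this answer:
# (parent local name, child local name) -> (out port, clover field, source),
# where source is "text" or "@<attribute name>".
MAPPINGS = {
    ("ItemID", "ID"): (0, "oaID", "text"),
    ("ManufacturerItemID", "ID"): (1, "oaID", "text"),
    ("Codes", "Code"): (2, "name", "@name"),
    ("Property", "NameValue"): (3, "name", "@name"),
}

SAMPLE = b"""<us:DataArea xmlns:us="urn:example:us" xmlns:oa="urn:example:oa">
  <us:ItemMaster><us:ItemMasterHeader>
    <oa:ItemID><oa:ID>ITEM-1</oa:ID></oa:ItemID>
    <oa:ManufacturerItemID><oa:ID>MFR-9</oa:ID></oa:ManufacturerItemID>
    <us:Classification><us:Codes><oa:Code name="color"/></us:Codes></us:Classification>
    <oa:Specification><oa:Property><oa:NameValue name="size"/></oa:Property></oa:Specification>
  </us:ItemMasterHeader></us:ItemMaster>
  <us:ItemMaster><us:ItemMasterHeader>
    <oa:ItemID><oa:ID>ITEM-2</oa:ID></oa:ItemID>
    <oa:ManufacturerItemID><oa:ID>MFR-8</oa:ID></oa:ManufacturerItemID>
    <us:Classification><us:Codes><oa:Code name="finish"/></us:Codes></us:Classification>
    <oa:Specification><oa:Property><oa:NameValue name="weight"/></oa:Property></oa:Specification>
  </us:ItemMasterHeader></us:ItemMaster>
</us:DataArea>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def route(source):
    """Stream the document and return {port: [records]} with a per-port count."""
    ports = defaultdict(list)
    stack = []  # local names of the currently open elements
    for event, elem in ET.iterparse(source, events=("start", "end")):
        name = local(elem.tag)
        if event == "start":
            stack.append(name)
            continue
        stack.pop()
        parent = stack[-1] if stack else None
        rule = MAPPINGS.get((parent, name))
        if rule is not None:
            port, field, src = rule
            value = elem.text if src == "text" else elem.get(src[1:])
            out = ports[port]
            out.append({"count": len(out), field: value})
        elem.clear()  # release text, attributes and children of the finished element
    return dict(ports)


for port, records in sorted(route(io.BytesIO(SAMPLE)).items()):
    print(port, records)
```

The oa:Codes element under us:FreightClassification in the full instance would not match any entry, because its parent differs from us:Codes; the same parent-qualified lookup is what keeps the two oa:ID elements apart.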

Best regards,

Tomas Waller