Error while loading data in the Database

Hi,

I am facing problem that , i have two million data in a csv file . when i start loading some where in the file data is disturbed and that is broke in two ldifferent line…so my first questin is that how can i check that which line has this type of problem.

Second thisng is that when i enter that data (Broken dat in two lines) it gives error at those lines because data disturbed in different column of the db …so it gives dat trucation error in the db…

So plese help how can i handle tis problem in Clover…it is very critical for me…

Apart of this…some time i get run time error becase of the wrong data in file…so this breaks my programe…what i want that if there is this type of data problm then it load that error line in the report.csv and the rest of the data should e loade in th database,
Means error should be logged in the file and the rest of the data should be loaded in the database.

Error coming in the file…

  1. Line breaking
  2. End of file is not currect

Please help me how to handle this…

Thansk,
Hauman Mishra

Hello,
if you set proper data policy on the Reader (see dataPolicy), data , which can’t be read are skipped. If you want to control db loading set on DBOUtputTable maxErrors attribute on some positive value and connect output port to it (see DBOutputTable).

Can we log those records in a file or in DB(error table) , so that we can have report that these are the records having problem…otherwise it is not usefull…because we must have solid reason to show that this is the eroor.

Thaks,
Hanuman Mishra

Hi,
when you connect logging port to DataReader (DataReader) you can do with wrong record what you want eg. write them to flat file (DataWriter) or do database (DBOutputTable); the same with the DBOutputTable - if db error occurs wrong records (with error message) can be processed by any way. See Data policy example and DB Load example

I am getting error in the data that , is in the data…

dta is like this:-

“1473227”,“31580”,“16”,“”,“9”,“4”,“372-372”,“1962”,“Addendum to “Single-transit, large-radius E-type devicest””,“”,“Nunn, W.M., Jr.”,“IEEE”,“”,“”,“”

how to load in DB…
actually it is title is devided in two parts…

Addendum to “Single-transit
&
large-radius E-type devicest”

Now how can i handle thsi situation…please help me i am in trouble…i have relise of the project

Hello,
it is not possible to read such data directly. Work around can be Transformat component, which will prepare such data for data base: eg.

<?xml version="1.0" encoding="UTF-8"?>
<Graph id="1222247484704" name="test" revision="1.43">
<Global>
<Metadata id="Metadata0" previewAttachmentCharset="ISO-8859-1">
<Record fieldDelimiter="," name="data" previewAttachmentCharset="ISO-8859-1" recordDelimiter="\n" type="delimited">
<Field name="field1" type="string"/>
<Field name="field2" type="string"/>
<Field name="field3" type="string"/>
<Field name="field4" type="string"/>
<Field name="field5" type="string"/>
<Field name="field6" type="string"/>
<Field name="field7" type="string"/>
<Field name="field8" type="string"/>
<Field name="field9" type="string"/>
<Field name="field10" type="string"/>
<Field name="field11" type="string"/>
<Field name="field12" type="string"/>
<Field name="field13" type="string"/>
<Field name="field14" type="string"/>
<Field name="field15" type="string"/>
</Record>
</Metadata>
<Metadata id="Metadata1" previewAttachmentCharset="ISO-8859-1">
<Record fieldDelimiter="|" name="error" previewAttachmentCharset="ISO-8859-1" recordDelimiter="\n" type="delimited">
<Field name="rec" type="integer"/>
<Field name="field" type="integer"/>
<Field name="offending" type="string"/>
<Field name="message" type="string"/>
</Record>
</Metadata>
<Property fileURL="workspace.prm" id="GraphParameter0"/>
</Global>
<Phase number="0">
<Node dataPolicy="controlled" enabled="enabled" fileURL="${DATAIN_DIR}/data.txt" id="DATA_READER0" quotedStrings="true" type="DATA_READER"/>
<Node id="TRASH0" type="TRASH"/>
<Edge fromNode="DATA_READER0:0" id="Edge0" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (output)" toNode="TRASH0:0"/>
<Edge fromNode="DATA_READER0:1"  id="Edge1" inPort="Port 0 (in)" metadata="Metadata1" outPort="Port 1 (logs)" toNode="REFORMAT0:0"/>
</Phase>
<Phase number="1">
<Node id="REFORMAT0" type="REFORMAT">
<attr name="transform"><![CDATA[//#TL
	int listIndex =0;
	list l;

function createString(){
	string s = l[listIndex];
	boolean found = false;
	print_err(listIndex);
	if (isnull(s) or length(s) == 0) {
		return null; 
	}
	listIndex++;
	if (char_at(s,0).eq.'"') s=substring(s,1,length(s)-1);  
	if (char_at(s,length(s)-1).eq.'"') {
		found = true;
		s = substring(s,0,length(s)-1);
	}else{
		do {
			s = concat(s, l[listIndex]);
			listIndex++;
			if (char_at(s,length(s)-1).eq.'"') {
				found = true;
				s = substring(s,0,length(s)-1);
			}
		} while (!found);
	}
	return s;
}
	
// Transforms input record into output record.
function transform() {
	l=split($offending,',');
	print_err(l);
	$field1 := nvl(createString(),'');
	$field2 := nvl(createString(),'');
	$field3 := nvl(createString(),'');
	$field4 := nvl(createString(),'');
	$field5 := nvl(createString(),'');
	$field6 := nvl(createString(),'');
	$field7 := nvl(createString(),'');
	$field8 := nvl(createString(),'');
	$field9 := nvl(createString(),'');
	$field10 := nvl(createString(),'');
	$field11 := nvl(createString(),'');
	$field12 := nvl(createString(),'');
	$field13 := nvl(createString(),'');
	$field14 := nvl(createString(),'');
}


// Called during component initialization.
// function init() {}

// Called after the component finishes.
// function finished() {}
]]></attr>
</Node>
<Node debugPrint="true"  id="TRASH1" type="TRASH"/>
<Edge fromNode="REFORMAT0:0"  id="Edge2" inPort="Port 0 (in)" metadata="Metadata0" outPort="Port 0 (out)" toNode="TRASH1:0"/>
</Phase>
</Graph>

I am using java transform whic is using
public boolean transform(DataRecord source, DataRecord target){

method…

so please can you send me this transformer in Java.

How should i take list as you explained in your example…

Thanks,
hanuman

in java you can use string split method or org.jetel.util.string.StringUtils.split method.

My question is not that how to split data///// but my question is that how to transform

DataRecord source


So please help me…just write initial code that how to use DataRecord source in your programe…

Thanks,
Hanuman

import org.jetel.component.DataRecordTransform;
import org.jetel.data.DataRecord;
import org.jetel.exception.TransformException;
import org.jetel.util.string.StringUtils;


public class Transform extends DataRecordTransform {

	@Override
	public int transform(DataRecord[] inputRecords, DataRecord[] outputRecords)
			throws TransformException {
		String[] list = StringUtils.split(inputRecords[0].getField("offending").getValue().toString());
		for (int i = 0; i < outputRecords[0].getNumFields(); i++) {
			outputRecords[0].getField(i).setValue(createString());
		}
		return 0;
	}

	private String createString() {
		// TODO Auto-generated method stub
		return null;
	}

}

i am not able to jhande because in metadat my seperator in ,

and when i read by
for(int i=0;i1473227
1----->31580
1----->16
1----->9
1----->4
1----->372-372
1----->1962
1----->Addendum to ""Single-transi
1-----> large-radius E-type"" devicest"
1----->Nunn, W.M., Jr.
1----->IEEE
1----->“,”

How can i handle that please help…

Thanks,
Hanuman

Hi Please help me for the same which i discueed as above…

Can i use String str=“\”,\“” known as “,” as seperator…if i use simple java code like BufferReader and Writers the i use seperator as “,”, it works file…but my data is tool long 2 milllion so it is very slow while inserting in DB sand finding deduplicates…

So Please help me how can it be solved it, is road block for me , and it is going to screw my clover application…

i want to break my data at “,” sepeartor…

Please help help…

Thanks,
Hanuman mIshra

It should work. All you need then, is to strip the quote from the begin of first field and the end of last field.

TransformationGraph institutionGraph = new TransformationGraph(“ArticlesLoadGraph”);
DataRecordMetadata inputMetaData= feedController.MetadataInput;
inputMetaData.setFieldDelimiter(“\”,\“”);
inputMetaData.setRecordDelimiters(“\n”);

i am using ","for seperator and "\n"record seperator…

But clover is not reading like that…

input:-

“371”,“18”,“9”,“”,“33”,“1”,“104-109”,“1988”,“Extension of the optimality of the threshold policy in heterogeneous multiserver queueing systems”,“10.1109/9.371”,“Viniotis, I.”,“IEEE”,“”,“”,“”
“417”,“21”,“35”,“”,“26”,“1”,“8-15”,“1988”,“G.722: a new CCITT coding standard for digital transmission of wideband audio signals”,“10.1109/35.417”,“Mermelstein, P.”,“IEEE”,“”,“”,“”

out put
:-
10
10
3
,"
2
143-147
1988
Performance characteristics of a 1.5 μm single-frequency semiconductor laser with an external waveguide Braff reflector
10.1109/3.107
Olsson, N.A.
IEEE
,“,”
“9
13
3
,“24”,“5”,“766-774”,“1988”,“The azimuthal effective-index method”,“10.1109/3.192”,“Marcatili, E.A.J.”,“IEEE”,”“,”“,”"
:::::::::::::::::::::::::::::::::::
124
83
9
,"
5
511-512
1988
Comments on ‘Dynamic path planning for a mobile automation with limited information on the environment’ [with reply]
10.1109/9.1243
Lucas, C.
IEEE
,“,”
“40
92
10
,“35”,“5”,“400-402”,“1988”,“An electronic device for triggering a stimulator during membrane responsiveness determinations in cardiac tissues”,“10.1109/10.1401”,“Pruett, J.K.”,“IEEE”,”“,”“,”"
:::::::::::::::::::::::::::::::::::
154
99
29
,"
3
417-420
1988
On estimating the instantaneous frequency of a Gaussian random signal by use of the Wigner-Ville distribution
10.1109/29.1543
White, L.V.
IEEE
,“,”
“54
100
29
,“36”,“4”,“433-439”,“1988”,“Cepstral domain talker stress compensation for robust speech recognition”,“10.1109/29.1547”,“Chen, Y.”,“IEEE”,”“,”“,”"
:::::::::::::::::::::::::::::::::::
19
13
3
,"
5
780-786
1988
Influence of Ag+-Na+ ion-exchange equilibrium on waveguide index profiles
10.1109/3.194
Ramaswamy, R.V.
IEEE
,“,”
“3
14
3
,“24”,“6”,“1114-1117”,“1988”,“Mn2+ as a potential solid-state laser ion”,“10.1109/3.234”,“Clausen, R.”,“IEEE”,”“,”“,”"
:::::::::::::::::::::::::::::::::::
23
14
3
,"
6
1141-1150
1988
CW arc-lamp-pumped alexandrite lasers
10.1109/3.237
Samelson, H.
IEEE
,“,”
“3
16
4
,“23”,“3”,“875-877”,“1988”,“A MOS implementation of totally self-checking checker for the 1-out-of-3 code”,“10.1109/4.334”,“Tao, D.L.”,“IEEE”,”“,”“,”"
:::::::::::::::::::::::::::::::::::
8
9
3
,"
1
13-21
1988
Analysis of lateral-mode behavior of “win-stripe"lasers related to the negative slope in their current-light characteristics
10.1109/3.88
Watanabe, M.
IEEE
,”,"
"
9
3
,“24”,“1”,“83-93”,“1988”,“Wavelength-tunable electrooptic polarization conversion in birefringent waveguides”,“10.1109/3.97”,“Heismann, F.”,“IEEE”,“”,“”,“”

So please llok into this…can i have your contact no or so that i can talk you…i ahve dead line…it is very critical for me…

Thanks,
Hnauman MIshra

Do you use last Clover.ETL version and DataReader component (not DelimiterDataReader)?
My output for your data is:

|Record |field1     |field2   |field3   |field4   |field5   |field6   |field7    |field8   |field9                                                                                              |field10          |field11           |field12   |field13   |field14   |field15   |
+-------+-----------+---------+---------+---------+---------+---------+----------+---------+----------------------------------------------------------------------------------------------------+-----------------+------------------+----------+----------+----------+----------+
|# 1    |"1473227   |31580    |16       |         |9        |4        |372-372   |1962     |Addendum to "Single-transit, large-radius E-type devicest"                                          |                 |Nunn, W.M., Jr.   |IEEE      |          |          |"         |
|# 2    |"371       |18       |9        |         |33       |1        |104-109   |1988     |Extension of the optimality of the threshold policy in heterogeneous multiserver queueing systems   |10.1109/9.371    |Viniotis, I.      |IEEE      |          |          |"         |
|# 3    |"417       |21       |35       |         |26       |1        |8-15      |1988     |G.722: a new CCITT coding standard for digital transmission of wideband audio signals               |10.1109/35.417   |Mermelstein, P.   |IEEE      |          |          |"         |

So all is needed, is to remove quote on the beginning of 1st field and on the end of last field.

Yes,
You are right but i found one big difference that DeLimiter handles automatically line breaks but…in the DataReared it consider that line as incorrect data and skip that line …f i put Controlled policy…

Is it works like that or i am wrong some where…so please suggest me that how should i handle line breaks…

Thanks,
Hauman