Heya,
I’m trying to find out whether (1) I am misinterpreting the wiki, (2) what I want to do requires something else, or (3) it needs custom work. Say I have the following input files:
Person.txt
=======
PersonID,Last_Name
1,Weston
Phone.txt
=======
PersonID,phone_number
1,555-555-5555
1,777-777-7777
1,888-888-8888
Pets.txt
=======
PersonID,pet_type
1,cat
1,dog
What I am looking for is an output like:
PersonID,Last_Name,phone_number,pet_type
1,Weston,555-555-5555,cat
1,Weston,777-777-7777,dog
1,Weston,888-888-8888,
I have these files joined using ExtMergeJoin, where Person.txt is the master and Phone.txt/Pets.txt are slaves. The wiki says it traverses the slaves to match up the records, but I thought it would join like above (that is, using each slave record once until all the slaves have no more data, then moving to the next master record). Instead, I seem to be getting more of a SQL-like join, which results in something like:
1,Weston,555-555-5555,cat
1,Weston,555-555-5555,dog
1,Weston,777-777-7777,cat
1,Weston,777-777-7777,dog
1,Weston,888-888-8888,cat
1,Weston,888-888-8888,dog
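To make the difference concrete, here is a rough Python sketch of the two behaviours. This is plain Python, not ExtMergeJoin configuration; the function names positional_join and cross_join are made up for illustration, and I am assuming the slave records are already grouped by PersonID.

```python
from itertools import product, zip_longest

# Sample data from the files above, grouped by PersonID.
persons = {"1": "Weston"}
phones = {"1": ["555-555-5555", "777-777-7777", "888-888-8888"]}
pets = {"1": ["cat", "dog"]}


def positional_join(persons, phones, pets):
    """Pair the n-th phone with the n-th pet per person (the output I want)."""
    rows = []
    for pid, last_name in persons.items():
        # zip_longest pads the shorter slave with "" once it runs out of records.
        pairs = zip_longest(phones.get(pid, []), pets.get(pid, []), fillvalue="")
        for phone, pet in pairs:
            rows.append(f"{pid},{last_name},{phone},{pet}")
    return rows


def cross_join(persons, phones, pets):
    """Combine every phone with every pet per person (the output I am getting)."""
    rows = []
    for pid, last_name in persons.items():
        for phone, pet in product(phones.get(pid, []), pets.get(pid, [])):
            rows.append(f"{pid},{last_name},{phone},{pet}")
    return rows


if __name__ == "__main__":
    print("\n".join(positional_join(persons, phones, pets)))
    print("\n".join(cross_join(persons, phones, pets)))
```

The first function gives the three rows I am looking for, and the second gives the six rows shown above.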
Is this the correct behaviour for ExtMergeJoin? If so, is there another way I can produce the output I am looking for?
Thanks,
Anna