Hello,
We are working on a solution that requires us to join data by substring match. None of the built-in joiners seem to do this, so it looks like I'm going to need to write something. I understand CloverETL is open source. Does this mean that I can modify an existing joiner rather than write one from scratch?
Thanks,
- Steven
You are perfectly right. CloverETL, being LGPL, allows you to take an existing component (its code) and create a new one from it.
That said, I would be careful about how you implement the join if you want it to perform well. You will need some kind of pre-generated key to bring together groups of records that may potentially match, and then refine the pairing within each group using the actual substring match.
You had better start with something like ApproximativeJoin, which does exactly this, using edit distance for the fine pairing.
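To illustrate the two-stage idea (coarse key first, substring check second), here is a minimal sketch in plain Java. It does not use any CloverETL API. The class name, the method names and the choice of 3-character q-grams as the grouping key are all my own assumptions for the example, not something taken from ApproximativeJoin.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not a CloverETL component.
class SubstringJoinSketch {

    // Assumed q-gram length used as the coarse grouping key.
    static final int Q = 3;

    /** Maps each q-gram to the positions of the haystacks that contain it. */
    static Map<String, Set<Integer>> buildIndex(List<String> haystacks) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int pos = 0; pos < haystacks.size(); pos++) {
            String h = haystacks.get(pos);
            for (int i = 0; i + Q <= h.length(); i++) {
                index.computeIfAbsent(h.substring(i, i + Q), k -> new LinkedHashSet<>())
                     .add(pos);
            }
        }
        return index;
    }

    /** Returns {needle, haystack} pairs where needle is a substring of haystack. */
    static List<String[]> join(List<String> needles, List<String> haystacks) {
        Map<String, Set<Integer>> index = buildIndex(haystacks);
        List<String[]> pairs = new ArrayList<>();
        for (String n : needles) {
            if (n.length() < Q) {
                // Too short to form a key: fall back to a full scan.
                for (String h : haystacks) {
                    if (h.contains(n)) pairs.add(new String[] {n, h});
                }
                continue;
            }
            // Stage 1: any haystack containing the needle must also contain
            // the needle's first q-gram, so only those are candidates.
            Set<Integer> candidates =
                index.getOrDefault(n.substring(0, Q), Collections.<Integer>emptySet());
            // Stage 2: refine the candidates with the real substring test.
            for (int pos : candidates) {
                String h = haystacks.get(pos);
                if (h.contains(n)) pairs.add(new String[] {n, h});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String> needles = List.of("ACME", "XY");
        List<String> haystacks =
            List.of("ACME Corp Ltd", "Globex", "The ACME Group", "XYZ Inc");
        for (String[] p : join(needles, haystacks)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

The index keeps the expensive `contains` check away from records that cannot possibly match. Whether q-grams are the right key depends on your data: very common q-grams produce large candidate groups, and keys shorter than Q still force a full scan.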
What exactly is the use case for which you need this functionality? It is a bit uncommon in the ETL arena.