How to do dynamic data profiling

Hi

I am trying to build a more general-purpose data profiling graph that gets the Sum, Count, Distinct Count, Null, Not Null, Min Value, Max Value, Min Length, and Max Length for each field that is present in the metadata. We have managed to dynamically generate the metadata for the input files, but can’t find a way to dynamically produce the mapping in the aggregator component or the profiling probe.

Do you have any suggestions on how we can achieve that?

Thank you

Hi there,

The Aggregate component does not create metadata dynamically based on the function used for a given field. Unfortunately, you have to define the metadata before using the component, so it is not well suited to a general-purpose graph.

Based on your question, however, I believe you might appreciate what the CloverDX Profiler has to offer.

First of all, let me show you the so-called Profiler Job:
1. In the Designer, right-click your project and select New → Profiler Job.
2. A wizard then guides you through naming your Profiler Job and saving it to the “profile” folder within the project (click Next to continue).
3. Select the file you want to review in more detail.
4. One part of the wizard also helps you create metadata by extracting it from the file (click “Extract from file” in the Metadata screen).
Profiler_metadata.PNG
5. The most important step is called “Metrics selection”, where you can select what needs to be calculated for each field. You can also use “Select/Deselect all” for each field. The available options depend on the type of the original field.
6. After you click “Finish”, the profiler job runs automatically and you can review the results in the resulting report.
profiler_results.PNG

Now that you know the Profiler, let me point out a component called ProfilerProbe. You can use this component to add the Profiler logic to a graph. Its advantage is that you can use the second output port to view your results more easily. The component auto-propagates the metadata based on the metrics selected. You just need to set the Output mapping attribute. If you define it as:

$out.1.* = $in.1.*;

it allows you to use the graph more generally. Please try out the functionality on your own and let me know your thoughts and concerns; I will be happy to provide more details if needed.
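
In case it helps to see the general idea outside of CloverDX, here is a minimal Python sketch of metadata-driven profiling: the same set of metrics is computed for every field, and only the list of field names drives the loop. This is not CloverDX code and not how ProfilerProbe is implemented; the function name `profile`, the metric names, and the sample records are all hypothetical, and the sketch assumes each field holds values of a single comparable type.

```python
from numbers import Number


def profile(records, field_names):
    """Compute the same metrics for every field, driven only by the field list.

    Assumes the non-null values of a field are mutually comparable
    (for example all strings or all numbers).
    """
    report = {}
    for name in field_names:
        values = [rec.get(name) for rec in records]
        present = [v for v in values if v is not None]
        # Sum only applies to numeric fields; bool is excluded on purpose.
        numeric = [v for v in present
                   if isinstance(v, Number) and not isinstance(v, bool)]
        # Length is measured on the string form of each value.
        lengths = [len(str(v)) for v in present]
        report[name] = {
            "count": len(values),
            "null": len(values) - len(present),
            "not_null": len(present),
            "distinct": len(set(present)),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
            "sum": sum(numeric) if numeric else None,
            "min_length": min(lengths) if lengths else None,
            "max_length": max(lengths) if lengths else None,
        }
    return report


# Hypothetical sample data; in practice the field list would come
# from the dynamically generated metadata.
records = [
    {"id": 1, "name": "Alice", "amount": 10.5},
    {"id": 2, "name": None, "amount": 4.5},
    {"id": 3, "name": "Bob", "amount": None},
]
field_names = ["id", "name", "amount"]
report = profile(records, field_names)

for field, metrics in report.items():
    print(field, metrics)
```

Because nothing in the loop refers to a specific field, adding or removing a field in the metadata changes the report without any change to the mapping logic, which is the same effect the wildcard output mapping is meant to give you in the graph.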

Best Regards,
Eva