Has anyone thought of or tried to develop a data profiling application? I want to be able to create data statistics and histograms for an incoming data flow.
Hello,
The attached graph creates basic statistics for any data file. All you need to do is set the input_file and metadata parameters. The first Normalizer extracts basic information about each data item: the field name, the data type, and either the value itself or, for string or byte data, the length of the value. Rollup then computes basic statistics (minimum, maximum, and average value or length) for each field. The second Normalizer only reformats the statistics to make them more user-friendly.
David.
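For readers without the attached graph, the normalize-then-rollup idea from the reply above can be sketched in plain Python. This is only an illustration, not the CloverETL graph itself: the function names `normalize`, `rollup`, and `profile` are hypothetical, records are assumed to be dictionaries with no missing values, and non-string fields are assumed to be numeric.

```python
def normalize(record):
    """Emit one (field_name, type_name, measure) row per field.

    For string or byte data the measure is the length of the value;
    for everything else it is the value itself (assumed numeric).
    """
    for name, value in record.items():
        if isinstance(value, (str, bytes)):
            yield name, type(value).__name__, len(value)
        else:
            yield name, type(value).__name__, value


def rollup(rows):
    """Aggregate the normalized rows into min, max, and average per field."""
    stats = {}
    for name, type_name, measure in rows:
        entry = stats.get(name)
        if entry is None:
            # First row seen for this field: initialize the accumulators.
            stats[name] = {
                "type": type_name,
                "min": measure,
                "max": measure,
                "sum": measure,
                "count": 1,
            }
        else:
            entry["min"] = min(entry["min"], measure)
            entry["max"] = max(entry["max"], measure)
            entry["sum"] += measure
            entry["count"] += 1
    # Derive the average once all rows have been consumed.
    for entry in stats.values():
        entry["avg"] = entry["sum"] / entry["count"]
    return stats


def profile(records):
    """Run the whole pipeline: normalize every record, then roll up."""
    return rollup(row for record in records for row in normalize(record))
```

For example, `profile([{"name": "Ann", "age": 30}, {"name": "Robert", "age": 40}])` reports a minimum length of 3 and a maximum length of 6 for `name`, and an average of 35.0 for `age`.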
There is now a new member of the Clover family, CloverETL Profiler, which allows profiling of data from various sources.