NLTK with Clover

Hello,

I would like to use the NLTK (Python) package within a larger graph that serves a completely different purpose (basically, I need a few tokenizers, FreqDists and classifiers to clean up the data to a degree before it goes into a processing cycle). I have read a blog post on how to incorporate Python in general, but it seemed to involve a lot of third-party elements (Jython, an additional Eclipse IDE for development, etc.) that I am not familiar with, and I don't want to invest a ton of time learning new pieces if I can help it. Since this is still an ad hoc process, I could ultimately write the data to a flat file, turn to Python for the classification bits, and continue from there with CloverDX. However, that feels a bit clunky, and you might be able to point me to a better way.

So my questions really are:

  • Does CloverDX [Designer] have any built-in capabilities similar to NLTK? Here I am mostly talking about predictive classifiers that one can train. Maybe I can do this all in Clover?

  • If not, what's the easiest way to incorporate NLTK? I also use a custom corpus for this data set, so ideally I would like to minimise the number of resources and adjustments I need to make. I am happy to write the script on the Python side once and migrate it across to Clover if that limits the setup steps.

  • Has anyone done this (essentially supplemented CloverDX with Python) and packaged it to a degree where it can be set up on another laptop without a ton of tinkering?

Thank you for your help.
uykusuZzZz

Hi Uykusuzzzz,
let me answer your questions below:

  • I am not familiar with NLTK and its usage but generally speaking, CloverDX does not possess machine learning capabilities.

  • Generally, the easiest, most convenient, and least error-prone way of using Python in CloverDX data transformations is to call Python scripts using the ExecuteScript or SystemExecute components. This way, the scripts are called headlessly via the command line and can be part of a larger data transformation and job cascade.

  • The approach described in the blog post that you mentioned (written 5 years ago) did not prove to be a very good option over time (mostly for the reasons you expressed yourself). Calling Python scripts with the two aforementioned components is the recommended approach and, as a matter of fact, has been implemented by some CloverDX users in the past.
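To make the command-line approach a bit more concrete, here is a minimal sketch of the kind of standalone Python script such a component could call. It is only an illustration under assumptions: the file layout (CSV with a header row), the `text` column, the added `token_count` and `top_token` columns, and the script name are all hypothetical, and the regex tokenizer is a stand-in so the sketch runs without extra packages. In a real setup, `tokenize` is where an NLTK call such as `nltk.tokenize.word_tokenize` would go, and a trained classifier could be applied in the same loop.

```python
import csv
import re
import sys
from collections import Counter

# Placeholder pattern: runs of letters, digits and apostrophes count as tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(text):
    """Stand-in tokenizer (assumption); an NLTK tokenizer could replace it."""
    return [token.lower() for token in TOKEN_RE.findall(text)]


def process(reader, writer, text_field="text"):
    """Write each record with two extra fields and return overall token counts."""
    freq = Counter()
    for row in reader:
        tokens = tokenize(row.get(text_field) or "")
        freq.update(tokens)
        row["token_count"] = len(tokens)
        # Most frequent token in this record, or empty if there is no text.
        row["top_token"] = Counter(tokens).most_common(1)[0][0] if tokens else ""
        writer.writerow(row)
    return freq


def run(input_path, output_path):
    """Read a CSV written by the graph and write an enriched CSV back."""
    with open(input_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        fields = list(reader.fieldnames or []) + ["token_count", "top_token"]
        writer = csv.DictWriter(dst, fieldnames=fields)
        writer.writeheader()
        process(reader, writer)


# Only runs when both paths are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    run(sys.argv[1], sys.argv[2])
```

The graph would write its records to a flat file, call something like `python clean_text.py input.csv output.csv` (hypothetical names), and read the output file in the next step. Keeping the script and the custom corpus together in the project folder should also make it easier to move the whole thing to another machine, provided Python and NLTK are installed there.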

Regards,

Oh thank you bartonv. I am glad there is an easier way. Cheers.

Yes, thanks for the explanation, that will prove useful to me as well in the future!