NLTK with Clover

Support/help with CloverETL (4.9) and CloverDX (5.0 or newer) implementation problems

uykusuzzzz
Posts: 2
Joined: Thu Apr 23, 2020 7:35 pm

NLTK with Clover

Postby uykusuzzzz » Thu Apr 23, 2020 7:58 pm

Hello,

I would like to use the NLTK (Python) package within a larger graph that serves a completely different purpose (basically, I need a few tokenizers, freqdists and classifiers to clean up the data to a degree before it goes into a processing cycle). I have read a blog post on how to incorporate Python in general, but it seemed to involve a lot of third-party elements (Jython, an additional Eclipse IDE for development, etc.) that I am not familiar with, and I would rather not invest a ton of time in learning new pieces if I can help it. Since this is still an ad hoc process, one could ultimately write the data to a flat file, turn to Python for the classification bits and then continue in CloverDX, but that feels a bit clunky and you might be able to point me to a better way.

So my questions really are:
  • Does CloverDX [Designer] have any built-in capabilities similar to NLTK? Here I am mostly talking about predictive classifiers that one can train. Maybe I can do this all in Clover?
  • If not, what's the easiest way to incorporate NLTK? Given that I also use a custom corpus for this data set, I am ideally trying to minimise the number of resources/adjustments I need to make. I am happy to write the script on the Python side once and migrate it across to Clover if that limits the setup steps.
  • Has anyone done this (essentially supplemented CloverDX with Python) and packaged it to a degree where it can be set up on another laptop without a ton of tinkering?

Thank you for your help.
uykusuZzZz

bartonv
Posts: 145
Joined: Wed May 03, 2017 12:10 pm

Re: NLTK with Clover

Postby bartonv » Thu Jun 04, 2020 1:24 pm

Hi Uykusuzzzz,
let me answer your questions below:
  • I am not familiar with NLTK and its usage but generally speaking, CloverDX does not possess machine learning capabilities.
  • Generally, the easiest, most convenient, and least error-prone way of using Python in CloverDX data transformations is to call Python scripts using the ExecuteScript or SystemExecute components. This way, the scripts run headlessly from the command line and can be part of a larger data transformation and job cascade.
  • The approach described in the blog post you mentioned (written 5 years ago) has not proven to be a very good option over time, mostly for the reasons you gave yourself. Calling Python scripts with the two aforementioned components is the recommended approach and has, as a matter of fact, been implemented by some CloverDX users in the past.
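
For illustration only, here is a minimal sketch of the kind of standalone script that ExecuteScript or SystemExecute could run from the command line. Everything specific in it is an assumption rather than something from this thread: the script name clean_text.py, the input columns id and text, and the output columns are all hypothetical. It uses NLTK's wordpunct_tokenize when NLTK is installed and otherwise falls back to an equivalent regular expression, so it needs no corpus download.

```python
# clean_text.py (hypothetical name): tokenize a CSV column so the result
# can be read back into the graph by a flat-file reader.
import csv
import re
import sys

try:
    # Regex-based NLTK tokenizer; needs no extra corpus or model data.
    from nltk.tokenize import wordpunct_tokenize as tokenize
except ImportError:
    # Fallback with the same pattern, for machines without NLTK.
    tokenize = re.compile(r"\w+|[^\w\s]+").findall


def clean_rows(reader, writer):
    """Lower-case and tokenize the assumed 'text' column of every row."""
    writer.writerow(["id", "token_count", "tokens"])
    for row in reader:
        tokens = [t.lower() for t in tokenize(row["text"])]
        writer.writerow([row["id"], len(tokens), " ".join(tokens)])


def main(in_path, out_path):
    """Read in_path, write the tokenized rows to out_path."""
    with open(in_path, newline="", encoding="utf-8") as src, \
            open(out_path, "w", newline="", encoding="utf-8") as dst:
        clean_rows(csv.DictReader(src), csv.writer(dst))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Any exception ends the process with a non-zero exit code.
    main(sys.argv[1], sys.argv[2])
```

The component would then invoke something like "python clean_text.py input.csv output.csv" (file names again hypothetical), with a writer producing input.csv before the call and a reader picking up output.csv after it. Trained classifiers or a custom corpus would be loaded inside the script in the same way, so the graph only ever sees flat files.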

Regards,
---
Vladimir Barton
CloverCARE Support
CloverDX

Visit us online at http://www.cloverdx.com

uykusuzzzz
Posts: 2
Joined: Thu Apr 23, 2020 7:35 pm

Re: NLTK with Clover

Postby uykusuzzzz » Wed Jun 24, 2020 12:44 am

Oh thank you bartonv. I am glad there is an easier way. Cheers.

Tanker62
Posts: 6
Joined: Sat Aug 15, 2020 10:48 am

Re: NLTK with Clover

Postby Tanker62 » Mon Aug 31, 2020 11:41 am

Yes, thanks for the explanation, that will prove useful to me as well in the future!

