Count the occurrences of a word in a string

Hi All,

I have a requirement where I need to count the occurrences of a word, i.e. “DownloadServlet”, in the Message field for each date. I have highlighted the word in the example shown below. I used “space” as the delimiter to create the following input metadata fields: Date, Mins, TPProcessor, JVMCode, MiscCode, Type and Message.

1) Output should be Date, Count of DownloadServlet (the number of occurrences of the word DownloadServlet gives me the count).
2) If the above is achieved, I then need to show which file was downloaded, i.e. the file name from the example below: DATC1403223921659[]HAR_TEST_FLE.TXT.

I would like to know which functions I should use in Reformat and Aggregation.

I really appreciate your help.

Thanks,
Sri

The input is a text file with the following content:

2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - viewFile: Y
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - acknowledgment: null
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - namespace: ftf_ReceiveFilesPortlet_7_OO189B1A08L170I5JL351Q30K2
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - acknowledgmentsType: null
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - file name : DATC1403223921659HAR_TEST_FLE.TXT
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.BaseServlet - Get Session By JSESSIONID. JSESSIONID is 960029B30E53038C4FE87085E0E44BFC.jvm1
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - ticketId: h8m1fzOz4pLdMDOx16KMoahxD2rUpzTG
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - ftfUserId: 30032389_FERLYNO@ftf.com
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - dummyRequest: null
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - locale: en_US
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.BaseServlet - sessionFtfUserId:30032389_FERLYNO@ftf.com
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.BaseServlet - Session contains 5tickets
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.BaseServlet - Security is passed!
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - Call SMG upload
2014-03-23 03:02:03,135 [TP-Processor33] [960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - Download parameters: user ID=AHB54525, receiverId=, snrf=, tnDocId=39fea1008jlpob1s0007gmg2
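
To make the layout of these lines concrete, here is a minimal Python sketch (outside CloverETL, so not a Reformat or Aggregate transformation) that splits one line into fields. The layout is only inferred from the sample above, and the names LOG_LINE, parse_line, "logger" and the other group names are assumptions made for this illustration.

```python
import re

# Assumed layout, inferred from the sample lines only:
#   date time [thread] [session] [user] LEVEL logger - message
LOG_LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<tp_processor>[^\]]*)\] "
    r"\[(?P<jvm_code>[^\]]*)\] "
    r"\[(?P<misc_code>[^\]]*)\] "
    r"(?P<type>\S+) "
    r"(?P<logger>\S+) - "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not fit the layout."""
    match = LOG_LINE.match(line.rstrip("\n"))
    return match.groupdict() if match else None


# First line of the sample input, split only to keep the source readable.
sample = (
    "2014-03-23 03:02:03,135 [TP-Processor33] "
    "[960029B30E53038C4FE87085E0E44BFC.jvm1] [30032389_FERLYNO@ftf.com] "
    "DEBUG com.gxs.bmo.ftf.servlet.DownloadServlet - viewFile: Y"
)
record = parse_line(sample)
print(record["date"], record["time"], record["logger"], record["message"])
```

Splitting on the bracketed groups rather than on every space keeps the free-text part after " - " in one field, even when it contains spaces itself.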

Hello Sri,

Do you really want to group the aggregation with millisecond precision? Or maybe by day? Or do you want the count per file? Which date should appear in the output record if there are several date values? And is there always exactly one record with a file name?

I could make an example graph based on your input sample, but I do not know whether it would be generic enough to cover all the input files you might have. Please answer the questions above and I will try to make an example based on your answers.

Thanks,

Hi Lubos,

Sure, I will. Thanks again for your help. I tried the iif condition and also the indexOf function, but had no luck.

Yes, I am looking for aggregation grouped with millisecond precision within each file, as Count and not as CountDistinct. If I use CountDistinct, I get 1 for each millisecond. Take, for example, the sample I gave:

For 2014-03-23 03:02:03,135 the count of DownloadServlet is 12, and when I do it by date, i.e. 2014-03-23, it shows them all added up.

If the file has multiple dates at millisecond precision, then I should get a separate record for each of them.

Also, it would be great if we could get the user ID of whoever is downloading the file, i.e. ftfUserId: 30032389_FERLYNO@ftf.com.

Thanks for your help

Sridhar

Hi Sridhar,

I have prepared a working example of your scenario. I used the Denormalizer component, which creates a single output record from one or more input records. In this example I used date, time and message as the grouping keys.

In the example I used your sample input data; the output is stored in a file in this format:

Date|Time|Message|Count|File|User
2014-03-23|03:02:03,135|com.gxs.bmo.ftf.servlet.DownloadServlet|11|DATC1403223921659[]HAR_TEST_FLE.TXT|30032389_FERLYNO@ftf.com

I hope the example helps you solve your scenario.
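
For readers without the graph at hand, the grouping idea described above (date, time and message as keys, then a count plus the file name and user per group) can be sketched in plain Python. This is not the Denormalizer transformation itself, only a rough illustration of the same logic; the names LINE, summarize and PREFIX are assumptions, and the line layout is inferred from the sample input.

```python
import re

# Assumed layout: date time [thread] [session] [user] LEVEL logger - message
LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2},\d{3}) "
    r"\[[^\]]*\] \[[^\]]*\] \[[^\]]*\] \S+ (\S+) - (.*)$"
)


def summarize(lines, word="DownloadServlet"):
    """Group matching lines by (date, time, logger) and build one row per group."""
    groups = {}
    for line in lines:
        m = LINE.match(line.strip())
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        date, time, logger, message = m.groups()
        if word not in logger:
            continue  # e.g. BaseServlet lines are not counted
        g = groups.setdefault((date, time, logger),
                              {"count": 0, "file": "", "user": ""})
        g["count"] += 1
        # Pick up the file name and user ID from their own log lines.
        if message.startswith("file name :"):
            g["file"] = message.split(":", 1)[1].strip()
        elif message.startswith("ftfUserId:"):
            g["user"] = message.split(":", 1)[1].strip()
    return [
        "|".join([date, time, logger, str(g["count"]), g["file"], g["user"]])
        for (date, time, logger), g in groups.items()
    ]


# Four lines taken from the sample input; PREFIX is the part they share.
PREFIX = ("2014-03-23 03:02:03,135 [TP-Processor33] "
          "[960029B30E53038C4FE87085E0E44BFC.jvm1] "
          "[30032389_FERLYNO@ftf.com] DEBUG com.gxs.bmo.ftf.servlet.")
lines = [
    PREFIX + "DownloadServlet - file name : DATC1403223921659HAR_TEST_FLE.TXT",
    PREFIX + "BaseServlet - Security is passed!",
    PREFIX + "DownloadServlet - ftfUserId: 30032389_FERLYNO@ftf.com",
    PREFIX + "DownloadServlet - locale: en_US",
]
rows = summarize(lines)
print("Date|Time|Message|Count|File|User")
for row in rows:
    print(row)
```

With these four lines the single group has a count of 3, because the BaseServlet line is skipped. Grouping by the full timestamp rather than the date alone matches the millisecond precision asked for earlier in the thread.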