Hey all,
I’m currently working on a project that involves parsing log files in CSV format. However, some of the messages in the log files contain embedded carriage return/line feed characters, which is throwing Clover for quite a loop. Here’s a sample of the data I’m trying to work with:
2014-06-24 07:48:36,Kernel.Info,localhost,Jun 24 07:48:36 TiddlyBox MSWinEventLog 6 PreEmptive 231 Tue Jun 24 07:48:36 2014 0 SolarWinds Log Forwarder for Windows (PreEmptive) N/A Information TiddlyBox 0 Test Message from Log Forwarder to the 'PreEmptive' event log.
2014-06-24 07:48:36,Kernel.Info,localhost,Jun 24 07:48:36 TiddlyBox MSWinEventLog 6 SolarWinds.Net 232 Tue Jun 24 07:48:36 2014 0 SolarWinds Log Forwarder for Windows (SolarWinds.Net) N/A Information TiddlyBox 0 Test Message from Log Forwarder to the 'SolarWinds.Net' event log.
2014-06-24 07:48:36,Kernel.Info,localhost,Jun 24 07:48:36 TiddlyBox MSWinEventLog 6 System 233 Tue Jun 24 07:48:36 2014 0 SolarWinds Log Forwarder for Windows (System) N/A Information TiddlyBox 0 Test Message from Log Forwarder to the 'System' event log.
2014-06-24 07:48:36,Kernel.Info,localhost,Jun 24 07:48:36 TiddlyBox MSWinEventLog 6 Windows PowerShell 234 Tue Jun 24 07:48:36 2014 0 SolarWinds Log Forwarder for Windows (Windows PowerShell) N/A Information TiddlyBox 0 Test Message from Log Forwarder to the 'Windows PowerShell' event log.
2014-06-24 07:49:16,Kernel.Notice,localhost,Jun 24 07:49:16 TiddlyBox MSWinEventLog 5 Security 235 Tue Jun 24 07:49:15 2014 4797 Microsoft-Windows-Security-Auditing N/A Audit Success TiddlyBox 13824 An attempt was made to query the existence of a blank password for an account.
Subject:
Security ID: S-1-5-21-3786561293-739534104-609097707-1005
Account Name: mrh50_000
Account Domain: TIDDLYBOX
Logon ID: 0x81CB0
Additional Information:
Caller Workstation: TIDDLYBOX
Target Account Name: Administrator
Target Account Domain: TiddlyBox
2014-06-24 07:49:16,Kernel.Notice,localhost,Jun 24 07:49:16 TiddlyBox MSWinEventLog 5 Security 236 Tue Jun 24 07:49:15 2014 4797 Microsoft-Windows-Security-Auditing N/A Audit Success TiddlyBox 13824 An attempt was made to query the existence of a blank password for an account.
I’ve stared at the data for quite a while in Notepad++ with Show All Characters enabled, trying to find a reliable delimiter between records. However, each entry is somewhat unique, with the “message portion” ranging from this:
Test Message from Log Forwarder to the 'Windows PowerShell' event log.
all the way to this:
An attempt was made to query the existence of a blank password for an account.
Subject:
Security ID: S-1-5-21-3786561293-739534104-609097707-1005
Account Name: mrh50_000
Account Domain: TIDDLYBOX
Logon ID: 0x81CB0
Additional Information:
Caller Workstation: TIDDLYBOX
Target Account Name: Administrator
Target Account Domain: TiddlyBox
The only constant that I can see in the data is that each new entry begins with a date in the format YYYY-MM-DD (followed by a time). Is there any way to use regex or pattern matching to treat that as the record delimiter? I’d like to avoid having to homebrew a solution as a separate program that must run before I can use Clover to transform the data.
The configuration I’m using at this point is a simple UniversalDataReader component connected to a Trash component in order to test loading the data. The graph fails as soon as it hits one of the records with a more complex multi-line message.
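To make the "date as record delimiter" idea concrete, here is a rough Python sketch of the splitting logic I have in mind. It is only an illustration of the pattern, not something Clover provides: the names split_records and parse_record and the field names timestamp, priority, host and message are my own placeholders, and it assumes every record starts at the beginning of a line with "YYYY-MM-DD HH:MM:SS," and that no message line happens to start the same way.

```python
import re

# Assumption: a new record always begins at the start of a line with
# "YYYY-MM-DD HH:MM:SS," and no continuation line of a message looks like that.
RECORD_START = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},", re.MULTILINE)


def split_records(text):
    """Split raw log text into records, keeping embedded line breaks in each record."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    records = []
    # Each record runs from its own start to the start of the next one (or end of text).
    for begin, end in zip(starts, starts[1:] + [len(text)]):
        records.append(text[begin:end].rstrip("\r\n"))
    return records


def parse_record(record):
    """Split one record into four fields; the field names are hypothetical."""
    # Only the first three commas are field separators; the rest is the message,
    # which may itself contain commas and line breaks.
    timestamp, priority, host, message = record.split(",", 3)
    return {
        "timestamp": timestamp,
        "priority": priority,
        "host": host,
        "message": message,
    }


if __name__ == "__main__":
    sample = (
        "2014-06-24 07:48:36,Kernel.Info,localhost,Test Message from Log Forwarder\r\n"
        "2014-06-24 07:49:16,Kernel.Notice,localhost,An attempt was made to query\r\n"
        "Subject:\r\n"
        "Account Name: mrh50_000\r\n"
    )
    for rec in split_records(sample):
        print(parse_record(rec))
```

The point of the sketch is that the record boundary is "line break followed by a timestamp" rather than the line break alone, so the multi-line Security messages stay attached to the entry they belong to. What I’m hoping for is a way to express the same rule inside the reader itself.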