Parsing CSV-based log files, using a date as the record delimiter?

Hey all,

I’m currently working on a project that involves parsing log files in CSV format. However, some of the messages in the log files contain carriage return/line feed characters, which is throwing Clover for quite a loop. Here’s a sample of the data I’m trying to work with:

2014-06-24 07:48:36,Kernel.Info,localhost,Jun 24 07:48:36 TiddlyBox MSWinEventLog	6	PreEmptive	231	Tue Jun 24 07:48:36 2014	0	SolarWinds Log Forwarder for Windows (PreEmptive)		N/A	Information	TiddlyBox	0	Test Message from Log Forwarder to the 'PreEmptive' event log.
2014-06-24 07:48:36,Kernel.Info,localhost,Jun 24 07:48:36 TiddlyBox MSWinEventLog	6	SolarWinds.Net	232	Tue Jun 24 07:48:36 2014	0	SolarWinds Log Forwarder for Windows (SolarWinds.Net)		N/A	Information	TiddlyBox	0	Test Message from Log Forwarder to the 'SolarWinds.Net' event log.
2014-06-24 07:48:36,Kernel.Info,localhost,Jun 24 07:48:36 TiddlyBox MSWinEventLog	6	System	233	Tue Jun 24 07:48:36 2014	0	SolarWinds Log Forwarder for Windows (System)		N/A	Information	TiddlyBox	0	Test Message from Log Forwarder to the 'System' event log.
2014-06-24 07:48:36,Kernel.Info,localhost,Jun 24 07:48:36 TiddlyBox MSWinEventLog	6	Windows PowerShell	234	Tue Jun 24 07:48:36 2014	0	SolarWinds Log Forwarder for Windows (Windows PowerShell)		N/A	Information	TiddlyBox	0	Test Message from Log Forwarder to the 'Windows PowerShell' event log.
2014-06-24 07:49:16,Kernel.Notice,localhost,Jun 24 07:49:16 TiddlyBox MSWinEventLog	5	Security	235	Tue Jun 24 07:49:15 2014	4797	Microsoft-Windows-Security-Auditing		N/A	Audit Success	TiddlyBox	13824	An attempt was made to query the existence of a blank password for an account.

Subject:
	Security ID:		S-1-5-21-3786561293-739534104-609097707-1005
	Account Name:		mrh50_000
	Account Domain:		TIDDLYBOX
	Logon ID:		0x81CB0

Additional Information:
	Caller Workstation:	TIDDLYBOX
	Target Account Name:	Administrator
	Target Account Domain:	TiddlyBox
2014-06-24 07:49:16,Kernel.Notice,localhost,Jun 24 07:49:16 TiddlyBox MSWinEventLog	5	Security	236	Tue Jun 24 07:49:15 2014	4797	Microsoft-Windows-Security-Auditing		N/A	Audit Success	TiddlyBox	13824	An attempt was made to query the existence of a blank password for an account.

I’ve stared at the data for quite a while in Notepad++ with “Show All Characters” enabled, trying to find a way to delimit the records. However, each entry is somewhat unique, with the “message portion” ranging from this:

Test Message from Log Forwarder to the 'Windows PowerShell' event log.

all the way to this:

An attempt was made to query the existence of a blank password for an account.

Subject:
	Security ID:		S-1-5-21-3786561293-739534104-609097707-1005
	Account Name:		mrh50_000
	Account Domain:		TIDDLYBOX
	Logon ID:		0x81CB0

Additional Information:
	Caller Workstation:	TIDDLYBOX
	Target Account Name:	Administrator
	Target Account Domain:	TiddlyBox

The only constant that I can see in the data is that each new entry begins with a date in the format YYYY-mm-dd. Is there any way to use a regex or pattern matching to treat that as the record delimiter? I’d like to avoid having to homebrew a separate program that must run before I can use Clover to transform the data.
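Purely to illustrate the splitting logic (this is not a Clover configuration), here is a minimal Python sketch of the idea. It assumes every record starts at the beginning of a line with a timestamp of the form "YYYY-mm-dd HH:MM:SS," as in the sample above, and that no line inside a message starts the same way. The names split_records and parse_record are made up for this example.

```python
import re

# Assumption: a new record begins at the start of a line with
# "YYYY-mm-dd HH:MM:SS," exactly as in the sample data.
RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},", re.MULTILINE
)


def split_records(text):
    """Split raw log text into records, one per leading timestamp.

    Any text before the first timestamp is ignored.
    """
    starts = [m.start() for m in RECORD_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    # Strip only the trailing CR/LF that separates one record from the next;
    # line breaks inside the message are kept.
    return [text[b:e].rstrip("\r\n") for b, e in zip(starts, ends)]


def parse_record(record):
    """Split a record into timestamp, priority, host and the raw message.

    Only the first three commas are treated as field separators, so commas
    and line breaks inside the message are left untouched.
    """
    timestamp, priority, host, message = record.split(",", 3)
    return timestamp, priority, host, message
```

Calling split_records on the whole file contents and then parse_record on each result keeps a multi-line message such as the "Subject:" block together with the record it belongs to. The same pattern could in principle be used by any tool that accepts a regular expression as a record boundary.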

The configuration I’m using at this point is a simple UniversalDataReader connected to a Trash component in order to test loading the data. The graph fails as soon as it hits one of the records with a more complex multi-line message.

Hi,

Here is an example graph. I hope it will help you with your solution.
log_parser.zip
Regards,