I am having two issues with the HTTPConnector. First, a URL I am fetching goes through several redirects, and one of the redirect targets has pipes (‘|’) in its query string. The pipes are not escaped when the component tries to GET that URL. See the debug log below:
DEBUG [HTTP_CONNECTOR0_0] - Creating GET request to http://feeds.newscientist.com/c/749/f/1 … tory01.htm
DEBUG [HTTP_CONNECTOR0_0] - Sending HTTP request:
GET /c/749/f/10899/s/289935a2/l/0L0Snewscientist0N0Carticle0Cmg217290A460B0A0A0A0Efake0Epointers0Eon0Eyour0Escreen0Efoil0Eshoulder0Esurfers0Bhtml0Dcmpid0FRSS0QNSNS0Q20A120EGLOBAL0Qtech/story01.htm HTTP/1.1
Host: feeds.newscientist.com
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.2 (java 1.5)
DEBUG [HTTP_CONNECTOR0_0] - Received HTTP response:
HTTP/1.1 301 OK
Server: FeedsPortal
Set-Cookie: MF2=19rfgzo1clmiu; expires=Wed, 11-Mar-15 20:12:37 GMT; path=/
Location: http://da.feedsportal.com/c/749/f/10899 … ch/ia1.htm
Content-Type: text/plain; charset=iso-8859-1
Content-Length: 0
Date: Mon, 11 Mar 2013 20:12:36 GMT
Connection: close
DEBUG [HTTP_CONNECTOR0_0] - Sending HTTP request:
GET /c/749/f/10899/s/289935a2/l/0L0Snewscientist0N0Carticle0Cmg217290A460B0A0A0A0Efake0Epointers0Eon0Eyour0Escreen0Efoil0Eshoulder0Esurfers0Bhtml0Dcmpid0FRSS0QNSNS0Q20A120EGLOBAL0Qtech/ia1.htm HTTP/1.1
Host: da.feedsportal.com
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.2 (java 1.5)
Cookie: MF2=19rfgzo1clmiu
Cookie2: $Version=1
DEBUG [HTTP_CONNECTOR0_0] - Received HTTP response:
HTTP/1.1 301 OK
Server: FeedsPortal
Location: http://www.newscientist.com/article/mg2 … LOBAL|tech
Content-Type: text/plain; charset=iso-8859-1
Content-Length: 0
Date: Mon, 11 Mar 2013 20:12:37 GMT
Connection: close
ERROR [HTTP_CONNECTOR0_0] - org.apache.http.client.ClientProtocolException.
DEBUG [HTTP_CONNECTOR0_0] - Execution unsuccessful:
org.apache.http.client.ClientProtocolException
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at org.jetel.component.HttpConnector.buildAndSendRequest(HttpConnector.java:1793)
at org.jetel.component.HttpConnector.process(HttpConnector.java:1760)
at org.jetel.component.HttpConnector.executeForRecord(HttpConnector.java:1913)
at org.jetel.component.HttpConnector.execute(HttpConnector.java:1860)
at org.jetel.graph.Node.run(Node.java:465)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.http.ProtocolException: Invalid redirect URI: http://www.newscientist.com/article/mg2 … LOBAL|tech
at org.apache.http.impl.client.DefaultRedirectStrategy.createLocationURI(DefaultRedirectStrategy.java:189)
at org.apache.http.impl.client.DefaultRedirectStrategy.getLocationURI(DefaultRedirectStrategy.java:140)
at org.apache.http.impl.client.DefaultRedirectStrategy.getRedirect(DefaultRedirectStrategy.java:209)
at org.apache.http.impl.client.DefaultRequestDirector.handleResponse(DefaultRequestDirector.java:1070)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:546)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
… 8 more
Caused by: java.net.URISyntaxException: Illegal character in query at index 116: http://www.newscientist.com/article/mg2 … LOBAL|tech
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.checkChars(URI.java:3002)
at java.net.URI$Parser.parseHierarchical(URI.java:3092)
at java.net.URI$Parser.parse(URI.java:3034)
at java.net.URI.<init>(URI.java:595)
at org.apache.http.impl.client.DefaultRedirectStrategy.createLocationURI(DefaultRedirectStrategy.java:187)
… 13 more
ERROR [WatchDog] - Graph execution finished with error
ERROR [WatchDog] - Node HTTP_CONNECTOR0 finished with status: ERROR
ERROR [WatchDog] - Node HTTP_CONNECTOR0 error details:
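For reference, the root cause at the bottom of that trace can be reproduced outside the graph. The sketch below is a minimal standalone Java program, not anything taken from HTTPConnector or HttpClient; the class name, the two helper methods, and the example.com URL are all made up for illustration. It only shows that java.net.URI rejects a raw ‘|’ in the query and accepts the same string once each pipe is percent-encoded as %7C.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PipeInRedirect {

    // Percent-encode only the raw pipe characters; everything else is left untouched.
    // (Hypothetical helper, shown only to illustrate the escaping.)
    public static String escapePipes(String url) {
        return url.replace("|", "%7C");
    }

    // Returns true when java.net.URI accepts the string as-is.
    public static boolean parses(String url) {
        try {
            new URI(url);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Made-up URL with pipes in the query, standing in for the elided one in the log.
        String location = "http://example.com/article.html?cmpid=RSS|NSNS|tech";

        System.out.println(parses(location));              // raw pipes
        System.out.println(parses(escapePipes(location))); // pipes encoded as %7C
    }
}
```

Running it should print false for the raw URL and true for the escaped one. This only illustrates the parsing failure; it does not change how the component follows redirects.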
Second, I tried to route errors from the HTTPConnector to output port 1, hoping the graph would not fail when the above happens. After configuring the error mapping, however, I get the following initialization error in the log:
ERROR [main] - Error: Line 5 column 7 - Line 5 column 17: Cannot write to output port ‘1’! [Either the port has no edge connected or the operation is not permitted…]
ERROR [main] - Graph configuration is invalid.
ERROR [main] - [HTTP connector:HTTP_CONNECTOR0] - Initialization failed. Error output mapping is invalid.
ERROR [main] - Error during graph initialization !
Element [1312941581690:LoadData]-Graph configuration is invalid.
at org.jetel.graph.runtime.EngineInitializer.initGraph(EngineInitializer.java:263)
at org.jetel.graph.runtime.EngineInitializer.initGraph(EngineInitializer.java:239)
at org.jetel.main.runGraph.runGraph(runGraph.java:377)
at org.jetel.main.runGraph.main(runGraph.java:341)
Caused by: org.jetel.exception.ConfigurationException: [HTTP connector:HTTP_CONNECTOR0] - Initialization failed. Error output mapping is invalid.
at org.jetel.exception.ConfigurationProblem.toException(ConfigurationProblem.java:156)
at org.jetel.exception.ConfigurationStatus.toException(ConfigurationStatus.java:106)
… 4 more
And the associated part of the graph is:
<![CDATA[//#CTL2
// Transforms input record into output record.
function integer transform() {
	$out.1.Data = $in.1.errorMessage;

	return OK;
}

// Called during component initialization.
// function boolean init() {}

// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
// required by the transform. All resources allocated within this method should be released
// by the postExecute() method.
// function void preExecute() {}

// Called only if transform() throws an exception.
// function integer transformOnError(string errorMessage, string stackTrace) {}

// Called during each graph run after the entire transform was executed. Should be used to free any resources
// allocated within the preExecute() method.
// function void postExecute() {}

// Called to return a user-defined error message when an error occurs.
// function string getMessage() {}
]]>
<![CDATA[//#CTL2
// Transforms input record into output record.
function integer transform() {
	$out.0.URL = $in.0.RSS_Item_Link;

	return OK;
}

// Called during component initialization.
// function boolean init() {}

// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
// required by the transform. All resources allocated within this method should be released
// by the postExecute() method.
// function void preExecute() {}

// Called only if transform() throws an exception.
// function integer transformOnError(string errorMessage, string stackTrace) {}

// Called during each graph run after the entire transform was executed. Should be used to free any resources
// allocated within the preExecute() method.
// function void postExecute() {}

// Called to return a user-defined error message when an error occurs.
// function string getMessage() {}
]]>
<![CDATA[//#CTL2
// Transforms input record into output record.
function integer transform() {
	$out.0.Item_ID = $in.0.Id;

	return OK;
}

// Called during component initialization.
// function boolean init() {}

// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
// required by the transform. All resources allocated within this method should be released
// by the postExecute() method.
// function void preExecute() {}

// Called only if transform() throws an exception.
// function integer transformOnError(string errorMessage, string stackTrace) {}

// Called during each graph run after the entire transform was executed. Should be used to free any resources
// allocated within the preExecute() method.
// function void postExecute() {}

// Called to return a user-defined error message when an error occurs.
// function string getMessage() {}
]]>
Output port 1 is connected to a Trash component, and the metadata on that edge is very simple: a single string field called “Data”.
Any ideas on how to resolve these two issues? Thanks!