HTTPConnector issues

I am having two issues with the HTTPConnector. First, a URL being pulled returns multiple redirects, and one of the redirect targets has pipes (‘|’) in its query string that are not escaped when the component attempts to GET that URL. See the debug log below:

DEBUG [HTTP_CONNECTOR0_0] - Creating GET request to http://feeds.newscientist.com/c/749/f/1 … tory01.htm
DEBUG [HTTP_CONNECTOR0_0] - Sending HTTP request:

GET /c/749/f/10899/s/289935a2/l/0L0Snewscientist0N0Carticle0Cmg217290A460B0A0A0A0Efake0Epointers0Eon0Eyour0Escreen0Efoil0Eshoulder0Esurfers0Bhtml0Dcmpid0FRSS0QNSNS0Q20A120EGLOBAL0Qtech/story01.htm HTTP/1.1
Host: feeds.newscientist.com
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.2 (java 1.5)

DEBUG [HTTP_CONNECTOR0_0] - Received HTTP response:

HTTP/1.1 301 OK
Server: FeedsPortal
Set-Cookie: MF2=19rfgzo1clmiu; expires=Wed, 11-Mar-15 20:12:37 GMT; path=/
Location: http://da.feedsportal.com/c/749/f/10899 … ch/ia1.htm
Content-Type: text/plain; charset=iso-8859-1
Content-Length: 0
Date: Mon, 11 Mar 2013 20:12:36 GMT
Connection: close

DEBUG [HTTP_CONNECTOR0_0] - Sending HTTP request:

GET /c/749/f/10899/s/289935a2/l/0L0Snewscientist0N0Carticle0Cmg217290A460B0A0A0A0Efake0Epointers0Eon0Eyour0Escreen0Efoil0Eshoulder0Esurfers0Bhtml0Dcmpid0FRSS0QNSNS0Q20A120EGLOBAL0Qtech/ia1.htm HTTP/1.1
Host: da.feedsportal.com
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.2 (java 1.5)
Cookie: MF2=19rfgzo1clmiu
Cookie2: $Version=1

DEBUG [HTTP_CONNECTOR0_0] - Received HTTP response:

HTTP/1.1 301 OK
Server: FeedsPortal
Location: http://www.newscientist.com/article/mg2 … LOBAL|tech
Content-Type: text/plain; charset=iso-8859-1
Content-Length: 0
Date: Mon, 11 Mar 2013 20:12:37 GMT
Connection: close

ERROR [HTTP_CONNECTOR0_0] - org.apache.http.client.ClientProtocolException.
DEBUG [HTTP_CONNECTOR0_0] - Execution unsuccessful:
org.apache.http.client.ClientProtocolException
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:909)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at org.jetel.component.HttpConnector.buildAndSendRequest(HttpConnector.java:1793)
at org.jetel.component.HttpConnector.process(HttpConnector.java:1760)
at org.jetel.component.HttpConnector.executeForRecord(HttpConnector.java:1913)
at org.jetel.component.HttpConnector.execute(HttpConnector.java:1860)
at org.jetel.graph.Node.run(Node.java:465)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.http.ProtocolException: Invalid redirect URI: http://www.newscientist.com/article/mg2 … LOBAL|tech
at org.apache.http.impl.client.DefaultRedirectStrategy.createLocationURI(DefaultRedirectStrategy.java:189)
at org.apache.http.impl.client.DefaultRedirectStrategy.getLocationURI(DefaultRedirectStrategy.java:140)
at org.apache.http.impl.client.DefaultRedirectStrategy.getRedirect(DefaultRedirectStrategy.java:209)
at org.apache.http.impl.client.DefaultRequestDirector.handleResponse(DefaultRequestDirector.java:1070)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:546)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
… 8 more
Caused by: java.net.URISyntaxException: Illegal character in query at index 116: http://www.newscientist.com/article/mg2 … LOBAL|tech
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.checkChars(URI.java:3002)
at java.net.URI$Parser.parseHierarchical(URI.java:3092)
at java.net.URI$Parser.parse(URI.java:3034)
at java.net.URI.<init>(URI.java:595)
at org.apache.http.impl.client.DefaultRedirectStrategy.createLocationURI(DefaultRedirectStrategy.java:187)
… 13 more
ERROR [WatchDog] - Graph execution finished with error
ERROR [WatchDog] - Node HTTP_CONNECTOR0 finished with status: ERROR
ERROR [WatchDog] - Node HTTP_CONNECTOR0 error details:

Second, I tried to route errors from the HTTPConnector to output port 1, hoping the graph would not fail when the above happens, but after configuring the error mapping I get the initialization error below in the log:

ERROR [main] - Error: Line 5 column 7 - Line 5 column 17: Cannot write to output port ‘1’! [Either the port has no edge connected or the operation is not permitted…]
ERROR [main] - Graph configuration is invalid.
ERROR [main] - [HTTP connector:HTTP_CONNECTOR0] - Initialization failed. Error output mapping is invalid.
ERROR [main] - Error during graph initialization !
Element [1312941581690:LoadData]-Graph configuration is invalid.
at org.jetel.graph.runtime.EngineInitializer.initGraph(EngineInitializer.java:263)
at org.jetel.graph.runtime.EngineInitializer.initGraph(EngineInitializer.java:239)
at org.jetel.main.runGraph.runGraph(runGraph.java:377)
at org.jetel.main.runGraph.main(runGraph.java:341)
Caused by: org.jetel.exception.ConfigurationException: [HTTP connector:HTTP_CONNECTOR0] - Initialization failed. Error output mapping is invalid.
at org.jetel.exception.ConfigurationProblem.toException(ConfigurationProblem.java:156)
at org.jetel.exception.ConfigurationStatus.toException(ConfigurationStatus.java:106)
… 4 more

And the associated part of the graph is:

<![CDATA[//#CTL2

// Transforms input record into output record.
function integer transform() {
$out.1.Data = $in.1.errorMessage;

return OK;
}

// Called during component initialization.
// function boolean init() {}

// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
// required by the transform. All resources allocated within this method should be released
// by the postExecute() method.
// function void preExecute() {}

// Called only if transform() throws an exception.
// function integer transformOnError(string errorMessage, string stackTrace) {}

// Called during each graph run after the entire transform was executed. Should be used to free any resources
// allocated within the preExecute() method.
// function void postExecute() {}

// Called to return a user-defined error message when an error occurs.
// function string getMessage() {}
]]>
<![CDATA[//#CTL2

// Transforms input record into output record.
function integer transform() {
$out.0.URL = $in.0.RSS_Item_Link;

return OK;
}

// Called during component initialization.
// function boolean init() {}

// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
// required by the transform. All resources allocated within this method should be released
// by the postExecute() method.
// function void preExecute() {}

// Called only if transform() throws an exception.
// function integer transformOnError(string errorMessage, string stackTrace) {}

// Called during each graph run after the entire transform was executed. Should be used to free any resources
// allocated within the preExecute() method.
// function void postExecute() {}

// Called to return a user-defined error message when an error occurs.
// function string getMessage() {}
]]>
<![CDATA[//#CTL2

// Transforms input record into output record.
function integer transform() {
$out.0.Item_ID = $in.0.Id;

return OK;
}

// Called during component initialization.
// function boolean init() {}

// Called during each graph run before the transform is executed. May be used to allocate and initialize resources
// required by the transform. All resources allocated within this method should be released
// by the postExecute() method.
// function void preExecute() {}

// Called only if transform() throws an exception.
// function integer transformOnError(string errorMessage, string stackTrace) {}

// Called during each graph run after the entire transform was executed. Should be used to free any resources
// allocated within the preExecute() method.
// function void postExecute() {}

// Called to return a user-defined error message when an error occurs.
// function string getMessage() {}
]]>

I have output port 1 connected to Trash with a very simple metadata schema on the edge of a single string value called “Data”.

Any ideas how to resolve? Thanks!

I removed the error mappings from the HTTPConnector component and the mapping error goes away; the graph now runs to completion, apparently forwarding errors to port 1. The only issue is that I have no way of knowing what those errors were on port 1, as no mappings exist. I am still interested in how to improve on this workaround. Thanks!

Hi,

Thank you for your post. As you have already pointed out, the URL is not valid (illegal characters need to be percent-encoded, e.g. '|' as %7C). However, based on your findings we have created new tickets in our issue tracking system (for more information, please refer to http://bug.javlin.eu/browse/CL-2741 and http://bug.javlin.eu/browse/CL-2742).
In the meantime, you might use an external utility that accepts invalid addresses (such as wget) to retrieve the file. Such a utility can be executed using SystemExecute.
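
To illustrate the encoding problem itself, here is a minimal Java sketch. It assumes a made-up URL (example.com) in place of the truncated redirect target from the log, and the class and method names are hypothetical. It shows that java.net.URI rejects a raw pipe in the query, which is what the stack trace above reports, and accepts the same string once the pipe is written as %7C. It only helps where you control the URL string before it is requested; it does not change how HTTPConnector follows redirects internally.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PipeEncodingSketch {

    // Returns true if java.net.URI accepts the string as-is.
    static boolean isValidUri(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Percent-encodes only the pipe character. Everything else is left untouched,
    // so sequences that are already encoded (e.g. %20) are not double-encoded.
    static String encodePipes(String s) {
        return s.replace("|", "%7C");
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical URL standing in for the truncated redirect target in the log.
        String raw = "http://example.com/article/item.html?cmpid=RSS|NSNS|2012-GLOBAL|tech";

        System.out.println(isValidUri(raw)); // false: '|' is illegal in the query

        String fixed = encodePipes(raw);
        System.out.println(new URI(fixed));  // parses without an exception
    }
}
```

Replacing only the pipe is a deliberately narrow choice for this sketch; other illegal characters (spaces, for example) would need the same treatment.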

Thanks. Any thoughts on why the Error Mapping is throwing the initialization error?

Hi,

It was a bug. Please take a look at https://bug.javlin.eu/browse/CL-2741 for details.