How to submit requests to web sites programmatically using HttpWebRequest

This article is a guide to programmatically executing actions on web sites. I use link submission to DZone as the example. These days there is a lot of emphasis on sharing your posts, links, and articles with the whole community, so DZone link submission makes a good illustration. At the heart of the implementation are the HttpWebRequest and HttpWebResponse objects; I mention these classes because I am using the .NET Framework. Behind the scenes, though, it is as simple as sending an HTTP request and analyzing the response, so you can use whatever tool you have at hand. I will explain each step that I followed to arrive at the solution. These steps work for pretty much any kind of application.

Use web site to perform the action

First you need to analyze what action you are performing, how the web site sends its request, and what kind of response is returned. This analysis drives the whole solution. Let's take the example of submitting a new link on the DZone site. You click "Add a new link" and are taken to a page that asks you to log in. You log in and are sent to a page where you supply values for URL, Title, Description, Tags, etc. Then you click the "Submit" button and you are done. Based on this, the following are the steps that you need to perform programmatically.

  • Submit request to add link
  • Catch redirect to login page
  • Perform login into site
  • Send request to add page after successful login

To see what the browser does to perform all these actions, fire up a tool like Fiddler and monitor every request and response. If you can mimic these actions, you are good to go. Now let's see how to perform them programmatically.

Submit Add Request

You will use an HttpWebRequest object to send a request to http://www.dzone.com/links/add.html. At this point you do not have to worry about specifying other parameters like Title, Url, etc., because the request is not going to go through: you are not logged in to the site. In technical terms, you have not established an authenticated session with the site.

Catch Redirect To Login Page

When you send an unauthorized request to add a link, the site redirects you to the login page. That is, when you send an HTTP request for the add.html page, the server sends an HTTP response with status code 302, which means the request is being redirected, and it puts the redirect target in the Location header of that response. So programmatically you need to submit a request, check the response status code, and read the Location header. The code is shown below.

static string GetLoginUrl(CookieContainer cookies, string targetUrl)
{
	int hops = 1;
	int maxRedirects = 20;
	bool foundIt = false;
	HttpWebRequest webReq;
	string loginUrl = targetUrl;
	do
	{
		webReq = WebRequest.Create(loginUrl) as HttpWebRequest;
		webReq.CookieContainer = cookies;
		webReq.AllowAutoRedirect = false;
		string msg = string.Format("Hop[{0}] - {1}", hops++, loginUrl);
		Debug.WriteLine(msg);
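		HttpWebResponse webResp = webReq.GetResponse() as HttpWebResponse;
		// Read the status and headers first, then release the connection.
		HttpStatusCode status = webResp.StatusCode;
		string location = webResp.Headers["Location"];
		webResp.Close();
		if (status == HttpStatusCode.Found && !string.IsNullOrEmpty(location))
		{
			// The Location header may be relative, so resolve it against the current URL.
			loginUrl = new Uri(new Uri(loginUrl), location).AbsoluteUri;
		}
		else
		{
			foundIt = (status == HttpStatusCode.OK);
			break;
		}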
	} while (hops <= maxRedirects);
	return foundIt ? loginUrl : string.Empty;
}

Notice that the code runs in a loop. The reason is that some sites redirect you through a couple of pages before sending you to the final login page, so I have limited the loop to 20 hops.

Cookies

This is the most important part of the whole implementation. When you start a session with a site, it sends some cookies in the response, and it expects some of those cookies to be sent back in subsequent requests to confirm that you have an authorized session open. If you look at the code above, I have attached a CookieContainer object to the request to make sure that all the cookies sent in the response are collected. This container can then be attached to subsequent requests.
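To make the container's role concrete, here is a minimal sketch that needs no network access. It places a cookie in a CookieContainer by hand, as if the server had set it on an earlier response, and then asks the container which Cookie header it would attach to a later request. The cookie name JSESSIONID and its value are made-up placeholders, and the class and method names are my own for illustration; the real site may use different cookies.

```csharp
using System;
using System.Net;

static class CookieDemo
{
    // Returns the Cookie header the container would attach to a request for this URL.
    public static string CookieHeaderFor(CookieContainer cookies, string url)
    {
        return cookies.GetCookieHeader(new Uri(url));
    }

    // Simulates the server having set a session cookie on an earlier response.
    public static CookieContainer ContainerWithSessionCookie()
    {
        var cookies = new CookieContainer();
        // "JSESSIONID" and "abc123" are hypothetical placeholders, not real DZone values.
        cookies.Add(new Uri("http://www.dzone.com/"), new Cookie("JSESSIONID", "abc123", "/"));
        return cookies;
    }
}
```

Calling CookieDemo.CookieHeaderFor with the container and http://www.dzone.com/links/add.html should return the stored cookie, while a URL on a different host should return an empty string. That host scoping is why one shared container is safe to reuse across every request in the session.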

Perform Login

When you log in to a site, it submits a FORM to the server with key-value pairs that contain the data required to validate the user. You can use IE Toolbar, FireBug or any other tool to inspect the HTML of the page and locate the FORM tag and the values that need to be sent. I used FireBug to inspect that section and find the values that I need. The following image shows the result.

[Image: login box inspection]

You can see that there is a FORM with a POST action pointing to /links/j_acegi_security_check. It has two text boxes with element names j_username and j_password that take the login information and are submitted with the POST request. These are the pieces of information you need to perform the login action. The following code shows how this is accomplished.

RequestAttributes reqAttribs = new RequestAttributes();
reqAttribs.OverrideConfigurationSettings = true;
reqAttribs.AllowSecureSiteCrawl = true;
reqAttribs.AutoRediectEnabled = false;
reqAttribs.MaxRedirects = 100;
reqAttribs.IsPost = true;
reqAttribs.RequestUrl = "http://www.dzone.com/links/j_acegi_security_check";
reqAttribs.CookieContainer = container;
reqAttribs.RequestParameters.Add("j_username", "xxxxxx");
reqAttribs.RequestParameters.Add("j_password", "xxxxxx");
HttpProtocol obHttp = new HttpProtocol(reqAttribs);
HttpProtocolOutput obOutput = obHttp.GetProtocolOutput();
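The snippet above relies on helper classes from the sample project. If you do not have them, roughly the same POST can be sketched with HttpWebRequest alone. The class and method names below (FormPost, BuildFormBody, Post) are my own for illustration and are not part of the sample project. Note that Uri.EscapeDataString encodes a space as %20 rather than +, which servers generally accept in form data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

static class FormPost
{
    // Encodes key-value pairs as an application/x-www-form-urlencoded body.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value));
        }
        return sb.ToString();
    }

    // Sends the body as a POST, sharing the cookie container used by earlier requests.
    public static HttpWebResponse Post(string url, string formBody, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.CookieContainer = cookies;
        request.AllowAutoRedirect = false; // let the caller inspect any 302 itself

        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = payload.Length;
        using (Stream body = request.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }
        return (HttpWebResponse)request.GetResponse();
    }
}
```

A caller would build the body from the j_username and j_password pairs, pass the same container that collected the earlier cookies, and close the returned response when done. GetResponse throws a WebException for 4xx and 5xx responses, so real code would likely wrap the call in a try/catch.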

Did Login Succeed?

After you have executed the above request and received the response, the big question is how to check whether the login succeeded. You can't rely on the status code of the response, because it will be 200, meaning only that the request succeeded. There are a couple of things you can check. Some sites redirect you to a landing page, so you can check whether you got a 302 response code. A surer way is to parse the response and see whether the login box is on the page. For example, in the case of DZone.com, you can check whether there is a markup node on the page that has a name attribute with the value j_username, or any markup that is unique to the login page. If you find that node, the login did not work. Here is some sample code that I used for my application.

static bool CheckLoginStatus(HttpProtocolOutput loginRespOutput)
{		
	ParserStream obStream = 
	  new ParserStream(new System.IO.MemoryStream(loginRespOutput.Content.ContentData));
	Source obSource = 
	  new InputStreamSource(obStream, null, loginRespOutput.Content.ContentData.Length);
	Page obPage = new Page(obSource);
	obPage.Url = "http://www.dzone.com/links/j_acegi_security_check";
	Lexer obLexer = new Lexer(obPage);
	Parser obParser = new Parser(obLexer);

	HasAttributeFilter filter = new HasAttributeFilter("name", "j_username");
	NodeList oNodes = obParser.ExtractAllNodesThatMatch(filter);
	return (oNodes.Count == 0);
}
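If you do not have an HTML parser at hand, a rougher version of the same check can be sketched with a regular expression over the response text. This is only an approximation under the assumption that the login form is the only place j_username appears as a name attribute; a real parser, as used above, is more robust against unusual markup. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class LoginCheck
{
    // True when the HTML still contains an element named j_username,
    // which suggests the login form was returned again.
    public static bool LooksLikeLoginPage(string html)
    {
        return Regex.IsMatch(
            html,
            @"name\s*=\s*[""']?j_username\b",
            RegexOptions.IgnoreCase);
    }
}
```

With this helper, the login would be treated as successful when LooksLikeLoginPage returns false for the body of the login response.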

Submit new request with authorized session

During the whole process of login and redirections, make sure that you keep the cookie container around so that it keeps collecting all the cookies. You will need this cookie container to send the request that submits your links. Now you just need to send a new POST request to the target URL with the appropriate FORM parameters like title, url and description.

Dealing with ASP.Net sites

When you are simulating a login process or any other postback process, there are some additional details that you have to worry about. Read my post "How to login into ASP.Net application programatically" for those details.

Sample Project

I have used HTMLParser.Net Pro to parse the page response. You can use whatever tool you like for parsing pages. Download the following file, which implements a sample console application containing the code snippets shown above.

Download Sample Project
