Thursday, March 3, 2011

C# - WebRequest Doesn't Return Different Pages

Here's the purpose of my console program: Make a web request > Save results from web request > Use QueryString to get next page from web request > Save those results > Use QueryString to get next page from web request, etc.

So here's some pseudocode for how I set the code up.

    for (int i = 0; i < 3; i++)
    {
        strPageNo = Convert.ToString(i);

        //creates the url I want, with incrementing pages
        strURL = "http://www.website.com/results.aspx?page=" + strPageNo;

        //makes the web request
        wrGETURL = WebRequest.Create(strURL);

        //gets the web page for me
        objStream = wrGETURL.GetResponse().GetResponseStream();

        //for reading web page
        objReader = new StreamReader(objStream);

        //--------
        // -snip- code that saves it to file, etc.
        //--------

        objStream.Close();
        objReader.Close();

        //so the server doesn't get hammered
        System.Threading.Thread.Sleep(1000);
    }

Pretty simple, right? The problem is, even though it increments the page number to get a different web page, I'm getting the exact same results page each time the loop runs.

i IS incrementing correctly, and I can cut/paste the url strURL creates into a web browser and it works just fine.

I can manually type in &page=1, &page=2, &page=3, and it'll return the correct pages. Somehow putting the increment in there screws it up.

Does it have anything to do with sessions, or what? I make sure I close both the stream and the reader before it loops again...

From stackoverflow
  • That URL doesn't quite make sense to me unless you are using MVC or something that can interpret the querystring correctly.

    http://www.website.com/results.aspx&page=
    

    should be:

    http://www.website.com/results.aspx?page=
    

    Some browsers will accept poorly formed URLs and render them fine. Others may not, and your console app may be rejecting the URL in the same way.

    Matt S : I fudged it for the pseudocode. It's correct in my program. What else could be the problem?
    JB King : What user agent does the server see from the console program? Maybe the server handles a request with no user agent differently from a request sent by a specific browser. There is also the question of how well the program resolves the DNS for the web request; that may be something....
  • Have you tried creating a new WebRequest object on each pass through the loop? It could be that the Create() method isn't adequately flushing out all of its old data.

    Another thing to check is that the ResponseStream is adequately flushed out before the next loop iteration.

    Matt S : I can't make a new WebRequest object because I get the "it's a 'method' but you're using it like a 'type'" error. I put objStream.Flush() method at the end of the loop, but with no success :(
  • Just a suggestion, try disposing the Stream, and the Reader. I've seen some weird cases where not disposing objects like these and using them in loops can yield some wacky results....

    Matt S : I added objStream/objReader.Dispose() to the end of the loop with no luck :( I even took Dillie-O's advice and put objStream.Flush() at the end, but it didn't help either...
  • This code works fine for me:

    var urls = new [] { "http://www.google.com", "http://www.yahoo.com", "http://www.live.com" };
    
    foreach (var url in urls)
    {
        WebRequest request = WebRequest.Create(url);
        using (Stream responseStream = request.GetResponse().GetResponseStream())
        using (Stream outputStream = new FileStream("file" + DateTime.Now.Ticks.ToString(), FileMode.Create, FileAccess.Write, FileShare.None))
        {
            const int chunkSize = 1024;
            byte[] buffer = new byte[chunkSize];
            int bytesRead;
            while ((bytesRead = responseStream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // write only the bytes actually read in this chunk
                outputStream.Write(buffer, 0, bytesRead);
            }
        }
        Thread.Sleep(1000);
    }
    
  • Here's my terrible, hack-ish, workaround solution:

    Make another console app that calls THIS one, passing an argument that gets appended to the end of strURL. It works, but I feel so dirty.

    Matt Dawdy : Bad. Find this line in your code (it isn't shown): WebRequest wrGETURL; and put that line INSIDE your loop. All will be right with the world now. By the way, this is what Dillie-O was saying.
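
For reference, here is a minimal sketch that combines the suggestions from the answers above: a request variable declared inside the per-page code path, an explicit user agent, and using blocks that dispose the response, stream, and reader on every pass. The thread never confirms which of these (if any) fixed the original problem, so treat it as one possible arrangement, not the accepted solution. The names PageFetcher, BuildPageUrl, and FetchPage, the user-agent string, and the output file names are all made up for illustration. Note that WebRequest.Create is a static factory method that already returns a new request object, which is why writing "new" in front of it produces the "'method' but used like a 'type'" error. On .NET 6 and later the compiler flags WebRequest as obsolete in favor of HttpClient, but it still works.

```csharp
using System;
using System.IO;
using System.Net;
using System.Threading;

static class PageFetcher
{
    // Hypothetical helper: appends the page number as a query string,
    // e.g. ".../results.aspx" + 2 gives ".../results.aspx?page=2".
    public static string BuildPageUrl(string baseUrl, int page)
    {
        return baseUrl + "?page=" + page;
    }

    // Hypothetical helper: downloads one page and returns its body as text.
    public static string FetchPage(string url)
    {
        // The request is a local variable, so every call starts from a
        // fresh object. Create() is a factory method: no "new" keyword.
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);

        // Assumed value; some servers respond differently when this is empty.
        request.UserAgent = "PageFetcher/1.0";

        // The using blocks dispose reader, stream, and response in that
        // order, even if reading throws.
        using (WebResponse response = request.GetResponse())
        using (Stream stream = response.GetResponseStream())
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Placeholder address taken from the question.
        const string baseUrl = "http://www.website.com/results.aspx";

        for (int i = 0; i < 3; i++)
        {
            string url = BuildPageUrl(baseUrl, i);
            string html = FetchPage(url);

            // Assumed file naming: one output file per page.
            File.WriteAllText("page" + i + ".html", html);

            // Pause so the server doesn't get hammered.
            Thread.Sleep(1000);
        }
    }
}
```

Keeping the download in its own method means nothing from one page (request, response, stream, or reader) can leak into the next iteration, which makes it easier to rule out stale state when every page comes back identical.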
