Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL

Category: Computer Science
Author: Michael Schrenk

Comments

by anonymous   2019-07-21

You may want to check the following books:

"Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL" http://www.amazon.com/Webbots-Spiders-Screen-Scrapers-Developing/dp/1593271204

"HTTP Programming Recipes for C# Bots" http://www.amazon.com/HTTP-Programming-Recipes-C-Bots/dp/0977320677

"HTTP Programming Recipes for Java Bots" http://www.amazon.com/HTTP-Programming-Recipes-Java-Bots/dp/0977320669

by Bill the Lizard   2017-08-20

Not a tutorial, but I can recommend the book Webbots, Spiders, and Screen Scrapers.

by crono   2017-08-20

There is a book on this topic, "Webbots, Spiders, and Screen Scrapers: A Guide to Developing Internet Agents with PHP/CURL"; see a review here.

PHP-Architect covered it in a well-written article by Matthew Turland in the December 2007 issue.

by anonymous   2017-08-20

You can create a script that scrapes the content of that link. The problem is that you have to update the script every time the website's markup changes.

As the form doesn't have a captcha or any other mechanism to prevent automated queries, you can set up something simple.

You can make the POST request using cURL:

//set POST variables
$url = 'http://220.225.242.179/locm.asp';
$fields = array(
    'mrn' => "406691",
);

//url-encode the data for the POST body
$fields_string = http_build_query($fields);

//open connection
$ch = curl_init();

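//set the url, the POST method, and the POST data
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $fields_string);
//return the response as a string instead of printing it directly
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

//execute post; $result holds the response body, or false on failure
$result = curl_exec($ch);
if ($result === false) {
    echo 'cURL error: ' . curl_error($ch);
}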

//close connection
curl_close($ch);
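
Once you have the response HTML as a string, you still need to pull the values out of it. Below is a minimal sketch of that step using PHP's bundled DOM extension. The sample markup and the extract_rows() helper are hypothetical: the actual structure of the target page is not known here, so the XPath queries would need adjusting to match the real page.

```php
<?php
// Hypothetical sample response; the real page's markup is an assumption.
$html = <<<HTML
<html><body>
<table>
  <tr><td>Name</td><td>Example Member</td></tr>
  <tr><td>Status</td><td>Active</td></tr>
</table>
</body></html>
HTML;

/**
 * Hypothetical helper: collect two-column table rows from an HTML
 * string as label => value pairs.
 */
function extract_rows(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so sloppy real-world HTML
    // does not flood the output, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $rows = array();
    foreach ($xpath->query('//tr') as $tr) {
        // Query the <td> cells relative to the current row.
        $cells = $xpath->query('td', $tr);
        if ($cells->length >= 2) {
            $label = trim($cells->item(0)->textContent);
            $rows[$label] = trim($cells->item(1)->textContent);
        }
    }
    return $rows;
}

print_r(extract_rows($html));
```

In a real script you would pass the string returned by curl_exec() (with CURLOPT_RETURNTRANSFER enabled) to the helper instead of the sample. This parsing step is the part that has to be revisited whenever the site changes its layout.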

Take a look at the following links:

https://github.com/fabpot/goutte

http://www.jacobward.co.uk/web-scraping-with-php-curl-part-1/

http://www.amazon.com/dp/1593271204/?tag=stackoverfl08-20