
Script to create a PHP site crawl robot easily

Posted on: August 30th, 2012 by Mohammed Aqeel 4 Comments

The following tutorial is a simple example of how to create a site crawl robot in PHP. It can be used for search engines, for building caches of your sites at regular intervals, and so on. The crawler uses PHP's cURL library to fetch the URL contents, then looks for anchor tags inside the page and crawls those pages as well. Here is the script.
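The original script is not reproduced in this excerpt, so the following is only a minimal sketch of the approach described above, assuming PHP's cURL and DOM extensions are loaded. All function names here are illustrative, not taken from the original script. It also restricts the crawl to the starting host, which keeps the crawler from wandering off to external social links:

```php
<?php
// Bail out early if the cURL extension is not loaded (this is what
// extension_loaded() checks: whether a given PHP extension is available).
if (!extension_loaded('curl')) {
    exit("The cURL extension is required.\n");
}

// Fetch a URL's body with cURL; returns null on failure.
function fetchPage(string $url): ?string {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? null : $html;
}

// Pull href values out of anchor tags, turning root-relative links
// ("/about") into absolute ones so they can be fetched.
function extractLinks(string $html, string $baseUrl): array {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);              // silence warnings on messy HTML
    $base  = parse_url($baseUrl);
    $root  = $base['scheme'] . '://' . $base['host'];
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '' || $href[0] === '#') {
            continue;                     // skip empty and fragment links
        }
        if (strpos($href, 'http') === 0) {
            $links[] = $href;             // already absolute
        } elseif ($href[0] === '/') {
            $links[] = $root . $href;     // resolve root-relative link
        }
    }
    return array_values(array_unique($links));
}

// Recursive crawl with a depth limit, a visited set to avoid loops, and a
// same-host check so external domains are never followed.
function crawl(string $url, int $depth, array &$visited = []): void {
    if ($depth < 0 || isset($visited[$url])) {
        return;
    }
    $visited[$url] = true;

    $html = fetchPage($url);
    if ($html === null) {
        return;
    }
    echo "Crawled: $url\n";

    $host = parse_url($url, PHP_URL_HOST);
    foreach (extractLinks($html, $url) as $link) {
        if (parse_url($link, PHP_URL_HOST) === $host) {
            crawl($link, $depth - 1, $visited);
        }
    }
}

// Example usage:
// crawl('https://example.com/', 2);
```

The same-host comparison in `crawl()` is the piece that limits the crawl to one domain; without it, a large depth value would follow every outbound link on every page.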

If you have an easier or better way of doing it, don't forget to post it in the comments section below.


4 Responses

  1. iMi says:

    It fails if a site uses absolute links.

  2. jules says:

    What does the extension_loaded('http') function do?

  3. George says:

    Trying to make this script work so I can crawl my website for caching purposes. It seems to work OK.
    What I wanted to ask you is whether it's possible to limit the crawled URLs to the same domain.
    If I give more depth, say 10 or 15, then it goes to my social links (FB, G+, Twitter, etc.) and I get a 500 server error.
    I don't know programming. What's the line I need to change?