Robots, also called spiders, are programs written by people that visit and crawl your web pages. There are a great many robots on the web, and it is sometimes hard to figure out which of them are good and which are spammers. Surely nobody wants their web pages crawled by a spam robot.
What exactly are spam robots?
You must have heard about ‘bad robots’. Bad robots, or spam robots, are programs that overload networks and servers. At present, there is no international standard set of rules for writing a robot. Because robots are written by people, without any agreed set of rules and guidelines, they are prone to configuration mistakes.
How can I keep spam robots out of my website?
You can restrict some or all of the robots that you do not want visiting your website. To do so, place rules in a plain-text file named robots.txt at the root of your site. The rules to restrict all robots are:
User-agent: *
Disallow: /
If you want to block only specific search engine robots, replace the asterisk with the name of the specific agent (the robot’s user-agent name).
If you want to stop search engine robots from crawling a specific page, replace the slash (/) with the path of that page. Alternatively, you can keep all the files that you don’t want a robot to scan in one directory and disallow that directory as follows:
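As a rough illustration of naming a specific agent, the sketch below uses Python's standard urllib.robotparser module to read a small rule set and ask how two robots would be treated. The agent names BadBot and GoodBot and the path /index.html are made up for this example and do not refer to any real robot.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one named robot, allow everyone else.
# "BadBot" is an invented agent name used only for illustration.
rules = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Ask whether each robot may fetch a page under these rules.
badbot_allowed = parser.can_fetch("BadBot", "/index.html")
goodbot_allowed = parser.can_fetch("GoodBot", "/index.html")

print("BadBot allowed:", badbot_allowed)
print("GoodBot allowed:", goodbot_allowed)
```

An empty Disallow line means that nothing is disallowed for the agents it applies to, so every robot other than the named one is still welcome.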
User-Agent: *
Disallow: /norobots/
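To see how a directory rule like this one is interpreted, here is a minimal sketch using Python's standard urllib.robotparser module. The robot name ExampleBot and the file names are assumptions chosen for the example; only the /norobots/ directory comes from the rules above.

```python
from urllib.robotparser import RobotFileParser

# The directory rule from the text, applied to every robot.
rules = """\
User-Agent: *
Disallow: /norobots/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# "ExampleBot" and both paths are hypothetical, for illustration only.
results = {
    path: parser.can_fetch("ExampleBot", path)
    for path in ("/index.html", "/norobots/private.html")
}

for path, allowed in results.items():
    print(path, "->", "allowed" if allowed else "disallowed")
```

Anything whose path starts with /norobots/ is reported as disallowed, while the rest of the site stays open to the robot.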
You can also specify directly in a page, through a META tag, whether robots should index that page or follow its links. If you include a tag like:
<META NAME="ROBOTS" CONTENT="NOINDEX">
in your HTML document, that document won't be indexed.
If you do:
<META NAME="ROBOTS" CONTENT="NOFOLLOW">
the links in that document will not be followed by the robot.
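For a sense of how a robot might read these tags, the following sketch uses Python's standard html.parser module to collect the ROBOTS directives from a page. The class name RobotsMetaParser, the helper robots_directives, and the sample page are all invented for this example. The sample combines both values in one tag, separated by a comma, which robots commonly accept. Real crawlers may handle the tag differently.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # Directives are comma-separated and case-insensitive.
        for token in content.split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page used only to exercise the parser.
page = (
    "<html><head>"
    '<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'
    "</head><body></body></html>"
)

directives = robots_directives(page)
print("index this page:", "noindex" not in directives)
print("follow its links:", "nofollow" not in directives)
```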
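There are no hard and fast rules about creating a robots file, and a site can do perfectly well without one. It is a good approach, however, when there is a section of your website that you do not want robots to crawl. Bear in mind that the file is only a request: well-behaved robots honor it, but a robot is free to ignore it, and the file itself can be read by anyone.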
Monday, October 22, 2007