Ten Spider Business Blog


Bot Watch - Recently-Detected Web Spiders & Bots

voyager/1.0 -- This bot provides no identification. It has been visiting our websites for some time now, but we could find little info about it other than the fact that it appears in the access logs of numerous other websites. There has been a great deal of uncertainty relating to this bot because there are a variety of web applications claiming title to "Voyager 1.0", including an internet adventure game. We have been blocking this bot on the assumption that it was malicious.



Recently, primarily due to the persistence of the bot despite our blocking, I decided to reinvestigate. Voilà! It is now identified. Voyager belongs to Kosmix Corporation (the company seems not to have learned how to spell its own name, as the copyright notice lists it as "Cosmix Corporation"); voyager/1.0 is the Kosmix web crawler. The company now provides an official web crawler page where you can read about Voyager, so it now passes our test for one of the rules of web crawler etiquette. The bot still does not show the URL of that web crawler page or an email address when it visits; I had to hunt for that info. I have fired off an email to the company suggesting that they add this information, but have received no reply as of this writing.

voyager/1.0 was formerly cfetch/1.0, another mystery bot. The company claims that its bot follows the Robots Exclusion Protocol. We have not verified this, but have noticed no attempt at excessive pageloads. voyager/1.0 seems to visit only a few pages at a time. The company further claims that Voyager follows the robots.txt "Crawl-delay" directive -- which Googlebot still does not recognize.
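For readers who want to see what a "Crawl-delay" rule looks like from the crawler's side, here is a minimal sketch using Python's standard-library robots.txt parser. The robots.txt content, the "voyager" user-agent token, and the paths are assumptions made up for illustration; they are not taken from Kosmix's documentation or from our own robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The "voyager" token, the delay value, and the
# paths are illustrative assumptions, not published values.
ROBOTS_TXT = """\
User-agent: voyager
Crawl-delay: 10
Disallow: /private/

User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Seconds a compliant crawler should wait between requests, or None
# if no Crawl-delay applies to this user agent.
delay = parser.crawl_delay("voyager/1.0")

# Whether a compliant crawler may fetch each path.
allowed_public = parser.can_fetch("voyager/1.0", "/blog/index.html")
allowed_private = parser.can_fetch("voyager/1.0", "/private/report.html")

print(delay, allowed_public, allowed_private)
```

A well-behaved bot would run a check like this before each request and sleep for the stated delay between pageloads. Whether a given bot actually does so can only be confirmed from the access logs.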

We have observed Voyager on our websites using IP addresses beginning with "38.112.". The company states that the bot's IP will vary over time, so screening by IP may prove futile. As the company appears legitimate, though young, we are now allowing "Voyager" access to our website.
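For webmasters who still want to flag visits from the range we observed, a rough sketch follows. It assumes that "beginning with 38.112." can be read as the 38.112.0.0/16 block, which is our interpretation and not something the company has published. The function name is hypothetical.

```python
import ipaddress

# Assumption: addresses "beginning with 38.112." are treated as the
# 38.112.0.0/16 network. The crawler's real range may differ or change.
OBSERVED_RANGE = ipaddress.ip_network("38.112.0.0/16")


def in_observed_range(addr: str) -> bool:
    """Return True if addr falls inside the range seen in our logs."""
    return ipaddress.ip_address(addr) in OBSERVED_RANGE


print(in_observed_range("38.112.5.9"))   # inside the assumed block
print(in_observed_range("203.0.113.7"))  # documentation address, outside
```

Because the company says the bot's IP will vary, a check like this is better suited to labeling log entries than to blocking or allowing traffic.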

As more search engine companies emerge to stake a claim on the internet, we sincerely hope that they will learn to employ proper web crawling etiquette: provide a crawler explanation page; insert the URL of this page into their bot's User Agent string; obey robots.txt directives and the Robots Exclusion Protocol; and do not perform pageloads at an excessive rate so as to overburden hosting servers. Perhaps if these lessons are learned, minor search engines will not find their bots being constantly banned by webmasters.
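The second rule above, putting a crawler page URL (or at least an email address) into the User Agent string, is easy to check mechanically. The sketch below is one rough way to do it. The regular expressions are deliberately simple, the function name is hypothetical, and "ExampleBot" is an invented user agent used only for contrast.

```python
import re

# Simple patterns for illustration; they are not full URL or email validators.
URL_RE = re.compile(r"https?://[^\s;)]+", re.IGNORECASE)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def contact_info(user_agent: str) -> dict:
    """Extract the first URL and email address found in a User Agent string."""
    url = URL_RE.search(user_agent)
    email = EMAIL_RE.search(user_agent)
    return {
        "url": url.group(0) if url else None,
        "email": email.group(0) if email else None,
    }


# The bare string our logs showed: nothing to follow up on.
print(contact_info("voyager/1.0"))

# A hypothetical bot that follows the etiquette rule.
print(contact_info("ExampleBot/2.0 (+http://www.example.com/bot.html; bot@example.com)"))
```

A bot whose User Agent yields neither a URL nor an email leaves the webmaster to hunt for the owner, which is how crawlers end up banned on suspicion alone.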


Posted by: The Spidermaster on Apr 05, 06 | 3:00 pm


Ten Spider™ and tenspider™ are trademarks of Ten Spider Enterprises, LLC, and are protected by United States and international trademark laws.