spiders


ukpl1
06-04-2005, 11:28 PM
Sorry if this is not the right place for the question. I'm new to all of this and need to know a few things about spiders.

I was reading a thread here and someone said that the spider stopped after only 4 pages. How would you get to know this info? How do you know if you have been visited by a spider? And last of all, what is PR1, PR2, etc.?

And how did this happen: I am building a site, it's online just to test it, and the domain is new, so how did I find my site in the MSN search engine? First, it's not ready for that, and the text they use is being changed. Is there a way I can make MSN remove the text, or is it going to be like that all the time? Thanks

John

Mikkel deMib Svendsen
06-05-2005, 04:40 AM
Welcome to the forums, ukpl1.

You will usually get the best feedback and answers here if you focus each post on just one or two related questions. It seems that you have many, but let's take a look at the most important ones for now:

someone said that the spider stopped after only 4 pages?

You must have got that wrong. Spiders visit as many pages from every website as they can, and that is often a lot more than 4 pages. Sometimes they crawl millions of pages from one website.


how do you know if you have been visited by a spider

You can either see it (later) if you turn up in the engine's search results, or you can check your server log files and look for the user-agent names of the spiders you are looking for.
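As a rough illustration of the log file check, here is a minimal Python sketch. It assumes your server writes the common "combined" access log format, and the spider names it looks for (Googlebot, msnbot, Slurp) are just examples of user-agent tokens; the function name and the sample log lines are made up for the illustration. It lists which pages each spider requested, which is also how you could tell whether a spider stopped after only a few pages.

```python
import re
from collections import defaultdict

# Example user-agent tokens to look for (an assumption; adjust to the
# spiders you care about).
SPIDER_TOKENS = ("Googlebot", "msnbot", "Slurp")

# Matches the request, status, size, referer and user-agent fields of a
# "combined" format access log line.
LOG_PATTERN = re.compile(
    r'"(?:[A-Z]+) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def pages_seen_by_spiders(lines, tokens=SPIDER_TOKENS):
    """Return {spider token: set of requested paths} for the given log lines."""
    pages = defaultdict(set)
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # not a combined-format line, skip it
        agent = match.group("agent").lower()
        for token in tokens:
            if token.lower() in agent:
                pages[token].add(match.group("path"))
    return dict(pages)


if __name__ == "__main__":
    # Invented sample lines, only to show the expected input shape.
    sample = [
        '192.0.2.10 - - [05/Jun/2005:04:40:00 +0000] "GET /index.html HTTP/1.1" '
        '200 1234 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
        '192.0.2.20 - - [05/Jun/2005:04:41:00 +0000] "GET /index.html HTTP/1.1" '
        '200 1234 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    ]
    for spider, paths in pages_seen_by_spiders(sample).items():
        print(spider, len(paths), sorted(paths))
```

In practice you would pass the open log file instead of the sample list, for example `pages_seen_by_spiders(open("access.log"))`, where the file name depends on your server setup.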


And how did this happen: I am building a site, it's online just to test it, and the domain is new, so how did I find my site in the MSN search engine? First, it's not ready for that, and the text they use is being changed. Is there a way I can make MSN remove the text, or is it going to be like that all the time? Thanks

Search engines index what they can find on the web, so if you publish something that is open to everyone, they will spider it. They don't ask for permission first. So if you have a "test" website or pages you don't want them to crawl or index, you can either stop them with access control on your server or set up a robots.txt file to tell them to stay out. You can read more about how to implement robots.txt at www.robotstxt.org
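To make the robots.txt option a bit more concrete, here is a small sketch that uses Python's standard urllib.robotparser module to check what a given robots.txt would tell a spider. The two-line file below, which asks all spiders to stay out of the whole site, is one common way to keep a test site from being crawled; the helper name and the example.com URL are only placeholders for the illustration. Keep in mind that robots.txt is a request that well-behaved spiders honour, not access control.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks every spider to stay out of the entire site.
# It would be served from the root of the site, e.g. /robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL; the spider names are just examples.
    for agent in ("msnbot", "Googlebot"):
        allowed = is_allowed(ROBOTS_TXT, agent, "http://www.example.com/index.html")
        print(agent, "allowed" if allowed else "blocked")
```

If you only want to keep spiders out of part of the site, a line such as `Disallow: /test/` (a hypothetical directory name) limits the rule to that path, and the same helper can be used to check it.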