Re: Do 403, 404, and 410 get different results?
Hi, Shawn ...
Sorry for the very belated reply to your posting. In the flurry of
other things going on, I must have clean missed it. No discourtesy
intended. You said:
>Or, do what I do - use serverside limits to prevent spiders explicitly
>from accessing those sections of your site you forbid.
Yes, that's exactly what we do. Once we closely observed how these
critters actually behave, it became clear that anything less than
outright fascist control of our resources would end badly.
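For the record, the sort of server-side limit I mean looks roughly like
the sketch below. This is Apache 2.2-style configuration; the "BadBot"
User-Agent string and the /var/www/private path are placeholders, not
anything from our actual setup. A matching request simply gets a 403,
which ties back to the subject line of this thread.

# Tag any request whose User-Agent contains the (placeholder) bot name
SetEnvIfNoCase User-Agent "BadBot" deny_bot

# Refuse tagged requests for the restricted area with a 403 Forbidden
<Directory "/var/www/private">
    Order Allow,Deny
    Allow from all
    Deny from env=deny_bot
</Directory>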
Providing an "Allow" field does not contradict the purpose of an
"exclusion" standard; it simply offers more economical syntax for some
situations. For example, if I want to exclude access to all
areas of a site apart from one directory, the "Allow" field makes this
easy:
Disallow: /
Allow: /ok_to_index_this.html
... rather than having to list *every* individual file and subdirectory
which bots are *not* to visit. If that got expressed as a new standard
("robots.xml", perhaps, as you suggest), that would be fine also.
Actually, though, I continue to believe sitemaps would be exactly the
right and complete answer if treated as gospel when present, rendering
the robots.txt file moot. Since standards tend to be defined by market
share rather than by committee, there is always hope that Google will
sooner or later figure out that no fancy search algorithm is required
if the webmaster's preferences are followed to the letter rather than
being second-guessed and worked around at every turn.
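For concreteness, the sitemap I would want treated as gospel is just
the standard sitemaps.org XML; a minimal sketch, with a placeholder
URL:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <!-- Under the "gospel" reading above, a crawler would fetch exactly
       these URLs and nothing else on the site. -->
  <url>
    <loc>http://www.example.com/ok_to_index_this.html</loc>
  </url>
</urlset>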
While the present "game" may offer an interesting intellectual
challenge for bot programmers, it's pure administrative flap-doodle and
wasted time for webmasters. Resistance is not futile, but it is *very*
expensive.
bud