Re: [tor-talk] Hidden Services

> In almost all cases (99% or higher), robots.txt is used to indicate
> that a site shouldn't be crawled, *because* they don't want it
> to be indexed. The intention is painfully clear...

Not really. Maybe they couldn't care less about the index, but don't
want crawlers burning through all their bandwidth, blowing out
their logs, skewing their stats, etc.

> And they should all move to places where they won't be killed for
> ...
> But that doesn't make it right to act in a way that can be expected
> to harm people when you know better and can avoid it.

People are going to publish links whether or not the site wishes
to be crawled, indexed, found, outed, whatever. Phone books exist,
deal with it.

Those who wish to find, kill, or play you don't care about robots,
noindex, passwords, laws, or wishes, and won't think twice about
infiltrating your secret online or real life networks or anything else.

Whether or not Tor2Web publishes its domain/url list is moot because
somewhere, somehow, someone already collated and published it. Not
least among them are those bad political oppressors who already tapped
Tor2Web's clearnet and whatever else to get it.

Re: the original topic... Tor2Web would obtain its domains/urls by
being a *proxy*, not by crawling. Robots/noindex is not part of that.
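
To make the crawler/proxy distinction concrete, here is a rough sketch
using Python's standard urllib.robotparser. The robots.txt body, the
bot name, the onion hostname, and both helper functions are made up
for illustration; this is not how Tor2Web is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """What a well-behaved crawler checks before requesting a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def proxy_note_request(seen_hosts: set, host: str) -> None:
    """What a proxy learns: the host of each request a user sends
    through it. No robots.txt lookup happens on this path."""
    seen_hosts.add(host)


# Made-up address, for illustration only.
host = "exampleonionaddress.onion"
url = "http://" + host + "/page"

# The crawler is told to stay away...
allowed = crawler_may_fetch(ROBOTS_TXT, "ExampleBot", url)

# ...but the proxy still learns the hostname from user traffic.
seen_hosts = set()
proxy_note_request(seen_hosts, host)

print("crawler allowed:", allowed)
print("hosts seen by proxy:", sorted(seen_hosts))
```

The robots check only matters to software that chooses to ask it
before fetching. A proxy relaying user requests never asks.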

> rendezvous collection

Operations involving the dirservs have been deprecated, it's now

As to being any specific relay [the RP?], not sure. But if so, the
domain view there is going to be narrow and slow going. Someone
who has read that part of the design could answer...

> Hopefully some kind of NG onion would include addition data in

tor-talk mailing list