
Re: [tor-dev] GSoC - Search Engine for Hidden services



Replying to some new additions in the proposal:

> Thanks asn!
>
> "Ask help from organizations that are crawling"
> Today I emailed to duckduckgo and asked is there an easy way to search
> new .onions using their search engine.
>
> "Checking out the backlinks from public WWW"
> With known onion address it is possible to find the popularity of an
> address checking the number of search results:
> https://duckduckgo.com/?q=%22http%3A%2F%2Fjlve2y45zacpbz6s.onion%22
> and https://www.google.com/#q=%22http:%2F%2Fjlve2y45zacpbz6s.onion%22
> and https://www.google.com/#q=link:http:%2F%2Fjlve2y45zacpbz6s.onion
> This way I will get a list that tells the popularity according to
> links from the public WWW: onion address & number of WWW sites that
> are linking to it
>
>   xyz.onion 123
>   abc.onion 90
>   uio.onion 24
>   mre.onion 17
>
> Today I asked from the YaCy's developer how could I use this
> information.
>
> "Commenting features"
> I agree that commenting might be a mouth of madness because people
> might write just some random crap there. Technically this would be
> developed to the Django framework. Note that the priority of this
> task is low (10). We could decide to leave this commenting feature to
> the very last task or skip it.

ACK wrt commenting.

As far as backlinks are concerned, while I appreciate how rapid and
easy your solution is, you might want to make it a bit more robust.

As proposed, the ranking treats the 123 references to 'xyz.onion' as
strictly better than the 90 references to 'abc.onion'. That does not
hold on the real web: the 123 references to 'xyz.onion' might be the
result of SEO, and they might come from xyz.onion itself or from
related websites.

Proper search engines assign a weight to each backlink, according to
how legit the search engine believes the linker to be. This depends on
how many backlinks the linker itself has, how legit the linker's HTML
content looks, etc. You can find more of the heuristics that search
engines use by skimming an SEO book or an SEO forum.
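
To make the weighting idea concrete, here is a rough sketch in Python.
It is only an illustration under my own assumptions, not how any real
search engine does it: the function name, the input shapes, and the
choice of log(1 + n) as the weight (n being the linker's own backlink
count) are all made up for this example. It skips links that come from
the onion itself or from hosts you have marked as related, and it
counts each linking host once per target.

```python
import math
from collections import defaultdict
from urllib.parse import urlsplit


def host(url):
    """Return the lowercased hostname of a URL, or '' if there is none."""
    return (urlsplit(url).hostname or "").lower()


def weighted_backlink_scores(links, linker_backlinks, related=None):
    """Score onion hosts by weighted backlinks (hypothetical helper).

    links: iterable of (linker_url, onion_host) pairs.
    linker_backlinks: dict mapping a linker host to how many backlinks
        that linker itself has (assumed to be collected elsewhere).
    related: optional dict mapping an onion host to a set of hosts that
        should not count as independent linkers.
    """
    related = related or {}
    seen = set()
    scores = defaultdict(float)
    for linker_url, target in links:
        linker = host(linker_url)
        target = target.lower()
        # Ignore self-links and links from related sites.
        if not linker or linker == target:
            continue
        if linker in related.get(target, set()):
            continue
        # Count each linking host only once per target.
        if (linker, target) in seen:
            continue
        seen.add((linker, target))
        # Assumed weight: a linker with no known backlinks adds nothing.
        scores[target] += math.log1p(linker_backlinks.get(linker, 0))
    return dict(scores)
```

With this kind of scoring, two links from well-linked sites can
outrank many links from pages that nobody links to. Judging how legit
the linker's HTML looks would be a separate factor on top of this.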

It's up to you how deep you want to go into backlinking during GSoC,
but IMO backlinking is a more reliable heuristic than popularity
tracking. Up to you anyway!

_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev