
Re: [tor-dev] GSoC - Search Engine for Hidden services



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 24.03.2014 13:57, George Kadianakis wrote:
> Replying to some new additions in the proposal:
> 
>> Thanks asn!
>>
>> "Ask help from organizations that are crawling"
>> Today I emailed DuckDuckGo and asked whether there is an easy way
>> to search for new .onions using their search engine.
>>
>> "Checking out the backlinks from public WWW"
>> With a known onion address, it is possible to estimate the
>> popularity of the address by checking the number of search results:
>> https://duckduckgo.com/?q=%22http%3A%2F%2Fjlve2y45zacpbz6s.onion%22
>> and
>> https://www.google.com/#q=%22http:%2F%2Fjlve2y45zacpbz6s.onion%22
>> and
>> https://www.google.com/#q=link:http:%2F%2Fjlve2y45zacpbz6s.onion
>>
>> This way I will get a list that ranks popularity according to
>> links from the public WWW:
>>
>> onion address & number of WWW sites that are linking to it
>> xyz.onion 123
>> abc.onion 90
>> uio.onion 24
>> mre.onion 17
>>
>> Today I asked YaCy's developer how I could use this information.
>>
>> "Commenting features"
>> I agree that commenting might be a mouth of madness because people
>> might write just some random crap there. Technically this would be
>> developed with the Django framework. Note that the priority of
>> this task is low (10). We could decide to leave this commenting
>> feature as the very last task or skip it.
> 
> ACK wrt commenting.
> 
> As far as backlinks are concerned, while I appreciate how rapid
> and easy your solution is, you might want to make it a bit more
> robust.
> 
> The way you did it, you treat the 123 references to 'xyz.onion'
> as strictly better than the 90 references to 'abc.onion'. This is
> not the case in the real web, since the 123 references to
> 'xyz.onion' might be SEO, and they might be coming from xyz.onion
> itself or from related websites.
> 
> Proper search engines assign a weight to each backlink, according
> to how legit the search engine believes the linker to be. This
> depends on how many backlinks the linker itself has, how legit the
> linker's HTML content looks, etc. You can find more heuristics
> that search engines use by skimming an SEO book or an SEO forum.
> 
> It's up to you how deep you want to go into backlinking during
> GSoC, but IMO backlinking is a more reliable heuristic than
> popularity tracking. Up to you anyway!
> 
> _______________________________________________
> tor-dev mailing list
> tor-dev@xxxxxxxxxxxxxxxxxxxx
> https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
> 

We could test the reliability of the linkers too. As you said, there
are multiple methods to do this. Because the number of .onions and of
sites linking to them is relatively small, we can analyze each linking
site as well. Usually fewer than 10 sites link to a given .onion site.
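
A rough sketch of what weighting the linkers could look like, instead of
just counting search results. This is only an illustration under
assumptions: the function names, the weight values, the default weight
and the *.example hosts are all made up, and how the weights would
really be derived (backlinks of the linker, its HTML content, etc.) is
still open. Each linking host is counted once, and links coming from
the onion itself are ignored.

```python
from urllib.parse import urlparse


def linker_host(url):
    """Return the lowercased host name of a linking page's URL."""
    return (urlparse(url).hostname or "").lower()


def backlink_score(onion, linker_urls, linker_weight, default_weight=0.1):
    """Sum one weight per distinct linking host, skipping self-links.

    linker_weight maps a host name to a hypothetical reliability weight;
    hosts that have not been analyzed yet get default_weight.
    """
    hosts = {linker_host(u) for u in linker_urls}
    hosts.discard("")  # URLs without a parsable host
    # Links from the onion itself (or its subdomains) say nothing
    # about how popular it is elsewhere.
    hosts = {h for h in hosts if h != onion and not h.endswith("." + onion)}
    return sum(linker_weight.get(h, default_weight) for h in hosts)


# Made-up example data: xyz.onion has more raw links, but they come
# from itself and from one low-weight host.
backlinks = {
    "xyz.onion": [
        "http://xyz.onion/a",
        "http://xyz.onion/b",
        "http://spam.example/1",
        "http://spam.example/2",
    ],
    "abc.onion": [
        "https://wiki.example/onions",
        "https://blog.example/post",
    ],
}
weights = {"wiki.example": 1.0, "blog.example": 0.5, "spam.example": 0.1}

scores = {o: backlink_score(o, urls, weights) for o, urls in backlinks.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```

With this toy data abc.onion ranks above xyz.onion even though
xyz.onion has more raw links, which is the effect you described.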

- -Juha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTMmA4AAoJELGTs54GL8vAggAH/A/n6mVtrAxWNaJ4pvqevw+l
gIDpW69HDgP3431jEeH6n8WqN42AbfAxvqBb+cUtPSvUDV+ihopxK/aUs88mexjd
kLpsPbzT84idYRxNP1w/nt4r7uUjSTEEL/XBG0CEv5IAyzZIe+kzYm2ghIW7RRKp
BwIEyJcYLMDPnlAjZEkFJ2D06CghmUJYxNwywyIcrDLQi/4yhzE0bpxPg7axfo5h
yfjN3z6kogrDY0dHmQ6ljC7RawVc2TyfWDcIo/NghIjHQkon+JRY+s0s49c/Nng3
n8da1/UwCLXB5g/tW9NcOUNpvFhwSDIRimIHASMuw0s3OvQoU6KT43AtDGQg6Nw=
=wcJt
-----END PGP SIGNATURE-----