Re: [tor-dev] GSoC: Ahmia.fi - Search Engine for Hidden Services



-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 25.04.2014 17:27, George Kadianakis wrote:
> Juha Nurmi <juha.nurmi@xxxxxxxx> writes:
> 
>> On 22.04.2014 17:35, George Kadianakis wrote:
>>> Enjoy GSoC :)
>> 
>> I will :)
>> 
>>> BTW, looking again at your proposal, I see that you are going
>>> to do both popularity tracking and backlinks.
>> 
>> Yes, another crawler gathers backlinks from the public WWW and I
>> will start gathering the URL clicks from the users.
>> 
>>> How are these two technologies going to interact with each
>>> other? That is, how will the indexer consider the output of
>>> those two features?
>> 
>> The Django front-end re-sorts the results from the YaCy back-end.
>> 
>> See https://ahmia.fi/static/gsoc/re_sort.jpg
>> 
>> I have this idea in mind: https://ahmia.fi/static/gsoc/sorter.py
>> 
>> The results are sorted by a weighted sum of three scaled values: the
>> YaCy result index, the number of backlinks and the number of clicks.
>> 
>> Note the scaling:  p_info.backlinks = 1 / (float(index) + 1)
>> etc.
>> 
>> sum_function = 3.0*self.yacy + 2.0*self.backlinks +
>> 1.0*self.clicks
>> 
>> where 3, 2 and 1 are test coefficients. I will optimize these and
>> make a better model if necessary. However, clicks are easily
>> spoofed, so their coefficient has to be small.
>> 
> 
> That makes sense.
> 
> BTW, what is the 'yacy' score? Is it just the order that YaCy's 
> indexer chose for each result? Or does YaCy actually expose a
> score for each result? How is the score derived? Or do you treat it
> as a black box and assume it's more accurate than backlinks and
> popularity?
> 

I am using only the order information.
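
A minimal sketch of the re-sorting idea, not the actual sorter.py. The
YaCy score comes only from the result order, scaled as 1 / (index + 1).
It assumes that backlinks and clicks are scaled the same way, by their
rank among the returned results, and that URLs are unique. The Result
class, its fields and the function names are hypothetical; the
coefficients 3, 2 and 1 are the test values mentioned above.

```python
from dataclasses import dataclass

# Test coefficients from the thread; not tuned values.
W_YACY, W_BACKLINKS, W_CLICKS = 3.0, 2.0, 1.0


@dataclass
class Result:
    url: str
    backlinks: int  # raw backlink count (hypothetical field)
    clicks: int     # raw click count (hypothetical field)


def rank_scores(urls_in_order):
    """Map each URL to 1 / (index + 1): first gets 1.0, second 0.5, ..."""
    return {url: 1.0 / (float(index) + 1)
            for index, url in enumerate(urls_in_order)}


def re_sort(yacy_results):
    """Re-sort results, given in YaCy's order, by the weighted sum."""
    # YaCy score: only the order of the results is used.
    yacy = rank_scores([r.url for r in yacy_results])
    # Assumption: backlinks and clicks are rank-scaled too, highest first.
    by_backlinks = sorted(yacy_results, key=lambda r: r.backlinks, reverse=True)
    by_clicks = sorted(yacy_results, key=lambda r: r.clicks, reverse=True)
    backlinks = rank_scores([r.url for r in by_backlinks])
    clicks = rank_scores([r.url for r in by_clicks])

    def total(r):
        return (W_YACY * yacy[r.url]
                + W_BACKLINKS * backlinks[r.url]
                + W_CLICKS * clicks[r.url])

    return sorted(yacy_results, key=total, reverse=True)
```

Because every score lies in (0, 1], a page that tops only the click
ranking gains at most 1.0, while the first YaCy result starts from 3.0.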

BTW, we are migrating the YaCy servers (Mikko installed the new ones)
and took down the old system. There should be a working crawler and
fresh full-text search results soon :)

- -Juha
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJTXK5uAAoJELGTs54GL8vA1bcH/R/8xYJMCk7rc296/UBWBlaX
SDGYO/85EjbdBUokleQAZ8odxrV+rNCbsWMbncddo8QLxl6w99tS9Wz1ehZ+KOI2
beSCSEdS46gnztoGTRrRos4YFxEfbq708wFUh0CDQbzeT9doBX6dAV62FXhP8Fgm
sY/YvqNMJSBnqqlojsAfHV70IorjveEJ23pnktX8fcfkTqM+xBIVk0Ul2zggQNW+
c/d9SuaZLDB2Fdbsch4Ip3Tln8C/tLF7HC1cyRh7QDwU1zmr8UUe0N3mmzwEqUWA
h/uD/U3yZSNQfGrSI8/19QjvsDqCdoWIP/i78B90iIZhJ8YNlyN+cydb1O+cj9A=
=Dfu/
-----END PGP SIGNATURE-----
_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev