
Re: [tor-dev] GSoC: Ahmia.fi - Search Engine for Hidden Services



Juha Nurmi <juha.nurmi@xxxxxxxx> writes:

> On 22.04.2014 17:35, George Kadianakis wrote:
>> Enjoy GSoC :)
>
> I will :)
>
>> BTW, looking again at your proposal, I see that you are going to
>> do both popularity tracking and backlinks.
>
> Yes, another crawler gathers backlinks from the public WWW and I will
> start gathering the URL clicks from the users.
>
>> How are these two technologies going to interact with each other?
>> That is, how will the indexer consider the output of those two
>> features?
>
> Django front-end re-sorts the answers from YaCy back-end.
>
> See https://ahmia.fi/static/gsoc/re_sort.jpg
>
> I have this idea in mind: https://ahmia.fi/static/gsoc/sorter.py
>
> The results are sorted according to the YaCy result index, the number
> of backlinks and the number of clicks, each of which is scaled.
>
> Note the scaling:  p_info.backlinks = 1 / (float(index) + 1) etc.
>
> sum_function = 3.0*self.yacy + 2.0*self.backlinks + 1.0*self.clicks
>
> where 3, 2 and 1 are test coefficients. I will optimize these and make
> a better model if necessary. However, clicks are easily spoofed, so
> there has to be a small coefficient for them.
>

That makes sense.
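
For concreteness, here is how I read the scheme. This is only a minimal
sketch, not the actual sorter.py. The names PageInfo, reciprocal_rank and
re_sort are hypothetical, and I am assuming that each signal is first
turned into a ranking and then scaled as 1 / (index + 1), which is what
the quoted scaling line suggests. The coefficients 3, 2 and 1 are the
test values from above.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Scaled signals for one result (hypothetical structure)."""
    url: str
    yacy: float = 0.0
    backlinks: float = 0.0
    clicks: float = 0.0

    def sum_function(self):
        # Test coefficients from the thread; clicks get the smallest weight
        # because they are the easiest signal to spoof.
        return 3.0 * self.yacy + 2.0 * self.backlinks + 1.0 * self.clicks


def reciprocal_rank(ordered_urls):
    """Map each URL to 1 / (index + 1), so rank 0 scores 1.0, rank 1 scores 0.5."""
    return {url: 1.0 / (float(index) + 1)
            for index, url in enumerate(ordered_urls)}


def re_sort(yacy_order, backlink_counts, click_counts):
    """Re-sort YaCy's result order using backlink and click counts.

    Assumption: URLs missing from a count table are treated as having
    a count of zero, and ties keep YaCy's original order.
    """
    by_backlinks = sorted(yacy_order,
                          key=lambda u: backlink_counts.get(u, 0),
                          reverse=True)
    by_clicks = sorted(yacy_order,
                       key=lambda u: click_counts.get(u, 0),
                       reverse=True)

    yacy_scaled = reciprocal_rank(yacy_order)
    backlinks_scaled = reciprocal_rank(by_backlinks)
    clicks_scaled = reciprocal_rank(by_clicks)

    pages = [PageInfo(u, yacy_scaled[u], backlinks_scaled[u], clicks_scaled[u])
             for u in yacy_order]
    pages.sort(key=PageInfo.sum_function, reverse=True)
    return [p.url for p in pages]


if __name__ == "__main__":
    # Made-up example data, purely for illustration.
    order = re_sort(
        ["a.onion", "b.onion", "c.onion"],
        backlink_counts={"c.onion": 50, "b.onion": 5},
        click_counts={"c.onion": 10},
    )
    print(order)
```

With rank-based scaling like this, the absolute number of clicks or
backlinks does not matter, only the relative order, which limits how far
a single inflated counter can move a result.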

BTW, what is the 'yacy' score? Is it just the position that YaCy's
indexer assigned to each result, or does YaCy actually expose a score
for each result? If so, how is that score derived? Or do you treat it
as a black box and assume it is more accurate than backlinks and
popularity?

Thanks!
_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev