
Re: [tor-dev] [GSOC 16] Ahmia status update #6



Thanks Ismael!

Great work. You are very productive.

-Juha

On Sat, Aug 13, 2016 at 1:02 AM, Ismael R <zma@xxxxxxxxxx> wrote:
Hi everyone,

I'm working on ahmia.fi, the hidden service search engine and you're reading
status update #6.

During the last two weeks, I finished porting the Django app to the new
structure. I'm also working on last-minute things before putting the new site
online.

I will continue updating documentation and add some unit tests to the project.

The code is not merged yet but you're welcome to check it on my forks. [1] [2]


Since this status report is short, here is a list of the goals I had in my
initial project proposal and the work that has been done on each.

Review code and infrastructure:
- Split the project into several repositories
- Improve documentation
- Automate testing (Travis CI)
- Track code quality (Landscape.io)
- Track requirements (Requires.io)
- Refactor each subproject

Improve search results:
- Better use of Elasticsearch (use of stemmers, shingles, term-centric search)
- Search results are now pages instead of domains.
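As a rough illustration of the Elasticsearch features listed above, here is a
minimal sketch of index settings with a stemmer and a shingle filter, plus a
term-centric query. The analyzer names, filter names and the "title" and
"content" fields are assumptions made for the example, not the actual Ahmia
configuration.

```python
def build_index_body():
    """Index settings with one stemmed and one shingled analyzer (illustrative)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    # Reduces words to their stem, e.g. "working" -> "work".
                    "english_stemmer": {"type": "stemmer", "language": "english"},
                    # Emits two-word shingles only (no single words).
                    "word_shingles": {
                        "type": "shingle",
                        "min_shingle_size": 2,
                        "max_shingle_size": 2,
                        "output_unigrams": False,
                    },
                },
                "analyzer": {
                    "stemmed": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "english_stemmer"],
                    },
                    "shingled": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "word_shingles"],
                    },
                },
            }
        }
    }


def build_term_centric_query(text):
    """cross_fields treats the listed fields as one big field (term-centric)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": ["title", "content"],  # hypothetical field names
                "operator": "and",
            }
        }
    }
```

Both functions only build request bodies; sending them to a cluster is left
out on purpose.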

Improve UI/UX:
Not much work has been done on this goal beyond porting the old pages to the
new design. All pages are now using the new design.

Gather more statistics:
- PageRank is now used to compute an authority score for each page
- I suggested that we could use a self-hosted statistics framework like Piwik
[3] but no decision has been made.

Use stats to better rank search results:
- Results are ranked by authority score.
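
For readers unfamiliar with PageRank, here is a minimal sketch of how an
authority score can be computed from a link graph by power iteration. It is
the textbook algorithm, not necessarily the code used in the crawler; the
function name, the damping factor of 0.85 and the tiny example graph are
assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank scores for a dict mapping page -> list of linked pages."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: "a" and "b" both link to "c", and "c" links back to "a".
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Here `ranking` lists the pages from highest to lowest authority score, which
is the order in which results would be shown.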

Make sense of the indexed info to understand a search meaning:
- Shingles enable us to differentiate these two queries: "i'm not happy i'm
working" and "i'm happy i'm not working".
- Synonyms could be used by the search algorithm if we provided a synonym
dictionary. No work has been done on this yet.
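
To make the shingle point concrete, here is a minimal sketch in plain Python.
It splits on whitespace, which is a simplification of what an Elasticsearch
tokenizer does, and the function name is made up for the example.

```python
def word_shingles(text, size=2):
    """Return the list of consecutive word groups ("shingles") of a given size."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


q1 = "i'm not happy i'm working"
q2 = "i'm happy i'm not working"

# Both queries contain exactly the same words...
same_words = sorted(q1.split()) == sorted(q2.split())

# ...but their two-word shingles differ, so the queries can be told apart.
only_in_q1 = set(word_shingles(q1)) - set(word_shingles(q2))
only_in_q2 = set(word_shingles(q2)) - set(word_shingles(q1))
```

With these two queries, `only_in_q1` holds "not happy" and "i'm working",
while `only_in_q2` holds "i'm happy" and "not working".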

Make a Google Trends-like interface to visualize searches over time:
No work has been done to reach this optional goal. Some stats
functionalities were even dropped in the new site because they were
"domain-centric" when a search engine needs to be "page-centric". We could
probably index searches in Elasticsearch and use Date Histogram Aggregation
[4] to display trends.
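
A minimal sketch of what such a trend query could look like, assuming each
search is indexed as a document with a "query_text" field and a "timestamp"
field (both hypothetical names). The name of the interval parameter depends
on the Elasticsearch version: older releases use "interval", while 7.x and
later use "calendar_interval", so it is left configurable here.

```python
def build_trend_query(term, interval="day", interval_key="calendar_interval"):
    """Build a request body counting searches for `term` per time bucket."""
    return {
        # Only the aggregation buckets are needed, not the matching documents.
        "size": 0,
        "query": {"match": {"query_text": term}},  # hypothetical field name
        "aggs": {
            "searches_over_time": {
                "date_histogram": {
                    "field": "timestamp",  # hypothetical field name
                    interval_key: interval,
                }
            }
        },
    }


body = build_trend_query("bitcoin", interval="week")
```

Each bucket in the response would then carry a date and a document count,
which is enough to draw a searches-over-time curve.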

Make stats available with the API:
No work has been done to reach this optional goal. Some API endpoints were
also dropped because they were domain-centric. It would be great to have an
API with a coherent URL scheme. I think Django Rest Framework [5] can help
design that API while keeping the code simple.


That's it for this week,
Have a nice weekend.

Ismael R.


[1] https://github.com/iriahi/ahmia-site
[2] https://github.com/iriahi/ahmia-crawler
[3] https://piwik.org/
[4] https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-datehistogram-aggregation.html
[5] http://www.django-rest-framework.org/
_______________________________________________
tor-dev mailing list
tor-dev@xxxxxxxxxxxxxxxxxxxx
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
