
Search Thoughts & Questions




Cleaning up the restriction code of Search.pm tonight, I realized a few
things.

1) Adding a 3rd document type (such as my suggested Official Documents)
breaks my current search engine strategy for processing URLs.

2) I think I know how to solve for #1 with some CGI param re-arranging or
DB magic.

Let's take the sample article in the DB:
http://dev.linuxkb.org:81/render.php3?op=1.3.5.7&oid=7&type=6&act=DisplayArticle&si=History(1:5.3:1.5:4)

Only two params actually interest the search engine:
op - where this article is in the tree?
type - where did this document originate?

Ht://Dig has two ways of limiting hits by url:
restrict (URL must contain)
exclude (URL must NOT contain)

In both cases above, the value is a string with no regexp-like powers.
So if I wanted all documents of type 7 under 1.5.8:

&restrict=op=1.5.8&exclude=type=6

Make sense?  (notice that the exclude is the reverse of what you're
looking for)

Now, if there are three possible values of type, this doesn't work,
because there can be one and only one exclude, and excluding one type
still lets the other two through.
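
To make the limitation concrete, here is a minimal Python sketch. It assumes
restrict and exclude behave as plain substring tests, as described above. The
matches() helper, the sample URLs, and type 8 as a third document type are all
made up for illustration; none of this is htsearch code.

```python
def matches(url, restrict, exclude=None):
    """Model of the filtering described above: plain substring tests only."""
    if restrict not in url:
        return False
    if exclude is not None and exclude in url:
        return False
    return True


# Current param order: op comes first, type comes later in the query string.
BASE = ("http://dev.linuxkb.org:81/render.php3"
        "?op={op}&oid={oid}&type={type}&act=DisplayArticle")

urls = [
    BASE.format(op="1.5.8.2", oid=10, type=6),
    BASE.format(op="1.5.8.2", oid=11, type=7),
    BASE.format(op="1.5.8.3", oid=12, type=8),  # hypothetical third type
    BASE.format(op="1.3.5.7", oid=7, type=7),   # outside the 1.5.8 subtree
]

# "All documents of type 7 under 1.5.8", expressed the only way available:
hits = [u for u in urls if matches(u, "op=1.5.8", "type=6")]

for u in hits:
    print(u)
```

With only types 6 and 7 this returns what was asked for. Once the type 8
document exists it shows up in the hits as well, since a single exclude can
only knock out one of the two unwanted types.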

So here's my idea:

Rewrite the above URL to be:
http://dev.linuxkb.org:81/render.php3?type=6&op=1.3.5.7&oid=7&act=DisplayArticle&si=History(1:5.3:1.5:4)

Now I can choose documents of type 7 under 1.5.8:

&restrict=type=7&op=1.5.8
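
The reordering itself could look roughly like the Python sketch below. The real
change would have to happen in the PHP that builds the article URLs, so this
only illustrates the idea. move_type_first() is a hypothetical helper, and the
second URL (a type 7 article under 1.5.8.2) is invented for the example. The
query string is split by hand so the rest of the URL is left byte-for-byte
unchanged.

```python
def move_type_first(url):
    """Reorder the query string so it starts with type=...&op=...

    All other params keep their original relative order.
    """
    base, sep, query = url.partition("?")
    if not sep:
        return url  # no query string, nothing to reorder
    pairs = query.split("&")
    rank = {"type": 0, "op": 1}
    # list.sort is stable, so params with the same rank stay in order
    pairs.sort(key=lambda pair: rank.get(pair.partition("=")[0], 2))
    return base + "?" + "&".join(pairs)


old = ("http://dev.linuxkb.org:81/render.php3"
       "?op=1.3.5.7&oid=7&type=6&act=DisplayArticle&si=History(1:5.3:1.5:4)")
new = move_type_first(old)
print(new)

# With type and op adjacent, one substring selects both at once.
candidate = move_type_first(
    "http://dev.linuxkb.org:81/render.php3"
    "?op=1.5.8.2&oid=11&type=7&act=DisplayArticle")
print("type=7&op=1.5.8" in candidate)
```

Because type and op end up next to each other, a single restrict string covers
both, and no exclude is needed no matter how many document types exist.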

Possible problem though.  Is the above string one CGI param
(&restrict=type=7&op=1.5.8) or two (&restrict=type=7 and &op=1.5.8)?

I think we can fix that by converting the & to %26 and passing this to
the search engine as:

&restrict=type=7%26op=1.5.8

but I need to run some tests to see if htsearch will interpret this the
way I want it to.  However, I need the URL for Articles to be reformatted
before I can run these tests.  If this doesn't work then I've got to do a
rather (IMHO) nasty hack of storing each article type in a separate DB,
plus another DB with all the articles in it (hence requiring 2x the disk
space).
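
For what it's worth, here is how a standard query-string parser treats the two
forms, using Python's urllib.parse as a stand-in. This only shows generic CGI
decoding. Whether htsearch decodes %26 the same way is an assumption that still
needs the tests mentioned above.

```python
from urllib.parse import parse_qs, quote

restrict_value = "type=7&op=1.5.8"

# Unencoded: the & inside the value is taken as a param separator.
raw_query = "restrict=" + restrict_value
raw = parse_qs(raw_query)

# Encoded: quote() turns & into %26; "=" is kept literal via safe=.
encoded_query = "restrict=" + quote(restrict_value, safe="=")
enc = parse_qs(encoded_query)

print(raw_query, "->", raw)
print(encoded_query, "->", enc)
```

The unencoded form parses as two params (restrict and op), while the encoded
form parses as a single restrict param whose value still contains the &.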

Anyways, I looked at functions.php3 to try to figure out how to reformat
the URL this way, but quickly got lost.  Hopefully Jason or someone more
PHP-literate can make this change?

I really think that being able to support more than two doc types is
important, even if we don't all agree on my new proposed type.

Anyways, I made more changes to the search forms and header/categories, so
if you're editing those files be sure to do an update.

--
Aaron Turner, Core Developer       http://vodka.linuxkb.org/~aturner/
Linux Knowledge Base Organization  http://linuxkb.org/
Because world domination requires quality open documentation.
aka: aturner@vicinity.com, aturner@pobox.com, ion_beam_head@ashtech.net