Re: Another feature for Ht://Dig
Well, right now I don't think I have much time for either the SQL port or
this. However, just off the top of my head, I remember talking about
natural language searches a while back with someone (the name escapes me,
as names usually do). He said the easiest way around this is to figure out
which words to drop and which ones to group.
From your examples:
> How do I compile a Linux kernel?
becomes
compile AND "Linux kernel"
and
> How can I setup a PPP connection to my ISP?
becomes
setup AND "PPP connection"
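A minimal sketch of the drop-and-group idea, assuming a small hypothetical stop-word list (STOP_WORDS) and one simple grouping rule: every run of consecutive words that survives the stop list becomes one group. A single word stays bare, several words become a quoted phrase, and the groups are joined with AND. This is not Ht://Dig code, just an illustration of the heuristic.

```python
import re

# Hypothetical stop-word list for illustration; a real one would be much larger.
STOP_WORDS = {"how", "do", "does", "i", "can", "a", "an", "the", "to", "my", "is", "what"}

def question_to_query(question: str) -> str:
    """Turn a plain-English question into a boolean query string."""
    # Words may contain inner dots, plus signs or hyphens (e.g. "2.1.x").
    words = re.findall(r"[A-Za-z0-9]+(?:[.+-][A-Za-z0-9]+)*", question)
    groups, current = [], []
    for word in words:
        if word.lower() in STOP_WORDS:
            # A dropped word ends the current group of kept words.
            if current:
                groups.append(current)
                current = []
        else:
            current.append(word)
    if current:
        groups.append(current)
    # One word stays bare; two or more become a quoted phrase.
    terms = [g[0] if len(g) == 1 else '"' + " ".join(g) + '"' for g in groups]
    return " AND ".join(terms)

print(question_to_query("How do I compile a Linux kernel?"))
print(question_to_query("How can I setup a PPP connection to my ISP?"))
```

With this rule the first question becomes compile AND "Linux kernel". The second becomes setup AND "PPP connection" AND ISP, which keeps the ISP term that the hand-made version above drops. Deciding which content words to drop is the hard part, and a stop list alone does not settle it.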
I'm not sure how well htdig handles groups/phrases such as "PPP
connection" (most, however, do know how to use them).
It's easier than teaching a search engine English, and it is basically how
we actually parse sentences. (This is coming from the linked-database
school of AI.)
Aaron Turner wrote:
>
> Marc, (Hugo read this too)
>
> I know you're looking into the port of htdig to SQL, but I was wondering
> if you'd be interested in some other things first.
>
> First, you're probably aware that 3.1.4 was released. I haven't tested if
> your patches are compatible or not. If you could check that, it would be
> great.
>
> Secondly, Hugo and I have been talking about implementing more "natural
> English searching" capabilities in the LinuxKB. Things like:
>
> How do I compile a Linux kernel?
> How can I setup a PPP connection to my ISP?
>
> are friendlier to end users than boolean search strings.
>
> The new Ht://Dig 3.2.x series supports phrase searching so people can
> search for "compile linux kernel" or "setup PPP connection" which is a
> positive step in that direction. I have no idea how hard it would be
> to port your 3.1.3 patches to this, but it would be nice to start playing
> with it.
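For illustration only, a phrase match can be pictured as "the query words appear consecutively, in order, in the document's word list". The sketch below is a naive scan over a tokenized document, with a hypothetical tokenize() helper that lower-cases text and treats punctuation as a separator. It is not how Ht://Dig 3.2.x implements phrase searching.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lower-case words; punctuation acts as a separator, inner dots are kept ("2.1.x").
    return re.findall(r"[a-z0-9]+(?:[.+-][a-z0-9]+)*", text.lower())

def contains_phrase(document: str, phrase: str) -> bool:
    """True if the words of `phrase` occur consecutively and in order in `document`."""
    doc, words = tokenize(document), tokenize(phrase)
    n = len(words)
    if n == 0:
        return False
    # Slide a window of n words across the document and compare.
    return any(doc[i:i + n] == words for i in range(len(doc) - n + 1))

print(contains_phrase("Setup PPP connection in five minutes", "setup PPP connection"))
print(contains_phrase("How do I compile the linux kernel?", "compile linux kernel"))
```

The second call does not match because "the" sits between "compile" and "linux". A strict phrase search misses near-misses like this, which is where a distance-based search comes in.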
>
> Also it would be nice to be able to search on words, based on their
> distance to other words in the search string. Something like:
>
> (compile kernel)4
>
> would find all documents with the two words "compile" and "kernel" that
> are no more than 4 words apart. So:
>
> "How do I compile the linux kernel?" would be a match
> "How do I compile htdig on a system with 2.1.x kernel?" would not be
>
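A rough sketch of the proposed "(compile kernel)4" syntax. It assumes "no more than N words apart" means the two words' positions differ by at most N, in either order. The query pattern and helper names are hypothetical, taken only from the example in the message.

```python
import re

# Hypothetical proximity syntax from the message: "(word1 word2)N".
QUERY_RE = re.compile(r"^\(\s*(\S+)\s+(\S+)\s*\)(\d+)$")

def tokenize(text):
    # Lower-case words; punctuation acts as a separator, inner dots are kept ("2.1.x").
    return re.findall(r"[a-z0-9]+(?:[.+-][a-z0-9]+)*", text.lower())

def parse_near(query):
    """Parse '(compile kernel)4' into ('compile', 'kernel', 4)."""
    m = QUERY_RE.match(query.strip())
    if m is None:
        raise ValueError(f"not a proximity query: {query!r}")
    return m.group(1).lower(), m.group(2).lower(), int(m.group(3))

def matches_near(document, query):
    """True if the two words occur with positions at most N apart (either order)."""
    first, second, limit = parse_near(query)
    words = tokenize(document)
    pos_a = [i for i, w in enumerate(words) if w == first]
    pos_b = [i for i, w in enumerate(words) if w == second]
    return any(abs(a - b) <= limit for a in pos_a for b in pos_b)

print(matches_near("How do I compile the linux kernel?", "(compile kernel)4"))
print(matches_near("How do I compile htdig on a system with 2.1.x kernel?", "(compile kernel)4"))
```

In the first sentence "compile" is word 3 and "kernel" is word 6, so the distance is 3 and it matches. In the second, "kernel" is word 10, so the distance is 7 and it does not. Comparing every pair of positions is quadratic, but for a handful of occurrences per document that hardly matters.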
> Having both the distance and phrase features (with your existing de-dup
> code) would be very powerful I think. I'm not sure what a good user
> interface to the distance feature would be (I just provided one idea), so
> I'm hoping the rest of the list (specifically Hugo) would provide some
> thoughts on the matter.
>
> From talking with the Ht://Dig developers, I gather that all the
> information you need to determine the distance between two words in a
> document is already there in the DBs.
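To show why stored word positions are enough, here is a sketch of a positional inverted index: word -> {document id: sorted positions}. The nearest pair of occurrences is then found by merging two sorted position lists, so nothing has to be re-read from the documents. The layout is an assumption for illustration, not Ht://Dig's actual database format.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lower-case words; punctuation acts as a separator, inner dots are kept ("2.1.x").
    return re.findall(r"[a-z0-9]+(?:[.+-][a-z0-9]+)*", text.lower())

def build_index(docs):
    """Map word -> {doc_id: list of word positions}, built in increasing order."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word][doc_id].append(pos)
    return index

def min_distance(ps, qs):
    """Smallest |p - q| over two sorted position lists (two-pointer merge)."""
    i = j = 0
    best = float("inf")
    while i < len(ps) and j < len(qs):
        best = min(best, abs(ps[i] - qs[j]))
        # Advance whichever pointer sits on the smaller position.
        if ps[i] < qs[j]:
            i += 1
        else:
            j += 1
    return best

def near(index, a, b, limit):
    """Sorted doc ids where words a and b occur at most `limit` positions apart."""
    pa, pb = index.get(a, {}), index.get(b, {})
    return sorted(d for d in pa.keys() & pb.keys()
                  if min_distance(pa[d], pb[d]) <= limit)

docs = {1: "How do I compile the linux kernel?",
        2: "How do I compile htdig on a system with 2.1.x kernel?"}
idx = build_index(docs)
print(near(idx, "compile", "kernel", 4))
```

Only documents that contain both words are examined, and each check is linear in the number of occurrences.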
>
> I'd write more but I've got a splitting headache...
>
> --
> Aaron Turner, Core Developer http://vodka.linuxkb.org/~aturner/
> Linux Knowledge Base Organization http://linuxkb.org/
> Because world domination requires quality open documentation.
> aka: aturner@vicinity.com, aturner@pobox.com, ion_beam_head@ashtech.net