Improvement of the search engine

Archived discussion about features (predating the use of Bugzilla as a bug and feature tracker)

Bazzi of Dawn
Posts: 1
Joined: 2003-07-18 16:37

Post by Bazzi of Dawn » 2003-07-18 17:05

I have a suggestion, an improvement of the search engine. When searching on the internet with, e.g., Yahoo, you can add “ symbols around a phrase, which would be very useful here.

I tried to search for a band named The polis. It gave a lot of results, but none of the mp3s were made by the right band.

The results were a lot of “metroPOLIS…oTHErs”.

It can be fixed, can’t it?
Bazzi of Dawn - the Carpenter of your street.

A Carpenter for your girl.

<<<<<<<<<<<<<<<*>>>>>>>>>>>>>>>

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2003-07-19 15:04

There have been other suggestions on how to change the search function. Full regular expression matching (*, ?) will cause more drag on your computer, since it will take longer to parse and search. I think the sub-string matching of DC++ is the way to go for now.

Dj_Offset
Posts: 48
Joined: 2003-02-22 19:22
Location: Oslo, Norway

Re: Improvement of the search engine

Post by Dj_Offset » 2003-07-30 06:47

Bazzi of Dawn wrote:I have a suggestion, an improvement of the search engine. When searching on the internet with, e.g., Yahoo, you can add “ symbols around a phrase, which would be very useful here.

I tried to search for a band named The polis. It gave a lot of results, but none of the mp3s were made by the right band.

The results were a lot of “metroPOLIS…oTHErs”.

It can be fixed, can’t it?

I think the easiest solution for this would be strict ordering for the substring matches (which in turn would further optimize it).

A search for "The Polis" would match:
THE metroPOLIS

but not "metroPOLIS THE", since THE isn't first...
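
The ordering idea can be sketched roughly like this. This is only an illustration in Python, not DC++ or QuickDC code; the function name `ordered_substring_match` and the choice of case-insensitive matching are assumptions made for the example.

```python
def ordered_substring_match(query: str, filename: str) -> bool:
    """Return True if every query term occurs in filename, in query order.

    Matching is case-insensitive. Each term must be found after the end
    of the previous term's match, so "The Polis" cannot match a name in
    which "polis" only appears before "the".
    """
    haystack = filename.lower()
    pos = 0
    for term in query.lower().split():
        found = haystack.find(term, pos)
        if found == -1:
            return False
        pos = found + len(term)  # keep searching after this match
    return True


print(ordered_substring_match("The Polis", "THE metroPOLIS"))   # True
print(ordered_substring_match("The Polis", "metroPOLIS THE"))   # False
```

Note that terms are still matched as substrings, so a name such as "oTHErs metroPOLIS" would still be returned for "The Polis". Ordering alone narrows the results, but it does not limit them to whole words.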
I wrote QuickDC - A DC++ compatible client for Linux and FreeBSD.

jbyrd
Posts: 255
Joined: 2003-05-10 09:26
Location: no-la-usa-earth

Post by jbyrd » 2003-07-30 07:48

Ehh. I don't like that idea.

Say I want the Looney Tunes cartoon starring Bugs Bunny named "The Tortoise and the Hare". Using your method, if I type 'Tortoise Hare', then I wouldn't receive results like "The Hare and the Tortoise". :)

Granted, neither would the other idea. But, I don't see how the substring ordering will really improve search accuracy.

How about a simple pull-down menu of "all", "any", or "exact"? I guess the same problems occur. But I don't think there would be any noticeable "drag" with somewhat efficient code. Most machines are up above 2 GHz and 512 MB RAM.
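
A rough sketch of what such a menu could select between is shown here. The function `matches` and its `mode` values are hypothetical names for this example, and treating "exact" as "the whole phrase appears contiguously in the name" is an assumption, since the post does not define it.

```python
def matches(query: str, filename: str, mode: str = "all") -> bool:
    """Case-insensitive filename match under one of three modes.

    "all":   every term appears somewhere in the name (plain substring match)
    "any":   at least one term appears in the name
    "exact": the terms appear together as one contiguous phrase (assumed meaning)
    """
    name = filename.lower()
    terms = query.lower().split()
    if not terms:
        return False
    if mode == "all":
        return all(term in name for term in terms)
    if mode == "any":
        return any(term in name for term in terms)
    if mode == "exact":
        return " ".join(terms) in name
    raise ValueError(f"unknown mode: {mode!r}")


print(matches("The polis", "metroPOLIS oTHErs", mode="all"))    # True
print(matches("The polis", "metroPOLIS oTHErs", mode="exact"))  # False
```

Under these assumptions, "exact" would drop the "metroPOLIS…oTHErs" results from the original post, while "all" behaves like the current substring search.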

cyberal
Posts: 360
Joined: 2003-05-16 05:42

Post by cyberal » 2003-07-30 09:01

most machines eh? lol
http://whyrar.omfg.se - Guide to RAR and DC behaviour!
http://bodstrom.omfg.se - Bodströmsamhället, a collection of links about the threats to our personal privacy

Dj_Offset
Posts: 48
Joined: 2003-02-22 19:22
Location: Oslo, Norway

Post by Dj_Offset » 2003-07-30 09:26

jbyrd wrote:Ehh. I don't like that idea.

Granted, neither would the other idea. But, I don't see how the substring ordering will really improve search accuracy.

How about a simple pull-down menu of "all", "any", or "exact"? I guess the same problems occur. But I don't think there would be any noticeable "drag" with somewhat efficient code. Most machines are up above 2 GHz and 512 MB RAM.

Search: "Tortoise Hare" for the file named "The Tortoise and the Hare"
_WILL_ match, because "Hare" comes AFTER "Tortoise". However,
Search: "Hare Tortoise" will NOT match.

This way we can reduce CPU usage _some_ and keep correct search results.
Substring matching is expensive. It's a lot less expensive with exact matches, but unfortunately that will not work here.
I wrote QuickDC - A DC++ compatible client for Linux and FreeBSD.

jbyrd
Posts: 255
Joined: 2003-05-10 09:26
Location: no-la-usa-earth

Post by jbyrd » 2003-07-30 12:02

Dj_Offset wrote:Search: "Tortoise Hare" for the file named "The Tortoise and the Hare"
_WILL_ match, because "Hare" comes AFTER "Tortoise". However,
Search: "Hare Tortoise" will NOT match.

I wrote:Using your method, if I type 'Tortoise Hare', then I wouldn't receive results like "The Hare and the Tortoise".

You are saying the same thing that I said, except mine has a point to it. My point is, if the file name isn't in the exact order that you *think* it is, then you're going to miss out on files. Some may name it "Tortoise and the Hare", some may name it "Hare and Tortoise". Never mind.

Again, I don't see how the substring ordering will really improve search accuracy. The method should look at the search terms given and find only those, not words that ^contain^ the search terms. I'm no C++ wizard, but it seems that if you don't look for names ^containing^ the terms, and only for names where the words are the terms, it would take up less CPU.
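
As a sketch of that whole-word idea (again in Python rather than C++, with the hypothetical name `whole_word_match`, and assuming that words in a filename are runs of letters and digits):

```python
import re


def whole_word_match(query: str, filename: str) -> bool:
    """Return True if every query term equals a whole word in filename.

    The filename is split into lowercase runs of letters and digits, so
    "metroPOLIS" yields the single word "metropolis" and does not count
    as a match for the term "polis". Word order is ignored.
    """
    words = set(re.findall(r"[a-z0-9]+", filename.lower()))
    return all(term in words for term in query.lower().split())


print(whole_word_match("The Polis", "metroPOLIS oTHErs"))           # False
print(whole_word_match("Hare Tortoise", "The Tortoise and the Hare"))  # True
```

Whether this actually uses less CPU probably depends on how the shared file names are stored. Once a name has been split into words, looking up each term in a set is cheap, but the splitting itself still has a cost on every name that is not already indexed.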
