Generating an Index..

royen99
Posts: 4
Joined: 2004-04-23 00:32

Post by royen99 » 2004-04-23 01:33

I'm wondering if someone has already made something that creates an index file with ALL files from the users in a particular hub.
(It probably needs to be some hubscript/bot, or maybe a standalone prog/script.)

What I want to do with the end result is parse the complete index into a MySQL database and put a nice PHP frontend on it. Users could then search/browse the index from a website, and the results in the user's browser would give a dchub:// link to the file they were searching for.

Any ideas?

/fREaK Out ...

Todi
Forum Moderator
Posts: 699
Joined: 2003-03-04 12:16

Post by Todi » 2004-04-23 02:58

First, you'd have to download the filelist of every user in the hub. Not many scripts can do this, so you'd need a client/bot. After you've done that, it shouldn't be too hard to merge the lists, but it would require some kind of custom program. The hard part would probably be handling duplicates and actually making it useful (it would be pretty disorganized, and very, very hard to browse).

This site might give you a few ideas for the searching part (even if it's pretty different from your idea).

royen99
Posts: 4
Joined: 2004-04-23 00:32

Post by royen99 » 2004-04-23 03:49

Actually, the result of that site (its search output) is almost exactly what I wanted the output to be.
Although this site apparently searches multiple hubs, I just need it to search my own hub.

I.e. a script/bot that downloads the filelist of every connected user (a few at a time) and stores all available info (hash, user, slots, etc.) in the database.

The rest isn't that hard to do (the PHP frontend that reads the SQL database and presents it to the end user). It's getting the file info that I can't do :-)
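
For the "getting the file info" part, here is a rough sketch of reading one already-downloaded filelist. It assumes the bzip2-compressed XML filelist format used by DC++-style clients (a `FileListing` root with nested `Directory` and `File` elements carrying `Name`, `Size` and `TTH` attributes); older list formats would need a different decoder. The function names and the sample data are made up for illustration, and downloading the lists from the hub is not covered.

```python
import bz2
import xml.etree.ElementTree as ET


def iter_filelist(xml_bytes):
    """Yield (folder, name, size, tth) for every File element in a filelist."""
    root = ET.fromstring(xml_bytes)

    def walk(node, path):
        for child in node:
            if child.tag == "Directory":
                # Descend, extending the folder path with this directory's name.
                yield from walk(child, path + [child.get("Name", "")])
            elif child.tag == "File":
                yield ("/".join(path),
                       child.get("Name", ""),
                       int(child.get("Size", "0")),
                       child.get("TTH", ""))

    yield from walk(root, [])


def read_filelist(path):
    """Read a bzip2-compressed filelist from disk (the path is up to the caller)."""
    with bz2.open(path, "rb") as fh:
        return list(iter_filelist(fh.read()))


# Hypothetical sample list, only to show the shape of the data.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<FileListing Version="1" Generator="example">
  <Directory Name="music">
    <Directory Name="live">
      <File Name="a.mp3" Size="123" TTH="AAAA"/>
    </Directory>
    <File Name="b.mp3" Size="456" TTH="BBBB"/>
  </Directory>
</FileListing>"""

rows = list(iter_filelist(SAMPLE))
```

Each tuple in `rows` can then be written to the database together with the nick of the user the list came from.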

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-04-26 22:08

If you want to show even offline users' files, that could be a real pain. Otherwise, you could write a special DC client and have a web front end. There's another PHP project that did something similar; search the forum archives and you'll find it.

Rodga
Posts: 12
Joined: 2003-02-24 19:16
Location: Norway

Post by Rodga » 2006-05-29 20:53

GargoyleMT wrote: if you want to show even offline user's files, that could be a real pain. Otherwise, you could write a special DC client and have a web front end. There's another PHP project that did similar - search in the forum archives and you'll find it.
Fortunately, at least if using YnHub (with SQL), you can always check in some table (in the database) which users are online, then exclude these users' filelists from the "all users" filelist.

I'll try to search for this PHP project you mentioned, thanks :)

Nick-V
Posts: 7
Joined: 2003-04-19 11:07

Post by Nick-V » 2006-10-07 13:56

I know the original question was posted a while back, but our hub and site provide web pages backed by an SQL Server database of the members and files in our hub. It is updated using MS Access:

1) Manually download all filelists periodically.

2) An MS Access routine runs daily against two tables, filename (keyed on TTH) and username (user, folder, filename), which are kept both locally and on SQL Server:
* decompress all filelists
* optionally, update the two local Access tables from SQL Server
* for each user with a filelist, replace all of that user's username records, adding unique entries to the filename table as required (plus some other checks and basic processing to meet our specific requirements)
* update the filename table on SQL Server and replace the username table on SQL Server from Access
* run a stored procedure that does a few final bits, including a reindex

3) Publish the SQL data using various search and Dreamweaver web pages designed to meet our specific requirements.
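
The "replace all records for a user, add unique filenames" step could look roughly like the sketch below. To keep it runnable it swaps in SQLite instead of Access/SQL Server, so it is not the actual routine described above; the column names beyond user, folder, filename and TTH, and the function name, are assumptions.

```python
import sqlite3

# Two tables as described: filename keyed on TTH, username holding one row
# per (user, folder, filename). The name/size columns are assumptions.
SCHEMA = """
CREATE TABLE IF NOT EXISTS filename (
    tth  TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    size INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS username (
    user     TEXT NOT NULL,
    folder   TEXT NOT NULL,
    filename TEXT NOT NULL,
    tth      TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS username_tth ON username (tth);
"""


def replace_user_filelist(conn, user, rows):
    """Replace one user's records; rows are (folder, name, size, tth) tuples."""
    with conn:  # one transaction per user, committed on success
        conn.execute("DELETE FROM username WHERE user = ?", (user,))
        conn.executemany(
            "INSERT INTO username (user, folder, filename, tth) "
            "VALUES (?, ?, ?, ?)",
            [(user, folder, name, tth) for folder, name, size, tth in rows])
        # Only TTHs not seen before create a new filename row.
        conn.executemany(
            "INSERT OR IGNORE INTO filename (tth, name, size) VALUES (?, ?, ?)",
            [(tth, name, size) for folder, name, size, tth in rows])


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
replace_user_filelist(conn, "alice", [("music", "a.mp3", 123, "AAAA"),
                                      ("music", "b.mp3", 456, "BBBB")])
replace_user_filelist(conn, "bob", [("stuff", "copy-of-a.mp3", 123, "AAAA")])
# A newer list from alice replaces her earlier records.
replace_user_filelist(conn, "alice", [("music", "a.mp3", 123, "AAAA")])
```

Note that in this sketch a filename row stays behind after the last user sharing it disappears; whether to prune those is a separate decision.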

Note that the slightly strange update approach was chosen because it provides us with the best update performance.
Search results are published in about 3 seconds with 500,000 username records and 100,000 unique filename records.
The facility helps users identify unique (TTH) files meeting their criteria and not in their collection, identify duplicates cluttering their disk and see the various alternative names and sources by which a file is shared.
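
As an illustration of those three lookups (files a user is missing, duplicates in a user's own share, and alternative names/sources for one TTH), here is a small sketch against a username-style table. It uses SQLite with made-up sample rows and hypothetical function names, not our actual SQL Server queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE username (user TEXT, folder TEXT, filename TEXT, tth TEXT)")
# Hypothetical sample data.
conn.executemany("INSERT INTO username VALUES (?, ?, ?, ?)", [
    ("alice", "music", "a.mp3", "AAAA"),
    ("alice", "backup", "a-copy.mp3", "AAAA"),
    ("bob", "stuff", "track01.mp3", "AAAA"),
    ("bob", "stuff", "b.mp3", "BBBB"),
])


def missing_for(conn, user):
    """TTHs shared by someone in the hub but not by `user`."""
    cur = conn.execute(
        "SELECT DISTINCT tth FROM username "
        "WHERE tth NOT IN (SELECT tth FROM username WHERE user = ?) "
        "ORDER BY tth", (user,))
    return [row[0] for row in cur]


def duplicates_for(conn, user):
    """(tth, copies) for content that `user` shares more than once."""
    cur = conn.execute(
        "SELECT tth, COUNT(*) FROM username WHERE user = ? "
        "GROUP BY tth HAVING COUNT(*) > 1 ORDER BY tth", (user,))
    return cur.fetchall()


def sources_for(conn, tth):
    """Every (user, folder, filename) under which one TTH is shared."""
    cur = conn.execute(
        "SELECT user, folder, filename FROM username WHERE tth = ? "
        "ORDER BY user, folder, filename", (tth,))
    return cur.fetchall()
```

Because everything is keyed on TTH, renamed copies of the same file still match each other.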

Let me know if this might help you or you want more info.
