Spidering hubs

Technical discussion about the NMDC and ADC (http://dcpp.net/ADC.html) protocols. The NMDC protocol is documented in the Wiki (http://dcpp.net/wiki/), so feel free to refer to it.

Moderator: Moderators

Locked
djechelon
Posts: 8
Joined: 2004-10-29 11:44


Post by djechelon » 2004-11-20 07:37

I want to write a program (C++ or VB would work well) that periodically checks hubs for their user count, description and name.
It will be used to update a PublicList.

Any ideas about that? What does the DC hub protocol say about bots?
I don't want to keep grabbing lists and pasting them into a global database (which is what my site does now); I want REAL data on hubs (some lists are fake, as you may know).
Only Dreamland.gotdns.org and P2PItalia.com use that kind of spider bot, and they don't want to share their code, so I want to write my own.
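
As a rough illustration of the "name and users" part of such a spider, here is a minimal Python sketch. It assumes the bot has already received raw NMDC data from a hub, where messages end with `|`, the hub name arrives in `$HubName` and the nick list in `$NickList` with nicks separated by `$$`. The function names, the sample data and the cp1252 decoding are assumptions made for this example, and it does not try to recover the hub description.

```python
def split_commands(buffer):
    """Split raw NMDC bytes into complete '|'-terminated messages.

    Returns (messages, leftover); leftover is the unfinished tail that
    should be kept until more data arrives from the socket.
    """
    *complete, rest = buffer.split(b"|")
    return [m for m in complete if m], rest


def summarize(messages):
    """Extract the hub name and the number of nicks from split messages."""
    info = {"name": None, "users": None}
    for msg in messages:
        if msg.startswith(b"$HubName "):
            # Assumption: the hub uses a Windows-1252 compatible encoding.
            info["name"] = msg[len(b"$HubName "):].decode("cp1252", "replace")
        elif msg.startswith(b"$NickList "):
            nicks = [n for n in msg[len(b"$NickList "):].split(b"$$") if n]
            info["users"] = len(nicks)
    return info


# Hypothetical data as it might arrive from a hub, ending mid-message.
sample = b"$HubName Example Hub|$Hello spider|$NickList alice$$bob$$spider$$|<partial"
messages, leftover = split_commands(sample)
print(summarize(messages))
print(leftover)
```

Note that a hub normally sends `$NickList` only after the client has logged in and asked for it with `$GetNickList`, and the count includes the spider's own nick.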
DJ ECHELON

MASTER OF BIT TORRENT

WEBMASTER http://www.p2pmania.it
My FAST PublicHubList http://www.p2pmania.it/hublists/PublicH ... config.bz2

yakko
Posts: 258
Joined: 2003-01-27 01:04

Post by yakko » 2004-11-20 16:55

I'd like to see an easy way to opt out of these, kinda like robots.txt for websites. It's tough keeping a private hub private.

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32

Post by cologic » 2004-11-20 17:02

No hub putting itself on a hublist is trying very hard to be private.

yakko
Posts: 258
Joined: 2003-01-27 01:04

Post by yakko » 2004-11-20 17:28

My hub isn't trying to be on a list. Somehow spiders have found it, and the hublist owners don't remove us when we ask.

djechelon
Posts: 8
Joined: 2004-10-29 11:44

Post by djechelon » 2004-11-20 17:34

yakko wrote:kinda like robots.txt for websites. It's tough keeping a private hub private.

First let's take care of hubs that want to be in a public list, and then deal with this. By the way, if you don't tell anybody with a public list the address of your site, it won't be added!

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32

Post by cologic » 2004-11-20 17:47

It sounds like you (yakko) need an authentication system. Relying on politeness to secure an entity in a network under steady legal attack is shortsighted.

djechelon: I would support explicitly supporting no robots-like protocol. Don't let hubowners hide behind false security.

Todi
Forum Moderator
Posts: 699
Joined: 2003-03-04 12:16

Post by Todi » 2004-11-20 18:05

Just set your hub address to 127.0.0.1; that usually works for me.

djechelon
Posts: 8
Joined: 2004-10-29 11:44

Post by djechelon » 2004-11-21 06:57

cologic wrote:djechelon: I would support explicitly supporting no robots-like protocol. Don't let hubowners hide behind false security.

???
Do you think hubowners should go to the public list control panel and manually edit their user counts?
A robot is the only way to get a correct user count. I'm trying to Telnet into some hubs to read their responses and send my own commands.
I've come to a conclusion: the bot should somehow be able to log in to the hub and retrieve the user count. If the hub is private, the user count will be set to zero. The only way to fake the user count is to mirror users, and there is no way to defend against that.
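
For the login step, an NMDC hub greets a new connection with a `$Lock ... Pk=...|` message and expects a `$Key` computed from the lock, followed by `$ValidateNick`. Below is a minimal Python sketch of that first reply, using the lock-to-key transformation as it is commonly documented for NMDC. The nick `spider`, the sample lock and the helper names are hypothetical. Hubs usually also expect `$Version`, `$GetNickList` and `$MyINFO` after they answer with `$Hello`, which this sketch leaves out.

```python
def lock_to_key(lock):
    """Compute the NMDC key for a lock (the text between '$Lock ' and ' Pk=')."""
    n = len(lock)
    key = [0] * n
    # The first byte mixes in the last two lock bytes and the constant 5.
    key[0] = lock[0] ^ lock[-1] ^ lock[-2] ^ 5
    for i in range(1, n):
        key[i] = lock[i] ^ lock[i - 1]
    out = bytearray()
    for k in key:
        k = ((k << 4) & 0xF0) | ((k >> 4) & 0x0F)  # swap the two nibbles
        if k in (0, 5, 36, 96, 124, 126):
            # These byte values cannot be sent literally and are escaped.
            out += b"/%%DCN%03d%%/" % k
        else:
            out.append(k)
    return bytes(out)


def login_reply(lock_line, nick):
    """Build the '$Key ...|$ValidateNick ...|' reply to a hub's $Lock message."""
    lock = lock_line[len(b"$Lock "):].split(b" Pk=")[0]
    return b"$Key " + lock_to_key(lock) + b"|$ValidateNick " + nick + b"|"


# Hypothetical lock line; real hubs send much longer locks.
print(login_reply(b"$Lock ABCD Pk=example", b"spider"))
```

Working with bytes rather than text avoids encoding surprises, since the key can contain arbitrary byte values.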
By the way, I've been reading about hubs connecting to the hublist server. YHub itself has dreamland.gotdns.org built into its hublist field.
So hubs probably log in to a server, like clients do, in order to be included in a public list.
What can you tell me about it?
I've heard that hublist.org has a spider that scans each hub every 8 hours...
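
A periodic rescan like that can be driven by a small "which hubs are due" check. The Python sketch below assumes the spider keeps a table of the last scan time per hub; the 8-hour interval comes from the figure above, while the hub addresses and the function name are made up for the example.

```python
from datetime import datetime, timedelta

SCAN_INTERVAL = timedelta(hours=8)


def hubs_due(last_scan, now, interval=SCAN_INTERVAL):
    """Return hubs never scanned or last scanned at least `interval` ago.

    The result is ordered oldest scan first, with never-scanned hubs leading.
    """
    due = [
        (when or datetime.min, hub)
        for hub, when in last_scan.items()
        if when is None or now - when >= interval
    ]
    return [hub for _, hub in sorted(due)]


# Hypothetical scan table: hub address -> time of last successful scan.
now = datetime(2004, 11, 21, 12, 0)
last_scan = {
    "hub-a.example:411": datetime(2004, 11, 21, 3, 0),   # 9 hours ago
    "hub-b.example:411": datetime(2004, 11, 21, 10, 0),  # 2 hours ago
    "hub-c.example:411": None,                           # never scanned
}
print(hubs_due(last_scan, now))
```

Scanning the oldest entries first keeps the data evenly fresh even when a run cannot reach every hub.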

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32

Post by cologic » 2004-11-21 10:49

Oh, there's ambiguity here between 'robots' and 'robots.txt', I guess, whoops. I'm referring to what yakko wants, the robots.txt protocol.
