adding 302 support to HttpConnection

Sedulus
Forum Moderator
Posts: 687
Joined: 2003-01-04 09:32
Contact:

Post by Sedulus » 2003-01-18 17:51

As a solution to the huge bandwidth consumption of the PublicHubList.config downloads, I tried to add HTTP 302 support to the client, so it could be used with the following CGI script:

Code: Select all

#!/usr/bin/perl -w
my @files = (
        # add mirrors here
        'http://wza.digitalbrains.com/DC/PublicHubList.config',
        'http://wza.digitalbrains.com/DC/PublicHubList2.config',
        'http://wza.digitalbrains.com/DC/PublicHubList3.config',
#       'http://dcplusplus.sourceforge.net/PublicHubList.config',
);
my $file = $files[ rand @files ];
print "Status: 302 Moved Temporarily\r\n" .
      "Location: $file\r\n" .
      "\r\n";
But this is my first crack at editing arne's code, and I'm totally new to Windows programming, so I can't get it right.
What I tried was:

Code: Select all

--- HttpConnection.cpp.orig     2002-12-28 02:31:50.000000000 +0100
+++ HttpConnection.cpp  2003-01-18 23:32:30.000000000 +0100
@@ -56,17 +56,31 @@
        socket->write("GET " + file + " HTTP/1.1\r\n");
        socket->write("User-Agent: DC++ v" VERSIONSTRING "\r\n");
        socket->write("Host: " + server + "\r\n");
+       socket->write("Connection: close\r\n");
        socket->write("Cache-Control: no-cache\r\n\r\n");
 }

 void HttpConnection::onLine(const string& aLine) {
        if(!ok) {
-               if(aLine.find("200") == string::npos) {
+               if(aLine.find("302") != string::npos) {
+                       moved302 = true;
+               } else if(aLine.find("200") == string::npos) {
                        socket->removeListener(this);
                        socket->disconnect();
                        fire(HttpConnectionListener::FAILED, this, aLine);
                }
                ok = true;
+       } else if(moved302 && aLine.find("Location") != string::npos) {
+               socket->removeListener(this);
+               socket->disconnect();
+               location302 = aLine.substr(10, aLine.length() - 11);
+               // reset all settings (as in constructor)
+               moved302 = false;
+               ok = false;
+               //port = 80;
+               size = -1;
+               //socket = NULL;
+               downloadFile(location302);
        } else if(aLine == "\x0d") {
                socket->setDataMode(size);
        } else if(aLine.find("Content-Length") != string::npos) {
(note that moved302 is a bool member and location302 is an unnecessary string member)

this is obviously wrong..
I can't seem to clear the socket buffer, and socket->disconnect() fires a FAILED event. I also don't know whether this is the correct way to do it (maybe a MOVED event has to be fired at someone?)
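One thing that may help regardless of the socket issue: aLine.find("302") matches "302" anywhere in the line, and find("Location") would also match a header like Content-Location. Below is a minimal sketch of stricter parsing. parseStatusCode and parseLocation are hypothetical helper names, not existing DC++ functions, and the sketch assumes each line arrives with an optional trailing "\r", as in onLine().

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Extracts the three-digit status code from a status line such as
// "HTTP/1.1 302 Moved Temporarily\r". Returns -1 for anything else.
int parseStatusCode(const std::string& line) {
    if (line.compare(0, 5, "HTTP/") != 0) return -1;
    std::string::size_type sp = line.find(' ');
    if (sp == std::string::npos || sp + 4 > line.size()) return -1;
    int code = 0;
    for (std::string::size_type i = sp + 1; i < sp + 4; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(line[i]))) return -1;
        code = code * 10 + (line[i] - '0');
    }
    assert(code >= 0 && code <= 999);
    return code;
}

// Returns the value of a "Location:" header (name matched case-insensitively
// at the start of the line), or an empty string for any other line.
std::string parseLocation(const std::string& line) {
    static const std::string name = "location:";
    if (line.size() < name.size()) return "";
    for (std::string::size_type i = 0; i < name.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(line[i])) != name[i]) return "";
    }
    // Skip whitespace after the colon and trim the trailing "\r".
    std::string::size_type begin = line.find_first_not_of(" \t\r\n", name.size());
    if (begin == std::string::npos) return "";
    std::string::size_type end = line.find_last_not_of(" \t\r\n");
    return line.substr(begin, end - begin + 1);
}
```

With helpers like these, the first line decides between 200, 302 and failure, and only a real Location header triggers the redirect, without the hard-coded substr(10, length - 11) offsets.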

I'm hoping some of you know how to do this correctly :)

/sed
http://dc.selwerd.nl/hublist.xml.bz2
http://www.b.ali.btinternet.co.uk/DCPlusPlus/index.html (TheParanoidOne's DC++ Guide)
http://www.dslreports.com/faq/dc (BSOD2600's Direct Connect FAQ)

sandos
Posts: 186
Joined: 2003-01-05 10:16
Contact:

Post by sandos » 2003-01-20 12:17

It seems to me you should probably fire the event up from the beginning; it's probably cleaner. For example, what if you've got multiple chained 302s? That won't work with your code, right?
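For the chained case, one common approach is to follow redirects in a loop with a hop limit, so a redirect cycle cannot spin forever. This is only a sketch under assumptions: Response, Fetch and resolveRedirects are hypothetical names, and fetch stands in for one complete request/response round trip rather than the event-driven socket code in HttpConnection.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical summary of one HTTP response: status code plus Location value.
struct Response {
    int status;
    std::string location;
};

// Hypothetical stand-in for "send one GET and read the response headers".
using Fetch = std::function<Response(const std::string& url)>;

// Follows at most maxHops 302 redirects starting at url.
// Returns the first URL that does not answer with 302, or an empty string
// if the limit is exceeded or a 302 arrives without a Location header.
std::string resolveRedirects(const std::string& url, const Fetch& fetch, int maxHops) {
    assert(maxHops >= 0);
    std::string current = url;
    for (int hop = 0; hop <= maxHops; ++hop) {
        Response r = fetch(current);
        if (r.status != 302) return current;
        if (r.location.empty()) return "";
        current = r.location;
    }
    return "";  // too many redirects, possibly a loop
}
```

The hop limit is a design choice; a small number such as 5 is usually plenty for a mirror redirector.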

sandos
Posts: 186
Joined: 2003-01-05 10:16
Contact:

Post by sandos » 2003-01-20 15:54

I did exactly what you did, and it only worked the first time around. It seems DC++ keeps the same object around until the next download, so the ok variable isn't reset. I just added ok = false; to the top of downloadFile(), and now it works every time.

This might also be causing bugs in normal DC++, I dunno?
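A way to make this kind of stale-flag bug harder to reintroduce is to keep all per-request fields in one struct and reset it whenever a download starts. The sketch below uses hypothetical RequestState and Downloader types, not the real HttpConnection, and leaves out the socket entirely.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical bundle of the fields that must start fresh for every request.
struct RequestState {
    bool ok = false;         // status line already seen
    bool moved302 = false;   // status line was a 302
    std::int64_t size = -1;  // Content-Length, -1 while unknown

    void reset() { *this = RequestState(); }
};

// Hypothetical stand-in for an object that is reused across downloads.
class Downloader {
public:
    void downloadFile(const std::string& url) {
        assert(!url.empty());
        state.reset();  // same effect as "ok = false;" but covers every field
        currentUrl = url;
    }

    void onStatusLine(int code) {
        state.moved302 = (code == 302);
        state.ok = true;
    }

    const RequestState& getState() const { return state; }

private:
    RequestState state;
    std::string currentUrl;
};
```

Any field added to RequestState later gets reset automatically, so a second download on the same object starts clean.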

sandos
Posts: 186
Joined: 2003-01-05 10:16
Contact:

Post by sandos » 2003-01-20 16:03

My diff:

Code: Select all

--- 0.22/client/HttpConnection.cpp      2003-01-20 22:02:18.000000000 +0100
+++ 0.22-orig/client/HttpConnection.cpp 2002-12-28 02:31:50.000000000 +0100
@@ -32,8 +32,6 @@
 void HttpConnection::downloadFile(const string& aUrl) {
        dcassert(Util::findSubString(aUrl, "http://") == 0);

-       ok = false;
-
        if(SETTING(HTTP_PROXY).empty()) {
                Util::decodeUrl(aUrl, server, port, file);
                if(file.empty())
@@ -64,31 +62,11 @@
 void HttpConnection::onLine(const string& aLine) {
        if(!ok) {
                if(aLine.find("200") == string::npos) {
-                       if(aLine.find("302") != string::npos){
-                               moved302 = true;
-                       } else {
-                               socket->removeListener(this);
-                               socket->disconnect();
-                               fire(HttpConnectionListener::FAILED, this, aLine);
-                       }
+                       socket->removeListener(this);
+                       socket->disconnect();
+                       fire(HttpConnectionListener::FAILED, this, aLine);
                }
                ok = true;
-       } else if(moved302 == true && aLine.find("Location") != string::npos){
-               socket->removeListener(this);
-               socket->disconnect();
-               location302 = aLine.substr(10, aLine.length() - 11);
-               // reset all settings (as in constructor)
-               moved302 = false;
-               ok = false;
-               port = 80;
-               size = -1;
-
-               if(socket) {
-                       socket->removeListener(this);
-                       BufferedSocket::putSocket(socket);
-               }
-               socket = NULL;
-               downloadFile(location302);
        } else if(aLine == "\x0d") {
                socket->setDataMode(size);
        } else if(aLine.find("Content-Length") != string::npos) {

sandos
Posts: 186
Joined: 2003-01-05 10:16
Contact:

Post by sandos » 2003-01-20 16:06

The header as well:

Code: Select all

--- 0.22/client/HttpConnection.h        2003-01-20 18:48:36.000000000 +0100
+++ 0.22-orig/client/HttpConnection.h   2002-07-24 15:56:50.000000000 +0200
@@ -48,7 +48,7 @@
 {
 public:
        void downloadFile(const string& aUrl);
-       HttpConnection() : ok(false), port(80), size(-1), socket(NULL), moved302(false) { };
+       HttpConnection() : ok(false), port(80), size(-1), socket(NULL) { };
        virtual ~HttpConnection() {
                if(socket) {
                        socket->removeListener(this);
@@ -65,9 +65,7 @@
        bool ok;
        short port;
        int64_t size;
-       bool moved302;
-       string location302;
-
+
        BufferedSocket* socket;

        // BufferedSocketListener

sandos
Posts: 186
Joined: 2003-01-05 10:16
Contact:

Post by sandos » 2003-01-20 16:07

Oops, seems my diffs were reversed. Ah well, just apply them with patch -R. :)

Sedulus
Forum Moderator
Posts: 687
Joined: 2003-01-04 09:32
Contact:

Post by Sedulus » 2003-01-20 17:30

fantastic work sandos! :)
it works, great..

I added a "Connection: close" again... I suppose it might save the server some tiny bit of resources.
(I believe HTTP/1.1 has keep-alive as the default)
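On that point: HTTP/1.1 connections are indeed persistent unless one side sends "Connection: close". A small sketch of the request as one string makes the header order and the final blank line easy to check. buildGetRequest is a hypothetical helper, and the user agent is passed in as a plain string instead of using the VERSIONSTRING macro.

```cpp
#include <cassert>
#include <string>

// Builds a complete HTTP/1.1 GET request that asks the server to close the
// connection after the response. Hypothetical helper, not part of DC++.
std::string buildGetRequest(const std::string& server,
                            const std::string& file,
                            const std::string& userAgent) {
    assert(!server.empty() && !file.empty());
    std::string req;
    req += "GET " + file + " HTTP/1.1\r\n";
    req += "User-Agent: " + userAgent + "\r\n";
    req += "Host: " + server + "\r\n";
    req += "Connection: close\r\n";        // no further requests on this socket
    req += "Cache-Control: no-cache\r\n";
    req += "\r\n";                         // blank line ends the header block
    return req;
}
```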

Code: Select all

--- HttpConnection.cpp.orig     2002-12-28 02:31:50.000000000 +0100
+++ HttpConnection.cpp  2003-01-20 23:18:32.000000000 +0100
@@ -56,17 +58,38 @@
        socket->write("GET " + file + " HTTP/1.1\r\n");
        socket->write("User-Agent: DC++ v" VERSIONSTRING "\r\n");
        socket->write("Host: " + server + "\r\n");
+       socket->write("Connection: close\r\n");         // we don't intend to do more on same connection
        socket->write("Cache-Control: no-cache\r\n\r\n");
 } 
  
Every mirror should easily be able to run a cron'ed wget every night to update its copy. Mirrors could even host multiple (i.e. different) copies, to increase the diversity of the results.

Now let's hope that arne likes this scheme, so we can start adding mirrors
...assuming someone has a box that can handle the CPU load of the above Perl script (1e4 times a day, wasn't it?)

if you want to test this, I have the perl script set-up at:
http://wza.digitalbrains.com/cgi-bin/Pu ... ist.config

Meltingfire
Posts: 9
Joined: 2003-01-06 07:23
Location: Malmo, Sweden
Contact:

Post by Meltingfire » 2003-01-22 03:15

I was looking into Apache's powerful mod_rewrite module, which is a URL rewriting engine.
http://httpd.apache.org/docs/mod/mod_rewrite.html

When looking through the docs, I found something called "load balancing"
http://httpd.apache.org/docs/misc/rewriteguide.html

*copied from webpage*

Load Balancing

Description:
Suppose we want to load balance the traffic to www.foo.com over www[0-5].foo.com (a total of 6 servers). How can this be done?

Solution:
There are a lot of possible solutions for this problem. We will discuss first a commonly known DNS-based variant and then the special one with mod_rewrite:

1. DNS Round-Robin
The simplest method for load-balancing is to use the DNS round-robin feature of BIND. Here you just configure www[0-5].foo.com as usual in your DNS with A(address) records, e.g.

Code: Select all

www0   IN  A       1.2.3.1
www1   IN  A       1.2.3.2
www2   IN  A       1.2.3.3
www3   IN  A       1.2.3.4
www4   IN  A       1.2.3.5
www5   IN  A       1.2.3.6


Then you additionally add the following entry:

Code: Select all

www    IN  CNAME   www0.foo.com.
       IN  CNAME   www1.foo.com.
       IN  CNAME   www2.foo.com.
       IN  CNAME   www3.foo.com.
       IN  CNAME   www4.foo.com.
       IN  CNAME   www5.foo.com.


Notice that this seems wrong, but is actually an intended feature of BIND and can be used in this way. However, now when www.foo.com gets resolved, BIND gives out www0-www5, but in a slightly permuted/rotated order every time. This way the clients are spread over the various servers. But notice that this is not a perfect load balancing scheme, because DNS resolve information gets cached by the other nameservers on the net, so once a client has resolved www.foo.com to a particular wwwN.foo.com, all subsequent requests also go to this particular name wwwN.foo.com. But the final result is ok, because the total sum of the requests is really spread over the various webservers.

2. DNS Load-Balancing
A sophisticated DNS-based method for load-balancing is to use the program lbnamed which can be found at http://www.stanford.edu/~schemers/docs/ ... named.html. It is a Perl 5 program in conjunction with auxiliary tools which provides real load balancing for DNS.

3. Proxy Throughput Round-Robin
In this variant we use mod_rewrite and its proxy throughput feature. First we dedicate www0.foo.com to be actually www.foo.com by using a single

Code: Select all

www    IN  CNAME   www0.foo.com.


entry in the DNS. Then we convert www0.foo.com to a proxy-only server, i.e. we configure this machine so all arriving URLs are just pushed through the internal proxy to one of the 5 other servers (www1-www5). To accomplish this we first establish a ruleset which contacts a load balancing script lb.pl for all URLs.

Code: Select all

RewriteEngine on
RewriteMap    lb      prg:/path/to/lb.pl
RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]


Then we write lb.pl:

Code: Select all

#!/path/to/perl
##
##  lb.pl -- load balancing script
##

$| = 1;

$name   = "www";     # the hostname base
$first  = 1;         # the first server (not 0 here, because 0 is myself) 
$last   = 5;         # the last server in the round-robin
$domain = "foo.com"; # the domainname

$cnt = 0;
while (<STDIN>) {
    $cnt = (($cnt+1) % ($last+1-$first));
    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    print "http://$server/$_";
}

##EOF##


A last notice: Why is this useful? Seems like www0.foo.com still is overloaded? The answer is yes, it is overloaded, but with plain proxy throughput requests, only! All SSI, CGI, ePerl, etc. processing is completely done on the other machines. This is the essential point.
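For anyone who reads C++ more easily than Perl, here is a rough port of the rotation logic in lb.pl. It is only a sketch: RoundRobin is a hypothetical class name, and it covers just the URL selection, not the RewriteMap prg: protocol of reading requests from stdin and printing one answer per line.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Cycles through name<first>.domain .. name<last>.domain, mirroring the
// counter arithmetic of lb.pl (the counter is advanced before it is used).
class RoundRobin {
public:
    RoundRobin(std::string name, int first, int last, std::string domain)
        : name_(std::move(name)), domain_(std::move(domain)),
          first_(first), count_(last + 1 - first), cnt_(0) {
        assert(last >= first);
    }

    // Returns the proxied URL for the given request path.
    std::string next(const std::string& path) {
        cnt_ = (cnt_ + 1) % count_;
        return "http://" + name_ + std::to_string(cnt_ + first_) + "." +
               domain_ + "/" + path;
    }

private:
    std::string name_;
    std::string domain_;
    int first_;
    int count_;
    int cnt_;
};
```

Because the counter is incremented first, the very first request goes to the second server in the range, and the first server is used once the counter wraps around.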

4. Hardware/TCP Round-Robin
There is a hardware solution available, too. Cisco has a beast called LocalDirector which does a load balancing at the TCP/IP level. Actually this is some sort of a circuit level gateway in front of a webcluster. If you have enough money and really need a solution with high performance, use this one.