# Hashing--TTH

Archived discussion about features (predating the use of Bugzilla as a bug and feature tracker)

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-05 12:23

:roll: Some observations for your clarification.

1. Unfinished file names. In previous versions the unfinished file name had an identity and could be tracked easily. Now, if a file has a TTH, its name is hidden in a junk string. Moreover, since the extensions are concealed, the icons are missing! It is difficult to run the file through, say, RealPlayer.

Since TTH has relevance only within the DC++ program, why not continue the previous naming scheme?

2. Let us say I have a file in the queue with a TTH. Will DC++ allow an alternate download from a user who has the exact file but has not hashed it? If not, the purpose is defeated unless everyone hashes the file?

Nagan

Punkishlyevil
Posts: 25
Joined: 2004-02-26 22:14
Location: Wisconsin, USA

Post by Punkishlyevil » 2004-09-05 12:27

nagan wrote: If not, the purpose is defeated unless everyone hashes the file?
Everyone should have their files hashed. That's why it's mandatory in v. 0.401, and v. 0.4032 shares only hashed files.

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-05 12:44

:roll: What if some users are not using >0.401? Will their shares not materialise?

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-09-06 00:44

No, but you shouldn't be downloading from them anyway since it can often cause corruption. My small private hub recently enforced a 0.401-minimum-version rule, and I'd like to see larger public hubs do the same. And why are you using realplayer when you could use a good player that ignores the filenames and looks at the actual header of the file to determine what it is, like VLC?

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-06 10:56

Until hubs enforce hashing or a minimum DC++ version, I think the later versions should give the user a choice: if the file name and size match and the rollback works fine, DC++ should be able to download the file even when the other user has not hashed it (as was done prior to 0.401). Hashing is just another advantage and should not act as a drawback.

2. If a user opts for the later version, what happens to the many files in the old download queue that do not have a TTH?

3. The previous naming of incomplete files was ideal. Putting 38 characters of junk in front of the original file name and concealing the file extension is an irritant. Let the file name come first. If I want to check a file in the incomplete download folder, I have trouble finding it. Since the TTH information resides in the DC++ queue, transferring it to the file name seems a little trivial (is it really needed?). I thought the old naming scheme would still work. TTH in the program is basically there to identify the right files within the hub! Once they are identified, the download could take place as per the old procedure. Even prior to 0.401, each incomplete file in the queue had a unique 8-letter junk string (after the file name) in the incomplete download folder. So adding TTH should not matter, because the main comparison and identification take place within the program and not outside!

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-06 17:28

nagan wrote: Since the TTH information resides in the DC++ queue, transferring it to the file name seems a little trivial (is it really needed?)
If your Queue.xml is lost, you may very well appreciate that incomplete files have the TTH of the download in them. That's why it was changed.

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-07 09:29

OK Sir! Let us assume that the Queue.xml is lost! How do you think the incomplete files could be resumed with the TTH? Does the TTH have to be manually copied from the incomplete file name and searched for from the program?
What prevents the junk from following the original file name (as done prior to 0.401)? That would make checking the files being downloaded easy. Leaving the file extensions intact, without a dctmp extension, would also be an improvement.

What about query 1 of my previous post?

ivulfusbar
Posts: 506
Joined: 2003-01-03 07:33

Post by ivulfusbar » 2004-09-07 09:56

I, for example, would be able to recreate my queue directly. It's rather simple. It's a matter of writing a one-line shell script, which many are capable of.
Everyone is supposed to download from the hubs, - I don't know why, but I never do anymore.

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-08 09:53

ivulfusbar wrote: I, for example, would be able to recreate my queue directly. It's rather simple. It's a matter of writing a one-line shell script, which many are capable of.
Kindly let us know thy secret of the shell?

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-09 22:15

nagan wrote: OK Sir! Let us assume that the Queue.xml is lost! How do you think the incomplete files could be resumed with the TTH?
After one of the latest posts about a poor user who lost his queue and was manually trying to find the right source for it (if he's downloading a video, there may be a lot of sources for it), I decided to change the name of the incomplete temp file, and then write a feature that will scan a given directory, pick out the incomplete DC++ files, and make a queue of their original names and hashes. This will be implemented in an upcoming version of DC++, as will the "Preview partial file" menu choice. Together they address the two biggest concerns about my change.

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-10 09:38

GargoyleMT wrote: I decided to change the name of the incomplete temp file, and then write a feature that will scan a given directory, pick out the incomplete DC++ files, and make a queue of their original names and hashes. This will be implemented in an upcoming version of DC++, as will the "Preview partial file" menu choice, which address the two biggest concerns about my change.
Methinks the file name at the beginning, with a junk tail, is better for checking files as they download! Since you are writing a well-defined program for it, you could as well leave the file extensions intact.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-10 10:57

nagan wrote: Since you are writing a well-defined program for it, you could as well leave the file extensions intact.
.dctmp will stay on the end - it prevents incomplete files, if shared, from being returned with regular search hits (it's the extension that determines the type - an .avi.dctmp file will not be returned for Video searches, but a *dctmp.*.avi file will be). The preview function will work on the embedded extension, so I don't see any reason why incompletes should be changed in the filesystem as well.
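
The point about the final extension deciding the search type can be illustrated with a small sketch. This is not DC++'s actual search code: `lastExtension`, `matchesVideoSearch`, and the list of video extensions are hypothetical names and values chosen for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Returns the text after the last '.', lower-cased, or "" if there is none.
std::string lastExtension(const std::string& fileName) {
    const std::string::size_type dot = fileName.rfind('.');
    if (dot == std::string::npos || dot + 1 == fileName.size()) {
        return "";
    }
    std::string ext = fileName.substr(dot + 1);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// Hypothetical video filter: only the final extension is consulted,
// so anything ending in .dctmp never counts as a video.
bool matchesVideoSearch(const std::string& fileName) {
    static const char* const kVideoExts[] = {"avi", "mpg", "mpeg", "mov", "wmv"};
    const std::string ext = lastExtension(fileName);
    return std::any_of(std::begin(kVideoExts), std::end(kVideoExts),
                       [&ext](const char* v) { return ext == v; });
}
```

Under these assumptions, `matchesVideoSearch("movie.avi.dctmp")` is false while `matchesVideoSearch("dctmp.movie.avi")` is true, which is the difference described in the post.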

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-11 07:25

Thanks, Gargoyle, for the reply. But in future versions, can't you make an incomplete folder mandatory and add an algorithm to prevent it from being shared? Simple?

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-11 11:07

nagan wrote: Thanks, Gargoyle, for the reply. But in future versions, can't you make an incomplete folder mandatory and add an algorithm to prevent it from being shared? Simple?
DC++ already has a mandatory incomplete folder and refuses to share your current incomplete folder.

Code:

 -- 0.4032 2004-08-08 --
* Unfinished files now have a slightly different naming scheme (thanks garg)
* Added default unfinished folder (thanks garg)

 -- 0.302 2003-11-14 --
* Temporary downloads folder no longer shared
So sharing incomplete files isn't normally a problem. However, you can share old incomplete folders, and incomplete files moved out of the incomplete folder.

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-16 11:50

:) OK, nailed! But you may consider having the file name at the beginning.
While on this subject: it is a common rule that downloaded files need to be sorted for convenience, which entails moving them from the default download folder to different directories. Of course they will be shared. Now comes the issue! DC++, as per norms, checks for duplicates only in the default download directory. I may end up downloading ABC.mpeg again (when it has already been downloaded and moved to another folder). With TTH in force, can you perform a check on all the shared directories so as to quell duplicates? Herculean task?

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-09-16 15:10

nagan wrote: Now comes the issue! DC++, as per norms, checks for duplicates only in the default download directory. I may end up downloading ABC.mpeg again (when it has already been downloaded and moved to another folder). With TTH in force, can you perform a check on all the shared directories so as to quell duplicates? Herculean task?
This has been requested before, and is probably already in the bug tracker. If for some reason it's not, feel free to add it; I'll vote for it.
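
One way such a check could work is to index the share by TTH and consult that index before queuing a download. The sketch below is only an illustration of the idea, not how DC++ implements it: `ShareIndex` and `shouldDownload` are hypothetical names, and TTH roots are treated as plain Base32 strings.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical index of the share: Base32 TTH root -> path of the shared file.
class ShareIndex {
public:
    // Records a hashed file. Returns false if that TTH is already present,
    // which means the new file duplicates one that is already shared.
    bool add(const std::string& tth, const std::string& path) {
        return byTth_.emplace(tth, path).second;
    }

    // Returns the path already holding this content, or "" if there is none.
    std::string findByTth(const std::string& tth) const {
        const auto it = byTth_.find(tth);
        return it == byTth_.end() ? std::string() : it->second;
    }

private:
    std::unordered_map<std::string, std::string> byTth_;
};

// A download is worth queuing only if no shared file has the same TTH,
// regardless of which directory that file was moved to.
bool shouldDownload(const ShareIndex& share, const std::string& tth) {
    return share.findByTth(tth).empty();
}
```

Because the lookup is keyed on content rather than on name or folder, a file moved out of the download directory would still be recognised, as long as it is shared and hashed.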

If I'm reading the recent CVS code right, the incomplete file naming scheme has indeed been switched around to $filename.$ext.$tth.dctmp

Code:

string QueueManager::getTempName(const string& aFileName, const TTHValue* aRoot) {
	string tmp(aFileName);
	if(aRoot != NULL) {
		TTHValue tmpRoot(*aRoot);
		tmp += "." + tmpRoot.toBase32();
	}
	tmp += TEMP_EXTENSION;
	return tmp;
}
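
Going the other way, a name built with this scheme could be split back into the original file name and the TTH, which is what rebuilding a lost queue from the incomplete folder would need. The following is a rough sketch and not code from DC++: `TempNameParts`, `looksLikeTth`, and `parseTempName` are hypothetical, `TEMP_EXTENSION` is assumed to be `.dctmp`, and the TTH is assumed to be 39 Base32 characters (a 192-bit Tiger root).

```cpp
#include <string>

struct TempNameParts {
    bool valid = false;    // true only if the name matched the expected pattern
    std::string fileName;  // original name, including its own extension
    std::string tth;       // Base32 TTH root taken from the name
};

// Assumption: a 192-bit root encodes to 39 Base32 characters (A-Z, 2-7).
bool looksLikeTth(const std::string& s) {
    if (s.size() != 39) return false;
    for (char c : s) {
        const bool letter = c >= 'A' && c <= 'Z';
        const bool digit = c >= '2' && c <= '7';
        if (!letter && !digit) return false;
    }
    return true;
}

// Splits "<filename>.<ext>.<tth>.dctmp" back into file name and TTH.
TempNameParts parseTempName(const std::string& tempName) {
    static const std::string kTempExt = ".dctmp";  // assumed value of TEMP_EXTENSION
    TempNameParts out;
    if (tempName.size() <= kTempExt.size() ||
        tempName.compare(tempName.size() - kTempExt.size(), kTempExt.size(), kTempExt) != 0) {
        return out;  // does not end in .dctmp
    }
    const std::string stem = tempName.substr(0, tempName.size() - kTempExt.size());
    const std::string::size_type dot = stem.rfind('.');
    if (dot == std::string::npos || dot == 0) return out;
    const std::string tth = stem.substr(dot + 1);
    if (!looksLikeTth(tth)) return out;  // e.g. a temp file queued without a TTH
    out.valid = true;
    out.fileName = stem.substr(0, dot);
    out.tth = tth;
    return out;
}
```

Temp files created without a TTH (the `aRoot == NULL` branch above) carry no hash in their name, so this sketch reports them as not valid rather than guessing.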

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-16 18:50

PseudonympH wrote:If I'm reading the recent CVS code right, the incomplete file naming scheme has indeed been switched around to $filename.$ext.$tth.dctmp
True, it was changed last Saturday.

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-18 12:41

:oops: Can you explain what CVS is and what the abbreviation stands for? Can we expect to see the implementation in a later version, or can a patch be made?

BSOD2600
Forum Moderator
Posts: 503
Joined: 2003-01-27 18:47
Location: USA

Post by BSOD2600 » 2004-09-18 13:52

" CVS -- or Concurrent Versioning System -- is a system for managing simultaneous development of files. It is in common use in large programming projects, and is also useful to system administrators, technical writers, and anyone who needs to manage files."

"CVS stores files in a central repository, set (using standard Unix permissions) to be accessible to all users of the files. Commands are given to "check out" a copy of a file for development, and "commit" changes back to the repository. It also scans the files as they are moved to and from the repository, to prevent one person's work from overwriting another's."

"This system ensures that a history of the file is retained..."


You can browse the CVS here: http://cvs.sourceforge.net/viewcvs.py/d ... cplusplus/

joakim_tosteberg
Forum Moderator
Posts: 587
Joined: 2003-05-07 02:38
Location: Sweden, Linkoping

Post by joakim_tosteberg » 2004-09-18 13:54

CVS is the place where the latest source code is stored. This source is often unstable, contains bugs or half-implemented features, and that's why it's just stored in the CVS repository. When arne feels that the code is ready for a new binary release, he releases it.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-19 14:34

joakim_tosteberg wrote: This source is often unstable, contains bugs or half-implemented
Let's just say it's usually got big changes, and putting those changes in CVS allows other people to test it. (It nearly always compiles and runs, just doesn't work the same way a release does.)

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-22 11:37

:) While on this subject:-
1. I find that files hashed on starting DC++ don't find a place in the file list, because the file list is refreshed later. Embarrassing! If I sort files and rearrange them into directories and then start DC++, hashing takes place and excludes those files from the share, which could affect the share size. The same happens when the user runs 403 for the first time: after hashing, the file list will show zero bytes! He has to restart to get all the files reflected in the share. Why not make DC++ a little courteous and have it ask the user at startup whether he intends to refresh the file list when hashing has been needed (for the above reasons)?

2. Is there some sort of log file for hashing that shows only which files are indeed duplicates, so they can be removed from the disk as well?

3. Can hashing (refreshing) be done by the user, like user list refreshing?

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-22 14:12

nagan wrote: 1. I find that files hashed on starting DC++ don't find a place in the file list, because the file list is refreshed later ... He has to restart to get all the files reflected in the share.
Any refresh in >= 0.4032 will list files that have been hashed since the last refresh, so there's no need to restart. The development version updates your file list up to every 15 minutes to include files that have been hashed recently. So, there's nothing more to be done for the next version on that front.
nagan wrote: 2. Is there some sort of log file for hashing that shows only which files are indeed duplicates, so they can be removed from the disk as well?
Yes, this has been in DC++ for a while now - enable the "Log System Messages" in Logs and Sounds.
nagan wrote: 3. Can hashing (refreshing) be done by the user, like user list refreshing?
Yes, see point 1.

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-23 10:37

GargoyleMT wrote: Any refreshes in >= 0.4032 will list files that have been hashed since the last refresh, there's no need to restart. The development version updates your file list up to every 15 minutes to include files that have been hashed recently. So, there's nothing more to be done for the next version on that front.
Well what is the point if the user is not able to connect to the hubs for 15 minutes in case he runs it for the first time or he happens to rearrange files that could affect the share until refreshed?

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-09-23 15:26

He should twiddle his thumbs for 15 minutes, what else?

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-24 18:35

nagan wrote:Well what is the point if the user is not able to connect to the hubs for 15 minutes in case he runs it for the first time or he happens to rearrange files that could affect the share until refreshed?
Given that DC++ is sharing only hashed files, what do you expect the solution to be? At 10-30 megabytes per second, 9,000 - 27,000 mb will be done in those initial minutes.

I'd suggest writing an email to an old friend, going outside for a breath of fresh air, or registering to vote.

Xan1977
Forum Moderator
Posts: 627
Joined: 2003-06-05 20:15

Post by Xan1977 » 2004-09-24 19:17

GargoyleMT wrote:or registering to vote.
hear, hear!

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-24 19:26

Xan1977 wrote:hear, hear!
Two quips I saw today, one for each party:
"Boots or Flip-Flops?"
"Stop mad cowboy disease, vote for Kerry"

I know the USA is one of the worst countries for voter turnout, but none of them are spectacular - so my original advice stands. ;)

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-28 11:53

GargoyleMT wrote:Given that DC++ is sharing only hashed files, what do you expect the solution to be? At 10-30 megabytes per second, 9,000 - 27,000 mb will be done in those initial minutes.
I'd suggest writing an email to an old friend, going outside for a breath of fresh air, or registering to vote.
I appreciate your way of making it clear that this point is not to be considered now!
Well, do shares need to be so large, 9000-27000 MB? Hmm...

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-28 13:19

nagan wrote: Well, do shares need to be so large, 9000-27000 MB? Hmm...
9 - 27 gigabytes is not a large share, and that's what DC++ will hash in 15 minutes on an average system. (at 10-30 mb/s)
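
The figures follow from simple arithmetic: 15 minutes is 900 seconds, so a steady 10-30 MB/s works out to 9,000-27,000 MB. A tiny sketch of that calculation, with `hashedMegabytes` as a hypothetical helper and the rates taken from the post rather than measured:

```cpp
// Megabytes hashed in `minutes` at a steady `megabytesPerSecond`.
constexpr long long hashedMegabytes(long long megabytesPerSecond, long long minutes) {
    return megabytesPerSecond * minutes * 60;  // 60 seconds per minute
}

// The two ends of the quoted range for a 15-minute window.
static_assert(hashedMegabytes(10, 15) == 9000, "10 MB/s for 15 min is 9,000 MB");
static_assert(hashedMegabytes(30, 15) == 27000, "30 MB/s for 15 min is 27,000 MB");
```

By the same arithmetic, a share that hashes in 5 minutes at 10 MB/s is about 3,000 MB.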

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-09-30 08:06

GargoyleMT wrote:
nagan wrote: Well, do shares need to be so large, 9000-27000 MB? Hmm...
9 - 27 gigabytes is not a large share, and that's what DC++ will hash in 15 minutes on an average system. (at 10-30 mb/s)
Well, people with smaller shares, whose hashing completes in 5 to 10 minutes, might have to wait for some time or restart DC++ again.

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-09-30 14:26

Or they could be clueful and use /refresh (assuming you're not talking about 0.4033, which is a little different)

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-09-30 19:10

It was my understanding that we were talking about the changes in CVS that became 0.4033.


For people with smaller shares who finish hashing in 5-10 minutes, refreshing is positively not an issue, if you've been paying attention to what I've been saying....

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-10-05 09:14

You have got a point!! Why not bring the "refresh" button to the main screen. That way the new users will have a "clue".

PseudonympH
Forum Moderator
Posts: 366
Joined: 2004-03-06 02:46

Post by PseudonympH » 2004-10-05 13:41

Maybe because it's already a menu option, a /command, and a keyboard shortcut?

cologic
Programmer
Posts: 337
Joined: 2003-01-06 13:32

Post by cologic » 2004-10-05 23:30

No, I agree with Nagan; those provide too little of a clue. I propose:
Image
Image
In fact, not one, but fully five of my toolbar icons are "Refresh".

Todi
Forum Moderator
Posts: 699
Joined: 2003-03-04 12:16

Post by Todi » 2004-10-06 03:07

I propose we put in a nag-screen every 10 minutes that asks if the user wants to refresh.

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-10-06 10:25

Posted by gargoyle on 22-09-04
GargoyleMT wrote:
nagan wrote: 2. Is there some sort of log file for hashing that shows only which files are indeed duplicates, so they can be removed from the disk as well?
Yes, this has been in DC++ for a while now - enable the "Log System Messages" in Logs and Sounds.
System logs grow daily with lots of routine data, and the duplicate files have to be searched for. Wouldn't a log showing only the duplicate files be more useful? Rather than a report (which may get deleted), wouldn't it be useful if it were interactive, produced on request from the details in the "Hash Data"?

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-10-06 10:57

PseudonympH wrote:Maybe because it's already a menu option, a /command, and a keyboard shortcut?
Hear, hear.
nagan wrote: Wouldn't a log showing only the duplicate files be more useful? Rather than a report (which may get deleted)
If you delete the file that contains the information you want, that's too bad.

Eventually, the system log may have "areas" within it (anyone who has read /var/log/messages knows what I'm talking about), but splitting duplicates out into a separate log file is absolutely ludicrous - it's the exact opposite of the purpose of system.log.

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-10-07 09:33

Gargoyle wrote:
PseudonympH wrote:Maybe because it's already a menu option, a /command, and a keyboard shortcut?
Hear, hear..
Might as well hear the comment (with proof) just 2 posts earlier.
Gargoyle wrote: If you delete the file that contains the information you want, that's too bad. Eventually, the system log may have "areas" within it (anyone who has read /var/log/messages knows what I'm talking about), but splitting duplicates out into a separate log file is absolutely ludicrous - it's the exact opposite of the purpose of system.log.
Agreed. After every refresh, all the duplicates are listed at the end. What I was talking about was a function to list all the duplicate files, each with a checkbox. The user can check them, and DC++ can send them directly to the recycle bin.

As for the log messages: rather than listing all the files hashed (which is expected and obvious), can only those that went wrong be listed, along with the other necessary logs, of course? If a user can read any useful data out of a log of thousands of hashed files, well, no complaints.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-10-12 10:02

nagan wrote:Might as well hear the comment (with proof) just 2 posts earlier.
I saw nothing that would convince me. Want to repeat it?
nagan wrote: What I was talking about was a function to list all the duplicate files, each with a checkbox. The user can check them, and DC++ can send them directly to the recycle bin.
Why add another feature to DC++ when specialized applications do the same thing, and do a better job of it?
Duplic8 - Duplicate File finder and manager

If users wanted to remove the duplicates from their share, a proper tool is only a google search away...

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-10-12 12:01

GargoyleMT wrote: I saw nothing that would convince me. Want to repeat it?
Well, the post from "cologic" on 6-10-04 (dd-mm-yy).
GargoyleMT wrote: Why add another feature to DC++ when specialized applications do the same thing, and do a better job of it? If users wanted to remove the duplicates from their share, a proper tool is only a google search away...
Gee, thanks, Gargoyle! So long as the technology employed by the other tool is as perfect as hashing, so that it does not contradict DC++.

Meanwhile, I posted a request to Bugzilla with the theme "Resume function for file lists", as DC++ always starts downloading file lists from the beginning, even after an interruption or disconnection. I am not able to locate its status.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-10-12 14:10

nagan wrote: Well, the post from "cologic" on 6-10-04 (dd-mm-yy).
He was mocking you, with the hope of making you see how ridiculous your request was.
nagan wrote: Gee, thanks, Gargoyle! So long as the technology employed by the other tool is as perfect as hashing, so that it does not contradict DC++.
Gee, if you want more bloat added to DC++, feel free to try to get arne to code that feature for you.
nagan wrote: Meanwhile, I posted a request to Bugzilla with the theme "Resume function for file lists", as DC++ always starts downloading file lists from the beginning, even after an interruption or disconnection. I am not able to locate its status.
Bug 194: Resume file list transfers?

It's in bugzilla. What more do you want to know?

nagan
Posts: 28
Joined: 2004-09-05 12:11

Post by nagan » 2004-10-22 05:28

Thanks, cologic, for taking pains to prove a point. But I still feel that adding a refresh file list icon to the toolbar will add more "Beauty" to the "Brain" (DC++).
