improve the remove duplicate thing

Archived discussion about features (predating the use of Bugzilla as a bug and feature tracker)


Locked
bsbartmary
Posts: 5
Joined: 2004-06-14 19:56


Post by bsbartmary » 2004-06-27 01:34

OK, since I went through and took out all the possible dupes of a certain file and it still says that it doesn't want to share it, there might be a problem with the feature. I've read that it goes by byte size, and since I have a lot of Windows Media files and MP3s that are close to the same size, it might mistake one for the other.

But anyway, here are some ways this might be improved:

1. Let us see what it sees as the same file (then users can determine whether it is a dupe and delete it).

2. Let us have the option of saying that I want to share this file and have it count towards my share.

These are just suggestions, but as it stands the feature is just frustrating, because all I can do is see what it thinks are dupes, with no clue how to go about finding what each one is a copy of. I don't want to go through all the folders searching for every file that is the exact size of another.

joakim_tosteberg
Forum Moderator
Posts: 587
Joined: 2003-05-07 02:38
Location: Sweden, Linkoping

Post by joakim_tosteberg » 2004-06-27 02:56

It did go by size in the past, but not in the latest version. The latest version uses each file's TTH value for duplicate matching instead, so it will only count files as dupes if their contents are identical.
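To illustrate matching by content rather than by size, here is a minimal Python sketch that groups the files under a folder by a hash of their bytes. It is an assumption-laden stand-in, not DC++'s code: it uses SHA-256 from Python's hashlib instead of TTH, and the function names are hypothetical. Files only land in the same group when their contents are byte-for-byte identical (barring a hash collision), and the report also names which file of each group was seen first, which is the "original vs. dupe" view asked for above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash only the file's bytes; the name and extension play no part."""
    h = hashlib.sha256()  # stand-in for TTH in this sketch
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each content hash to all files under root that share it (groups of 2+ only)."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        groups[content_hash(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def report(root: Path) -> None:
    """Print each duplicate group; 'original' is simply the first path in sorted order."""
    for paths in find_duplicates(root).values():
        print(f"original:  {paths[0]}")
        for dupe in paths[1:]:
            print(f"  duplicate: {dupe}")
```

Identical files have no inherent "original", so picking the first path in sorted order is an arbitrary choice made for this sketch; calling `report(Path("C:/shared"))` (a hypothetical folder) would list each group.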

TheParanoidOne
Forum Moderator
Posts: 1420
Joined: 2003-04-22 14:37

Re: improve the remove duplicate thing

Post by TheParanoidOne » 2004-06-27 04:29

bsbartmary wrote: Since I've read that it goes by byte sizes
As has already been mentioned, it doesn't. It goes by hash, i.e. the actual file content. Your basic assumption is incorrect, so you may want to revise your suggestions accordingly. I will answer your points, though.
bsbartmary wrote: 1. let us see what it sees as the same file (then users can determine if it is a dupe and delete it)
Enable system logging (Settings --> Logs and Sounds) to see which files are being marked as duplicates.
bsbartmary wrote: 2. let us have the option of saying that I want to share this file and have it count towards my share
From what I know, there will never be an option to let duplicate files count towards the share size, as that could so easily be abused. There is, however, the following option: Settings --> Advanced --> Keep duplicate files in your file list (duplicates never count towards your share size).

The option name is fairly self-explanatory. Duplicates are always excluded from your share size, but you can choose whether or not they appear in the file list.
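A rough Python model of the behaviour described above, assuming each shared file carries a content hash: the share size adds up each distinct hash only once, while the "keep duplicates" setting only changes which entries appear in the file list. `SharedFile`, `share_size`, `file_list` and `keep_duplicates` are hypothetical names for illustration, not DC++ internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedFile:
    path: str
    size: int  # bytes
    tth: str   # content hash; any content hash works for this sketch


def share_size(files: list[SharedFile]) -> int:
    """Sum file sizes, counting each distinct content hash only once."""
    seen: set[str] = set()
    total = 0
    for f in files:
        if f.tth in seen:
            continue  # duplicate content never adds to the share size
        seen.add(f.tth)
        total += f.size
    return total


def file_list(files: list[SharedFile], keep_duplicates: bool) -> list[SharedFile]:
    """Entries shown to other users; duplicates appear only if keep_duplicates is set."""
    if keep_duplicates:
        return list(files)
    seen: set[str] = set()
    listed = []
    for f in files:
        if f.tth not in seen:
            seen.add(f.tth)
            listed.append(f)
    return listed
```

With two 700-byte files sharing a hash plus one 5-byte file, `share_size` returns 705 either way; only the number of listed entries changes with the setting.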
bsbartmary wrote: these are just suggestions but as it stands the feature is just frustrating b/c all I can do is see what it thinks of as dupes and have no clue what to do next to go through and find what it is a copy of. I don't want to go through all the folders and search for each file that is the exact size of another.
Perhaps the information given here will reduce some of your frustration.

As an aside, where did you read this piece of information (regarding matching by byte size)? If it can be amended to show the correct information, it would stop any future confusion.

bsbartmary
Posts: 5
Joined: 2004-06-14 19:56

Re: improve the remove duplicate thing

Post by bsbartmary » 2004-06-27 10:44

TheParanoidOne wrote:
bsbartmary wrote: Since I've read that it goes by byte sizes
bsbartmary wrote: 1. let us see what it sees as the same file (then users can determine if it is a dupe and delete it)
Enable system logging (Settings --> Logs and Sounds) to see what files are being marked as duplicates.
Maybe I read something wrong.

Logging the files doesn't help me.

Here is my situation:

I have a set of online fan club (olfc) videos of the Backstreet Boys, and I might have dupes in other folders. But it marks files in my set of olfc vids as the dupes, when what I want is to get rid of the dupes in the other folders. Plus, I took all of my Windows Media files out of the other folders to see if it would stop marking the olfc vids as dupes, but it still logged one as a dupe, and I have no clue how to get to the one it thinks of as the original.

If there were some way to log both the original and the dupe, it would be helpful, but right now it is frustrating. I don't want to take anything away from my olfc vids, but I'm not exactly sure what else it wants me to delete. :cry:

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-06-27 11:35

Search your own file list by TTH, or use Windows Explorer to search by size. It's not easy, but you have enough information to get the job done.
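The two-step search suggested here can be sketched in Python: first narrow the candidates to files of exactly the same size (roughly what a size search in Windows Explorer gives you), then confirm each candidate by hashing, since equal size alone proves nothing. `find_copies_of` is a hypothetical helper, SHA-256 stands in for TTH, and it scans folders on disk rather than reading DC++'s own file list.

```python
import hashlib
from pathlib import Path


def _digest(path: Path) -> bytes:
    """SHA-256 of the file contents (a stand-in for TTH in this sketch)."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()


def find_copies_of(target: Path, root: Path) -> list[Path]:
    """Return files under root whose contents are identical to target."""
    target = target.resolve()
    size = target.stat().st_size
    target_digest = _digest(target)
    matches = []
    for path in root.rglob("*"):
        if not path.is_file() or path.resolve() == target:
            continue
        # Step 1: equal size is only a cheap pre-filter.
        if path.stat().st_size != size:
            continue
        # Step 2: only an equal content hash marks a real duplicate.
        if _digest(path) == target_digest:
            matches.append(path)
    return sorted(matches)
```

Hashing only the same-size candidates keeps the search fast on a large share, because files of any other size can be skipped without being read.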

bsbartmary
Posts: 5
Joined: 2004-06-14 19:56

Post by bsbartmary » 2004-06-27 11:40

What is TTH?

I know next to nothing about this program, or computers in general.

I so do not want to search with Windows.

Do you know how many files this thing is taking out of my share? Searching for individual files would take way too long.

Just out of curiosity, is it likely that the hashing feature will be user-friendly in the future?

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-06-27 11:43

bsbartmary wrote: what is tth
http://www.dslreports.com/faq/9677
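For the curious: TTH (Tiger Tree Hash) splits a file into 1024-byte blocks, hashes each block, then hashes pairs of those hashes upward until a single root value remains. The Python sketch below follows that THEX-style tree layout but swaps in SHA-256 for the Tiger hash, because Python's hashlib is not guaranteed to provide Tiger, so its output is not a real TTH value. What it does show is that the root depends only on the file's bytes.

```python
import hashlib

LEAF_SIZE = 1024  # block size used for TTH leaves


def _h(data: bytes) -> bytes:
    # Stand-in: real TTH uses the Tiger hash here, not SHA-256.
    return hashlib.sha256(data).digest()


def tree_root(data: bytes) -> bytes:
    """Merkle-tree root in the THEX style: 0x00-prefixed leaves, 0x01-prefixed inner nodes."""
    # Hash each 1024-byte block; an empty input still yields one empty leaf.
    blocks = [data[i:i + LEAF_SIZE] for i in range(0, len(data), LEAF_SIZE)] or [b""]
    level = [_h(b"\x00" + block) for block in blocks]
    # Combine neighbours pairwise until one node is left.
    while len(level) > 1:
        parents = [_h(b"\x01" + level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            parents.append(level[-1])  # an unpaired node is carried up unchanged
        level = parents
    return level[0]
```

Changing even one byte anywhere in the input changes the root, which is why two files only match when their contents are identical.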
bsbartmary wrote: do you know how many files this thing is taking out of my share
Well, they're your duplicates. If you don't want to see the message saying they've been removed, look for the configuration option in Settings > Advanced. Those duplicate files will never count towards your share size.
bsbartmary wrote: just out of curiosity is it likely that the hashing feature will be user friendly in the future
I guess that depends on how you define friendly and unfriendly.

bsbartmary
Posts: 5
Joined: 2004-06-14 19:56

Post by bsbartmary » 2004-06-27 11:53

I'll read about TTH when I'm awake enough to understand it.

Friendly would be if, instead of it just picking which files to remove from the share count, the user could decide which of the files to share.

I don't mind getting rid of duplicates, because I would like to delete them to make room on my hard drive; it's just a matter of which file.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-06-27 12:00

We know that's something that could be improved upon.

bsbartmary
Posts: 5
Joined: 2004-06-14 19:56

Post by bsbartmary » 2004-06-28 01:06

Weird.

Apparently the olfc vid was a dupe of a small MPEG vid I couldn't play.

If they were two different formats and yet the same size, they appear to be the same to the hashing thing.

joakim_tosteberg
Forum Moderator
Posts: 587
Joined: 2003-05-07 02:38
Location: Sweden, Linkoping

Post by joakim_tosteberg » 2004-06-28 01:12

bsbartmary wrote: weird

apparently the olfc vid was a dupe of a small mpeg vid i couldn't play

if they were two different formats and yet the same size they appear to be the same for the hashing thing
The hashing doesn't care about name or file size. What it looks at is the content of the file.

GargoyleMT
DC++ Contributor
Posts: 3212
Joined: 2003-01-07 21:46
Location: .pa.us

Post by GargoyleMT » 2004-06-28 10:43

bsbartmary wrote: if they were two different formats and yet the same size they appear to be the same for the hashing thing
Two different extensions, two different filenames. If it weren't for file hashing, which uniquely identifies the contents of the files, you'd never have known they were the same.

(Sidenote: you can put nearly any video extension onto a video file, and the default media player in Windows will play it just the same [unfortunately].)
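As a hedged illustration of why the extension does not matter: many container formats announce themselves in their first few bytes, so software can identify a file by its content regardless of its name. The Python sketch below checks a small, deliberately partial table of such signatures (ASF, used by .wmv; MPEG program streams; MPEG video; AVI). `SIGNATURES`, `sniff_format` and `sniff_file` are illustrative names, and this is not a description of how Windows Media Player actually decides.

```python
# Partial, illustrative table of leading "magic" bytes for a few video containers.
SIGNATURES = [
    (b"\x30\x26\xb2\x75\x8e\x66\xcf\x11", "ASF (.wmv/.wma/.asf)"),
    (b"\x00\x00\x01\xba", "MPEG program stream (.mpg)"),
    (b"\x00\x00\x01\xb3", "MPEG video elementary stream"),
]


def sniff_format(header: bytes) -> str:
    """Guess a container format from a file's first bytes, ignoring its name."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    # AVI is a RIFF file whose form type at offset 8 is "AVI ".
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "AVI (RIFF)"
    return "unknown"


def sniff_file(path) -> str:
    """Read just the first 16 bytes; the extension in `path` is never consulted."""
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

Renaming an MPEG file to .wmv leaves its first bytes untouched, so a content check like this still reports MPEG.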
