




Comparing two text files and removing duplicate lines



#1

Stella

Is there any tool on Nulled that can compare two text files (combolists), detect the lines that appear in both, and then either remove them with one click or create a new text file containing only the lines that weren't duplicated?

 

The Compare plugin for Notepad++ lets you do the first two things, but doesn't give you a remove option.



#2

Effervescence

You can use Notepad++ with a plugin named Compare.

Just hit Alt+D and it will compare the two files and show you the differing lines.



#3

Stella


The Compare plugin in Notepad++ only lets you compare and see which lines are duplicated; it won't let you remove them.

You have to do that yourself manually, and when you're working with files of 200k lines that's not the way to go.



#4

Effervescence

I'm not sure exactly what you want to do, so I can't help you much.

I could also suggest combining the files and removing duplicates from the combined file, but that's probably not what you want.

What I guess you actually want is to extract the entries in file2 that are not in file1.

You can do this on the command line with this code:

$ awk 'FNR==NR {a[$0]++; next} !a[$0]' file1 file2

The $ at the front is just the shell prompt (don't type it); the command is meant to be run in bash.

 

Now to explain what is written above:

FNR is the current file's record number
NR is the current overall record number from all input files
FNR==NR is true only when we are reading file1
$0 is the current line of text
a[$0] is a hash with the key set to the current line of text
a[$0]++ tracks that we've seen the current line of text
!a[$0] is true only when we have not seen the current line of text
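
To see it in action, here is a quick run on two small throwaway files (the file names and contents are just an example, not your actual lists):

$ printf '1:1\n2:2\n3:3\n' > file1
$ printf '3:3\n4:4\n5:5\n' > file2
$ awk 'FNR==NR {a[$0]++; next} !a[$0]' file1 file2
4:4
5:5

Only the lines of file2 that were never seen in file1 are printed.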

 

You can also do this another way, which is easier to comprehend:

sort file1 > file1.sorted
sort file2 > file2.sorted
comm -1 -3 file1.sorted file2.sorted

Note that comm does not collapse repeated lines within a file: if a line appears three times in file2 but only once in file1, two copies of it will still be output. If this is not what you want, pipe the output of sort through uniq before writing it to a file:

sort file1 | uniq > file1.sorted
sort file2 | uniq > file2.sorted
comm -1 -3 file1.sorted file2.sorted
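
If you don't want the temporary .sorted files, the same thing works as a single bash line using process substitution, writing the result straight into a new file (the output file name here is just a placeholder):

comm -13 <(sort -u file1) <(sort -u file2) > file2.only

sort -u is simply shorthand for sort | uniq, and comm -13 is the same as comm -1 -3.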


#5

Stella

 

First, let me say I appreciate you writing a detailed explanation of everything, especially since it's rather complicated for me (a person who doesn't have much experience with command lines/coding).

To further explain my problem, let's say we're working with user:pass combos. Some combolists naturally share the same user:password combinations with other combolists.

 

So for example, you're cracking a website and have already run one combolist through it.

Call it "combolist_A". Now you want to run another one called "combolist_B", but the problem is that a lot of lines from combolist_A are also in combolist_B.

So if you do nothing and just run combolist_B, you'll waste a lot of time re-checking lines from combolist_A that you already checked before.

 

It's especially a problem when you're running a Selenium config with low CPM.

 

So for example, if combolist_A is 5 lines:
 

1:1
2:2
3:3
4:4
5:5

and combolist_B is 5 lines:


3:3
5:5
6:6
7:7
8:8

And we already checked combolist_A, meaning we already checked the 3:3 and 5:5 that are also in combolist_B.

The idea is to do a kind of subtraction: remove from combolist_B the lines that are in combolist_A, so we're left with only the lines that haven't been checked yet.
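
For what it's worth, the awk one-liner from the reply above does exactly this subtraction. Applied to this example, with placeholder file names, it would be:

$ awk 'FNR==NR {a[$0]++; next} !a[$0]' combolist_A.txt combolist_B.txt > combolist_B_unchecked.txt

and combolist_B_unchecked.txt would then contain only 6:6, 7:7 and 8:8.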

 

I had a simple .bat file around a year ago that did exactly that, but I lost it.



#6

Effervescence

Since you haven't worked much with the command line, and assuming you have combolist_A and combolist_B from the beginning:

 

I would suggest you use this tool: https://www.nulled.t...ate-combo-tool/

 

Add combolist_A and combolist_B into one file, then remove the duplicates with the program I linked.

 

Then you can split it again if you want.
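
For reference, that combine-and-dedupe step is also a one-liner in a shell, if you ever want to try it without the tool (the file names are just placeholders):

sort -u combolist_A.txt combolist_B.txt > combined_unique.txt

It sorts both files together and keeps each line only once.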



#7

Buried

Also, Nulled's AIO Combo Tools has this feature built in: https://www.nulled.t...io-combo-tools/

 

My friend has tried it; I haven't, though.



#8

Meli0das

Hey there,

There are plenty of online tools that can help with that, like
http://www.dedupelist.com (removes duplicates) and
https://text-compare.com (to compare the text).
Hope it helps :D





#9

Wingu

Use Dupli Find (the pro version); you can find the key everywhere.



