View Full Version : Find Duplicates, 2 Columns, Different Worksheets
ghendi
07-11-2013, 06:23 AM
Hello all,
I have worksheets that contain lists of addresses. My company receives a list of new members. The problem is, sometimes these new "members" are repeat members using a different name, but the same verified address. In the past they were able to get away, until we switched to excel. The problem is that the master list keeps growing and I want to make sure that these new members aren't repeat members. As of right now, their addresses are in Column D in both sheets (sheet 1 = master). Every 2 weeks we update the list.
The code needed:
I open the master sheet and run the following code to copy over the new data to the master workbook. As of right now, the code copies it to the master sheet (sheet1). Can you change it so it copies it to a new worksheet and titles the worksheet by today's current date (format: mm-dd-yy)? This is my current code:
Sub BulkImport()
Dim InFileNames As Variant
Dim fCtr As Long
Dim tempWkbk As Workbook
Dim consWks As Worksheet
Set consWks = ThisWorkbook.Sheets(1)
InFileNames = Application.GetOpenFilename _
(FileFilter:="Excel Files, *.*", MultiSelect:=True)
Application.ScreenUpdating = False
If IsArray(InFileNames) Then
For fCtr = LBound(InFileNames) To UBound(InFileNames)
With Workbooks.Open(Filename:=InFileNames(fCtr))
MsgBox "Number of cells that have data: " & Application.WorksheetFunction.CountA(.Sheets(1).Ra nge("A2:J" & .Sheets(1).Range("A" & .Sheets(1).Rows.Count).End(xlUp).Row))
.Sheets(1).Range("A1:J" & .Sheets(1).Range("A" & .Sheets(1).Rows.Count).End(xlUp).Row).Copy consWks.Range("A" & consWks.Rows.Count).End(xlUp)(2)
.Close 0
End With
Next fCtr
Else
MsgBox "No file selected"
End If
With Application
.StatusBar = True
.ScreenUpdating = True
End With
End Sub
Now that the data would be copied over, I would need to compare it with the unique list I already have in the master sheet (sheet1). It would need to compare against a list of over 10,000 (and growing), and then mark the duplicates found. I would then make sure manually that they really are duplicates (since sometimes people have the same address but a different unit number; unless you have a more efficient way to compare the two). I would delete the duplicates manually (most likely) and then have vba code copy over the new ones to the master sheet, continuing the growth of the unique list. I have a total of 10 columns (A-J).
Also, a previous admin poster explained to me the differences between ThisWorkbook and ActiveWorkbook. Are they interchangeable here? If I swap it out, will the code run as it should? Thanks for all your help!
I hope it's not too much!!
Respectfully,
Amit
Admin
07-11-2013, 07:29 AM
Hi
Which column holds the unique key to compare ?
ghendi
07-11-2013, 07:47 AM
It will always be Column D (which is the address column) on all sheets.
Admin
07-11-2013, 10:53 PM
Hi
try this. adjust the sheet names in the code.
Option Explicit
Sub kTest()
Dim ka, k(), i As Long, n As Long, c As Long, d As Object
Const MasterSheet As String = "Sheet1" '<< adjust
Const NewData As String = "Sheet2" '<< adjust
Set d = CreateObject("scripting.dictionary")
d.comparemode = 1
ka = ThisWorkbook.Worksheets(MasterSheet).Range("a1").CurrentRegion.Resize(, 10).Value2
For i = 2 To UBound(ka, 1)
If Len(Trim(ka(i, 4))) Then
d.Item(Trim(ka(i, 4))) = Empty
End If
Next
Erase ka
ka = ThisWorkbook.Worksheets(NewData).Range("a1").CurrentRegion.Resize(, 10).Value2
ReDim k(1 To UBound(ka, 1), 1 To UBound(ka, 2))
For i = 2 To UBound(ka, 1)
If Not d.exists(ka(i, 4)) Then
n = n + 1
For c = 1 To UBound(ka, 2)
k(n, c) = ka(i, c)
Next
End If
Next
If n Then
'append new record into the master sheet
With ThisWorkbook.Worksheets(MasterSheet)
.Range("a" & .Rows.Count).End(xlUp).Offset(1).Resize(n, UBound(k, 2)) = k
End With
End If
End Sub
ghendi
07-12-2013, 12:06 AM
Hmmm.... That code seems to just copy over the data from worksheet 2 to the master sheet. Is it removing duplicates along the way? In case I was unclear in my explanation, I'll attempt to explain in an organized manner:
Step 1: I receive a list of data with 65 columns and up to a 1000 rows of data. (many times a dozen or so of the rows won't have data in every column)
Step 2: I open the new list received and run the following code to clean up the data and keep only the relevant fields.
Sub MailOrganize()
Columns("C:C").Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
Columns("B:B").Select
Selection.Copy
Columns("C:C").Select
Selection.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
Set rngTemp1 = Range("B:B")
With Range("A1,B1,C1,D1,E1,F1,G1,H1,I1,J1").Font
.Bold = True
End With
Range("A1").Value = "SITE"
Range("B1").Value = "TEMP_1"
Range("C1").Value = "TEMP_2"
Range("D1").Value = "ADDRESS_1"
Range("E1").Value = "UNIT"
Range("F1").Value = "CITY"
Range("G1").Value = "STATE"
Range("H1").Value = "ZIP"
Range("I1").Value = "FIRST"
Range("J1").Value = "LAST"
Set rngTemp1 = Range("B1:B1000")
' On Error GoTo SkipTemp1
rngTemp1.SpecialCells(xlCellTypeBlanks).Select
Dim l1Cell
For Each l1Cell In Selection
l1Cell.Value = " --- "
Next
'SkipTemp1:
Set rngTemp2 = Range("C1:C1000")
' On Error GoTo SkipTemp2
rngTemp2.SpecialCells(xlCellTypeBlanks).Select
Dim l2Cell
For Each l2Cell In Selection
l2Cell.Value = " --- "
Next
'SkipTemp2:
Set rngSite = Range("A1:A1000")
' On Error GoTo SkipSite
rngSite.SpecialCells(xlCellTypeBlanks).Select
Dim sCell
For Each sCell In Selection
sCell.Value = " --- "
Next
'SkipSite:
Set rngName = Range("I1:I1000")
' On Error GoTo SkipName
rngName.SpecialCells(xlCellTypeBlanks).Select
Dim nameCell
For Each nameCell In Selection
nameCell.Value = " --- "
Next
'SkipName:
End Sub
Step 3: In case the list didn't have a complete 1 thousand entries, I would clear the contents from any "---" marks continuing past the data down to the 1000 row.
Step 4: I would then close the list and open the Master workbook.
Step 5: I would run the following import script:
Sub BulkImport()
Sheets(1).Select
Dim Srt As Worksheet
Set Srt = ActiveSheet
Dim InFileNames As Variant
Dim fCtr As Long
Dim tempWkbk As Workbook
Dim consWks As Worksheet
Dim szToday As String
szToday = Format(Date, "mm-dd-yy")
Sheets.Add
ActiveSheet.Move After:=Srt
ActiveSheet.Name = szToday
Set consWks = ActiveWorkbook.Sheets(2)
InFileNames = Application.GetOpenFilename _
(FileFilter:="Excel Files, *.*", MultiSelect:=True)
Application.ScreenUpdating = False
If IsArray(InFileNames) Then
For fCtr = LBound(InFileNames) To UBound(InFileNames)
With Workbooks.Open(Filename:=InFileNames(fCtr))
MsgBox "Number of cells that have data: " & Application.WorksheetFunction.CountA(.Sheets(1).Ra nge("A2:J" & .Sheets(1).Range("A" & .Sheets(1).Rows.Count).End(xlUp).Row))
.Sheets(1).Range("A1:J" & .Sheets(1).Range("A" & .Sheets(1).Rows.Count).End(xlUp).Row).Copy consWks.Range("A" & consWks.Rows.Count).End(xlUp)(1)
.Close 0
End With
Next fCtr
Else
MsgBox "No file selected."
End If
With Application
.StatusBar = True
.ScreenUpdating = True
End With
End Sub
Step 6: I would then need to compare the new list to the list on the master sheet. I would need to find duplicates based on the members' addresses. The members' addresses will be listed in Column D. Should it find a duplicate, it would mark all the duplicates (preferably the entire row).
Step 7: I would then need to remove the duplicates after assessing that they really are duplicates since sometimes clients have similar addresses but with a different unit or suite number. One way I thought of is to have the duplicates marked on the temp worksheet, and copied to a new worksheet where the entire row with the duplicate found would be a green color background while the entire row being checked against from the master sheet would be copied underneath it to the new worksheet, and would have a yellow background. Although, I'm not sure if it's attainable through the code to do.
Step 8: I would then remove the duplicates from the temp worksheet.
Step 9: I would copy the new unique data over to the master sheet to the bottom, continuing the growth of the sheet. Since all the headers are the same, I would not need the first row from the temp worksheet to be copied over to the master sheet.
I really really appreciate it. I can't stress that enough. Thank you for all the help you can give.
If there is anything else needed from me, please let me know.
Respectfully,
Amit
Admin
07-12-2013, 10:21 AM
Have you tried the code ? Try to run the code after importing the new list.
ghendi
07-13-2013, 02:05 AM
Hello Admin,
The code seems to copy data from the second worksheet to the master sheet. When scrolling through the data, there are many highlighted cells in Column C (?). Although, when searching for duplicates, there are none. So I'm guessing the code automatically removed the duplicates? If the code were to automatically remove duplicates, then I would need it to confirm that data in Column D are duplicates, as well as the data in Column E (which is the unit or apartment number) since some people may live in the same building. And since it would be automatically removing the duplicates, could it give me a msgbox stating how many duplicate rows were found at the end of the code? It is important that sheet2 not have the duplicates either since I will be needing to send a welcome mail to those new members. Also, the bulkimport macro creates a new sheet in the master workbook based on today's date. Could the code you wrote automatically look for sheet #2 instead of giving it a specific name? Bulkimport macro will make sure that the new sheet will always be sheet #2.
Thank you very much Admin.
Respectfully,
Amit
Admin
07-13-2013, 09:06 AM
Hi
OK. Try this. untested.
Option Explicit
Sub kTest()
Dim ka, k(), i As Long, n As Long, c As Long, d As Object
Dim DupeCount As Long, addr As String, wksNewData As Worksheet
Const MasterSheet As String = "Sheet1" '<< adjust
Set d = CreateObject("scripting.dictionary")
d.comparemode = 1
ka = ThisWorkbook.Worksheets(MasterSheet).Range("a1").CurrentRegion.Resize(, 10).Value2
For i = 2 To UBound(ka, 1)
If Len(Trim(ka(i, 4))) Then
d.Item(Trim(ka(i, 4))) = Empty
End If
Next
Erase ka
Set wksNewData = ThisWorkbook.Worksheets(2)
ka = wksNewData.Range("a1").CurrentRegion.Resize(, 10).Value2
ReDim k(1 To UBound(ka, 1), 1 To UBound(ka, 2))
For i = 2 To UBound(ka, 1)
If Not d.exists(ka(i, 4)) Then
n = n + 1
For c = 1 To UBound(ka, 2)
k(n, c) = ka(i, c)
Next
Else
DupeCount = DupeCount + 1
addr = addr & ",D" & i
If Len(addr) > 245 Then
wksNewData.Range(Mid(addr, 2)).Interior.Color = 65535
addr = vbNullString
End If
End If
Next
If Len(addr) > 1 Then
wksNewData.Range(Mid(addr, 2)).Interior.Color = 65535
addr = vbNullString
End If
If n Then
'append new record into the master sheet
With ThisWorkbook.Worksheets(MasterSheet)
.Range("a" & .Rows.Count).End(xlUp).Offset(1).Resize(n, UBound(k, 2)) = k
End With
End If
If DupeCount Then
MsgBox "There are " & DupeCount & " duplicates.", vbInformation
End If
End Sub
ghendi
07-13-2013, 10:45 PM
Admin,
I tested your code. It looks great! It finds the duplicates in sheet2 and appends the unique data to the master sheet. It seems to be highlighting fields in Column C of the master sheet and I'm not sure why. Excel also doesn't let me sort by Cell Color On Top for Column C, so I'm not sure what the reason is for them being highlighted.
Also, I'd just like to make sure that the duplicate finding process is looking for matches in both columns D and E, since it is appending the unique values automatically, I don't want to miss people's unit numbers.
THANK YOU VERY MUCH!!!! It looks great!
I have one other request to the appending process: can you make it so when the new data is appended, automatically in Column K a new data field is created with the value of 0 ?
Again, thank you very much Admin. Your help is deeply appreciated.
- Amit
Edit: I cleared all the rules that were on the master sheet and it seems to fix the problem of the highlighting in column c, although I don't remember every putting a rule on. So that issue is done :)
Admin
07-13-2013, 11:44 PM
Hi
replace
.Range("a" & .Rows.Count).End(xlUp).Offset(1).Resize(n, UBound(k, 2)) = k
with
.Range("a" & .Rows.Count).End(xlUp).Offset(1).Resize(n, 10) = k
ghendi
07-16-2013, 01:09 AM
Hey Admin,
Just gave it a shot and all is looking well.
Could you please add to the code the ability to add to Column K in the master sheet the value 0 when it is copied over?
Also, can you please confirm that both Column D and E are being checked in order to find the duplicate rows? I'm just worried 2 members with the same street address but different unit numbers will be considered duplicates.
Thank you!!
-Amit
Admin
07-16-2013, 08:02 AM
Hi
try thid.
Option Explicit
Sub kTest()
Dim ka, k(), i As Long, n As Long, c As Long, d As Object
Dim DupeCount As Long, addr As String, wksNewData As Worksheet
Dim UniqueString As String
Const MasterSheet As String = "Sheet1" '<< adjust
Set d = CreateObject("scripting.dictionary")
d.comparemode = 1
ka = ThisWorkbook.Worksheets(MasterSheet).Range("a1").CurrentRegion.Resize(, 10).Value2
'loop thru master sheet
For i = 2 To UBound(ka, 1)
UniqueString = Trim(ka(i, 4)) & Trim(ka(i, 5)) '<<< adjust the columns
If Len(UniqueString) Then
d.Item(UniqueString) = Empty
End If
Next
Erase ka
Set wksNewData = ThisWorkbook.Worksheets(2)
ka = wksNewData.Range("a1").CurrentRegion.Resize(, 10).Value2
ReDim k(1 To UBound(ka, 1), 1 To UBound(ka, 2))
For i = 2 To UBound(ka, 1)
UniqueString = Trim(ka(i, 4)) & Trim(ka(i, 5)) '<<< adjust the columns
If Not d.exists(UniqueString) Then
n = n + 1
For c = 1 To UBound(ka, 2)
k(n, c) = ka(i, c)
Next
Else
DupeCount = DupeCount + 1
addr = addr & ",D" & i
If Len(addr) > 245 Then
wksNewData.Range(Mid(addr, 2)).Interior.Color = 65535
addr = vbNullString
End If
End If
Next
If Len(addr) > 1 Then
wksNewData.Range(Mid(addr, 2)).Interior.Color = 65535
addr = vbNullString
End If
If n Then
'append new record into the master sheet
With ThisWorkbook.Worksheets(MasterSheet)
.Range("a" & .Rows.Count).End(xlUp).Offset(1).Resize(n, UBound(k, 2)) = k
End With
End If
If DupeCount Then
MsgBox "There are " & DupeCount & " duplicates.", vbInformation
End If
End Sub
ghendi
07-16-2013, 10:09 PM
UniqueString = Trim(ka(i, 4)) & Trim(ka(i, 5)) '<<< adjust the columns
I'm not sure what you mean by "adjust columns". What do I need to do?
Admin
07-16-2013, 10:15 PM
currently it's col D and col E. If it's not the actual columns replace 4 and 5 with appropriate col number.
ghendi
07-17-2013, 04:26 AM
Admin,
Thank you for your patience! The code is running perfectly. I appreciate all your help.
Powered by vBulletin® Version 4.2.5 Copyright © 2025 vBulletin Solutions Inc. All rights reserved.