Page 1 of 3 123 LastLast
Results 1 to 10 of 28

Thread: Remove Duplicates From Across Multipe Columns In A Single WorkSheet

  1. #1
    Junior Member
    Join Date
    Oct 2013
    Posts
    14
    Rep Power
    0

    Remove Duplicates From Across Multipe Columns In A Single WorkSheet

    Hi all,

    I am in need of help, that's the short story.

    I have almost 9 million cells with values in them and they occupy 17 columns in total (one sheet). Some of them take almost 1.04 mln rows, some 70k.

    All I have to do is to remove duplicate values. I've tried advanced copy/paste/only-unique-values feature on excel but even though I'm working on nice Dell i7 8gb ram etc., I have found excel crashed in the morning.

    Any help including vba will be very appreciated.

    Many thanks in advance
    T.

  2. #2
    Administrator Excel Fox's Avatar
    Join Date
    Mar 2011
    Posts
    1,402
    Rep Power
    10
    Do you want the duplicates in each column removed, or should the data in the entire sheet be unique? If it's for each column, try this

    Code:
    Sub ExcelFox()
            
        Dim lng As Long
        Dim wks As Worksheet
        
        Set wks = Worksheets("NameOfSheetWithDuplicateValues")
        For lng = 1 To 17
            With wks
                .Range(.Cells(1, lng), .Cells(.Rows.Count, lng).End(xlUp)).RemoveDuplicates Columns:=1, Header:=xlNo
            End With
        Next lng
        
    End Sub

    Note that this will only work for Excel 2007 and above.
    A dream is not something you see when you are asleep, but something you strive for when you are awake.

    It's usually a bad idea to say that something can't be done.

    The difference between dream and aim, is that one requires soundless sleep to see and the other requires sleepless efforts to achieve

    Join us at Facebook

  3. #3
    Junior Member
    Join Date
    Oct 2013
    Posts
    14
    Rep Power
    0
    They are already unique in each column, need to remove duplicates between columns now..

    Thanks anyway.

  4. #4
    Administrator Excel Fox's Avatar
    Join Date
    Mar 2011
    Posts
    1,402
    Rep Power
    10
    Try this

    Code:
    Sub ExcelFox()
            
        Dim lng As Long
        Dim wks As Worksheet
        Dim objDic As Object
        Dim var As Variant
        Dim varIndex As Variant
        Dim lngRow As Long
        Const clngSteps As Long = 100000
        
        Set objDic = CreateObject("Scripting.Dictionary")
        Set wks = Worksheets("NameOfSheetWithDuplicateValues")
        For lng = 1 To 17
            With wks
                .Range(.Cells(1, lng), .Cells(.Rows.Count, lng).End(xlUp)).RemoveDuplicates Columns:=1, Header:=xlNo
                var = .Range(.Cells(1, lng), .Cells(.Rows.Count, lng).End(xlUp)).Value2
                .Range(.Cells(1, lng), .Cells(.Rows.Count, lng).End(xlUp)).Clear
                For lngRow = 1 To UBound(var)
                    objDic.Item(var(lngRow, 1)) = 0
                Next lngRow
            End With
        Next lng
        Erase var
        var = objDic.keys
        Set objDic = Nothing
        If wks.Parent.FullName <> wks.Parent.Name Then
            wks.Parent.Save
        End If
        lng = 1
        For lngRow = 1 To UBound(var) + 1 Step clngSteps
            varIndex = Evaluate(clngSteps * (lng - 1) & "+ ROW(1:" & Application.Min(UBound(var) + 1, clngSteps) & ")")
            Cells(1, lng).Resize(Application.Min(UBound(var) + 1, clngSteps)).Value = Application.Index(var, varIndex)
            lng = lng + 1
        Next lngRow
        Erase varIndex
        
    End Sub
    A dream is not something you see when you are asleep, but something you strive for when you are awake.

    It's usually a bad idea to say that something can't be done.

    The difference between dream and aim, is that one requires soundless sleep to see and the other requires sleepless efforts to achieve

    Join us at Facebook

  5. #5
    Junior Member
    Join Date
    Oct 2013
    Posts
    14
    Rep Power
    0

    First result

    Thank you so much for your effort. The script ran for 2 hours then crashed at below line:

    Cells(1, lng).Resize(Application.Min(UBound(var) + 1, clngSteps)).Value = Application.Index(var, varIndex)

    Error message said 'mismatch error'

    also, the sheet has been cleared of all data - if it's of any help to you.

    Kind Regards

  6. #6
    Administrator Excel Fox's Avatar
    Join Date
    Mar 2011
    Posts
    1,402
    Rep Power
    10
    What's your formula separator? Is it : or something else?

    Also, can you test this on a much smaller sample.. maybe 10000 rows and 17 columns.... If it works, then I'll come up with something else. If it doesn't work, then something's wrong with the data.
    A dream is not something you see when you are asleep, but something you strive for when you are awake.

    It's usually a bad idea to say that something can't be done.

    The difference between dream and aim, is that one requires soundless sleep to see and the other requires sleepless efforts to achieve

    Join us at Facebook

  7. #7
    Administrator Admin's Avatar
    Join Date
    Mar 2011
    Posts
    1,123
    Rep Power
    10
    Hi

    Here is a different method.

    Code:
    Option Explicit
    
    Sub kTest()
        
        Dim strAll  As String
        Dim i       As Long
        Dim r       As Long
        Dim d       As Long
        Dim fd      As String
        Dim fn      As String
        Dim objFS   As Object
        Dim objFile As Object
        Dim adoConn As Object
        Dim adoRset As Object
        
        Const Block = 32000
        
        strAll = "Temp"
        
        For i = 1 To 5
            d = 0
            For r = 1 To 100000 Step Block
                strAll = strAll & vbCrLf & Join(Application.Transpose(Cells(r, i).Resize(Application.Min(Block, Abs(100000 - d))).Value2), vbCrLf)
                d = d + Block
            Next
        Next
        
        fd = Environ("temp") & "\"
        fn = "Test.txt"
        
        Set objFS = CreateObject("scripting.filesystemobject")
        Set objFile = objFS.opentextfile(fd & fn, 2, 1)
        
        objFile.write strAll
        objFile.Close
        
        Set adoConn = CreateObject("ADODB.Connection")
        
        adoConn.Open "Driver={Microsoft Text Driver (*.txt; *.csv)};" & _
                    "Dbq=" & fd & ";Extensions=txt;"
            
        Set adoRset = CreateObject("ADODB.Recordset")
            adoRset.Open "SELECT [Temp] FROM [" & fn & "] GROUP BY [Temp]", adoConn, 3, 1, 1
                
        d = adoRset.RecordCount
        
        ActiveSheet.UsedRange.ClearContents
        
        If d > Rows.Count Then
            i = 1
            While Not adoRset.EOF
                Cells(1, i).CopyFromRecordset adoRset, 1000000
                i = i + 1
            Wend
        Else
            Range("a1").CopyFromRecordset adoRset
        End If
        
        adoRset.Close
        adoConn.Close
        
        Kill fd & fn
        
    End Sub
    Cheers !

    Excel Range to BBCode Table
    Use Social Networking Tools If You Like the Answers !

    Message to Cross Posters

    @ Home - Office 2010/2013/2016 on Win 10 (64 bit); @ Work - Office 2016 on Win 10 (64 bit)

  8. #8
    Junior Member
    Join Date
    Oct 2013
    Posts
    14
    Rep Power
    0

    10000 rows test

    Quote Originally Posted by Excel Fox View Post
    What's your formula separator? Is it : or something else?

    Also, can you test this on a much smaller sample.. maybe 10000 rows and 17 columns.... If it works, then I'll come up with something else. If it doesn't work, then something's wrong with the data.

    On 10000 rows it removed all the content and then displayed the same error in the same line.

    My formula separator seems to be , and not :

  9. #9
    Junior Member
    Join Date
    Oct 2013
    Posts
    14
    Rep Power
    0
    Hi,

    This is regarding 'different method':

    Tried the code, worked for about 20-30 seconds and left me with only column A populated to around 440k rows. Not just unique values cause I can't find some entries from the full 17 columns table.

    Thank you for all your help as I still can't figure this out on my own

  10. #10
    Administrator Admin's Avatar
    Join Date
    Mar 2011
    Posts
    1,123
    Rep Power
    10
    Quote Originally Posted by cyphrevil View Post
    Hi,

    This is regarding 'different method':

    Tried the code, worked for about 20-30 seconds and left me with only column A populated to around 440k rows. Not just unique values cause I can't find some entries from the full 17 columns table.

    Thank you for all your help as I still can't figure this out on my own
    My bad

    I tested this code with 5 columns of data.

    replace
    Code:
    For i = 1 To 5
    with
    Code:
    For i = 1 To 17
    Cheers !

    Excel Range to BBCode Table
    Use Social Networking Tools If You Like the Answers !

    Message to Cross Posters

    @ Home - Office 2010/2013/2016 on Win 10 (64 bit); @ Work - Office 2016 on Win 10 (64 bit)

Similar Threads

  1. Find Duplicates, 2 Columns, Different Worksheets
    By ghendi in forum Excel Help
    Replies: 14
    Last Post: 07-17-2013, 04:26 AM
  2. Remove Special Characters From Text Or Remove Numbers From Text
    By Excel Fox in forum Excel and VBA Tips and Tricks
    Replies: 5
    Last Post: 05-31-2013, 04:43 PM
  3. Print Nth Worksheet To Mth Worksheet using VBA
    By Ryan_Bernal in forum Excel Help
    Replies: 2
    Last Post: 02-28-2013, 06:57 PM
  4. Converge Data From Multiple Columns To Single Column
    By ayazgreat in forum Excel Help
    Replies: 3
    Last Post: 12-14-2012, 10:55 PM
  5. Detect Duplicates In Named Ranges With Cross Check
    By Yegarboy in forum Excel Help
    Replies: 3
    Last Post: 10-09-2012, 11:02 AM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •