Looping Through String Using Excel VBA!



Jdean
09-07-2014, 05:40 AM
Hello!


I hope someone can help me with this.

Assume:
String A contains "5, 45, 100, 113, 160"
String B contains "2, 4, 45, 160, 189"

I want to compare the two strings to check whether they have at least two integers in common; in this case they do: "45" and "160".
My current approach is to use the Split function to turn each string into an array. Then, using two nested For-loops, I compare each token of one array with the tokens of the other, incrementing a counter on every match. If the counter is >= 2, the two strings share at least two numbers.
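
For reference, the nested-loop approach described above could look roughly like the sketch below. The original code was not posted, so the function name CountCommonNested and its details are assumptions made for illustration only.

```vba
' Hypothetical illustration only: the poster's actual code does not appear in the thread.
Function CountCommonNested(ByVal s1 As String, ByVal s2 As String) As Long
    Dim a() As String, b() As String
    Dim i As Long, j As Long, n As Long

    a = Split(s1, ",")
    b = Split(s2, ",")

    For i = LBound(a) To UBound(a)
        For j = LBound(b) To UBound(b)
            ' Trim removes the space that follows each comma
            If Trim$(a(i)) = Trim$(b(j)) Then
                n = n + 1
                Exit For   ' stop scanning b once this token of a has matched
            End If
        Next j
    Next i

    CountCommonNested = n
End Function
```

With the two example strings, CountCommonNested returns 2. The cost grows with the product of the two token counts, which is why it slows down on long lists.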

This approach works fine except I will be doing this for two very large sets of strings and it is very time-consuming in execution. Is there a faster VBA method or data structure to use instead of breaking the two strings up into two arrays and creating two For-loops?

Any suggestions in VBA code would be appreciated.

Thanks!
Jdean


Admin
09-07-2014, 10:20 AM
Hi

Welcome to the board.

You could try these methods.


Option Explicit

Sub kTest()

    Dim x, y, i As Long, c As Long
    Dim s1 As String, s2 As String

    s1 = "5, 45, 100, 113, 160"
    s2 = "2, 4, 45, 160, 189"

    x = Split(s1, ",")
    y = Split(s2, ",")

    'Method 1: load the tokens of s1 into a dictionary, then look up each token of s2
    With CreateObject("scripting.dictionary")
        .comparemode = 1 'text compare
        For i = 0 To UBound(x)
            If Len(Trim(x(i))) Then
                .Item(Trim(x(i))) = 1
            End If
        Next

        For i = 0 To UBound(y)
            If .exists(Trim(y(i))) Then
                c = c + 1
            End If
        Next
        MsgBox "There are " & c & " item(s) in common."
    End With

    'Method 2: search for each token of s2 inside s1 with InStr
    c = 0
    For i = 0 To UBound(y)
        If InStr(1, ", " & s1 & ", ", ", " & Trim(y(i)) & ", ", 1) Then 'careful about the delimiter
            c = c + 1
        End If
    Next
    MsgBox "There are " & c & " item(s) in common."

End Sub

Jdean
09-07-2014, 02:08 PM
Thanks, Admin. I created a function using the Dictionary data structure method you provided. It worked, but when I tested it on large datasets (in the hundreds of thousands) it didn't run any faster than the good ole fashioned two For-loops array method I was using.

Your second method using the Instr function looks interesting, but before I try it I would like for you to slowly explain the arguments for me. I really want to understand what is in the Instr() parentheses. I know it has the structure InStr( [start], string, substring, [compare] ) but I couldn't easily sync it up with your coded arguments.
Also, will the number of delimiters stay the same regardless of the length of S1 and S2 for the Instr() method? (Of course, I assume each input string will always contain more than one token).

Jdean

Admin
09-07-2014, 03:02 PM
I wrapped both the string and the search token in the delimiter to ensure we don't get an incorrect result when searching for a sub-string within the string. For example, without the delimiter, Instr(1,s1,"4",1) would return 4 (the position of the "4" inside "45"), a false match because 4 is not one of the numbers in s1.
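
To make the arguments concrete, here is a small sketch using the s1 from the example; the Sub name InStrDelimiterDemo is made up for illustration. In InStr(start, string1, string2, compare), start is 1, string1 is s1 wrapped in ", " on both sides, string2 is the token wrapped the same way, and compare is 1 (vbTextCompare). The wrapping always adds exactly one delimiter to each end, no matter how many tokens the strings hold.

```vba
Sub InStrDelimiterDemo()
    Dim s1 As String
    s1 = "5, 45, 100, 113, 160"

    ' Unwrapped search: "4" is found inside "45", so InStr returns 4 (a false match).
    Debug.Print InStr(1, s1, "4", vbTextCompare)

    ' Wrapped search: ", 4, " does not occur in ", 5, 45, 100, 113, 160, " so InStr returns 0.
    Debug.Print InStr(1, ", " & s1 & ", ", ", 4, ", vbTextCompare)

    ' A real token is still found: ", 45, " starts at position 4 of the wrapped string.
    Debug.Print InStr(1, ", " & s1 & ", ", ", 45, ", vbTextCompare)
End Sub
```

Note that this only works when every token in both strings is separated by exactly a comma followed by one space.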

snb
09-07-2014, 04:53 PM
Where do those strings reside ? ( in cells, in ranges, txt-files, clipboard, somewhere else..)
Why is this an Excel question in your opinion ?

Jdean
09-07-2014, 09:02 PM
The strings reside in XL Ranges ("A:A") and ("B:B"). So String A and B (or S1 and S2) are just a comparison of two cell values - one from each column range.
My current approach has been to load both into separate vectors (or 1D arrays), iterate over each slot, break each slot into tokens and use two For-loops to check and count common tokens. This is the approach for which Admin suggested two alternative methods of comparing two strings.

Jdean
09-07-2014, 09:07 PM
Okay, thanks, Admin. I tested your Instr() function method and it also worked and ran almost as fast as the two For-loops method I was using. It was definitely faster than the dictionary data structure approach. Still playing around with variations of both to see if the execution time will be better.

snb
09-07-2014, 09:38 PM
Is this correct ?

cell A1 contains: "5, 45, 100, 113, 160"
cell B1 contains: "2, 4, 45, 160, 189"

Jdean
09-07-2014, 10:08 PM
Correct... and I want to count how many tokens A1 and B1 have in common.

snb
09-07-2014, 11:51 PM
Do you want to compare only A1 to B1 or A1 to every string in column B ?

I would split the values in column A to columns A:E (texttocolumns)

Then it's easy to compare the values in A1:E1 to the string in column F and put the result of the comparison in column G

See the attachment.

Jdean
09-08-2014, 12:35 AM
I would like to compare A1 to every string in column B.
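
If A1 really is compared with every string in column B, one option is to build the dictionary from A1 only once and reuse it for each row. The sketch below is untested against the real data and rests on several assumptions: the Sub name is made up, the strings sit on the active sheet, and column C is free to receive the counts.

```vba
Sub CountCommonAgainstColumnB()
    Dim dict As Object, data As Variant, result() As Long
    Dim tokens() As String, lastRow As Long
    Dim r As Long, i As Long, t As String

    ' Build the lookup from A1 once, instead of once per comparison.
    Set dict = CreateObject("Scripting.Dictionary")
    tokens = Split(CStr(Range("A1").Value), ",")
    For i = LBound(tokens) To UBound(tokens)
        t = Trim$(tokens(i))
        If Len(t) > 0 Then dict(t) = True
    Next i

    ' Read column B into a Variant array in a single call.
    lastRow = Cells(Rows.Count, "B").End(xlUp).Row
    If lastRow < 2 Then lastRow = 2   ' keep the range multi-cell so .Value returns an array
    data = Range("B1:B" & lastRow).Value
    ReDim result(1 To lastRow, 1 To 1)

    For r = 1 To lastRow
        tokens = Split(CStr(data(r, 1)), ",")
        For i = LBound(tokens) To UBound(tokens)
            If dict.Exists(Trim$(tokens(i))) Then result(r, 1) = result(r, 1) + 1
        Next i
    Next r

    ' Assumption: column C is free to receive the counts.
    Range("C1:C" & lastRow).Value = result
End Sub
```

Each row of column B then costs one Split plus one dictionary lookup per token, rather than a full scan of A1's tokens.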

snb
09-08-2014, 02:02 AM
Did you look at the attachment ?

Jdean
09-08-2014, 05:56 AM
Yes, and I am confused. It doesn't look like what I am looking for.
Using text to columns is equivalent to splitting each string into an array with the delimiter "," using the split() function. I executed your code and it still confuses me.

The task I am trying to solve is: I have two strings. I need a quick way to find how many tokens or numbers they have in common.

p45cal
09-10-2014, 04:26 PM
If each list contains say 1000 items, with a nested loop you could be making 1000 x 1000 = 1 million comparisons.
Have you considered using an Exit For/Exit Do once a single match is found?
Are both lists already sorted (it looks like they may be)? If so, instead of comparing all values in one list with all values in the other, you could walk down both lists at the same time:
First compare both first items in the lists and see which one is smaller, then iterate down the list which had the smaller value until you get a match or the value is bigger, then stop running down that list and continue running down the other list until again, a match or bigger value is encountered etc. That way at most about 2000 comparisons get made.
Eg. using your example:
list A: "5, 45, 100, 113, 160"
list B: "2, 4, 45, 160, 189"

compare 5 and 2. 2 is smaller so run down list B:
compare 5 and 4
compare 5 and 45
stop at the 45 and start running down list A:
compare 45 and 45 (match, so add to the count)
compare 45 and 100
stop at 100 to flip over:
compare 100 and 160
stop at 160 to flip over:
compare 160 and 113
compare 160 and 160 (match, so add to the count)
compare 160 and 189
stop at 189 to flip over (or since you've come to the end of one of the lists there's no need to continue).

Using your example we need only make the 9 comparisons listed above instead of 25.

Perhaps post the code you already have?
Perhaps attach a workbook?

edit post posting:
something along these lines (neither debugged nor streamlined):
Sub blah()
    Dim s1 As String, s2 As String
    Dim x As Variant, y As Variant
    Dim ix As Long, iy As Long, ubX As Long, ubY As Long
    Dim vx As Long, vy As Long, matchCount As Long

    s1 = "5, 45, 100, 113, 160"
    s2 = "2, 4, 45, 160, 189"
    x = Split(s1, ",")
    y = Split(s2, ",")
    ubX = UBound(x): ubY = UBound(y) 'ubounds kept in variables so that they don't need to be calculated in each iteration.
    ix = 0: iy = 0
    Do Until ix > ubX Or iy > ubY
        vx = CLng(x(ix)): vy = CLng(y(iy)) 'I've assumed whole numbers; requires a tweak for singles or doubles.
        'Debug.Print vx, vy, IIf(vx = vy, "Match!", "")
        If vx = vy Then matchCount = matchCount + 1
        'both lists must be sorted ascending: move on in the list holding the smaller value
        If vx >= vy Then iy = iy + 1 Else ix = ix + 1
    Loop
    MsgBox matchCount & " item(s) in common"
End Sub
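
Since the original question only asks whether two strings share at least two numbers, the walk can also stop as soon as the second match turns up. A minimal sketch of that idea follows; SharesAtLeast and its minCount parameter are hypothetical names, and it assumes both lists are sorted ascending and contain only whole numbers.

```vba
' Hypothetical helper: returns True as soon as minCount common values are found.
' Assumes both lists are sorted ascending and contain only whole numbers.
Function SharesAtLeast(ByVal s1 As String, ByVal s2 As String, _
                       Optional ByVal minCount As Long = 2) As Boolean
    Dim x() As String, y() As String
    Dim ix As Long, iy As Long, found As Long
    Dim vx As Long, vy As Long

    x = Split(s1, ",")
    y = Split(s2, ",")

    Do While ix <= UBound(x) And iy <= UBound(y)
        vx = CLng(x(ix)): vy = CLng(y(iy))
        If vx = vy Then
            found = found + 1
            If found >= minCount Then
                SharesAtLeast = True
                Exit Function      ' no need to look at the rest of either list
            End If
            ix = ix + 1: iy = iy + 1
        ElseIf vx < vy Then
            ix = ix + 1
        Else
            iy = iy + 1
        End If
    Loop
End Function
```

For the example strings, SharesAtLeast(s1, s2) returns True, and SharesAtLeast(s1, s2, 3) returns False.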

Rick Rothstein
09-12-2014, 08:25 AM
What about doing it with no loops? Here is a one-liner (albeit a long one) UDF (user defined function) that you can use directly in a formula on a worksheet...

Function CountDupes(R1 As Range, R2 As Range) As Long
CountDupes = Evaluate("SUM(0+ISNUMBER(TRANSPOSE(FIND("", ""&TRIM(MID(SUBSTITUTE(" & _
R1.Address & ","", "",REPT("" "",99)),ROW(A1:A" & UBound(Split(R1.Value, _
",")) + 1 & ")*99-98,99))&"", "","", ""&" & R2.Address & "&"", ""))))")
End Function


HOW TO INSTALL UDFs
------------------------------------
If you are new to UDFs, they are easy to install and use. To install it, simply press ALT+F11 to go into the VB editor and, once there, click Insert/Module on its menu bar, then copy/paste the above code into the code window that just opened up. That's it.... you are done. You can now use CountDupes just like it was a built-in Excel function. For example,

="Number of matching numbers = " & CountDupes(A1,B1)

If you are using XL2007 or above, make sure you save your file as an "Excel Macro-Enabled Workbook (*.xlsm)" and answer the "do you want to enable macros" question as "yes" or "OK" (depending on the button label for your version of Excel) the next time you open your workbook.

Rick Rothstein
09-12-2014, 09:24 AM
As a follow-up to my one-liner above, I should mention two things. First, although I presented the above function as a UDF (in Message #15), it can be called from other VBA procedures as well; just make sure to pass both arguments as Ranges, not text Strings. Second, there is a worksheet formula equivalent to the one-liner function above... instead of installing the code and calling the CountDupes formula, you can use this array-entered** formula directly on the worksheet:

=SUM(0+ISNUMBER(TRANSPOSE(FIND(", "&TRIM(MID(SUBSTITUTE(A1,", ",REPT(" ",99)),ROW(INDIRECT("A1:A"&LEN(A1)-LEN(SUBSTITUTE(A1,",",""))+1))*99-98,99))&", ",", "&B1&", "))))

** Commit this formula using CTRL+SHIFT+ENTER and not just Enter by itself.