'4b)= =Building Array from HTML Table
There could be some erroneous / irrelevant 'comments in this particular coding as I am trying to keep iit comparable with some others that I have for web scrapping. So apologies for that. The walkthrough here should be specific for the issue at hand and approximately in line with the ' comments in the coding
I am doing a working example using my old Acer Aspire Lap top 4810TZG with Vista Operating System. If I use the WinUpdatesList software from NirSoft whilst my Google Chrome Browser is open then a new tab appears with a table in it. If I click on that, do a Ctrl+a , followed by a Ctrl+c, then click in the first cell of a spare Excel worksheet, and finally do a Ctrl+v , than I finally get a full worksheet looking something as shown here:
Worksheet report_html.JPG : https://imgur.com/FieBLkF
An example is in worksheet “report_html” in this file
“WMI Query.xlsm” : https://app.box.com/s/z15s0syizlct1nufhivl2yu4iupek8pi
In fact, that final table is more or less what I finally want. I forgot why I am trying to get that now in a complicated way, but never mind..
That table/range comes from the html coding string file , “report.html” . That html file is what my coding uses. So observation of that Excel worksheet table/range characteristics will be useful to have when developing and explaining the current code section, '4b) , under discussion. In particular it will be helpful to note the following characteristics of that excel worksheet table/range
_ It uses columns A-K. So we have 11 columns
_ In total there are 953 rows used.
_ For the actual data, rows 7 to 953 are used. So we have ( 953-7)+1 = 947 rows of data ( … that calculation comes from like say I have 10 rows of data from row 1 to row 10, then … (10-1)+1=10 )
Note: To get this routine to work, you will need to have 2 Files in the same ( any ) Folder …._
_... the file with the coding in it ,
as well as _..
_... the “report.html” file , within a folder “Updates”
ThisWorkbook Udates report_html.JPG : https://imgur.com/3udCqdW
The code line that requires this organisation of files is this:
_ Let strURL = ThisWorkbook.Path & "\Updates" & "report.html"
So if you have your coding and file organisation in some other way then you will need to adjust that code line appropriately : The full path and file name given in that line must be that of where the file "report.html" is
Remember:
To get that file, "report.html" , you need to run wul.exe and then select the HTML Report – All Items icon
HTML Report - All Items .JPG : https://imgur.com/OQRzvrZ
Re read again from here for more detail http://www.excelfox.com/forum/showth...0893#post10893
'4b)= =Building Array from HTML Table
'4b(i) returns us the table characteristics of
rowCnt = 948
colCt = 10428
colCnt = 11
For example, click in left margin to put stop on code, and run code, and hover over the variables with the mouse to see the contents of a variable:
rowCnt947.JPG : https://imgur.com/jEXoHrJ
'4b(ii) Looping through rows to build output array BASIC IDEA
At this point it might be worth a quick glance at a more simple code such as from Pike here : ( https://www.mrexcel.com/forum/excel-...ml#post4026613 )
My coding in section '4b is in effect doing nothing more than that. I am basically looping through all HTML Cell s and preparing from that an output. The above code pastes out each HTML Cell content to a worksheet Cell as it goes along each cell in a row of the HTML tableCode:Set objTable = html.getElementsByTagName("table") For lngTable = 0 To objTable.Length - 1 For lngRow = 0 To objTable(lngTable).Rows.Length – 1 ' For every row in a table .._ For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length – 1 ' _.. Go along every cell in that row ThisWorkbook.Sheets("Sheet2").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText Next lngCol Next lngRow ActRw = ActRw + objTable(lngTable).Rows.Length + 1 Next lngTable End Sub
My coding differs in that it first fills an array completely, which is the main purpose of '4b(ii)a - '4b(ii)b , and then finally does the pasting out in one go in '4b(ii)c.
My coding has additionally a more complicated way to get each array element which allows for the case , ( which we don’t actually have, but never mind ) of a html table within a html table, or similar , and additionally we can do some post processing for example selectively on each column. That can be useful for example to change frequently used long text to an abbreviation to make the final table more readable.
'4b(ii) Looping through rows to build output array. Walkthrough coding
__Rows are looped through, and at each row
____the cells in each row, which are effectively the columns in a classic sense, are looped through
All that is basically required at this point to fill our array , Data( , ) , would be a simple code line of like
_____ Data(r, C)= oTable.Rows(r).Cells(C).innerText
In such a code line, oTable.Rows(r).Cells(C) returns the ”HTML dom object model “ cell object, ( HTMLTableCell ) :
HTMLTableCell object.JPG : https://imgur.com/UT66C6D
In most simple coding the .innerText Property could be used to get the actual information that we want, for example in the first cell , we see “Name “ :
innerText.JPG : https://imgur.com/UT66C6D
( This would tend to suggest that the table we have found is , in fact the actual data, rather than the final output given by the WinUpdatesList software.
Name at top left of data.JPG : https://imgur.com/yU4jd7Z
The discrepancy in the total number of rows, I am not sure about at this stage )
Brief description of Sub GetElemsText
( I have this Sub GetElemsText from Leith Ross : https://www.mrexcel.com/forum/excel-...ml#post4031122 )
You will see that I do not do the simple code line of
_____ Data(r, C)= oTable.Rows(r).Cells(C).innerText
The code line I do have , will in the vast majority of situations do exactly the same:
_____Call GetElemsText(oTable.Rows(r).Cells(C), Data(r, C))
The routine Sub GetElemsText is given the HTMLTableCell object coresponding to the r C values and from that puts the innerText for that cell in the array element Data(r, C)
It does it in a somewhat indirect way. Briefly in words: In the “html node tree structure” we have many nodes , and we can “go down” to the . ChildNodes(0) . ChildNodes(1) , ChildNodes(2) …. for as many as there are at the “next level down”. Even for the case of our simple cells we have two levels. The first does not refer to the level where the text is. The routine detects that and will keep “going down” until a node level corresponding to a text is found. This does not have much relevance for our situation, other than each Call of the routine results in it being done twice: the routine “Calls itself” which actually means that the routine pauses whilst a second copy of the routine is made. For our simple data example, the text is actually got on the second copy run of the routine. !!
This general process of the routine “calling itself” is the classic recursion process. This takes some careful and long explaining and is better demonstrated with a more complicated cell example using a reduced test data sample, see here for example: #####
Because of how this routine works, it does not actually get text from the run of the routine taking in the HTMLTableCell object, because the node associated with that is not a text. It is the “next node down” that is recognised as a text node !!
In addition, a further extra function of the routine is that when it does get at a text node, it first does an additional test to see if text was present as the current running copy of the routine. This would be the case of something like a paragraph: Since text will have been added before we finally come back up all the levels of the routine calling itself. The new text will then be concatenated with a "~" so that we could have the possibility to re split this text later. The second copy run of the routine at each Cell will take the same empty string with it which was supplied as the next array element to be filled in the previous ( fist copy) run of the routine
For the purposes of our requirement , the extra abilities of the routine should never be needed, but you never know… it does no harm to have it there , just in case….
_.___
'4b(ii)c Output from Array
I have left this section fairly simple for now as I may come back to it later, and adjust once I have used the routine a few times
_.________________________________________________ ________________________
Here is an initial File. I will likely customise it and the coding in it the next few days
WMI Query.xlsm : https://app.box.com/s/z15s0syizlct1nufhivl2yu4iupek8pi
Here again is the links to the initial coding:
http://www.excelfox.com/forum/showth...0896#post10896
http://www.excelfox.com/forum/showth...0897#post10897
http://www.excelfox.com/forum/showth...0898#post10898




Reply With Quote
Bookmarks