Results 1 to 10 of 52

Thread: ब्लॉग कोशिश कर रहा है بلاگز کی ک*Trying Blogs

Threaded View

Previous Post Previous Post   Next Post Next Post
  1. #35
    Fuhrer, Vierte Reich DocAElstein's Avatar
    Join Date
    Aug 2014
    Posts
    10,457
    Rep Power
    10
    '4b)= =Building Array from HTML Table

    There could be some erroneous / irrelevant 'comments in this particular coding as I am trying to keep iit comparable with some others that I have for web scrapping. So apologies for that. The walkthrough here should be specific for the issue at hand and approximately in line with the ' comments in the coding

    I am doing a working example using my old Acer Aspire Lap top 4810TZG with Vista Operating System. If I use the WinUpdatesList software from NirSoft whilst my Google Chrome Browser is open then a new tab appears with a table in it. If I click on that, do a Ctrl+a , followed by a Ctrl+c, then click in the first cell of a spare Excel worksheet, and finally do a Ctrl+v , than I finally get a full worksheet looking something as shown here:
    Worksheet report_html.JPG : https://imgur.com/FieBLkF
    An example is in worksheet “report_html” in this file
    WMI Query.xlsm” : https://app.box.com/s/z15s0syizlct1nufhivl2yu4iupek8pi

    In fact, that final table is more or less what I finally want. I forgot why I am trying to get that now in a complicated way, but never mind..
    That table/range comes from the html coding string file , “report.html” . That html file is what my coding uses. So observation of that Excel worksheet table/range characteristics will be useful to have when developing and explaining the current code section, '4b) , under discussion. In particular it will be helpful to note the following characteristics of that excel worksheet table/range
    _ It uses columns A-K. So we have 11 columns
    _ In total there are 953 rows used.
    _ For the actual data, rows 7 to 953 are used. So we have ( 953-7)+1 = 947 rows of data ( … that calculation comes from like say I have 10 rows of data from row 1 to row 10, then … (10-1)+1=10 )

    Note: To get this routine to work, you will need to have 2 Files in the same ( any ) Folder …._
    _... the file with the coding in it ,
    as well as _..
    _... the “report.html” file , within a folder “Updates”
    ThisWorkbook Udates report_html.JPG : https://imgur.com/3udCqdW

    The code line that requires this organisation of files is this:
    _ Let strURL = ThisWorkbook.Path & "\Updates" & "report.html"
    So if you have your coding and file organisation in some other way then you will need to adjust that code line appropriately : The full path and file name given in that line must be that of where the file "report.html" is

    Remember:
    To get that file, "report.html" , you need to run wul.exe and then select the HTML Report – All Items icon
    HTML Report - All Items .JPG : https://imgur.com/OQRzvrZ
    Re read again from here for more detail http://www.excelfox.com/forum/showth...0893#post10893

    '4b)= =Building Array from HTML Table
    '4b(i) returns us the table characteristics of
    rowCnt = 948
    colCt = 10428
    colCnt = 11

    For example, click in left margin to put stop on code, and run code, and hover over the variables with the mouse to see the contents of a variable:
    rowCnt947.JPG : https://imgur.com/jEXoHrJ

    '4b(ii) Looping through rows to build output array BASIC IDEA
    At this point it might be worth a quick glance at a more simple code such as from Pike here : ( https://www.mrexcel.com/forum/excel-...ml#post4026613 )
    Code:
     Set objTable = html.getElementsByTagName("table")
     For lngTable = 0 To objTable.Length - 1
            For lngRow = 0 To objTable(lngTable).Rows.Length – 1    '       For every  row in a table .._ 
                For lngCol = 0 To objTable(lngTable).Rows(lngRow).Cells.Length – 1 '  _.. Go along every cell in that row
                    ThisWorkbook.Sheets("Sheet2").Cells(ActRw + lngRow + 1, lngCol + 1) = objTable(lngTable).Rows(lngRow).Cells(lngCol).innerText
                Next lngCol
            Next lngRow
            ActRw = ActRw + objTable(lngTable).Rows.Length + 1
        Next lngTable
    End Sub
    My coding in section '4b is in effect doing nothing more than that. I am basically looping through all HTML Cell s and preparing from that an output. The above code pastes out each HTML Cell content to a worksheet Cell as it goes along each cell in a row of the HTML table

    My coding differs in that it first fills an array completely, which is the main purpose of '4b(ii)a - '4b(ii)b , and then finally does the pasting out in one go in '4b(ii)c.
    My coding has additionally a more complicated way to get each array element which allows for the case , ( which we don’t actually have, but never mind ) of a html table within a html table, or similar , and additionally we can do some post processing for example selectively on each column. That can be useful for example to change frequently used long text to an abbreviation to make the final table more readable.

    '4b(ii) Looping through rows to build output array. Walkthrough coding
    __Rows are looped through, and at each row
    ____the cells in each row, which are effectively the columns in a classic sense, are looped through
    All that is basically required at this point to fill our array , Data( , ) , would be a simple code line of like
    _____ Data(r, C)= oTable.Rows(r).Cells(C).innerText
    In such a code line, oTable.Rows(r).Cells(C) returns the ”HTML dom object model “ cell object, ( HTMLTableCell ) :
    HTMLTableCell object.JPG : https://imgur.com/UT66C6D
    In most simple coding the .innerText Property could be used to get the actual information that we want, for example in the first cell , we see “Name “ :
    innerText.JPG : https://imgur.com/UT66C6D
    ( This would tend to suggest that the table we have found is , in fact the actual data, rather than the final output given by the WinUpdatesList software.
    Name at top left of data.JPG : https://imgur.com/yU4jd7Z
    The discrepancy in the total number of rows, I am not sure about at this stage
    )

    Brief description of Sub GetElemsText
    ( I have this Sub GetElemsText from Leith Ross : https://www.mrexcel.com/forum/excel-...ml#post4031122 )
    You will see that I do not do the simple code line of
    _____ Data(r, C)= oTable.Rows(r).Cells(C).innerText
    The code line I do have , will in the vast majority of situations do exactly the same:
    _____Call GetElemsText(oTable.Rows(r).Cells(C), Data(r, C))
    The routine Sub GetElemsText is given the HTMLTableCell object coresponding to the r C values and from that puts the innerText for that cell in the array element Data(r, C)
    It does it in a somewhat indirect way. Briefly in words: In the “html node tree structure” we have many nodes , and we can “go down” to the . ChildNodes(0) . ChildNodes(1) , ChildNodes(2) …. for as many as there are at the “next level down”. Even for the case of our simple cells we have two levels. The first does not refer to the level where the text is. The routine detects that and will keep “going down” until a node level corresponding to a text is found. This does not have much relevance for our situation, other than each Call of the routine results in it being done twice: the routine “Calls itself” which actually means that the routine pauses whilst a second copy of the routine is made. For our simple data example, the text is actually got on the second copy run of the routine. !!
    This general process of the routine “calling itself” is the classic recursion process. This takes some careful and long explaining and is better demonstrated with a more complicated cell example using a reduced test data sample, see here for example: #####
    Because of how this routine works, it does not actually get text from the run of the routine taking in the HTMLTableCell object, because the node associated with that is not a text. It is the “next node down” that is recognised as a text node !!
    In addition, a further extra function of the routine is that when it does get at a text node, it first does an additional test to see if text was present as the current running copy of the routine. This would be the case of something like a paragraph: Since text will have been added before we finally come back up all the levels of the routine calling itself. The new text will then be concatenated with a "~" so that we could have the possibility to re split this text later. The second copy run of the routine at each Cell will take the same empty string with it which was supplied as the next array element to be filled in the previous ( fist copy) run of the routine

    For the purposes of our requirement , the extra abilities of the routine should never be needed, but you never know… it does no harm to have it there , just in case….

    _.___

    '4b(ii)c Output from Array
    I have left this section fairly simple for now as I may come back to it later, and adjust once I have used the routine a few times



    _.________________________________________________ ________________________

    Here is an initial File. I will likely customise it and the coding in it the next few days
    WMI Query.xlsm : https://app.box.com/s/z15s0syizlct1nufhivl2yu4iupek8pi

    Here again is the links to the initial coding:
    http://www.excelfox.com/forum/showth...0896#post10896
    http://www.excelfox.com/forum/showth...0897#post10897
    http://www.excelfox.com/forum/showth...0898#post10898


    Last edited by DocAElstein; 01-10-2019 at 01:28 AM.
    ….If you are my competitor, I will try all I can to beat you. But if I do, I will not belittle you. I will Salute you, because without you, I am nothing.
    If you are my enemy, we will try to kick the fucking shit out of you…..
    Winston Churchill, 1939
    Save your Forum..._
    KILL A MODERATOR!!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •