Main Routine in support of these Threads Part 1
http://www.excelfox.com/forum/showth...0893#post10893
http://www.eileenslounge.com/viewtopic.php?f=21&t=31572
The coding is split into 2 parts to fit it into a forum post, but this and the coding in the next post form a single routine, the main routine. In addition, a routine called by the main routine is required, Public Sub GetElemsText( ), which is in the post after next.
Code:
Option Explicit
Sub EP() ' http://www.excelforum.com/showthread.php?t=1148621&page=7&p=4452110&highlight=#post4452110
Rem 1)File Info
'Dim wsLkUp As Worksheet: Set wsLkUp = ThisWorkbook.Worksheets("Tabelle1"): wsLkUp.Activate
Dim strURL As String ' File with Page ' file:///G:/Excel0202015Jan2016/OffenFragensForums/eileenslounge/XP/Updates/report.html
Let strURL = ThisWorkbook.Path & "\Updates\" & "report.html" ' '"http://www.ernaehrung.de/lebensmittel/de/W233000/Fleischkaese.php" ' "http://www.ernaehrung.de/lebensmittel/de/W233000/PloppyPooFukYou"
' Application.Wait Now + TimeValue("00:00:02") '
Rem 2) '
'2a xmlHTTP stuff: MSXML2.XMLHTTP.6.0, IXMLHTTPRequest. Alan: "simple xml request here, so you could give URL a simple File of the HTML code"
'Dim Request As Object: Set Request = CreateObject("MSXML2.XMLHTTP") 'Late Binding https://msdn.microsoft.com/en-us/library/ms759148(v=vs.85).aspx
Dim request As MSXML2.XMLHTTP60: Set request = New MSXML2.XMLHTTP60 'Early Binding. Requires --- TOOLS --- REFERENCES -- tick Microsoft XML, v6.0. In the v6.0 library the class is named XMLHTTP60 (with the older v3.0 reference it is MSXML2.XMLHTTP) http://www.mrexcel.com/forum/excel-questions/759592-help-createobject-msxml2-xmlhttp-macro.html
'Application.Cursor = xlWait'cursor disable..just to be on the safe side???
With request '(or With CreateObject("msxml2.xmlhttp"))'By virtue of GET this is a simplified "xml" request
.Open bstrmethod:="GET", bstrURL:=strURL, varasync:=True ' ("GET", strURL, True) 'This only prepares the request: the method, the URL and the mode. The method ("GET") determines the type of request, and that may then require further info in the next lines
' The only difference to pike's and Kyle's opening and sending stuff is the third argument:- Leith: "The True/False argument of the HTTP Request is the Asynchronous mode flag. If set False then control is immediately returns to VBA after Send is executed. If set True then control is returned to VBA after the server has sent back a response. I prefer to use asynchronous mode and test if my timeout period has expired to prevent the code from hanging due to an unresponsive server. In the example I provided I used synchronous mode to reduce the amount the code and keep it easier to understand."
' Note: the second and third sentences of that quote have the two modes swapped. With True (asynchronous) .send returns control to VBA immediately, so the code must itself wait for the response. With False (synchronous) .send does not return until the response has arrived.
'No extra info here for type GET ' ' '.setRequestHeader "DNT", "1"
'.setRequestHeader "Content-Type", "application/x-www-form-urlencoded" ' Content-Type is the header name, application/x-www-form-urlencoded is the value. (The content type in the html is "text/html", not "x-www-form-urlencoded" - that is something different.) You can set different request headers and pass different values. That is not unusual, it is just not required in this case. When you POST data to a server, you need to tell it what format you are sending it in, so the type of the content sent in the body of the request (the send bit) is then application/x-www-form-urlencoded
.setRequestHeader bstrheader:="Ploppy", bstrvalue:="Poo"
.send ' varBody:= ' No extra info for type GET. .send actually makes the request
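' A hedged sketch, not part of the original routine: Leith's quote on the .Open line mentions testing for a timeout so that an unresponsive server cannot hang the code. One way that could look is below. The name dtGiveUp and the 10 second limit are assumptions for illustration only.
Dim dtGiveUp As Date: Let dtGiveUp = Now + TimeSerial(0, 0, 10) ' Assumed limit of 10 seconds
Do While .readyState <> 4
    DoEvents ' Let other processes run while waiting
    If Now > dtGiveUp Then .abort: MsgBox "No response within 10 seconds from " & strURL: Exit Sub ' Cancel the request and leave the routine
Loop ' If this loop ends normally then readyState is already 4, so the While ... Wend wait that follows passes straight through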
While .readyState <> 4: DoEvents: Wend ' Allow other processes to run while the web page loads. This wait is needed because of the varasync:=True option: .send returns before the response has arrived. A readyState of 4 means the request is complete
Dim PageSrc As String: Let PageSrc = .responseText ' Save the HTML code in the variable PageSrc. (It is declared here with Dim, so it is local to this routine, not Global.) ': Range("P1").Value = PageSrc 'For me for a print out copy to text file etc. The responseText property returns the response to the request prepared by the Open method as a text string
End With
Set request = Nothing ' This section is finished. We no longer need the request object, so we can optionally Set request = Nothing. The older argument that this is always good practice to prevent memory leaks and data corruption appears outdated. The current view is to use it only when there is a clear reason, so that the cases where it really matters are not masked.
'_..EP2ab Explicit Pedantry. We intend passing PageSrc to a method to produce an Object Orientated model, for later use through its Methods and Properties. This model is usually referred to as a Document Object Model, DOM. Some steps in the creation of the "DOM" are frequently confused with the processes in '2a, which are in fact now finished: the .send, together with the wait for readyState 4, finished all of them. We move on to '2b. Only PageSrc is required to be "taken over" as it were
'2b DOM stuff' Make OOP type model of HTML code, using Microsoft HTML Object Library
'Dim HTMLdoc As HTMLDocument: Set HTMLdoc = New HTMLDocument 'Early binding - will not work with .Write:- Leith "This is a case where late binding has to be used. The htmlfile is an ActiveX object that is a wrapper function for the IHTMLDocument2 interface in MSXML2. This gets into a lot of low level system operation......." https://www.mrexcel.com/forum/excel-questions/367030-copy-table-website-into-excel-vba-2.html#post4031122 https://www.excelforum.com/excel-programming-vba-macros/1214789-late-binding-2.html#post4820307 'Early binding TOOLS >>> REFERENCES >>Microsoft HTML Object Library
Dim HTMLdoc As Object: Set HTMLdoc = CreateObject("htmlfile") 'Late Binding, ' Create an empty HTML Document.
HTMLdoc.Open 'EP2b(i) This clears the values in the HTMLdoc. Complete Explicit Pedantry. In usage outside VBA, an instance often has to be cleared before it is "used". Approximately, in VBA this can be considered as putting the DOM back to the point just before it is "loaded" with the PageSrc String. It is effectively like doing a Set = Nothing followed by the Dim and CreateObject (or Set = New) code line again. It usually serves no purpose in VBA, as we are resetting a situation back to how it already is. It can however be done through .Open
HTMLdoc.Write PageSrc 'EP2b(ii) Convert the HTML code into an HTML Document Object Model, DOM. This gives the document the info it needs to work further, that is to say it fills the DOM. Wiki DOM http://www.excelforum.com/showthread.php?t=1148621&page=3#post4441761
'HTMLdoc.body.innerHTML = PageSrc ' Most people do that, but the Write method of an HTML file is designed to convert the page source text into an HTML DOM document. Both methods achieve the same results here. Setting the body is the more common way outside of VBA of converting Page Source code into an HTML DOM document. Within VBA it just works harder to achieve the same. It excludes the Meta data, Java scripts, and Class information from being converted. Generally speaking, this information is not used when retrieving only text data from a web page.
HTMLdoc.Close 'EP2b(iii) _ 2 b or not in 2b , that was the ?? http://www.excelforum.com/showthread.php?t=1148621&page=6.. Briefly: when used outside VBA, some processes started by .Open() can or should be finished after the corresponding .write(). This is done using .Close(). Once again this can be used in VBA through .Close. I know of no merit in, or reason for, using it in VBA as yet. Pike thinks it closes the document you have just written, and as such he describes it as optional. He would also not have the HTMLdoc.Open. Kyle thinks nothing is open. Leith uses it but has made no comment to date. This may be just his style, like my EP's, just not including the HTMLdoc.Open 'EP2b(i)
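' A hedged check, not part of the original routine: before section 3 relies on the DOM, confirm that something was actually loaded into it. The message text is an assumption for illustration only.
If Len(PageSrc) = 0 Then MsgBox "No HTML was returned from " & strURL: Exit Sub ' Nothing came back, so there is nothing to read from the DOM
Debug.Print "Elements in DOM: " & HTMLdoc.all.Length ' Shown in the Immediate Window (Ctrl+G). A very small count suggests the page source was not what was expected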
Rem 3
Rem 3a) Directly