The HTML text of the content – our stuff as seen in HTML coding.
The last few postings seem to tell us that the content we are interested in, that is to say what we put in the document, seems to lie between the HTML text of
<div class=WordSection1>
And something of the form
<p class=MsoNormal><span lang=EN-US><o:p> </o:p></span></p>
</div>
Or
<p class=MsoNormal><o:p> </o:p></p>
<p class=MsoNormal><o:p> </o:p></p>
<p class=MsoNormal><b><span style='color:#D99594;mso-themecolor:accent2;
mso-themetint:153'>X <span class=SpellE>x</span> <span class=SpellE>x</span> <span
class=SpellE>x</span> <span class=SpellE>x</span> <o:p></o:p></span></b></p>
</div>
The end point might not be too clear, but if we take all the text from <div class=WordSection1> to </div>, then it looks like we will catch all of “our stuff”.
In that text of our stuff we see 6 bits with a .jpg and 6 bits with a .png.
Comparing that with the typical files produced, and considering that the document clearly has 6 images, it appears that we get both a .jpg and a .png for each image.
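As a rough sketch of that idea, the function below pulls out everything from <div class=WordSection1> to the last </div> in a string holding the HTML text. The function name OurStuff and the choice of the last </div> (rather than the first one after the start) are my assumptions, not something the postings so far have settled; if the document has further sections after WordSection1, they would be included as well.

```vba
Option Explicit

' Hypothetical helper: returns the text from <div class=WordSection1>
' up to and including the LAST </div> in Htm.
' Returns an empty string if either marker is not found.
Function OurStuff(ByVal Htm As String) As String
    Const StartTag As String = "<div class=WordSection1>"
    Const EndTag As String = "</div>"
    Dim PosStart As Long, PosEnd As Long

    ' First occurrence of the start marker, ignoring upper/lower case
    PosStart = InStr(1, Htm, StartTag, vbTextCompare)
    If PosStart = 0 Then Exit Function

    ' Last occurrence of the end marker, searching back from the end of the text
    PosEnd = InStrRev(Htm, EndTag, -1, vbTextCompare)
    If PosEnd < PosStart Then Exit Function

    OurStuff = Mid$(Htm, PosStart, PosEnd + Len(EndTag) - PosStart)
End Function
```

Taking the last </div> means a nested <div> inside the content should not cut the text short, at the cost of possibly catching a bit more than strictly needed.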
( By the way, we can see some info from Microsoft on this: to extract embedded images from a Word document, use Save As with the file type Web Page (*.htm; *.html).
Images will be extracted from the document and placed in the folder named <DocumentName>_files in the same location as the saved web page.
https://support.microsoft.com/en-us/...c-3eeb284af36b
http://web.archive.org/web/202103021...c-3eeb284af36b )
Macro to get this image info.
It’s not too difficult to write a macro to list those files.
Example
In the following type of macro, VBA seems to treat both .htm and .txt files as just long strings of text.
So I can write a macro to import that text string, then do some simple string manipulation to pick out the bits of text that look like .jpg or .png file names.
I will do that in the next post.
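In the meantime, here is a minimal sketch of what such a macro might look like. It is not the finished macro, just an illustration of the approach. The names FileToString and ListImageNames, and the file path, are made up for the example. It also assumes each file name is preceded by a quote, slash, equals sign, > or space in the HTML text, which I have not checked against every case.

```vba
Option Explicit

' Hypothetical helper: reads a whole file (.htm or .txt) into one long string.
Function FileToString(ByVal FullPath As String) As String
    Dim FileNum As Long
    FileNum = FreeFile
    Open FullPath For Binary Access Read As #FileNum
    FileToString = Space$(LOF(FileNum))   ' buffer the same length as the file
    Get #FileNum, , FileToString          ' fill the buffer with the file's text
    Close #FileNum
End Function

' Lists every bit of text ending in .jpg or .png in the Immediate window.
Sub ListImageNames()
    ' Characters assumed to mark the start of a file name (an assumption)
    Const Stops As String = """'/\=> "
    Dim Htm As String, Exts As Variant, Ext As Variant
    Dim Pos As Long, StartPos As Long

    Htm = FileToString("C:\Temp\MyDoc.htm")   ' hypothetical path, change to suit
    Exts = Array(".jpg", ".png")

    For Each Ext In Exts
        Pos = InStr(1, Htm, Ext, vbTextCompare)
        Do While Pos > 0
            ' Walk backwards from the extension to the start of the file name
            StartPos = Pos
            Do While StartPos > 1
                If InStr(1, Stops, Mid$(Htm, StartPos - 1, 1)) > 0 Then Exit Do
                StartPos = StartPos - 1
            Loop
            Debug.Print Mid$(Htm, StartPos, Pos + Len(Ext) - StartPos)
            ' Carry on searching after this match
            Pos = InStr(Pos + Len(Ext), Htm, Ext, vbTextCompare)
        Loop
    Next Ext
End Sub
```

This scans the whole file rather than only the text between <div class=WordSection1> and </div>, so it may list the same file name more than once if it is referenced in several places.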