Mass conversion from Word to HTML
(This is actually a big book scanned into 1-file-per-page format)
I need to extract all the images. What do I do?
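Python can automate Word through COM (via the pywin32 package). Saving each document as HTML should make Word write the embedded images out as separate files next to the .html file.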
    import pythoncom, win32com.client

    app = win32com.client.gencache.EnsureDispatch("Word.Application")

    doc = 'C:\\lang\\try\\bdham\\p1'
    app.Documents.Open(doc + '.doc')
    app.ActiveDocument.SaveAs(doc + '.html', FileFormat=win32com.client.constants.wdFormatHTML)
    app.ActiveDocument.Close()
    # now repeat with p2, p3, etc.
Actually, I should put it in a loop. But this non-loop version
is easier to read and remember.
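
For reference, here is a rough sketch of what the loop version might look like. It assumes the pages are named p1.doc, p2.doc, and so on in a single folder, as in the example above. The helper names page_paths and convert_all are made up for this sketch, and the page range has to be supplied by hand since the real page count isn't stated here. The COM part only runs on Windows with Word and pywin32 installed.

```python
import ntpath  # Windows-style path joining, whatever OS this is read on


def page_paths(base_dir, first, last, prefix="p"):
    """Return (doc_path, html_path) pairs for pages first..last inclusive.

    Hypothetical helper: assumes files are named <prefix><n>.doc.
    """
    pairs = []
    for n in range(first, last + 1):
        stem = ntpath.join(base_dir, "%s%d" % (prefix, n))
        pairs.append((stem + ".doc", stem + ".html"))
    return pairs


def convert_all(pairs):
    """Open each .doc in Word and save it as HTML.

    Needs Windows, Word, and pywin32; the import is done here so that
    page_paths can still be used without them.
    """
    import win32com.client

    app = win32com.client.gencache.EnsureDispatch("Word.Application")
    try:
        for doc_path, html_path in pairs:
            doc = app.Documents.Open(doc_path)
            doc.SaveAs(html_path,
                       FileFormat=win32com.client.constants.wdFormatHTML)
            doc.Close()
    finally:
        # Close Word even if one of the pages fails to convert.
        app.Quit()
```

To use it, something like convert_all(page_paths('C:\\lang\\try\\bdham', 1, 3)) would convert p1 through p3; replace 3 with the real number of pages.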