Never been to DZone Snippets before?

Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

Mass conversion from word to HTML (See related posts)

I got 1000 word files. Each contains 1 main image and some decorations.
(This is actually a big book scanned into 1-file-per-page format)
I need to extract all the images. What do I do?

Python can do some automation using COM. (or something like that)
import pythoncom, win32com.client

app = win32com.client.gencache.EnsureDispatch("Word.Application")

doc = 'C:\\lang\\try\\bdham\\p1'
app.Documents.Open(doc + '.doc')
app.ActiveDocument.SaveAs(doc + '.html', FileFormat=win32com.client.constants.wdFormatHTML)
app.ActiveDocument.Close()
# now repeat with p2, p3, etc.

Actually, I should put it in a loop. But this non-loop version
is easier to read and remember.

You need to create an account or log in to post comments to this site.


Click here to browse all 4858 code snippets

Related Posts