mswordview.1()
NAME
mswordview - convert Word 8 files to HTML
SYNOPSIS
mswordview [-v] [--version] [-n] [--nocredits] [-c] [--corehtmlonly]
[-t seconds] [--timeout seconds] [-f points] [--defaultfontsize points]
[-w type] [--horizontalwhite type] [-u type] [--verticalwhite type]
[-s url] [--symbolurl url] [-p url] [--patternurl url] [-d url]
[--wingdingurl url] [-h] [--ignoreheadings] [-a] [--noannotations]
[-m] [--mainonly] [-b] [--riskbadole] [-e] [--nofontfaces]
[-o filename] [--outputfile filename] [-g erroroutputfile]
[--errorfile erroroutputfile] [-y tabvalue] [--tabsize tabvalue]
[-i dir] [--imagesdir dir] [-j url] [--imagesurl url] [-k]
[--notablewidth] filename
DESCRIPTION
mswordview breaks the OLE Word document into its component streams,
and then converts the document and its graphics to HTML.
OPTIONS
-v, --version
Output program version.
-n, --nocredits
Don't append credits to the end of the HTML output.
-c, --corehtmlonly
Don't put <html> and </html> around the output.
-f points, --defaultpointsize points
The base point size for mswordview is 10 (as in MS Word). You can
change this to a different size if you find your output too large;
otherwise, e.g., a 12 point font becomes an HTML font +2, which can
look too big. An aside here: many of the files that mswordview
outputs are tagged as Unicode. Often this turns out to be
unnecessary, but short of examining every single character in
advance to see whether it falls into the ASCII range, there is no
sure way to know whether this header is necessary. Netscape will
then use a Unicode font, and since most European readers won't ever
have read a document in that font, they won't have customized the
Unicode base font size as they might have done the Western font
size. So if you have set your usual language encoding's font size
away from the default, do the same for your Unicode font. Sorry
about the long entry ;-)
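To make the effect of -f concrete, here is a hypothetical sketch, not mswordview's actual code. It assumes one HTML relative font-size step per point of difference from the base size, which fits the single example given above (12 point becomes +2 with the default base of 10) but may not match the real mapping. The function name is made up for illustration.

```python
# Hypothetical sketch: assumes a linear mapping of one HTML relative
# font-size step per point above or below the base size. The man page only
# gives one data point (12pt -> +2 with base 10); the real mapping may differ.
DEFAULT_BASE_POINTS = 10  # default base point size, per the -f entry


def html_font_step(points: int, base: int = DEFAULT_BASE_POINTS) -> str:
    """Return a relative <font size="..."> value such as '+2' or '-1'."""
    step = points - base
    return f"{step:+d}"  # always carries a sign, e.g. '+0', '+2', '-2'
```

Under this assumption, running with -f 12 would make 12 point text render at "+0" (the reader's base size) instead of "+2".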
-w type, --horizontalwhite type
Attempting to convert formatting done in Word with whitespace,
such as spaces and tabs, is quite difficult. In HTML output there's
no easy way to get nicely lined-up text using spaces, so whitespace
padding looks awful, but of course so does making no attempt at
formatting. So I have given six options; the default type is 0,
but I am beginning to think that 2 is really the best option.
0 convert runs of more than one space into hard spaces (i.e.
&nbsp;) and convert tabs into a clear gif with width equal to the
tabsize option.
1 convert runs of more than one space into hard spaces (i.e.
&nbsp;) and convert tabs into a run of &nbsp;s.
2 convert runs of more than one space into hard spaces (i.e.
&nbsp;) but don't convert tabs into anything.
3 don't convert spaces into anything at all, but convert tabs into
a clear gif with width equal to the tabsize option.
4 don't convert spaces into anything at all, but convert tabs into
a run of &nbsp;s.
5 don't convert spaces into anything at all, and don't convert tabs
into anything at all.
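The six modes above, together with the two meanings of --tabsize (pixels for the clear gif, or a count of hard spaces), can be sketched as below. This is a minimal illustration, not mswordview's implementation: the function name, the gif filename "clear.gif" and the exact <img> attributes are hypothetical, and HTML escaping of the surrounding text is left out.

```python
import re

# Hypothetical filename; the man page only says the clear gif is found
# under the --patternurl location.
CLEAR_GIF = "clear.gif"


def convert_horizontal(text: str, mode: int = 0, tabsize: int = 8,
                       pattern_url: str = ".") -> str:
    """Sketch of the six --horizontalwhite modes (0-5) for one run of text."""
    if mode not in range(6):
        raise ValueError("mode must be 0-5")
    # Modes 0-2: every space in a run of two or more becomes &nbsp;.
    if mode <= 2:
        text = re.sub(r" {2,}", lambda m: "&nbsp;" * len(m.group(0)), text)
    # Modes 0 and 3: a tab becomes a transparent gif tabsize pixels wide.
    if mode in (0, 3):
        img = f'<img src="{pattern_url}/{CLEAR_GIF}" width="{tabsize}" height="1">'
        text = text.replace("\t", img)
    # Modes 1 and 4: a tab becomes tabsize hard spaces.
    elif mode in (1, 4):
        text = text.replace("\t", "&nbsp;" * tabsize)
    # Modes 2 and 5 leave tabs untouched; modes 3-5 leave spaces untouched.
    return text
```

For example, convert_horizontal("a  b", 2) yields "a&nbsp;&nbsp;b", while a single space between words is left alone in every mode.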
-u type, --verticalwhite type
Sets what to do with multiple line breaks. There are three options
for type:
0 the default: a single line break becomes a <br>, but in a run of
more than one, the first two are transformed into a <p> and any
further ones are output as <br>. The intention here is to retain
the meaning that Word usually associates with two line breaks,
namely the end of a paragraph, while also supporting the fact that
users of Word often whack away madly at the return key to try to
force formatting decisions by that mechanism.
1 replaces each line break one for one with a <br>
2 replaces a single line break with <br>, and a run of more than
one (no matter how long) with a single <p>
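A minimal sketch of the three --verticalwhite modes, assuming the text has already been reduced to a string in which each Word line break is a "\n". That representation and the function name are hypothetical; they illustrate the rules above, not mswordview's internals.

```python
import re


def convert_vertical(text: str, mode: int = 0) -> str:
    """Sketch of the three --verticalwhite modes for '\\n'-separated text."""
    if mode not in (0, 1, 2):
        raise ValueError("mode must be 0, 1 or 2")

    def replace_run(m: re.Match) -> str:
        n = len(m.group(0))  # length of this run of consecutive breaks
        if n == 1 or mode == 1:
            return "<br>" * n  # single break, or mode 1: one <br> per break
        if mode == 2:
            return "<p>"  # any longer run collapses to a single <p>
        # Mode 0: the first two breaks become <p>, any further ones <br>.
        return "<p>" + "<br>" * (n - 2)

    return re.sub(r"\n+", replace_run, text)
```

With the default mode, four consecutive breaks become "<p><br><br>", keeping the paragraph end while preserving some of the extra vertical space the author typed.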
-t seconds, --timeout seconds
Time out after the given number of seconds. This is useful if you
use mswordview as a web gateway, because there's no one watching
the conversion process to realize that it has gone into a busy loop.
-s url, --symbolurl url
This is the URL that will be used to find the gif pictures used to
display the MS Symbol font. Not the tidiest of solutions to the
problem, but it works.
-d url, --wingdingurl url
This is the URL that will be used to find the gif pictures used to
display the MS Wingdings font. Not the tidiest of solutions to the
problem, but it works.
-p url, --patternurl url
This is the URL that will be used to find the background patterns
that MS Word can use as backdrops for table cells. This is hardly
the most important of MS Word's features, but there's always
someone bleating for some feature that appears ridiculous to me to
include, so here this one is in all its glory. This directory is
also used for any extra graphics that mswordview might use, e.g.
the clear gif optionally used for tabs.
-h, --ignoreheadings
Don't convert MS Word heading types into HTML heading levels.
Sometimes users use heading types inappropriately: if a document
uses heading types but changes their attributes so that they no
longer suit HTML heading levels, use this option.
-a, --noannotations
By default mswordview will output annotations, but MS Word itself
doesn't print annotations when printing to paper, so use this
option to leave them out.
-m, --mainonly
With this option, no headers or footers are shown.
-b, --riskbadole
With this option on, mswordview will attempt to decode files whose
OLE tables are corrupt. More than likely the broken Word file will
crash mswordview, and crash it hard.
-e, --nofontfaces
With this option set, mswordview won't insert font face tags. As it
stands, font faces are on by default, but this feature is alpha, so
it is only supported for ASCII-based languages (i.e. Western
European only), and then only under certain conditions, as it is
surprisingly difficult otherwise to be sure which of a few choices
is the correct font to use.
-o filename, --outputfile filename
Set the filename to place output in; use - as the filename to send
output to standard output (the screen). By default, output is put
into a file with the same name as the input file and a .html
ending. Any graphics files created have the same prefix as this
file.
-g filename, --errorfile filename
Set the filename to place error messages in. The default is stderr
(the screen).
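As a usage sketch for the web-gateway case mentioned under --timeout, the options documented here can be combined so that the HTML goes to standard output and errors go to a separate file. The wrapper functions and the error-file path are hypothetical; only the flags come from this page. The extra Python-side timeout is an assumed backstop, not something mswordview requires.

```python
import subprocess
from typing import List


def gateway_command(docfile: str, seconds: int = 30,
                    errfile: str = "/tmp/mswordview.err") -> List[str]:
    """Build an mswordview command line for a CGI-style gateway (hypothetical)."""
    return ["mswordview",
            "-t", str(seconds),  # give up if the conversion busy-loops
            "-c",                # no <html></html> wrapper: we embed the result
            "-n",                # no credits appended
            "-g", errfile,       # keep error messages out of the page
            "-o", "-",           # write the HTML to standard output
            docfile]


def convert(docfile: str, seconds: int = 30) -> bytes:
    """Run the conversion and return the raw HTML bytes."""
    cmd = gateway_command(docfile, seconds)
    # Backstop in case mswordview's own -t timeout does not fire.
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True,
                            timeout=seconds + 5)
    return result.stdout
```

Returning bytes avoids guessing the output encoding, since (as noted under -f) the output may be tagged as Unicode or not.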
-y tabvalue, --tabsize tabvalue
Specifies either the number of pixels of indentation that a tab
should be translated into, or the number of hard spaces to replace
one with; multiples of 8 only work in the second case. Read the
horizontalwhite entry to understand which one will get used. Pixels
is the default measurement. This is messy because tabs are
obviously messy things under HTML, and we'd all be better off if
they didn't exist at all, but we live in a world where they get
used for indentation and, worse, alignment, and you'll basically
just be damn lucky if you see any hint of that alignment in the
HTML output :-) Tabs basically just don't work.
-i directory, --imagesdir directory
Specifies the directory into which the graphics will be saved; the
default is the same directory that the HTML file is placed in. If
you use this but intend to move the graphics before viewing the
HTML, or for some other reason you want the HTML to link to the
graphics with a custom img src URL, then use --imagesurl in
conjunction with this option.
-j url, --imagesurl url
Specifies the URL at which the graphics from the Word document can
be found; the default is the same directory that mswordview saved
the graphics into.
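The interplay of --imagesdir and --imagesurl can be pictured as choosing two things per graphic: where the file is written, and what the <img src> in the HTML points at. The sketch below is an assumption-laden illustration of the defaults described above, not mswordview's code; the function name and the example filename are hypothetical, and the real graphics are named after the output file prefix as described under --outputfile.

```python
import posixpath
from typing import Tuple


def image_paths(name: str, imagesdir: str = "",
                imagesurl: str = "") -> Tuple[str, str]:
    """Return (path the graphic is saved to, URL used in <img src>)."""
    # -i: save into imagesdir, else alongside the HTML file.
    saved = posixpath.join(imagesdir, name) if imagesdir else name
    # -j: link via imagesurl, else to wherever the graphic was saved.
    src = posixpath.join(imagesurl, name) if imagesurl else saved
    return saved, src
```

For instance, with -i pics -j http://example.com/img (a placeholder URL), a graphic would be written to pics/ while the HTML links to http://example.com/img/, which is the "move the graphics before viewing" case.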
-k, --notablewidth
With this on, table widths are not specified.
BUGS
I appear to have gone a little mad on the number of command line
options; I have only 4 letters left: l, q, x and z. Some of these
options aren't really needed; I don't use any of them myself :-)
mswordview can be incredibly slow when a document is fastsaved and has
many tables.
MORE INFORMATION
More information may be found at
http://www.gnu.org/~caolan/docs/MSWordView.html or
http://skynet.csn.ul.ie/~caolan/docs/MSWordView.html
SEE ALSO
laola(1), lls(1), elser(1), catdoc(1), word2x(1)
AUTHOR
Caolan McNamara
WWW: http://www.csn.ul.ie/~caolan/
Mail: Caolan.McNamara@ul.ie