This document looks best in a CSS-aware browser.

W2CSS:
Translating MS Word Documents
To CSS Compliant HTML

Table Of Contents

 

Overview of Version 2 Features and Changes

W2CSS is an HTML translator that creates CSS compliant HTML from MS Word documents.

Version 2 now includes translation of Word tables, creation of character level styles, creation of hyperlinked Tables of Contents from Word’s TOC fields, and handling of embedded and linked objects. The program is shareware: unregistered users get a limited feature set, but the program is not time-limited.

W2CSS is not a general purpose translator for any Word document. It’s probably most useful to those translating major sections of text from Word into HTML who wish to have clean, easy to read HTML that is tagged to simple CSS style definitions.

The table of contents feature makes it easy to create hypertext tables of contents, provided that the Word document uses styles appropriately.

 

Overview of the translator program

The W2CSS translator is a Visual Basic program that runs within MS Word. It takes an MS Word document as input and produces an HTML file as output. The HTML generated is relatively simple, and every element in the HTML (lists, paragraphs, headings) is tagged with a CSS “class”. These classes are created by translating the style definitions in the Word document into equivalent CSS class definitions.

In short, W2CSS:

 

Speaking generally, the translator attempts to create as exact an HTML replica of your Word document as possible (within limits). It translates style definitions and preserves style tagging. This can be of great benefit to those who wish to use CSS styles but have no decent editor in which to create and tag styles to HTML text. The HTML that is generated is meant to stand on its own, or it can be processed further through other HTML tools.

Who will appreciate this program:

This is not a program for the naïve Word user. It expects a certain level of expertise and understanding of Word. To get the most from the translator, you should be familiar with both paragraph and character level styles. Familiarity with Word’s Outline view and its implications, and some understanding of field codes, captions, tables of contents and tables of figures, is also helpful.

If you don’t utilize styles in Word...

If you are a Word user who mostly utilizes direct formatting and doesn’t take much advantage of styles in your documents, you may not see much benefit from this translator. If, on the other hand, you understand and regularly utilize styles in your Word documents, and wish to use CSS in your HTML documents, this translator may be of great benefit.

What is “direct formatting”?

Many first-time and naïve users are unaware that, by default, every Word paragraph is tagged with some style definition. All paragraphs in the standard (Normal) Word document are tagged with the style “Normal”. Users need not be aware, as they work, that paragraphs are attached to underlying styles.

Direct formatting is the font and paragraph formatting that is applied “on top of” the underlying style. Word allows direct formatting on an as-you-go, willy-nilly basis. This can result in a confusing mix of global, style-based formatting together with local, direct formatting. Or, with self-discipline, it can produce consistent formatting—much is up to the individual user.

More experienced, style-aware Word users usually know when formatting results from a globally defined style as opposed to local, “direct” formatting.

What this program requires of you, the Word user

This program requires a certain “self-discipline” when using Word, in that you must use styles somewhat strictly in documents that are to be translated. In short, you should refrain from direct formatting (see explanation of direct formatting) except for making words bold, italic or into hyperlinks (see explanation of character formatting). While you are not required to do this, you will see more accurate results from the translator if you do.

Tutorials Available

A series of tutorials that help Word users learn about styles is available on the Internet at www.g-foods.com/styles/. These tutorials cover not only Word styles, but also explain CSS styles.

A note about mapping Word documents into CSS compliant HTML

Word is a near “state of the art” word-processing program. It allows users to create quite complex looking documents. HTML doesn’t offer as rich a set of possibilities. You should bear in mind that while the mapping of a Word document into HTML is, on the surface, not complex, the actual details can get sticky. For best results in using the translator, keep your use of styles straightforward. And generally, refrain from too much in the way of formatting acrobatics.

For the W2CSS translator to preserve document “structure”, you must encode structure using Word’s conventions. This is most easily accomplished by using Word’s built-in styles: Heading 1 through Heading 9, the List styles, and a few others.

This user manual was created as a Word document and translated into HTML using the W2CSS translator. (see About This Document)


A look at what the translator does

NOTE: The following example uses some features only available in the registered version.

Example One

In the following example, an original Word document (shown in Figure A) is translated into CSS compliant HTML. Figure B lists a portion of the HTML generated; Figure C, the CSS styles generated. Other figures illustrate how the resulting HTML is rendered in both a CSS aware browser (Figure D) and a non-CSS aware browser (Figure E).

 

Figure A: Word document for Example One

Figure A

 

Figure A shows the document as it appears in Normal View in MS Word. The style name area is open so that you can clearly see the styles attached to each paragraph.

Notice that some paragraphs are tagged with a style named “htmlCode”. Using this reserved stylename, you can include lines of HTML code directly in the Word document. On translation, these lines pass directly into the HTML output.
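The pass-through rule can be sketched in Python. W2CSS itself is a VBA macro, so this is only an illustration of the behavior described, not its code. It assumes each Word paragraph is available as a (style name, text) pair; the function name `paragraphs_to_html` and the space-to-hyphen class naming are assumptions made for the example.

```python
from html import escape

def paragraphs_to_html(paragraphs):
    """Render (style_name, text) pairs as HTML; "htmlCode" paragraphs pass through."""
    lines = []
    for style, text in paragraphs:
        if style == "htmlCode":
            # Reserved style: the text is already HTML, so copy it unchanged.
            lines.append(text)
        else:
            # Ordinary paragraph: escape markup characters, then tag it with a class.
            css_class = "-".join(style.split())
            lines.append(f'<P class="{css_class}">{escape(text, quote=False)}</P>')
    return "\n".join(lines)

print(paragraphs_to_html([("htmlCode", "<hr>"), ("Body Text", "a < b")]))
```

Note the asymmetry: text in ordinary paragraphs has its "<" escaped, while the htmlCode line reaches the output as a real tag.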

Figure B is an excerpt of the HTML BODY created in the translation. Figure C is an excerpt of the CSS style definitions generated. In the BODY, each HTML element is connected to a CSS style definition via its “class” attribute. For instance, the first element in the body is a level one heading, <H1>, attached to CSS class “.Heading-1” via the attribute class="Heading-1".

The translator also creates an additional style definition:

.Heading-1 STRONG {font-weight: normal;}
 

This “contextual selector” for STRONG is created because the style for .Heading-1 is formatted in Word as BOLD. In the same way that Word treats bold and italic as toggles, the translator creates a definition which you can later use to control how STRONG text within .Heading-1 elements will appear (see the topic Reversing EM etc).
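A minimal Python sketch of how such reversing rules could be generated, assuming the translator already knows whether a paragraph style is bold or italic. The function name and its flags are hypothetical; the output format matches the rules shown in Figure C.

```python
def contextual_selectors(css_class, bold=False, italic=False):
    """Return rules that reverse STRONG/EM inside a bold or italic paragraph class."""
    rules = []
    if bold:
        # STRONG inside an already-bold style toggles back to normal weight.
        rules.append(f".{css_class} STRONG {{font-weight: normal;}}")
    if italic:
        # EM inside an already-italic style toggles back to upright text.
        rules.append(f".{css_class} EM {{font-style: normal;}}")
    return rules

for rule in contextual_selectors("Heading-1", bold=True):
    print(rule)   # .Heading-1 STRONG {font-weight: normal;}
```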

A number of <SPAN> elements are apparent in the HTML listing (Figure B). Each of these corresponds to a portion of the Word document that’s tagged with a Word character style. Although W2CSS doesn’t fully translate the hierarchy of Word’s styles, character styles are understood as “lying atop” paragraph styles, and so inherit the characteristics of their underlying paragraphs.

An instance of this is the text encoding the cookie prices. Notice that this text is colored maroon. A close inspection of the HTML shows that the maroon color change comes from the class “.price-char”. This class is tagged to HTML using the <SPAN> element. The class itself only expresses color change:

.price-char {
  color: maroon;
  }
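The SPAN wrapping can be sketched in Python, assuming a paragraph arrives as a list of (text, character style) runs, with None for text that carries no character style. The function name and run representation are assumptions for illustration only.

```python
from html import escape

def runs_to_html(runs):
    """Render (text, char_style) runs; runs tagged with a character style get a SPAN."""
    parts = []
    for text, char_style in runs:
        body = escape(text, quote=False)
        if char_style:
            css_class = "-".join(char_style.split())
            parts.append(f'<SPAN class="{css_class}">{body}</SPAN>')
        else:
            # Untagged text simply inherits the enclosing paragraph's class.
            parts.append(body)
    return "".join(parts)

print(runs_to_html([("One LB Bag: ", None), ("$11.25 ea ", "price-char")]))
```

Because the SPAN class only carries the properties the character style changes (here, color), everything else comes from the paragraph class, which mirrors the inheritance described above.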
 

If you compare the HTML generated by the W2CSS translator with that generated by Word’s own HTML conversion (when you pick “Save as HTML” from the File menu), you will find quite a difference. For starters, you should notice the lack of <FONT> tags in the W2CSS output. Also, you’ll find that the Word-generated HTML makes no mention of headings (H1, H2 etc). This is a critical loss of information about the structure of the document. In contrast, the W2CSS translator maps Word styles “Heading 1” and “Heading 2” to <H1> and <H2>, respectively.

Most probably, the reason that Word’s own HTML translation creates such “messy” HTML is that it’s meant to accommodate the lowest common denominator. That is, since many Word users don’t use styles or, if they do, use direct formatting in addition to styles, Word’s translation attempts to capture as much direct formatting as possible. As noted above, if you wish to use the W2CSS translator to advantage, you’re urged to forgo all direct formatting except for bold and italic (see what this program requires of you).

Figure B: HTML code generated by the translator for the above Word document

<BODY>
<H1 class="Heading-1"><IMG SRC="G1.GIF" ALT="G! FOODS: "> PRODUCTS</H1>

<PRE class="Normal"> </PRE>

<H2 class="Heading-2">Biscotti</H2>

<P class="explanation">These cookies are made from whole organic* brown rice flour, and are sweetened with  Sucanat&reg;.</P>

<hr>
<H3 class="Heading-3"><EM>Almond Biscotti</EM></H3>

<P class="ingred-para"><STRONG><SPAN CLASS="ingred-char">Ingredients:  </SPAN></STRONG>Organic* Brown Rice Flour, Sucanat&reg;, Whole Eggs, Maple Syrup, Almonds, Spices</P>

<P class="price-para">One LB Bag: <SPAN class="price-char">$11.25 ea </SPAN>&#151; 1/2 LB Bag: <SPAN class="price-char">$5.90 ea </SPAN></P>

<hr>
<H3 class="Heading-3"><EM>Chocolate-Chip Almond Biscotti</EM></H3>

<P class="ingred-para"><STRONG><SPAN CLASS="ingred-char">Ingredients: </SPAN></STRONG>Almond Biscotti (above) with the addition of Chocolate Chips</P>

<P class="price-para">One LB Bag: <SPAN class="price-char">$12.25 ea </SPAN>&#151; 1/2 LB Bag: <SPAN class="price-char">$6.25 ea </SPAN></P>

<PRE class="Normal"> </PRE>

<HR SIZE=12 WIDTH="100%">
<H2 class="Heading-2">Butter Cookies</H2>

<P class="explanation">Made with ghee (clarified butter), a blend of brown and white rice flours and plain old white sugar, this cookie is low in lactose (ghee has most of the butter's milk solids removed). 
</P>
 
 

<P class="Body-Text">The odd backward P symbol <IMG SRC="sty-t1P.gif" ALT="P Mark"> known as the "paragraph mark", usually is simply <EM>tolerated </EM>by most Word users. One of these is deposited in your document every time you hit Enter (the Return key). Life would be simple if a paragraph mark was simply equivalent to what WordPerfect users used to call a "hard return". </P>

 

In this example, the CSS styles generated are fully scalable. That is, the font sizes are expressed as percentages of the parent font (the font of the HTML BODY element). Similarly, margins are expressed in EM units, which are also scalable.
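The arithmetic behind scalable sizes can be sketched as follows. This is an assumption-laden illustration: the manual does not state the base size or the rounding rule. The numbers in Figure C happen to be consistent with a 12 pt base (36 pt ÷ 12 pt = 300%, 10 pt ÷ 12 pt = 83.33%) and with margins measured in ems of each element’s own font size (12 pt of space before a 36 pt heading gives 0.33em), but the constant and function names below are hypothetical.

```python
BASE_FONT_PT = 12.0  # assumption: the point size that BODY's 100% stands for

def font_size_percent(style_pt, base_pt=BASE_FONT_PT):
    """Express a Word font size (points) as a percentage of the base font."""
    return f"{round(style_pt / base_pt * 100, 2):g}%"

def margin_em(space_pt, style_pt):
    """Express paragraph spacing (points) in ems of the element's own font size."""
    return f"{round(space_pt / style_pt, 2):g}em"

print(font_size_percent(36), margin_em(12, 36), margin_em(3, 36))
# 300% 0.33em 0.08em
```

Using percentages and ems rather than points means the whole page rescales when the reader changes the browser’s base font size.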

Notice also the entity translations. The registered mark is translated to the entity “&reg;”. The long dash, which doesn’t have a well-supported entity name, is translated as “&#151;” (a named entity may have been defined recently, but it is not yet well supported). The translator provides options to control the translation of each of these entities (an improvement over version 1 of W2CSS).
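A minimal Python sketch of table-driven entity translation. The table contents reproduce the two translations shown in Figure B; the table and function names are hypothetical, and the real translator’s option names are not shown here. The ampersand is escaped first so that entities inserted afterwards are not escaped a second time.

```python
# Hypothetical entity table: character -> HTML entity (values as in Figure B).
ENTITIES = {
    "\u00ae": "&reg;",    # registered mark
    "\u2014": "&#151;",   # long (em) dash
}

def translate_entities(text, entities=ENTITIES):
    """Escape HTML markup characters, then replace special characters with entities."""
    # '&' must be escaped first so the entities added below are not double-escaped.
    text = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return "".join(entities.get(ch, ch) for ch in text)

print(translate_entities("Sucanat\u00ae \u2014 A&B"))
```

Making each entity an entry in a table is one simple way to let options switch individual translations on or off.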

Close observation will also show that blank paragraphs are rendered here as <PRE> elements holding one blank character. This feature is controllable via a translation option. Another way to do this is to encode empty paragraphs with &nbsp;, but this was found to yield bad results in the Lynx browser.

In a CSS aware browser, the HTML appears close in appearance to the original Word document (Figure D). In a non-CSS browser (Figure E), the text is still readable, with the structure of headings and sub-headings clear. (The <HR> elements were inserted to clarify the document’s appearance in such browsers.)

Figure C: Excerpt of CSS stylesheet generated from (above) Word document

.Heading-1 {
  font-family: Verdana, Arial, Helvetica, Sans-Serif;
  font-size: 300%;
  color: maroon;
  font-weight: bold;
  text-align: left;
  margin-top: 0.33em;
  margin-bottom: 0.08em;
  margin-right: 3.39%;
  margin-left: 0%;
  }

.Heading-1 STRONG {font-weight: normal;}

.Normal {
  font-family: Times New Roman, Arial, Helvetica, Sans-Serif;
  font-size: 83.33%;
  color: black;
  text-align: left;
  margin-right: 0%;
  margin-left: 0%;
  }

.Heading-2 {
  font-family: Verdana, Arial, Helvetica, Sans-Serif;
  font-size: 183.33%;
  color: black;
  font-weight: bold;
  text-align: left;
  margin-top: 0.55em;
  margin-bottom: 0.14em;
  margin-right: -0.18%;
  margin-left: 0%;
  }

.Heading-2 STRONG {font-weight: normal;}

.explanation {
  font-family: Georgia, Arial, Helvetica, Sans-Serif;
  font-size: 91.67%;
  color: black;
  text-align: left;
  margin-right: 0%;
  margin-left: 0%;
  }

.Heading-3 {
  font-family: Verdana, Arial, Helvetica, Sans-Serif;
  font-size: 100%;
  color: red;
  font-weight: bold;
  text-align: left;
  margin-top: 1em;
  margin-right: 0%;
  margin-left: 0%;
  }

.Heading-3 STRONG {font-weight: normal;}

.ingred-para {
  font-family: Georgia, Arial, Helvetica, Sans-Serif;
  font-size: 83.33%;
  color: black;
  font-style: italic;
  text-align: left;
  margin-right: 0%;
  margin-left: 0%;
  }

.ingred-para EM {font-style: normal;}

.ingred-char {
  font-family: Verdana, Arial, Helvetica, Sans-Serif;
  color: black;
  font-weight: bold;
  }

.price-para {
  font-family: Verdana, Arial, Helvetica, Sans-Serif;
  font-size: 83.33%;
  color: teal;
  font-weight: bold;
  text-align: left;
  margin-right: 0%;
  margin-left: 0%;
  }

.price-para STRONG {font-weight: normal;}

.price-char {
  color: maroon;
  }

.nutr-para {
  font-family: Verdana, Arial, Helvetica, Sans-Serif;
  font-size: 83.33%;
  color: black;
  text-align: left;
  margin-right: 0%;
  margin-left: 0%;
  }
 
 

Figure D: View of the HTML document in a CSS compliant browser (MSIE4)

Figure D

 
 

Figure E: View of the HTML document in a non-CSS compliant browser (Netscape Navigator 3)

Figure E

 

Example Two

In this example, the same Word document presented in Example One, above, is modified with the addition of a Table. An automatic Table of Contents is generated as well.

Figure F: Word document for Example 2

Figure F

 

Figure F shows the document as it appears in Normal View in MS Word. This document uses a table to create a strip of yellow down the left-hand edge. Also, a special TOC (Table of Contents) field code is visible.

The output HTML appears in a CSS-aware browser in Figure G, and in a graphical non-CSS aware browser in Figure H.

Viewing a document in a text-only browser is a good test of how structurally strong the HTML is. Figure I shows the text version of the document in the Lynx browser. Because the original Word document properly used Heading styles, the resulting HTML conveys this structure using HTML heading elements, and the document structure is clearly evident in the text-only Lynx browser.

Here the hypertext table of contents is created from Word’s TOC field. The Word TOC field code in this example (TOC \t "Heading 2,1,Heading 3,2") specifies that Headings 2 and 3 will be used to create TOC levels 1 and 2, respectively. The translator creates the hypertext TOC as a bulleted list (an option setting allows for non-bulleted TOCs).
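The two steps (reading the \t switch, then emitting a nested bulleted list) can be sketched in Python. This is an illustration under assumptions, not W2CSS’s code: paragraphs are (style, text) pairs, and the anchor names "#toc0", "#toc1", and so on are hypothetical (the headings would need matching anchors). The sketch also assumes the first TOC entry is at level 1 and that heading text needs no escaping.

```python
import re

def parse_toc_t_switch(field_code):
    """Parse a TOC field's \\t switch into {style name: TOC level}."""
    m = re.search(r'\\t\s+"([^"]*)"', field_code)
    if not m:
        return {}
    parts = [p.strip() for p in m.group(1).split(",")]
    # The switch lists pairs: style name, level, style name, level, ...
    return {parts[i]: int(parts[i + 1]) for i in range(0, len(parts) - 1, 2)}

def build_toc(paragraphs, levels):
    """Build a nested bulleted list of links from (style, text) paragraphs."""
    html, depth = [], 0
    for n, (style, text) in enumerate(paragraphs):
        level = levels.get(style)
        if level is None:
            continue  # this style is not part of the TOC
        while depth < level:   # open lists down to this entry's level
            html.append("<UL>")
            depth += 1
        while depth > level:   # close lists back up to this entry's level
            html.append("</UL>")
            depth -= 1
        # LI end tags are optional in HTML, so a nested UL stays inside this LI.
        html.append(f'<LI><A HREF="#toc{n}">{text}</A>')
    html.extend(["</UL>"] * depth)
    return "\n".join(html)

levels = parse_toc_t_switch(r'TOC \t "Heading 2,1,Heading 3,2"')
print(build_toc([("Heading 1", "Products"), ("Heading 2", "Biscotti"),
                 ("Heading 3", "Almond Biscotti"), ("Heading 2", "Butter Cookies")], levels))
```

Because Heading 1 is not listed in the \t switch, the "Products" heading is skipped, just as Word’s own TOC would skip it.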

Figure G: View of the HTML document for Example 2 in a CSS aware browser (MSIE4)

Figure G

 

Figure H: View of the Example 2 HTML document in a non-CSS aware browser (Netscape Navigator 3)

Figure H

 

Figure I: View of the HTML document in the DOS LYNX browser.

Figure I

 
 

How to install the translator

NOTE: The W2CSS translator requires you to have Word for Office 97 (also known as Version 8 of MS Word) installed on your computer. aCSShtmlTranslator, the translator program, is actually a Word macro written in Visual Basic for Word; older Word versions don’t contain this language and so cannot run the translator. The macro resides in the template W2CSS.dot. To use the macro, you must first install the template on your computer. To install the template:

  1. Place the file W2CSS.zip into a temporary directory on your computer and extract it. This file is a ZIP archive and requires WINZIP or a similar tool to extract it. After unzipping it, you will find the following files in the temporary directory:
    W2CSS.dot
    W2CSSfnt.csv
    W2CSS.cfg
    W2CSSdoc.doc
    W2CSSdoc.zip
  2. Locate the Templates directory for your Word installation. For typical users this can be found at:
    c:\Program Files\Microsoft Office\Templates
  3. Place the following 3 files into the Templates directory:
    W2CSS.dot
    W2CSSfnt.csv
    W2CSS.cfg
  4. If you use the installer, it copies all the necessary W2CSS files into your current Templates directory (for typical users, c:\Program Files\Microsoft Office\Templates). By changing its options, you can control which files will be copied.
  5. The documentation file, W2CSSdoc.zip, is a ZIP archive that contains the file W2CSSdoc.html in addition to a number of GIF files. To expand the archive file, place it where you wish the W2CSS documentation to reside and expand it using WINZIP; after that you will be able to browse the documentation in either a CSS or non-CSS browser.
 

Doing a translation

In order to use the translator, you must open it from within Word.

REGISTERED USERS can use the translator to convert any Word document.

UNREGISTERED USERS are limited to translating only the active Word document. For them, the following instructions (on attaching a document to the template, and on creating a document based on the template) are given.

Attaching the template to an existing document

One way to translate any existing document is to attach the template W2CSS.dot to that document. Here’s how:

  1. After opening the document you wish to convert, choose Templates and Add-Ins from the Tools menu.
  2. The first time you do this, you will need to click the Add button. There you should be able to locate the template W2CSS.dot that you placed in your Templates directory when you installed the translator.
  3. The template, once on the list of Add-Ins, is attached to the current document by checking the box next to its name in the list labeled “Checked items are currently loaded.”

This sounds a lot harder in writing than it actually is in practice.

Once the template is attached to the active document (the topmost open document is called the “active document”), the converter is called by choosing Tools (on the menu bar), then Macro, then “Macros...”. This brings up a list of all available macros. Choose “aCSS2htmlTranslator”, click Run, and you should see the opening dialog.

Creating a new document based on the Styles Converter template

Once you’ve installed the translator template, you can get to it when you create a new document. When you choose New from the File menu, Word presents you with an array of templates, among which you should find W2CSS.dot. Pick this and your document is linked to the styles converter.

If you don’t find W2CSS.dot listed as an available template, check under Tools, Options, File Locations and make sure that the location for User Templates points to where you installed the template.

Once the template is attached to the active document, the converter is called by choosing Tools (on the menu bar), then Macro, then “Macros...”. This brings up a list of all available macros. Choose “aCSShtmlTranslator”, click Run, and you should see the opening dialog.

 

Motivating Factors

My intention with Version 2 of W2CSS is to enhance the feature set of version 1 to include

 

These goals are built on the initial intentions of the translator, as stated in the Version 1 documentation, which are

 

This program started out as a fun project in using Visual Basic and is still a “work in progress.” I urge users to get in contact with me if they have suggestions or bugs to report.

Hopefully, this low-cost translator will allow more people to publish books, reports, articles and the like as HTML documents. The ideal of free access to all people is a goal some cherish regarding the Internet.

Unfortunately, I must ask for a donation of money ($) in order to allow you the full use of this program: the rent must be paid and the wolf kept from the door. If you are cash-strapped, or a non-profit institution helping low-income people (for instance), write me regarding your needs and a donation can possibly be arranged. For those not wanting to pay for registration, the translator still offers a number of useful features. Also, it is not time-limited, as much shareware these days is.

Originally a simple language, HTML today has been stretched to the limits by the introduction of physical formatting elements. CSS aims to address and correct a number of these limits of HTML.

VB can be hell

After over 6 months of intensive use of VB for Applications (VBA), I can tell you that it’s not always a “pretty story” under the hood. Weird things happen. If you are an experienced Word and/or Win95 user (and put up with the daily system crashes and lockups), I’m sure you know what I’m talking about: there are quirky things you run into almost daily with these systems, the kind of stuff that prompts such book titles as “Windows 95 Annoyances”. In other words, there are bugs. Windows in general is now getting so complex that, as some commentators have remarked, it may well implode on itself if Microsoft adds yet more complexity.

Maintaining equivalent print and HTML documents

Another major motivation in this project was the need to maintain equivalent print and HTML documents. This is actually a very thorny problem once you get into it. Hopefully, CSS, when it comes to full fruition, will resolve current problems in this area.

Buggy is as buggy does

If you think MS Office 97 is buggy for the external user, you’ve not seen buggy. From the inside, all kinds of things don’t behave as you’d expect. It’s a machine with a bunch of loose wires still hanging about on the inside.

Disclaimer

I am in no way linked to Microsoft Corporation or any of its subsidiaries or contractors; nor is this project related to Microsoft Corporation.

 

Program features

Version 2

See the first part of this document for a review of features new to Version 2. If you find bugs, odd anomalies, or would like to suggest improvements or changes, please contact me at the address listed at the end of this file.

Files included

W2CSS.dot

This Word document template file contains the translator macro “aCSShtmlTranslator”.

W2CSSfnt.csv

This plain text file is a comma delimited list of fonts used in the translation process (see Support for Font Substitution).
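The layout of W2CSSfnt.csv is not described in this section, so the following Python sketch assumes one row per Word font, listing the font followed by its substitutes. The fallback list is taken from the font-family values in Figure C; the constant and function names are hypothetical.

```python
import csv
import io

# Assumption: the fallback list seen in Figure C, used when a font has no entry.
DEFAULT_FALLBACK = ["Arial", "Helvetica", "Sans-Serif"]

def load_font_table(csv_text):
    """Read rows of 'WordFont, substitute1, substitute2, ...' (assumed layout)."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells:
            table[cells[0]] = cells
    return table

def font_family(word_font, table):
    """Build a CSS font-family declaration for a Word font name."""
    families = table.get(word_font, [word_font] + DEFAULT_FALLBACK)
    return "font-family: " + ", ".join(families) + ";"

table = load_font_table("Verdana, Arial, Helvetica, Sans-Serif\n")
print(font_family("Verdana", table))
print(font_family("Georgia", table))
```

Listing substitutes lets a browser that lacks the Word font fall back to a similar one, which is the point of font substitution in CSS.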

W2CSS.cfg

This plain text file contains configuration settings for the translator (see section on the configuration file).

W2CSSdoc.zip

This contains documentation for the translator in CSS compliant HTML. The documentation file, W2CSSdoc.zip, is a ZIP archive that contains the file W2CSSdoc.html in addition to 10 GIF files. To expand the archive file, place it where you wish the W2CSS documentation to reside and expand it using WINZIP or PKUNZIP; after that you will be able to browse the documentation in either a CSS or non-CSS browser.

W2CSSdoc.doc

This is the Word file from which the file W2CSSdoc.html was created. It is included as an example of how the translator actually works.

Other documentation files may be included in the package you download.

 

The translator interface

Opening dialog

The opening dialog appears as shown below.

Figure J: The opening dialog to W2CSS, the translator program

Figure J

 

NOTE: Registered Users will have access to all the features discussed here. Unregistered Users will only be able to translate the active document and will not be able to perform batch processing.

“Word Document to Translate” textbox:
The full pathname of the source Word document that is to be translated. If you don’t know the source name, you can use the Browse button to find a file.

“Active Document” checkbox:
When checked, the translation defaults to the Word document that was active when the translator was called.

“CSS Stylesheet” textbox:
If you wish to create a linked stylesheet, this textbox should contain the full pathname locating where the stylesheet will be created. By default, it will be in the same folder as the HTML output file. The browse button next to this box allows you to shop for a location.

“Linked Stylesheet” checkbox:
If unchecked then no separate stylesheet will be created.

“HTML Output Name and Location” textbox:
The full pathname of the destination HTML document that is to be created. By default, this is located in the same folder as the source Word document. The browse button next to this box allows you to shop for a location.

“HTML Title” textbox:
This is the title of the HTML document to be created. If you haven’t given your Word document a title (under File Menu, Properties, Title), you may need to fill one in here.

The HTML document title is culled from the Word document title, and can be changed in this dialog. (see section document title)

 

Figure K: The second tab of the opening dialog, which allows for batch processing.

Figure K

 

The second tab of the opening dialog allows you to process batches of documents. A batch can be specified in two different ways: 1) as a list of document names stored in a file, or 2) as a list of document names collected together in the dialog.

“Translate the documents listed in this File” textbox:
The filename specified in this box is expected to contain a list of Word document names. The translator will open this file and process each name listed. See ___ for details of the conventions for this batch file.

“Translate the documents listed below” textbox:
In this part of the dialog, you can add Word document names to a list. The browse button allows you to shop for files. Or you can type in a name and just click the Add button.

If you choose the Translate button when this tab is showing, the program will act either on the list you’ve typed in or on the file that contains a list of filenames. Which one is processed depends on the option button you have selected.

The options dialog

NOTE: Registered Users will have access to all the features discussed here. Unregistered Users can only set translation options by editing the file w2css.cfg.

The options dialog allows you to configure how the translator will convert the Word document into HTML.

The various options are grouped into sub-categories, as shown below. For a complete discussion of each translation option, see the further sections of this document.

 

Figure L: The General Tab of the Options Dialog

Figure L

 
 

Figure M: The Characters Tab of the Options Dialog

Figure M

 
 

Figure N: The Measures and Fonts Tab of the Options Dialog

Figure N

 
 

Figure O: The Tables Tab of the Options Dialog

Figure O

 
 

Figure P: The TOC and Captions Tab of the Options Dialog

Figure P

 
 

Registration dialog

The registration dialog appears as shown in Figure Q.

This dialog gives information about registering and allows the user to input registration information. Once registered, the initial appearance of the About box goes away and all references to registration disappear.

You are kindly urged to register. Sending $15 by check or money order (no cash or credit cards) with a SASE (self-addressed stamped envelope) will get you a code that removes the “nag” dialog and other references to registering the program. You will also have access to the full features of the translator.

If you register, you won’t have to pay for upgraded versions of this program. In other words, your key will work on versions beyond Version 2.

Figure Q: The registration dialog

Figure Q

Program speed

The W2CSS program is not necessarily fast. This is primarily because it runs as interpreted Visual Basic (VB) code within the Office 97 environment. VB also lacks many of the facilities that a language such as C provides for condensing statements and increasing CPU efficiency. It is hoped that future versions will be better optimized for speed. The program was tested on both 120 MHz and 200 MHz Pentium machines and, although slower on the first, was still considered acceptable on both.

 

Specifics of HTML translation

The translator program (referred to as W2CSS throughout this text) endeavors to preserve as much of a Word document’s style information as possible for two intended audiences: users with CSS compliant browsers and those without. To accomplish this, the translator maps Word objects into HTML elements, which isn’t, on the surface, a difficult problem. The specifics of which Word elements are mapped into which HTML elements are discussed below.

In addition to mapping Word objects into HTML, the translator creates CSS class definitions that are equivalent to Word style definitions. The translator also tags HTML elements with CSS classes. This means that all HTML elements output by the translator will be tagged with some class (the exception is LI elements, which get their formatting from the enclosing UL or OL element).
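The examples in this manual (“Heading 1” becomes .Heading-1, “Body Text” becomes .Body-Text, “Preformatted ONE” becomes .Preformatted-ONE) suggest that class names are formed by replacing spaces with hyphens. A minimal Python sketch of that rule follows; the function name is hypothetical, and the step that drops other unsafe characters is an assumption, not documented W2CSS behavior.

```python
import re

def css_class_name(style_name):
    """Derive a CSS class name from a Word style name."""
    # Collapse runs of whitespace into one hyphen: "Heading 1" -> "Heading-1".
    name = re.sub(r"\s+", "-", style_name.strip())
    # Assumption: drop any other character that is not safe in a CSS class name.
    return re.sub(r"[^A-Za-z0-9_-]", "", name)

print(css_class_name("Preformatted ONE"))   # Preformatted-ONE
```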

Special “reserved stylenames” are used for controlling the behavior of the translator. They’re also used to add HTML and/or to affect the output CSS style definitions. These features are provided for those who are not faint of heart and wish to try special effects.

 

Headings

Word’s built-in heading styles

Word has 9 built-in styles named “Heading 1”, “Heading 2”, and so on, up through “Heading 9”. The first 6 of these correspond almost exactly to HTML’s H1 through H6, and the translator treats them as such. Word’s level 7, 8, and 9 headings are all folded into H6.

Table 1: Word Heading Styles mapped to HTML and CSS

Word Style    HTML Element    Attached CSS Class
Heading 1     H1              .Heading-1
Heading 2     H2              .Heading-2
Heading 3     H3              .Heading-3
Heading 4     H4              .Heading-4
Heading 5     H5              .Heading-5
Heading 6     H6              .Heading-6
Heading 7     H6              .Heading-7
Heading 8     H6              .Heading-8
Heading 9     H6              .Heading-9
 

Styles named H1 through H6

Word styles named H1, H2, H3, H4, H5, and H6 will be translated into corresponding HTML elements as follows:

Table 2: Word Styles H1 to H6 mapped to HTML and CSS

Word Style    HTML Element    Attached CSS Class
H1            H1              .H1
H2            H2              .H2
H3            H3              .H3
H4            H4              .H4
H5            H5              .H5
H6            H6              .H6
 

Preformatted text

The HTML PRE element allows text to appear on a web page as typed, meaning that carriage returns and spaces are preserved by the browser. Most browsers display PRE using a monospaced typeface such as Courier.

W2CSS will translate a paragraph into an HTML PRE element if the paragraph is tagged with a style named “PRE” or “Preformatted”, or with a style whose name is “Preformatted” followed by any further text you wish. For instance, a Word style named “Preformatted ONE” will become an HTML <PRE> element attached to the class “Preformatted-ONE”.

The class definition attached to PRE elements has the white-space property set to “PRE”.

When tagging a series of lines as PRE, the best results are obtained by using Word’s manual line break (Shift+Enter) instead of pressing Enter at the end of each line. (In Word, the Enter key creates a new paragraph mark, which will be translated into a new <P> element in the output HTML.)
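A Python sketch of the PRE rules above. It assumes exact, case-sensitive style names and that “more text” follows a space (as in “Preformatted ONE”). It also relies on Word representing a manual line break as character 11 (vertical tab) in a paragraph’s text. The function names are hypothetical.

```python
from html import escape

def is_preformatted_style(style_name):
    """True for "PRE", "Preformatted", or "Preformatted " followed by more text."""
    return (style_name in ("PRE", "Preformatted")
            or style_name.startswith("Preformatted "))

def render_pre(style_name, text):
    """Render a preformatted paragraph; manual line breaks become real newlines."""
    css_class = "-".join(style_name.split())
    # Word stores a manual line break (Shift+Enter) as character 11 (vertical tab).
    body = escape(text.replace("\x0b", "\n"), quote=False)
    return f'<PRE class="{css_class}">{body}</PRE>'

print(render_pre("Preformatted ONE", "line 1\x0bline 2"))
```

Since all the lines stay inside one paragraph, they end up in a single PRE element instead of one PRE (or P) per line.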

Tabs

Tabs are generally not recognized in HTML except when they occur within a PRE element (although no exact tab stops can be specified).

If an element is of PRE type, tabs are passed through into the HTML output. If the style is not a PRE style, tabs are interpreted according to the setting nonPreTabs.

If nonPreTabs is set to “asSpaces” then tabs will become spaces, otherwise, they’ll be passed through into the HTML.
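The tab rules can be sketched in Python. The setting value “asSpaces” comes from this manual. The "passThrough" value in the usage and the `spaces_per_tab` parameter are hypothetical; the manual does not say how many spaces a tab becomes.

```python
def translate_tabs(text, is_pre, non_pre_tabs="asSpaces", spaces_per_tab=1):
    """Apply the PRE / nonPreTabs tab rules to a paragraph's text."""
    if is_pre:
        return text  # PRE elements keep their tabs unchanged
    if non_pre_tabs == "asSpaces":
        # Assumption: each tab becomes a fixed number of spaces.
        return text.replace("\t", " " * spaces_per_tab)
    return text  # any other setting passes tabs through into the HTML

print(translate_tabs("Item\tPrice", is_pre=False))   # Item Price
```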

The ADDRESS element

In keeping with the desire to create semantically sensible HTML, any Word paragraph tagged with a style named “ADDRESS” will be translated into an HTML <ADDRESS> element. The address element usually appears at the end of an HTML document and can assist Web robots in making better sense of a Web page.

The BLOCKQUOTE element

Any Word paragraph tagged with a style named “Blockquote” will be translated into an HTML <BLOCKQUOTE> element.

Lists

HTML provides two basic list types: ordered lists (tag OL) and unordered lists (tag UL). These are the most popular, although some other tags also act like lists. Word provides two corresponding classes of lists, Numbered Lists and Bulleted Lists, which map to OL and UL respectively.

The translator recognizes lists by reading the name of the style tagged to a Word paragraph.

Word’s built-in list styles

Word provides a number of built-in list styles. These include the following style names:

List

List Number

List Bullet

List 2

List Number 2

List Bullet 2

List 3

List Number 3

List Bullet 3

List 4

List Number 4

List Bullet 4

List 5

List Number 5

List Bullet 5

 

W2CSS translates the built-in styles according to the words used in the style name. Styles named List Number become HTML ordered lists (OL). Styles named List Bullet become HTML unordered lists (UL). Styles List, List 2, List 3, etc., also become HTML unordered lists (UL). The CSS list type (such as decimal, lower-roman, upper-roman, lower-alpha, upper-alpha) is determined by reading the Word style definition.
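The name-based list mapping can be sketched in Python as below. The helper is hypothetical. Taking the nesting level from the trailing number in the style name is an assumption made for illustration. Reading the CSS list type from the style definition is not shown.

```python
from typing import Optional, Tuple


def list_info(style_name: str) -> Optional[Tuple[str, int]]:
    """Return (HTML list element, nesting level) for a Word list style, else None.

    "List Number ..." -> OL; "List Bullet ..." and "List", "List 2", ... -> UL.
    Taking the level from a trailing number is an illustrative assumption.
    """
    name = " ".join(style_name.upper().split())  # normalize case and spacing
    if name.startswith("LIST NUMBER"):
        element = "OL"
    elif name.startswith("LIST BULLET") or name == "LIST" or name.startswith("LIST "):
        element = "UL"
    else:
        return None
    last_word = name.split()[-1]
    level = int(last_word) if last_word.isdigit() else 1
    return element, level


print(list_info("List Number 3"))  # ('OL', 3)
```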

List Indents

Because the browsers tested compound the indents of nested lists (technically, a nested list is an element “within” another, parent element), the translator suppresses the margin setting for the second and deeper levels of nested lists.

Continued list numbering

Word permits ordered lists to continue numbering, even when interrupted by non-list type paragraphs. This is not an option in HTML.

Paragraphs

The HTML paragraph element (<P>) is used for all document elements that don’t qualify as any of the other HTML elements previously discussed (headings, preformatted text, addresses, blockquotes, or lists).

Empty Paragraphs

The option nonBreakingSpaces controls how empty paragraphs (Word paragraphs that contain no text) are translated. By default, empty paragraphs are expressed as PRE elements, since this works best in all browsers tested, including Lynx.

User defined stylenames

Word users are free to create their own styles and name them as they wish. However, Word paragraphs will map into HTML elements depending on the name of the style attached to a paragraph. This should be clear with reference to the built-in styles, discussed above.

In addition, the translator lets users control how a paragraph maps into HTML by allowing style names that begin with one of the built-in names. For instance, a style named “Heading 1 Mine” will map into an <H1> element linked to class Heading-1-Mine. The general principle is most succinctly explained by the VB code that recognizes the HTML element type:

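      ' This fragment runs inside a With block (presumably on the document
      ' being translated), so ".Styles" refers to that object's styles.
      ' wdHeadingStyles(1..9) presumably holds the constants for Word's
      ' built-in heading styles; NameLocal returns their localized names.
      For i = 1 To 9
         aName = UCase(.Styles(wdHeadingStyles(i)).NameLocal)
         ' Case-insensitive match anchored at the left end of the style name
         If Left(UCase(aRangeStyle), Len(aName)) = aName Then
            setAsHeading i, Trim(Str(i))
            setAsWordHeading
            Exit Function   ' first matching heading level wins
         End If
      Next i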
 

This basically says that the stylename attached to a Word paragraph will be searched to see if it contains a built-in stylename. The search requires a match from the left end (start of the string) and is case insensitive.
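For readers more at home outside VBA, here is a rough Python equivalent of the loop above. It hard-codes the English names “Heading 1” through “Heading 9”, whereas the VB code reads localized names from Word. The function name is hypothetical.

```python
from typing import Optional

# English names of Word's built-in heading styles. The VB code reads the
# localized names from Word (NameLocal); hard-coding them is a simplification.
BUILT_IN_HEADINGS = [f"Heading {i}" for i in range(1, 10)]


def heading_level(style_name: str) -> Optional[int]:
    """Return the level of the first built-in heading name that begins
    style_name (compared case-insensitively), or None if none does."""
    upper_style = style_name.upper()
    for level, built_in in enumerate(BUILT_IN_HEADINGS, start=1):
        a_name = built_in.upper()
        if upper_style[:len(a_name)] == a_name:  # like Left(UCase(...), Len(aName))
            return level
    return None


print(heading_level("Heading 1 Mine"))  # 1
```

As in the VB loop, the first match wins. A user style named “Heading 10” would therefore be treated as a level-1 heading.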


Internationalization Issues with built-in styles

Version 2 of the translator checks style names against VBA’s constants for Word’s built-in styles, reading each style’s localized name through NameLocal, as in the code above. Because these names change from language to language, this method should facilitate use of W2CSS in languages other than English.

Tables

REGISTERED USERS will be able to translate Word tables into HTML tables, subject to the restrictions discussed below. (For NON-REGISTERED USERS, the translator ignores tables and translates the contents of cells as paragraphs.)

Kinds of Word Tables

 

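W2CSS distinguishes three kinds of Word tables: regular and uniform tables; regular tables with horizontally merged cells; and non-regular tables with vertically merged cells. Figure R shows examples of these.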

W2CSS handles the first two kinds of tables well, but not the third. It can’t handle non-regular tables because Visual Basic reports tables with vertically merged cells inaccurately. (Programmer’s note: MAXINT was reported for the cell heights of vertically merged cells. Emails to Microsoft and to user groups yielded no help.)

See Translation Options for Tables for tabular presentation of table specific options.

Regular and Uniform Tables

These tables are the most common and simplest type of table. They contain no cells merged across columns or rows (in HTML parlance, no spanning of cells). Each column contains the same number of cells as every other column, and each row contains the same number of cells as every other row.

Regular but with horizontally merged cells


These tables can have variable numbers of cells in each row, resulting in column spanning.

Non-Regular, with vertically merged cells


These tables, as mentioned, are not properly handled by the translator.

Figure R: Various Types of Tables


Increased translation time with tables

Processing tables takes longer, because many more VBA calls are made and each call slows down the translation.

Indents and Alignment

The translator wraps indented tables in a special <DIV> element and centered tables in <CENTER> tags, in keeping with standard HTML practice.

Details

Most commonly, tables are located at the left, center or right of the browser window; by default, to the left. Word allows you to specify left, center or right alignment (under Tables, Cell Heights & Widths, Row, Alignment). The translator first checks whether the table is center or right aligned. If so, it outputs a corresponding align attribute. If the table is left-aligned with zero indent, no align attribute is output. If it is left-aligned but indented, a DIV element surrounds the table, with a style attribute set to indent the table from the left edge of the window.

For centered tables, the <CENTER> element is placed around the <TABLE> tags. This provides backward compatibility with Netscape’s way of handling centered tables (tests verified this).
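The alignment rules above can be summarized in a small Python sketch. Several details are assumptions, and the attribute values W2CSS actually writes may differ: the function name, the string encoding of Word’s alignment, and the use of points for the DIV margin.

```python
from typing import Tuple


def table_wrappers(alignment: str, left_indent_pt: float = 0) -> Tuple[str, str, str]:
    """Return (markup before TABLE, attributes for TABLE, markup after TABLE).

    alignment is "left", "center" or "right"; left_indent_pt is the table's
    left indent in points (both encodings are assumptions of this sketch).
    """
    if alignment == "center":
        # align attribute plus CENTER tags for older Netscape browsers
        return "<CENTER>", ' align="center"', "</CENTER>"
    if alignment == "right":
        return "", ' align="right"', ""
    if left_indent_pt:
        # left-aligned but indented: wrap the table in an indented DIV
        return f'<DIV style="margin-left: {left_indent_pt:g}pt">', "", "</DIV>"
    return "", "", ""  # left-aligned, zero indent: no extra markup


before, attrs, after = table_wrappers("left", 36)
print(f"{before}<TABLE{attrs}> ... </TABLE>{after}")
```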

Deriving the Borders and inside borders of HTML Tables

The outside border of the HTML table derives from the VB properties Borders.OutsideLineStyle and Borders.OutsideLineWidth. If the line style is not NONE, the border width is set from OutsideLineWidth.

The inside borders derive from the VB properties Borders.InsideLineStyle and Borders.InsideLineWidth. However, for HTML tables, if there is no outside border, then there is no inside one either.
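A Python sketch of the border rule follows. It assumes the line widths have already been converted to HTML border widths in pixels. It uses 0 for “no line”, which is the value of Word’s wdLineStyleNone constant. The function name is hypothetical.

```python
from typing import Tuple

LINE_STYLE_NONE = 0  # value of Word's wdLineStyleNone constant


def table_borders(outside_style: int, outside_width: int,
                  inside_style: int, inside_width: int) -> Tuple[int, int]:
    """Return (outside, inside) border widths for the HTML table.

    Widths are assumed to be already converted to pixels.
    """
    outside = outside_width if outside_style != LINE_STYLE_NONE else 0
    # No outside border means no inside border either.
    inside = inside_width if outside and inside_style != LINE_STYLE_NONE else 0
    return outside, inside
```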

Breaks at the end of cells

Each cell is treated as consisting of one or more paragraphs. The last paragraph of a cell always closes with a <BR> element. This gives better results not only in current (version 4) browsers but also in Lynx.

Column properties

Column widths can be expressed either as a percentage or as a number of pixels.

Rows marked in Word as heading rows (by selecting them and picking Headings from the Table menu) are output with TH cells instead of TD cells. Many browsers now recognize these tags.

Row properties

Row height

Although not all browsers support it, the row HEIGHT attribute is set in pixels (this is where the points-per-pixel configuration setting comes into play). Since this attribute is non-standard, you can turn it on or off for a translation (see translation option tableRowHgtAttribute).
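As a rough illustration, here is how a Word row height in points might be converted to a pixel HEIGHT attribute in Python. The parameter names stand in for the points-per-pixel setting and the tableRowHgtAttribute option. Rounding to whole pixels is an assumption.

```python
def row_height_attr(height_pt: float, points_per_pixel: float,
                    enabled: bool = True) -> str:
    """Build the non-standard HEIGHT attribute for a <TR>, or "" when disabled.

    enabled stands in for tableRowHgtAttribute; rounding is an assumption.
    """
    if not enabled:
        return ""
    return f' HEIGHT="{round(height_pt / points_per_pixel)}"'
```

With 0.75 points per pixel (that is, 96 pixels per inch), a 15-point row becomes HEIGHT="20".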

Table Captions

The translator will create Table Captions using the caption that occurs immediately before the table. Specifically, this is a paragraph in Word’s built-in Caption style. This feature can be turned off with translation option makeTableCaptions.

Images

Linked And Embedded Graphics — OLE Technology

Version 2 of the translator handles images better. By default, the translator recognizes only linked GIF and JPEG graphics. You can enable translation of linked and embedded objects under the General tab of the Options dialog, via the check box “Process Linked & Embedded Objects”.

Word allows the user to place images in documents in a number of ways, including inline and floating (above the text layer). Images can also be “embedded” from other applications. For instance, you can embed an Adobe Illustrator image or a Visio image directly into your Word document, without the need to save these objects as separate documents. You also have the option of “linking” graphics into your Word document. Both linking and embedding are part of Microsoft’s OLE (Object Linking & Embedding) technology. Although W2CSS doesn’t itself contain code that directly translates images, it calls upon Word’s own built-in capabilities to do so.

Warning — Increased translation time

Translating linked and embedded objects will slow down your translation markedly, because a good deal of extra work is done behind the scenes. For the curious, the following describes what happens when you choose to translate embedded objects:

  1. The file is saved to a temporary name.
  2. All floating shape objects in this file are changed to inline, and all frames are removed.
  3. The file is then saved using Word’s own “save as HTML”, which translates all embedded and linked graphic objects into GIF equivalents. Once these are created, W2CSS points the HTML at these GIFs.
  4. The file, whose graphics are now all inline, is re-opened and translated. This way, previously floating graphics can be located inline.
  5. After translation, the original file is re-opened if it was open before translation.
 

OLE generally works fairly well. However, you may run into trouble. I have experienced crashes and lockups, although it’s hard to put my finger on exactly what causes them. My observation is that OLE really slows down the whole computer, so you are forewarned!

 

For non-registered users, only inline GIFs and JPEGs will be recognized.

The translator recognizes images that meet the following criteria

Pathnames

If the image IS NOT in the same directory as the Word file, the translator will add the pathname to the image tag.

If the image IS in the same directory as the Word file, the name will be added with no path. It is strongly recommended that you put all images in the same directory as the source Word file.
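The pathname rule can be sketched in Python using the standard ntpath module, which applies Windows path rules on any platform. The function name is hypothetical. Comparing directory names case-insensitively is an assumption (Windows file names are case-insensitive).

```python
import ntpath


def img_src(doc_path: str, image_path: str) -> str:
    """Return the SRC value for an image referenced from a Word document."""
    doc_dir = ntpath.normcase(ntpath.dirname(doc_path))
    img_dir = ntpath.normcase(ntpath.dirname(image_path))
    if doc_dir == img_dir:
        return ntpath.basename(image_path)  # same directory: no path
    return image_path  # different directory: keep the full pathname


print(img_src(r"c:\docs\guide.doc", r"c:\docs\sty-t1P.gif"))  # sty-t1P.gif
```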

ALT text for images

It’s generally agreed that decent HTML contains IMG references that include alternate (ALT) text. This not only makes a site accessible to people with disabilities, but also provides clarity for web surfers who have graphics turned off.

In W2CSS, ALT text for images can be specified via three different methods:

1. ALT text using style htmlcode with directive #ALT=””

You can specify alt text by including a paragraph in style htmlcode immediately before the image. The following example should clarify:

Example:

The alt text “P Mark” will be applied to the inline image shown highlighted with handles (see below). The resulting HTML is shown below.

 

Figure R

 
<H3 class="Heading-3">Paragraphs in Word</H3>

<P class="Body-Text">The odd backward P symbol 
<IMG SRC="sty-t1P.gif" ALT="P Mark"> known as the "paragraph mark", 
usually is simply <EM>tolerated </EM>by most Word users. ...</P>
 

2. ALT text using a private field

You can specify ALT text by creating a private field in the Word document. Follow the pattern shown in the example below.

This method follows Word’s own convention. If you open an existing HTML document and save it as a Word document, you’ll see that ALT text has been turned into private fields.

Example:

{PRIVATE "TYPE=PICT; ALT=P Mark"}
 

To create a private field, go to the Insert menu, pick Field and then, within All Categories, pick PRIVATE. Before clicking OK, type the field text as in the example above: TYPE=PICT; followed by ALT=alt text, all enclosed in double quotes, where alt text is the alternate text for the image in question. This field must appear in the Word document before the actual inline image.
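The following small Python sketch shows how the ALT text can be recovered from such a field code. W2CSS itself reads fields through Word’s object model, so this string-parsing version and its function name are illustrative only. It also assumes the ALT text contains no semicolon.

```python
import re
from typing import Optional


def alt_from_private_field(field_code: str) -> Optional[str]:
    """Return the ALT text from a field code such as
    {PRIVATE "TYPE=PICT; ALT=P Mark"}, or None if there is none."""
    match = re.search(r'PRIVATE\s+"([^"]*)"', field_code, re.IGNORECASE)
    if match is None:
        return None
    # The quoted text is a list of KEY=value entries separated by semicolons.
    for entry in match.group(1).split(";"):
        key, sep, value = entry.partition("=")
        if sep and key.strip().upper() == "ALT":
            return value.strip()
    return None
```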

3. ALT text using the caption style

You can specify alt text by creating a Word caption (tagged with the style name “Caption”) and placing it somewhere ahead of the image whose ALT text it carries. Several caption settings control how captions are used. If you use captions in your Word document solely to create image ALT text, you can format the Caption style as hidden text, and it shouldn’t affect your document’s printed output.

How to insert linked inline GIFs and JPEGs

To insert a GIF or JPEG image inline

The instructions below detail inserting an image using standard Word menu commands. The translator provides an extra macro that, when called, guarantees inserting only GIFs and JPEGs as inline. (See below).

 

Special insert inline image macro

To make insertion of images easier, the translator includes a special “inline image insertion command”. This command is available via the macro insertInlinePic that comes with the template W2CSS.dot. When you call insertInlinePic, you will get Word’s Insert File dialog. The settings for “Link to File”, “Float over text” and “Save with file” may appear on or off; unfortunately this is misleading. At this writing, I’ve had a hard time controlling these settings from VB, and the problem seems to lie with VB itself. Nonetheless, whatever image you pick will be inserted as linked to the file, not floated over the text, and not saved in the file. This is guaranteed (even though the dialog suggests otherwise), because after you make a selection in the dialog, the information is routed through VB code that controls how the image is inserted into the Word document. In addition, this command refuses to insert any image whose extension is not GIF, JPG or JPEG.

 

Character Styles

Word provides what are called character level styles. These are styles tagged to strings of characters as opposed to being attached to paragraphs.

CHARACTER STYLE TRANSLATION IS AVAILABLE ONLY TO REGISTERED USERS: The translator will recognize character level Word styles and create equivalent CSS classes for them. These classes are attached to selected strings in the HTML using the HTML <SPAN> element. You can control whether character styles are translated via the Options dialog.

If you are using a non-registered version, the translator disregards all of Word’s character level styles.

Font size for Character styles

The percentage font size of a character style is based on the size of its parent; that is, the % value is a percentage of the parent’s font size. Also, for character styles, the font size can only be expressed either as a percentage or in points. The “absolute” scale doesn’t make much sense here, since the translator bases the size on the size of the parent.

Character styles only apply on a word by word basis. (see below for more discussion)

 

Bold text, italic text, and hyperlinks can be considered special cases of character-level formatting, discussed below. These character-level formatting types are recognized whether or not you are using a REGISTERED copy of the translator.

Other Character Formatting (not character styles)

Word-by-word basis

The translator works through paragraphs on a word-by-word basis, not a character-by-character basis. This limitation means that bold, italic, hypertext elements, and character styles will only be recognized on word boundaries. The main reason for this limitation is speed. The alternative, of processing the Word document on a character-by-character basis would be intolerably slow, given the already sluggish performance of VB.

(As to what exactly a “word” is, the translator defers to whatever VBA’s internal structures define “a word” to be).

Turning off word-by-word processing

The translation option paragraphsOnly controls how the translator works its way through your document. By default, W2CSS moves through each paragraph on a word-by-word basis. If you don’t care about processing character information (character styles, EM, STRONG, hyperlinks, or inline images), you can set paragraphsOnly to true or yes. See the configuration file settings for the specifics of paragraphsOnly.

EM and STRONG

In Word, the equivalents to HTML’s EMPHASIS and STRONG are (as usually interpreted) italic and bold (respectively). W2CSS maps all strings in bold into the HTML <STRONG> element, and all those in italic into the HTML <EM> element.

Reversing EM when a whole paragraph is emphasized

In cases where a style definition specifies all italic text (that is, the class definition states font-style as italic) the translator creates an additional CSS class for EM that reverses the otherwise italic text. This is in keeping with the way Word treats italic — as a toggle that allows for emphasized text in an already italic paragraph to be non-italicized.

W2CSS accommodates the same reversal for STRONG, or bold text, within a paragraph tagged with a style specifying all bold text (for which font-weight is bold).
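One way to express this reversal in CSS is sketched below in Python. Note one difference: this sketch emits descendant selectors such as “.Quote EM”, whereas W2CSS creates an additional class. The class name “Quote” and the function name are hypothetical.

```python
from typing import Dict, List


def reversal_rules(class_name: str, properties: Dict[str, str]) -> List[str]:
    """Return extra CSS rules that let EM/STRONG toggle off inside a class
    whose definition is already all italic or all bold."""
    rules = []
    if properties.get("font-style") == "italic":
        rules.append(f".{class_name} EM {{font-style: normal}}")
    if properties.get("font-weight") == "bold":
        rules.append(f".{class_name} STRONG {{font-weight: normal}}")
    return rules


print(reversal_rules("Quote", {"font-style": "italic"}))
```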

Hyperlinks

Hyperlinks are standard fare on Web pages but were only recently introduced into Word. Word hyperlinks translate into the HTML anchor (<A>) element. Word lets you create hyperlinks both to Web addresses (via a specified URL) and to bookmarks within a given document.

Translation option makeHyperlinkStyle controls whether a SPAN element will be created for hyperlinks.

Creating a hyperlink

To create a hyperlink:

  1. Highlight the text that is to appear to the user as the link
  2. Pick Insert from the menu bar, then Hyperlink
  3. Then enter either a URL name or an already defined bookmark. (Bookmarks allow you to jump to a position within a file).

Using images as Hyperlinks

To use an image as a hypertext link:

  1. Insert the inline image as per the instructions above.
  2. After the image is inserted, highlight it and pick Insert, Hyperlink. Then proceed as for normal insertion of a hyperlink (see above)

HTML document title

The content of the HTML <TITLE> element is taken from the Title property of the Word document. You can set this by picking File (on the menu bar), then Properties, then the Summary tab, where you will find Title. More advanced users can take advantage of the configuration setting generateHTMLtitle used in conjunction with the reserved stylename htmlHead.

Special HTML characters

This area of the translator is significantly improved over version 1.

Ian Graham’s HTML Sourcebook is an excellent source for a concise discussion of this issue.

HTML documents are ASCII text file descriptions that, when interpreted, become fully formatted quasi “desktop published” pages. To communicate various “special characters”, a conventional set of “entity references” has been established. These entity references stand in for otherwise non-standard characters such as long dash, curly quotes, copyright, registration mark, etc.

The entity references supported by the translator are those beginning at ASCII 160 and ending with ASCII 255.

Table 3: ISO Latin-1 HTML entity references supported by W2CSS

ASCII Number   Entity Ref.   Comment
160            &nbsp;        the way nbsp is treated depends on the setting of the ___ option
161            &iexcl;
162            &cent;
163            &pound;
164            &curren;
165            &yen;
166            &brvbar;
167            &sect;
168            &uml;
169            &copy;        see option ___ for more details
170            &ordf;
171            &laquo;
172            &not;
173            &shy;
174            &reg;         see option ___ for more details
175            &hibar;
176            &deg;
177            &plusmn;      see option ___ for more details
178            &sup2;        see option ___ for more details
179            &sup3;        see option ___ for more details
180            &acute;
181            &micro;
182            &para;
183            &middot;
184            &cedil;
185            &sup1;
186            &ordm;
187            &raquo;
188            &frac14;
189            &frac12;
190            &frac34;
191            &iquest;
192            &Agrave;
193            &Aacute;
194            &Acirc;
195            &Atilde;
196            &Auml;
197            &Aring;
198            &AElig;
199            &Ccedil;
200            &Egrave;
201            &Eacute;
202            &Ecirc;
203            &Euml;
204            &Igrave;
205            &Iacute;
206            &Icirc;
207            &Iuml;
208            &ETH;
209            &Ntilde;
210            &Ograve;
211            &Oacute;
212            &Ocirc;
213            &Otilde;
214            &Ouml;
215            &times;       see option ___ for more details
216            &Oslash;
217            &Ugrave;
218            &Uacute;
219            &Ucirc;
220            &Uuml;
221            &Yacute;
222            &THORN;
223            &szlig;
224            &agrave;
225            &aacute;
226            &acirc;
227            &atilde;
228            &auml;
229            &aring;
230            &aelig;
231            &ccedil;
232            &egrave;
233            &eacute;
234            &ecirc;
235            &euml;
236            &igrave;
237            &iacute;
238            &icirc;
239            &iuml;
240            &eth;
241            &ntilde;
242            &ograve;
243            &oacute;
244            &ocirc;
245            &otilde;
246            &ouml;
247            &divide;      see option ___ for more details
248            &oslash;
249            &ugrave;
250            &uacute;
251            &ucirc;
252            &uuml;
253            &yacute;
254            &thorn;
255            &yuml;

 

For ASCII codes between 128 and 159 (inclusive), the translator generates a numeric character reference. So, for instance, Word’s long dash, which is ASCII character 151, will be output as “&#151;”. (HTML 4.0 defines the entity &mdash; for the long dash, but it may not yet be widely supported.)

Special Fixes for characters

Even with the entity reference convention in place, support varies. For instance, the registered mark, ASCII 174, which is represented by entity “&reg;”, will not appear as a registered mark in the Lynx browser. Similarly, Lynx doesn’t understand “&nbsp;”.

The translator addresses a number of these problems by offering translation options that create plain ASCII equivalents to a number of commonly found word processing characters. So, for instance, Word users who regularly place “smart quotes” in their documents (also called “curly quotes”) can specify that the translator substitute the plain straight double quote (the one that looks like an inch mark) for curly quotes.

A complete set of these options is listed below. Registered users can set these via the Options dialog.

Table 4: Provision for translating special characters

Character name           Special character  ASCII #   Plain ASCII Substitution  Translation option
Ellipsis                                    133       ... [3 periods]           convertEllipsis
Curly single quotes      ‘ ’                145, 146  ' '                       convertSmartQuotes
Curly double quotes      “ ”                147, 148  " "                       convertSmartQuotes
En dash                                     150       - [one hyphen]            convertENdash
Em dash                                     151       -- [2 hyphens]            convertEMdash
Trademark                                   153       (tm)                      convertSpecialMarks
Copyright mark           ©                  169       (c)                       convertSpecialMarks
Registered mark          ®                  174       (r)                       convertSpecialMarks
One-quarter fraction     ¼                  188       1/4                       convertMathSymbols
One-half fraction        ½                  189       1/2                       convertMathSymbols
Three-quarters fraction  ¾                  190       3/4                       convertMathSymbols
Times symbol             ×                  215       x [lowercase x]           convertMathSymbols
Divide symbol            ÷                  247       / [forward slash]         convertMathSymbols

 

In short, many of the above options undo Word’s “Smart formatting”.
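The substitutions in Table 4 can be modeled as a lookup keyed by translation option. This Python sketch is illustrative, not W2CSS’s code. It decodes the Windows character numbers from the table with the cp1252 codec and applies only the options passed in.

```python
from typing import Iterable

# Plain-ASCII substitutions from Table 4, keyed by translation option.
# Keys of the inner dicts are Windows-1252 character numbers.
SUBSTITUTIONS = {
    "convertEllipsis": {133: "..."},
    "convertSmartQuotes": {145: "'", 146: "'", 147: '"', 148: '"'},
    "convertENdash": {150: "-"},
    "convertEMdash": {151: "--"},
    "convertSpecialMarks": {153: "(tm)", 169: "(c)", 174: "(r)"},
    "convertMathSymbols": {188: "1/4", 189: "1/2", 190: "3/4", 215: "x", 247: "/"},
}


def apply_substitutions(text: str, enabled: Iterable[str]) -> str:
    """Replace Word's 'smart' characters with plain ASCII for each enabled option."""
    for option in enabled:
        for code, plain in SUBSTITUTIONS[option].items():
            # Decode the Windows character number to its Unicode character.
            text = text.replace(bytes([code]).decode("cp1252"), plain)
    return text
```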

Characters that are always converted

The following characters are always converted to their respective entities, to avoid problems in most browsers’ handling of HTML:

   <     becomes      &lt;
   >     becomes      &gt;
   &     becomes      &amp;

These translations are suppressed in outputting special htmlCode paragraphs (see below).
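Putting the entity rules together, a Python sketch of the character encoding might look like this. It assumes input in Word’s Windows character set. Entity names come from Python’s html.entities table, which uses the standard name &macr; for 175. The sketch ignores the special-character options in Table 4 and the nbsp option. The function names are hypothetical.

```python
from html.entities import codepoint2name

# Characters that are always converted (except in htmlCode paragraphs).
ALWAYS_ESCAPED = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}


def encode_code(code: int, html_code: bool = False) -> str:
    """Encode one character, given as a Windows character number (0-255)."""
    if 160 <= code <= 255:
        return f"&{codepoint2name[code]};"  # named entity, e.g. &reg;
    if 128 <= code <= 159:
        return f"&#{code};"  # numeric character reference, e.g. &#151;
    ch = chr(code)
    if html_code:
        return ch  # htmlCode text passes through unescaped
    return ALWAYS_ESCAPED.get(ch, ch)


def encode_text(data: bytes, html_code: bool = False) -> str:
    """Encode a paragraph's bytes character by character."""
    return "".join(encode_code(b, html_code) for b in data)


print(encode_text(b"R&D \xae"))  # R&amp;D &reg;
```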

 

Provision for placing other HTML and other CSS codes in documents

The translator supports adding other HTML and CSS codes to the output HTML. Using certain “reserved” stylenames, you can tell the translator to insert HTML code into the Head or Body of the output document. You can also tell the translator to insert CSS statements before or after the generated class definitions. The reserved style names are htmlHead, htmlCode (and its alias htmlBody), cssBefore, and cssAfter.


Each of these reserved stylenames is discussed below. Another section of this document also discusses reserved stylenames.

#include directive

An additional provision, new with version 2, is the special keyword “#include”. Place this directive as the first text of a paragraph in any of the styles htmlHead, htmlCode, htmlBody, cssBefore, or cssAfter, followed by a filename in quotes. The text in that file will then be sent directly into the appropriate part of the HTML output. These files cannot be Word files; they must be plain ASCII text files.

Example

The following text

#include "c:\styles\include-botA.txt"

in a paragraph tagged as style htmlCode will result in the contents of file c:\styles\include-botA.txt being sent directly into the HTML output stream.

This directive allows you to interleave large portions of HTML, JavaScript, etc. into your documents.
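Here is a minimal Python sketch of the #include check. It accepts straight or curly double quotes around the filename and reads the file as Latin-1 text. These choices and the function name are assumptions, not documented W2CSS behavior.

```python
import re
from pathlib import Path
from typing import Optional

# "#include" followed by a filename in straight or curly double quotes.
INCLUDE_RE = re.compile(r'^#include\s*["\u201c](.+?)["\u201d]')


def expand_include(paragraph: str) -> Optional[str]:
    """Return the included file's text, or None if this is not an #include line."""
    match = INCLUDE_RE.match(paragraph.strip())
    if match is None:
        return None
    # Include files must be plain text, never Word documents.
    return Path(match.group(1)).read_text(encoding="latin-1")
```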

Stylename htmlHead—Adding other codes to the <HEAD> of the HTML document

If you wish to include other HTML codes in the HEAD of the HTML output, you can do the following:

Any paragraphs tagged with the Word style “htmlHead” (case insignificant) will be passed directly through the translator and into the <HEAD> of the output HTML. This feature is especially helpful for maintaining documents that have META elements and the like in the <HEAD> area.

SPECIAL PRECAUTION: Text tagged with style htmlHead must appear only at the very top of the document. As soon as paragraphs in styles other than htmlHead are detected, lines tagged with htmlHead will be ignored. This limitation keeps the translator from having to do a second pass through the Word document.
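The single-pass rule can be sketched in Python as follows. Paragraphs are modeled as hypothetical (style, text) pairs. Late htmlHead paragraphs are simply dropped, matching “ignored” above.

```python
from typing import List, Tuple

Paragraph = Tuple[str, str]  # (Word style name, paragraph text)


def split_head(paragraphs: List[Paragraph]) -> Tuple[List[str], List[Paragraph]]:
    """Collect leading htmlHead paragraphs in one pass over the document.

    htmlHead text counts only while it leads the document; once another
    style appears, later htmlHead paragraphs are ignored (dropped here).
    """
    head: List[str] = []
    body: List[Paragraph] = []
    in_head = True
    for style, text in paragraphs:
        is_head = style.lower() == "htmlhead"  # style name is case-insignificant
        if in_head and is_head:
            head.append(text)
            continue
        in_head = False
        if not is_head:
            body.append((style, text))
    return head, body
```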

htmlHead Example

Given the Word document shown in Figure S, the translator creates the code shown in Figure T. The paragraphs tagged with style “htmlHead” pass directly into the <HEAD> of the HTML output.

Notice in this example that an HTML <TITLE> element is being inserted in the head area. The W2CSS translator also will generate a TITLE element. To avoid this conflicting situation, use the configuration setting generateHTMLtitle to tell the translator to NOT generate a title element.

Figure S: Word document for HTMLHEAD example

 

Figure T: HTML output for Word document shown above example

   <HTML>
   <HEAD>
   <TITLE>Selling A Boat</TITLE>
   <META CONTENT="FOR SALE, BOATS">
 
 

Stylename htmlCode—Adding other codes to the <BODY> of the HTML document

If you wish to include other HTML codes in the BODY of the output HTML, you can do the following (htmlCode is synonymous with stylename htmlBody):

Any paragraphs tagged with the Word style “htmlCode” (case insignificant) will be taken as HTML codes. The translator will pass any text in these paragraphs directly through and into the output HTML.

Smart quotes (curly single and double quotes) will be straightened out in any paragraphs tagged as htmlCode. However, text within quote marks will not be affected, because such text can be a special filename and the like. (For instance, Windows 95 allows curly quotes in filenames.)

 
 

htmlCode Example

Given the Word document shown in Figure U, the translator creates the code shown in Figure V. Notice that the paragraph tagged with style “htmlCode” appears in the HTML output, which, when viewed in a browser, results in the appearance of a horizontal rule (HR). Notice also that the style “htmlCode” is formatted as hidden text, so it won’t interfere with printing the Word document. You can format the style htmlCode any way you want (hidden or not) without changing how it is translated.

Figure U: Word document for HTMLCODE example


 

Figure V: HTML output for Word document, above

   <H1 class="Heading-top">
   <EM>W2CSS: </EM>A File Converter </H1>

   <HR>
   <P class="BodyText1">
   Table Of Contents</P>
 
 

Stylename htmlBody

If you wish to include other HTML codes in the BODY of the output HTML, you can use this reserved stylename. It is actually an alias name for reserved stylename htmlCode.

 

Stylename cssBefore—Inserting CSS statements before those generated by the translator

If you wish to include other CSS statements in the STYLE area, before the class definitions, you can do the following:

Any paragraphs tagged with the Word style “cssBefore” (case insignificant) will be passed directly through the translator and will appear before the class statements generated for each Word style.

cssBefore Example

Given the Word document shown in Figure W, the translator creates the code shown in Figure X. Note that paragraphs tagged with style “cssBefore” are placed between the preface codes and the class definitions generated by the translator. The reason that cssBefore text is placed after the preface codes is so that you can override them if you wish.

Figure W: Word document for example of “cssBefore” style


 

Figure X: The stylesheet generated for the above Word document


 

Stylename cssAfter—Inserting CSS codes after those generated by the translator

If you wish to include other CSS codes in the STYLE area after the class definitions, you can do the following:

Any paragraphs tagged with the Word style “cssAfter” (case insignificant) will be passed directly through the translator and will appear after the classes generated for each Word style. Placing style codes here will allow you to override or augment the style instructions generated by the translator. This can create unusual effects such as text floating over other text, as shown in the following example.

cssAfter Example

Given the Word document shown in Figure Y, the translator creates the code shown in Figure AA. The result, when viewed in a CSS-aware browser is shown in Figure Z. Note that paragraphs tagged with style “cssAfter” are placed after all the class definitions. In this example, W2CSS translates Word style “whatAre” into CSS class “.whatAre”. Adding the line .whatAre {margin-top: -2em} using the reserved stylename cssAfter results in augmenting the definition of class whatAre. The result is the floating effect shown in Figure Z.
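The resulting order within the STYLE area can be sketched in Python. The function name, the exact STYLE tag written, and the sample rules in the usage line are assumptions. What matters is the order: preface codes, then cssBefore, then the generated classes, then cssAfter.

```python
from typing import List


def build_style_block(preface: List[str], css_before: List[str],
                      generated: List[str], css_after: List[str]) -> str:
    """Assemble the STYLE area: preface, cssBefore, generated classes, cssAfter.

    Because CSS applies later rules over earlier ones of equal specificity,
    cssBefore can override the preface and cssAfter can override the classes.
    """
    lines = ['<STYLE TYPE="text/css">']
    lines += preface + css_before + generated + css_after
    lines.append("</STYLE>")
    return "\n".join(lines)


print(build_style_block(["BODY {font-size: 12pt}"], [],
                        [".whatAre {margin-top: 0pt}"],
                        [".whatAre {margin-top: -2em}"]))
```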

Figure Y: Word document for “cssAfter” example


 

Figure Z: View of “cssAfter” example in CSS-aware browser


 

Figure AA: CSS created from Word document shown above example


 
 
 

Word styles and CSS equivalents

Speaking generally, Word styles map fairly easily into CSS styles. The W2CSS translator assumes a certain “point of view” regarding which Word properties map into which CSS properties. Basically, the translator recognizes all paragraph and character level styles, otherwise ignoring localized “direct formatting”.

Creating CSS class names from Word style names

This is a relatively easy task. One small problem is that Word style names may contain spaces, whereas CSS class names cannot. The translator works around this by substituting the dash character (‘-’) for spaces in the Word style name (this is consistent with CSS naming conventions, e.g. “border-color”). So, for example, Word’s “Heading 1” style becomes the CSS class name “Heading-1”. If the Word user creates a style called “Heading-1” and this name has already been used (because “Heading 1” has already occurred in the document), then the translator will name the CSS class “Heading-1-A”. If “Heading-1-A” has already been used in the Word document, then the translated name becomes “Heading-1-B”. Unless the user is hell-bent on crashing W2CSS (which is possible), this scheme should resolve namespace problems.
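The naming scheme can be sketched in Python. The function is hypothetical and assumes Word style names are processed in document order. Like the scheme described above, it can still be defeated; here, that happens once suffixes A through Z are used up.

```python
from typing import Dict, Iterable


def make_class_names(style_names: Iterable[str]) -> Dict[str, str]:
    """Map Word style names to unique CSS class names.

    Spaces become dashes; if that name is taken, -A, -B, ... are tried in turn.
    """
    used = set()
    result: Dict[str, str] = {}
    for style in style_names:
        base = style.replace(" ", "-")
        candidate = base
        suffixes = iter("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
        while candidate in used:
            letter = next(suffixes, None)
            if letter is None:
                raise ValueError(f"no free class name for {style!r}")
            candidate = f"{base}-{letter}"
        used.add(candidate)
        result[style] = candidate
    return result


print(make_class_names(["Heading 1", "Heading-1-A", "Heading-1"]))
```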

 

Paragraph Styles

Word paragraph style properties fall into groups of properties corresponding to the way Word itself groups these properties (see Figure BB). These include

Figure BB: Word paragraph style formatting properties


 

Character Styles

(NOTE: Character styles will not be translated in UNREGISTERED VERSIONS OF W2CSS)

Word character style properties fall into groups of properties corresponding to the way Word itself groups these properties. These include

 

Character styles are almost a subset of paragraph styles. The translator treats character styles as “overlaid” on top of underlying paragraph styles. W2CSS sidesteps Word’s style hierarchy (the ‘based on’ aspect of styles). Even so, the CSS equivalents of character styles specify only the properties that the Word character style itself sets. Character styles are expressed using the HTML <SPAN> element. As a result, they inherit the properties of their enclosing elements, much as Word character styles build upon the underlying paragraph formatting.

In the following section, these characteristics are discussed in relation to their CSS equivalents.

A note about scalability

For all of the CSS properties below (except font size, color and border specifications) the translator provides the following scalability options:

Fonts

Font sizes can be expressed either as


Percentages and relative values are both scalable; points are a fixed measure that isn’t scalable. In tests with MSIE4, percentages and relative values scaled as expected. In tests with Netscape Navigator 4, all three font measures scaled. The CSS specification promises scalability only for the first two.

Left and right margins

Left and right margin sizes can be expressed either as

Other measures

Other measures, including border widths, padding, space before and after, etc, can be expressed either as

 

Font level formatting in MS Word

Font-size

A Word style’s font size, specified in points, translates directly into the CSS font-size property.

Percentages

If the translator’s fontMeasures option (which is a configuration setting) is set to asPercents, then all font measures are figured as percentages of the font size of the parent element. For all classes generated by W2CSS, the parent element is the <BODY> element. W2CSS generates a statement at the beginning of the style definitions section that sets the default body font size to 12 points. This number can be changed using the configuration setting defBodyFontSize.
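Here is a worked example of the percentages option. With the default 12-point BODY font, an 18-point Word style becomes 150% and a 10-point style about 83.3%. The Python sketch below shows the arithmetic. The function names, the exact wording of the BODY statement and the rounding to one decimal place are assumptions.

```python
def body_rule(body_pt=12):
    """Hypothetical form of the BODY statement written before the class definitions."""
    return f"BODY {{ font-size: {body_pt:g}pt }}"

def font_size_percent(style_pt, body_pt=12):
    """Express a Word font size in points as a percentage of the BODY font size."""
    percent = round(style_pt / body_pt * 100, 1)   # rounding to 0.1% is an assumption
    return f"{percent:g}%"
```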

“Absolute”

If the translator’s fontMeasures option (which is a configuration setting) is set to “absolute”, then font sizes are translated into the CSS “absolute” scale: xx-small, x-small, small, medium, large, x-large, xx-large. See CSS sources for a more thorough explanation of this “absolute scale.”

“in Points”

If the translator’s fontMeasures option (which is a configuration setting) is set to “inPoints”, then font sizes are expressed as point values.

Font-family

A Word style’s font name translates directly into the CSS font-family property. However, CSS allows the font family to be given as a list of font names, and this list can include a generic family (such as serif or sans-serif). W2CSS accommodates this capability via the additional text file W2CSSfnt.csv; see support for CSS font substitution for details on this feature. The translator can also fall back on a default generic family when a font is not listed in W2CSSfnt.csv (see the fontSubstitution configuration option).

Font-style

The CSS font-style property is equivalent to Word’s font italic style. The option for oblique is not handled by the translator.

Font-variant

The CSS font-variant property is equivalent to Word’s font effect, small caps. (This CSS property is not yet supported by many browsers).

Color

A Word style’s font color translates into the CSS color property. Word provides 16 built-in color names that roughly correspond to the standard CSS color names. The main difference is that Word provides two shades of gray (gray25 and gray50). By default, W2CSS maps gray50 to CSS gray and gray25 to CSS silver. A W2CSS configuration option lets users choose which color name or hex number is output for each of Word’s built-in colors. See color equivalences in the discussion of configuration settings.

The following table lists Word internal color names and the matching CSS color names that the translator outputs:

 
    CSS name   Hex #     Word internal name
    ========   =====     ==================
    aqua       00FFFF    wdTurquoise
    black      000000    wdBlack
    blue       0000FF    wdBlue
    fuchsia    FF00FF    wdPink
    gray       808080    wdGray50
    green      008000    wdGreen
    lime       00FF00    wdBrightGreen
    maroon     800000    wdDarkRed
    navy       000080    wdDarkBlue
    olive      808000    wdDarkYellow
    purple     800080    wdViolet
    red        FF0000    wdRed
    silver     C0C0C0    wdGray25
    teal       008080    wdTeal
    white      FFFFFF    wdWhite
    yellow     FFFF00    wdYellow
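The table above can be transcribed into a lookup table, with the color equivalences setting acting as an override. In this Python sketch, the `overrides` dictionary stands in for that setting, whose actual file format is not shown here. The name fuchsia is spelled as CSS requires.

```python
from typing import Optional

# Default mapping from Word color constants to CSS color names.
WORD_TO_CSS_COLOR = {
    "wdTurquoise": "aqua",    "wdBlack": "black",     "wdBlue": "blue",
    "wdPink": "fuchsia",      "wdGray50": "gray",     "wdGreen": "green",
    "wdBrightGreen": "lime",  "wdDarkRed": "maroon",  "wdDarkBlue": "navy",
    "wdDarkYellow": "olive",  "wdViolet": "purple",   "wdRed": "red",
    "wdGray25": "silver",     "wdTeal": "teal",       "wdWhite": "white",
    "wdYellow": "yellow",
}

def css_color(word_color: str, overrides: Optional[dict] = None) -> str:
    """Return the CSS color for a Word color constant.

    `overrides` is a hypothetical stand-in for the color equivalences
    setting, e.g. {"wdGray25": "#C0C0C0"}.
    """
    if overrides and word_color in overrides:
        return overrides[word_color]
    return WORD_TO_CSS_COLOR[word_color]
```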

Word-spacing

CSS’s word-spacing property is not handled by the translator.

Letter-spacing

The CSS letter-spacing property is equivalent to Word’s character-level letter spacing setting.

Text-transform

Word offers only the All caps setting. If it is set, the CSS text-transform property is set to capitalize; otherwise it is set to none. (Strictly speaking, the CSS value that matches all caps is uppercase; capitalize uppercases only the first letter of each word.)

There is a small anomaly in the translator regarding this property. Once text is set to all caps, it is written to the HTML file in capital letters, not in its original mixture of upper- and lowercase.

Text-decoration

The CSS text-decoration property is translated from Word’s font level settings for strikethrough, double strikethrough and underline. The CSS overline and blink values are not handled by W2CSS. Word’s strikethrough and double strikethrough both fold into the CSS value line-through.
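CSS allows several text-decoration values in one declaration, so the Word settings can be combined as in this Python sketch. This section does not say whether W2CSS itself emits combined values such as underline line-through. The name text_decoration is hypothetical.

```python
def text_decoration(underline=False, strikethrough=False, double_strikethrough=False):
    """Combine Word font effects into one CSS text-decoration value."""
    values = []
    if underline:
        values.append("underline")
    if strikethrough or double_strikethrough:   # both Word settings fold into line-through
        values.append("line-through")
    return " ".join(values) if values else "none"
```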

Vertical-align

CSS’s vertical-align property is derived from a character’s superscript or subscript property. If characters are instead “raised” or “lowered” by a specific number of points, this offset is translated to a percentage value.

Paragraph Level Formatting in MS Word

Line-height

The CSS line-height property is roughly equivalent to Word’s paragraph level line spacing property. Of Word’s line spacing options, only one, At least, is not handled by the translator. The others are handled as follows:

Word Line Spacing into CSS Line Height

    Word value     CSS value
    ==========     =========
    Single         normal
    1.5            1.5
    2              2
    Multiple       a number (3, 3.5, or whatever)
    Exact          a value in ems or pts
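The line-spacing mapping above is simple enough to express as a function. The following is a minimal Python sketch, not W2CSS code. The rule labels ("single", "multiple", "exact", and so on) and the function name css_line_height are hypothetical. Converting an exact value to ems by dividing by the font size is also an assumption about how the ems option works.

```python
def css_line_height(rule, value=0.0, font_pt=None):
    """Map a Word line-spacing rule to a CSS line-height value.

    rule:  "single", "1.5", "double", "multiple", "exact" or "atLeast".
    value: number of lines for "multiple"; points for "exact".
    Returns None when the rule is not translated.
    """
    if rule == "single":
        return "normal"
    if rule == "1.5":
        return "1.5"
    if rule == "double":
        return "2"
    if rule == "multiple":          # e.g. 3 or 3.5 lines
        return f"{value:g}"
    if rule == "exact":
        if font_pt:                 # express relative to the font size, in ems
            return f"{round(value / font_pt, 2):g}em"
        return f"{value:g}pt"
    return None                     # "at least" is not handled
```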

 

Text-align

The CSS text-align property is translated directly from Word’s setting for paragraph alignment (left, center, right, justified).

Text-indent

The CSS text-indent property is translated from Word’s setting for paragraph first line indent.

Margin-left, margin-right, margin-top, margin-bottom

The CSS margin properties are translated from Word’s left and right paragraph indents and from settings for space before and space after. The left and right indents become the CSS left and right margins; the space before and space after become the CSS top and bottom margins, respectively.

W2CSS defaults to outputting left and right values as percents of the page width of your Word document. (This is a configuration setting; see the leftRightMargins setting.) You are advised to experiment with margins to see what the various results will be.
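As a rough illustration of the three leftRightMargins choices, the following Python sketch converts a Word indent given in points. It is not W2CSS code. The 612 pt default page width (8.5 inch US Letter paper) and the use of the BODY font size for ems are assumptions, and margin_value is a hypothetical name.

```python
def margin_value(indent_pt, mode="asPercents", page_width_pt=612.0, body_font_pt=12.0):
    """Express a Word left or right paragraph indent (points) as a CSS margin value."""
    if mode == "asPercents":        # percent of the (assumed) page width
        return f"{round(indent_pt / page_width_pt * 100, 1):g}%"
    if mode == "asEms":             # ems relative to the (assumed) BODY font size
        return f"{round(indent_pt / body_font_pt, 2):g}em"
    if mode == "inPts":
        return f"{indent_pt:g}pt"
    raise ValueError(f"unknown leftRightMargins value: {mode!r}")
```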

A very sticky problem is calculating the left and right margins of a nested list. In the browsers tested, these margins are apparently calculated as offsets from the margins of enclosing lists. Thus, for list styles that are used as nested lists, the CSS style definition created compensates for these effects.

Padding-left, padding-right, padding-top, padding-bottom

The translator handles CSS padding as roughly equivalent to the space between a paragraph border and the paragraph text (in Word, this is set in the Borders and Shading dialog, by choosing the Options button). If there are no borders on a paragraph, no padding will be indicated by the translator in the output CSS style.

Border formatting (paragraph and character level) in MS Word

The translator now handles both character and paragraph level border formatting. Below

Border-left, Border-right, Border-top, Border-bottom

Word offers a different set of border options than CSS does. The following chart summarizes how these have been mapped into CSS equivalents. Bear in mind that many browsers currently don’t handle this property well.

 
   Word internal name                 CSS border style
   ==================                 ================
   wdLineStyleDot.......................dotted

   wdLineStyleDashSmallGap
   wdLineStyleDashLargeGap
   wdLineStyleDashDot
   wdLineStyleDashDotDot
   wdLineStyleDashDotStroked............dashed

   wdLineStyleDouble
   wdLineStyleTriple
   wdLineStyleDoubleWavy................double
   
   wdLineStyleThinThickSmallGap
   wdLineStyleThickThinSmallGap
   wdLineStyleThinThickThinSmallGap.....double
   
   wdLineStyleThinThickMedGap
   wdLineStyleThickThinMedGap
   wdLineStyleThinThickThinMedGap.......double
   
   wdLineStyleThinThickLargeGap
   wdLineStyleThickThinLargeGap
   wdLineStyleThinThickThinLargeGap.....double

   wdLineStyleEmboss3D..................ridge
   
   wdLineStyleEngrave3D.................groove
   
   All other borders....................solid
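The chart above is a plain lookup table with a fallback to solid. A Python transcription might look like this; the dictionary and function names are hypothetical.

```python
# Word border line styles grouped by the CSS border-style they map to.
_BORDER_GROUPS = {
    "dotted": ["wdLineStyleDot"],
    "dashed": ["wdLineStyleDashSmallGap", "wdLineStyleDashLargeGap",
               "wdLineStyleDashDot", "wdLineStyleDashDotDot",
               "wdLineStyleDashDotStroked"],
    "double": ["wdLineStyleDouble", "wdLineStyleTriple", "wdLineStyleDoubleWavy",
               "wdLineStyleThinThickSmallGap", "wdLineStyleThickThinSmallGap",
               "wdLineStyleThinThickThinSmallGap", "wdLineStyleThinThickMedGap",
               "wdLineStyleThickThinMedGap", "wdLineStyleThinThickThinMedGap",
               "wdLineStyleThinThickLargeGap", "wdLineStyleThickThinLargeGap",
               "wdLineStyleThinThickThinLargeGap"],
    "ridge": ["wdLineStyleEmboss3D"],
    "groove": ["wdLineStyleEngrave3D"],
}
WORD_TO_CSS_BORDER = {word: css for css, words in _BORDER_GROUPS.items() for word in words}

def css_border_style(word_line_style):
    """Return the CSS border-style for a Word line style; all others become solid."""
    return WORD_TO_CSS_BORDER.get(word_line_style, "solid")
```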
 

Background-color

The CSS background-color setting for the classes the translator generates is derived from the setting of paragraph or character shading. Besides offering standard color constants (see the list of color constants), Word offers shading options in 2.5% increments.

If you specify a shade percentage, then you get a hex number for the background color; if you choose one of Word’s 16 preset colors, you get a color name. The color names can be changed by using the color equivalences configuration setting.
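One plausible way to turn a shading percentage into a hex color is to treat the shading as black laid over white. This Python sketch rests on that assumption, since W2CSS’s exact formula is not given here, and shading_to_hex is a hypothetical name. Under this assumption, 50% shading yields #808080, the same value as CSS gray.

```python
def shading_to_hex(percent):
    """Hex background color for Word percentage shading (black over white, assumed)."""
    if not 0 <= percent <= 100:
        raise ValueError("shading must be between 0 and 100 percent")
    level = round(255 * (1 - percent / 100))   # 0% -> white (255), 100% -> black (0)
    return "#{0:02X}{0:02X}{0:02X}".format(level)
```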

Document level settings in MS Word

Document margins

Page margin settings for the active Word document (available under File, Page Setup) are translated into margin values for the HTML BODY element. The unit of measure depends on the configuration setting leftRightMargins, which can be percents, ems or points. The top and bottom margins of the Word document are ignored; you are advised to create space at the top and bottom of the document in other ways.

Background color

Background colors other than the 16 predefined constants shown in the chart above are not currently handled by W2CSS. For any other color, no background color is specified.

Prefaces: zeroing elements

From empirical testing with MSIE4 and Netscape Navigator 4, it was found that a more accurate rendition of CSS equivalents to Word styles results if the margins of the base HTML elements in the document are first set to zero. By default, the W2CSS program outputs a “preface” that zeros the margins for the following HTML elements: <P>, <H1> through <H6> and <ADDRESS> (see configuration setting zeroHtmlElementMargins). Another configuration setting allows output of a preface that zeros out the margin settings for the <BODY> element (see zeroHtmlBodyMargins).

Reserved Stylenames

Certain stylenames are interpreted by the translator NOT as links to Word style definitions, but rather as denoting special instructions to the translator. Just as many programming languages utilize “compiler directives” to tell the compiler how to do its job, W2CSS uses special “reserved stylenames” to communicate to the translator. The following are reserved stylenames:

Using reserved stylenames is not required to achieve good results with the translator. They are provided for those who wish to have more control over the translator’s behavior.

Figure CC: An example using various “reserved stylenames”.

 

Tables of Contents

W2CSS now includes creation of automatic hyperlinked Tables of Contents for the same reason that Word itself does: to reduce the tedium of manually creating such lists.

Word allows users to create automatic Tables of Contents and Tables of Figures. A number of the variations are discussed below. W2CSS handles many of these variations, but not all of them.

The varieties of Tables of Contents and Tables of Figures

A Word Table of Contents (TOC for short) is created by including a field code into the Word document. The following are examples of the TOC field code variations handled by the translator:

TOC \O

Outline Level TOC

Creates a TOC that lists all lines tagged with the reserved OUTLINE styles — namely “Heading 1” through “Heading 9”

TOC \O “1-2”

Outline level TOC

Creates a TOC listing OUTLINE styles but limited to levels 1 and 2 — namely “Heading 1” and “Heading 2”

TOC \T “Topic1, 1, Topic2, 2”

Style based TOC

Creates a TOC based only on the named styles and maps those lines into the TOC levels specified. In this case, Topic1 paragraphs map into TOC level 1 and Topic2 paragraphs into level 2

TOC \C “Figure”

Caption based TOF (Table of Figures)

Creates a TOF based on Captions that are labeled with the word “Figure”. Entries in the TOF will include the label, e.g. “Figure 3: List of Objects”

TOC \A “Table”

Caption based TOF

Creates a TOF based on Captions that are labeled with the word “Table”. Entries in the TOF will not include the label, e.g. “List of Objects”
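The field-code variants above can be interpreted with a small parser. This Python sketch is hypothetical (parse_toc_field is not part of W2CSS). Like the translator, it reads only the first switch of a field.

```python
import re

def parse_toc_field(code):
    """Interpret the TOC field-code variants listed above."""
    code = code.replace("\u201c", '"').replace("\u201d", '"')   # accept curly quotes
    # Only the first switch is read; anything after it is ignored.
    match = re.match(r'\s*TOC\s+\\([OTCA])(?:\s+"([^"]*)")?', code, re.IGNORECASE)
    if not match:
        raise ValueError(f"unsupported TOC field: {code!r}")
    switch, arg = match.group(1).upper(), match.group(2)
    if switch == "O":                   # outline TOC: Heading 1..9, optionally a range
        first, last = (int(n) for n in arg.split("-")) if arg else (1, 9)
        return {"kind": "outline",
                "levels": {f"Heading {n}": n for n in range(first, last + 1)}}
    if switch == "T":                   # "Style, level, Style, level, ..."
        parts = [p.strip() for p in arg.split(",")]
        return {"kind": "styles",
                "levels": {parts[i]: int(parts[i + 1]) for i in range(0, len(parts), 2)}}
    # \C keeps the caption label in each entry; \A leaves it out
    return {"kind": "captions", "label": arg, "include_label": switch == "C"}
```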

 

TOC Limitations

TOCs (Tables of Contents) and TOFs (Tables of Figures) only find headings and captions from the point where they are located in the document downward. This differs from the way Word works. It can be a disadvantage, but it can also be useful. For instance, if you use Heading 1 for the title of the document but don’t want the title to appear in the TOC, place the TOC below the title.

Because of problems with VB internals, complex TOC expressions such as \O “2-2” \t “Heading 3,3” don’t work; only the \O switch is recognized.

Bulleted TOCs and TOFs

Via the Options dialog (TOC tab), you can choose whether TOCs and TOFs are bulleted. The default is for them to appear as bulleted lists.


Configuration settings

Translation options, also called configuration settings in this manual, let you control particular aspects of the translation process. Registered users can control most of these settings from the Options dialog (see pictures of interface dialogs). Non-registered users will have to edit the configuration settings file W2CSS.cfg instead. This file contains keywords that describe and control aspects of the translation.

Another way to control translation is to place configuration setting keywords throughout your document using the reserved style #directive (you can also use style w2cssSetting, which has the same effect). See below for details.

The configuration file, W2CSS.cfg

The W2CSS translator allows the user to control its behavior via configuration settings. These settings are found in the file W2CSS.cfg, located in the user’s Templates folder (see the installation procedure regarding placing files in the Templates directory). This is a plain text file that can be edited with Windows Notepad or any other text editor.

The translator will work even if the file W2CSS.cfg is not present. In that case, the settings that are marked as default, below, will be in effect. If W2CSS.cfg is present, then by editing it, you can customize the translator’s behavior.

Technical aspects of the configuration file

Another method of controlling translation options

Often, you may want to use special configuration settings when you process a particular Word file. For instance, you might want to just process the HTML and not generate any CSS classes (see the example below, which demonstrates this). The W2CSS translator allows you to embed configuration settings in the text of a Word document. To do this, use the reserved stylename “#directive” (previously “w2cssSetting”) as follows:

Reserved stylename #directive (previously w2cssSetting)

[In Version 2, reserved name w2cssSetting is replaced by the name #directive. Both are supported and are synonymous.]

Any paragraph tagged with the stylename “#directive” will be understood by the translator as containing instructions that control a translation option. The translator parses the line and tries to match its text against the translation option keywords. If the text contains a matching keyword, that setting is modified accordingly.

SPECIAL NOTE: Certain configuration settings work only from the configuration file and have no effect when placed inline in a document. For example, you cannot tell the translator to start processing embedded objects from within the Word document, since that processing begins before the document is opened.
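A #directive paragraph might be applied to the current settings roughly as follows. This is a hypothetical Python sketch (apply_directive is not a W2CSS function). It assumes the keyword=value form used in this manual’s examples (such as fontSubstitution=none) and case-insensitive keywords. It also assumes a file-only set containing just doEmbeddedObjects, the one file-only setting the note above names.

```python
BOOL_WORDS = {"true": True, "yes": True, "false": False, "no": False}

def apply_directive(line, settings, file_only=frozenset({"doEmbeddedObjects"})):
    """Apply one '#directive' paragraph of the form keyword=value to settings.

    Returns True if a setting was changed, False if the line was ignored.
    """
    key, sep, raw = line.partition("=")
    if not sep:
        return False
    key, raw = key.strip(), raw.strip()
    name = next((k for k in settings if k.lower() == key.lower()), None)
    if name is None or name in file_only:
        return False            # unknown keyword, or a setting that only works from W2CSS.cfg
    current = settings[name]
    if isinstance(current, bool):          # test bool before int: bool is a subclass of int
        if raw.lower() not in BOOL_WORDS:
            return False
        settings[name] = BOOL_WORDS[raw.lower()]
    elif isinstance(current, int):         # numeric settings such as defBodyFontSize
        settings[name] = int(raw)
    else:                                  # word-valued settings such as fontSubstitution
        settings[name] = raw
    return True
```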

w2cssSetting Example 1

Given the Word document shown in Figure DD, the translator generates the HTML shown in Figure EE. The settings at the top, generateStyleDefs and tagClassesToHTML, are both set to false; together they suppress all class and style information and related attributes. Because the LINK element is placed in an htmlHead paragraph, the LINK statement is included in the output HTML. In this case, any style information will be controlled by the CSS definitions in file stylesheet1.css.

Figure DD: Top lines of a Word document that contains embedded option settings. The settings shown generate plain HTML with no class information and no CSS styles. A LINK line connects the document to an existing stylesheet.

 

Figure EE: HTML generated with the above option settings. Note the LINK element that attaches the page to a stylesheet.

<HTML>
<HEAD>
<TITLE>

W2CSS: Converting Word Documents to CSS compliant HTML
</TITLE>
<LINK REL="stylesheet" type="text/css" href="stylesheet1.css">
</HEAD>
<BODY>

<P>
W2CSS: A File Converter</P>

<P>
Table Of Contents</P>

<UL>

<LI>

<A NAME="Overview"><A HREF="#Overview">OVERVIEW OF THE TRANSLATOR PROGRAM</A></A><BR></LI>

<LI>
<A NAME="Overview"><A HREF="#AQuickLook">A QUICK LOOK AT WHAT THE TRANSLATOR DOES</A></A><BR></LI>

<LI>
<A NAME="Overview"><A HREF="#HowToInstallTemplate">HOW TO INSTALL THE TRANSLATOR</A></A><BR></LI>
</UL>
   etc...
 

w2cssSetting Example 2

Given the Word document shown in Figure FF, the translator generates the HTML shown in Figure GG. This is a variation on Example 1, above, except that here generateStyleDefs is false and tagClassesToHTML is true. Also, a LINK element is included pointing to an existing stylesheet named “stylesheet2.css.” This combination of settings suppresses the output of CSS style definitions but still generates class attributes for each HTML element. You might do this if you have many documents that all point to the same stylesheet.

 

Figure FF: Word document for Example 2, a variation on using the reserved style w2cssSetting to link to an existing stylesheet.

 
 

Figure GG: HTML generated with the above option settings.

<HTML>
<HEAD>
<TITLE>
W2CSS: Converting Word Documents to CSS compliant HTML
</TITLE>
<LINK REL="stylesheet" type="text/css" href="stylesheet2.css">

</HEAD>
<BODY>

<P class="Heading-top">
W2CSS: A File Converter</P>

<P class="BodyText1">
Table Of Contents</P>

<UL class="List-Bullet-Mine">

<LI>
<A NAME="Overview"><A HREF="#Overview">OVERVIEW OF THE TRANSLATOR PROGRAM</A></A><BR></LI>

<LI>
<A NAME="Overview"><A HREF="#AQuickLook">A QUICK LOOK AT WHAT THE TRANSLATOR DOES</A></A><BR></LI>

<LI>
<A NAME="Overview"><A HREF="#HowToInstallTemplate">HOW TO INSTALL THE TRANSLATOR</A></A><BR></LI>

</UL>
 
 

Configuration options

The tables below summarize the translation options that the user can control via special keywords. Some examples follow. Features marked with an asterisk (*) are available only in the registered version.

 

Table 5: General Translation Options

Name of Option

Explanation

Default

Choices

updateScreen

When false, changes to the screen are suppressed, allowing translation to go faster

False

True, False, Yes, No

linkedStyleSheet

When true, styles are written to a separate file with extension “.css”, and a LINK line is created in the HTML HEAD

False

True, False, Yes, No

generateHTMLtitle

When true, the HTML TITLE element is created from the Word document Title; set this to false if you wish to create the title by some other means

True

True, False, Yes, No

generateStyleDefs

If false, no style definitions (CSS class definitions) are created, only HTML. However, the CSS classes will still be tagged to the HTML. This option helps when linking to a pre-defined stylesheet.

True

True, False, Yes, No

tagClassesToHTML

When false, styles are not linked to HTML elements. This allows you to use the translator to just create HTML, sans CSS tagging.

True

True, False, Yes, No

htmlSuffix

The default HTML suffix.

“html”

Others you might use are “htm” or “shtml”

paragraphsOnly

When true, only paragraph styles are processed; no images, character styles or hyperlinks are created. HINT: For faster processing, if you just want to get a sense of how things look as you’re developing a document, set this option to TRUE

False

True, False, Yes, No

spaciousOutput

When true, extra blank lines are inserted between HTML elements for better readability

False

True, False, Yes, No

wrapLines

When true, long lines (~120 characters or more) are wrapped in the HTML output. This option adds extra processing time to the translation.

False

True, False, Yes, No

BatchSeparator

For text batchfile lists, this is the character used to separate options on a text batch list line (see Batch separator)

“,”

Others to use: ";"

doEmbeddedObjects *

When true, embedded and linked objects are translated (NOTE: when enabled, this feature makes the translation take a lot longer)

False

True, False, Yes, No

characterStyles *

When true, character styles in the Word doc are detected and translated

False

True, False, Yes, No

 
 

Table 6: Translation Options for Characters

Name of Option

Explanation

Default

Choices

Tabs

This setting controls how tabs in the Word doc are treated. (The HTML standard only recognizes tabs in PRE elements.) When set to “asBlanks”, tabs are turned into blanks; otherwise, tab characters are passed through into the output HTML.

asBlanks

asBlanks, notConverted

nonBreakingSpaces

Controls how non-breaking spaces and empty paragraphs will be treated.

asPreElement

asCharacterRefs, asNumericRefs, asPreElement, asEmpty

characterRefs

Setting to “asEntityRefs” causes special characters (ASCII codes > 127) to be treated as entity references.

asEntityRefs

asEntityRefs, asNumericRefs, notConverted

convertMathSymbols

When true, causes conversion of math symbols (character codes 188, 189, 190, 215, 247) into lower ASCII equivalents. For example, ¼ becomes 1/4. See chart.

False

True, False, Yes, No

convertSmartQuotes

When true, causes conversion of curly single and double quotes (character codes 145, 146, 147, 148) into lower ASCII equivalents. For example, “” becomes "". See chart.

False

True, False, Yes, No

convertSpecialMarks

When true, causes conversion of TMmark, COPYRIGHTmark, and REGmark (character codes 153, 169, 174) into lower ASCII equivalents. For example, © becomes (c). See chart.

False

True, False, Yes, No

convertENdash

When true, causes conversion of the en dash (character code 150) into a single hyphen. See chart.

False

True, False, Yes, No

convertEMdash

When true, causes conversion of the em dash (character code 151) into two hyphens. See chart.

False

True, False, Yes, No

convertEllipsis

When true, causes conversion of the ellipsis character (character code 133) into three periods. See chart.

False

True, False, Yes, No

makeHyperlinkStyle*

When true, creates a SPAN element linked to class hyperlink.

False

True, False, Yes, No
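The convert* options in this table amount to a character-replacement table. The following Python sketch is an illustration only. It uses Unicode code points rather than the Windows character codes listed above. It includes only the replacements stated in the option descriptions, plus ½ and ¾ by analogy with ¼. The name convert_text is hypothetical.

```python
# Replacements stated in the option descriptions above; entries whose
# replacement appears only in the manual's chart (×, ÷, ™, ®) are omitted.
CONVERSIONS = {
    "convertMathSymbols": {"\u00bc": "1/4", "\u00bd": "1/2", "\u00be": "3/4"},
    "convertSmartQuotes": {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'},
    "convertSpecialMarks": {"\u00a9": "(c)"},
    "convertENdash": {"\u2013": "-"},
    "convertEMdash": {"\u2014": "--"},
    "convertEllipsis": {"\u2026": "..."},
}

def convert_text(text, options):
    """Apply the character conversions whose options are set to true."""
    table = {}
    for option, mapping in CONVERSIONS.items():
        if options.get(option):
            table.update({ord(char): replacement for char, replacement in mapping.items()})
    return text.translate(table)
```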

 

Table 7: Translation Options for Fonts & Measurements

Name of Option

Explanation

Default

Choices

fontMeasures

Controls how font measures are expressed

asPercents

asPercents, absolute, inPts

leftRightMargins

Controls how left and right margin measures are expressed

asPercents

asPercents, asEms, inPts

otherMeasures

Controls how other measures (including border widths, padding, space before and after, etc) are expressed

inEms

inEms, inPts

defBodyFontSize

Sets the default point size for the HTML <BODY> element.

12

1 to 255

zeroHTMLelementMargins

When true, the translator outputs preface CSS styles that zero out default margins for various HTML elements. This gives a more faithful rendition of the Word doc in HTML

True

True, False, Yes, No

zeroHTMLbodyMargins

When true, the translator outputs preface CSS styles that zero out the margins of the whole HTML BODY.

False

True, False, Yes, No

fontSubstitution

This specifies the default font substitution for fonts not named in the file W2CSSfnt.csv. See support for font substitution for a more complete discussion.

sans-serif

serif, sans-serif, fantasy, cursive, monospace, none

pixelsPerPt

Used when translating table cell widths into pixels

1.35

1 to 32

 

Table 8: Translation Options for Tables

Name of Option

Explanation

Default

Choices

makeTableCaptions *

When true, <CAPTION> elements will be created for tables from the Word caption immediately above the table

False

True, False, Yes, No

tableWidths *

Controls whether table cell widths are expressed as percents or in pixels

asPercents

asPercents, asPixels

processTables *

When true, HTML tables will be created from Word tables. When false, paragraphs in tables will be output as <P> elements

False

True, False, Yes, No

tableRowHgtAttribute *

When true, HTML tables will have row height attributes.

False

True, False, Yes, No
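The tableWidths option, together with pixelsPerPt in Table 7, suggests a conversion like the following. In this hypothetical Python sketch (cell_widths is not a W2CSS function), percentages are taken relative to the sum of the cell widths, which is an assumption. The pixel branch simply multiplies points by pixelsPerPt (default 1.35).

```python
def cell_widths(widths_pt, table_widths="asPercents", pixels_per_pt=1.35):
    """Express Word table cell widths (in points) as HTML WIDTH attribute values."""
    if table_widths == "asPixels":
        return [str(round(w * pixels_per_pt)) for w in widths_pt]
    if table_widths == "asPercents":
        total = sum(widths_pt)          # percent of the whole table width (assumed)
        return [f"{round(w / total * 100)}%" for w in widths_pt]
    raise ValueError(f"unknown tableWidths value: {table_widths!r}")
```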

 

Table 9: Translation Options for Tables of Contents and Tables of Figures

Name of Option

Explanation

Default

Choices

createTOCs

When true, Word’s TOC codes will be translated into equivalent hyperlinked HTML Tables of Contents.

False

True, False, Yes, No

bulletedTOC

When true, TOCs and TOFs will be generated as HTML bulleted lists.

True

True, False, Yes, No

 

Table 10: Translation Options for Captions

Name of Option

Explanation

Default

Choices

captionNames

If set to full, then all the text in a caption will be used for image ALT text; if set to lblAndName then only the label and name will be used

lblAndName

lblAndName, full

captionsAsIMGalt

When true, captions will be used to generate ALT attributes for images.

True

True, False, Yes, No

captionsVisible

If false, then all paragraphs tagged with the style “Caption” will not be output in the HTML. This option is meant to work with other caption options so that images can be given alternate text names.

True

True, False, Yes, No

 
 

Examples of some configuration options are given below:

Font Substitution Example 1

For example, if fontSubstitution=serif, and you’d created a Word style that used the font Baskerville, and Baskerville is not listed in the file W2CSSfnt.csv, then the CSS font-family for this style will read

font-family: Baskerville, serif

 

Font Substitution Example 2

If the configuration file lists fontSubstitution=none, and you’d created a Word style that used the font Baskerville, and Baskerville is not listed in the file W2CSSfnt.csv, then the CSS font-family for this style will read

font-family: Baskerville
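The substitution rule in these two examples can be sketched as follows. The layout of W2CSSfnt.csv is not described in this section. The one-row-per-font layout (font name followed by its substitutes) is therefore an assumption, as are the function names load_font_table and font_family.

```python
import csv
import io

def load_font_table(csv_text):
    """Parse rows of the assumed form: font, alternative, ..., generic family."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        cells = [cell.strip() for cell in row if cell.strip()]
        if cells:
            table[cells[0].lower()] = ", ".join(cells)   # look fonts up case-insensitively
    return table

def font_family(font, table, substitution="sans-serif"):
    """Build a CSS font-family value following the fontSubstitution rules."""
    listed = table.get(font.lower())
    if listed:
        return listed                   # the font has its own entry in the table
    if substitution == "none":
        return font                     # no generic fallback appended
    return f"{font}, {substitution}"    # append the default generic family
```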

 

captionsAsImgAlt

short for: “captions as IMG ALT attributes”

default is true

possible values are true, false, yes, no

If set to true, then all paragraphs tagged with the style “Caption” will be used as the ALT text associated with images. The caption that comes immediately before an image becomes the ALT text for that image. This option is meant to work with other caption options so that images can be given alternate text names.

 

Captions Example 1

With captionsAsImgAlt=true and captionsVisible=false, the following HTML (Figure II) will be created from the following Word document (Figure HH):

Figure HH: Word Document for Captions Example 1

 

Figure II: Output HTML for Captions Example 1

   <P class="Normal">
   <IMG SRC="whale.gif" 
   ALT="Fijian whale in contemplative pose"></P>
 

captionsVisible

default is true

possible values are true, false, yes, no

If false, then all paragraphs tagged with the style “Caption” will not be output in the HTML. This option is meant to work with other caption options so that images can be given alternate text names.

Using the example above (Figure HH), with captionsAsImgAlt=true and captionsVisible=true, the following HTML will be created from the Word text:

   <P class="Caption">
   Fijian whale in contemplative pose</P>
   <P class="Normal">
   <IMG SRC="whale.gif" 
   ALT="Fijian whale in contemplative pose"></P>
 

captionNames

default is lblAndName

possible values are lblAndName, full

This option works with the other caption configuration settings to provide ALT text for images. If set to full, then all the text in a caption will be used for image ALT text; if set to lblAndName then only the label and name (such as “Figure 1”) will be used (instead of the full caption, which may be “Figure 1: Fijian whale in contemplative pose”).

If the caption isn’t of the form “Figure X: some words”, but is just a string of words, then the setting lblAndName acts as if captionNames were set to full.

In the example below, captionNames=lblAndName, captionsAsImgAlt=true, and captionsVisible=true (these are all defaults for the W2CSS translator). With these settings, the following HTML (Figure KK) will be created from the following Word document (Figure JJ):

Captions Example 2

Figure JJ: Word Document for Captions Example 2

 

Figure KK: Output HTML for Captions Example 2

   <P class="Caption">

   Figure A: Fijian whale in contemplative pose</P>
   
   <P class="Normal">
   <IMG SRC="whale.gif" ALT="Figure A"></P>

Compare this with the HTML generated in Captions Example 1, where captionsVisible=false.
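The captionNames rule described above can be sketched in Python. For illustration, the sketch assumes the label is one word plus a number or letter, followed by a colon. The name caption_alt_text is hypothetical.

```python
import re

def caption_alt_text(caption, caption_names="lblAndName"):
    """Derive IMG ALT text from a Word caption according to captionNames."""
    caption = caption.strip()
    if caption_names == "lblAndName":
        # "Figure A: Fijian whale ..." -> "Figure A" (label word plus number or letter)
        match = re.match(r"(\w+\s+\w+)\s*:", caption)
        if match:
            return match.group(1)
    # "full", or a caption that is just a string of words
    return caption
```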

 

paragraphsOnly

default is false

possible values are true, false, yes, no

If set to true, then only the paragraph styles in the Word document will be processed. No EM or STRONG tags will appear in the output HTML, nor will inline images or hypertext anchors.

 

spaciousOutput

default is false

possible values are true, false, yes, no

When true, this option outputs blank lines in your HTML source between each HTML element and between each CSS class definition, making for a more human-readable HTML file.

 

linkedStylesheet

default is no

possible values are true, false, yes, no

When this option is set to yes or true, the translator creates a separate file containing the CSS style definitions. The stylesheet has the same name as the output HTML file, except that the extension is “.css”. The HTML includes a LINK element in the HEAD area, which connects the CSS stylesheet to the HTML document.

 

generateHTMLtitle

default is yes

possible values are true, false, yes, no

When this option is set to yes or true, the translator derives the content of the HTML <TITLE> element from the Word document’s title (see DocumentTitle).

When this setting is false, you cannot specify a document name in the opening dialog. Also no <TITLE> element will appear in the HTML output. To get a <TITLE> element in the HTML when generateHTMLtitle is false, use the reserved style htmlHead.

The generateHTMLtitle setting works best in conjunction with the reserved style htmlHead. It was added for users who wish to specify the HTML title by placing it within the text of the Word document.

As an added convenience, if generateHTMLtitle is true and you delete all text from the Document Title textbox (in the main dialog), no title element will be generated.

 

wrapLines

default is no

possible values are true, false, yes, no

When this option is set to yes or true, the translator wraps lines in the HTML output that are longer than ~110 characters.

 

zeroHtmlElementMargins

short for: “zero HTML element margins”

default is yes

possible values are true, false, yes, no

When this option is set to yes or true, the translator outputs the following style definitions before outputting other class definitions:

   P {
      margin-left: 0;
      margin-right: 0;
      margin-top: 0;
      margin-bottom: 0; }
   
   H1 {
      margin-left: 0;
      margin-right: 0;
      margin-top: 0;
      margin-bottom: 0; }
   
   H2 {
      margin-left: 0;
      margin-right: 0;
      margin-top: 0;
      margin-bottom: 0; }
   
   /* ...and likewise for H3 to H6, PRE, and ADDRESS */
 

These definitions reset the browser’s default margins for the HTML paragraph element, for all six HTML heading elements, and for the PRE and ADDRESS elements. This option proved useful for creating HTML documents that look the same in the browser as the original Word document looked in Word.
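The reset rules follow a fixed pattern, so they are easy to generate. This Python sketch reproduces the layout of the excerpt above for the elements the option covers; the function name zero_margin_rules is hypothetical, and the code only illustrates the pattern rather than showing how W2CSS builds its output.

```python
# Elements whose browser-default margins are reset, per the description above.
RESET_ELEMENTS = ["P", "H1", "H2", "H3", "H4", "H5", "H6", "PRE", "ADDRESS"]


def zero_margin_rules(elements=RESET_ELEMENTS) -> str:
    """Emit one zero-margin rule per element, laid out like the excerpt above."""
    rules = [
        f"   {name} {{\n"
        "      margin-left: 0;\n"
        "      margin-right: 0;\n"
        "      margin-top: 0;\n"
        "      margin-bottom: 0; }"
        for name in elements
    ]
    # A line holding only indentation separates the rules, as in the excerpt.
    return "\n   \n".join(rules)


print(zero_margin_rules())
```

The same effect could be had with one grouped rule, P, H1, H2, H3, H4, H5, H6, PRE, ADDRESS { margin: 0 }; the expanded form simply mirrors the output shown above.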

 

zeroHtmlBodyMargins

short for: “zero HTML body margins”

default is no

possible values are true, false, yes, no

When this option is set to yes or true, the translator outputs the following style definition for the BODY element, which effectively zeroes all four margins:

   BODY { ...
      margin-left: 0;
      margin-right: 0;
      margin-top: 0;
      margin-bottom: 0; }

This resets the browser’s default margins for the HTML <BODY> element. Although this is sometimes useful, the setting is off by default because (in the browsers tested) it forces documents right up against the edge of the window, which isn’t generally desirable.

 

fontMeasures

default is asPercents

possible values are inPoints, absolute, asPercents

If fontMeasures is set to asPercents, font sizes in CSS style definitions are expressed as percentages of the font size of the HTML <BODY> element. Before W2CSS outputs other class definitions, a statement setting the font size of the BODY element is output (see defBodyFontSize). Font measures expressed as percentages are the most scalable of the three fontMeasures configuration choices.

If fontMeasures is set to absolute, font sizes in CSS style definitions are expressed according to the “absolute” system provided by CSS (see Sources for information on CSS for a complete explanation). Font measures expressed as “absolute” values are scalable but are not as flexible as sizes expressed as percentages.

If fontMeasures is set to inPoints, font sizes in CSS style definitions are expressed as points. Measures in points are not guaranteed scalable by the CSS specification (although Netscape Navigator 4 does scale them).

 

leftRightMargins

short for: “left and right margin measures”

default is asPercents

possible values are inPoints, inEms, asPercents

If leftRightMargins is set to asPercents, the left and right margins in the CSS styles output will be expressed as percentages of the width of the original MS Word document (the width of the page between the page margins). This corresponds closely to how browsers interpret this value. Margin values expressed as percentages are the most scalable of the three leftRightMargins configuration choices.

If leftRightMargins is set to inEms, the left and right margins in the CSS styles output will be expressed as em values based on the font size of the particular Word style. This setting works in the browsers tested, but not as well as the default, asPercents.

If leftRightMargins is set to inPoints, the left and right margins in the CSS styles output will be expressed as point values. Point values are not guaranteed scalable by the CSS specification (although Netscape Navigator 4 does scale them).
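A worked example makes the three choices concrete. The Python sketch below assumes a page 8.5 inches (612pt) wide with 1.25-inch (90pt) left and right margins, which leaves a 432pt text width, and a 0.5-inch (36pt) indent in a 12pt style. The function name and the rounding to one decimal place are assumptions for illustration, not documented W2CSS behavior.

```python
def left_right_margin(indent_pt: float, mode: str,
                      text_width_pt: float, font_size_pt: float) -> str:
    """Express a Word left or right indent in one of the three leftRightMargins modes."""
    if mode == "asPercents":
        # Percentage of the text width (page width minus the page margins).
        return f"{indent_pt / text_width_pt * 100:.1f}%"
    if mode == "inEms":
        # Ems relative to the font size of this particular style.
        return f"{indent_pt / font_size_pt:g}em"
    if mode == "inPoints":
        return f"{indent_pt:g}pt"
    raise ValueError(f"unknown leftRightMargins value: {mode}")


# Hypothetical page: 612pt wide, 90pt margins, so 612 - 90 - 90 = 432pt of text width.
for mode in ("asPercents", "inEms", "inPoints"):
    print(mode, left_right_margin(36, mode, 432, 12))  # 8.3%, 3em, 36pt
```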

 

otherMeasures

default is inEms

possible values are inPoints, inEms

If otherMeasures is set to inEms, all other measurements in the style (such as padding, space before and after, border thickness, etc.) are expressed as em values based on the font size of the particular Word style. This setting works relatively well in the browsers tested and is scalable.

If otherMeasures is set to inPoints, all other measurements in the style (such as padding, space before and after, border thickness, etc.) are expressed as point sizes. Point values are not guaranteed scalable by the CSS specification (although Netscape Navigator 4 does scale them).
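The em conversion is a division by the style's own font size. The sketch below applies it to a hypothetical 12pt style with 6pt space before, 3pt space after, and a 1pt border. Mapping Word's space before and after onto the CSS properties margin-top and margin-bottom, the two-decimal rounding, and the function name other_measures are all assumptions, not confirmed details of W2CSS.

```python
def other_measures(props_pt: dict, font_size_pt: float, mode: str = "inEms") -> dict:
    """Convert the point measurements of one Word style per the otherMeasures setting."""
    converted = {}
    for prop, value_pt in props_pt.items():
        if mode == "inEms":
            # Ems are relative to this style's own font size; 2 decimal places (assumed).
            converted[prop] = f"{round(value_pt / font_size_pt, 2):g}em"
        elif mode == "inPoints":
            converted[prop] = f"{value_pt:g}pt"
        else:
            raise ValueError(f"unknown otherMeasures value: {mode}")
    return converted


# Hypothetical 12pt style: 6pt space before, 3pt space after, 1pt border.
style = {"margin-top": 6, "margin-bottom": 3, "border-width": 1}
print(other_measures(style, 12))              # 0.5em, 0.25em, 0.08em
print(other_measures(style, 12, "inPoints"))  # 6pt, 3pt, 1pt
```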

 

defBodyFontSize

default is 12

possible values are integers > 0 and < 1000

This is the default point size for the HTML <BODY> element. Because this number is used in all font percentage calculations, it matters whenever fontMeasures is set to asPercents (see fontMeasures). Be careful when changing it, as it can produce unexpected results.
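The percentage arithmetic can be sketched as follows. The BODY statement is written here with a pt unit and the percentages are rounded to one decimal place; both details, and the function names, are assumptions made for illustration.

```python
def body_font_rule(body_pt: float = 12) -> str:
    """The BODY statement output ahead of the class definitions (pt unit assumed)."""
    if not 0 < body_pt < 1000:
        raise ValueError("defBodyFontSize must be > 0 and < 1000")
    return f"BODY {{ font-size: {body_pt:g}pt; }}"


def font_size_percent(style_pt: float, body_pt: float = 12) -> str:
    """A Word style's point size expressed as a percentage of the BODY font size."""
    return f"{round(style_pt / body_pt * 100, 1):g}%"


print(body_font_rule())          # BODY { font-size: 12pt; }
print(font_size_percent(18))     # 150%
print(font_size_percent(10))     # 83.3%
print(font_size_percent(18, 1))  # 1800% (percentages scale inversely with defBodyFontSize)
```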

 

Font scaling

If you elect to use the CSS “absolute” scale (fontMeasures set to absolute), you can set the thresholds at which font sizes (expressed in points in the originating Word document) are sorted into the various “absolute” categories. The defaults are:

   xx-large-gt=34
   x-large-gt=24
   large-gt=18
   medium-gt=14
   small-gt=12
   x-small-gt=10

In the keywords, the “gt” means greater than. So, for instance, xx-large-gt sets the point size threshold for xx-large, meaning that point sizes greater than or equal to 34 points will be expressed as xx-large in the CSS style definition.
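The sorting works like a descending staircase: a size is checked against the largest threshold first and falls through to smaller categories. The Python sketch below follows the “greater than or equal to” rule stated above. The function names are hypothetical, and what happens below the x-small threshold is not documented here, so returning xx-small is an assumption.

```python
# Default thresholds (in points), largest first, from the list above.
DEFAULT_THRESHOLDS = [
    ("xx-large", 34),
    ("x-large", 24),
    ("large", 18),
    ("medium", 14),
    ("small", 12),
    ("x-small", 10),
]


def parse_thresholds(lines):
    """Read lines such as 'xx-large-gt=34' into (keyword, points) pairs, largest first."""
    pairs = []
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep and key.endswith("-gt"):
            pairs.append((key[: -len("-gt")], float(value)))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def absolute_size(points: float, thresholds=DEFAULT_THRESHOLDS) -> str:
    """Sort a Word point size into a CSS absolute-size keyword."""
    for keyword, minimum in thresholds:
        if points >= minimum:  # "greater than or equal to", as the text describes
            return keyword
    return "xx-small"  # assumption: below the x-small threshold is undocumented


print(absolute_size(36), absolute_size(16), absolute_size(9))  # xx-large medium xx-small
```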

 
 

Color equivalences

This option allows the user to change the text that is output for various Word color constants. The defaults are:

   wdTurquoise=aqua
   wdPink=fuchsia
   wdBlue=blue
   wdGray25=silver
   wdGray50=gray
   wdGreen=green
   wdBrightGreen=lime
   wdDarkRed=maroon
   wdDarkBlue=navy
   wdDarkYellow=olive
   wdViolet=purple
   wdRed=red
   wdTeal=teal
   wdYellow=yellow

Color equivalences example 1

If, for some reason, you want wdDarkRed to be output as the RGB hex number #802020, place the following line in the configuration file:

   wdDarkRed=#802020

Color equivalences example 2

The color equivalences option can also produce odd or invalid results, because the translator outputs whatever text you specify. For instance, if you place the line

   wdDarkRed=dog

in the configuration file, then wherever the color wdDarkRed is found (in a font color, for example), the translator will output the word “dog”.
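Both examples come down to a lookup table that configuration lines overlay. The Python sketch below assumes one wdName=value pair per line in the configuration file and uses only a subset of the defaults listed above; the function names are hypothetical. As example 2 shows, the value is used verbatim and nothing checks that it is a valid CSS color.

```python
# A few of the default equivalences listed above.
DEFAULT_COLORS = {
    "wdBlue": "blue",
    "wdDarkRed": "maroon",
    "wdRed": "red",
    "wdYellow": "yellow",
}


def load_color_equivalences(config_lines, defaults=DEFAULT_COLORS) -> dict:
    """Overlay 'wdName=value' lines from a configuration file onto the defaults."""
    colors = dict(defaults)  # copy, so the defaults stay untouched
    for line in config_lines:
        key, sep, value = line.strip().partition("=")
        if sep and key.startswith("wd"):
            colors[key] = value.strip()  # used verbatim, with no validation
    return colors


def color_declaration(word_color: str, colors: dict) -> str:
    return f"color: {colors[word_color]};"


print(color_declaration("wdDarkRed", load_color_equivalences(["wdDarkRed=#802020"])))
print(color_declaration("wdDarkRed", load_color_equivalences(["wdDarkRed=dog"])))
```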

 

Support for CSS font substitution

CSS incorporates a method of specifying fonts so that if a font isn’t found on a client’s system, a suitable alternate can be specified. W2CSS allows you to specify a series of fonts that will be carried into the CSS font-family property as follows:

On startup, W2CSS looks in the folder specified by the User Templates file location (found under Tools, Options, File Locations, “User Templates”). In this folder, W2CSS looks for a file named W2CSSfnt.csv. This plain text file contains comma-delimited lists of font names. (A sample W2CSSfnt.csv file is included among the files installed with the template.)

The file W2CSSfnt.csv is easy to create or edit in a program such as Excel. From Excel, save the file as what Excel calls a CSV (comma delimited) file; Excel appends the .csv suffix automatically. If you don’t use Excel, you can create and edit this font substitution file in any plain text editor.

In Excel, the file looks like this:

Figure KK

 

In a text editor, the file looks like this:

   Times New Roman,Times Roman,Times,serif,
   Arial,Helvetica,sans-serif,,
   Arial Black,Helvetica,sans-serif,,
   Arial Narrow,Helvetica,sans-serif,,
   Garamond,Times Roman,Times Roman,Times,serif
   Swiss921 BT,Arial Black,Helvetica,sans-serif,
   Comic Sans MS,Arial,Helvetica,fantasy,
   Shelly Volante BT,cursive,,,
   Courier New,Courier,monospace,,
   Avante Garde,Arial,Helvetica,sans-serif,
   Zapf Chancery,cursive,,,
   Trebuchet MS,Arial,Helvetica,sans-serif,
   Futura,Arial,Helvetica,sans-serif,
 

Each line of the file is a series of font names. From left to right, the names on a line specify successively which font families are adequate substitutes for the first font named on the line. So, for instance, on the first line shown above, Times New Roman, if not found by the client browser, can be replaced by Times Roman, which, if not found, can be replaced by Times, which if not found, can be replaced by whatever the browser interprets serif to be.

Trailing commas in this file are ignored.

This list of font substitutions is passed through into the CSS font-family property for each Word style that W2CSS translates.
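A minimal Python sketch of how one line of W2CSSfnt.csv could become a font-family value. The function names are hypothetical, and quoting multi-word family names (while leaving generic keywords such as serif unquoted) follows common CSS practice; W2CSS’s own output may differ. The sketch also drops repeated names, such as the doubled Times Roman on the Garamond line, which is harmless because a browser would try the same font twice.

```python
import csv
import io

GENERIC_FAMILIES = {"serif", "sans-serif", "cursive", "fantasy", "monospace"}


def load_font_substitutions(csv_text: str) -> dict:
    """Map the first font on each line to its whole ordered list of substitutes."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Trailing commas produce empty fields; drop them.
        names = [field.strip() for field in row if field.strip()]
        names = list(dict.fromkeys(names))  # drop repeats, keeping first occurrence
        if names:
            table[names[0]] = names
    return table


def font_family(word_font: str, table: dict) -> str:
    """Build a font-family declaration; fonts missing from the table pass through alone."""
    names = table.get(word_font, [word_font])
    # Quote multi-word family names; generic keywords must never be quoted.
    quoted = [n if n in GENERIC_FAMILIES or " " not in n else f'"{n}"' for n in names]
    return "font-family: " + ", ".join(quoted) + ";"


sample = "Times New Roman,Times Roman,Times,serif,\nZapf Chancery,cursive,,,\n"
table = load_font_substitutions(sample)
print(font_family("Times New Roman", table))
print(font_family("Zapf Chancery", table))
```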

 
 

Internationalization Support

Version 2 of W2CSS has been re-written to better support non-English versions of MS Word. However, this support is far from complete.

As much as possible, decisions about stylenames are based on internal VBA constants instead of hard-coding stylenames. So, for instance, instead of comparing a stylename to the English language string “Heading 1”, the comparison in VBA code is made to the constant wdStyleHeading1:

 ActiveDocument.Styles(wdStyleHeading1).NameLocal 
 

This ensures that, for example, a user of the Polish-language version of Word (for whom the name of style wdStyleHeading1 is not the English string “Heading 1”) gets the same results as users of the English-language version.

However, certain Word styles that are important to the HTML translation do not have VBA constant equivalents. For example, users of the English-language version of Word will find (in Word’s HTML templates) a predefined style called “Preformatted”, which maps to the HTML <PRE> element. However, there is no constant called wdStylePreformatted, or anything even close, and there are many similar cases.

I have worked on a remedy for this situation, but it failed testing on other language versions of Word. This may be remedied in future versions of the translator.

 

Batch processing

This feature is only available to Registered Users.

Batch processing means that you can line up a “batch” of things to do. In other words, if you have a series of documents to translate, you can list them out, give the list to the translator, and walk away — as opposed to having to run one file, and then attend to process the next.

To perform batch processing, you submit a list of Word filenames to the translator. Batch lists can be created in a number of ways. These include:

 

If you use a batch file, it should be based on the W2CSS template. After opening the batch file, start the translator, select the “Translate A Batch Of Documents” tab, then click the “Translate” button. By default, the open batch file is processed, and its contents are treated as a batch list.

When using a Word file, you have the option to describe the batch list as a table or as a series of paragraphs.

Rows of a table that have a single quote mark as the first character of the first cell will cause the whole row to be regarded as a comment.

The example below lists four files to be batch processed. The first row is a comment. The second and fourth rows specify “n” for linkedStylesheet, indicating that styles will be included within the HTML file. The third row specifies “y” (yes) for linkedStylesheet and also gives a name for the linked stylesheet file. The last row also specifies yes for creation of a linked stylesheet but gives no names for the HTML or CSS files; these default to the name part of the original file, with different extensions.

Example of a batch list as a table

BatchList Example

 

When using a text file (or a Word document without a table), you can include comment lines and the same information as in a table. As a text file, the table above looks like this:

 

Example of a batch list as plain text

'filename, linkedStylesheet, html output, css output
Test0311.doc, n, test.shtml 
d:\styles\Test0315a.doc, y, test0315a.shtml, test-shtml.css
d:\styles\Test0415.doc, n 
d:\styles\Test0315d.doc, y 
 

If you don’t specify the linkedStylesheet option or the HTML or CSS output names, the translation defaults to the saved configuration settings.

Batch separator

The default batch separator is a comma (“,”). It separates the items on one line of a text batch list. However, if you wish to use a comma in a filename (which Windows 95 allows), you may need to specify a different batch separator character, such as a semicolon (“;”).
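The text form of a batch list is simple enough to parse in a few lines, which also shows why the separator must not appear inside filenames. The Python sketch below is not the translator’s own parser: BatchItem, parse_batch_list, and default_html_name are hypothetical names, treating a curly opening quote as a comment mark is an assumption (Word often converts straight quotes), and the .html default extension is assumed from the group example later in this document.

```python
from dataclasses import dataclass
from pathlib import PureWindowsPath
from typing import List, Optional

COMMENT_MARKS = ("'", "\u2018")  # straight or curly single quote (curly is assumed)


@dataclass
class BatchItem:
    source: str
    linked_stylesheet: Optional[bool] = None  # None: use the saved configuration
    html_output: Optional[str] = None
    css_output: Optional[str] = None


def parse_batch_list(text: str, separator: str = ",") -> List[BatchItem]:
    """Parse lines of: filename, linkedStylesheet, html output, css output."""
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(COMMENT_MARKS):
            continue  # skip blank lines and comment lines
        fields = [f.strip() for f in line.split(separator)]
        fields += [""] * (4 - len(fields))  # pad the optional trailing fields
        source, linked, html, css = fields[:4]
        items.append(BatchItem(
            source=source,
            linked_stylesheet=linked.lower().startswith("y") if linked else None,
            html_output=html or None,
            css_output=css or None,
        ))
    return items


def default_html_name(item: BatchItem, extension: str = ".html") -> str:
    """Name part of the original file with a new extension (extension assumed)."""
    return PureWindowsPath(item.source).stem + extension


batch = parse_batch_list("'comment line\nTest0311.doc, n, test.shtml\nd:\\styles\\Test0315d.doc, y\n")
for item in batch:
    print(item.source, item.linked_stylesheet, item.html_output or default_html_name(item))
```

With a semicolon separator, a line such as my,file.doc; y keeps the comma inside the filename.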

 

Dealing with a group of documents

In instances where you must maintain a group of documents, it’s useful to combine a number of the features of W2CSS including reserved stylenames, include directives and batch processing.

The example below demonstrates a situation where a set of three documents all link to the same CSS stylesheet. These files use special include directives at the top of each file, and all three files are processed via a batch list. The stylesheet is not shown; it can be written by hand or generated in a separate session with the translator.

 

Word Document testA.doc

Figure KK

 

Word Document testB.doc

Figure KK

 

Word Document testC.doc

Figure KK

 
 

Batch list in file batch.doc

'filename, linkedStylesheet, html output, css output
testA.doc, n
testB.doc, n 
testC.doc, n 
 
 
 

The included file “include-top.txt”

<pre style="display: none"><em>This page looks best in a style-aware browser.</em></pre>
<PRE class="Preformatted">[<A HREF="testA.html">TestA</A>] [<A HREF="testB.html">TestB</A>] [<A HREF="testC.html">TestC</A>]</PRE>
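When a group grows, typing the navigation line by hand invites copy-and-paste mistakes in the HREF values. A small helper can generate it from the same list of Word files used in the batch list. This Python sketch is a hypothetical convenience outside W2CSS; nav_bar is an invented name, and it assumes each HTML file takes its Word file’s name with an .html extension, as in this example.

```python
from html import escape
from pathlib import PurePath


def nav_bar(doc_files, html_ext: str = ".html") -> str:
    """Build the Preformatted navigation line for an include file."""
    links = []
    for doc in doc_files:
        stem = PurePath(doc).stem            # testA.doc -> testA
        label = stem[:1].upper() + stem[1:]  # testA -> TestA
        links.append(f'[<A HREF="{escape(stem + html_ext)}">{escape(label)}</A>]')
    return '<PRE class="Preformatted">' + " ".join(links) + "</PRE>"


print(nav_bar(["testA.doc", "testB.doc", "testC.doc"]))
```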
 

View of testA.html in CSS-aware browser

Figure KK

 

View of testB.html in CSS-aware browser

Figure KK

 
 

View of testC.html in CSS-aware browser

Figure KK

 

Known problems and Anomalies

 
 

This program is SHAREWARE

This program is shareware. You are free to use it in unregistered form. However, you are urged to register. I am a small-time programmer and have produced this program as a service and in relation to other projects I am involved with. I believe that software should be low cost or free and also that ability to pay should be built into the price structure of software. I hope that you, the user, whoever you are, find this a useful program; and I urge you to show your appreciation and support by registering.

Registering gives you a key that unlocks all program features. To register, send $15 per user to the address below. Sites with three or more users can register for $12 per user license. Send only a check or money order. You will receive a key by mail or, if you specify, by email.

 

About this document

Among the files included with the template W2CSS.dot are the documentation files W2CSSdoc.doc, an MS Word file, and W2CSSdoc.html, an HTML file created by the W2CSS translator from the original Word document. This HTML has not been further doctored or edited; it is the actual output of the translator. The original Word file is included so that you can try generating the HTML yourself.

I cannot take responsibility for the deficiencies of particular browsers’ CSS implementations. I mention this because if you view this document in various CSS-aware browsers, you will get wildly different results.

 

Sources for information on CSS

 
 
 

W2CSS: WORD TO CSS-COMPLIANT HTML TRANSLATOR, Version 2
© Lewis Gartenberg 1998

 
This document created from MS Word using the W2CSS translator.
If you find bugs, odd anomalies, or would like to suggest improvements or changes, please contact me at:

  • Website: http://www.oocities.org/w2css/