W2CSS is an HTML translator that creates CSS compliant HTML from MS Word documents.
Version 2 now includes translation of Word tables, creation of character level styles, creation of hyperlinked Tables of Contents from Word's TOC fields, and handling of embedded and linked objects. The program is shareware: unregistered users get a limited feature set (that is, it is feature-limited rather than time-limited).
W2CSS is not a general purpose translator for any Word document. It's probably most useful to those translating major sections of text from Word into HTML who want clean, easy-to-read HTML tagged to simple CSS style definitions.
The table of contents feature makes it easy to create hypertext tables of contents, provided that the Word document uses styles appropriately.
Speaking generally, the translator attempts to create as exact an HTML replica of your Word document as possible (within limits). It translates style definitions and preserves style tagging. This can be of great benefit to those who wish to use CSS styles but have no decent editor in which to create and tag styles to HTML text. The HTML that is generated is meant to stand on its own, or it can be processed further through other HTML tools.
This is not a program for the naïve Word user. It expects a certain level of expertise with, and understanding of, the Word program. To get the most from the translator, you should be familiar with both paragraph and character level styles; familiarity with Word's Outline view and its implications, and some understanding of the general concepts of field codes, captions, tables of contents and tables of figures, is also helpful.
If you are a Word user who mostly uses direct formatting and doesn't take much advantage of styles in your documents, you may not see much benefit from this translator. If, on the other hand, you understand and regularly use styles in your Word documents, and wish to use CSS in your HTML documents, this translator may be of great benefit.
Many first-time and naïve users are unaware that, by default, every Word paragraph is tagged to some style definition. All paragraphs in the standard (Normal) Word document are tagged with the style Normal. Users need not be aware, as they work, that paragraphs are attached to underlying styles.
Direct formatting is the font and paragraph formatting that is applied on top of the underlying style. Word allows direct formatting on an as-you-go, willy-nilly basis. This can result in a confusing mix of global, style-based formatting together with local, direct formatting. Or, with self-discipline, it can produce consistent formatting; much is up to the individual user.
More experienced, style-aware Word users usually know when formatting results from a globally defined style as opposed to local, direct formatting.
This program requires a certain self-discipline when using Word, in that you must use styles somewhat strictly in documents that are to be translated. In short, you should refrain from direct formatting (see explanation of direct formatting) except for making words bold, italic or into hyperlinks (see explanation of character formatting). While you are not required to do this, you will see more accurate results from the translator if you do.
A series of tutorials is available on the Internet at www.g-foods.com/styles/ to help Word users learn about styles. These tutorials cover not only Word styles but also CSS styles.
This user manual was created as a Word document and translated into HTML using the W2CSS translator (see About This Document).
NOTE: The following example uses some features only available in the registered version.
In the following example, an original Word document (shown in Figure A) is translated into CSS compliant HTML. Figure B lists a portion of the HTML generated; Figure C, the CSS styles generated. Other figures illustrate how the resulting HTML is rendered in both a CSS aware browser (Figure D) and a non-CSS aware browser (Figure E).
Figure A: Word document for Example One
Figure A shows the document as it appears in Normal View in MS Word. The style name area is open so that you can clearly see the styles attached to each paragraph.
Notice that some paragraphs are tagged with a style named htmlCode. Using this reserved style name, you can include raw HTML code directly among the paragraphs of the Word document. On translation, these lines pass directly into the HTML output.
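The htmlCode passthrough can be pictured with a short sketch. W2CSS itself is a VBA macro; the Python below only illustrates the idea. It assumes an exact (case-sensitive) match on the style name and HTML-escaping of ordinary paragraph text, neither of which this manual specifies, and it emits every other paragraph as a plain <P>, ignoring the heading and list mappings described later.

```python
from html import escape


def render_paragraph(style_name: str, text: str) -> str:
    """Emit one Word paragraph as HTML (simplified illustration)."""
    if style_name == "htmlCode":
        # Reserved style: the paragraph text already is HTML, so pass it through untouched.
        return text
    # Any other style: escape markup characters and tag the <P> with a CSS class
    # derived from the style name (spaces become hyphens).
    css_class = style_name.replace(" ", "-")
    return f'<P class="{css_class}">{escape(text, quote=False)}</P>'
```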
Figure B is an excerpt of the HTML BODY created in the translation. Figure C is an excerpt of the CSS style definitions generated. In the BODY, each HTML element is connected to a CSS style definition via its class attribute. For instance, the first element in the body is a level one heading, <H1>, attached to CSS class .Heading-1 via the attribute class=Heading-1.
The translator also creates an additional style definition:
.Heading-1 STRONG {font-weight: normal;}
This contextual selector for STRONG is created because the style for .Heading-1 is formatted in Word as BOLD. In the same way that Word treats bold and italic as toggles, the translator creates a definition which you can later use to control how STRONG text within .Heading-1 elements will appear (see the topic Reversing EM etc).
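A minimal sketch of how such toggle rules could be generated, written in Python rather than the macro's VBA. The function name and its boolean inputs are hypothetical; the output format is copied from the rules in Figure C (.Heading-1 STRONG for a bold style, .ingred-para EM for an italic one).

```python
def toggle_selectors(css_class: str, bold: bool, italic: bool) -> list[str]:
    """Contextual rules that let STRONG/EM toggle back off inside a bold/italic class."""
    rules = []
    if bold:
        # The class is already bold, so STRONG inside it reverts to normal weight.
        rules.append(f"{css_class} STRONG {{font-weight: normal;}}")
    if italic:
        # The class is already italic, so EM inside it reverts to upright text.
        rules.append(f"{css_class} EM {{font-style: normal;}}")
    return rules
```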
A number of <SPAN> elements are apparent in the HTML listing (Figure B). Each of these corresponds to a portion of the Word document that's tagged with a Word character style. Although W2CSS doesn't fully translate the hierarchy of Word's styles, character styles are understood as lying atop paragraph styles, and so inherit the characteristics of their underlying paragraphs.
An instance of this is the text giving the cookie prices. Notice that this text is colored maroon. A close inspection of the HTML shows that the maroon color comes from the class .price-char, which is applied to the text by a <SPAN> element. The class itself expresses only the color change:
.price-char { color: maroon; }
If you compare the HTML generated by the W2CSS translator with that generated by Word's own HTML conversion (when you pick Save as HTML from the File menu), you will find quite a difference. For starters, notice the lack of <FONT> tags in the W2CSS output. Also, you'll find that in the Word-generated HTML there's no mention of headings (H1, H2 etc.). This is a critical loss of information about the structure of the document. In contrast, the W2CSS translator maps the Word styles Heading 1 and Heading 2 to <H1> and <H2>, respectively.
Most probably, Word's own HTML translation creates such messy HTML because it's meant to accommodate the lowest common denominator. That is, since many Word users don't use styles or, if they do, use direct formatting in addition to styles, Word's translation attempts to capture as much direct formatting as possible. As noted above, if you wish to use the W2CSS translator to advantage, you're urged to forgo all direct formatting except for bold and italic (see what this program requires of you).
Figure B: HTML code generated by the translator for the above Word document
<BODY>
<H1 class="Heading-1"><IMG SRC="G1.GIF" ALT="G! FOODS: "> PRODUCTS</H1>
<PRE class="Normal"> </PRE>
<H2 class="Heading-2">Biscotti</H2>
<P class="explanation">These cookies are made from whole organic* brown rice flour, and are sweetened with Sucanat&reg;.</P>
<hr>
<H3 class="Heading-3"><EM>Almond Biscotti</EM></H3>
<P class="ingred-para"><STRONG><SPAN CLASS="ingred-char">Ingredients: </SPAN></STRONG>Organic* Brown Rice Flour, Sucanat&reg;, Whole Eggs, Maple Syrup, Almonds, Spices</P>
<P class="price-para">One LB Bag: <SPAN class="price-char">$11.25 ea </SPAN>— 1/2 LB Bag: <SPAN class="price-char">$5.90 ea </SPAN></P>
<hr>
<H3 class="Heading-3"><EM>Chocolate-Chip Almond Biscotti</EM></H3>
<P class="ingred-para"><STRONG><SPAN CLASS="ingred-char">Ingredients: </SPAN></STRONG>Almond Biscotti (above) with the addition of Chocolate Chips</P>
<P class="price-para">One LB Bag: <SPAN class="price-char">$12.25 ea </SPAN>— 1/2 LB Bag: <SPAN class="price-char">$6.25 ea </SPAN></P>
<PRE class="Normal"> </PRE>
<HR SIZE=12 WIDTH="100%">
<H2 class="Heading-2">Butter Cookies</H2>
<P class="explanation">Made with ghee (clarified butter), a blend of brown and white rice flours and plain old white sugar, this cookie is low in lactose (ghee has most of the butter's milk solids removed). </P>
<P class="Body-Text">The odd backward P symbol <IMG SRC="sty-t1P.gif" ALT="P Mark"> known as the "paragraph mark", usually is simply <EM>tolerated </EM>by most Word users. One of these is deposited in your document every time you hit Enter (the Return key). Life would be simple if a paragraph mark was simply equivalent to what WordPerfect users used to call a "hard return". </P>
In this example, the CSS styles generated are fully scalable. That is, the font sizes are expressed as percentages of the parent font (the font of the HTML BODY element). Similarly, margins are expressed in EM units, which are also scalable.
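The arithmetic behind these scalable values can be sketched in Python. The 12-point base size is an assumption, though it is consistent with Figure C: if the Heading 1 text is 36-point, a 12-point base gives the 300% of .Heading-1, and 10-point text gives the 83.33% of .Normal. Spacing in EM units is divided by the element's own font size, so 12 points of space above 36-point text gives 0.33em. The function names are hypothetical.

```python
# Assumed font size of the BODY element, in points (not stated in the manual).
BASE_PT = 12.0


def _fmt(value: float) -> str:
    """Round to two decimals and drop trailing zeros: 1.0 -> '1', 83.333 -> '83.33'."""
    return f"{round(value, 2):g}"


def font_size_percent(size_pt: float, base_pt: float = BASE_PT) -> str:
    """Express a point size as a percentage of the BODY font."""
    return _fmt(size_pt / base_pt * 100) + "%"


def spacing_em(space_pt: float, font_pt: float) -> str:
    """Express paragraph spacing in em units; 1em equals the element's own font size."""
    return _fmt(space_pt / font_pt) + "em"
```

Because every value is relative, a reader who enlarges the browser's base font scales headings, body text and margins together.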
Another thing to notice is the entity translations. The register mark (®) is translated to the entity &reg;. The long dash, which doesn't have an entity name, is translated as a numeric character reference. (There may be a recent named definition, but it's not yet well supported.) The translator provides options to control translation of each of these entities (an improvement over version 1 of W2CSS).
Close observation will also show that blank paragraphs are rendered here as <PRE> elements holding one blank character. This feature is controllable via a translation option. Another way to do this is to encode empty paragraphs with &nbsp;, but this was found to yield bad results in the Lynx browser.
In a CSS aware browser, the HTML appears close in appearance to the original Word document (Figure D). In a non-CSS browser (Figure E), the text is still readable, with the structure of headings and sub-headings clear. (The <HR> elements were inserted to clarify the document's appearance in such browsers.)
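One way to implement entity translation with an option switch is sketched below in Python. The small table of named entities and the use_named flag are hypothetical stand-ins for the translator's actual options; characters without a name in the table fall back to a numeric reference built from their Unicode code point (&#8212; for the long dash), which is not necessarily the exact form W2CSS emits.

```python
# Hypothetical entity table: characters that get a named entity when use_named is on.
NAMED_ENTITIES = {"\u00ae": "reg", "\u00a9": "copy"}
# Markup characters must always be escaped, whatever the options say.
MARKUP = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def encode_entities(text: str, use_named: bool = True) -> str:
    out = []
    for ch in text:
        if ch in MARKUP:
            out.append(MARKUP[ch])
        elif use_named and ch in NAMED_ENTITIES:
            out.append(f"&{NAMED_ENTITIES[ch]};")
        elif ord(ch) > 127:
            # No name available (or names disabled): use a numeric character reference.
            out.append(f"&#{ord(ch)};")
        else:
            out.append(ch)
    return "".join(out)
```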
Figure C: Excerpt of CSS stylesheet generated from (above) Word document
.Heading-1 { font-family: Verdana, Arial, Helvetica, Sans-Serif; font-size: 300%; color: maroon; font-weight: bold; text-align: left; margin-top: 0.33em; margin-bottom: 0.08em; margin-right: 3.39%; margin-left: 0%; }
.Heading-1 STRONG {font-weight: normal;}
.Normal { font-family: Times New Roman, Arial, Helvetica, Sans-Serif; font-size: 83.33%; color: black; text-align: left; margin-right: 0%; margin-left: 0%; }
.Heading-2 { font-family: Verdana, Arial, Helvetica, Sans-Serif; font-size: 183.33%; color: black; font-weight: bold; text-align: left; margin-top: 0.55em; margin-bottom: 0.14em; margin-right: -0.18%; margin-left: 0%; }
.Heading-2 STRONG {font-weight: normal;}
.explanation { font-family: Georgia, Arial, Helvetica, Sans-Serif; font-size: 91.67%; color: black; text-align: left; margin-right: 0%; margin-left: 0%; }
.Heading-3 { font-family: Verdana, Arial, Helvetica, Sans-Serif; font-size: 100%; color: red; font-weight: bold; text-align: left; margin-top: 1em; margin-right: 0%; margin-left: 0%; }
.Heading-3 STRONG {font-weight: normal;}
.ingred-para { font-family: Georgia, Arial, Helvetica, Sans-Serif; font-size: 83.33%; color: black; font-style: italic; text-align: left; margin-right: 0%; margin-left: 0%; }
.ingred-para EM {font-style: normal;}
.ingred-char { font-family: Verdana, Arial, Helvetica, Sans-Serif; color: black; font-weight: bold; }
.price-para { font-family: Verdana, Arial, Helvetica, Sans-Serif; font-size: 83.33%; color: teal; font-weight: bold; text-align: left; margin-right: 0%; margin-left: 0%; }
.price-para STRONG {font-weight: normal;}
.price-char { color: maroon; }
.nutr-para { font-family: Verdana, Arial, Helvetica, Sans-Serif; font-size: 83.33%; color: black; text-align: left; margin-right: 0%; margin-left: 0%; }
Figure D: View of the HTML document in a CSS compliant browser (MSIE4)
Figure E: View of the HTML document in a non-CSS compliant browser (Netscape Navigator 3)
In this example, the same Word document presented in Example One, above, is modified with the addition of a Table. An automatic Table of Contents is generated as well.
Figure F: Word document for Example 2
Figure F shows the document as it appears in Normal View in MS Word. This document uses a table to create a strip of yellow down the left-hand edge. Also, a special TOC (Table of Contents) field code is visible.
The output HTML appears in a CSS-aware browser in Figure G, and in a graphical non-CSS aware browser in Figure H.
Viewing a document in a text-only browser is a good test of how structurally strong the HTML is. Figure I shows the text version of the document in the Lynx browser. Since the original Word document properly used Heading styles, the resulting HTML conveys this structure using HTML heading elements, and the document structure is clearly evident in the text-only Lynx browser.
The hypertext table of contents here is created from Word's TOC field. The Word TOC field code in this example (TOC \t Heading 2,1,Heading 3,2) specifies that Headings 2 and 3 will be used to create TOC levels 1 and 2, respectively. The translator creates the hypertext TOC as a bulleted list (an option setting allows for non-bulleted TOCs).
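The \t switch pairs a style name with a TOC level, so turning the field into a nested bulleted list takes two small steps, sketched here in Python. The parser accepts the quoted form Word normally shows (TOC \t "Heading 2,1,Heading 3,2") as well as the unquoted form above. The anchor names and the (style, text, anchor) input tuples are hypothetical, and this is not the macro's actual code.

```python
import re
from html import escape


def parse_toc_t_switch(field_code: str) -> dict[str, int]:
    """Map style name -> TOC level from the \\t switch of a TOC field code."""
    m = re.search(r'\\t\s+"?([^"\\]+)"?', field_code)
    if not m:
        return {}
    items = [s.strip() for s in m.group(1).split(",")]
    # Items alternate: style name, level, style name, level, ...
    return {items[i]: int(items[i + 1]) for i in range(0, len(items) - 1, 2)}


def build_toc(headings: list[tuple[str, str, str]], level_map: dict[str, int]) -> str:
    """headings: (style name, heading text, anchor name) in document order."""
    lines, depth = [], 0
    for style, text, anchor in headings:
        level = level_map.get(style)
        if level is None:
            continue  # style not listed in the \t switch
        while depth < level:  # open nested lists down to this level
            lines.append("<UL>")
            depth += 1
        while depth > level:  # close lists back up to this level
            lines.append("</UL>")
            depth -= 1
        lines.append(f'<LI><A HREF="#{anchor}">{escape(text)}</A>')
    lines.extend(["</UL>"] * depth)
    return "\n".join(lines)
```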
Figure G: View of the HTML document for Example 2 in a CSS aware browser (MSIE4)
Figure H: View of the Example 2 HTML document in a non-CSS aware browser (Netscape Navigator 3)
Figure I: View of the HTML document in the DOS LYNX browser.
NOTE: The W2CSS translator requires Word for Office 97 (also known as Version 8 of MS Word) to be installed on your computer. aCSShtmlTranslator, the translator program, is actually a Word macro written in Visual Basic for Word; older Word versions don't contain this language and so cannot run the translator. The macro resides in the template W2CSS.dot. To use the macro, you must first install the template on your computer. To install the template:
In order to use the translator, you must open it from within Word.
REGISTERED USERS can use the translator to convert any Word document.
UNREGISTERED USERS are limited to translating only the active Word document. For them, the following instructions (on attaching a document to the template, and on creating a document based on the template) are given.
One way to translate any existing document is to attach the template W2CSS.dot to that document. Here's how:
This sounds a lot harder in writing than it actually is in practice.
Once the template is attached to the active document (the topmost open document is called the active document), call the converter by choosing Tools (on the menu bar), then Macro, then Macros.... This brings up a list of all available macros. Choose aCSShtmlTranslator, click Run, and you should see the opening dialog.
Once you've installed the translator template, you can use it when you create a new document. When you choose New from the File menu, Word presents you with an array of templates, among which you should find W2CSS.dot. Pick this and your document is linked to the styles converter.
If you don't find W2CSS.dot listed as an available template, check under Tools, Options, File Locations and make sure that the location for User Templates points to where you installed the template.
Once the template is attached to the active document, call the converter by choosing Tools (on the menu bar), then Macro, then Macros.... This brings up a list of all available macros. Choose aCSShtmlTranslator, click Run, and you should see the opening dialog.
My intention with Version 2 of W2CSS is to enhance the feature set of version 1 to include
These goals are built on the initial intentions of the translator, as stated in the Version 1 documentation, which are
This program started out as a fun project in using Visual Basic and is still a work in progress. I urge users to get in contact with me if they have suggestions or bugs to report.
Hopefully, this low-cost translator will allow more people to publish books, reports, articles and the like as HTML documents. The ideal of free access to all people is a goal some cherish regarding the Internet.
Unfortunately, I must ask for a donation of money ($) in order to allow you full use of this program: the rent must be paid and the wolf kept from the door. If you are cash-strapped or, for instance, a non-profit institution helping low-income people, write to me about your needs and a donation can possibly be arranged. For those not wanting to pay for registration, the translator still offers a number of useful features. Also, it is not time-limited, as much shareware these days is.
Originally a simple language, HTML today has been stretched to the limits by the introduction of physical formatting elements. CSS aims to address and correct a number of these limits of HTML.
After over 6 months of intensive use of VB for Applications (VBA), I can tell you that it's not always a pretty story under the hood. Weird things happen. If you are an experienced Word and/or Win95 user (and put up with the daily system crashes and lock-ups), I'm sure you know what I'm talking about: the almost daily quirks you run into with these systems, the kind of stuff that prompts such book titles as Windows 95 Annoyances. In other words, there are bugs. Windows in general is now getting so complex that, as some commentators have remarked, it may well implode on itself if Microsoft adds yet more complexity.
Another major motivation in this project was the need to maintain equivalent print and HTML documents. This is actually a very thorny problem once you get into it. Hopefully, CSS when it comes to full fruition will resolve current problems in this area.
If you think MS Office 97 is buggy for the external user, you've not seen buggy. From the inside, all kinds of things don't behave as you'd expect. It's a machine with a bunch of loose wires still hanging about on the inside.
I am in no way linked to Microsoft Corporation or any of its subsidiaries or contractors; nor is this project related to Microsoft Corporation.
See the first part of this document for a review of features new to Version 2. If you find bugs, odd anomalies, or would like to suggest improvements or changes, please contact me at the address listed at the end of this file.
This Word document template file contains the translator macro aCSShtmlTranslator.
This plain text file is a comma delimited list of fonts used in the translation process (see Support for Font Substitution).
This plain text file contains configuration settings for the translator (see section on the configuration file).
This contains documentation for the translator in CSS compliant HTML. The documentation file, W2CSSdoc.zip, is a ZIP archive that contains the file W2CSSdoc.html in addition to 10 GIF files. To expand the archive file, place it where you want the W2CSS documentation to reside and expand it using WinZip or PKUNZIP; after that you will be able to browse the documentation in either a CSS or non-CSS browser.
This is the Word file from which the file W2CSSdoc.html was created. It is included as an example of how the translator actually works.
The opening dialog appears as shown below.
Figure J: The opening dialog to W2CSS, the translator program
NOTE: Registered Users will have access to all the features discussed here. Unregistered Users will only be able to translate the active document and will not be able to perform batch processing.
Word Document to Translate textbox:
The full pathname of the source Word document that is to be translated. If you don't know the source name, you can use the Browse button to find a file.
Active Document checkbox:
When checked, the translation defaults to the Word document that was active at the time the translator was called.
CSS Stylesheet textbox:
If you wish to create a linked stylesheet, this textbox should contain the full pathname locating where the stylesheet will be created. By default, it will be in the same folder as the HTML output file. The browse button next to this box allows you to shop for a location.
Linked Stylesheet checkbox:
If unchecked then no separate stylesheet will be created.
HTML Output Name and Location textbox:
The full pathname of the destination HTML document that is to be created. By default, this is located in the same folder as the source Word document. The browse button next to this box allows you to shop for a location.
HTML Title textbox:
This is the title of the HTML document to be created. If you've not given your Word document a title (under File menu, Properties, Title), you may need to fill in a name here.
The HTML document title is culled from the Word document title, and can be changed in this dialog. (see section document title)
Figure K: The second tab of the opening dialog, which allows for batch processing.
The second tab of the opening dialog allows you to process batches of documents. A batch can be specified in two different ways: 1) as a list of document names stored in a file, or 2) as a list of document names collected together in the dialog.
Translate the documents listed in this File textbox:
The filename specified in this box is expected to contain a list of Word document names. The translator will open this file and process each name listed. See ___ for details of the conventions for this batch file.
Translate the documents listed below textbox:
In this part of the dialog, you can add Word document names to a list. The browse button allows you to shop for files. Or you can type in a name and just click the Add button.
If you choose the Translate button when this tab is showing, the program will act either on the list you've typed in or on the file that contains a list of filenames. Which of the two is processed depends on the option button selected.
NOTE: Registered Users will have access to all the features discussed here. Unregistered Users can only set translation options by editing the file w2css.cfg.
The options dialog allows you to configure how the translator will convert the Word document into HTML.
The various options are grouped into sub-categories, as shown below. For a complete discussion of each translation option, see the further sections of this document.
Figure L: The General Tab of the Options Dialog
Figure M: The Characters Tab of the Options Dialog
Figure N: The Measures and Fonts Tab of the Options Dialog
Figure O: The Tables Tab of the Options Dialog
Figure P: The TOC and Captions Tab of the Options Dialog
The registration dialog appears as shown in Figure Q.
This dialog gives information about registering and allows the user to input registration information. Once registered, the initial appearance of the About box goes away and all references to registration disappear.
You are kindly urged to register. Sending $15 by check or money order (no cash or credit cards) with an SASE (self-addressed stamped envelope) will get you a code that removes the nag dialog and other references to registering the program. You will also have access to the full features of the translator.
Once you register, you won't have to pay for upgraded versions of this program. In other words, your key will work on versions beyond Version 2.
Figure Q: The registration dialog
The W2CSS program is not necessarily fast. This is primarily because it runs as interpreted Visual Basic (VB) code within the Office 97 environment. VB also lacks many of the facilities that a language such as C provides for condensing statements and increasing CPU efficiency. It is hoped that future versions will be better optimized for speed. The program was tested on both 120 MHz and 200 MHz Pentium machines and, although slower on the first, was considered acceptable on both.
The translator program (referred to as W2CSS throughout this text) endeavors to preserve as much of a Word document's style information as possible for two intended audiences: users with CSS compliant browsers and those without. To accomplish this, the translator maps Word objects into HTML elements, which isn't, on the surface, a difficult problem. The specifics of which Word elements are mapped into particular HTML elements are discussed below.
In addition to mapping Word objects into HTML, the translator creates CSS class definitions that are equivalent to Word style definitions. The translator also tags HTML elements with CSS classes. This means that all HTML elements output by the translator will be tagged with some class (the exception is LI elements, which get their formatting from the enclosing UL or OL element).
Special reserved style names are used for controlling the behavior of the translator. They're also used to add HTML and/or to affect the output CSS style definitions. These features are provided for users who are not faint of heart and wish to try special effects.
Word has 9 built-in styles named Heading 1, Heading 2, etc., up through Heading 9. The first 6 of these correspond almost exactly to HTML's H1 through H6, and the translator treats them as such. Word's level 7, 8, and 9 headings are all folded into H6.
Word Style | HTML Element | Attached CSS Class |
---|---|---|
Heading 1 | H1 | .Heading-1 |
Heading 2 | H2 | .Heading-2 |
Heading 3 | H3 | .Heading-3 |
Heading 4 | H4 | .Heading-4 |
Heading 5 | H5 | .Heading-5 |
Heading 6 | H6 | .Heading-6 |
Heading 7 | H6 | .Heading-7 |
Heading 8 | H6 | .Heading-8 |
Heading 9 | H6 | .Heading-9 |
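The folding rule in the table reduces to a clamp on the level. This Python sketch (function name hypothetical) keeps the original Word level in the class name while capping the element at H6.

```python
from html import escape


def render_heading(level: int, text: str) -> str:
    """Render a Word built-in heading (level 1-9) as an HTML heading element."""
    if not 1 <= level <= 9:
        raise ValueError("Word heading levels run from 1 to 9")
    tag = f"H{min(level, 6)}"       # Heading 7, 8 and 9 all fold into H6
    css_class = f"Heading-{level}"  # the class keeps the original Word level
    return f'<{tag} class="{css_class}">{escape(text, quote=False)}</{tag}>'
```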
Word styles named H1, H2, H3, H4, H5, and H6 will be translated into corresponding HTML elements as follows:
Word Style | HTML Element | Attached CSS Class |
---|---|---|
H1 | H1 | .H1 |
H2 | H2 | .H2 |
H3 | H3 | .H3 |
H4 | H4 | .H4 |
H5 | H5 | .H5 |
H6 | H6 | .H6 |
The HTML PRE element allows text to appear on a web page as typed in; meaning that carriage-returns and spaces will be recognized by the browser. Most browsers interpret PRE using a monospaced typeface such as Courier.
W2CSS will translate a paragraph into an HTML PRE element if the paragraph is tagged with a style named PRE or Preformatted, or with a style whose name begins with Preformatted followed by any further text the user wishes. For instance, a Word style named Preformatted ONE will become an HTML <PRE> element attached to the class Preformatted-ONE.
The class definition attached to PRE elements has the white-space property set to PRE.
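The naming rules above can be sketched in Python. Replacing spaces with hyphens reproduces the class names used throughout this manual (Preformatted-ONE, Heading-1, Heading-1-Mine); the case-insensitive test for PRE and Preformatted is an assumption, borrowed from the way heading names are matched, and the function names are hypothetical.

```python
def css_class_name(style_name: str) -> str:
    """Word style name -> CSS class name: runs of spaces become single hyphens."""
    return "-".join(style_name.split())


def is_pre_style(style_name: str) -> bool:
    """True for PRE, Preformatted, or Preformatted followed by any further text."""
    name = style_name.strip().upper()
    return name == "PRE" or name.startswith("PREFORMATTED")


def pre_class_rule(style_name: str) -> str:
    """The class attached to a PRE element keeps white space as typed."""
    return f".{css_class_name(style_name)} {{ white-space: pre; }}"
```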
In tagging PRE to a series of lines, best results are obtained by using Word's manual line break (Shift + Enter) instead of hitting Enter at the end of each line. (In Word, the Enter key creates a new paragraph mark, which will be translated into a new <P> element in the output HTML.)
Tabs are generally not recognized in HTML except when they occur within a PRE element (although no exact tab stops can be specified).
If an element is a PRE type, then tabs will be passed through into the HTML output. If the style is not PRE, then tabs will be interpreted depending on the setting nonPreTabs.
If nonPreTabs is set to asSpaces, then tabs will become spaces; otherwise, they'll be passed through into the HTML.
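A sketch of the nonPreTabs decision in Python. Only the asSpaces value comes from this manual; the other value used below (passThrough) is hypothetical, as is converting each tab to a single space, since the number of spaces is not stated.

```python
def translate_tabs(text: str, is_pre: bool, non_pre_tabs: str) -> str:
    """Apply the nonPreTabs setting to the text of one paragraph."""
    if is_pre:
        return text  # inside PRE, tabs always pass through unchanged
    if non_pre_tabs == "asSpaces":
        return text.replace("\t", " ")  # assumption: one space per tab
    return text  # any other setting: tabs pass through into the HTML
```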
In keeping with the desire to create semantically sensible HTML, any Word paragraph tagged with a style named ADDRESS will be translated into an HTML <ADDRESS> element. The address element usually appears at the end of an HTML document and can assist Web robots in making better sense of a Web page.
Any Word paragraph tagged with a style named Blockquote will be translated into an HTML <BLOCKQUOTE> element.
HTML provides 2 basic list styles: ordered lists (tag OL) and unordered lists (tag UL). These are the most popular; there are some other tags that act like lists. Word provides 2 classes of lists that correspond, respectively, to OL and UL, and these are Numbered Lists and Bulleted Lists.
The translator recognizes lists by reading the name of the style tagged to a Word paragraph.
Word provides a number of built-in list styles. These include the following style names:
Simple list | Numbered list | Bulleted list |
---|---|---|
List | List Number | List Bullet |
List 2 | List Number 2 | List Bullet 2 |
List 3 | List Number 3 | List Bullet 3 |
List 4 | List Number 4 | List Bullet 4 |
List 5 | List Number 5 | List Bullet 5 |
W2CSS translates the built-in styles depending on the words used in the style name. Those styles named List Number become HTML ordered lists (OL). Those styles named List Bullet become HTML unordered lists (UL). Styles List, List 2, List 3, etc. become HTML unordered lists (UL). The CSS list type (such as decimal, lower-roman, upper-roman, lower-alpha, upper-alpha) is determined from reading the Word style definition.
Because the browsers tested compound the indents of nested lists (technically, a nested list is an element that's within another, parent element), the translator suppresses the margin setting for the second and deeper nested list levels.
Word permits ordered lists to continue numbering, even when interrupted by non-list type paragraphs. This is not an option in HTML.
The HTML Paragraph element (<P>) is used for all document elements that don't qualify as any of the other HTML elements previously discussed (Headings, Preformatted text, Address, Blockquote, or Lists).
The way that empty paragraphs (Word paragraphs that contain no text) are translated is controlled by the option nonBreakingSpaces. The default case is for empty paragraphs to be expressed as PRE elements, since this works best with all browsers tested, including Lynx.
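The two encodings for an empty paragraph, sketched in Python. The default reproduces the <PRE class="Normal"> </PRE> seen in Figure B; treating nonBreakingSpaces as a simple on/off flag that switches to &nbsp; inside a <P> is an assumption, and the function name is hypothetical.

```python
def render_empty_paragraph(css_class: str, non_breaking_spaces: bool = False) -> str:
    """Encode a Word paragraph that contains no text."""
    if non_breaking_spaces:
        # Alternative encoding; noted above as giving bad results in Lynx.
        return f'<P class="{css_class}">&nbsp;</P>'
    # Default: a PRE element holding a single blank character.
    return f'<PRE class="{css_class}"> </PRE>'
```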
Word users are free to create their own styles and name them as they wish. However, Word paragraphs will map into HTML elements depending on the name of the style attached to a paragraph. This should be clear with reference to the built-in styles, discussed above.
However, the translator further allows users to control how a paragraph maps into HTML by allowing for style names that contain portions of the built in names. For instance, a style named Heading 1 Mine will map into an <H1> element linked to class Heading-1-Mine. The general principle is most succinctly explained by reference to the VB code that recognizes the HTML element type:
For i = 1 To 9
    aName = UCase(.Styles(wdHeadingStyles(i)).NameLocal)
    If Left(UCase(aRangeStyle), Len(aName)) = aName Then
        setAsHeading i, Trim(str(i))
        setAsWordHeading
        Exit Function
    End If
Next i
This basically says that the stylename attached to a Word paragraph will be searched to see if it contains a built-in stylename. The search requires a match from the left end (start of the string) and is case insensitive.
Version 2 now looks up built-in style names through VBA's constants for Word's styles rather than by their English names. Because the local names of Word's styles change from language to language, this method will hopefully facilitate use of W2CSS in languages other than English.
As mentioned above, Version 2 of the translator now checks style names based on VBA's constants for Word's styles. The local style names change from language to language, so this method should facilitate use of W2CSS in languages other than English.
REGISTERED USERS will be able to translate Word tables into HTML tables. Some restrictions apply, as discussed below. (For NON-REGISTERED USERS, the translator ignores tables and translates the contents of cells as paragraphs.)
Figure R shows examples of these.
W2CSS can handle the first two kinds of tables well. The third it cannot, because Visual Basic reports tables with vertically merged cells inaccurately. (Programmer's note: MAXINT was reported for the cell heights of vertically merged cells. Emails to Microsoft and user groups yielded no help.)
See Translation Options for Tables for tabular presentation of table specific options.
These tables are the most common and simplest type of table. They involve no merging of cells across columns or rows (in HTML parlance, no spanning of cells). Each column contains the same number of cells as any other column, and each row the same number of cells as the other rows.
These tables can have variable numbers of cells in each row, resulting in column spanning.
These tables, as mentioned, are not properly handled by the translator.
Figure R: Various Types of Tables
Processing tables takes more time, because more VBA calls are made and each call slows down the translation.
The translator creates a special <DIV> element for tables that are indented, and special <CENTER> tags for centered tables, in conformance with standard HTML practice.
A table can be positioned at the left, center, or right of the browser window; by default it is placed at the left. Word lets you specify left, center, or right alignment (under Tables, Cell Heights & Widths, Row, Alignment). The translator first checks whether the table is centered or right-aligned; if so, it outputs the corresponding align attribute. If the table is left-aligned with zero indent, no align attribute is output. If it is left-aligned but indented, a DIV element surrounds the table, with a style attribute that indents the table from the left edge of the window.
For centered tables, the <CENTER> element is placed around the <TABLE> tags; this provides backward compatibility with Netscape's way of handling centered tables (tests verified this).
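The alignment rules can be summarized in a minimal Python sketch. The function `table_wrappers` is hypothetical, and the use of points for the DIV's margin-left is an assumption, since the manual does not say which unit the indent uses.

```python
from typing import Tuple


def table_wrappers(alignment: str, left_indent_pt: float) -> Tuple[str, str, str]:
    """Return (opening wrapper, TABLE start tag, closing wrapper) for a Word table.

    alignment is "left", "center" or "right". The margin-left unit (pt) is an
    assumption made for this sketch.
    """
    if alignment == "center":
        # Centered: ALIGN attribute plus a CENTER element for older Netscape.
        return "<CENTER>", '<TABLE ALIGN="center">', "</CENTER>"
    if alignment == "right":
        return "", '<TABLE ALIGN="right">', ""
    if left_indent_pt > 0:
        # Left-aligned but indented: a DIV shifts the table from the left edge.
        return f'<DIV STYLE="margin-left: {left_indent_pt:g}pt">', "<TABLE>", "</DIV>"
    # Left-aligned with no indent is the browser default: no attribute at all.
    return "", "<TABLE>", ""
```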
The outside border of the HTML table is derived from the Word properties Borders.OutsideLineStyle and Borders.OutsideLineWidth: if the line style is not none, the border width is taken from OutsideLineWidth.
The inside borders are derived from Borders.InsideLineStyle and Borders.InsideLineWidth. However, an HTML table cannot have inside borders without an outside border, so if the Word table has no outside border, the HTML table has no inside borders either.
Each cell is treated as consisting of one or more paragraphs. The last paragraph of a cell is always closed with a <BR> element, which gives better results both in current (version 4) browsers and in Lynx.
Widths can be expressed either as a percentage or as a number of pixels.
Cells in rows marked in Word as heading rows (by selecting the rows and choosing Headings from the Table menu) are output as TH elements instead of TD elements. Many browsers now recognize these elements.
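A Python sketch of the cell rules above follows. The helper `render_row` is hypothetical; it assumes each cell's paragraphs have already been converted to HTML strings and are simply concatenated, because the manual does not say how paragraphs inside a cell are separated.

```python
from typing import List


def render_row(cells: List[List[str]], heading_row: bool) -> str:
    """Render one table row; cells holds each cell's paragraphs as HTML strings."""
    tag = "TH" if heading_row else "TD"  # heading rows use TH instead of TD
    parts = ["<TR>"]
    for paragraphs in cells:
        # Paragraphs are concatenated as-is; the last one is closed with <BR>.
        parts.append(f"<{tag}>{''.join(paragraphs)}<BR></{tag}>")
    parts.append("</TR>")
    return "".join(parts)
```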
Row height is output as a HEIGHT attribute in pixels (this is where the points-per-pixel configuration setting comes into play). Because this attribute is non-standard and isn't supported in all browsers, you can turn it on or off for the translation (see translation option tableRowHgtAttribute).
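The pixel conversion amounts to dividing the row height in points by the points-per-pixel setting. In this hypothetical Python sketch, `points_per_pixel` stands in for that configuration setting and `enabled` for tableRowHgtAttribute; rounding to the nearest pixel is an assumption. A value of 0.75 points per pixel corresponds to 96 pixels per inch (72 points per inch divided by 96).

```python
def row_height_attr(height_pt: float, points_per_pixel: float, enabled: bool = True) -> str:
    """Return a HEIGHT attribute (with a leading space) for a table row, or "".

    Rounding to the nearest whole pixel is an assumption of this sketch.
    """
    if not enabled:
        return ""  # option turned off: no non-standard attribute
    pixels = round(height_pt / points_per_pixel)
    return f' HEIGHT="{pixels}"'
```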
The translator creates table captions from the caption that occurs immediately before the table, specifically a paragraph in Word's built-in Caption style. This feature can be turned off with the translation option makeTableCaptions.
Version 2 of the translator is better at translating images. By default, the translator recognizes only linked GIF and JPEG graphics. To have linked and embedded objects translated as well, check Process Linked & Embedded Objects on the General tab of the Options dialog.
Word lets you place images in documents in a number of ways, including as inline images and as floating images (above the text layer). Images can also be embedded from other applications: for instance, you can embed an Adobe Illustrator or Visio image directly into your Word document without saving it as a separate document. You also have the option of linking graphics into your Word document. Both linking and embedding are part of Microsoft's OLE (Object Linking & Embedding) technology. Although W2CSS doesn't itself contain code that translates images directly, it calls upon Word's own built-in capabilities to do so.
Translating linked and embedded objects will slow down your translation markedly, because a lot of extra work is done behind the scenes. For those of you who are curious, the following describes what occurs when you choose to translate embedded objects:
Generally, OLE works fairly well. However, you may run into trouble; I have, although it's hard to put my finger on exactly what causes the crash or lockup. My observation is that OLE really slows down the whole computer... you are forewarned!
The translator recognizes images that meet the following criteria:
If the image IS NOT in the same directory as the Word file, the translator will add the pathname to the image tag.
If the image IS in the same directory as the Word file, the name will be added with no path. You are strongly advised to put all images in the same directory as the source Word file.
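A minimal Python sketch of the path rule, assuming Windows-style paths. `img_src` is a hypothetical name, and the case-insensitive directory comparison is an assumption based on Windows file-name rules.

```python
import ntpath


def img_src(image_path: str, word_doc_path: str) -> str:
    """Return the SRC value for an IMG tag.

    Windows path rules (ntpath) are used so the comparison is case-insensitive
    and behaves the same on any platform.
    """
    image_dir = ntpath.normcase(ntpath.dirname(image_path))
    doc_dir = ntpath.normcase(ntpath.dirname(word_doc_path))
    if image_dir == doc_dir:
        return ntpath.basename(image_path)  # same folder: bare file name
    return image_path  # different folder: keep the full path name
```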
It's generally agreed that good HTML includes alternate text (ALT) in IMG references. ALT text not only makes a site accessible to people with disabilities but also helps web surfers who have graphics turned off.
In W2CSS, ALT text for images can be specified via three different methods:
You can specify alt text by including a paragraph in style htmlcode immediately before the image. The following example should clarify:
Example:
The ALT text "P Mark" will be applied to the inline image shown highlighted with handles (see below). The resulting HTML is shown below.
```html
<H3 class="Heading-3">Paragraphs in Word</H3>
<P class="Body-Text">The odd backward P symbol <IMG SRC="sty-t1P.gif" ALT="P Mark"> known as the "paragraph mark", usually is simply <EM>tolerated </EM>by most Word users. ...</P>
```
You can specify ALT text by creating a private field in the Word document. Follow the pattern shown in the example below.
This method follows Word's own convention: if you open an existing HTML document and save it as a Word document, you'll see that ALT text has been turned into private fields.
Example:
{PRIVATE "TYPE=PICT; ALT=P Mark"}
To create a private field, open the Insert menu, choose Field, and then, within the All category, pick PRIVATE. Before clicking OK, be sure to type, with quotes, the text "TYPE=PICT; ALT=alt text", where alt text is the alternate text for the image in question. This field must appear in the Word document before the inline image itself.
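To check the text of such a field, or to generate it for many images, a small Python sketch may help. Both helpers are hypothetical; they only follow the `PRIVATE "TYPE=PICT; ALT=..."` pattern shown in the example and are not part of W2CSS.

```python
import re
from typing import Optional

# Matches the pattern shown above: {PRIVATE "TYPE=PICT; ALT=alt text"}
_PRIVATE_ALT = re.compile(r'PRIVATE\s+"TYPE=PICT;\s*ALT=([^"]*)"', re.IGNORECASE)


def alt_from_private_field(field_code: str) -> Optional[str]:
    """Return the ALT text in a PRIVATE picture field, or None if absent."""
    match = _PRIVATE_ALT.search(field_code)
    return match.group(1).strip() if match else None


def private_field_code(alt_text: str) -> str:
    """Build field text in the form shown in the example."""
    return f'PRIVATE "TYPE=PICT; ALT={alt_text}"'
```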
You can specify ALT text by creating a Word caption (tagged with the style name Caption) and placing it somewhere ahead of the image whose ALT text it supplies. The various caption settings control how captions are used. If you use captions in your Word document solely to supply image ALT text, you can format the Caption style as hidden, and it shouldn't affect your document's output.
The instructions below describe inserting an image using standard Word menu commands. The translator also provides an extra macro that, when called, guarantees that only GIFs and JPEGs are inserted, and that they are inserted inline (see below).
To make inserting images easier, the translator includes a special inline image insertion command, available via the macro insertInlinePic that comes with the template W2CSS.dot. When you call insertInlinePic, you get Word's Insert File dialog. The settings "Link to File", "Float over text", and "Save with file" may appear on or off; unfortunately, this is misleading. At this writing, I've had a hard time controlling these settings from VB, and it seems to be a problem with VB itself. Nonetheless, whatever image you pick will be inserted linked to the file, not floating over the text, and not saved in the file. This is guaranteed (even though the dialog makes it seem otherwise) because, after you make a selection in the dialog, the information is routed through VB code that controls how the image is inserted into the Word document. In addition, using this command prevents insertion of any image whose extension is not GIF, JPG, or JPEG.
Word provides what are called character level styles. These are styles tagged to strings of characters as opposed to being attached to paragraphs.
CHARACTER STYLE TRANSLATION IS AVAILABLE ONLY TO REGISTERED USERS: The translator recognizes character level Word styles and creates equivalent CSS classes for them. Using the HTML <SPAN> element, these classes are attached to the corresponding strings in the HTML. You can control whether character styles are translated via the Options dialog.
If you are using a non-registered version, the translator disregards all of Word's character level styles.
The percentage font size of a character style is relative to the size of its parent; that is, the % value is a percentage of the parent's font size. For character styles, the font size can be expressed only as a percentage or in points. The absolute scale doesn't make much sense here, since the translator bases the size on the size of the parent.
Character styles are applied only on a word-by-word basis (see below for more discussion).
Bold text, italic text, and hyperlinks can be considered special cases of character-level formatting (see the discussion below). These types of character-level formatting are recognized whether or not you are using a REGISTERED copy of the translator.
The translator works through paragraphs on a word-by-word basis, not a character-by-character basis. This limitation means that bold, italic, hyperlinks, and character styles will only be recognized on word boundaries. The main reason for this limitation is speed: processing the Word document character by character would be intolerably slow, given the already sluggish performance of VB.
(As to what exactly counts as a word, the translator defers to whatever VBA's internal structures define a word to be.)
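The effect of word-level processing on character styles can be sketched in Python: consecutive words carrying the same style collapse into one SPAN. The function `spans_from_words` is hypothetical, and it assumes each word keeps its trailing space, as Word's word ranges generally do.

```python
from itertools import groupby
from typing import List, Optional, Tuple


def spans_from_words(words: List[Tuple[str, Optional[str]]]) -> str:
    """Group consecutive words that share a character style into one SPAN.

    words holds (text, css_class) pairs in document order; css_class is None
    for words with no character style. Each text is assumed to carry its own
    trailing space, so the pieces are joined without adding separators.
    """
    html = []
    for css_class, run in groupby(words, key=lambda word: word[1]):
        text = "".join(word_text for word_text, _ in run)
        if css_class is None:
            html.append(text)
        else:
            html.append(f'<SPAN class="{css_class}">{text}</SPAN>')
    return "".join(html)
```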
The translation option paragraphsOnly lets you control the way the translator works its way through your document. By default, W2CSS moves through each paragraph on a word-by-word basis. If you don't care about processing character information (character styles, EM, STRONG, hyperlinks, or inline images), you can set paragraphsOnly to true or yes. See the configuration file settings for the specifics of the setting paragraphsOnly.
In Word, the equivalents of HTML's EM (emphasis) and STRONG are, as usually interpreted, italic and bold respectively. W2CSS maps all bold strings into the HTML <STRONG> element and all italic strings into the HTML <EM> element.
In cases where a style definition specifies all italic text (that is, the class definition states font-style as italic) the translator creates an additional CSS class for EM that reverses the otherwise italic text. This is in keeping with the way Word treats italic as a toggle that allows for emphasized text in an already italic paragraph to be non-italicized.
W2CSS accommodates the same reversal for STRONG, or bold text, within a paragraph tagged with a style specifying all bold text (for which font-weight is bold).
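The bold/italic mapping and the reversal for already-italic or already-bold styles can be sketched as follows. The manual does not give the name of the extra class W2CSS generates, so this Python sketch substitutes a contextual selector (for example `.Quote EM`), which has the same visual effect; `wrap_word`, `reversal_rules`, and the class names Quote and Note are hypothetical.

```python
from typing import List


def wrap_word(text: str, bold: bool, italic: bool) -> str:
    """Map Word bold/italic formatting of a word to STRONG/EM elements."""
    if italic:
        text = f"<EM>{text}</EM>"
    if bold:
        text = f"<STRONG>{text}</STRONG>"
    return text


def reversal_rules(css_class: str, style_italic: bool, style_bold: bool) -> List[str]:
    """CSS that turns EM/STRONG back to normal inside all-italic/all-bold styles."""
    rules = []
    if style_italic:
        rules.append(f".{css_class} EM {{font-style: normal}}")
    if style_bold:
        rules.append(f".{css_class} STRONG {{font-weight: normal}}")
    return rules
```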
Hyperlinks are standard fare on Web pages but were only recently introduced in Word. Word hyperlinks translate into the HTML anchor (<A>) element. Word lets you create hyperlinks both to Web addresses (via a specified URL) and to bookmarks within a given document.
Translation option makeHyperlinkStyle controls whether a SPAN element will be created for hyperlinks.
To create a hyperlink:
To use an image as a hypertext link:
The content of the HTML <TITLE> element is taken from the Title property of the Word document. You can set this by choosing File on the menu bar, then Properties, then the Summary tab, where you will find Title. More advanced users can take advantage of the configuration setting generateHTMLtitle used in conjunction with the reserved stylename htmlHead.
This area of the translator is significantly improved over version 1.
Ian Graham's HTML Sourcebook is an excellent source for a concise discussion of this issue.
HTML documents are ASCII text file descriptions that, when interpreted, become fully formatted quasi desktop published pages. To communicate various special characters, a conventional set of entity references has been established. These entity references stand in for otherwise non-standard characters such as long dash, curly quotes, copyright, registration mark, etc.
The entity references supported by the translator are those beginning at ASCII 160 and ending with ASCII 255.
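ASCII Number | Character | Entity Ref. | Comment
---|---|---|---
160 | (non-breaking space) | `&nbsp;` | The way nbsp is treated depends on the setting of the ___ option
161 | ¡ | `&iexcl;` |
162 | ¢ | `&cent;` |
163 | £ | `&pound;` |
164 | ¤ | `&curren;` |
165 | ¥ | `&yen;` |
166 | ¦ | `&brvbar;` |
167 | § | `&sect;` |
168 | ¨ | `&uml;` |
169 | © | `&copy;` | See option ___ for more details
170 | ª | `&ordf;` |
171 | « | `&laquo;` |
172 | ¬ | `&not;` |
173 | (soft hyphen) | `&shy;` |
174 | ® | `&reg;` | See option ___ for more details
175 | ¯ | `&hibar;` |
176 | ° | `&deg;` |
177 | ± | `&plusmn;` | See option ___ for more details
178 | ² | `&sup2;` | See option ___ for more details
179 | ³ | `&sup3;` | See option ___ for more details
180 | ´ | `&acute;` |
181 | µ | `&micro;` |
182 | ¶ | `&para;` |
183 | · | `&middot;` |
184 | ¸ | `&cedil;` |
185 | ¹ | `&sup1;` |
186 | º | `&ordm;` |
187 | » | `&raquo;` |
188 | ¼ | `&frac14;` |
189 | ½ | `&frac12;` |
190 | ¾ | `&frac34;` |
191 | ¿ | `&iquest;` |
192 | À | `&Agrave;` |
193 | Á | `&Aacute;` |
194 | Â | `&Acirc;` |
195 | Ã | `&Atilde;` |
196 | Ä | `&Auml;` |
197 | Å | `&Aring;` |
198 | Æ | `&AElig;` |
199 | Ç | `&Ccedil;` |
200 | È | `&Egrave;` |
201 | É | `&Eacute;` |
202 | Ê | `&Ecirc;` |
203 | Ë | `&Euml;` |
204 | Ì | `&Igrave;` |
205 | Í | `&Iacute;` |
206 | Î | `&Icirc;` |
207 | Ï | `&Iuml;` |
208 | Ð | `&ETH;` |
209 | Ñ | `&Ntilde;` |
210 | Ò | `&Ograve;` |
211 | Ó | `&Oacute;` |
212 | Ô | `&Ocirc;` |
213 | Õ | `&Otilde;` |
214 | Ö | `&Ouml;` |
215 | × | `&times;` | See option ___ for more details
216 | Ø | `&Oslash;` |
217 | Ù | `&Ugrave;` |
218 | Ú | `&Uacute;` |
219 | Û | `&Ucirc;` |
220 | Ü | `&Uuml;` |
221 | Ý | `&Yacute;` |
222 | Þ | `&THORN;` |
223 | ß | `&szlig;` |
224 | à | `&agrave;` |
225 | á | `&aacute;` |
226 | â | `&acirc;` |
227 | ã | `&atilde;` |
228 | ä | `&auml;` |
229 | å | `&aring;` |
230 | æ | `&aelig;` |
231 | ç | `&ccedil;` |
232 | è | `&egrave;` |
233 | é | `&eacute;` |
234 | ê | `&ecirc;` |
235 | ë | `&euml;` |
236 | ì | `&igrave;` |
237 | í | `&iacute;` |
238 | î | `&icirc;` |
239 | ï | `&iuml;` |
240 | ð | `&eth;` |
241 | ñ | `&ntilde;` |
242 | ò | `&ograve;` |
243 | ó | `&oacute;` |
244 | ô | `&ocirc;` |
245 | õ | `&otilde;` |
246 | ö | `&ouml;` |
247 | ÷ | `&divide;` | See option ___ for more details
248 | ø | `&oslash;` |
249 | ù | `&ugrave;` |
250 | ú | `&uacute;` |
251 | û | `&ucirc;` |
252 | ü | `&uuml;` |
253 | ý | `&yacute;` |
254 | þ | `&thorn;` |
255 | ÿ | `&yuml;` |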
For ASCII codes between 128 and 159 (inclusive), the translator generates a numeric character reference. So, for instance, Word's long dash, which is ASCII character 151, will be output as `&#151;`. (HTML 4.0 defines the entity `&mdash;` for the long dash, but it may not yet be widely supported.)
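The encoding rules above, together with the always-applied escapes for `<`, `>`, and `&`, can be sketched in Python. The sketch works on Windows-1252 byte values, because codes 128 to 159 are Windows positions rather than Unicode characters. It relies on the standard library's `html.entities.codepoint2name`, which uses the HTML 4 names, so code 175 comes out as `&macr;` rather than the `&hibar;` listed in the table. The functions are hypothetical, and htmlCode paragraphs would bypass them.

```python
from html.entities import codepoint2name

# Characters that are always escaped in ordinary (non-htmlCode) text.
_ALWAYS_ESCAPED = {ord("<"): "&lt;", ord(">"): "&gt;", ord("&"): "&amp;"}


def encode_char(code: int) -> str:
    """Encode one Windows-1252 character code (0-255) for HTML output."""
    if code in _ALWAYS_ESCAPED:
        return _ALWAYS_ESCAPED[code]
    if code < 128:
        return chr(code)  # plain ASCII passes through unchanged
    if code < 160:
        return f"&#{code};"  # 128-159: numeric character reference
    return f"&{codepoint2name[code]};"  # 160-255: named entity


def encode_text(data: bytes) -> str:
    """Encode a run of Windows-1252 bytes as HTML text."""
    return "".join(encode_char(code) for code in data)
```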
Even with the entity reference convention in place, support varies. For instance, the registered mark, ASCII 174, which is represented by the entity `&reg;`, will not appear as a registered mark in the Lynx browser. Similarly, Lynx doesn't understand .
The translator addresses a number of these problems by offering translation options that create plain ASCII equivalents to a number of commonly found word processing characters. So, for instance, Word users who regularly place smart quotes in their documents (also called curly quotes) can specify that the translator substitute the plain straight double quote (the one that looks like an inch mark) for curly quotes.
A complete set of these options is listed below. Registered users can set these via the Options dialog.
Character name | Special character | ASCII # | Plain ASCII Substitution | Translation option
---|---|---|---|---
Ellipsis | … | 133 | `...` | convertEllipsis
Curly single quotes | ‘ ’ | 145, 146 | `' '` | convertSmartQuotes
Curly double quotes | “ ” | 147, 148 | `" "` | convertSmartQuotes
En dash | – | 150 | `-` | convertENdash
Em dash | — | 151 | `--` | convertEMdash
Trademark | ™ | 153 | `(tm)` | convertSpecialMarks
Copyright mark | © | 169 | `(c)` | convertSpecialMarks
Registered mark | ® | 174 | `(r)` | convertSpecialMarks
One-quarter fraction | ¼ | 188 | `1/4` | convertMathSymbols
One-half fraction | ½ | 189 | `1/2` | convertMathSymbols
Three-quarters fraction | ¾ | 190 | `3/4` | convertMathSymbols
Times symbol | × | 215 | `x` | convertMathSymbols
Divide symbol | ÷ | 247 | `/` (forward slash) | convertMathSymbols
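Applied to text, these options amount to a character-for-character substitution. This Python sketch works on Unicode strings, so the Windows-1252 codes in the table appear as their Unicode equivalents (for example U+2026 for the ellipsis). The option names come from the table; the `SUBSTITUTIONS` layout and `apply_substitutions` are hypothetical.

```python
from typing import Dict, Set

# Option name -> {special character: plain ASCII substitute}, from the table above.
SUBSTITUTIONS: Dict[str, Dict[str, str]] = {
    "convertEllipsis": {"\u2026": "..."},
    "convertSmartQuotes": {"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'},
    "convertENdash": {"\u2013": "-"},
    "convertEMdash": {"\u2014": "--"},
    "convertSpecialMarks": {"\u2122": "(tm)", "\u00a9": "(c)", "\u00ae": "(r)"},
    "convertMathSymbols": {"\u00bc": "1/4", "\u00bd": "1/2", "\u00be": "3/4",
                           "\u00d7": "x", "\u00f7": "/"},
}


def apply_substitutions(text: str, enabled: Set[str]) -> str:
    """Replace special characters for every enabled option; leave the rest alone."""
    table: Dict[str, str] = {}
    for option in enabled:
        table.update(SUBSTITUTIONS[option])
    return text.translate(str.maketrans(table))
```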
In short, many of the above options undo Word's smart formatting.
It's understood that the following characters will be converted to their respective entities, to avoid problems in browsers' handling of HTML:
`<` becomes `&lt;`, `>` becomes `&gt;`, and `&` becomes `&amp;`
These translations are suppressed in outputting special htmlCode paragraphs (see below).
The translator provides support for adding other HTML and CSS codes into the output HTML. Using certain reserved stylenames, you can tell the translator to insert HTML code into the Head or Body of the output document. You can also tell the translator to insert CSS statements before or after the generated class definitions. The reserved style names are:
Each of these reserved stylenames is discussed below. Another section of this document also discusses reserved stylenames.
An additional provision, new with version 2, is the special directive #include. When this directive is placed as the first text of a paragraph in any of the styles htmlHead, htmlCode, htmlBody, cssBefore, or cssAfter, followed by a filename in quotes, the text of that file is sent directly into the appropriate part of the HTML output. These files cannot be Word files; they must be pure ASCII text files.
The following text
#include “c:\styles\include-botA.txt”
in a paragraph tagged with style htmlCode will send the contents of the file c:\styles\include-botA.txt directly into the HTML output stream.
This directive lets you interleave large portions of HTML, JavaScript, etc. into your documents.
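Recognizing the directive can be sketched in Python as follows. `include_path` and `expand_include` are hypothetical; the sketch accepts either straight or curly quotes around the file name (Word may turn straight quotes into curly ones) and reads the file as ASCII, matching the requirement that included files be plain text.

```python
import re
from pathlib import Path
from typing import Optional

# "#include" must be the first text of the paragraph; the file name may be in
# straight or curly double quotes.
_INCLUDE = re.compile(r'\s*#include\s+["\u201c]([^"\u201d]+)["\u201d]')


def include_path(paragraph_text: str) -> Optional[str]:
    """Return the file named by a leading #include directive, or None."""
    match = _INCLUDE.match(paragraph_text)
    return match.group(1) if match else None


def expand_include(paragraph_text: str) -> Optional[str]:
    """Return the included file's text for direct output, or None."""
    path = include_path(paragraph_text)
    if path is None:
        return None
    # Included files must be plain ASCII text, not Word documents.
    return Path(path).read_text(encoding="ascii")
```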
If you wish to include other HTML codes in the HEAD of the HTML output, you can do the following:
Any paragraphs tagged with the Word style htmlHead (case insignificant) will be passed directly through the translator and into the <HEAD> of the output HTML. This feature is especially helpful for maintaining documents that have META elements and the like in the <HEAD> area.
SPECIAL PRECAUTION: Text tagged with style htmlHead must appear only at the very top of the document. As soon as paragraphs in styles other than htmlHead are detected, lines tagged with htmlHead will be ignored. This limitation keeps the translator from having to do a second pass through the Word document.
Given the Word document shown in Figure S, the translator creates the code shown in Figure T. The paragraphs tagged with style htmlHead pass directly into the <HEAD> of the HTML output.
Notice in this example that an HTML <TITLE> element is being inserted in the head area. The W2CSS translator also generates a TITLE element. To avoid this conflict, use the configuration setting generateHTMLtitle to tell the translator NOT to generate a title element.
Figure S: Word document for HTMLHEAD example
Figure T: HTML output for the Word document shown above
```html
<HTML>
<HEAD>
<TITLE>Selling A Boat</TITLE>
<META CONTENT="FOR SALE, BOATS">
```
If you wish to include other HTML codes in the BODY of the output HTML, you can do the following (htmlCode is synonymous with the stylename htmlBody):
Any paragraphs tagged with the Word style htmlCode (case insignificant) will be taken as HTML codes. The translator will pass any text in these paragraphs directly through and into the output HTML.
Smart quotes (curly single and double quotes) will be straightened in any paragraphs tagged as htmlCode. However, text within quote marks will not be affected, because it may be a special filename or the like. (For instance, Windows 95 allows curly quotes in filenames.)
Given the Word document shown in Figure U, the translator creates the code shown in Figure V. Notice that the paragraph tagged with style htmlCode appears in the HTML output, which, when viewed in a browser, results in the appearance of a horizontal rule (HR). Notice also that the style htmlCode is formatted as hidden text, so it won't interfere with printing the Word document. You can format the style htmlCode any way you want (hidden or not) without affecting how the translator treats it.
Figure U: Word document for HTMLCODE example
Figure V: HTML output for Word document, above
```html
<H1 class="Heading-top"> <EM>W2CSS: </EM>A File Converter </H1>
<HR>
<P class="BodyText1"> Table Of Contents</P>
```
If you wish to include other HTML codes in the BODY of the output HTML, you can use this reserved stylename. It is actually an alias name for reserved stylename htmlCode.
If you wish to include other CSS statements in the STYLE area, before the class definitions, you can do the following:
Any paragraphs tagged with the Word style cssBefore (case insignificant) will be passed directly through the translator and will appear before the class statements generated for each Word style.
Given the Word document shown in Figure W, the translator creates the code shown in Figure X. Note that paragraphs tagged with style cssBefore are placed between the preface codes and the class definitions generated by the translator. The reason that cssBefore text is placed after the preface codes is so that you can override them if you wish.
Figure W: Word document for example of cssBefore style
Figure X: The stylesheet generated for the above Word document
If you wish to include other CSS codes in the STYLE area after the class definitions, you can do the following:
Any paragraphs tagged with the Word style cssAfter (case insignificant) will be passed directly through the translator and will appear after the classes generated for each Word style. Placing style codes here will allow you to override or augment the style instructions generated by the translator. This can create unusual effects such as text floating over other text, as shown in the following example.
Given the Word document shown in Figure Y, the translator creates the code shown in Figure AA. The result, when viewed in a CSS-aware browser is shown in Figure Z. Note that paragraphs tagged with style cssAfter are placed after all the class definitions. In this example, W2CSS translates Word style whatAre into CSS class .whatAre. Adding the line .whatAre {margin-top: -2em} using the reserved stylename cssAfter results in augmenting the definition of class whatAre. The result is the floating effect shown in Figure Z.
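The placement of cssBefore and cssAfter text can be sketched in Python as a simple concatenation. The function is hypothetical, and the preface, cssBefore, and generated class lines in the example are placeholders rather than actual W2CSS output; only the `.whatAre {margin-top: -2em}` line comes from the example above.

```python
from typing import List


def assemble_stylesheet(preface: List[str], css_before: List[str],
                        generated_classes: List[str], css_after: List[str]) -> str:
    """Join style-sheet statements in the order the manual describes.

    cssBefore follows the preface (so it can override the preface) and
    cssAfter follows the generated classes (so it can override or augment them).
    """
    return "\n".join(preface + css_before + generated_classes + css_after)


sheet = assemble_stylesheet(
    ["P {margin: 0}"],                 # placeholder preface statement
    ["BODY {color: black}"],           # placeholder cssBefore paragraph
    [".whatAre {font-size: 150%}"],    # placeholder generated class
    [".whatAre {margin-top: -2em}"],   # cssAfter line from the example
)
print(sheet)
```

Because a later rule with the same selector wins in the CSS cascade, the cssAfter line's margin-top takes effect while the other generated properties of .whatAre are kept.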
Figure Y: Word document for cssAfter example
Figure Z: View of cssAfter example in CSS-aware browser
Figure AA: CSS created from the Word document shown above
Speaking generally, it can be said that Word styles map somewhat easily into CSS styles. The W2CSS translator assumes a certain point of view regarding which Word properties map into which CSS properties. Basically, the translator recognizes all paragraph and character level styles, otherwise ignoring localized direct formatting.
This is a relatively easy task. One small problem is that Word style names may contain spaces, whereas CSS class names cannot. The translator works around this by substituting the dash character (-) for each space in the Word style name (this is consistent with CSS naming conventions, e.g. border-color). So, for example, Word's Heading 1 style becomes the CSS class name Heading-1. If the Word user creates a style called Heading-1 and this name has already been used (because Heading 1 has already occurred in the document), the translator names the CSS class Heading-1-A. If Heading-1-A has also already been used in the Word document, the translated name becomes Heading-1-B. Unless the user is hell-bent on crashing W2CSS (which is possible), this scheme should resolve namespace problems.
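The naming scheme can be sketched in Python as below. `ClassNamer` is hypothetical; it handles only the space-to-dash substitution and the -A, -B suffixes described here, not other characters that CSS class names disallow, and it raises an error once the 26 letter suffixes are exhausted.

```python
from string import ascii_uppercase
from typing import Dict, Set


class ClassNamer:
    """Assign CSS class names to Word style names, avoiding clashes."""

    def __init__(self) -> None:
        self._by_style: Dict[str, str] = {}  # Word style name -> class name
        self._used: Set[str] = set()         # class names already handed out

    def class_for(self, style_name: str) -> str:
        if style_name in self._by_style:
            return self._by_style[style_name]
        base = style_name.replace(" ", "-")  # "Heading 1" -> "Heading-1"
        for suffix in [""] + [f"-{letter}" for letter in ascii_uppercase]:
            candidate = base + suffix
            if candidate not in self._used:
                break
        else:
            raise ValueError(f"no free CSS class name for style {style_name!r}")
        self._used.add(candidate)
        self._by_style[style_name] = candidate
        return candidate
```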
Word paragraph style properties fall into groups of properties corresponding to the way Word itself groups these properties (see Figure BB). These include
Figure BB: Word paragraph style formatting properties
(NOTE: Character styles will not be translated in UNREGISTERED VERSIONS OF W2CSS)
Word character style properties fall into groups of properties corresponding to the way Word itself groups these properties. These include
Character styles are almost a subset of paragraph styles. The translator treats character styles as overlaid on top of the underlying paragraph styles. Although W2CSS sidesteps the issue of Word's style hierarchy (the "based on" aspect of styles), the CSS equivalents of character styles specify only those properties that the Word character style itself specifies. Character styles are expressed using the HTML <SPAN> element; in this way, they inherit the properties of their underlying elements, much as Word character styles build upon the underlying paragraph formatting.
In the following section, these characteristics are discussed in relation to their CSS equivalents.
For all of the CSS properties below (except font size, color and border specifications) the translator provides the following scalability options:
Font sizes can be expressed either as
Percentages and relative values are both scalable; points are a fixed measure that isn't scalable. In tests using MSIE 4, percentages and relative values were found to be scalable. In tests using Netscape Navigator 4, all three font measures were scalable. The CSS specification promises only the first two to be scalable.
Left and right margin sizes can be expressed either as
Other measures, including border widths, padding, space before and after, etc, can be expressed either as
A Word style's font size, specified in points, translates directly into the CSS font-size property.
If the translator's fontMeasures option (a configuration setting) is set to percentages, all font sizes are figured as percentages of the font size of the parent element. For all classes generated by W2CSS, the parent element is the <BODY> element. W2CSS generates a statement at the beginning of the style definitions section that sets the default body font size to 12 points. This number can be changed using the configuration setting defBodyFontSize.
If the translator's fontMeasures option is set to absolute, font sizes are translated into the CSS absolute scale: xx-small, x-small, small, medium, large, x-large, xx-large. See CSS sources for a more thorough explanation of this absolute scale.
If the translator's fontMeasures option is set to inPoints, font sizes are expressed as point values.
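For the percentages and inPoints settings, the arithmetic can be sketched in Python. The function is hypothetical, and rounding percentages to whole numbers is an assumption; the absolute setting is left out because the manual does not give the point sizes at which each keyword applies.

```python
def font_size_css(size_pt: float, font_measures: str, def_body_font_size: float = 12) -> str:
    """Return a CSS font-size declaration for a Word style's point size."""
    if font_measures == "percentages":
        # Percentage of the BODY font size (defBodyFontSize, 12pt by default).
        return f"font-size: {round(size_pt / def_body_font_size * 100)}%"
    if font_measures == "inPoints":
        return f"font-size: {size_pt:g}pt"
    # The keyword thresholds for "absolute" are not documented, so that
    # setting is not sketched here.
    raise ValueError(f"unsupported fontMeasures value: {font_measures!r}")
```

For a character style, the same division applies with the paragraph's font size passed in place of the body size; a 10-point character style in a 12-point paragraph comes out as 83%.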
A Word style's font name translates directly into the CSS font-family property. However, CSS allows the font family to be given as a list of font names, which can include a generic family (such as serif or sans-serif). W2CSS supports this capability via the additional text file W2CSSfnt.csv; see support for CSS font substitution for details on this feature. The translator also lets you choose a default generic family for fonts that are not otherwise listed in W2CSSfnt.csv (see the fontSubstitution configuration option).
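A Python sketch of building the font-family value follows. The layout of W2CSSfnt.csv is not described here, so the sketch takes the substitutions as a dictionary assumed to be already loaded from that file; `font_family` is hypothetical, and listing the Word font first is an assumption. Multi-word family names are quoted, as CSS recommends for names containing spaces.

```python
from typing import Dict, List, Optional

GENERIC_FAMILIES = {"serif", "sans-serif", "cursive", "fantasy", "monospace"}


def font_family(word_font: str, substitutions: Dict[str, List[str]],
                default_generic: Optional[str] = None) -> str:
    """Build a CSS font-family declaration for a Word font name."""
    if word_font in substitutions:
        families = [word_font] + substitutions[word_font]
    elif default_generic:
        families = [word_font, default_generic]  # fall back to a generic family
    else:
        families = [word_font]
    # Quote multi-word family names; generic family keywords are never quoted.
    quoted = [f'"{name}"' if " " in name and name not in GENERIC_FAMILIES else name
              for name in families]
    return "font-family: " + ", ".join(quoted)
```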
The CSS font-style property is equivalent to Word's italic font style. The oblique option is not handled by the translator.
The CSS font-variant property is equivalent to Word's small caps font effect. (This CSS property is not yet supported by many browsers.)
A Word style's font color translates into the CSS color property. Word provides 16 built-in color names that roughly correspond to standardized CSS color names. The only difference is that Word provides two shades of gray (gray25 and gray50). By default, W2CSS maps gray50 to CSS gray and gray25 to CSS silver. W2CSS provides a configuration option that lets you change which color name or hex number is output for each of Word's built-in colors. See color equivalences in the discussion of configuration settings.
The following table lists Word internal color names and the matching CSS color names that are output by the translator:
CSS name | Hex # | Word internal name
---|---|---
aqua | 00FFFF | wdTurquoise
black | 000000 | wdBlack
blue | 0000FF | wdBlue
fuchsia | FF00FF | wdPink
gray | 808080 | wdGray50
green | 008000 | wdGreen
lime | 00FF00 | wdBrightGreen
maroon | 800000 | wdDarkRed
navy | 000080 | wdDarkBlue
olive | 808000 | wdDarkYellow
purple | 800080 | wdViolet
red | FF0000 | wdRed
silver | C0C0C0 | wdGray25
teal | 008080 | wdTeal
white | FFFFFF | wdWhite
yellow | FFFF00 | wdYellow
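The same mapping as a Python lookup, with room for a color equivalences override, might look like this. The dictionary uses the standard CSS keyword spelling fuchsia; `css_color` and the shape of the `equivalences` argument are hypothetical. Returning None for unlisted colors mirrors the rule, stated below for background colors, that such colors get no color specification.

```python
from typing import Dict, Optional

# Default Word color index name -> CSS color name.
WORD_TO_CSS_COLOR: Dict[str, str] = {
    "wdTurquoise": "aqua", "wdBlack": "black", "wdBlue": "blue",
    "wdPink": "fuchsia", "wdGray50": "gray", "wdGreen": "green",
    "wdBrightGreen": "lime", "wdDarkRed": "maroon", "wdDarkBlue": "navy",
    "wdDarkYellow": "olive", "wdViolet": "purple", "wdRed": "red",
    "wdGray25": "silver", "wdTeal": "teal", "wdWhite": "white",
    "wdYellow": "yellow",
}


def css_color(word_color: str, equivalences: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the CSS color for a Word color, honouring user overrides first."""
    if equivalences and word_color in equivalences:
        return equivalences[word_color]
    return WORD_TO_CSS_COLOR.get(word_color)
```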
CSS's word-spacing property is not handled by the translator.
The CSS letter-spacing property is equivalent to Word's character-level letter spacing setting.
Word offers only the "All caps" setting. If it is set, the CSS text-transform property is set to capitalize; otherwise it is set to none.
There is a small anomaly in the translator regarding this property in that once the text is set to all caps, it is output to the HTML file in all caps, not in a mixture of upper and lowercase.
The CSS text-decoration property is translated from Word's font settings for strikethrough, double strikethrough, and underline. The CSS overline and blink values are not handled by W2CSS. Word's strikethrough and double strikethrough both fold into the CSS value line-through.
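A Python sketch of how the three Word settings combine into one text-decoration value. The function is hypothetical, and emitting none when no setting is on is an assumption modeled on the text-transform rule.

```python
def text_decoration(underline: bool, strikethrough: bool, double_strikethrough: bool) -> str:
    """Combine Word's underline and strikethrough settings into one declaration."""
    values = []
    if underline:
        values.append("underline")
    if strikethrough or double_strikethrough:
        values.append("line-through")  # both strike styles fold into line-through
    return "text-decoration: " + (" ".join(values) if values else "none")
```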
CSS's vertical-align property is derived from a character's superscript or subscript setting; if characters are raised or lowered by a specific number of points, the offset is translated into a percentage value.
The CSS line-height property is roughly equivalent to Word's paragraph-level line spacing property. Word offers a number of line spacing options, and only one of these, the "at least" setting, is not handled by the translator. The others are handled as follows:
Word Line Spacing into CSS Line Height

Word value | CSS value
---|---
Single | normal
1.5 | 1.5
2 | 2
Multiple | A number (3, 3.5, or whatever)
Exact | A value in ems or pts
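The table can be expressed as a small Python function. The rule names are plain strings standing in for Word's line spacing settings, the Multiple value is taken as a number of lines, and converting Exact spacing to ems by dividing by the font size is an assumption; `line_height` itself is hypothetical.

```python
from typing import Optional


def line_height(rule: str, value: float = 0.0, font_size_pt: Optional[float] = None) -> Optional[str]:
    """Map a Word line spacing rule to a CSS line-height value.

    rule is one of "single", "1.5", "double", "multiple" (value = number of
    lines), "exactly" (value = points) or "atLeast". Returns None for
    "atLeast", which the translator does not handle.
    """
    if rule == "single":
        return "normal"
    if rule == "1.5":
        return "1.5"
    if rule == "double":
        return "2"
    if rule == "multiple":
        return f"{value:g}"
    if rule == "exactly":
        # With a known font size, express the exact spacing in ems; otherwise in points.
        if font_size_pt:
            return f"{value / font_size_pt:g}em"
        return f"{value:g}pt"
    return None
```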
The CSS text-align property is translated directly from Word's paragraph alignment setting (left, center, right, justified).
The CSS text-indent property is translated from Word's paragraph first-line indent setting.
The CSS margin properties are translated from Word's left and right paragraph indents and from its space before and space after settings. The left and right indents become the CSS left and right margins; space before and space after become the CSS top and bottom margins, respectively.
By default, W2CSS outputs left and right values as percentages of the page width of your Word document. (This is a configuration setting; see the leftRightMargins setting.) You are advised to experiment with margins to see what the various results will be.
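The three leftRightMargins choices can be sketched in Python. The setting values, the one-decimal rounding, and the 12-point font used for ems are assumptions, and `margin_value` is hypothetical; 612 points is the width of a US Letter page (8.5 inches).

```python
def margin_value(indent_pt: float, left_right_margins: str,
                 page_width_pt: float, font_size_pt: float = 12) -> str:
    """Express a Word paragraph indent as a CSS margin value."""
    if left_right_margins == "percents":
        # Percentage of the Word page width (the default behaviour).
        return f"{indent_pt / page_width_pt * 100:.1f}%"
    if left_right_margins == "ems":
        return f"{indent_pt / font_size_pt:g}em"
    return f"{indent_pt:g}pt"  # "points"
```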
A very sticky problem is calculating the left and right margins of a nested list. In the browsers tested, these margins are apparently calculated as offsets from the margins of enclosing lists. Thus, for list styles that are used as nested lists, the CSS style definition created compensates for these effects.
The translator handles CSS padding as roughly equivalent to the space between a paragraph border and the paragraph text (in Word, this is set in the Borders and Shading dialog, by choosing the Options button). If there are no borders on a paragraph, no padding will be indicated by the translator in the output CSS style.
The translator now handles both character-level and paragraph-level border formatting, as described below.
Word offers a different set of border options than CSS does. The following chart summarizes how these have been mapped into CSS equivalents. Bear in mind that many browsers currently don't handle this property well.
Word internal name | CSS border style
---|---
wdLineStyleDot | dotted
wdLineStyleDashSmallGap, wdLineStyleDashLargeGap, wdLineStyleDashDot, wdLineStyleDashDotDot, wdLineStyleDashDotStroked | dashed
wdLineStyleDouble, wdLineStyleTriple, wdLineStyleDoubleWavy | double
wdLineStyleThinThickSmallGap, wdLineStyleThickThinSmallGap, wdLineStyleThinThickThinSmallGap | double
wdLineStyleThinThickMedGap, wdLineStyleThickThinMedGap, wdLineStyleThinThickThinMedGap | double
wdLineStyleThinThickLargeGap, wdLineStyleThickThinLargeGap, wdLineStyleThinThickThinLargeGap | double
wdLineStyleEmboss3D | ridge
wdLineStyleEngrave3D | groove
All other borders | solid
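The chart translates directly into a lookup table with solid as the fallback. This Python sketch is hypothetical and assumes callers skip wdLineStyleNone (no border) before asking for a style.

```python
from typing import Dict, List

# CSS border style -> Word line styles that map to it, following the chart.
_GROUPS: Dict[str, List[str]] = {
    "dotted": ["wdLineStyleDot"],
    "dashed": ["wdLineStyleDashSmallGap", "wdLineStyleDashLargeGap",
               "wdLineStyleDashDot", "wdLineStyleDashDotDot",
               "wdLineStyleDashDotStroked"],
    "double": ["wdLineStyleDouble", "wdLineStyleTriple", "wdLineStyleDoubleWavy",
               "wdLineStyleThinThickSmallGap", "wdLineStyleThickThinSmallGap",
               "wdLineStyleThinThickThinSmallGap", "wdLineStyleThinThickMedGap",
               "wdLineStyleThickThinMedGap", "wdLineStyleThinThickThinMedGap",
               "wdLineStyleThinThickLargeGap", "wdLineStyleThickThinLargeGap",
               "wdLineStyleThinThickThinLargeGap"],
    "ridge": ["wdLineStyleEmboss3D"],
    "groove": ["wdLineStyleEngrave3D"],
}

# Invert the groups into a lookup table keyed by Word line style name.
WORD_TO_CSS_BORDER: Dict[str, str] = {
    word_name: css_style for css_style, names in _GROUPS.items() for word_name in names
}


def css_border_style(word_line_style: str) -> str:
    """Return the CSS border-style for a Word line style; anything else is solid."""
    return WORD_TO_CSS_BORDER.get(word_line_style, "solid")
```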
The CSS background-color setting for the classes the translator generates is derived from the setting of paragraph or character shading. Besides offering standard color constants (see the list of color constants), Word offers shading options in 2.5% increments.
If you specify a shade percentage, you get a hex number for the background color; if you choose one of Word's 16 preset colors, you get a color name. The color names can be changed by using the color equivalences configuration setting.
Page margin settings for the active Word document (available under File, Page Setup) are translated into margin values for the HTML BODY element. The unit of measure depends on the configuration setting leftRightMargins, which expresses margins as percents, ems or points. The top and bottom margins of the Word document are ignored; you are advised to create space at the top and bottom of the document in other ways.
Background colors other than the 16 predefined constants shown in the chart (above) are not currently handled by W2CSS. Instead, no background color will be specified.
From empirical testing with MSIE4 and Netscape Navigator 4, it was found that CSS equivalents of Word styles render more accurately if the margins of the base HTML elements are first set to zero. By default, the W2CSS program outputs a preface that zeros the margins for the following HTML elements: <P>, <H1> through <H6> and <ADDRESS> (see configuration setting zeroHtmlElementMargins). Another configuration setting allows output of a preface that zeros the margin settings for the <BODY> element (see zeroHtmlBodyMargins).
Certain stylenames are interpreted by the translator NOT as links to Word style definitions, but rather as denoting special instructions to the translator. Just as many programming languages utilize compiler directives to tell the compiler how to do its job, W2CSS uses special reserved stylenames to communicate to the translator. The following are reserved stylenames:
Using reserved stylenames is not required to achieve good results with the translator. They are provided for those who wish to have more control over the translator's behavior.
Figure CC: An example using various reserved stylenames.
W2CSS now includes creation of automatic hyperlinked Tables of Contents for the same reason that Word itself does: to reduce the tedium of manually creating such lists.
Word allows users to create automatic Tables of Contents and Tables of Figures. A number of the variations are discussed below; W2CSS handles several of these variations, but not all of them.
A Word Table of Contents (TOC for short) is created by inserting a TOC field code into the Word document. The following TOC field code variations are handled by the translator (TOF stands for Table of Figures):
| Field code | Type | Result |
|---|---|---|
| TOC \O | Outline-level TOC | Creates a TOC that lists all lines tagged with the reserved outline styles, namely Heading 1 through Heading 9 |
| TOC \O 1-2 | Outline-level TOC | Creates a TOC listing outline styles but limited to levels 1 and 2, namely Heading 1 and Heading 2 |
| TOC \T Topic1, 1, Topic2, 2 | Style-based TOC | Creates a TOC based only on the named styles and maps those lines into the TOC levels specified. In this case, Topic1 paragraphs map into TOC level 1 and Topic2 paragraphs into level 2 |
| TOC \C Figure | Caption-based TOF | Creates a TOF based on captions that are labeled with the word Figure. Entries in the TOF include the label, e.g. Figure 3: List of Objects |
| TOC \A Table | Caption-based TOF | Creates a TOF based on captions that are labeled with the word Table. Entries in the TOF do not include the label, e.g. List of Objects |
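The switch syntax in the table can be made concrete with a small parser. The Python sketch below shows one way to recognize the five variants. It is not the translator's VBA code. The function parse_toc_field and its return format are hypothetical. Combined switches are not handled, so the sketch rejects them outright.

```python
import re


def parse_toc_field(code):
    """Classify one TOC field code of the kinds listed in the table.

    Hypothetical sketch: returns a dict describing what the TOC would collect.
    """
    code = code.strip()

    # TOC \O or TOC \O 1-2: outline levels, i.e. Heading 1..Heading 9.
    m = re.fullmatch(r"TOC \\O(?: ([1-9])-([1-9]))?", code, re.IGNORECASE)
    if m:
        low, high = (int(m.group(1)), int(m.group(2))) if m.group(1) else (1, 9)
        styles = {f"Heading {n}": n for n in range(low, high + 1)}
        return {"kind": "outline", "styles": styles}

    # TOC \T Style, level, Style, level, ...: named styles mapped to TOC levels.
    m = re.fullmatch(r"TOC \\T (.+)", code, re.IGNORECASE)
    if m:
        parts = [p.strip() for p in m.group(1).split(",")]
        if len(parts) % 2:
            raise ValueError("\\T needs style/level pairs")
        styles = {parts[i]: int(parts[i + 1]) for i in range(0, len(parts), 2)}
        return {"kind": "styles", "styles": styles}

    # TOC \C Label keeps the caption label in each entry; TOC \A Label drops it.
    m = re.fullmatch(r"TOC \\([CA]) (\w+)", code, re.IGNORECASE)
    if m:
        return {"kind": "captions", "label": m.group(2),
                "include_label": m.group(1).upper() == "C"}

    # Anything else (including combined switches) is rejected by this sketch.
    raise ValueError(f"unsupported TOC field: {code}")
```

For example, parse_toc_field(r"TOC \T Topic1, 1, Topic2, 2") maps Topic1 to level 1 and Topic2 to level 2.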
TOCs (Tables of Contents) and TOFs (Tables of Figures) only find headings and captions from the point where they're located in the document downward. This is different from the way Word works. This could be considered a disadvantage, but it can also be useful: for instance, if you're using Heading 1 for the title of the document but don't want the title to appear in the TOC, place the TOC below the title.
Because of problems with VB internals, complex TOC expressions such as \O 2-2 \t Heading 3,3 don't work; only the \O code is recognized.
Via the Options dialog (TOC tab) you can control whether TOCs and TOFs are bulleted. The default is for them to appear as bulleted lists.
Translation options, also called configuration settings in this manual, allow you to control particular aspects of the translation process. Registered users can control most of these settings from the Options dialog (see pictures of interface dialogs). Non-registered users will have to edit the configuration settings file W2CSS.cfg. This file contains keywords that describe and control aspects of the translation.
Another way to control translation is by placing configuration setting keywords throughout your document using the reserved style #directive (you can also use the style w2cssSetting, which has the same effect). See below for details.
The W2CSS translator allows the user to control its behavior via configuration settings. These settings are found in the file W2CSS.cfg, located in the user's Templates folder (see installation procedure regarding placing files in the Templates directory). This is a plain text file that can be edited with Windows Notepad or any other text editor.
The translator will work even if the file W2CSS.cfg is not present. In that case, the settings that are marked as default below will be in effect. If W2CSS.cfg is present, you can customize the translator's behavior by editing it.
Often, you may want to use special configuration settings when you process a particular Word file. For instance, you might want to just process the HTML and not generate any CSS classes (see the example below, which demonstrates this). The W2CSS translator allows you to embed configuration settings in the text of a Word document. To do this, use the reserved stylename #directive (previously w2cssSetting) as follows:
[In Version 2, the reserved name w2cssSetting is replaced by the name #directive. Both are supported and are synonymous.]
Any paragraph tagged with the stylename #directive will be understood by the translator as containing instructions that control a translation option. The line is parsed and a match is attempted on the text. If the text contains a keyword that matches a translation option keyword, that setting is modified accordingly.
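The keyword matching described above can be sketched in Python. This is an illustration, not the translator's VBA. It assumes a directive paragraph reads keyword=value, which is the form the configuration file uses for entries such as fontSubstitution=serif. It also assumes that keyword matching ignores case. Both are assumptions, and the function names are hypothetical. For on/off options, the sketch accepts the four spellings listed in the option tables: true, false, yes and no.

```python
def parse_setting_value(raw):
    """Turn true/yes and false/no into booleans; leave other values as text."""
    value = raw.strip()
    lowered = value.lower()
    if lowered in ("true", "yes"):
        return True
    if lowered in ("false", "no"):
        return False
    return value


def apply_directive(settings, line):
    """Update settings from one #directive paragraph of the assumed form keyword=value."""
    if "=" not in line:
        return settings  # nothing recognizable to apply
    keyword, raw_value = line.split("=", 1)
    keyword = keyword.strip()
    for known in settings:
        # Only keywords that match an existing translation option are applied.
        if known.lower() == keyword.lower():
            settings[known] = parse_setting_value(raw_value)
            break
    return settings
```

Unknown keywords are ignored rather than added, so a typo in a directive leaves the existing setting unchanged.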
SPECIAL NOTE: Certain configuration settings only work from the configuration file and have no effect when placed inline in a document. For example, you cannot tell the translator to start processing embedded objects from within the Word document, since that processing begins before the document is opened.
Given the Word document shown in Figure DD, the translator generates the HTML shown in Figure EE. The settings at the top, generateStyleDefs and tagClassesToHTML, both set to false, suppress all class and style information and related attributes. The inclusion of the LINK element in an htmlHead paragraph results in inclusion of the LINK statement in the output HTML. In this case, any style information is controlled by the CSS definitions in the file stylesheet1.css.
Figure DD: Top lines of a Word document that contains embedded option settings. The settings shown generate plain HTML with no class information and no CSS styles. A LINK line connects the document to an existing stylesheet.
Figure EE: HTML generated with the above option settings. Note the LINK element that attaches the page to a stylesheet.
<HTML>
<HEAD>
<TITLE> W2CSS: Converting Word Documents to CSS compliant HTML </TITLE>
<LINK REL="stylesheet" type="text/css" href="stylesheet1.css">
</HEAD>
<BODY>
<P> W2CSS: A File Converter</P>
<P> Table Of Contents</P>
<UL>
<LI> <A NAME="Overview"><A HREF="#Overview">OVERVIEW OF THE TRANSLATOR PROGRAM</A></A><BR></LI>
<LI> <A NAME="Overview"><A HREF="#AQuickLook">A QUICK LOOK AT WHAT THE TRANSLATOR DOES</A></A><BR></LI>
<LI> <A NAME="Overview"><A HREF="#HowToInstallTemplate">HOW TO INSTALL THE TRANSLATOR</A></A><BR></LI>
</UL>
etc...
Given the Word document shown in Figure FF, the translator generates the HTML shown in Figure GG. This is a variation on Example 1, above, except that here generateStyleDefs is false and tagClassesToHTML is true. Also, a LINK element is included pointing to an existing stylesheet named stylesheet2.css. This combination of settings suppresses the output of CSS style definitions but still generates class attributes for each HTML element. You might do this if you have many documents that all point to the same stylesheet.
Figure FF: Word document for Example 2, a variation on using the reserved style w2cssSetting to link to an existing stylesheet.
Figure GG: HTML generated with the above option settings.
<HTML>
<HEAD>
<TITLE> W2CSS: Converting Word Documents to CSS compliant HTML </TITLE>
<LINK REL="stylesheet" type="text/css" href="stylesheet2.css">
</HEAD>
<BODY>
<P class="Heading-top"> W2CSS: A File Converter</P>
<P class="BodyText1"> Table Of Contents</P>
<UL class="List-Bullet-Mine">
<LI> <A NAME="Overview"><A HREF="#Overview">OVERVIEW OF THE TRANSLATOR PROGRAM</A></A><BR></LI>
<LI> <A NAME="Overview"><A HREF="#AQuickLook">A QUICK LOOK AT WHAT THE TRANSLATOR DOES</A></A><BR></LI>
<LI> <A NAME="Overview"><A HREF="#HowToInstallTemplate">HOW TO INSTALL THE TRANSLATOR</A></A><BR></LI>
</UL>
The tables below summarize translation options that the user can control via special keywords. Some examples follow. Features marked with asterisks (*) are available only in the registered version.
Table 5: General Translation Options

| Name of Option | Explanation | Default | Choices |
|---|---|---|---|
| updateScreen | When false, changes to the screen are suppressed, allowing translation to go faster | False | True, False, Yes, No |
| linkedStyleSheet | When true, styles are written to a separate file with the extension .css, and a LINK line is created in the HTML HEAD | False | True, False, Yes, No |
| generateHTMLtitle | When true, the HTML TITLE element is created from the Word document title; set this to false if you wish to create the title by some other means | True | True, False, Yes, No |
| generateStyleDefs | If false, no style definitions (CSS class definitions) are created, only HTML. However, the CSS classes are still tagged to the HTML. This option helps when linking to a predefined stylesheet. | True | True, False, Yes, No |
| tagClassesToHTML | When false, styles are not linked to HTML elements. This allows you to use the translator to create plain HTML, without CSS tagging. | True | True, False, Yes, No |
| htmlSuffix | The default HTML file suffix | html | Others you might use are htm or shtml |
| paragraphsOnly | When true, only paragraph styles are processed; no images, character styles or hyperlinks are created. HINT: for faster processing, if you just want to get a sense of how things look as you're developing a document, set this option to true | False | True, False, Yes, No |
| spaciousOutput | When true, extra blank lines are inserted between HTML elements for better readability | False | True, False, Yes, No |
| wrapLines | When true, long lines (~120 characters or more) are wrapped in the HTML output. This option adds extra processing time to the translation. | False | True, False, Yes, No |
| BatchSeparator | For text batch file lists, the character used to separate options on a batch list line (see Batch separator) | , | Others to use: ; |
| doEmbeddedObjects * | When true, embedded and linked objects are translated (NOTE: when enabled, this feature makes the translation take a lot longer) | False | True, False, Yes, No |
| characterStyles * | When true, character styles in the Word document are detected and translated | False | True, False, Yes, No |
Table 6: Translation Options for Characters

| Name of Option | Explanation | Default | Choices |
|---|---|---|---|
| Tabs | Controls how tabs in the Word document are treated. (The HTML standard only recognizes tabs in PRE elements.) When set to asBlanks, tabs are turned into blanks; otherwise, tab characters are passed through into the output HTML. | asBlanks | asBlanks, notConverted |
| nonBreakingSpaces | Controls how non-breaking spaces and empty paragraphs are treated | asPreElement | asCharacterRefs, asNumericRefs, asPreElement, asEmpty |
| characterRefs | Setting this to asEntityRefs causes special characters (character codes above 127) to be output as entity references | asEntityRefs | asEntityRefs, asNumericRefs, notConverted |
| convertMathSymbols | When true, converts math symbols (Windows character codes 188, 189, 190, 215, 247) into ASCII equivalents. For example, ¼ becomes 1/4. See chart. | False | True, False, Yes, No |
| convertSmartQuotes | When true, converts curly single and double quotes (codes 145, 146, 147, 148) into ASCII equivalents. For example, “ and ” become ". See chart. | False | True, False, Yes, No |
| convertSpecialMarks | When true, converts TMmark, COPYRIGHTmark, and REGmark (codes 153, 169, 174) into ASCII equivalents. For example, © becomes (c). See chart. | False | True, False, Yes, No |
| convertENdash | When true, converts the EN dash (code 150) into a single hyphen. See chart. | False | True, False, Yes, No |
| convertEMdash | When true, converts the EM dash (code 151) into two hyphens. See chart. | False | True, False, Yes, No |
| convertEllipsis | When true, converts the ellipsis character (code 133) into three periods. See chart. | False | True, False, Yes, No |
| makeHyperlinkStyle* | When true, creates a SPAN element linked to class hyperlink. | False | True, False, Yes, No |
Table 7: Translation Options for Fonts & Measurements

| Name of Option | Explanation | Default | Choices |
|---|---|---|---|
| fontMeasures | Controls how font measures are expressed | asPercents | asPercents, absolute, inPts |
| leftRightMargins | Controls how left and right margin measures are expressed | asPercents | asPercents, asEms, inPts |
| otherMeasures | Controls how other measures (including border widths, padding, space before and after, etc.) are expressed | inEms | inEms, inPts |
| defBodyFontSize | Sets the default point size for the HTML <BODY> element | 12 | 1 to 255 |
| zeroHTMLelementMargins | When true, the translator outputs preface CSS styles that zero out the default margins of various HTML elements. This produces a more faithful rendition of the Word document in HTML. | True | True, False, Yes, No |
| zeroHTMLbodyMargins | When true, the translator outputs preface CSS styles that zero out the margins of the whole HTML BODY. | False | True, False, Yes, No |
| fontSubstitution | Specifies the default font substitution for fonts not named in the file W2CSSfnt.csv. See support for font substitution for a more complete discussion. | sans-serif | serif, sans-serif, fantasy, cursive, monospace, none |
| pixelsPerPt | Used in translating table cell widths to pixels | 1.35 | 1 to 32 |
Table 8: Translation Options for Tables

| Name of Option | Explanation | Default | Choices |
|---|---|---|---|
| makeTableCaptions * | When true, <CAPTION> elements are created for tables from the Word caption immediately above each table | False | True, False, Yes, No |
| tableWidths * | Controls whether table cell widths are expressed as percents or in pixels | asPercents | asPercents, asPixels |
| processTables * | When true, HTML tables are created from Word tables. If false, paragraphs in tables are output as <P> elements | False | True, False, Yes, No |
| tableRowHgtAttribute * | When true, HTML tables have row height attributes. | False | True, False, Yes, No |
Table 9: Translation Options for Tables of Contents and Tables of Figures

| Name of Option | Explanation | Default | Choices |
|---|---|---|---|
| createTOCs | When true, Word's TOC codes are translated into equivalent hyperlinked HTML Tables of Contents. | False | True, False, Yes, No |
| bulletedTOC | When true, TOCs and TOFs are generated as HTML bulleted lists. | True | True, False, Yes, No |
Table 10: Translation Options for Captions

| Name of Option | Explanation | Default | Choices |
|---|---|---|---|
| captionNames | If set to full, all the text in a caption is used for image ALT text; if set to lblAndName, only the label and name are used | lblAndName | lblAndName, full |
| captionsAsIMGalt | When true, captions are used to generate ALT attributes for images. | True | True, False, Yes, No |
| captionsVisible | If false, paragraphs tagged with the style Caption are not output in the HTML. This option is meant to work with the other caption options so that images can be given alternate text. | True | True, False, Yes, No |
For example, if fontSubstitution=serif, and you'd created a Word style that used the font Baskerville, and Baskerville is not listed in the file W2CSSfnt.csv, then the CSS font-family for this style will read
font-family: Baskerville, serif
If the configuration file lists fontSubstitution=none, and you'd created a Word style that used the font Baskerville, and Baskerville is not listed in the file W2CSSfnt.csv, then the CSS font-family for this style will read
font-family: Baskerville
captionsAsImgAlt
short for: captions as IMG ALT attributes
default is true
possible values are true, false, yes, no
If set to true, then all paragraphs tagged with the style Caption will be used as the ALT text associated with images. The caption that comes immediately before an image becomes the ALT text for that image. This option is meant to work with the other caption options so that images can be given alternate text.
With captionsAsImgAlt=true and captionsVisible=false, the following HTML (Figure II) will be created from the following Word document (Figure HH):
Figure HH: Word Document for Captions Example 1
Figure II: Output HTML for Captions Example 1
<P class="Normal"> <IMG SRC="whale.gif" ALT="Fijian whale in contemplative pose"></P>
captionsVisible
default is true
possible values are true, false, yes, no
If false, then paragraphs tagged with the style Caption will not be output in the HTML. This option is meant to work with the other caption options so that images can be given alternate text.
Using the example above (Figure HH), with captionsAsImgAlt=true and captionsVisible=true, the following HTML will be created from the same Word text:
<P class="Caption"> Fijian whale in contemplative pose</P> <P class="Normal"> <IMG SRC="whale.gif" ALT="Fijian whale in contemplative pose"></P>
captionNames
default is lblAndName
possible values are lblAndName, full
This option works with the other caption configuration settings to provide ALT text for images. If set to full, then all the text in a caption will be used for image ALT text; if set to lblAndName, then only the label and name (such as Figure 1) will be used instead of the full caption (which may be Figure 1: Fijian whale in contemplative pose).
If the caption isn't of the form "Figure X: some words" but is just a string of words, then the setting lblAndName acts as if captionNames were set to full.
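The captionNames rule can be sketched in a few lines of Python. This sketch assumes that "label and name" means the first two words of the caption when they are followed by a colon, as in Figure 1: ... That reading is an assumption. The function name caption_alt_text is also hypothetical, and the sketch does not reproduce the translator's VBA.

```python
import re


def caption_alt_text(caption, caption_names="lblAndName"):
    """Derive image ALT text from a caption, following the captionNames rule."""
    caption = caption.strip()
    if caption_names == "lblAndName":
        # "Figure 1: some words" -> "Figure 1" (label plus name before the colon).
        m = re.match(r"(\S+ \S+):", caption)
        if m:
            return m.group(1)
    # captionNames=full, or a caption without a "Label X:" prefix: use all the text.
    return caption
```

With the whale caption, lblAndName yields Figure A, while a caption with no label prefix is passed through whole, as the text describes.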
In the example below, captionNames=lblAndName, captionsAsImgAlt=true, and captionsVisible=true (these are all defaults for the W2CSS translator). The following HTML (Figure KK) will be created from the following Word document (Figure JJ):
Figure JJ: Word Document for Captions Example 2
Figure KK: Output HTML for Captions Example 2
<P class="Caption"> Figure A: Fijian whale in contemplative pose</P> <P class="Normal"> <IMG SRC="whale.gif" ALT="Figure A"></P>
Compare this with the HTML generated in the Captions Example 1, where captionsVisible=false.
paragraphsOnly
default is false
possible values are true, false, yes, no
If set to true, then only the paragraph styles in the Word document will be processed. No EM or STRONG tags will appear in the output HTML, nor will inline images or hypertext anchors.
spaciousOutput
default is false
possible values are true, false, yes, no
When true, this option outputs blank lines in your HTML source between each HTML element and between each CSS class definition, making the HTML file easier for people to read.
linkedStyleSheet
default is no
possible values are true, false, yes, no
When this option is set to yes, or true, the translator creates a separate file containing the CSS style definitions. The stylesheet has the same name as the output HTML file except that its extension is .css. The HTML includes a LINK element in the HEAD area, which connects the CSS stylesheet to the HTML document.
generateHTMLtitle
default is yes
possible values are true, false, yes, no
When this option is set to yes, or true, the translator derives the content of the HTML <TITLE> element from the Word document's title (see DocumentTitle).
When this setting is false, you cannot specify a document name in the opening dialog, and no <TITLE> element appears in the HTML output. To get a <TITLE> element in the HTML when generateHTMLtitle is false, use the reserved style htmlHead.
Turning generateHTMLtitle off works best in conjunction with the reserved style htmlHead, and is intended for users who wish to specify the HTML title by placing it within the text of the Word document.
As an added convenience, if generateHTMLtitle is true and you delete all text from the Document Title textbox (in the main dialog), no title element will be generated.
wrapLines
default is no
possible values are true, false, yes, no
When this option is set to yes, or true, the translator wraps lines in the output file that are longer than ~110 characters.
zeroHTMLelementMargins
short for: zero HTML element margins
default is yes
possible values are true, false, yes, no
When this option is set to yes, or true, the translator outputs the following style definitions before outputting other class definitions:
P { margin-left: 0; margin-right: 0; margin-top: 0; margin-bottom: 0; }
H1 { margin-left: 0; margin-right: 0; margin-top: 0; margin-bottom: 0; }
H2 { margin-left: 0; margin-right: 0; margin-top: 0; margin-bottom: 0; }
etc.
These definitions reset the browser's default margins for HTML paragraph elements, for all six HTML headings, and for the PRE and ADDRESS elements. This was found to be useful for creating HTML documents that look the same in the browser as the original document looked in Word.
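The preface is regular enough to generate in a loop. The Python sketch below produces one zeroing rule for each element named in the paragraph above: P, H1 through H6, PRE and ADDRESS. The constant and function names are hypothetical. The rule text copies the format shown above.

```python
# Elements whose browser-default margins the preface resets.
ZEROED_ELEMENTS = ["P"] + [f"H{level}" for level in range(1, 7)] + ["PRE", "ADDRESS"]


def zero_margin_preface(elements=ZEROED_ELEMENTS):
    """Return CSS rules that zero all four margins of each element, one per line."""
    rule = "{ margin-left: 0; margin-right: 0; margin-top: 0; margin-bottom: 0; }"
    return "\n".join(f"{element} {rule}" for element in elements)
```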
zeroHTMLbodyMargins
short for: zero HTML body margins
default is no
possible values are true, false, yes, no
When this option is set to yes, or true, the translator outputs the following style definition for the BODY element, which effectively zeros all four margins:
BODY { ... margin-left: 0; margin-right: 0; margin-top: 0; margin-bottom: 0; }
This resets the browser's default margins for the HTML <BODY> element. Although at times useful, this setting is off by default because (in the browsers tested) it forces documents right up against the edge of the window, which isn't generally desirable.
fontMeasures
default is asPercents
possible values are inPoints, absolute, asPercents
If fontMeasures is set to asPercents, font sizes in CSS style definitions are expressed as percentages of the font size of the HTML <BODY> element. Before W2CSS outputs other class definitions, a statement setting the font size of the BODY element is output (see defBodyFontSize). Font measures expressed as percentages are the most scalable of the three fontMeasures configuration choices.
If fontMeasures is set to absolute, font sizes in CSS style definitions are expressed according to the absolute system provided by CSS (see sources for complete explanation). Font measures expressed as absolute values are scalable but are not as flexible as sizes expressed as percents.
If fontMeasures is set to inPoints, font sizes in CSS style definitions are expressed as points. Measures in points are not guaranteed scalable by the CSS specification (although Netscape Navigator 4 does scale them).
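For the default asPercents setting, the arithmetic is a ratio against defBodyFontSize. The minimal Python sketch below assumes rounding to whole percents, because the text does not say how W2CSS rounds. The function name is hypothetical.

```python
def font_size_percent(style_points, body_points=12):
    """Express a Word font size as a percentage of the BODY font size."""
    # body_points plays the role of defBodyFontSize (default 12).
    return f"{round(style_points / body_points * 100)}%"
```

Under these assumptions, an 18-point Word heading with the default 12-point body would come out as 150%.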
leftRightMargins
short for: left and right margin measures
default is asPercents
possible values are inPoints, inEms, asPercents
If leftRightMargins is set to asPercents, the left and right margins in the CSS styles output will be expressed as percentages of the width of the original MS Word document (the width of the page between the page margins). This corresponds closely to how browsers will interpret this value. Margin values expressed as percentages are the most scalable of the 3 leftRightMargins configuration choices.
If leftRightMargins is set to inEms, the left and right margins in the CSS styles output will be expressed as em values based on the font size in the particular Word style. This value works in browsers tested but not as well as the default, asPercents.
If leftRightMargins is set to inPoints, the left and right margins in the CSS styles output will be expressed as point values. Point values are not guaranteed scalable by the CSS specification (although Netscape Navigator 4 does scale them).
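The three leftRightMargins choices amount to three conversions of the same Word margin. The Python sketch below uses the value names listed in this section. The function name, parameter names and number formatting are assumptions. In the sketch, text_width_pt stands for the page width between the Word page margins, and style_font_pt stands for the font size of the Word style.

```python
def margin_value(margin_pt, mode, text_width_pt, style_font_pt):
    """Express a Word left/right margin (given in points) in the chosen CSS unit."""
    if mode == "asPercents":
        # Percentage of the text width between the Word page margins.
        return f"{margin_pt / text_width_pt * 100:.1f}%"
    if mode == "inEms":
        # Ems relative to the font size of this particular Word style.
        return f"{margin_pt / style_font_pt:.2f}em"
    if mode == "inPoints":
        return f"{margin_pt:g}pt"
    raise ValueError(f"unknown leftRightMargins value: {mode}")
```

Take a half-inch (36-point) indent on a page with 6 inches (432 points) of text width, in a 12-point style. The sketch gives 8.3%, 3.00em or 36pt for the three settings.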
otherMeasures
default is inEms
possible values are inPoints, inEms
If otherMeasures is set to inEms, all other measurements in the style (such as padding, space before and after, border thickness, etc.) are expressed as em values based on the font size in the particular Word style. This value works relatively well in the browsers tested and is scalable.
If otherMeasures is set to inPoints, all other measurements in the style (such as padding, space before and after, border thickness, etc.) are expressed as point sizes. Point values are not guaranteed to be scalable by the CSS specification (although Netscape Navigator 4 does scale them).
defBodyFontSize
default is 12
possible values are integers > 0 and < 1000
This is the default point size for the HTML <BODY> element. This number is used in all font percentage calculations, and so is relevant if fontMeasures is set to asPercents (see fontMeasures). Be careful when changing this value, as you can get some weird results.
If you elect to use the CSS absolute scale, you can set the thresholds at which various font sizes (expressed in points in the originating Word document) are sorted into the absolute categories. The defaults are:
xx-large-gt=34
x-large-gt=24
large-gt=18
medium-gt=14
small-gt=12
x-small-gt=10
In the keywords, the gt means greater than. So, for instance, xx-large-gt sets the point size threshold for xx-large, meaning that point sizes greater than or equal to 34 points will be expressed as xx-large in the CSS style definition.
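The sorting can be sketched in Python. This sketch follows the description above literally, so a size at or above a threshold gets that keyword. Two details are assumptions: sizes below x-small-gt are treated as xx-small, and the function name absolute_font_size is hypothetical.

```python
# Default thresholds from the list above, largest first.
DEFAULT_THRESHOLDS = [
    ("xx-large", 34),
    ("x-large", 24),
    ("large", 18),
    ("medium", 14),
    ("small", 12),
    ("x-small", 10),
]


def absolute_font_size(points, thresholds=DEFAULT_THRESHOLDS):
    """Sort a Word point size into a CSS absolute font-size keyword."""
    for keyword, minimum in thresholds:
        if points >= minimum:
            return keyword
    return "xx-small"  # assumption: anything smaller than x-small-gt
```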
The color equivalences option allows you to change the text that is output for various Word color constants. The defaults are:
wdTurquoise=aqua
wdPink=fuschia
wdBlue=blue
wdGray25=silver
wdGray50=gray
wdGreen=green
wdBrightGreen=lime
wdDarkRed=maroon
wdDarkBlue=navy
wdDarkYellow=olive
wdViolet=purple
wdRed=red
wdTeal=teal
wdYellow=yellow
If for some reason you want wdDarkRed to be output as the RGB hex number #802020, place the following line in the configuration file:
wdDarkRed=#802020
The color equivalences option can also produce odd and inappropriate results. For instance, if you place the line
wdDarkRed=dog
in the configuration file, then wherever the color wdDarkRed is found, such as in a font color, the translator will output the word dog.
CSS incorporates a method of specifying fonts so that if a font isn't found on a client's system, a suitable alternate is used instead. W2CSS allows you to specify a series of fonts that will be carried into the CSS font-family property as follows:
On startup, W2CSS looks into the folder specified by the User Templates file location (found under Tools, Options, File Locations, User Templates). In this folder, W2CSS looks for a file named W2CSSfnt.csv. This plain text file contains a comma-delimited list of values that specify font names. (A sample W2CSSfnt.csv file is included among the files that you install with the template.)
The file W2CSSfnt.csv is easy to create and/or edit in a program such as Excel. To do so from Excel, save the output as what Excel calls a CSV, or comma-delimited, file (the suffix csv is appended automatically). If you don't use Excel, you can create and/or edit this font substitution file in any plain text editor.
In Excel, the file looks like this:
In a text editor, the file looks like this:
Times New Roman,Times Roman,Times,serif,
Arial,Helvetica,sans-serif,,
Arial Black,Helvetica,sans-serif,,
Arial Narrow,Helvetica,sans-serif,,
Garamond,Times Roman,Times Roman,Times,serif
Swiss921 BT,Arial Black,Helvetica,sans-serif,
Comic Sans MS,Arial,Helvetica,fantasy,
Shelly Volante BT,cursive,,,
Courier New,Courier,monospace,,
Avante Garde,Arial,Helvetica,sans-serif,
Zapf Chancery,cursive,,,
Trebuchet MS,Arial,Helvetica,sans-serif,
Futura,Arial,Helvetica,sans-serif,
Each line of the file is a series of font names. From left to right, the names on a line specify successively which font families are adequate substitutes for the first font named on the line. So, for instance, on the first line shown above, Times New Roman, if not found by the client browser, can be replaced by Times Roman, which, if not found, can be replaced by Times, which if not found, can be replaced by whatever the browser interprets serif to be.
The trailing commas in this file don't matter.
This list of font substitutions is passed through into the CSS font-family property for each Word style that W2CSS translates.
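The Python sketch below shows how the file's lines could become font-family values. It combines the rules above with the fontSubstitution fallback described with the translation options. A font that is not in the file gets the generic family appended, or nothing if fontSubstitution=none. The function names are hypothetical, and the sketch does not reproduce the translator's VBA. It uses Python's standard csv module.

```python
import csv
import io


def load_font_table(csv_text):
    """Map the first font on each line of W2CSSfnt.csv to the whole line of names."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        # Trailing commas yield empty fields, which are simply dropped.
        names = [name.strip() for name in row if name.strip()]
        if names:
            table[names[0]] = names
    return table


def font_family(word_font, table, font_substitution="sans-serif"):
    """Build the CSS font-family declaration for a font used in a Word style."""
    if word_font in table:
        names = table[word_font]
    elif font_substitution == "none":
        names = [word_font]
    else:
        names = [word_font, font_substitution]
    return "font-family: " + ", ".join(names)
```

With fontSubstitution=serif, an unlisted Baskerville yields font-family: Baskerville, serif, which matches the example given with the translation options.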
Version 2 of W2CSS has been re-written to better support non-English versions of MS Word. However, this support is far from complete.
As much as possible, decisions about stylenames are based on internal VBA constants instead of hard-coding stylenames. So, for instance, instead of comparing a stylename to the English language string Heading 1, the comparison in VBA code is made to the constant wdStyleHeading1:
ActiveDocument.Styles(wdStyleHeading1).NameLocal
This ensures that, for example, a user of the Polish-language version of Word (for whom style wdStyleHeading1 is not named with the English string Heading 1) will get the same results as users of the English-language version.
However, certain Word styles that are important to the HTML translation do not have VBA constant equivalents. For example, users of the English-language version of Word will find (in Word's HTML templates) a predefined style called Preformatted, which maps into the HTML <PRE> element. However, there is no constant called wdStylePreformatted, or anything even close, and there are many other cases like this.
I have spent time working on a remedy to this situation, but it failed the test on other language versions of Word. This situation may be remedied in future versions of the translator.
Batch processing is only available to registered users.
filename, linkedStylesheet, html output, css output
Test0311.doc, n, test.shtml
d:\styles\Test0315a.doc, y, test0315a.shtml, test-shtml.css
d:\styles\Test0415.doc, n
d:\styles\Test0315d.doc, y
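Each batch line can be split on the BatchSeparator character, which is a comma by default. The Python sketch below assumes the column order given in the header line: filename, linkedStylesheet, html output, css output. It also assumes that the last two columns are optional and that y means a linked stylesheet. These are assumptions, and the BatchEntry class and parse_batch_line function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchEntry:
    filename: str
    linked_stylesheet: bool
    html_output: Optional[str] = None
    css_output: Optional[str] = None


def parse_batch_line(line, separator=","):
    """Split one batch list line into its (possibly partial) columns."""
    fields = [field.strip() for field in line.split(separator)]
    fields += [""] * (4 - len(fields))  # pad missing optional columns
    filename, linked, html_output, css_output = fields[:4]
    return BatchEntry(
        filename=filename,
        linked_stylesheet=linked.lower() in ("y", "yes", "true"),
        html_output=html_output or None,
        css_output=css_output or None,
    )
```

Passing separator=";" covers the alternative separator mentioned for the BatchSeparator option.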
In instances where you must maintain a group of documents, it's useful to combine several features of W2CSS, including reserved stylenames, include directives and batch processing.
filename, linkedStylesheet, html output, css output
testA.doc, n
testB.doc, n
testC.doc, n
<pre style="display: none"><em>This page looks best in a style-aware browser.</em></pre>
<PRE class="Preformatted">[<A HREF="testA.html">TestA</A>] [<A HREF="testB.html">TestB</A>] [<A HREF="testC.html">TestC</A>]</PRE>
This program is shareware. You are free to use it in unregistered form. However, you are urged to register. I am a small-time programmer and have produced this program as a service and in relation to other projects I am involved with. I believe that software should be low cost or free, and also that ability to pay should be built into the price structure of software. I hope that you, the user, whoever you are, find this a useful program; and I urge you to show your appreciation and support by registering.
Registering gives you a key that unlocks all program features. To register, send $15 per user to the address below. Sites with 3 or more users can register for $12 per user license. Send only check or money order. You will receive a key by mail or, if you specify, by email.
Among the files included with the template W2CSS.dot are the documentation files W2CSSdoc.doc, an MS Word file, and W2CSSdoc.html, an HTML file created by the W2CSS translator from the original Word document. This HTML has not been further doctored or edited but is the actual output of the translator. The original Word file is included so that you can try generating it yourself.
I cannot take any responsibility for the deficiencies of particular browsers when it comes to implementations of CSS. I mention this because if you view the doc file in various CSS aware browsers, you will get wildly different results.
W2CSS: WORD TO CSS-COMPLIANT HTML TRANSLATOR, Version 2
© Lewis Gartenberg 1998