I’m often given copy for web sites as Word documents. As one would expect, these documents contain all sorts of symbols that should be converted to entities before they can be used in a web page.
For example, ‘smart quotes’ (curly quotes), trademark and registered symbols, em-dashes, and other symbols look great, but can cause problems if you just drop them into a page and don’t use the corresponding character encoding.
Rather than do a global find-and-replace in my HTML editor, I’ve written a few Word macros that replace these symbols.
The first macro, CleanupHTML, replaces smart quotes with straight quotes. The second macro, PrettyHTML, replaces smart quotes with the correct pretty quote, ala ’.
At some point, I want to extend the macro to replace accented characters, such as è, and also to wrap bold and italics in strong and em tags. Ideally, I’d also find a way of converting lists in Word to straight text wrapped in unordered list items, as lists seem to require the most cleanup when copying and pasting.
Sub CleanupHTML() ' ' CleanupHTML Macro ' Cleanup document for HTML by replacing special characters with entities. ' Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "&" .Replacement.Text = "&" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "©" .Replacement.Text = "©" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "®" .Replacement.Text = "®" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "™" .Replacement.Text = "™" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "--" .Replacement.Text = "—" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "—" .Replacement.Text = "—" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "–" .Replacement.Text = "–" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "…" .Replacement.Text = "..." .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "'" .Replacement.Text = "'" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll With Selection.Find .Text = """" .Replacement.Text = """" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll End Sub Sub PrettyHTML() ' ' PrettyHTML Macro ' ' Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "&" .Replacement.Text = "&" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "©" .Replacement.Text = "©" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "®" .Replacement.Text = "®" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "™" .Replacement.Text = "™" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "--" .Replacement.Text = "—" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "—" .Replacement.Text = "—" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "–" .Replacement.Text = "–" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "…" .Replacement.Text = "..." .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = ChrW(8220) .Replacement.Text = "“" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = ChrW(8221) .Replacement.Text = "”" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "'" .Replacement.Text = "’" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting With Selection.Find .Text = "è" .Replacement.Text = "è" .Forward = True .Wrap = wdFindContinue .Format = False .MatchCase = False .MatchWholeWord = False .MatchWildcards = False .MatchSoundsLike = False .MatchAllWordForms = False End With Selection.Find.Execute Replace:=wdReplaceAll Selection.Find.ClearFormatting Selection.Find.Replacement.ClearFormatting End Sub
Many thanks to
http://celebritycola.blogspot.com/2004/09/preserving-formatting-when-posting.html for getting me started on this.