Expanding MathType's font and character information

Abstract

This document describes how to expand MathType's font and character knowledge. Using the techniques shown here, a MathType user can assign a character set (encoding) to a font, create a new encoding, and define the PostScript font name to be used in EPS file generation.

Introduction

MathType has a good deal of knowledge about the fonts and characters it works with. This results in more accurate formatting, better translation into TeX and EPS, and improvements in many other areas. Most of this knowledge is in the form of tables built into the code. However, it can also be external to the program, allowing it to be expanded and corrected without having to change the application itself. More sophisticated users can also customize MathType, and we (or they) can then make the results available to other users.

The purpose of this document is to define the font/character data and the mechanisms and file formats that MathType uses.

Encodings

An encoding is a one-to-one correspondence between character meanings and integers. For example, ASCII is an encoding that maps characters onto numbers between 0 and 127, and "a" is assigned the number 97. Fonts are said to use or have a specific encoding. The font's encoding determines what character gets displayed when we pass a given number to the operating system. Note that character style and shape play no part in the encoding concept: a Times-Roman "a" has the same value as a Bookman-Italic "a" (assuming the fonts use the same encoding). A code point is a particular value in an encoding. For example, "a" has the code point 97 in the ASCII encoding.

The MTCode encoding

Central to MathType's font information is the MTCode encoding. MTCode assigns a 16-bit value to every distinct character that MathType works with. It is a superset of Unicode, a standard encoding that attempts to assign a unique number to each of the characters used in the world's languages. Unicode covers a lot of math, but not all the math characters that MathType needs. For this reason, MTCode uses Unicode's Private Use Area (PUA), a range of 6400 code points (0xE000 to 0xF8FF), for its additional math characters. MathType uses MTCode values as the key to all its per-character information: human-readable character descriptions, TeX translations, and token type (variable, operator, etc.).
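The arithmetic behind the Private Use Area figures quoted above can be checked with a short Python sketch. The function names are illustrative only and are not part of MathType; the sketch only tests whether a 16-bit value lies in the PUA and says nothing about which characters MathType actually places there.

```python
# Bounds of Unicode's Private Use Area, as quoted in the text.
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF


def pua_size() -> int:
    """Number of code points in the PUA (the range is inclusive)."""
    return PUA_LAST - PUA_FIRST + 1


def is_private_use(code_point: int) -> bool:
    """True if a 16-bit value falls inside the PUA."""
    return PUA_FIRST <= code_point <= PUA_LAST


if __name__ == "__main__":
    print(pua_size())              # size of the inclusive range
    print(ord("a"))                # code point of "a" in ASCII/Unicode
    print(is_private_use(0x0061))  # an ordinary Unicode letter
    print(is_private_use(0xE000))  # first PUA code point
```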

For more information on Unicode, see the Unicode Consortium. The Unicode Standard, Version 5.0 documents the standard in paperback book form (ISBN 0-321-48091-0) and is available at the better computer book stores (or, of course, Amazon.com). To find out about MathType's use of Unicode's Private Use Area, see MTCode Encoding Tables.

Font encodings

Every font is the expression of some character set. In fact, many fonts share the same character set. We use the term "font encoding" to represent a character set that might be shared by one or more fonts. Many applications (e.g. word processors) don't have to know a font's encoding: the user hits a key, a code is sent to the app, and the code is sent back to the operating system to select a character from a font for display. MathType needs to know more.

A font encoding can be thought of as a table with two columns: the position within the font (a numerical index) and an MTCode code point value (the number from MathType's master character list, MTCode, that uniquely identifies the character). MathType gives each font encoding a name (e.g. WindowsANSI, MacStd, Symbol). Many encodings are named after the single font that uses them, but many fonts share the same encoding. For example, standard Roman fonts on Windows all have the WindowsANSI encoding.

MathType represents every encoding other than MTCode as a mapping onto MTCode. That is, for every code point in a given encoding it indicates a unique MTCode code point. Using this mapping, MathType can get at all the per-character info for any code point in the encoding and, therefore, for any character in fonts that have that encoding. For this reason, knowing a font's encoding is very important.
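The two-step lookup described above (font to encoding, then font position to MTCode, then MTCode to per-character information) can be sketched in Python. This is only an illustration of the idea, not MathType's internal data structure. The entry 28 -> 0x226E is taken from the encoding-file example later in this document; the encoding "HypotheticalMath" and the font "My Math Font" are invented for the example.

```python
from typing import Dict, Optional

# Per-character information, keyed by MTCode value.
CHAR_INFO: Dict[int, str] = {0x226E: "NOT LESS-THAN"}

# Each font encoding maps a position within the font to an MTCode value.
ENCODINGS: Dict[str, Dict[int, int]] = {
    "Symbol": {28: 0x226E},             # from the example line in this document
    "HypotheticalMath": {200: 0x226E},  # invented encoding, illustration only
}

# Which encoding each font has (the second font is hypothetical).
FONT_ENCODING: Dict[str, str] = {
    "Symbol": "Symbol",
    "My Math Font": "HypotheticalMath",
}


def describe(font: str, position: int) -> Optional[str]:
    """Return the description of the character at a font position, if known."""
    encoding_name = FONT_ENCODING.get(font)
    if encoding_name is None:
        return None  # no encoding is known for this font
    mtcode = ENCODINGS[encoding_name].get(position)
    if mtcode is None:
        return None  # the encoding does not define this position
    return CHAR_INFO.get(mtcode)


if __name__ == "__main__":
    print(describe("Symbol", 28))
    print(describe("My Math Font", 200))
    print(describe("Some Unlisted Font", 40))
```

Because both encodings map onto the same MTCode value, the character's information is stored once and is reachable from either font.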

Unfortunately, the computer's operating system tells us very little about the encodings of fonts, least of all for those containing math symbols. So, MathType has to keep its own knowledge of which encoding each font has. Of course, as people can create their own fonts, the set of font encodings is open-ended.

To view MathType's font encodings, see Font Encoding Tables.

Extension scenarios

MathType's font information may be extended in the following ways:

Define that a font has a given existing encoding.
Define (or override) the PostScript font name of a font.
Define a new encoding and specify that certain font(s) have that encoding.
Define new MTCode values and their attributes (description, default style, token type) and use them in a new encoding.

Determining what MathType knows about a font

If you have a font installed on your computer that MathType doesn't seem to know anything about, the first thing to do is to verify that this is the case. The easiest way to do this is via MathType's Insert Symbol dialog. Follow these steps:

Choose Insert Symbol from the Edit menu.
Choose Font in the View by menu.
Choose the font in question from the font menu just to the right of the View by menu.
Look at the Encoding name displayed directly under the character grid.

If the encoding name is "Unknown", it means MathType has no encoding for that font. This is not always a bad thing. You can still use such fonts in your equations; MathType will simply not know anything about the characters from that font. This is perfectly acceptable for a font like Wingdings or ZapfDingbats. If you add the "telephone" character from Wingdings to your equation, for example, MathType will simply not attempt to put any spacing around it. If, however, the font in question is full of math symbols, this is a situation you might want to correct.

Assigning an encoding to a font

Defining the encoding for a font falls into two cases:

The font's character set matches that of another font for which MathType does know the encoding. In this case, all you need to do is to assign the same encoding to your font. This is described in the rest of this section.
The font's character set is completely unique. In this case, you will have to create a new encoding first, then assign it to your font. See Creating a New Font Encoding.

Once you have decided to assign an encoding to a font, the next step is to determine if its character set matches that of a font for which MathType already has an encoding. This is easy if you designed your own font as a perfect substitute for another font that MathType already knows about. For example, if you created your own version of the Symbol font, its encoding would be the same as the Symbol font's, namely "Symbol". If you aren't in this lucky situation, you'll have to work a bit harder. There are a couple of ways to do this:

If you have access to the Internet, you can view MathType's Font Encoding Tables and try to find an exact match in terms of characters and their positions in the font. It helps to display the font in question in MathType's Insert Symbol dialog, as described in the previous section.
Display the font in question in the Windows Character Map utility. Use MathType's Insert Symbol dialog to view other math fonts on your system, looking for an exact match in terms of characters and their positions in the font. If you find a match, write down the encoding name for the font.

If you find an encoding that matches your new font, you have to tell MathType about it by adding to the Fonts section of FontInfo.ini. See The FontInfo.ini File and Font Sections for details. If you assign an encoding to a font, you should consider letting the MathType tech support department know so we can add it to the built-in font knowledge of the next version of MathType. Just send an email to support@wiris.com and mention the font name and the encoding name (please be precise). If you can, send us a copy of the font and any other information associated with it. If it is a commercially available font, let us know who makes it.

Defining a font's PostScript name

MathType produces Encapsulated PostScript (EPS) files. These must refer to fonts using their PostScript names, not their operating system names (i.e. the font names listed in MathType and other applications' dialogs and menus). So, for example, "Times New Roman" must be referred to as "Times-Roman" in an EPS file. Unfortunately, this is another piece of information that the operating system doesn't give up easily. In fact, the PostScript names are dependent on the PostScript environment in the printer. As EPS files can be sent anywhere, it is difficult to determine automatically what the PostScript names should be. So, MathType must do the best it can to come up with default PostScript names but let the user have ultimate control over this.

If you have Adobe Type Manager (ATM) installed, MathType can communicate with ATM to get the PostScript name for any PostScript font. Names obtained this way are the correct ones for use in an EPS file. For a TrueType screen font paired with a PostScript printer font, we can obtain the PostScript name from the TrueType font's Name table. This is also reliable. Where a TrueType font is used both for the screen and printing, MathType can get a PostScript name from the TrueType font, but it will be fairly meaningless as the name is only used (if at all) by the printer driver to name the temporary PostScript font it generates during printing. This is generally not available to an EPS file during printing.

The bottom line is that, if you are going to create EPS files with MathType, you should install ATM. This way you can be confident that MathType can determine the PostScript name of each font it needs. However, in order to handle exceptions to the rule, you can set the PostScript name for a font by adding to the Fonts section of FontInfo.ini. See The FontInfo.ini File and Font Sections for details.

Creating a new font encoding

Creating a brand new font encoding is easy, but a bit tedious. Before you start creating your own encoding, you should contact the MathType tech support department at support@wiris.com to see if we have already defined an encoding for the font in question. Even if we haven't, we can suggest a good name for your new encoding that won't clash with the names of other encodings.

Font encodings are defined using a text file in MathType's Fonts directory with a filename that follows these rules:

It must be unique within the Fonts directory;
It should end with a .enc extension;
It should be indicative of the name of the encoding (or identical to it).

Font encoding example.

The first line of the encoding file defines the name of the encoding.

FontEncoding, 1.0, Byte, Symbol

Your encoding file's first line must be exactly like this, except that "Symbol" should be replaced by the name of your encoding. The name of an encoding should be alphanumeric characters without any spaces or punctuation and starting with a letter (e.g. DatapageMath3).
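The header layout and naming rule above can be checked mechanically. The following Python sketch is one way to do so; the function name is illustrative, and the tolerance for extra spaces around the commas is an assumption made here (the text asks for the line to be written exactly as shown, so a stricter check may be appropriate).

```python
import re
from typing import Optional

# An encoding name: alphanumeric, no spaces or punctuation, starts with a letter.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def parse_header(line: str) -> Optional[str]:
    """Return the encoding name from a header line, or None if it is invalid."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4:
        return None
    # The first three fields are fixed, as in the example header.
    if fields[:3] != ["FontEncoding", "1.0", "Byte"]:
        return None
    name = fields[3]
    return name if NAME_RE.fullmatch(name) else None


if __name__ == "__main__":
    print(parse_header("FontEncoding, 1.0, Byte, Symbol"))
    print(parse_header("FontEncoding, 1.0, Byte, DatapageMath3"))
    print(parse_header("FontEncoding, 1.0, Byte, Datapage Math"))
```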

Any blank lines are ignored. Any line that starts with # is a comment:

# Purpose: Symbol font encoding

The rest of the file must contain lines that define the characters in the encoding. For example,

28,226E,NOT LESS-THAN

The three fields are (from left to right):

The position within the font. This is shown in the Insert Symbol dialog for the selected character in the Font position readout to the right of the character grid.
The MTCode (Unicode) value that uniquely defines the character. You can get this by using Insert Symbol to find the character in another font, then reading the value in the Unicode readout to the right of the character grid. Alternatively, you can find it in either MTCode Encoding Tables or the Font Encoding Tables.
The character description. This doesn't define the character's description but it must match the description in the MTCode tables. This redundant information helps avoid many errors.

These lines must be in order by the position in the font (the first field).
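The rules for the character lines can be summarized in a small Python sketch that reads the lines following the header. It is a sketch under stated assumptions rather than a description of MathType's own parser: the font position is assumed to be written in decimal (the text does not say), the MTCode value is read as hexadecimal (as in 226E), and the function name is illustrative. Note that it does not verify the description against the MTCode tables.

```python
from typing import List, Tuple

# (position in font, MTCode value, character description)
Entry = Tuple[int, int, str]


def parse_entries(body: str) -> List[Entry]:
    """Parse the character lines of an encoding file (header line excluded)."""
    entries: List[Entry] = []
    for raw_line in body.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are ignored
        position_text, mtcode_text, description = line.split(",", 2)
        position = int(position_text)  # assumption: decimal font position
        mtcode = int(mtcode_text, 16)  # MTCode values are hexadecimal
        if entries and position <= entries[-1][0]:
            raise ValueError(f"position {position} is out of order")
        entries.append((position, mtcode, description.strip()))
    return entries


SAMPLE_BODY = """\
# Purpose: Symbol font encoding

28,226E,NOT LESS-THAN
"""

if __name__ == "__main__":
    for position, mtcode, description in parse_entries(SAMPLE_BODY):
        print(position, f"{mtcode:04X}", description)
```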

Adding new characters to MTCode

Although we have attempted to put every character into the MTCode encoding that is in common use by mathematicians and scientists, this is a goal that we can approach but never reach. If, in the process of creating a new encoding, you come across characters that are not in MTCode (i.e. not in the Unicode Specification or in the MTCode Encoding Tables), you have two choices:

Define the character in question as "undefined". This means MathType will not know the identity of the character. This is acceptable in some situations. To make a character "undefined", you would use a line like this:

35,F700,UNKNOWN CHARACTER

Contact Wiris to have the character added to MTCode. To do this, send email to our tech support department at support@wiris.com and provide the following information:
- an example of the character as a GIF file (or fax a good-quality rendition of the character and, if possible, a page of math in which it occurs);
- what font it occurs in (if possible, send us the font itself);
- a suggested name for the character;
- any additional information on how the character is used (e.g. is it a binary operator like '+', etc.).
We will assign a new MTCode value to the character and tell you how to proceed from there.

Reference

Files containing information regarding fonts, encodings, and PostScript font names are installed into a Fonts subdirectory inside the MathType directory. This directory contains:

PostScript and TrueType subdirectories containing copies of MathType fonts.
A FontInfo.ini file, a text file that contains all user-defined extensions to MathType's font and character knowledge.

In addition, this directory is the place for any font encoding definition files that you have created or obtained from other users (or the MathType web site).

All extensions to MathType's font knowledge are done by changing FontInfo.ini and, possibly, adding font encoding definition files to the Fonts directory.

The FontInfo.ini file

This file is in Windows initialization file format (like that of WIN.INI in MS Windows 3.1) and consists of:

Comment lines with version, author, and copyright info.
Multiple [Font<num>] sections, each of which contains information for a specific font, including its encoding and PostScript names.
An [Encoding] section that, for a given encoding, specifies the file name of the encoding definition file.
An [MTCode] section that, for a given MTCode value, specifies its attributes.

Font sections

For each font for which additional information (i.e. PostScript name and/or encoding) is to be specified, the FontInfo.ini file must include a [Font<num>] section (<num> is simply an integer to make the section names unique; e.g. Font1, Font2). Each such section may contain one or more key/value pairs, of which only the Name key is required, to identify the font whose attributes are being overridden. Any omitted keys will cause the corresponding value to be determined in the default manner described in the earlier sections of this document.

Name = <operating system font name>

Identifies the font being described by the other keys in the section. The <operating system font name> value is the name of the font as it appears in MathType's font and style dialogs. This key is required.

Encoding = <encoding name>

Identifies the encoding of the font. The <encoding name> value must be the name of a built-in encoding or one defined in the [Encoding] section (see below for details).

PSName<num> = <style> , <PostScript name>

Where <num> is simply an integer to make the key names unique; e.g. PSName1, PSName2. The <style> value must be P for plain, B for bold, I for italic, or BI for bold-italic. The <PostScript name> is the name to be used in the EPS file.
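As an illustration of how the three keys fit together, the Python sketch below reads a [Font<num>] section with the standard configparser module. The sample section is hypothetical: the font "My Symbol Clone" and its PostScript names are invented, and Python's INI parser is only assumed to be close enough to the Windows format for this purpose.

```python
import configparser
import re
from typing import Dict

# Hypothetical FontInfo.ini fragment, for illustration only.
SAMPLE_INI = """\
[Font1]
Name = My Symbol Clone
Encoding = Symbol
PSName1 = P , MySymbolClone
PSName2 = B , MySymbolClone-Bold
"""


def read_fonts(ini_text: str) -> Dict[str, dict]:
    """Collect encoding and PostScript names for each [Font<num>] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(ini_text)
    fonts: Dict[str, dict] = {}
    for section in parser.sections():
        if not re.fullmatch(r"Font\d+", section):
            continue
        keys = parser[section]
        ps_names: Dict[str, str] = {}
        for key, value in keys.items():
            if key.startswith("PSName"):
                # Value layout: <style> , <PostScript name>
                style, ps_name = (part.strip() for part in value.split(",", 1))
                ps_names[style] = ps_name
        fonts[keys["Name"]] = {
            "encoding": keys.get("Encoding"),  # None if the key is omitted
            "ps_names": ps_names,
        }
    return fonts


if __name__ == "__main__":
    print(read_fonts(SAMPLE_INI))
```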

The encoding section

Contains lines of the form:

<encoding name> = <encoding definition file>

…where <encoding name> defines an encoding (or overrides a built-in encoding) using the data in the file referred to by <encoding definition file>. This file, whose extension is .enc, must exist and the encoding name in this line must match that stored in the file.
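The consistency requirement above (the file must exist, and the name in FontInfo.ini must match the name stored in the file) can be sketched as a check in Python. The sketch makes several assumptions: the section is named [Encoding] as in the file overview, the encoding name is the last comma-separated field of the file's first line, and a dictionary of file contents stands in for the Fonts directory so that the example runs without touching the disk. The function name and sample data are illustrative.

```python
import configparser
from typing import Dict, List


def check_encoding_section(ini_text: str, files: Dict[str, str]) -> List[str]:
    """Return a list of problems found in the [Encoding] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # encoding names are case-sensitive here
    parser.read_string(ini_text)
    problems: List[str] = []
    if not parser.has_section("Encoding"):
        return problems
    for encoding_name, file_name in parser["Encoding"].items():
        if not file_name.lower().endswith(".enc"):
            problems.append(f"{file_name}: extension is not .enc")
        content = files.get(file_name)
        if content is None:
            problems.append(f"{file_name}: file not found")
            continue
        lines = content.splitlines()
        first_line = lines[0] if lines else ""
        stored_name = first_line.split(",")[-1].strip()
        if stored_name != encoding_name:
            problems.append(
                f"{file_name}: stores '{stored_name}', expected '{encoding_name}'"
            )
    return problems


SAMPLE_INI = "[Encoding]\nDatapageMath3 = DatapageMath3.enc\n"
SAMPLE_FILES = {"DatapageMath3.enc": "FontEncoding, 1.0, Byte, DatapageMath3\n"}

if __name__ == "__main__":
    print(check_encoding_section(SAMPLE_INI, SAMPLE_FILES))
```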

The MTCode section