Expanding MathType's font and character information
Abstract
This document describes how to expand MathType's font and character knowledge. Using the techniques shown here, a MathType user can assign a character set (encoding) to a font, create a new encoding, and define the PostScript font name to be used in EPS file generation.
Introduction
MathType knows a good deal about the fonts and characters it works with. This results in more accurate formatting, better translation into TeX and EPS, and improvements in many other areas. Most of this knowledge is in the form of tables built into the code. However, it can also be external to the program, allowing it to be expanded and corrected without having to change the application itself. Also, more sophisticated users can customize MathType and we (or they) can then make the results available to other users.
The purpose of this document is to define the font/character data and the mechanisms and file formats that MathType uses.
Encodings
An encoding is a one-to-one correspondence between character meanings and integers. For example, ASCII is an encoding that maps characters onto numbers between 0 and 127, in which "a" is assigned the number 97. Fonts are said to use or have a specific encoding. The font's encoding determines what character gets displayed when we pass a given number to the operating system. Note that character style and shape play no part in the encoding concept: a Times-Roman "a" has the same value as a Bookman-Italic "a" (assuming the fonts use the same encoding). A code point is a particular value in an encoding. For example, "a" has the code point 97 in the ASCII encoding.
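As a quick illustration of the code point idea, the Python sketch below uses the built-in ord() and chr() functions. These work with Unicode code points, which agree with ASCII for values 0 to 127; the snippet is independent of MathType itself.

```python
# An encoding maps characters to integers. Python's ord() and chr()
# expose the Unicode code point, which matches ASCII for values 0-127.
code_point = ord("a")   # character -> code point
character = chr(97)     # code point -> character

print(code_point)  # 97
print(character)   # a
```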
The MTCode encoding
Central to MathType's font information is the MTCode encoding. MTCode assigns a 16-bit constant to every different character that MathType works with. It is a superset of Unicode, a standard encoding that attempts to assign a unique number to each of the characters used in the world's languages. Unicode covers a lot of math, but not all the math characters that MathType needs. For this reason, MTCode uses Unicode's Private Use Area (PUA), a range of 6400 code points (0xE000 to 0xF8FF), for its additional math characters. MathType uses MTCode values as the key to all its per-character information --- human-readable character descriptions, TeX translations, token type (variable, operator, etc.).
For more information on Unicode, see the Unicode Consortium. The Unicode Standard, Version 5.0 documents the standard in paperback book form (ISBN 0-321-48091-0) and is available at the better computer book stores (or, of course, Amazon.com). To find out about MathType's use of Unicode's Private Use Area, see MTCode Encoding Tables.
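The Private Use Area range quoted above can be checked with a few lines of Python. This is only a sketch; the function name is_private_use is made up for this example and is not part of MathType.

```python
# Bounds of the Unicode Private Use Area, as quoted in the text above.
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF

def is_private_use(mtcode: int) -> bool:
    """Return True if a 16-bit MTCode value lies in the Private Use Area."""
    return PUA_FIRST <= mtcode <= PUA_LAST

# Number of code points in the range (both ends inclusive).
pua_size = PUA_LAST - PUA_FIRST + 1

print(pua_size)                # 6400
print(is_private_use(0xE915))  # True  (an MTCode-specific math character)
print(is_private_use(0x226E))  # False (a standard Unicode character)
```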
Font encodings
Every font is the expression of some character set. In fact, many fonts share the same character set. We use the term "font encoding" to represent a character set that might be shared by one or more fonts. Many applications (e.g. word processors) don't have to know a font's encoding --- the user hits a key, a code is sent to the app, and the code is sent back to the operating system to select a character from a font for display. MathType needs to know more.
A font encoding can be thought of as a table with two columns: the position within the font (a numerical index) and an MTCode code point value (the number from MathType's master character list, MTCode, that uniquely identifies the character). MathType gives each font encoding a name (e.g. WindowsANSI, MacStd, Symbol). Many encodings are used by a single font and are named after it. Others are shared by many fonts. For example, standard Roman fonts on Windows all have the WindowsANSI encoding.
MathType represents every encoding other than MTCode as a mapping onto MTCode. That is, for every code point in a given encoding it indicates a unique MTCode code point. Using this mapping, MathType can get at all the per-character info for any code point in the encoding and, therefore, for any character in fonts that have that encoding. For this reason, knowing a font's encoding is very important.
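The two-step lookup described above (font position to MTCode value, then MTCode value to per-character information) can be sketched in Python as follows. The three rows are copied from the MathPi4 example in this document; the dictionary names and the describe function are hypothetical and say nothing about how MathType stores its tables internally.

```python
from typing import Dict, Optional

# Font position -> MTCode value. These three rows come from the MathPi4
# example in this document; a real encoding has many more.
MATHPI4_SAMPLE: Dict[int, int] = {
    0x21: 0x22D0,
    0x25: 0xE915,
    0x31: 0x002B,
}

# Hypothetical per-character table keyed by MTCode value. Only the
# description is modeled here; TeX translations and token types are omitted.
DESCRIPTIONS: Dict[int, str] = {
    0x22D0: "DOUBLE SUBSET",
    0xE915: "APPROXIMATE SUBSET OF",
    0x002B: "PLUS SIGN",
}

def describe(encoding: Dict[int, int], position: int) -> Optional[str]:
    """Look up a description via the font-position -> MTCode mapping."""
    mtcode = encoding.get(position)
    if mtcode is None:
        return None  # this position is not covered by the encoding
    return DESCRIPTIONS.get(mtcode)

print(describe(MATHPI4_SAMPLE, 0x25))  # APPROXIMATE SUBSET OF
```

Because every encoding maps onto the same MTCode keys, the per-character table only has to exist once, however many encodings refer to it.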
Unfortunately, the computer's operating system tells us very little about the encodings of fonts, least of all for those containing math symbols. So, MathType has to keep its own knowledge of which encoding each font has. Of course, as people can create their own fonts, the set of font encodings is open-ended.
To view MathType's font encodings, see Font Encoding Tables.
Extension scenarios
MathType font info may be extended in the following ways:
Define that a font has a given existing encoding.
Define (or override) the PostScript font name of a font.
Define a new encoding and specify that certain font(s) have that encoding.
Define new MTCode values and their attributes (description, default style, token type) and use them in a new encoding.
Determining what MathType knows about a font
If you have a font installed on your computer that MathType doesn't seem to know anything about, the first thing to do is to verify that this is the case. The easiest way to do this is via MathType's Insert Symbol dialog. Follow these steps:
Choose Insert Symbol from the Edit menu.
Choose Font in the View by menu.
Choose the font in question from the font menu just to the right of the View by menu.
Look at the Encoding name displayed directly under the character grid.
If the encoding name is "Unknown", it means MathType has no encoding for that font. This is not always a bad thing. You can still use such fonts in your equations; MathType will simply not know anything about the characters from that font. This is perfectly acceptable for a font like Wingdings or ZapfDingbats. If you add the "telephone" character from Wingdings to your equation, for example, MathType will simply not attempt to put any spacing around it. If, however, the font in question is full of math symbols, this is a situation you might want to correct.
Assigning an encoding to a font
Defining the encoding for a font falls into two cases:
The font's character set matches that of another font for which MathType does know the encoding. In this case, all you need to do is to assign the same encoding to your font. This is described in the rest of this section.
The font's character set is completely unique. In this case, you will have to create a new encoding first, then assign it to your font. See Creating a New Font Encoding.
Once you have decided to assign an encoding to a font, the next step is to determine if its character set matches that of a font for which MathType already has an encoding. This is easy if you designed your own font as a perfect substitute for another font that MathType already knows about. For example, if you created your own version of the Symbol font, its encoding would be the same as the Symbol font's --- "Symbol". If you aren't in this lucky situation, you'll have to work a bit harder. There are a couple of ways to do this:
If you have access to the Internet, you can view MathType's Font Encoding Tables and try to find an exact match in terms of characters and their positions in the font. It helps to display the font in question in MathType's Insert Symbol dialog, as described in the previous section.
Display the font in question in the Windows Character Map utility. Use MathType's Insert Symbol dialog to view other math fonts on your system, looking for an exact match in terms of characters and their positions in the font. If you find a match, write down the encoding name for the font.
If you find an encoding that matches your new font, you have to tell MathType about it by adding to the Fonts section of FontInfo.ini. See The FontInfo.ini File and Font Sections for details. If you assign an encoding to a font, you should consider letting the MathType tech support department know so we can add it to the built-in font knowledge of the next version of MathType. Just send an email to support@wiris.com and mention the font name and the encoding name (please be precise). If you can, send us a copy of the font and any other information associated with it. If it is a commercially available font, let us know who makes it.
Defining a font's PostScript name
MathType produces Encapsulated PostScript (EPS) files. These must refer to fonts using the PostScript names of fonts, not their operating system name (i.e. the font names listed in MathType and other applications' dialogs and menus). So, for example, "Times New Roman" must be referred to as "Times-Roman" in an EPS file. Unfortunately, this is another piece of information that the operating system doesn't give up easily. In fact, the PostScript names are dependent on the PostScript environment in the printer. As EPS files can be sent anywhere, it is difficult to determine what the PostScript names should be automatically. So, MathType must do the best it can to come up with default PostScript names but let the user have ultimate control over this.
If you have Adobe Type Manager (ATM) installed, MathType can communicate with ATM to get the PostScript name for any PostScript font. Names obtained this way are the correct ones for use in an EPS file. For a TrueType screen font paired with a PostScript printer font, we can obtain the PostScript name from the TrueType font's Name table. This is also reliable. Where a TrueType font is used both for the screen and printing, MathType can get a PostScript name from the TrueType font, but it will be fairly meaningless as the name is only used (if at all) by the printer driver to name the temporary PostScript font it generates during printing. This is generally not available to an EPS file during printing.
The bottom line is that, if you are going to create EPS files with MathType, you should install ATM. This way you can be confident that MathType can determine the PostScript name of each font it needs. However, in order to handle exceptions to the rule, you can set the PostScript name for a font by adding to the Fonts section of FontInfo.ini. See The FontInfo.ini File and Font Sections for details.
Creating a new font encoding
Creating a brand new font encoding is easy, but a bit tedious. Before you start creating your own encoding, you should contact the MathType tech support department at support@wiris.com to see if we have already defined an encoding for the font in question. Even if we haven't, we can suggest a good name for your new encoding that won't clash with the names of other encodings.
Font encodings are defined using a text file in MathType's Fonts directory with a filename that follows these rules:
It must be unique within the Fonts directory;
It should end with a .enc extension;
It should be indicative of the name of the encoding (or identical to it).
The first line of the encoding file defines the name of the encoding.
FontEncoding, 1.0, Byte, Symbol
Your encoding file's first line must be exactly like this, except that "Symbol" should be replaced by the name of your encoding. The name of an encoding should be alphanumeric characters without any spaces or punctuation and starting with a letter (e.g. DatapageMath3).
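The first-line and naming rules above can be checked mechanically. The Python sketch below is one possible validator, not MathType's own parser; the function name parse_header is made up for this example, and tolerating extra whitespace around the commas is an assumption.

```python
import re

# Naming rule from the text: alphanumeric characters only, starting with a letter.
ENCODING_NAME = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def parse_header(line: str) -> str:
    """Validate the first line of an .enc file and return the encoding name."""
    fields = [field.strip() for field in line.split(",")]
    # Assumption: whitespace around the commas is not significant.
    if len(fields) != 4 or fields[:3] != ["FontEncoding", "1.0", "Byte"]:
        raise ValueError(f"unexpected header: {line!r}")
    name = fields[3]
    if not ENCODING_NAME.fullmatch(name):
        raise ValueError(f"invalid encoding name: {name!r}")
    return name

print(parse_header("FontEncoding, 1.0, Byte, Symbol"))         # Symbol
print(parse_header("FontEncoding, 1.0, Byte, DatapageMath3"))  # DatapageMath3
```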
Any blank lines are ignored. Any line that starts with # is a comment:
# Purpose: Symbol font encoding
The rest of the file must contain lines that define the characters in the encoding. For example,
28,226E,NOT LESS-THAN
The three fields are (from left to right):
The position within the font. This is shown in the Insert Symbol dialog for the selected character in the Font position readout to the right of the character grid.
The MTCode (Unicode) value that uniquely defines the character. You can get this by using Insert Symbol to find the character in another font, then reading the value in the Unicode readout to the right of the character grid. Alternatively, you can find it in either MTCode Encoding Tables or the Font Encoding Tables.
The character description. This doesn't define the character's description but it must match the description in the MTCode tables. This redundant information helps avoid many errors.
These lines must be in order by the position in the font (the first field).
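The line format described above is simple enough to parse in a few lines of Python. The sketch below handles the body of an encoding file (everything after the first line). It assumes both the font position and the MTCode value are written in hexadecimal, as the MathPi4 example at the end of this document suggests; the function name parse_entries is hypothetical.

```python
from typing import List, Tuple

def parse_entries(text: str) -> List[Tuple[int, int, str]]:
    """Parse 'position,MTCode,description' lines from an .enc file body."""
    entries = []
    for raw in text.splitlines():
        # Blank lines are ignored; lines starting with '#' are comments.
        if not raw.strip() or raw.startswith("#"):
            continue
        position, mtcode, description = raw.split(",", 2)
        # Assumption: both numeric fields are hexadecimal.
        entries.append((int(position, 16), int(mtcode, 16), description.strip()))
    positions = [position for position, _, _ in entries]
    if positions != sorted(positions):
        raise ValueError("entries must be ordered by font position")
    return entries

SAMPLE_BODY = """\
# Purpose: two rows from the MathPi4 example

21,22D0,DOUBLE SUBSET
22,22C2,N-ARY INTERSECTION
"""

entries = parse_entries(SAMPLE_BODY)
print(entries)
```

A line with fewer than three fields, or with a non-hexadecimal number, raises ValueError, which is a convenient way to catch typing mistakes before handing the file to MathType.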
Adding new characters to MTCode
Although we have attempted to put every character into the MTCode encoding that is in common use by mathematicians and scientists, this is a goal that we can approach but never reach. If, in the process of creating a new encoding, you come across characters that are not in MTCode (i.e. not in the Unicode Specification or in the MTCode Encoding Tables), you have two choices:
Define the character in question as "undefined". This means MathType will not know the identity of the character. This is acceptable in some situations. To make a character "undefined", you would use a line like this:
35,F700,UNKNOWN CHARACTER
Contact Wiris to have the character added to MTCode. To do this, send email to our tech support department at support@wiris.com and provide the following information:
an example of the character as a GIF file (or fax a good-quality rendition of the character and, if possible, a page of math in which it occurs);
what font it occurs in (if possible, send us the font itself);
a suggested name for the character;
any additional information on how the character is used (e.g. is it a binary operator like '+', etc.).
We will assign a new MTCode value to the character and tell you how to proceed from there.
Reference
Files containing information regarding fonts, encodings, and PostScript font names are installed into a Fonts subdirectory inside the MathType directory. This directory contains:
PostScript and TrueType subdirectories containing copies of MathType fonts.
A FontInfo.ini file, a text file that contains all user-defined extensions to MathType's font and character knowledge.
In addition, this directory is the place for any font encoding definition files that you have created or obtained from other users (or the MathType web site).
All extensions to MathType's font knowledge are made by changing FontInfo.ini and, possibly, adding font encoding definition files to the Fonts directory.
The FontInfo.ini file
This file is in Windows initialization file format (like that of WIN.INI in MS Windows 3.1) and consists of:
Comment lines with version, author, and copyright info.
Multiple [Font<num>] sections, each of which contains information for a specific font, including its encoding and PostScript names.
An [Encoding] section that, for a given encoding, specifies the file name of the encoding definition file.
An [MTCode] section that, for a given MTCode value, specifies its attributes.
Font sections
For each font for which additional information (i.e. PostScript name and/or encoding) is to be specified, the FontInfo.ini file must include a [Font<num>] section (<num> is simply an integer to make the section names unique; e.g. Font1, Font2). Each such section may contain one or more key/value pairs, of which only the Name key is required, as it identifies the font whose attributes are being overridden. Any omitted keys will cause the corresponding value to be determined in the default manner described in the earlier sections of this document.
Name = <operating system font name>
Identifies the font being described by the other keys in the section. The <operating system font name> value is the name of the font as it appears in MT's font and style dialogs. This key is required.
Encoding = <encoding name>
Identifies the encoding of the font. The <encoding name> value must be the name of a built-in encoding or one defined in the [Encoding] section (see below for details).
PSName<num> = <style> , <PostScript name>
Where <num> is simply an integer to make the key names unique; e.g. PSName1, PSName2. The <style> value must be P for plain, B for bold, I for italic, or BI for bold-italic. The <PostScript name> is the name to be used in the EPS file.
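To make the key list above concrete, the Python sketch below reads a made-up [Font1] section with the standard configparser module. The font name "My Symbol Clone" and its PostScript names are invented for illustration, and configparser is only an approximation of how Windows reads initialization files, so treat this as a way to sanity-check a file rather than a description of MathType's behavior.

```python
import configparser
from typing import Dict

# Hypothetical FontInfo.ini fragment; the font and PostScript names are invented.
SAMPLE_INI = """\
[Font1]
Name = My Symbol Clone
Encoding = Symbol
PSName1 = P , MySymbolClone
PSName2 = B , MySymbolClone-Bold
"""

VALID_STYLES = {"P", "B", "I", "BI"}  # plain, bold, italic, bold-italic

def read_font_sections(text: str) -> Dict[str, dict]:
    """Collect encoding and PostScript names from every [Font<num>] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; the default lowercases keys
    parser.read_string(text)
    fonts = {}
    for section in parser.sections():
        if not (section.startswith("Font") and section[4:].isdigit()):
            continue  # skip [Encoding], [MTCode], and anything else
        keys = parser[section]
        if "Name" not in keys:
            raise ValueError(f"[{section}] is missing the required Name key")
        ps_names = {}
        for key, value in keys.items():
            if key.startswith("PSName"):
                style, ps_name = (part.strip() for part in value.split(",", 1))
                if style not in VALID_STYLES:
                    raise ValueError(f"[{section}] {key}: unknown style {style!r}")
                ps_names[style] = ps_name
        fonts[keys["Name"]] = {"encoding": keys.get("Encoding"), "ps_names": ps_names}
    return fonts

fonts = read_font_sections(SAMPLE_INI)
print(fonts["My Symbol Clone"])
```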
The encoding section
Contains lines of the form:
<encoding name> = <encoding definition file>
…where <encoding name> defines an encoding (or overrides a built-in encoding) using the data in the file referred to by <encoding definition file>. This file, whose extension is .enc, must exist and the encoding name in this line must match that stored in the file.
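The requirement that the encoding name in FontInfo.ini match the name stored in the definition file can be checked as in the following Python sketch. The function name check_encoding_entry is hypothetical, and treating the name comparison as case-sensitive is an assumption.

```python
def check_encoding_entry(ini_line: str, enc_first_line: str) -> bool:
    """Check an [Encoding] line against the first line of its .enc file."""
    name, filename = (part.strip() for part in ini_line.split("=", 1))
    # The encoding name is the last comma-separated field of the header line.
    declared = enc_first_line.split(",")[-1].strip()
    # Assumption: names are compared case-sensitively.
    return filename.lower().endswith(".enc") and name == declared

print(check_encoding_entry("MathPi4 = mathpi4.enc",
                           "FontEncoding, 1.0, Byte, MathPi4"))  # True
print(check_encoding_entry("MathPi4 = mathpi4.enc",
                           "FontEncoding, 1.0, Byte, Symbol"))   # False
```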
The MTCode section
Contains lines of the form:
<MTCode value in hex> = <token type>,<default style>,<description>
where <token type> is NONE, NUM, VAR, FUNC, OPER, BINOP, RELOP, OPEN, CLOSE, FENCE, PUNCT, INNER, CTRL, or SPACE.
where <default style> is NONE, TEXT, FUNCTION, VARIABLE, LCGREEK, UCGREEK, SYMBOL, VECTOR, NUMBER, USER1, USER2, MTEXTRA, or TEXT_FE. This is only used if the character can be entered from the keyboard in Math mode.
where <description> is the human-readable description shown in the status bar for a given character. Note that this description may not be in the same language as the user's copy of MathType. If this concerns the user, they can localize the description in the extension file or wait for a future version of MathType to include it in its localizable description table and, of course, for that version to be localized to the user's language.
Caution
On the Mac, prefix the MTCode value with M+, such as M+2820.
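The [MTCode] line format and its allowed values can be validated with the Python sketch below. The sample line pairs the MTCode value E915 with the token type RELOP and the style SYMBOL purely for illustration; those attributes are assumptions, not values taken from MathType's tables. The function name parse_mtcode_line is likewise made up.

```python
TOKEN_TYPES = {"NONE", "NUM", "VAR", "FUNC", "OPER", "BINOP", "RELOP",
               "OPEN", "CLOSE", "FENCE", "PUNCT", "INNER", "CTRL", "SPACE"}
DEFAULT_STYLES = {"NONE", "TEXT", "FUNCTION", "VARIABLE", "LCGREEK", "UCGREEK",
                  "SYMBOL", "VECTOR", "NUMBER", "USER1", "USER2", "MTEXTRA",
                  "TEXT_FE"}

def parse_mtcode_line(line: str):
    """Parse '<hex> = <token type>,<default style>,<description>'."""
    key, value = (part.strip() for part in line.split("=", 1))
    if key.startswith("M+"):
        key = key[2:]  # strip the Mac prefix described above
    code = int(key, 16)
    token_type, style, description = (f.strip() for f in value.split(",", 2))
    if token_type not in TOKEN_TYPES:
        raise ValueError(f"unknown token type: {token_type!r}")
    if style not in DEFAULT_STYLES:
        raise ValueError(f"unknown default style: {style!r}")
    return code, token_type, style, description

# Hypothetical entry: the token type and style are illustrative guesses.
entry = parse_mtcode_line("E915 = RELOP,SYMBOL,APPROXIMATE SUBSET OF")
print(entry)
```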
Font encoding example
FontEncoding, 1.0, Byte, MathPi4
# mathpi4.enc --- © 2018, WIRIS Europe (Maths for more S.L), All rights reserved.
# Project: mtcode, Component: common
# Purpose: Mathematical Pi 4 encoding
# code,MTCode,UnicodeName
20,0020,SPACE
21,22D0,DOUBLE SUBSET
22,22C2,N-ARY INTERSECTION
23,2286,SUBSET OF OR EQUAL TO
24,2287,SUPERSET OF OR EQUAL TO
25,E915,APPROXIMATE SUBSET OF
26,E917,SUBSET OF WITH DOT
27,2283,SUPERSET OF
28,E971,SUBSET OVER SUBSET
29,E970,SUPERSET OVER SUPERSET
2A,E916,SUPERSET OF WITH DOT
2B,E972,SUPERSET OVER SUBSET
2C,2282,SUBSET OF
2D,2034,TRIPLE PRIME
2E,2283,SUPERSET OF
2F,2044,FRACTION SLASH
30,2033,DOUBLE PRIME
31,002B,PLUS SIGN
32,002D,HYPHEN-MINUS
33,00D7,MULTIPLICATION SIGN
34,00F7,DIVISION SIGN
35,003D,EQUALS SIGN
36,00B1,PLUS-MINUS SIGN
37,2213,MINUS-OR-PLUS SIGN
38,00B0,DEGREE SIGN
39,2032,PRIME
3A,22C3,N-ARY UNION
3B,2282,SUBSET OF
3C,22C3,N-ARY UNION
3D,2207,NABLA
3E,22C2,N-ARY INTERSECTION
3F,00B7,MIDDLE DOT
40,22D1,DOUBLE SUPERSET
41,0391,GREEK CAPITAL LETTER ALPHA
42,0392,GREEK CAPITAL LETTER BETA
43,03A8,GREEK CAPITAL LETTER PSI
44,0394,GREEK CAPITAL LETTER DELTA
45,0395,GREEK CAPITAL LETTER EPSILON
46,03A6,GREEK CAPITAL LETTER PHI
47,0393,GREEK CAPITAL LETTER GAMMA
48,0397,GREEK CAPITAL LETTER ETA
49,0399,GREEK CAPITAL LETTER IOTA
4A,039E,GREEK CAPITAL LETTER XI
4B,039A,GREEK CAPITAL LETTER KAPPA
4C,039B,GREEK CAPITAL LETTER LAMDA
4D,039C,GREEK CAPITAL LETTER MU
4E,039D,GREEK CAPITAL LETTER NU
4F,039F,GREEK CAPITAL LETTER OMICRON
50,03A0,GREEK CAPITAL LETTER PI
51,0398,GREEK CAPITAL LETTER THETA
52,03A1,GREEK CAPITAL LETTER RHO
53,03A3,GREEK CAPITAL LETTER SIGMA
54,03A4,GREEK CAPITAL LETTER TAU
55,0398,GREEK CAPITAL LETTER THETA
56,03A9,GREEK CAPITAL LETTER OMEGA
57,03D0,GREEK BETA SYMBOL
58,03A7,GREEK CAPITAL LETTER CHI
59,03D2,GREEK UPSILON WITH HOOK SYMBOL
5A,0396,GREEK CAPITAL LETTER ZETA
5B,2208,ELEMENT OF
5C,2205,EMPTY SET
5D,220B,CONTAINS AS MEMBER
5E,E914,APPROXIMATE SUPERSET OF
5F,E973,SUBSET OVER SUPERSET
60,221E,INFINITY
61,03B1,GREEK SMALL LETTER ALPHA
62,03B2,GREEK SMALL LETTER BETA
63,03C8,GREEK SMALL LETTER PSI
64,03B4,GREEK SMALL LETTER DELTA
65,03B5,GREEK SMALL LETTER EPSILON
66,03C6,GREEK SMALL LETTER PHI
67,03B3,GREEK SMALL LETTER GAMMA
68,03B7,GREEK SMALL LETTER ETA
69,03B9,GREEK SMALL LETTER IOTA
6A,03BE,GREEK SMALL LETTER XI
6B,03BA,GREEK SMALL LETTER KAPPA
6C,03BB,GREEK SMALL LETTER LAMDA
6D,03BC,GREEK SMALL LETTER MU
6E,03BD,GREEK SMALL LETTER NU
6F,03BF,GREEK SMALL LETTER OMICRON
70,03C0,GREEK SMALL LETTER PI
71,03D1,GREEK THETA SYMBOL
72,03C1,GREEK SMALL LETTER RHO
73,03C3,GREEK SMALL LETTER SIGMA
74,03C4,GREEK SMALL LETTER TAU
75,03B8,GREEK SMALL LETTER THETA
76,03C9,GREEK SMALL LETTER OMEGA
77,03D5,GREEK PHI SYMBOL
78,03C7,GREEK SMALL LETTER CHI
79,03C5,GREEK SMALL LETTER UPSILON
7A,03B6,GREEK SMALL LETTER ZETA
7B,2208,ELEMENT OF
7C,2223,DIVIDES
7D,220B,CONTAINS AS MEMBER
7E,221D,PROPORTIONAL TO
A2,2289,NEITHER A SUPERSET OF NOR EQUAL TO
A3,2288,NEITHER A SUBSET OF NOR EQUAL TO
A7,03C2,GREEK SMALL LETTER FINAL SIGMA
AB,03B5,GREEK SMALL LETTER EPSILON
AD,2202,PARTIAL DIFFERENTIAL
AE,2285,NOT A SUPERSET OF
B0,2260,NOT EQUAL TO
B5,2284,NOT A SUBSET OF
BE,2285,NOT A SUPERSET OF
C0,2285,NOT A SUPERSET OF
C3,03D6,GREEK PI SYMBOL
C6,03F0,GREEK KAPPA SYMBOL
C9,2284,NOT A SUBSET OF
D2,2209,NOT AN ELEMENT OF
D3,2209,NOT AN ELEMENT OF
D4,220C,DOES NOT CONTAIN AS MEMBER
D5,220C,DOES NOT CONTAIN AS MEMBER
D6,2285,NOT A SUPERSET OF
DC,2288,NEITHER A SUBSET OF NOR EQUAL TO
DD,2289,NEITHER A SUPERSET OF NOR EQUAL TO
DE,2260,NOT EQUAL TO
F2,2284,NOT A SUBSET OF
F7,2284,NOT A SUBSET OF
F8,22C3,N-ARY UNION
F9,22C2,N-ARY INTERSECTION
FA,2284,NOT A SUBSET OF
FB,019B,LATIN SMALL LETTER LAMBDA WITH STROKE