Font knowledge

Extending font knowledge

In some cases it may be necessary to add to the font knowledge of MathFlow. MathFlow can use a configuration text file called FontInfo.ini. If you place this file in the same directory as the DLL, MathFlow automatically tries to use it when the DLL is loaded. This section explains how to use this feature.

Using the techniques shown here, a user can assign a character set (encoding) to a font, create a new encoding, and define the PostScript font name to be used in EPS file generation.

MathFlow contains knowledge of the fonts and characters it works with, which results in improved formatting and translation into EPS. Most of this knowledge is in the form of tables built into the code. However, this information can be extended via the FontInfo.ini file external to the program, allowing it to be expanded and corrected without having to change the application itself.

Encodings

An encoding is a one-to-one correspondence between character meanings and integers. For example, ASCII is an encoding that maps characters onto numbers between 0 and 127 and "a" is assigned the number 97. Fonts are said to use or have a specific encoding. The font's encoding determines what character gets displayed when we pass a given number to the operating system. Character style and shape play no part in the encoding concept — A Times-Roman "a" has the same value as a Bookman-Italic "a" (assuming the fonts use the same encoding). A code-point is a particular value in an encoding. For example, "a" has the code-point 97 in the ASCII encoding.
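As a small illustration of the code-point idea, the following Python sketch looks up the ASCII code-point of a character. The function name ascii_code_point is made up for this example and is not part of MathFlow.

```python
def ascii_code_point(ch: str) -> int:
    """Return the ASCII code-point (0 to 127) of a single character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not an ASCII character")
    return code


print(ascii_code_point("a"))  # 97
print(chr(97))                # a
```

The font's style plays no part here: the number 97 means "a" whichever ASCII-encoded font is used to draw it.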

The MTCode encoding

Central to our font information is the MTCode encoding. MTCode assigns a 16-bit constant to every different character that our software works with. It is a superset of Unicode, a standard encoding that attempts to assign a unique number to each of the characters used in the world's languages. Unicode covers a lot of math, but not all the math characters that we need. For this reason, MTCode uses Unicode's Private Use Area (PUA), a range of 6400 code points (0xE000 to 0xF8FF), for its additional math characters. We use MTCode values as the key to all of the per-character information, such as human-readable character descriptions and token types (variable, operator, etc.).
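The Private Use Area range mentioned above can be checked with a few lines of Python. This is only a sketch; the names PUA_START, PUA_END and is_private_use are invented for the example.

```python
PUA_START = 0xE000  # first code point of Unicode's Private Use Area
PUA_END = 0xF8FF    # last code point of the Private Use Area


def is_private_use(code_point: int) -> bool:
    """True if the value lies in the range MTCode uses for its extra math characters."""
    return PUA_START <= code_point <= PUA_END


print(PUA_END - PUA_START + 1)   # 6400
print(is_private_use(0xE915))    # True
print(is_private_use(0x2282))    # False
```

A value such as E915 is therefore an MTCode addition, while 2282 is an ordinary Unicode value.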

For more information on Unicode, see the Unicode Consortium. To find out about how we use the Unicode's Private Use Area, see the MTCode encoding tables.

Font encodings

Every font is the expression of some character set. In fact, many fonts share the same character set. We use the term "font encoding" to represent a character set that might be shared by one or more fonts. Many applications (e.g. word processors) don't have to know a font's encoding — the user hits a key, a code is sent to the application, the code is sent back to the operating system to select a character from a font for display. Our software needs to know more than that.

A font encoding can be thought of as a table with two columns: the position within the font (a numerical index) and an MTCode code point value (the number from our own master character list, MTCode, that uniquely identifies the character). We give each font encoding a name (e.g. WindowsANSI, MacStd, Symbol). Many encodings are used by a single font and are named after it. Other encodings are shared by many fonts. For example, standard ISO Latin-1 fonts on Windows all have the WindowsANSI encoding.

Our software represents every encoding other than MTCode as a mapping onto MTCode. That is, for every code-point in a given encoding it indicates a unique MTCode code-point. Using this mapping, we can get at all the per-character information for any code-point in the encoding and, therefore, for any character in fonts that have that encoding. For this reason, knowing a font's encoding is very important.
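The idea of reaching per-character information through the mapping onto MTCode can be sketched in Python as follows. This is not MathFlow's internal data structure. The table names, the font name ExampleMathFont and the function describe are all hypothetical, and the two sample entries are taken from the font encoding example later in this section.

```python
# Hypothetical per-character table, keyed by MTCode value.
MTCODE_INFO = {
    0x0391: "Greek capital letter Alpha",
    0x221E: "Infinity",
}

# A font encoding: position within the font -> MTCode code-point.
EXAMPLE_ENCODING = {0x41: 0x0391, 0x60: 0x221E}

# Which encoding each font has (hypothetical font name).
FONT_ENCODINGS = {"ExampleMathFont": EXAMPLE_ENCODING}


def describe(font_name, position):
    """Return the character description, or None if nothing is known."""
    encoding = FONT_ENCODINGS.get(font_name)
    if encoding is None:
        return None  # the font's encoding is unknown
    mtcode = encoding.get(position)
    if mtcode is None:
        return None  # no character defined at this position
    return MTCODE_INFO.get(mtcode)


print(describe("ExampleMathFont", 0x41))  # Greek capital letter Alpha
print(describe("SomeOtherFont", 0x41))    # None
```

Without an entry for the font in the first lookup, none of the per-character information can be reached, which is why the font's encoding matters.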

Unfortunately, the computer's operating system tells us very little about the encodings of fonts, least of all for those containing math symbols. So, we have to keep our own knowledge of which encoding each font has. Of course, as people can create their own fonts, the set of font encodings is open-ended.

Extension Scenarios

Font information may be extended in the following ways:

Define that a font has a given existing encoding.
Define (or override) the PostScript font name of a font.
Define a new encoding and specify that certain font(s) have that encoding.
Define new MTCode values and their attributes (description, default style, token type) and use them in a new encoding.

Determining What MathFlow Knows

Since the font information extension mechanism described here also applies to MathType, it can be very useful to do the setup with MathType first to verify that the configuration file uses the proper syntax. When the file is finished, copy it into the same directory as the DLL.

If you have a font installed on your computer that MathFlow does not seem to know anything about, the first thing to do is to verify that this is the case. The easiest way to do this is via MathType's Insert Symbol dialog. Follow these steps after opening MathType (if you don't own MathType, you can install a trial version):

Choose Insert Symbol from the Edit menu.
Choose Font in the View by menu.
Choose the font in question from the font menu just to the right of the View by menu.
Look at the Encoding name displayed directly under the character grid.

If the encoding name is "Unknown", it means our database of fonts and characters has no information for that font.

Assigning an encoding

Defining the encoding for a font falls into two cases:

The font's character set matches that of another font for which the software already knows the encoding. In this case, all you need to do is assign the same encoding to your font. This is described in the rest of this section.
The font's character set is completely unique. In this case, you will have to create a new encoding first, and then assign it to your font. See Creating a New Encoding.

Once you have decided to assign an encoding to a font, the next step is to determine if its character set matches that of a font for which our software already has an encoding. This is easy if you designed your own font as a perfect substitute for another font that our software already knows about. For example, if you created your own version of the Symbol font, its encoding would be the same as the Symbol font's — "Symbol". If you aren't in this lucky situation, you'll have to work a bit harder. There are a couple of ways to do this:

If you have access to the Internet, you can view our font encoding tables and try to find an exact match in terms of characters and their positions in the font. It helps to display the font in question in MathType's Insert Symbol dialog, as described in the previous section.
Display the font in question in Windows' Character Map utility. Use MathFlow Insert Symbol dialog to view other math fonts on your system, looking for an exact match in terms of characters and their positions in the font. If you find a match, write down the encoding name for the font.

If you found an encoding that matches your new font, you have to tell MathFlow about it by adding to the Fonts section of FontInfo.ini. See The FontInfo.ini file and Font sections for details. If you assign an encoding to a font, you should consider letting our tech support department know so we can add it to the built-in font knowledge of the next version of our software. Just send an email to support@wiris.com and mention the font name and the encoding name (please be precise). If you can, send us a copy of the font and any other information associated with it. If it is a commercially available font, let us know who makes it.

Defining a PostScript name

As stated earlier, our software produces Encapsulated PostScript (EPS) files. These must refer to fonts using their PostScript names, not their operating system names (the font names listed in MathType and other applications' dialogs and menus). So, for example, "Times New Roman" must be referred to as "Times-Roman" in an EPS file. Unfortunately, this is another piece of information that the operating system doesn't give up easily.

On Windows, our software can generally communicate with the operating system to get the PostScript name for any PostScript font. Names obtained this way are the correct ones for use in an EPS file. For TrueType fonts used both for the screen and printing, our software can often, but not always, obtain the PostScript name from the TrueType font. In order to handle exception cases, however, you can set the PostScript name for a font by adding to the Fonts section of FontInfo.ini. See The FontInfo.ini file and Font sections for details.

Creating a new encoding

Creating a new font encoding is easy, but a bit tedious. Font encodings are defined using a text file that is placed in the same directory as the MathFlow DLL. For an example of what this file should look like, see Font encoding example.

The filename should follow these rules:

It must be unique within the directory that contains the MathFlow DLL.
It should end with an .enc extension.
It should be indicative of the name of the encoding (or identical to it).

The first line of the encoding file defines the name of the encoding. Here is an example:

FontEncoding, 1.0, Byte, Symbol

Your encoding file's first line must start with exactly "FontEncoding, 1.0", which identifies this file as a font encoding file whose version number is 1.0. The next field is either Byte or Word. This determines whether the position within the font (the first field of each character line) is written as two or four hexadecimal characters, as the examples in this section show: they use Byte, with two-character positions and four-character MTCode values. For example, if you use Byte, a position is expressed as (for example) AE. If you use Word, that same number must be expressed as 00AE. The last part of this line is the name of your encoding. So, instead of "Symbol", use the name of your own encoding (which should also be the name of the file, followed by .enc). The name of an encoding should consist of alphanumeric characters without any spaces or punctuation, starting with a letter (e.g. DatapageMath3).
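The rules for the first line can be checked before the file is deployed. The following Python sketch is one possible way to do so. It is not how MathFlow itself reads the file, and the names EncodingHeader and parse_header are invented for this example.

```python
import re
from typing import NamedTuple


class EncodingHeader(NamedTuple):
    version: str
    width: str  # "Byte" or "Word"
    name: str   # name of the encoding


# Alphanumeric, no spaces or punctuation, starting with a letter.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")


def parse_header(line: str) -> EncodingHeader:
    """Validate the first line of an .enc file and split it into fields."""
    if not line.startswith("FontEncoding, 1.0"):
        raise ValueError("first line must start with 'FontEncoding, 1.0'")
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4 or fields[1] != "1.0":
        raise ValueError("expected: FontEncoding, 1.0, Byte|Word, <name>")
    if fields[2] not in ("Byte", "Word"):
        raise ValueError("third field must be Byte or Word")
    if not NAME_RE.fullmatch(fields[3]):
        raise ValueError("encoding name must be alphanumeric and start with a letter")
    return EncodingHeader(fields[1], fields[2], fields[3])


print(parse_header("FontEncoding, 1.0, Byte, Symbol"))
```

A header such as "FontEncoding, 1.0, Byte, My Font" is rejected by this check because the name contains a space.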

Any blank lines are ignored. Any line that starts with # is a comment. For example:

# Purpose: Symbol font encoding

The rest of the file must contain lines that define the characters in the encoding. For example,

28,226E,Not less-than

The three fields are (from left to right):

The position within the font as a hexadecimal value. This is shown in the MathType Insert Symbol dialog for the selected character in the Font position readout to the right of the character grid.
The MTCode (Unicode) value that uniquely defines a character. You can get this by using MathType's Insert Symbol dialog to find the character in another font, then reading the value in the Unicode readout to the right of the character grid. Alternatively, you can find it in either the MTCode Encoding Tables or the Font Encoding Tables.
A human-readable character description. This does not define the character's description but it must match the description in the MTCode tables. This redundant information helps avoid many errors.

Caution

Note: These lines must be in order by the position in the font (the first field).
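Because the ordering rule is easy to break when editing by hand, it can be worth checking the character lines with a short script. The Python sketch below parses the three fields and verifies the order. The function names are hypothetical, and treating a repeated position as an error is an assumption of this sketch rather than a documented rule.

```python
def parse_char_line(line):
    """Split 'position,MTCode,description' into (int, int, str)."""
    position, mtcode, description = line.split(",", 2)
    return int(position, 16), int(mtcode, 16), description.strip()


def parse_char_lines(lines):
    """Parse character lines, skipping blank lines and # comments."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        entry = parse_char_line(line)
        # Assumption: positions must strictly increase (no repeats).
        if entries and entry[0] <= entries[-1][0]:
            raise ValueError(f"position {entry[0]:02X} is out of order")
        entries.append(entry)
    return entries


sample = ["# Purpose: Symbol font encoding", "", "28,226E,Not less-than"]
print(parse_char_lines(sample))  # [(40, 8814, 'Not less-than')]
```

The values are printed in decimal: hexadecimal 28 is 40 and hexadecimal 226E is 8814.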

Font encoding example

FontEncoding, 1.0, Byte, MathPi4
#
# This is an example of a font encoding file.
#
20,0020,Space
21,22D0,Double subset
22,22C2,N-ary intersection
23,2286,Subset of or equal to
24,2287,Superset of or equal to
25,E915,Approximate subset of
26,E917,Subset of with dot; is included in as sub-relation
27,2283,Superset of
28,E971,Subset over subset
29,E970,Superset over superset
2A,E916,Superset of with dot; includes as sub-relation
2B,E972,Superset over subset
2C,2282,Subset of
2D,2034,Triple prime
2E,2283,Superset of
2F,2044,Fraction slash
30,2033,Double prime
31,002B,Plus sign
32,002D,Hyphen-minus
33,00D7,Multiplication sign
34,00F7,Division sign
35,003D,Equals sign
36,00B1,Plus-minus sign
37,2213,Minus-plus sign
38,00B0,Degree sign
39,2032,Prime
3A,22C3,N-ary union
3B,2282,Subset of
3C,22C3,N-ary union
3D,2207,Gradient (nabla)
3E,22C2,N-ary intersection
3F,00B7,Middle dot
40,22D1,Double superset
41,0391,Greek capital letter Alpha
42,0392,Greek capital letter Beta
43,03A8,Greek capital letter Psi
44,0394,Greek capital letter Delta
45,0395,Greek capital letter Epsilon
46,03A6,Greek capital letter Phi
47,0393,Greek capital letter Gamma
48,0397,Greek capital letter Eta
49,0399,Greek capital letter Iota
4A,039E,Greek capital letter Xi
4B,039A,Greek capital letter Kappa
4C,039B,Greek capital letter Lamda
4D,039C,Greek capital letter Mu
4E,039D,Greek capital letter Nu
4F,039F,Greek capital letter Omicron
50,03A0,Greek capital letter Pi
51,0398,Greek capital letter Theta
52,03A1,Greek capital letter Rho
53,03A3,Greek capital letter Sigma
54,03A4,Greek capital letter Tau
55,0398,Greek capital letter Theta
56,03A9,Greek capital letter Omega
57,03D0,Greek beta symbol
58,03A7,Greek capital letter Chi
59,03D2,Greek upsilon with hook symbol
5A,0396,Greek capital letter Zeta
5B,2208,Element of
5C,2205,Empty set
5D,220B,Contains as member
5E,E914,Approximate superset of
5F,E973,Subset over superset
60,221E,Infinity
61,03B1,Greek small letter alpha
62,03B2,Greek small letter beta
63,03C8,Greek small letter psi
64,03B4,Greek small letter delta
65,03B5,Greek small letter epsilon
66,03C6,Greek small letter phi
67,03B3,Greek small letter gamma
68,03B7,Greek small letter eta
69,03B9,Greek small letter iota
6A,03BE,Greek small letter xi
6B,03BA,Greek small letter kappa
6C,03BB,Greek small letter lamda
6D,03BC,Greek small letter mu
6E,03BD,Greek small letter nu
6F,03BF,Greek small letter omicron
70,03C0,Greek small letter pi
71,03D1,Greek theta symbol
72,03C1,Greek small letter rho
73,03C3,Greek small letter sigma
74,03C4,Greek small letter tau
75,03B8,Greek small letter theta
76,03C9,Greek small letter omega
77,03D5,Greek phi symbol
78,03C7,Greek small letter chi
79,03C5,Greek small letter upsilon
7A,03B6,Greek small letter zeta
7B,2208,Element of
7C,2223,Divides
7D,220B,Contains as member
7E,221D,Proportional to
A2,2289,Neither a superset of nor equal to
A3,2288,Neither a subset of nor equal to
A7,03C2,Greek small letter final sigma
AB,03B5,Greek small letter epsilon
AD,2202,Partial differential
AE,2285,Not a superset of
B0,2260,Not equal to
B5,2284,Not a subset of
BE,2285,Not a superset of
C0,2285,Not a superset of
C3,03D6,Greek pi symbol
C6,03F0,Greek kappa symbol
C9,2284,Not a subset of
D2,2209,Not an element of
D3,2209,Not an element of
D4,220C,Does not contain as member
D5,220C,Does not contain as member
D6,2285,Not a superset of
DC,2288,Neither a subset of nor equal to
DD,2289,Neither a superset of nor equal to
DE,2260,Not equal to
F2,2284,Not a subset of
F7,2284,Not a subset of
F8,22C3,N-ary union
F9,22C2,N-ary intersection
FA,2284,Not a subset of
FB,019B,Latin small letter lambda with stroke

Adding new characters

Although we have attempted to put every character in common use by mathematicians and scientists into the MTCode encoding, this is a goal that we can approach but never reach. If, in the process of creating a new encoding, you come across characters that are not in MTCode (i.e. not in the Unicode Specification or in the MTCode Encoding Tables), you have two choices:

Define the character in question as "undefined". This means MathFlow will not know the identity of the character. This is acceptable in some situations. To make a character "undefined", you would use a line like this:

35,F700,Unknown character

Contact us to have the character added to MTCode. To do this, send email to our tech support department at support@wiris.com and provide the following information:
- an example of the character as a GIF file (send a PDF or screen shot of a good-quality rendition of the character and, if possible, a page of math in which it occurs);
- what font it occurs in (if possible, send us the font itself);
- a suggested name for the character;
- any additional information on how the character is used (e.g., is it a binary operator like '+', etc.).

We will assign a new MTCode value to the character and tell you how to proceed from there.

The FontInfo.ini file

This file is in Windows initialization (INI) file format and consists of:

Comment lines with version, author, and copyright info.
Multiple [Font<num>] sections, each of which contains information for a specific font, including its encoding and PostScript names.
An [Encoding] section that, for a given encoding, specifies the file name of the encoding definition file.
An [MTCode] section that, for a given MTCode value, specifies its attributes.

As mentioned above, after you create this file, save it into the same directory as the MathFlow DLL.

Font sections

For each font for which additional information (i.e. PostScript name and/or encoding) is to be specified, the FontInfo.ini file must include a [Font<num>] section (<num> is simply an integer to make the section names unique; e.g. Font1, Font2). Each such section may contain one or more key/value pairs, of which only the Name key is required, to identify the font whose attributes are being overridden. Any omitted keys will cause the corresponding value to be determined in the default manner described in the earlier sections of this document.

Name = <operating system font name>

Identifies the font being described by the other keys in the section. The <operating system font name> value is the name of the font as it appears in MathFlow style dialog. This key is required.

Encoding = <encoding name>

Identifies the encoding of the font. The <encoding name> value must be the name of a built-in encoding or one defined in the [Encoding] section (see below for details).

PSName<num> = <style> , <PostScript name>

Where <num> is simply an integer, to make the key names unique (e.g. PSName1, PSName2). The <style> value must be P for plain, B for bold, I for italic, or BI for bold-italic. The <PostScript name> is the name to be used in the EPS file.
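To check a font section before copying the file next to the DLL, the keys can be read with Python's standard configparser module. The sketch below is only an illustration: the font name ExampleMathFont and its PostScript names are made up, the key names follow the key definitions above (Name, Encoding, PSName<num>), and MathFlow's own parser may differ from configparser in details such as comment handling.

```python
import configparser
import re

# Hypothetical FontInfo.ini fragment, used only for this example.
SAMPLE = """\
[Font1]
Name = ExampleMathFont
Encoding = Symbol
PSName1 = P, ExampleMathFont-Regular
PSName2 = B, ExampleMathFont-Bold
"""

STYLES = {"P": "plain", "B": "bold", "I": "italic", "BI": "bold-italic"}


def read_fonts(text):
    """Return {font name: {"encoding": ..., "ps_names": {style: name}}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(text)
    fonts = {}
    for section in parser.sections():
        if not re.fullmatch(r"Font\d+", section):
            continue
        items = parser[section]
        if "Name" not in items:
            raise ValueError(f"[{section}] has no Name key")
        ps_names = {}
        for key, value in items.items():
            if key.startswith("PSName"):
                style, ps_name = (part.strip() for part in value.split(",", 1))
                if style not in STYLES:
                    raise ValueError(f"[{section}] {key}: unknown style {style!r}")
                ps_names[style] = ps_name
        fonts[items["Name"]] = {"encoding": items.get("Encoding"), "ps_names": ps_names}
    return fonts


print(read_fonts(SAMPLE))
```

A section that omits Encoding simply yields None for that entry here, mirroring the rule that omitted keys fall back to the default behavior.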

The Encoding Section

Contains lines of the form:

<encoding name> = <encoding definition file>

where <encoding name> defines an encoding (or overrides a built-in encoding) using the data in the file referred to by <encoding definition file>. This file, whose extension is .enc, must exist, and the encoding name in this line must match the name stored in the file.
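The requirement that the two names match can be verified with a small script. The Python sketch below assumes that each line of the [Encoding] section has the INI form <encoding name> = <encoding definition file>, and it takes the encoding name from the last field of the definition file's first line. The function check_encoding_entries and its read_first_line parameter are hypothetical helpers, not part of MathFlow.

```python
import configparser


def check_encoding_entries(ini_text, read_first_line):
    """Return a list of problems found in the [Encoding] section.

    read_first_line(filename) must return the first line of that file.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # encoding names are case-sensitive here
    parser.read_string(ini_text)
    problems = []
    if not parser.has_section("Encoding"):
        return problems
    for name, filename in parser["Encoding"].items():
        if not filename.lower().endswith(".enc"):
            problems.append(f"{name}: {filename} does not end with .enc")
            continue
        header = [field.strip() for field in read_first_line(filename).split(",")]
        if len(header) != 4 or header[3] != name:
            problems.append(f"{name}: name in {filename} does not match")
    return problems


# Stand-in for real files so that the example is self-contained.
files = {"MathPi4.enc": "FontEncoding, 1.0, Byte, MathPi4"}
ini = "[Encoding]\nMathPi4 = MathPi4.enc\n"
print(check_encoding_entries(ini, files.__getitem__))  # []
```

With real files, read_first_line could be a function that opens the file in the DLL directory and returns its first line. A missing file then raises an error, which matches the rule that the file must exist.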

The MTCode Section

It is unlikely that you will need to use this section, but we provide the information about this section for completeness. This section contains lines of the form:

where <token type> is NONE, NUM, VAR, FUNC, OPER, BINOP, RELOP, OPEN, CLOSE, FENCE, PUNCT, INNER, CTRL, or SPACE.
where <default style> is NONE, TEXT, FUNCTION, VARIABLE, LCGREEK, UCGREEK, SYMBOL, VECTOR, NUMBER, USER1, USER2, MTEXTRA, or TEXT_FE. (This is only used by MathFlow
where <description> is a human-readable description for a given character.
