Font knowledge
Extending font knowledge
In some cases it may be necessary to extend MathFlow's font knowledge. MathFlow can use a configuration text file called FontInfo.ini. If this file is placed in the same directory as the DLL, MathFlow automatically tries to use it when the DLL is loaded. This section explains how to use this feature.
Using the techniques shown here, a user can assign a character set (encoding) to a font, create a new encoding, and define the PostScript font name to be used in EPS file generation.
MathFlow contains knowledge of the fonts and characters it works with, which results in improved formatting and translation into EPS. Most of this knowledge is in the form of tables built into the code. However, this information can be extended via the FontInfo.ini file external to the program, allowing it to be expanded and corrected without having to change the application itself.
Encodings
An encoding is a one-to-one correspondence between character meanings and integers. For example, ASCII is an encoding that maps characters onto numbers between 0 and 127, and "a" is assigned the number 97. Fonts are said to use or have a specific encoding. The font's encoding determines what character gets displayed when we pass a given number to the operating system. Character style and shape play no part in the encoding concept: a Times-Roman "a" has the same value as a Bookman-Italic "a" (assuming the fonts use the same encoding). A code-point is a particular value in an encoding. For example, "a" has the code-point 97 in the ASCII encoding.
The MTCode encoding
Central to our font information is the MTCode encoding. MTCode assigns a 16-bit constant to every different character that our software works with. It is a superset of Unicode, a standard encoding that attempts to assign a unique number to each of the characters used in the world's languages. Unicode covers a lot of math, but not all the math characters that we need. For this reason, MTCode uses Unicode's Private Use Area (PUA), a range of 6400 code points (0xE000 to 0xF8FF), for its additional math characters. We use MTCode values as the key to all of the per-character information, such as human-readable character descriptions and token types (variable, operator, etc.).
For more information on Unicode, see the Unicode Consortium. To find out how we use Unicode's Private Use Area, see the MTCode encoding tables.
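The Private Use Area range quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the function name is_private_use is an assumption made for this example and is not part of MathFlow.

```python
# Bounds of Unicode's Private Use Area as quoted in the text above.
PUA_FIRST = 0xE000
PUA_LAST = 0xF8FF


def is_private_use(code_point: int) -> bool:
    """Return True if a 16-bit code point lies in the Private Use Area."""
    return PUA_FIRST <= code_point <= PUA_LAST


# The range is inclusive at both ends, which gives the 6400 code points.
pua_size = PUA_LAST - PUA_FIRST + 1

print(pua_size)                # 6400
print(is_private_use(0xE915))  # True: a value used in the MathPi4 example
print(is_private_use(0x0391))  # False: Greek capital letter Alpha is plain Unicode
```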
Font encodings
Every font is the expression of some character set. In fact, many fonts share the same character set. We use the term "font encoding" to represent a character set that might be shared by one or more fonts. Many applications (e.g. word processors) don't have to know a font's encoding — the user hits a key, a code is sent to the application, the code is sent back to the operating system to select a character from a font for display. Our software needs to know more than that.
A font encoding can be thought of as a table with two columns: the position within the font (a numerical index) and an MTCode code point value (the number from our own master character list, MTCode, that uniquely identifies the character). We give each font encoding a name (e.g. WindowsANSI, MacStd, Symbol). Many encodings are named after the single font that uses them, while other encodings are shared by many fonts. For example, standard ISO Latin-1 fonts on Windows all have the WindowsANSI encoding.
Our software represents every encoding other than MTCode as a mapping onto MTCode. That is, for every code-point in a given encoding it indicates a unique MTCode code-point. Using this mapping, we can get at all the per-character information for any code-point in the encoding and, therefore, for any character in fonts that have that encoding. For this reason, knowing a font's encoding is very important.
Unfortunately, the computer's operating system tells us very little about the encodings of fonts, least of all for those containing math symbols. So, we have to keep our own knowledge of which encoding each font has. Of course, as people can create their own fonts, the set of font encodings is open-ended.
Extension Scenarios
Font information may be extended in the following ways:
Define that a font has a given existing encoding.
Define (or override) the PostScript font name of a font.
Define a new encoding and specify that certain font(s) have that encoding.
Define new MTCode values and their attributes (description, default style, token type) and use them in a new encoding.
Determining What MathFlow Knows
Since the font information extension mechanism described here also applies to MathType, it can be very useful to do the setup with MathType first to verify that the proper syntax has been used in the configuration file. When the file is finished, copy it into the same directory as the DLL.
If you have a font installed on your computer that MathFlow does not seem to know anything about, the first thing to do is to verify that this is the case. The easiest way to do this is via MathType's Insert Symbol dialog. Follow these steps after opening MathType (if you don't own MathType, it's possible to install a trial):
Choose Insert Symbol from the Edit menu.
Choose Font in the View by menu.
Choose the font in question from the font menu just to the right of the View by menu.
Look at the Encoding name displayed directly under the character grid.
If the encoding name is "Unknown", it means our database of fonts and characters has no information for that font.
Assigning an encoding
Defining the encoding for a font falls into two cases:
The font's character set matches that of another font for which the software already knows the encoding. In this case, all you need to do is assign the same encoding to your font. This is described in the rest of this section.
The font's character set is completely unique. In this case, you will have to create a new encoding first, and then assign it to your font. See Creating a New Encoding.
Once you have decided to assign an encoding to a font, the next step is to determine if its character set matches that of a font for which our software already has an encoding. This is easy if you designed your own font as a perfect substitute for another font that our software already knows about. For example, if you created your own version of the Symbol font, its encoding would be the same as the Symbol font's — "Symbol". If you aren't in this lucky situation, you'll have to work a bit harder. There are a couple of ways to do this:
If you have access to the Internet, you can view our font encoding tables and try to find an exact match in terms of characters and their positions in the font. It helps to display the font in question in MathType's Insert Symbol dialog, as described in the previous section.
Display the font in question in Windows' Character Map utility. Use MathType's Insert Symbol dialog to view other math fonts on your system, looking for an exact match in terms of characters and their positions in the font. If you find a match, write down the encoding name for the font.
If you found an encoding that matches your new font, you have to tell MathFlow about it by adding to the Fonts section of FontInfo.ini. See The FontInfo.ini file and Font sections for details. If you assign an encoding to a font, you should consider letting our tech support department know so we can add it to the built-in font knowledge of the next version of our software. Just send an email to support@wiris.com and mention the font name and the encoding name (please be precise). If you can, send us a copy of the font and any other information associated with it. If it is a commercially available font, let us know who makes it.
Defining a PostScript name
As stated earlier, our software produces Encapsulated PostScript (EPS) files. These must refer to fonts using the PostScript names of fonts, not their operating system names (the font names listed in MathType and other applications' dialogs and menus). So, for example, "Times New Roman" must be referred to as "Times-Roman" in an EPS file. Unfortunately, this is another piece of information that the operating system doesn't give up easily.
On Windows, our software can generally communicate with the operating system to get the PostScript name for any PostScript font. Names obtained this way are the correct ones for use in an EPS file. For TrueType fonts used both for the screen and printing, our software can often, but not always, obtain the PostScript name from the TrueType font. In order to handle exception cases, however, you can set the PostScript name for a font by adding to the Fonts section of FontInfo.ini. See The FontInfo.ini file and Font sections for details.
Creating a new encoding
Creating a new font encoding is easy, but a bit tedious. Font encodings are defined using a text file that is placed in the same directory as the MathFlow DLL. For an example of what this file should look like, see Font encoding example.
The filename should follow these rules:
It must be unique within the directory that contains the MathFlow DLL.
It should end with an .enc extension.
It should be indicative of the name of the encoding (or identical to it).
The first line of the encoding file defines the name of the encoding. Here is an example:
FontEncoding, 1.0, Byte, Symbol
Your encoding file's first line must start with exactly "FontEncoding, 1.0", which identifies this file as a font encoding file whose version number is 1.0. The rest of the line consists of either Byte or Word. This means that the MTCode value will be either two or four hexadecimal characters. For example, if you use Byte, then an MTCode value is expressed as (for example) AE. If you use Word, then that same number must be expressed as 00AE. The last part of this line is the name of your encoding. So for example, instead of "Symbol", you should use the name of your encoding (which is also what the name of the file should be, followed by .enc). The name of an encoding should be alphanumeric characters without any spaces or punctuation and starting with a letter (e.g. DatapageMath3).
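As a rough illustration of the header rules above, the following Python sketch splits a header line and checks each part. The function names parse_header and expected_filename are hypothetical, and the checks reflect only what this section states.

```python
import re

# An encoding name is alphanumeric, has no spaces or punctuation,
# and starts with a letter (e.g. DatapageMath3).
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9]*")
PREFIX = "FontEncoding, 1.0"


def parse_header(line: str):
    """Return (size, name) from a line such as 'FontEncoding, 1.0, Byte, Symbol'."""
    if not line.startswith(PREFIX):
        raise ValueError("header must start with exactly 'FontEncoding, 1.0'")
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 4 or fields[1] != "1.0":
        raise ValueError("expected: FontEncoding, 1.0, <Byte|Word>, <name>")
    size, name = fields[2], fields[3]
    if size not in ("Byte", "Word"):
        raise ValueError("third field must be Byte or Word")
    if not NAME_RE.fullmatch(name):
        raise ValueError("encoding name must be alphanumeric and start with a letter")
    return size, name


def expected_filename(name: str) -> str:
    """The file is named after the encoding, followed by .enc."""
    return name + ".enc"


size, name = parse_header("FontEncoding, 1.0, Byte, Symbol")
print(size, name, expected_filename(name))
```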
Any blank lines are ignored. Any line that starts with # is a comment. For example:
# Purpose: Symbol font encoding
The rest of the file must contain lines that define the characters in the encoding. For example,
28,226E,Not less-than
The three fields are (from left to right):
The position within the font as a hexadecimal value. This is shown in MathType's Insert Symbol dialog for the selected character, in the Font position readout to the right of the character grid.
The MTCode (Unicode) value that uniquely defines a character. You can get this by using MathType's Insert Symbol dialog to find the character in another font, then reading the value in the Unicode readout to the right of the character grid. Alternatively, you can find it in either the MTCode Encoding Tables or the Font Encoding Tables.
A human-readable character description. This does not define the character's description, but it must match the description in the MTCode tables. This redundant information helps avoid many errors.
Caution: These lines must be in order by the position in the font (the first field).
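The character-line rules can be checked before the file is deployed. The sketch below is a hypothetical helper, not a MathFlow tool. It assumes positions must be strictly increasing (the text only says "in order"), and it does not check descriptions against the MTCode tables.

```python
def parse_entries(text: str):
    """Parse character lines into (font_position, mtcode, description) tuples."""
    entries = []
    previous = -1
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and comment lines are ignored; the header line is skipped.
        if not line or line.startswith("#") or line.startswith("FontEncoding"):
            continue
        # Split on the first two commas only, so the description stays whole.
        fields = line.split(",", 2)
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected three comma-separated fields")
        position = int(fields[0], 16)  # hexadecimal position within the font
        mtcode = int(fields[1], 16)    # hexadecimal MTCode (Unicode) value
        description = fields[2].strip()
        # Assumption: "in order" is read as strictly increasing positions.
        if position <= previous:
            raise ValueError(f"line {number}: font positions are out of order")
        previous = position
        entries.append((position, mtcode, description))
    return entries


SAMPLE = """\
FontEncoding, 1.0, Byte, MathPi4
# Two entries copied from the MathPi4 example

20,0020,Space
21,22D0,Double subset
"""

for position, mtcode, description in parse_entries(SAMPLE):
    print(f"{position:02X} -> {mtcode:04X}  {description}")
```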
Font encoding example
FontEncoding, 1.0, Byte, MathPi4
#
# This is an example of a font encoding file.
#
20,0020,Space
21,22D0,Double subset
22,22C2,N-ary intersection
23,2286,Subset of or equal to
24,2287,Superset of or equal to
25,E915,Approximate subset of
26,E917,Subset of with dot; is included in as sub-relation
27,2283,Superset of
28,E971,Subset over subset
29,E970,Superset over superset
2A,E916,Superset of with dot; includes as sub-relation
2B,E972,Superset over subset
2C,2282,Subset of
2D,2034,Triple prime
2E,2283,Superset of
2F,2044,Fraction slash
30,2033,Double prime
31,002B,Plus sign
32,002D,Hyphen-minus
33,00D7,Multiplication sign
34,00F7,Division sign
35,003D,Equals sign
36,00B1,Plus-minus sign
37,2213,Minus-plus sign
38,00B0,Degree sign
39,2032,Prime
3A,22C3,N-ary union
3B,2282,Subset of
3C,22C3,N-ary union
3D,2207,Gradient (nabla)
3E,22C2,N-ary intersection
3F,00B7,Middle dot
40,22D1,Double superset
41,0391,Greek capital letter Alpha
42,0392,Greek capital letter Beta
43,03A8,Greek capital letter Psi
44,0394,Greek capital letter Delta
45,0395,Greek capital letter Epsilon
46,03A6,Greek capital letter Phi
47,0393,Greek capital letter Gamma
48,0397,Greek capital letter Eta
49,0399,Greek capital letter Iota
4A,039E,Greek capital letter Xi
4B,039A,Greek capital letter Kappa
4C,039B,Greek capital letter Lamda
4D,039C,Greek capital letter Mu
4E,039D,Greek capital letter Nu
4F,039F,Greek capital letter Omicron
50,03A0,Greek capital letter Pi
51,0398,Greek capital letter Theta
52,03A1,Greek capital letter Rho
53,03A3,Greek capital letter Sigma
54,03A4,Greek capital letter Tau
55,0398,Greek capital letter Theta
56,03A9,Greek capital letter Omega
57,03D0,Greek beta symbol
58,03A7,Greek capital letter Chi
59,03D2,Greek upsilon with hook symbol
5A,0396,Greek capital letter Zeta
5B,2208,Element of
5C,2205,Empty set
5D,220B,Contains as member
5E,E914,Approximate superset of
5F,E973,Subset over superset
60,221E,Infinity
61,03B1,Greek small letter alpha
62,03B2,Greek small letter beta
63,03C8,Greek small letter psi
64,03B4,Greek small letter delta
65,03B5,Greek small letter epsilon
66,03C6,Greek small letter phi
67,03B3,Greek small letter gamma
68,03B7,Greek small letter eta
69,03B9,Greek small letter iota
6A,03BE,Greek small letter xi
6B,03BA,Greek small letter kappa
6C,03BB,Greek small letter lamda
6D,03BC,Greek small letter mu
6E,03BD,Greek small letter nu
6F,03BF,Greek small letter omicron
70,03C0,Greek small letter pi
71,03D1,Greek theta symbol
72,03C1,Greek small letter rho
73,03C3,Greek small letter sigma
74,03C4,Greek small letter tau
75,03B8,Greek small letter theta
76,03C9,Greek small letter omega
77,03D5,Greek phi symbol
78,03C7,Greek small letter chi
79,03C5,Greek small letter upsilon
7A,03B6,Greek small letter zeta
7B,2208,Element of
7C,2223,Divides
7D,220B,Contains as member
7E,221D,Proportional to
A2,2289,Neither a superset of nor equal to
A3,2288,Neither a subset of nor equal to
A7,03C2,Greek small letter final sigma
AB,03B5,Greek small letter epsilon
AD,2202,Partial differential
AE,2285,Not a superset of
B0,2260,Not equal to
B5,2284,Not a subset of
BE,2285,Not a superset of
C0,2285,Not a superset of
C3,03D6,Greek pi symbol
C6,03F0,Greek kappa symbol
C9,2284,Not a subset of
D2,2209,Not an element of
D3,2209,Not an element of
D4,220C,Does not contain as member
D5,220C,Does not contain as member
D6,2285,Not a superset of
DC,2288,Neither a subset of nor equal to
DD,2289,Neither a superset of nor equal to
DE,2260,Not equal to
F2,2284,Not a subset of
F7,2284,Not a subset of
F8,22C3,N-ary union
F9,22C2,N-ary intersection
FA,2284,Not a subset of
FB,019B,Latin small letter lambda with stroke
Adding new characters
Although we have attempted to put every character in common use by mathematicians and scientists into the MTCode encoding, this is a goal that we can approach but never reach. If, in the process of creating a new encoding, you come across characters that are not in MTCode (i.e. not in the Unicode Specification or in the MTCode Encoding Tables), you have two choices:
Define the character in question as "undefined". This means MathFlow will not know the identity of the character. This is acceptable in some situations. To make a character "undefined", you would use a line like this:
35,F700,Unknown character
Contact us to have the character added to MTCode. To do this, send email to our tech support department at support@wiris.com and provide the following information:
an example of the character as a GIF file (send a PDF or screen shot of a good-quality rendition of the character and, if possible, a page of math in which it occurs);
what font it occurs in (if possible, send us the font itself);
a suggested name for the character;
any additional information on how the character is used (e.g., is it a binary operator like '+', etc.).
We will assign a new MTCode value to the character and tell you how to proceed from there.
The FontInfo.ini file
This file is in Windows initialization (INI) file format and consists of:
Comment lines with version, author, and copyright info.
Multiple [Font<num>] sections, each of which contains information for a specific font, including its encoding and PostScript names.
An [Encoding] section that, for a given encoding, specifies the file name of the encoding definition file.
An [MTCode] section that, for a given MTCode value, specifies its attributes.
As mentioned above, after you create this file, save it into the same directory as the MathFlow DLL.
Font sections
For each font for which additional information (i.e. PostScript name and/or encoding) is to be specified, the FontInfo.ini file must include a [Font<num>] section (<num> is simply an integer to make the section names unique; e.g. Font1, Font2). Each such section may contain one or more key/value pairs, of which only the Name key is required; it identifies the font whose attributes are being overridden. Any omitted keys will cause the corresponding value to be determined in the default manner described in the earlier sections of this document.
Name = <operating system font name>
Identifies the font being described by the other keys in the section. The <operating system font name> value is the name of the font as it appears in MathFlow's style dialog. This key is required.
Encoding = <encoding name>
Identifies the encoding of the font. The <encoding name> value must be the name of a built-in encoding or one defined in the [Encoding] section (see below for details).
PSName<num> = <style> , <PostScript name>
Where <num> is simply an integer, to make the key names unique (e.g. PSName1, PSName2). The <style> value must be P for plain, B for bold, I for italic, or BI for bold-italic. The <PostScript name> is the name to be used in the EPS file.
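To make the key syntax concrete, the sketch below embeds a small [Font1] section and reads it back with Python's standard configparser module. The font name "My Symbol" and its PostScript names are invented for illustration, and read_fonts is a hypothetical helper. MathFlow's own parser may treat whitespace or key case differently.

```python
import configparser

# Hypothetical FontInfo.ini fragment; the font and PostScript names are made up.
SAMPLE = """\
[Font1]
Name = My Symbol
Encoding = Symbol
PSName1 = P , MySymbol
PSName2 = BI , MySymbol-BoldItalic
"""

# Style codes listed in this section.
STYLES = {"P": "plain", "B": "bold", "I": "italic", "BI": "bold-italic"}


def read_fonts(text: str):
    """Return one dict per [Font<num>] section found in the INI text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key names case-sensitive
    parser.read_string(text)
    fonts = []
    for section in parser.sections():
        if not section.startswith("Font"):
            continue
        keys = parser[section]
        if "Name" not in keys:
            raise ValueError(f"[{section}] is missing the required Name key")
        ps_names = {}
        for key, value in keys.items():
            if key.startswith("PSName"):
                style, ps_name = [part.strip() for part in value.split(",", 1)]
                if style not in STYLES:
                    raise ValueError(f"[{section}] {key}: unknown style {style!r}")
                ps_names[style] = ps_name
        fonts.append({
            "name": keys["Name"],
            "encoding": keys.get("Encoding"),  # None when the key is omitted
            "ps_names": ps_names,
        })
    return fonts


for font in read_fonts(SAMPLE):
    print(font)
```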
The Encoding Section
Contains lines of the form:
<encoding name> = <encoding definition file>
where <encoding name> defines an encoding (or overrides a built-in encoding) using the data in the file referred to by <encoding definition file>. This file, whose extension is .enc, must exist and the encoding name in this line must match that stored in the file.
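The requirement that the name in the [Encoding] section match the name stored in the .enc file is easy to get wrong. The following is a minimal sketch of a pre-flight check. check_encodings is a hypothetical helper, and it assumes the definition file is named relative to the directory that holds the DLL and FontInfo.ini.

```python
import configparser
from pathlib import Path


def check_encodings(ini_text: str, directory: Path) -> list:
    """Return a list of problems found in the [Encoding] section."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep encoding names case-sensitive
    parser.read_string(ini_text)
    problems = []
    if not parser.has_section("Encoding"):
        return problems
    for name, filename in parser["Encoding"].items():
        path = directory / filename
        if path.suffix.lower() != ".enc":
            problems.append(f"{filename}: extension is not .enc")
        if not path.is_file():
            problems.append(f"{filename}: file does not exist")
            continue
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        header = lines[0] if lines else ""
        # The encoding name is the last comma-separated field of the first line.
        stored = header.split(",")[-1].strip()
        if stored != name:
            problems.append(f"{filename}: file says {stored!r}, ini says {name!r}")
    return problems
```

A call such as check_encodings(Path("FontInfo.ini").read_text(), Path(".")) would return an empty list when every entry is consistent.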
The MTCode Section
It is unlikely that you will need to use this section, but we provide the information about this section for completeness. This section contains lines of the form:
<MTCode value in hex> = <token type>,<default style>,<description>
where <token type> is NONE, NUM, VAR, FUNC, OPER, BINOP, RELOP, OPEN, CLOSE, FENCE, PUNCT, INNER, CTRL, or SPACE.
where <default style> is NONE, TEXT, FUNCTION, VARIABLE, LCGREEK, UCGREEK, SYMBOL, VECTOR, NUMBER, USER1, USER2, MTEXTRA, or TEXT_FE. (This is only used by MathFlow
where <description> is a human-readable description for a given character.
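A line in this section can be validated against the two keyword lists above. The sketch below is illustrative only: parse_mtcode_line is a hypothetical helper, and the attributes given to E915 in the usage line are invented, not taken from the MTCode tables.

```python
TOKEN_TYPES = {
    "NONE", "NUM", "VAR", "FUNC", "OPER", "BINOP", "RELOP",
    "OPEN", "CLOSE", "FENCE", "PUNCT", "INNER", "CTRL", "SPACE",
}
DEFAULT_STYLES = {
    "NONE", "TEXT", "FUNCTION", "VARIABLE", "LCGREEK", "UCGREEK", "SYMBOL",
    "VECTOR", "NUMBER", "USER1", "USER2", "MTEXTRA", "TEXT_FE",
}


def parse_mtcode_line(line: str):
    """Parse '<hex> = <token type>,<default style>,<description>'."""
    key, separator, value = line.partition("=")
    if not separator:
        raise ValueError("missing '='")
    parts = [part.strip() for part in value.split(",", 2)]
    if len(parts) != 3:
        raise ValueError("expected token type, default style and description")
    token_type, style, description = parts
    if token_type not in TOKEN_TYPES:
        raise ValueError(f"unknown token type {token_type!r}")
    if style not in DEFAULT_STYLES:
        raise ValueError(f"unknown default style {style!r}")
    return int(key.strip(), 16), token_type, style, description


# Invented attributes, used only to show the shape of a line.
print(parse_mtcode_line("E915 = RELOP,SYMBOL,Approximate subset of"))
```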