Embedded data in PDF

An important feature of MathType is support for PDF graphics. This document describes useful equation data that MathType embeds in PDF graphics it generates.

MathType embeds data in the PieceInfo dictionary associated with the page object in an equation PDF. The PieceInfo dictionary contains two sub-dictionaries. One is the EGO dictionary, which contains data required by the EGO specification for editing MathType equations. The other is the DesignScience dictionary, which contains metric data as well as data describing the equation content.

The individual data item keys and types are:

Data Item Key       Data Type
Baseline            CGPDFReal (value in points)
MTEF                CGPDFStreamRef
LeftSideBearing     CGPDFReal (optional)
RightSideBearing    CGPDFReal (optional)
MathML              CGPDFStreamRef (optional)
creator             CGPDFStringRef
descent             CGPDFReal (value in points)
tactileRect         CGPDFArrayRef<CGPDFReal> [llx, lly, urx, ury] (values in points)
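
As a rough illustration, the keys listed above could be recorded in a small lookup table and used to check which entries a reader found in a given PDF. This is only a sketch in Python: the names KEY_SPECS and missing_required are hypothetical, and treating every key not marked "(optional)" as required is an assumption rather than something the list states outright.

```python
# Hypothetical lookup table built from the key list above.
# Each entry maps a key name to (type name, is_optional).
# Assumption: keys not marked "(optional)" are treated as required.
KEY_SPECS = {
    "Baseline": ("CGPDFReal", False),
    "MTEF": ("CGPDFStreamRef", False),
    "LeftSideBearing": ("CGPDFReal", True),
    "RightSideBearing": ("CGPDFReal", True),
    "MathML": ("CGPDFStreamRef", True),
    "creator": ("CGPDFStringRef", False),
    "descent": ("CGPDFReal", False),
    "tactileRect": ("CGPDFArrayRef", False),
}


def missing_required(found_keys):
    """Return the sorted names of non-optional keys absent from found_keys."""
    found = set(found_keys)
    return sorted(
        name
        for name, (_type_name, is_optional) in KEY_SPECS.items()
        if not is_optional and name not in found
    )
```

A reader that found only the non-optional keys would get an empty list back; one that found only Baseline would be told that MTEF, creator, descent and tactileRect are missing.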

The embedding hierarchy is summarized here:

  • pdfdoc
    • pdfpage
      • PieceInfo (dictionary)
        • DesignScience (dictionary)
          • LastModified (required date string)
          • Private (required data - dictionary)
            • Baseline
            • MTEF
            • LeftSideBearing
            • RightSideBearing
            • MathML
        • EGO (dictionary)
          • LastModified (required date string)
          • Private (required data - dictionary)
            • creator
            • descent
            • tactileRect
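
The hierarchy above can be pictured as nested dictionaries. The following Python sketch models it that way and shows how a reader might reach the Private data of either sub-dictionary. It does not use any real PDF library: the function private_data, the variable example_piece_info, and all of the values in it are placeholders invented for illustration.

```python
def private_data(piece_info, app_name):
    """Return the Private dictionary stored under app_name, or None if absent."""
    entry = piece_info.get(app_name)
    if not isinstance(entry, dict):
        return None
    private = entry.get("Private")
    return private if isinstance(private, dict) else None


# Placeholder model of a page's PieceInfo dictionary (all values are made up).
example_piece_info = {
    "DesignScience": {
        "LastModified": "D:20000101000000",  # placeholder PDF date string
        "Private": {
            "Baseline": 3.5,   # points (placeholder)
            "MTEF": b"",       # stream contents omitted in this sketch
        },
    },
    "EGO": {
        "LastModified": "D:20000101000000",  # placeholder PDF date string
        "Private": {
            "creator": "placeholder",              # placeholder string
            "descent": 3.5,                        # points (placeholder)
            "tactileRect": [0.0, 0.0, 40.0, 12.0],  # [llx, lly, urx, ury]
        },
    },
}

ego = private_data(example_piece_info, "EGO")
design_science = private_data(example_piece_info, "DesignScience")
```

Returning None for a missing sub-dictionary, rather than raising an error, is a design choice of this sketch: it lets a caller treat a PDF without MathType data the same way as any other PDF.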

This scheme follows the recommendations for embedding private data in the PDF 1.3 spec.

Note that the Baseline, LeftSideBearing, and RightSideBearing fields duplicate information already carried by the descent and tactileRect fields in the EGO data. However, the EGO data is essentially prescribed, since it should be isomorphic to the EGO data now embedded in PICT comments. At the same time, it is appealing to group the DSI quantities together under these names, as this is what we do in other formats. Since the duplication is limited to 4 real numbers, this seems acceptable.