The file contains 1156 records, one for each diacriticized character combination found in either database. Each record has seven columns, separated by semicolons (;). The columns are as follows:
WARNING: An earlier version of this page claimed that the JACKPHY frequency appeared in column 5, and the MUMS Books frequency appeared in column 6. That was incorrect.
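As a minimal sketch of reading this format, the following splits each line on semicolons and checks that every record has the seven columns stated above. The function name `parse_records` and the sample rows are illustrative assumptions, not part of the data; the sketch makes no assumption about what each column means.

```python
EXPECTED_COLUMNS = 7  # per the description: seven semicolon-separated columns


def parse_records(lines):
    """Split each non-empty line on ';' and return a list of field lists.

    `lines` is any iterable of strings, such as an open text file.
    Raises ValueError if a record does not have exactly seven columns.
    """
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split(";")
        if len(fields) != EXPECTED_COLUMNS:
            raise ValueError(
                f"line {line_number}: expected {EXPECTED_COLUMNS} columns, "
                f"got {len(fields)}"
            )
        records.append(fields)
    return records


# Made-up placeholder rows, used only to exercise the parser.
sample = ["a;b;c;d;e;f;g\n", "h;i;j;k;l;m;n\n"]
records = parse_records(sample)
print(len(records), "records parsed")
```

When run against the real file, `len(records)` can be compared with the stated total of 1156 as a quick sanity check.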
The MUMS Books database contains 3,279,507 records and 1,353,406,304
characters. There are 9,948,061 characters with diacritics, of which
57,255 exhibit more than one diacritic.
The JACKPHY database contains 335,589 records and 139,542,423 characters.
There are 2,473,467 characters with diacritics, of which 415 exhibit
more than one diacritic.
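To put these totals in proportion, the short sketch below computes, for each database, the share of characters that carry a diacritic and the share of those that carry more than one. The counts are copied from the figures above; the dictionary layout and the function name `diacritic_rates` are assumptions made for illustration.

```python
# Counts copied from the totals quoted in the text.
DATABASES = {
    "MUMS Books": {
        "records": 3_279_507,
        "characters": 1_353_406_304,
        "with_diacritics": 9_948_061,
        "multiple": 57_255,
    },
    "JACKPHY": {
        "records": 335_589,
        "characters": 139_542_423,
        "with_diacritics": 2_473_467,
        "multiple": 415,
    },
}


def diacritic_rates(stats):
    """Return the two proportions derived from one database's counts."""
    return {
        # fraction of all characters that carry at least one diacritic
        "diacritic_share": stats["with_diacritics"] / stats["characters"],
        # fraction of diacriticized characters that carry more than one
        "multiple_share": stats["multiple"] / stats["with_diacritics"],
    }


for name, stats in DATABASES.items():
    rates = diacritic_rates(stats)
    print(
        f"{name}: {rates['diacritic_share']:.3%} of characters have diacritics; "
        f"{rates['multiple_share']:.3%} of those have more than one"
    )
```

On these figures, roughly 0.7% of MUMS Books characters and roughly 1.8% of JACKPHY characters carry diacritics.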
The data should not be accepted uncritically, as transcription
errors are almost certainly present. For example, the 16 occurrences
of LATIN SMALL LETTER A WITH ACUTE AND ACUTE very probably
represent a double-acute mark rather than two vertically stacked
acute accents.
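The distinction matters because the two readings are different character sequences. As an illustration only, and assuming a Unicode combining-character representation (which the source data may not use), the sketch below contrasts a letter followed by two acute accents with a letter followed by a single double-acute mark. The variable and function names are hypothetical.

```python
import unicodedata

# Assumed representations, for illustration:
stacked = "a\u0301\u0301"   # a + COMBINING ACUTE ACCENT, twice
double_acute = "a\u030b"    # a + COMBINING DOUBLE ACUTE ACCENT


def describe(text):
    """Return the Unicode character name of each code point in `text`."""
    return [unicodedata.name(ch) for ch in text]


print(describe(stacked))
print(describe(double_acute))
print("same sequence:", stacked == double_acute)
```

A count recorded under one sequence when the other was meant would therefore inflate one entry and deflate another.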
The data were provided courtesy of James Agenbroad of the
Library of Congress, and were typed in and massaged by
John Cowan <cowan@ccil.org>.