mdc2html : a Manuel de Codage to HTML converter

How to use it

mdc2html has two basic ways of working.

  1. mdc2html can be used to create web pages containing hieroglyphs. You first write the web page, embedding Manuel de Codage (MdeC) text where you want the hieroglyphs to appear, and then run mdc2html on it. mdc2html copies most of the file unchanged but converts the embedded MdeC text to hieroglyphs.
  2. mdc2html can also be used to visualize an existing text file that contains MdeC. The Manuel de Codage is a standard format for transmitting hieroglyphs in ASCII, so Egyptologists often use it in their emails. If you pass such a text file through mdc2html, it generates a skeleton HTML file containing the hieroglyphic version of the text, which you can then view in your web browser.
Examples of both uses can be seen on the "Examples" page. Below you can find details on how to run mdc2html and the various options.

Running mdc2html.pl

mdc2html.pl reads a file and writes HTML to the standard output. Thus to generate an HTML file from the text file my_glyphs.txt you would use:

perl mdc2html.pl my_glyphs.txt > my_glyphs.html

Under Unix the mdc2html.pl script can be executed directly, provided the file is executable and you edit its first line, "#!/usr/bin/perl", so that it gives the location of perl on your system. Under Windows the script must be run as "perl mdc2html.pl" in an MS-DOS window. This assumes that the directory containing the perl binary is in your PATH; otherwise you must use the full pathname of perl.
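
If you want to run the conversion from another program rather than typing the command by hand, the following Python sketch does the equivalent of perl mdc2html.pl my_glyphs.txt > my_glyphs.html. It assumes that perl is on your PATH and that mdc2html.pl is in the current directory; the helpers build_command and run_mdc2html are hypothetical and not part of mdc2html.

```python
import subprocess


def build_command(input_path, extra_args=(), script="mdc2html.pl"):
    """Argument list equivalent to: perl mdc2html.pl [extra_args...] input_path"""
    # extra_args would hold any command-line configuration entries,
    # which are placed before the input file name.
    return ["perl", script, *extra_args, input_path]


def run_mdc2html(input_path, output_path, extra_args=(), script="mdc2html.pl"):
    """Run mdc2html.pl and capture its standard output (the HTML) in output_path."""
    with open(output_path, "w") as out:
        # check=True raises subprocess.CalledProcessError if perl exits with an error status.
        subprocess.run(build_command(input_path, extra_args, script), stdout=out, check=True)
```

Calling run_mdc2html("my_glyphs.txt", "my_glyphs.html") writes the same file as the shell command. Only stdout is redirected, so any messages mdc2html prints to stderr still appear on the console.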

The mdc2html.conf file

The mdc2html system includes a file called mdc2html.conf. This contains definitions and option settings used at runtime. In most cases the defaults will be fine, but if ever mdc2html does not do quite what you want it to, the first thing to do is look in the mdc2html.conf file to see if there is an option you can tweak. It is a text file and you can edit it with your favourite text editor.

The most important options in mdc2html.conf are described below. Others are documented in the comments in the file itself (in the configuration file, lines starting with "# " are comments).

The configuration file has four sections:

  1. The "Environment" section which contains general information and option settings. For instance, you can indicate the directory in which your GIFs are stored.

    Note that many of the entries here are commented out. They show the default values, which mdc2html uses automatically anyway, but they are kept in the configuration file as documentation.

    One entry that you might want to change is "BROWSER". Some versions of Netscape do not display the complex tables used by mdc2html in a completely satisfactory way. This is especially noticeable where tables with background images are used, as in cartouches. Setting BROWSER=NS adds some tweaks that improve the appearance under Netscape (but slightly degrade it under Internet Explorer). If you use Netscape you may wish to try this, but it is not a perfect solution, and it is hoped that future versions of mdc2html will work around the problem.

  2. The "Substitutions" section. This lists "aliases" for GIFs.

    The actual GIF images are named using their Gardiner codes. For example, the owl is G17.gif. In this section you will find the line m=G17, which allows you to write m in the MdeC where you want an owl. Most of these aliases are standard, but you can add your own. (Note that you can only alias single hieroglyphs, not words that are written with several hieroglyphs.)

    You can also use this facility to alias Gardiner codes. For instance, if you have a text that includes an obscure hieroglyph that has no image in the mdc2html library, you can add an alias to a roughly similar standard hieroglyph. For example, A316=A1 substitutes a plain man (A1) for a man with a stick (A316).

  3. The "GIFsizes" section. This has an entry for each GIF in the library in the format Sign=WidthxHeight, e.g. G17=38x41. This information is stored and used to calculate scaling factors when signs are grouped together.

    Note that version 2 has many more compound GIFs. These are needed where glyphs overlap, since overlapping glyphs cannot be represented by two separate images in a simple table format. Thus D&d is used rather than D:d.

    You will only want to change this section if you add your own images to the image library.

  4. The "Compounds" section. This is new in version 2.03 and has an entry for each compound GIF in the library in the format Compound=Spelling, e.g. a&b&t=a:(b*t).

    This information is stored and used to detect cases where words have a compound GIF that represents them, but where the compound has not been written explicitly. In such cases the compound is automatically substituted.

    This new feature can be switched off by setting the option "NO_COMPOUNDS=Y".
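
The automatic substitution of compounds can be pictured with the Python sketch below. It only illustrates the idea and is not mdc2html's actual code: it assumes that the groups of an MdeC word are separated by "-" and that a compound is substituted only when a whole group matches its spelling exactly. The function name apply_compounds is hypothetical.

```python
def apply_compounds(word, compounds, no_compounds=False):
    """Replace whole groups of an MdeC word by the names of compound GIFs.

    compounds maps Compound -> Spelling, as in the Compounds section,
    e.g. {"a&b&t": "a:(b*t)"}.
    """
    if no_compounds:  # plays the role of the NO_COMPOUNDS=Y option
        return word
    # Invert the table so a spelling can be looked up directly.
    by_spelling = {spelling: name for name, spelling in compounds.items()}
    # Only exact whole-group matches are replaced; other groups are kept as written.
    return "-".join(by_spelling.get(group, group) for group in word.split("-"))
```

For example, apply_compounds("a:(b*t)-n", {"a&b&t": "a:(b*t)"}) gives "a&b&t-n". Matching whole groups keeps the sketch simple; it does not try to recognise spellings written with different parenthesisation.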

An example of part of the standard configuration file is:
#mdc2html configuration file

#Environment
#---------------------------------------------
#IMG_DIR_B=hgifs/
#IMG_DIR_R=hgifr/
#IMG_SUFFIX=gif
#BGCLOR=white
...
#Substitutions : Alias=Image
#---------------------------------------------
x=Aa1
M=Aa13
Ab=U23
...
#GIFSizes - Sign=WidthxHeight
#---------------------------------------------
#   Needs an entry for each GIF
A1=31x40
A10=40x42
A11=28x40
...
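
To make the layout of the file concrete, here is a minimal Python sketch that reads text in the format of the example above into one dictionary per section. It is an assumption-based illustration, not the parser used by mdc2html.pl: it treats every line beginning with "#" as a comment (so commented-out defaults are skipped and left for the program's defaults to cover), and it takes a comment line followed by a dashed rule as a section heading. The names parse_conf and gif_size are hypothetical.

```python
import re

# A dashed rule such as "#---------------------------------------------".
RULE = re.compile(r"^#-{3,}$")


def parse_conf(text):
    """Parse mdc2html.conf-style text into {section_name: {key: value}}."""
    sections = {}
    current = None
    lines = [line.strip() for line in text.splitlines()]
    for i, line in enumerate(lines):
        if line.startswith("#"):
            following = lines[i + 1] if i + 1 < len(lines) else ""
            if RULE.match(following):
                # "#Substitutions : Alias=Image" -> section "Substitutions"
                name = re.split(r"[\s:-]", line[1:].strip(), maxsplit=1)[0]
                current = sections.setdefault(name, {})
            continue  # any other "#" line, including commented-out defaults
        key, sep, value = line.partition("=")
        if sep and current is not None:
            # Keys are case-sensitive, so the aliases "m" and "M" stay distinct.
            current[key.strip()] = value.strip()
    return sections


def gif_size(spec):
    """Convert a GIFSizes value such as "38x41" into (width, height)."""
    width, height = spec.split("x")
    return int(width), int(height)
```

Applied to the excerpt above, the Environment section comes back empty (all its entries are comments), Substitutions maps "x" to "Aa1", and gif_size turns "40x42" into (40, 42).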

In the released configuration file you will find a section labelled "Weni". The text "autobiography of Weni" was used as an example in testing, and this section contains substitutions for hieroglyphs in that text that are not in the GIF library. For instance, the first entry is P13=P1, which substitutes a boat with a rudder (P1) for the one without a rudder (P13) that is actually used in the text.
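
As a rough illustration of how an alias such as m=G17, or a stand-in such as P13=P1, ends up selecting an image file, the Python sketch below looks a code up in the substitution table and builds a path from the IMG_DIR_B and IMG_SUFFIX settings, falling back to the defaults hgifs/ and gif shown in the configuration example. It is a hypothetical helper, gif_path: it assumes IMG_DIR_B holds the ordinary images, performs a single lookup only, and does not model what IMG_DIR_R is used for.

```python
def gif_path(code, substitutions, env=None):
    """Map an MdeC sign code to the GIF file that would be used for it."""
    env = env or {}
    # Use the alias or stand-in if there is one, otherwise the code itself.
    name = substitutions.get(code, code)
    directory = env.get("IMG_DIR_B", "hgifs/")
    suffix = env.get("IMG_SUFFIX", "gif")
    return f"{directory}{name}.{suffix}"
```

With the Weni entry, gif_path("P13", {"P13": "P1"}) gives "hgifs/P1.gif", while a code with no entry, such as A1, maps straight to "hgifs/A1.gif".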

Setting the Configuration on the Command Line

To facilitate "one-off" modifications, you can supply configuration file entries on the command line. These override the values in the file, and you can supply a number of such changes together. Environment changes are entered exactly as in the configuration file, but for substitutions you have to prefix "GIF_" to the entry that would appear in the configuration file. For example:

perl mdc2html.pl BGCLOR=blue GIF_P13=P1 weni.txt > weni.html

would run mdc2html setting the HTML background to blue and using P1 for P13.
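
The rule for command-line entries can be restated as a small Python sketch: arguments containing "=" are treated as configuration entries, those prefixed with GIF_ go to the substitutions and the rest to the environment, and anything else is taken as an input file. This is an assumption about how such arguments could be classified, not a transcription of mdc2html.pl; split_args is a hypothetical name.

```python
def split_args(argv):
    """Split command-line arguments into (environment, substitutions, files)."""
    environment, substitutions, files = {}, {}, []
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep:
            files.append(arg)  # e.g. weni.txt
        elif key.startswith("GIF_"):
            substitutions[key[len("GIF_"):]] = value  # GIF_P13=P1 -> P13=P1
        else:
            environment[key] = value  # e.g. BGCLOR=blue
    return environment, substitutions, files
```

The resulting dictionaries would then be laid over the values read from the configuration file, for instance with {**file_environment, **environment}, so that the command line takes precedence.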

Error Handling

If mdc2html can't make sense of a piece of text that it is translating from MdeC format it:
  • Prints a message consisting of "??" and the offending text fragment to stderr.
  • Inserts "?Tr:-" plus the text in the HTML file.
Note that mdc2html may sometimes attempt to translate text that is not actually in MdeC format, for example lines ending in "!". This produces spurious errors; in this case you can avoid them by appending a space after the "!".

If mdc2html is checking for the presence of GIF files and one is missing it:

  • Prints "** Missing GIF" followed by the (Gardiner) name of the GIF to stderr.
  • Puts a large "?" in the HTML in place of the missing GIF. It also puts the Gardiner code underneath for identification purposes.
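
The two error behaviours described above can be mimicked in Python as follows. This is a sketch only: the exact wording and spacing of the messages, and the markup used for the large "?", are assumptions rather than copies of mdc2html's output, and untranslatable and glyph_html are hypothetical names.

```python
import os
import sys


def untranslatable(fragment):
    """Report MdeC text that could not be parsed and return its HTML placeholder."""
    print(f"?? {fragment}", file=sys.stderr)
    return f"?Tr:-{fragment}"


def glyph_html(code, path):
    """Return an <img> tag for path, or a large "?" with the code below it if the GIF is missing."""
    if os.path.isfile(path):
        return f'<img src="{path}" alt="{code}">'
    print(f"** Missing GIF {code}", file=sys.stderr)
    # The Gardiner code goes underneath the "?" so the missing sign can be identified.
    return f"<big><big>?</big></big><br>{code}"
```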

Limitations of mdc2html

  • mdc2html maps text to HTML and is thus limited to left-to-right processing. It cannot produce right-to-left or vertical output.
  • mdc2html uses HTML tables to format the lines of glyphs. This is a fairly crude mechanism and does not allow arbitrary positioning of glyphs.
  • mdc2html uses a fixed library of glyphs, so glyph manipulations such as scaling, rotation and superposition are not supported.
Overall mdc2html is useful for visualising the words of a text, but does not support the "professional" features of the Manuel that allow the actual appearance of a text to be reproduced.
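
To give an idea of the table mechanism, and of how the sizes from the GIFsizes section can feed into scaling, here is a Python sketch that renders one group: ":" stacks signs vertically and "*" places them side by side. It assumes that "*" binds more tightly than ":" (so a*b:c is read as (a*b):c), ignores parentheses, expects the signs to be already resolved to names found in sizes, uses a hypothetical target line height of 40 pixels, and only ever shrinks a group. group_to_table and LINE_HEIGHT are hypothetical, and mdc2html's real layout rules may differ.

```python
LINE_HEIGHT = 40  # hypothetical target height of one line of glyphs, in pixels


def group_to_table(group, sizes, src):
    """Render an MdeC group such as "n:t" or "a*b:c" as an HTML table.

    sizes maps sign name -> (width, height), as in the GIFsizes section;
    src maps a sign name to its image URL.
    """
    rows = [row.split("*") for row in group.split(":")]  # ":" stacks, "*" juxtaposes
    # Natural height of the stack: the tallest sign in each row, summed over the rows.
    natural_height = sum(max(sizes[sign][1] for sign in row) for row in rows)
    scale = min(1.0, LINE_HEIGHT / natural_height)  # shrink tall groups, never enlarge
    html = ["<table cellspacing=0 cellpadding=0>"]
    for row in rows:
        images = "".join(
            f'<img src="{src(sign)}" alt="{sign}" '
            f"width={round(sizes[sign][0] * scale)} height={round(sizes[sign][1] * scale)}>"
            for sign in row
        )
        html.append(f'<tr><td align="center">{images}</td></tr>')
    html.append("</table>")
    return "".join(html)
```

Two 30x40 signs stacked as "a:b" have a natural height of 80 pixels, so both are scaled by 0.5 and drawn at 15x20. This is also why overlapping signs need compound GIFs: each table cell can only hold whole images placed next to or above one another.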

mdc2html is an experimental system and therefore has a number of other current limitations:

  • Its GIF library was generated by running latex2html on the sign list from Serge Rosmorduc's sesh documentation. It has been polished since, but could still do with improvement. Note that you can use any library of GIFs you want with mdc2html provided that you fill in the GIFsizes section of the configuration file. Note also that if you find an obscure glyph that is not in the library, one option is to obtain a GIF from somewhere (you could scan it in) and just add it to the library.
  • A number of features have been omitted or only partially implemented. Consult the status list to see which features of the Manuel de Codage are supported.

  • My version of Netscape crashes when processing large files produced by mdc2html. To avoid this problem, there is currently an arbitrary limit of 2000 glyphs that can be processed in one file.