IMPORTANT CAUTIONARY NOTE: Some of the settings described here are advanced settings that are only necessary under very specific circumstances. If you do not know what any of this entails then it is likely you will not need it. Do not needlessly fiddle with these settings, esp. not with the Code Page settings described further down unless you know what you are doing and what the potential consequences are. I take no responsibility for any damage you do to your system by following these instructions.
Remember: If it ain't broke, why fix it!
For the general user of modern GUI-based operating systems, the shell hardly ever plays a role, except that it will occasionally pop up and display system messages during certain installation steps. As a linguist, however, you frequently come across tools that require command line interaction. Unfortunately, the default Windows shell settings are not geared towards the professional user, and especially not towards users working in multilingual settings. One effect of this is that the shell does not, by default, support UTF-8 encoded textual data. This leads to the unfortunate effect that when you call up the TreeTagger from the command line and have its output written to the display for languages with special characters, even ones as simple as German Umlauts or ß, these do not display correctly on the screen. This is not, however, due to a malfunction of the TreeTagger, but an effect of the interaction between the shell and UTF-8 encoded data. The issue can be avoided by writing the output directly to a file, but sometimes one just wants to inspect initial output on the screen, and for such cases you can make some modifications to the shell settings in order to fix these issues.
In order to achieve the desired effect, first of all change the standard font used by the Windows shell aka Command Prompt as well as the Windows PowerShell.
In order to ensure that all characters are correctly represented in a Windows shell, you need to ensure that the font used to display the output is a TrueType font (ttf) and that it covers the Unicode characters you need to display. Avoid the so-called Raster Fonts, esp. on high resolution screens, as they may not display clearly. A good choice that displays very legibly, also for teaching and presentation purposes, is the TrueType font Lucida Console. There may not be any need to change these settings on your system, but if you find that special characters are not displaying correctly, experiment with the font settings in the Windows shell.
CAUTION: This should only be necessary under very specific circumstances! Do not fiddle with code page settings lightly. This stuff can seriously mess up your system!
If the standard output (screen) of annotation tools such as the TreeTagger shows faulty character representations, e.g. if Umlauts and other characters with diacritics are not displayed correctly, consider temporarily changing the code page. This is achieved by typing the following after opening the shell and before executing the TreeTagger:
chcp 65001
If this solves your problem, consider adding this line to the beginning of the batch file that drives the automatic annotation, so that the code page is set each time the TreeTagger is run.
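If you are curious why the code page makes a difference, the following short Python sketch imitates what happens when UTF-8 output is shown by a shell that uses a legacy code page. It is only an illustration and not part of the TreeTagger. The function name as_shown_by and the sample word are made up for this example, and cp850 is merely an assumption about the default code page; your system may well use a different one. Code page 65001 is the Windows identifier for UTF-8, which is why the second call gives back the original text.

```python
# Illustration only: how UTF-8 bytes look when a console decodes them
# with a legacy code page instead of UTF-8.
# Assumption: cp850 stands in for the shell's default code page;
# the default on your system may differ.

def as_shown_by(text: str, console_codec: str) -> str:
    """Return `text` as a console using `console_codec` would render its UTF-8 bytes."""
    raw = text.encode("utf-8")  # what a UTF-8 tool writes to standard output
    return raw.decode(console_codec, errors="replace")  # what the console makes of it

sample = "Größe"  # hypothetical sample word with an Umlaut and ß

garbled = as_shown_by(sample, "cp850")   # legacy code page: characters are garbled
restored = as_shown_by(sample, "utf-8")  # corresponds to code page 65001

print(garbled)
print(restored)
```

Each Umlaut and the ß take up two bytes in UTF-8, and the legacy code page treats every byte as a character of its own, so the garbled version is longer than the original word.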
… to be continued …