Programming basics for corpus linguists

Python installation

Python 3.6 from


Picking a programming editor that you are comfortable with is important: a good one helps you write code with less frustration. Decent programming editors highlight the structure and syntax of the code you are writing. Typically, they come with presets for the major programming languages and recognize the meaningful units of each language, such as keywords and control structures. The recommendations here are based on our own experience and preferences. The list is in no way complete, and there are other good editors; these are merely suggestions to get people started.

Some examples of good programming editors are:

Notepad++ - available for Windows only, with good regular expression support; a useful replacement for the editors that ship with the operating system. Recommended for anyone who has to deal with textual data, whether or not they are programmers. For an example of what can be done with Notepad++ in the philologies, you might want to consult my Analysing Faust tutorial.

Visual Studio Code - a free, versatile and configurable editor, available for Windows, Linux and Mac OS, with lots of community support for different programming and editing tasks.

TextWrangler - an editor that is often used by Mac OS users. It is akin to the very powerful commercial editor BBEdit.

Sublime Text - a very versatile editor that is available for different operating systems (Windows, Mac OS, Linux).

Skills to aim at

The important thing when it comes to programming is that you get started at all. It is best if you get started out of your own interest. Ideally, you are motivated by a task that interests you in a subject area you enjoy. An interest in processing and analyzing linguistic corpora is a great starting point.
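To give a flavour of what such a first task might look like, here is a minimal sketch of a word-frequency count in Python. It is only an illustration: the function name `word_frequencies`, the sample sentence, and the very simple definition of a "word" (a run of letters or digits, optionally with one apostrophe inside) are our own assumptions, and real corpus work usually needs more careful tokenization.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercased word tokens in a string.

    Assumption for illustration: a token is a run of word characters,
    optionally followed by an apostrophe and more word characters
    (so "don't" stays one token).
    """
    tokens = re.findall(r"\w+(?:'\w+)?", text.lower())
    return Counter(tokens)


# A made-up sample sentence standing in for a real corpus file.
sample = "The cat sat on the mat. The mat was flat."
freqs = word_frequencies(sample)

# Print the two most frequent tokens with their counts.
for word, count in freqs.most_common(2):
    print(word, count)
```

With the sample sentence above, the two lines printed are "the 3" and "mat 2". Replacing `sample` with the contents of a text file you care about is a natural next step.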