by Kris Shaffer
By far, the slowest part of this project is encoding the poetry in a way that allows us to analyze its sound computationally. The IPA Unicode tool helps a lot, but translating German text to IPA, and then encoding that IPA as digital text, is a long, slow job.
So we decided to speed it up.
To start, Jordan compiled a list of 50 of the most common words in German, along with their IPA translations. (Leigh has also created an ordered list of the most common words in Schubert’s songs, which will serve as the basis for further growth of the dictionary.) Then this morning, Jordan, David, and I wrote a translator script in Python. The script takes a text file containing a German poem, checks each word against Jordan’s German-to-IPA dictionary, and replaces every word it finds there with its IPA equivalent. It even strips punctuation and ignores capitalization when looking words up, so “Nieder” and “nieder” match the same entry.
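The post doesn't include the script itself, so what follows is only a minimal sketch of the lookup it describes, not the team's actual code. It assumes the dictionary is a Python dict keyed by lowercase German words; the names PUNCTUATION, translate_word, and translate_text are hypothetical. Punctuation is stripped only from the edges of each word (so the apostrophe inside “Heil’ge” survives), the word is looked up in lowercase, and anything not found is passed through unchanged. That mirrors the sample output later in this post: dictionary words come out as IPA, while the rest keep their capitalization but lose their punctuation.

```python
import string

# Punctuation to strip from the edges of words: ASCII punctuation plus the
# typographic quotes, ellipsis, and en/em dashes common in German editions.
# (An assumption; the original script's exact rules aren't published.)
PUNCTUATION = string.punctuation + "‘’‚“”„«»‹›…\u2013\u2014"


def translate_word(word, dictionary):
    """Return the IPA for `word` if the dictionary has it, else the bare word."""
    # Strip punctuation only at the edges, so the apostrophe in "Heil’ge" stays.
    bare = word.strip(PUNCTUATION)
    # Look up in lowercase so "Nieder" and "nieder" hit the same entry;
    # words that aren't found keep their original capitalization.
    return dictionary.get(bare.lower(), bare)


def translate_text(text, dictionary):
    """Translate a poem word by word, keeping its line breaks."""
    translated_lines = []
    for line in text.splitlines():
        words = (translate_word(w, dictionary) for w in line.split())
        # Drop tokens that were nothing but punctuation (e.g. a lone dash).
        translated_lines.append(" ".join(w for w in words if w))
    return "\n".join(translated_lines)


# Hypothetical mini-dictionary built from entries in the sample output.
ipa = {"heil’ge": "ha:Il.gə", "nacht": "naχt", "du": "du", "nieder": "ni.dəʁ"}
print(translate_text("Heil’ge Nacht, du sinkest nieder;", ipa))
# ha:Il.gə naχt du sinkest ni.dəʁ
```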
The script is really simple to use. All you need is a text file with a German poem (“Nacht und Träume,” shown below, makes a good sample), the German-to-IPA dictionary, and the script itself, all saved in the same folder. Go to the last three lines of the script and update the sourceFile and outputFile names to suit your needs. Then run the script: at the terminal, move to that folder and call python with the script’s filename.
Or, if you edit the file names in a Mac editor like TextMate, simply save the script and press Command-R to run it from within TextMate.
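The post doesn't say how the dictionary file is laid out. Purely as an illustration, here is a sketch of a loader that assumes a hypothetical plain-text format: one entry per line, the German word and its IPA separated by a tab. The function name load_dictionary is likewise hypothetical. Because IPA symbols such as ə and ʁ fall outside ASCII, the file is opened explicitly as UTF-8.

```python
def load_dictionary(path):
    """Read a German-to-IPA dictionary file into a dict.

    Hypothetical file format: one entry per line, the German word, a tab,
    then its IPA. Blank lines and lines starting with '#' are skipped.
    """
    dictionary = {}
    # IPA symbols like ə and ʁ are not ASCII, so always open as UTF-8.
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            parts = line.split("\t")
            if len(parts) != 2:
                raise ValueError(f"{path}, line {line_number}: expected 'word<TAB>IPA'")
            german, ipa = (part.strip() for part in parts)
            # Lowercase keys so lookups can ignore capitalization.
            dictionary[german.lower()] = ipa
    return dictionary
```

The same UTF-8 setting is worth using wherever the script reads sourceFile and writes outputFile, so the IPA characters survive the round trip.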
Here’s what the German text for “Nacht und Träume” looks like going into the script:
Heil’ge Nacht, du sinkest nieder;
Nieder wallen auch die Träume
Wie dein Mondlicht durch die Räume,
Durch der Menschen stille Brust.
Die belauschen sie mit Lust;
Rufen, wenn der Tag erwacht:
Kehre wieder, heil’ge Nacht!
Holde Träume, kehret wieder!
And here’s what the output looks like:
ha:Il.gə naχt du zIŋ.kəst ni.dəʁ
ni.dəʁ wallen a:ʊχ di Träume
vi da:In Mondlicht dʊɾχ di Räume
dʊɾχ deʁ Menschen stille Brust
di belauschen zi mIt Lust
Rufen wɛn deʁ Tag erwacht
Kehre wieder ha:Il.gə naχt
Holde Träume kehret wieder
Note that not every word is translated, only those in the dictionary. However, even just getting 20% of the words out of the way will save a good chunk of time. And as the dictionary grows, it will speed up the process even more.
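To see how much of a given poem the dictionary already handles, you can count the words it covers. This hypothetical helper, dictionary_coverage, assumes the same rules the script is described as using (punctuation stripped at word edges, lowercase lookup), and its sample dictionary holds only the 14 entries visible in the sample output above.

```python
import string

# Assumed rule: strip ASCII and typographic punctuation at word edges only.
PUNCTUATION = string.punctuation + "‘’‚“”„«»‹›…\u2013\u2014"


def dictionary_coverage(text, dictionary):
    """Return (found, total): how many words in `text` the dictionary covers."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    words = [w for w in words if w]  # ignore tokens that were only punctuation
    found = sum(1 for w in words if w in dictionary)
    return found, len(words)


poem = """Heil’ge Nacht, du sinkest nieder;
Nieder wallen auch die Träume
Wie dein Mondlicht durch die Räume,
Durch der Menschen stille Brust.
Die belauschen sie mit Lust;
Rufen, wenn der Tag erwacht:
Kehre wieder, heil’ge Nacht!
Holde Träume, kehret wieder!"""

# The 14 dictionary entries that appear in the sample output.
sample_dictionary = {
    "heil’ge": "ha:Il.gə", "nacht": "naχt", "du": "du", "sinkest": "zIŋ.kəst",
    "nieder": "ni.dəʁ", "auch": "a:ʊχ", "die": "di", "wie": "vi",
    "dein": "da:In", "durch": "dʊɾχ", "der": "deʁ", "sie": "zi",
    "mit": "mIt", "wenn": "wɛn",
}

found, total = dictionary_coverage(poem, sample_dictionary)
print(f"{found} of {total} words ({found / total:.0%})")  # 21 of 39 words (54%)
```

For “Nacht und Träume,” that small subset already covers 21 of the poem’s 39 words, which matches the count of IPA tokens in the sample output.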
For now, we’re adding words to the dictionary manually, focusing on those that are most frequent in the poems we’re studying. In a future stage, however, we hope to write a dictionary builder: a script that will scan fully translated songs for German-IPA word pairs and add them to the dictionary. Then, every time we finish an IPA translation, we can run the dictionary builder to add its words to the dictionary, speeding up all of our subsequent translations.
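The dictionary builder doesn’t exist yet, so this is only a sketch of one way it could work. It assumes the finished IPA transcription keeps the poem’s line breaks and has exactly one IPA token per German word, as in the sample output; the function name build_dictionary is hypothetical. Lines whose word counts don’t match are skipped rather than guessed at, and existing entries are never overwritten, so a hand-checked transcription always wins over a newly harvested one.

```python
import string

# Assumed rule: strip ASCII and typographic punctuation at word edges only.
PUNCTUATION = string.punctuation + "‘’‚“”„«»‹›…\u2013\u2014"


def build_dictionary(german_text, ipa_text, dictionary):
    """Harvest German-IPA word pairs from a finished transcription.

    Assumes the IPA text keeps the poem's line breaks and has exactly one
    IPA token per German word. Returns the number of entries added.
    """
    added = 0
    for german_line, ipa_line in zip(german_text.splitlines(), ipa_text.splitlines()):
        german_words = [w.strip(PUNCTUATION).lower() for w in german_line.split()]
        german_words = [w for w in german_words if w]
        # IPA tokens are not stripped: ':' and '.' carry meaning in this notation.
        ipa_words = ipa_line.split()
        if len(german_words) != len(ipa_words):
            continue  # misaligned line; leave it for a human to check
        for german, ipa in zip(german_words, ipa_words):
            # Never overwrite an existing (possibly hand-checked) entry.
            if german not in dictionary:
                dictionary[german] = ipa
                added += 1
    return added


dictionary = {"nacht": "naχt"}
added = build_dictionary("Heil’ge Nacht, du sinkest nieder;",
                         "ha:Il.gə naχt du zIŋ.kəst ni.dəʁ", dictionary)
print(added)  # 4
```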
There’s one thing this translator won’t be able to do, though: stress. While multi-syllable words have stress patterns that we can encode in the dictionary, every poem also has single-syllable words, and their poetic stress depends on the meter of the poem and on how the words fall within that meter. For example, “und” might be stressed in one place in a poem and unstressed in the next. So we’ll never be able to go straight from a German poem to a statistical analysis of phonemic structures that is sensitive to music and stress; the IPA text will always need some human intervention. Still, having a translator that automatically generates IPA for a large share of the words in each poem will certainly help us move much faster as we build our corpus of nineteenth-century German poems.
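This limitation suggests a simple way to triage the output for the human pass. Assuming, as the sample output suggests, that dictionary entries separate syllables with a period, any dictionary-supplied token without a period is a monosyllable whose stress has to be decided by hand from the meter. The hypothetical helper monosyllables_to_review lists those tokens for one translated line; untranslated German words are left out because they need attention anyway.

```python
def monosyllables_to_review(translated_line, dictionary):
    """List dictionary-supplied IPA tokens that are single syllables.

    Assumes syllables within an IPA entry are separated by '.', as in the
    sample output, so an entry without '.' is a monosyllable whose stress
    must be set by hand from the poem's meter.
    """
    ipa_entries = set(dictionary.values())
    return [token for token in translated_line.split()
            if token in ipa_entries and "." not in token]


# Entries taken from the sample output.
sample_dictionary = {"heil’ge": "ha:Il.gə", "nacht": "naχt", "du": "du",
                     "sinkest": "zIŋ.kəst", "nieder": "ni.dəʁ"}
print(monosyllables_to_review("ha:Il.gə naχt du zIŋ.kəst ni.dəʁ", sample_dictionary))
# ['naχt', 'du']
```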
Feel free to try out the script. If not for computational analysis, maybe for that next German Diction assignment!
And we’ll always welcome new contributions to the dictionary.