by Kris Shaffer
We've reached a minor milestone in our coding efforts. While others in the group have been working hard to encode the text for a complete Schubert song cycle in IPA (the International Phonetic Alphabet), I've been writing software that will analyze various aspects of the sounds of that poetry, independent of the music. Today I finished that code (for now...).
The core program is our poemAnalysis script. This script will take all of the IPA files in the texts folder and spit out several files for each poem, containing different kinds of basic statistical data. For now, we are just looking at vowels and categorizing each of them as open, open-mid, neutral, close-mid, or close. These categories describe both the sound of the vowels and the physicality of speaking or singing them. Our poemAnalysis script will calculate the probability of occurrence for each of these vowel types, outputting that data song-by-song, stanza-by-stanza, and line-by-line. The script allows us to choose whether we analyze every vowel or only those that are stressed in the poetic meter. We can also choose whether we want to analyze both vowel phonemes in a diphthong or skip the second one. (Since singers tend to sustain only the first vowel in a diphthong, we usually prefer the latter option.) The script is currently set up to conduct several of these analyses simultaneously, outputting the results of each set of options into individual files. (These go in the statOutput folder.)
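To make the idea of "probability of occurrence" concrete, here is a minimal Python sketch of the calculation for a single line of IPA text. It is not the poemAnalysis script itself: the function name, the `VOWEL_CATEGORY` table, and the handful of symbols in it are hypothetical placeholders chosen for illustration, and the sketch ignores the stress and diphthong options described above.

```python
from collections import Counter

# Hypothetical, deliberately partial mapping from IPA vowel symbols to
# the five categories named in the post. A real table would be larger.
VOWEL_CATEGORY = {
    "a": "open",
    "ɛ": "open-mid",
    "ɔ": "open-mid",
    "ə": "neutral",
    "e": "close-mid",
    "o": "close-mid",
    "i": "close",
    "u": "close",
}

CATEGORIES = ["open", "open-mid", "neutral", "close-mid", "close"]


def category_probabilities(ipa_text):
    """Return the relative frequency of each vowel category in an IPA string.

    Characters that are not in VOWEL_CATEGORY (consonants, spaces,
    punctuation) are skipped.
    """
    counts = Counter(
        VOWEL_CATEGORY[ch] for ch in ipa_text if ch in VOWEL_CATEGORY
    )
    total = sum(counts.values())
    if total == 0:
        # No recognized vowels: report zero for every category.
        return {category: 0.0 for category in CATEGORIES}
    return {category: counts[category] / total for category in CATEGORIES}
```

Running the same function over every line, every stanza, and every whole poem would give the three levels of output described above.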
These output files are in a format that makes for easy import into a statistical analysis application. I have recently begun learning the statistical programming language R. After just a couple of weeks of work, I'm already amazed at how quickly and simply we can perform some of the statistical analyses we're interested in exploring. For example, a very short R script is also posted in our GitHub repository. This script imports the song-by-song data (produced by poemAnalysis) and measures the correlation between each pair of songs in the corpus. This can tell us 1) how consistent poets are in their phonemic patterns, and 2) which poems stand out as having the most distinctive sound. Though the texts have not been fully proofread yet, the test analysis showed something interesting: the most distinctive song in the corpus so far is the one written by a different poet from all the rest! Of course, we need more (proofread) data before we can actually conclude anything, but it suggests that we might be able to find some differences in the way poets write, as well as in the way composers set that poetry to music.
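For readers who don't use R, the pairwise comparison can be sketched in plain Python. This is an illustration under assumptions rather than a port of our R script: it assumes each song is represented by a list of vowel-category probabilities, it assumes Pearson correlation as the measure, and all function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(profiles):
    """Map each pair of song names to the correlation of their profiles.

    `profiles` maps a song name to its list of category probabilities.
    """
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for a, b in combinations(sorted(profiles), 2)
    }


def mean_correlation_per_song(profiles):
    """Average each song's correlation with every other song.

    Needs at least two songs. The song with the lowest average is the
    one whose vowel profile is least like the rest of the corpus.
    """
    pairs = pairwise_correlations(profiles)
    means = {}
    for name in profiles:
        values = [r for pair, r in pairs.items() if name in pair]
        means[name] = sum(values) / len(values)
    return means
```

Calling `min(means, key=means.get)` on the result of `mean_correlation_per_song` picks out the song that stands apart from the others.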
Finally, as our poemAnalysis script runs, it calculates how much the probability of occurrence for each vowel type changes, both line-by-line and stanza-by-stanza. Then it flags moments where the change exceeds a certain threshold. (This is flexible, but we're currently looking for places where multiple vowel categories change in excess of two standard deviations.) Though there are some false positives (and probably false negatives), these flags are directing our attention to many interesting moments in the poems. In many cases, the moments where the sound changes substantially are also moments of metrical change, shifts in plot, or shifts in the narrator's attitude toward something or someone. This is exactly what we want to see, especially if accompanied by musical changes, too. (We'll blog more details once the data is cleaner and more complete.)
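The flagging step can be sketched roughly as follows. This is one possible reading, not the actual poemAnalysis code: it assumes the standard deviation is taken over each category's own line-to-line changes within a poem, it treats "multiple" as at least two categories, and the function and parameter names are hypothetical.

```python
from statistics import pstdev


def flag_changes(rows, n_sd=2.0, min_categories=2):
    """Return indices of rows whose change from the previous row is large.

    `rows` is a list of equal-length lists, one per poetic line (or
    stanza), each holding the probability of every vowel category.
    Row i is flagged when at least `min_categories` categories changed
    from row i-1 by more than `n_sd` standard deviations of that
    category's changes across the whole poem.
    """
    # Change in each category between consecutive rows.
    deltas = [
        [cur - prev for prev, cur in zip(before, after)]
        for before, after in zip(rows, rows[1:])
    ]
    if not deltas:
        return []

    # Population standard deviation of the changes, per category.
    sds = [pstdev(column) for column in zip(*deltas)]

    flagged = []
    for i, delta in enumerate(deltas, start=1):
        # Count categories whose change exceeds the threshold; a category
        # that never changes (sd == 0) can never trigger a flag.
        big = sum(
            1 for d, sd in zip(delta, sds) if sd > 0 and abs(d) > n_sd * sd
        )
        if big >= min_categories:
            flagged.append(i)
    return flagged
```

Both `n_sd` and `min_categories` are left as parameters because the threshold is meant to be flexible; raising either one trades false positives for false negatives.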
We'll continue to provide updates as we go. In the meantime, feel free to download our data and scripts, and play around with them. Let us know if you find something cool!