DolorousStroke
Active Member
Given the current interest in alliteration, I wondered if it would be easy to write something which would try to identify alliterative hot spots in the text.
I settled on a Java program which uses an electrostatic energy density model: each letter has a repulsive power against other instances of the same letter, and same letters closer together create more energy, as 1/r. More energy means more alliteration. I then increased the charge of letters at the start of words. I crudely converted sh --> ʃ and th --> þ (conflating eth and thorn) to avoid treating "though" and "toe" as alliterative; both of these brute conversions ignore real phonetics, thus mishandling e.g. asSHat and fooTHill. I also kept consecutive doubled letters from reacting against each other (e.g. piPPin), although this also ignores words where doubled letters are doubly pronounced, e.g. haTTrick.
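For anyone who wants to picture the model, here is a minimal sketch of the pairwise energy sum described above. It is not the actual program: the class name, the START_CHARGE value of 2.0, and measuring r in character positions (spaces included) are all assumptions made for illustration, and vowels are not treated specially.

```java
import java.util.Locale;

public class AlliterationEnergy {
    // Hypothetical weight for word-initial letters; the real value is not given in the post.
    static final double START_CHARGE = 2.0;

    /** Crude digraph folding as described above: sh -> ʃ, th -> þ. */
    public static String fold(String text) {
        return text.toLowerCase(Locale.ROOT).replace("sh", "ʃ").replace("th", "þ");
    }

    /** Charge of the letter at index i: boosted if it starts a word. */
    static double charge(String s, int i) {
        boolean wordStart = (i == 0) || !Character.isLetter(s.charAt(i - 1));
        return wordStart ? START_CHARGE : 1.0;
    }

    /** Sum of q_i * q_j / r over every pair of identical letters. */
    public static double energy(String text) {
        String s = fold(text);
        double total = 0.0;
        for (int i = 0; i < s.length(); i++) {
            char a = s.charAt(i);
            if (!Character.isLetter(a)) continue;
            for (int j = i + 1; j < s.length(); j++) {
                if (s.charAt(j) != a) continue;
                int r = j - i;               // distance in character positions (assumption)
                if (r == 1) continue;        // doubled letter, e.g. "pp" in "pippin"
                total += charge(s, i) * charge(s, j) / r;
            }
        }
        return total;
    }
}
```

With these assumed weights, "toe though" only scores through its two o's, because the th has been folded to þ and no longer matches the t.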
Since this is alphabetical and not phonetic, it completely misses alliterations like “cellar sale” or “German jelly.”
The word "to" is overpowered: e.g. "to try to" gets you a high score; in general, probably the greatest weakness is taking short particle words too seriously. I changed ; and : to be sentence breaks (on the theory that alliteration likely doesn't continue past hard punctuation and shorter phrases are better tests). The density metric is Euclidean square root, to avoid punishing longer sentences. I set a minimum of 25 characters for sentence consideration, so as not to favor shorter sentences. I didn't consider vowel alliteration (if that's a thing). The sentence tagger and token tagger are Apache’s OpenNLP. The part-of-speech tagger is available but I didn’t use it (although it could be used to exclude or deweight the infinitive particle “to”, for example).
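A rough sketch of the segmenting and normalizing steps, under loud assumptions: the regex split below is only a stand-in for OpenNLP's sentence detector (which the real program uses), and dividing energy by the square root of the length is just one possible reading of "Euclidean square root". The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceDensity {
    static final int MIN_CHARS = 25; // minimum length stated in the post

    /**
     * Splits on sentence-ending punctuation plus ';' and ':' and keeps
     * only pieces of at least MIN_CHARS characters.
     * Stand-in for a real sentence detector (assumption).
     */
    public static List<String> candidates(String text) {
        List<String> out = new ArrayList<>();
        for (String piece : text.split("[.!?;:]+")) {
            String s = piece.trim();
            if (s.length() >= MIN_CHARS) {
                out.add(s);
            }
        }
        return out;
    }

    /**
     * One possible normalization (assumption): energy divided by the
     * square root of the length, so long sentences are penalized less
     * than they would be by dividing by the full length.
     */
    public static double density(double energy, int length) {
        return energy / Math.sqrt(length);
    }
}
```

The pair energy grows with the number of letter pairs, so some length normalization is needed; a square root is a gentler divisor than the length itself, which matches the stated goal of not punishing longer sentences.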
There are an infinite number of other improvements that could be made. Particularly, using the CMU Pronouncing Dictionary to actually get real alliterative phonemes (e.g. separating eth and thorn; ph as f, etc.).
The problem with that is it won't have many Middle-earth terms or perhaps oft-used words, which will show up unmodified, artificially decreasing alliteration.
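As a sketch of how that lookup might go, including the fallback problem just mentioned: the class below reads lines in the dictionary's "WORD  PHONEME PHONEME ..." layout, strips the stress digits, and falls back to the plain first letter for words it doesn't know. The class and method names are hypothetical, and the sample entries in the usage lines are written out by hand in that format rather than loaded from the real file.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class InitialPhoneme {
    private final Map<String, String> firstPhoneme = new HashMap<>();

    /** Adds one dictionary line of the form "WORD  PH1 PH2 ...". */
    public void addLine(String line) {
        if (line.startsWith(";;;")) return;           // comment line
        String[] parts = line.trim().split("\\s+");
        if (parts.length < 2) return;                 // blank or malformed line
        String word = parts[0].toLowerCase(Locale.ROOT);
        String phoneme = parts[1].replaceAll("\\d", ""); // drop stress digits
        firstPhoneme.put(word, phoneme);
    }

    /**
     * Returns the first phoneme if the word is known, otherwise the
     * first letter unmodified (the fallback that hurts unknown words).
     */
    public String initialSound(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        String p = firstPhoneme.get(key);
        if (p != null) return p;
        return key.isEmpty() ? "" : key.substring(0, 1);
    }
}
```

Usage, with hand-typed sample lines:

```java
InitialPhoneme dict = new InitialPhoneme();
dict.addLine("CELLAR  S EH1 L ER0");
dict.addLine("SALE  S EY1 L");
// "cellar" and "sale" now share the initial sound "S",
// while an unknown name such as "Nazgul" falls back to "n".
```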
Also, additional electrostatic charge should be given not only to letters at the start of words, but at the start of syllables. This would require syllabification, which could be done with the CMU library or, e.g., using the old TeX typesetting hyphenation algorithm (which I've seen implemented in JavaScript somewhere).
You could also give some electrostatic repulsion to nearby sounds (on some metric) like k and g.
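One simple way to sketch that idea is a similarity factor that multiplies into the pair energy: 1.0 for identical sounds, a partial value for related ones, 0 otherwise. The grouping below (voiced/voiceless pairs such as k/g and t/d) and the 0.5 weight are assumptions picked for illustration, not a tested metric.

```java
public class SoundSimilarity {
    // Hypothetical partial weight for related-but-different sounds.
    static final double NEAR = 0.5;

    // Voiceless/voiced consonant pairs, treated as "nearby" (assumption).
    static final String[] PAIRS = {"kg", "td", "pb", "fv", "sz"};

    /** 1.0 for the same letter, NEAR for a listed pair, 0.0 otherwise. */
    public static double similarity(char a, char b) {
        if (a == b) return 1.0;
        for (String pair : PAIRS) {
            if (pair.indexOf(a) >= 0 && pair.indexOf(b) >= 0) {
                return NEAR;
            }
        }
        return 0.0;
    }
}
```

A pair's energy would then be similarity(a, b) * q_a * q_b / r instead of counting only exact matches.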
With that said, the next post (response-to-this) will be the most-alliterative (by the above process) sentences in Fellowship of the Ring, flawed as such a list might be by the above weaknesses.
Code available upon request. This has been somewhat debugged, but there are a few scores that don’t look quite right (even by the limited terms here); e.g. “heed no nightly noises” below, which shows up with two separate scores. Glad to answer some questions.