## Sunday, October 31, 2010

### Combinatorial Fun with Hawaiian.

I've been wondering recently about how many distinct words Hawaiian could have, if you consider a fixed number of syllables. With fewer sounds than English, it's tempting to think that it must have a smaller vocabulary. I'm going to use combinatorics (basically easy ways of counting large numbers) to estimate the number of distinct words of progressively larger syllable count and see how it compares to English. It's rather simple to calculate, as long as you take care with your linguistics.

To start off with, Hawaiian has 18 official sounds (written with the letters a, ā, e, ē, i, ī, o, ō, u, ū, h, k, l, m, n, p, w, ʻ), 10 of which are vowels and 8 of which are consonants. It also has 15 diphthongs (iu, ou, oi, eu, ei, au, ai, ao, ae, ōu, ēi, āu, āi, āo, āe). Now, a syllable in Hawaiian always consists of either a lone vowel or diphthong, or a consonant followed by a vowel or diphthong. This is represented schematically by (C)V, where the 'C' represents a consonant, the 'V' a vowel or diphthong, and the parentheses show that the consonant is optional (one side effect of this is that Hawaiian words always end in a vowel).

The first question to be asked is, "How many syllables are possible?" We can see that there are 10 possible one-letter syllables consisting of the 5 short vowels plus the 5 long vowels, plus an additional 15 syllables composed of a single diphthong, plus an additional 80 syllables formed by taking one of the 8 consonants and adding one of the 10 vowels to it, plus an additional 120 syllables formed by taking one of the 8 consonants and adding one of the 15 diphthongs. Not all of these syllables actually exist in common usage; for instance, the syllable wū does not exist in Hawaiian, and the syllable wu occurs only in two loan-words from English. However, for the purposes of this post I am calculating only how many words could theoretically exist, not how many actually do (which would require exhaustive knowledge of the language that I do not possess). So for single syllables, we have a total of $$10+15+80+120=225$$.
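As a sanity check on that tally, here is a minimal Python sketch that simply builds every (C)V combination from the letters listed above and counts them. The names `VOWELS`, `CONSONANTS`, `DIPHTHONGS` and `all_syllables` are made up for this illustration, and the sketch counts theoretical syllables only, not ones attested in real Hawaiian.

```python
# Sound inventory as listed in the post.
VOWELS = ["a", "ā", "e", "ē", "i", "ī", "o", "ō", "u", "ū"]
CONSONANTS = ["h", "k", "l", "m", "n", "p", "w", "ʻ"]
DIPHTHONGS = ["iu", "ou", "oi", "eu", "ei", "au", "ai", "ao", "ae",
              "ōu", "ēi", "āu", "āi", "āo", "āe"]


def all_syllables():
    """Return every (C)V syllable: optional consonant + vowel or diphthong."""
    nuclei = VOWELS + DIPHTHONGS   # 10 + 15 = 25 choices for the "V" slot
    onsets = [""] + CONSONANTS     # "" stands for "no consonant": 9 choices
    return [c + v for c in onsets for v in nuclei]


SYLLABLES = all_syllables()
print(len(SYLLABLES))       # 225
print(len(set(SYLLABLES)))  # 225, so no syllable is counted twice
```

Seen this way, the total is just 9 onset choices (8 consonants or none) times 25 nucleus choices, which is the same 225 as the four-term sum.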

Now, since any word in Hawaiian will be made up of these syllables, we can quickly calculate how many distinct words of a given number of syllables could exist. Since Hawaiian has nothing against duplication of sounds (and often rather encourages it), for a word of two syllables we may have any of the 225 for the first syllable, and any of the 225 for the second. Thus to find the total number of two-syllable words we just multiply those two numbers together (in general, we raise 225 to the nth power, where n is the number of syllables in the word), for a total of $$225^2=50,625$$ words of two syllables. That's not bad in terms of vocabulary. Pretty much all the words necessary for daily life, plus quite a few extra, would fit nicely into that amount.
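The multiplication rule can be checked by brute force for the two-syllable case. The sketch below is only an illustration: `count_words` is a hypothetical helper name, and the syllables are stood in for by the indices 0 to 224 rather than by real spellings.

```python
from itertools import product

SYLLABLE_COUNT = 225  # total from the syllable tally above


def count_words(n_syllables, syllable_count=SYLLABLE_COUNT):
    """Number of syllable sequences of the given length, repetition allowed."""
    return syllable_count ** n_syllables


# Brute-force check for n = 2: walk through every ordered pair of syllables.
pairs = sum(1 for _ in product(range(SYLLABLE_COUNT), repeat=2))

print(count_words(2))  # 50625
print(pairs)           # 50625
```

Strictly speaking, this counts sequences of syllables. A few sequences are spelled the same but split differently (a + iu and ai + u both give "aiu"), so the number of distinct spellings would be lower.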

But Hawaiian utilizes many longer polysyllabic words, so if we expand our list to words of three syllables, we get $$225^3 = 11,390,625$$ distinct words. At this point we're already well over the vocabulary of even the most linguistically rich languages on Earth. Even using a rather lax definition of "word", the English language has around 1–1.5 million words at the most, the vast majority of which are scientific, legal, technical, medical, financial and other terms of generally non-everyday use.

But we don't need to stop here! Going to four syllables gives $$225^4=2,562,890,625$$. This is a mind-bogglingly big number when it comes to words. That's over two and a half billion words, and that's not counting the one-, two-, and three-syllable words already formed. It's common to have words of four or five syllables in Hawaiian, which blows the realm of neologisms so far open that it would be nearly impossible to come up with concepts for all of them (for words of 5 syllables, the number is $$225^5=576,650,390,625$$, over half a trillion words!).
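To see how quickly these numbers grow, the following Python sketch tabulates the count for each word length from one to five syllables, along with a running total of all words up to that length. The function name `word_counts` is an assumption made for this example, and the figures are theoretical maxima based on the 225-syllable tally.

```python
SYLLABLE_COUNT = 225


def word_counts(max_syllables, syllable_count=SYLLABLE_COUNT):
    """Return (n, words of exactly n syllables, words of up to n syllables)."""
    rows = []
    running_total = 0
    for n in range(1, max_syllables + 1):
        exact = syllable_count ** n
        running_total += exact
        rows.append((n, exact, running_total))
    return rows


for n, exact, total in word_counts(5):
    print(f"{n}  {exact:>19,}  {total:>19,}")
```

Run as written, this gives 225, 50,625, 11,390,625, 2,562,890,625 and 576,650,390,625 for one through five syllables. The running total for everything up to five syllables comes to 579,224,722,725.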

Conclusion: Although the sound range of Hawaiian may seem limited to the Anglophone ear, used as it is to English's 44-odd sounds, combinatorics shows that Hawaiian is capable of forming astronomically more words than even the wordiest languages on Earth. I'd do a similar calculation for English, but for the fact that it would be several orders of magnitude harder. This is not because of the greater number of sounds in English, but because of how they group together to form syllables. In Hawaiian, you can only have (C)V syllables. In English, syllables can be as complex as (CCC)V(CC). (This is of course a bit of an edge case – the only words like this I can think of off the top of my head start with “str”, such as “strings” and “strips”, each of which has 3 consonant sounds, then 1 vowel sound, followed by an additional 2 consonant sounds.)

Now, while in Hawaiian any syllable can follow any other syllable and be pretty easily pronounceable, making all possible combinations of sounds in English would lead to some words that are very difficult or impossible to pronounce. Certain consonant sounds double easily or sound good together, while others do not. This is partly why, for instance, we have words like “kitten” but not “kiththen” or “kixthen” in Modern English: languages tend to change in the direction of being easier to say quickly. An analysis of potential English words would have to take all this into account, which would require looking at all the sounds of English individually with respect to both preceding and following syllables – not an impossible task, but certainly a daunting and difficult one. (If I had to take a stab at it, I'd guess that English probably has more possible syllables than Hawaiian does, but within the same order of magnitude. I'm not motivated enough to actually try it though.)

And with that, I will close off this post. A hui hou!

December 4, 2010: Edited to fix some completely inexcusable mathematical mistakes. Turns out the numbers I got were orders of magnitude too low. They should be correct now.

August 30, 2011: Edited to fix some really elementary mistakes in naming numbers. It should be correct now.

August 1, 2014: Edited to correct the really basic mistake of forgetting that syllables can be made up of single diphthongs as well as single vowels (indeed, the Hawaiian word for "I" is simply "au"). This increased the base number of syllables, which had a domino effect of inflating all the further numbers. How did I ever manage that minor in Mathematics...