Thursday, June 23, 2011

New Data Blog: Data in Colour

I've been waiting for a while to announce my new data blog, but here it is now.


All my data projects will now be posted there. Non-statsy posts will appear here as per usual.

End of Entry

Sunday, June 19, 2011

Chinese and Entropy

I happened to have a cold that day, and was trying to write a brief note to a friend canceling an appointment that day. I found that I couldn't remember how to write the character 嚔, as in da penti 打喷嚔 "to sneeze". I asked my three friends how to write the character, and to my surprise, all three of them simply shrugged in sheepish embarrassment. Not one of them could correctly produce the character. Now, Peking University is usually considered the "Harvard of China". Can you imagine three Ph.D. students in English at Harvard forgetting how to write the English word "sneeze"?? Yet this state of affairs is by no means uncommon in China. -- David Moser

I came across David Moser's essay on why Chinese is hard a while ago, and found it quite entertaining. Moser points out that Chinese is difficult not only for non-native speakers but for native-born Chinese as well. Among the reasons Moser gives are that the language is not phonetic, it has no alphabet, and it has a god-awful dictionary system.

All of these reasons are, to some extent, consequences of the fact that Chinese is much denser than English. I would hypothesize that a Chinese text has a higher entropy per character than its English equivalent (although this is probably pretty difficult to measure). Even in speech, one can convey the same information in fewer syllables in Chinese than in English.
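As a rough illustration of what measuring this might look like, here is a minimal Python sketch. It computes the empirical Shannon entropy of a text in bits per character, treating each character as an independent draw. The function name `char_entropy` is my own, and the single-character model is an assumption: it ignores dependencies between neighbouring characters, so it is only a crude estimate and not a proper measurement of a language's entropy.

```python
import math
from collections import Counter


def char_entropy(text: str) -> float:
    """Empirical Shannon entropy in bits per character.

    Assumes a unigram model (each character independent of its
    neighbours) and ignores whitespace. Hypothetical helper for
    illustration only.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed character frequencies
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

To compare the two languages, one would run this on a long Chinese text and on its English translation. Short samples will not give meaningful numbers, since most Chinese characters would appear only once or not at all.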

I think one reason why Chinese is so terse is that, historically, Chinese people prized terseness. Terseness is associated with wisdom. One can imagine followers of an old master asking a long list of questions, only to receive a single one-syllable word in response.

End of Entry