I recently finished a literature review for my Language & Statistics 2 class. The topic was computational models of historical linguistics and my partner and I focused on cognate identification and phylogenetic inference. We split the work and my part was cognate identification. So I decided to blog about it for a bit and maybe someone out there will have something to offer. Granted, that won’t help my grade, but improving my understanding is more important. You can also check out our presentation.
First of all, to frame the problem, historical linguistics is a branch of linguistics that studies language change. Language can change in many ways, but the methods we looked at focused almost exclusively on phonological and semantic changes, with a few brief nods to syntactic change (on the phylogenetic inference side). The main tool used by historical linguists in reconstructing dead languages is the comparative method. This method looks at two languages suspected of being related and tries to infer the regular sound changes that led to their divergence. By examining lists of suspected cognates, linguists find sound correspondences: sounds that appear in similar contexts in both languages, but which aren’t necessarily the same phoneme. For example, the words for beaver in English and German both derive from the Proto-Germanic word *bebru. In Old English, this became beofor (the f sounds like a /v/). In modern German, the word is Biber, with the /b/ phoneme preserved as it was in Proto-Germanic. So we could infer a sound correspondence between English /v/ and German /b/ in this context.
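To make the idea of a sound correspondence a bit more concrete, here is a minimal sketch in Python that counts which segments line up across a handful of English/German cognate pairs. Everything in it is my own illustration rather than a method from the survey: the extra word pairs (seven/sieben, give/geben, live/leben), the hand-made alignments, the use of spelling instead of phonemic transcription, and the names `ALIGNED_PAIRS`, `count_correspondences`, and `recurring_correspondences` are all assumptions.

```python
from collections import Counter

# Hand-aligned, spelling-based segments (an assumption for illustration;
# real work would use phonemic transcriptions and an alignment algorithm).
# "-" marks a gap where one word has no matching segment.
ALIGNED_PAIRS = [
    (["b", "ea", "v", "er"], ["b", "i", "b", "er"]),          # beaver / Biber
    (["s", "e", "v", "e", "n"], ["s", "ie", "b", "e", "n"]),  # seven / sieben
    (["g", "i", "v", "e", "-"], ["g", "e", "b", "e", "n"]),   # give / geben
    (["l", "i", "v", "e", "-"], ["l", "e", "b", "e", "n"]),   # live / leben
]


def count_correspondences(aligned_pairs):
    """Count how often each (English segment, German segment) pair lines up."""
    counts = Counter()
    for english, german in aligned_pairs:
        if len(english) != len(german):
            raise ValueError("aligned words must have the same number of segments")
        counts.update(zip(english, german))
    return counts


def recurring_correspondences(aligned_pairs, min_count=3):
    """Return non-identical, non-gap correspondences seen at least min_count times."""
    counts = count_correspondences(aligned_pairs)
    return [
        (pair, n)
        for pair, n in counts.most_common()
        if pair[0] != pair[1] and "-" not in pair and n >= min_count
    ]


if __name__ == "__main__":
    print(recurring_correspondences(ALIGNED_PAIRS))
    # prints [(('v', 'b'), 4)]
```

With this toy data, the only mismatch that recurs is English v against German b, which is the correspondence suggested by the beaver example. A real system would also have to discover the alignments and take the surrounding context into account.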
So what are cognates? If you have studied a second language, you have no doubt heard this term. I propose the following two classifications for cognates. A loose cognate is a pair of words in two languages that are spelled or pronounced the same, with some minor variations. In this way, French résumé and English resumé would be considered cognates. Loose cognates have also been called orthographic cognates. A strict cognate is a pair of words in two related languages that descended from the same word in the ancestor language. Loan words are words that come into a language directly from another language, such as resumé. These words do not undergo the regular sound changes that are observed in strict cognates, so historical linguists do not consider them cognates at all.
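One possible way to operationalize the loose definition is a spelling-similarity score based on edit distance. The sketch below is only an assumption about how that could look, not a description of any particular system from the survey; the function names `edit_distance` and `spelling_similarity` and the normalization by the longer word's length are my own choices.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def spelling_similarity(a, b):
    """Similarity in [0, 1]: 1.0 for identical spellings, lower as they diverge."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    print(spelling_similarity("resumé", "resumé"))  # loan word: 1.0
    print(spelling_similarity("beaver", "Biber"))   # strict cognate: 0.5
```

Under this particular score, the loan word looks like a perfect match while the strict cognate pair beaver/Biber only reaches 0.5, so a similarity threshold tuned for loose cognates could easily miss pairs that a historical linguist would accept.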
What is the effect the distinction between these two definitions would have on computational approaches to this task? I will look at this further in a future post, but feel free to post your thoughts in the comments.






5 comments
9 December 2007 at 12:12:49
anileklavya
I have been thinking about this for a long time and have still not reached a conclusion. However, right now I am inclined to say that perhaps the strict cognates may be more useful for theoretical studies of language change etc., whereas loose cognates may be more useful for solving practical (computational) problems.
27 January 2008 at 12:07:12
suranah
Jason,
I was a bit curious about the literature survey presentation. It could be a good start for anybody thinking of delving into computational historical linguistics.
But the link you provided is broken, and I could not find it on the LS2 wiki either. It would be helpful if you could check the path of that link again. Also, if you have a page for all the LS2 presentations and it could be made public, please provide that too. Thanks.
27 January 2008 at 14:36:25
Jason Adams
I have fixed the link above to the presentation (and copied the file to this site), so hopefully there won’t be any more issues. It’s always dangerous linking to content someone else controls. :)
You can also read our survey at http://www.cs.cmu.edu/~jmadams/adamsagarwal2007.pdf
9 December 2008 at 15:31:06
Taraka
Jason,
Is it fine if I use some of your slides for my presentation on Cognate Identification? I especially want to use the part of the presentation on the literature survey.
9 December 2008 at 15:48:05
Jason Adams
Please feel free to use the slides, Taraka. Just be sure to give credit. Also (though this is not necessary), I’d love to see your presentation when you’re done. Best of luck!