Cognate Identification: Definition

Posted: 6 December 2007 in Uncategorized

I recently finished a literature review for my Language & Statistics 2 class. The topic was computational models of historical linguistics, and my partner and I focused on cognate identification and phylogenetic inference. We split the work, and my part was cognate identification. I decided to blog about it for a bit in the hope that someone out there will have something to offer. Granted, that won’t help my grade, but improving my understanding is more important. You can also check out our presentation.

First of all, to frame the problem: historical linguistics is the branch of linguistics that studies language change. Language can change in many ways, but the methods we looked at focused almost exclusively on phonological and semantic change, with a few brief nods to syntactic change (on the phylogenetic inference side).

The main tool historical linguists use to reconstruct dead languages is the comparative method. This method looks at two languages suspected of being related and tries to infer the regular sound changes that led to their divergence. By examining lists of suspected cognates, linguists find sound correspondences: sounds that appear in similar contexts in both languages, but which aren’t necessarily the same phoneme. For example, the words for beaver in English and German both derive from the Proto-Germanic word *bebru. In Old English, this became beofor (the f is pronounced like a /v/, which survives in modern beaver). In modern German, the word is Biber, with the /b/ phoneme preserved as it was in Proto-Germanic. So we could infer a sound correspondence between English /v/ and German /b/ in this context.
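The counting step behind a sound correspondence can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular system from the literature works: the function name `count_correspondences` and the `TOY_PAIRS` data are hypothetical, the segments are rough spelling-based chunks rather than real phonemic transcriptions, and the word pairs are assumed to be hand-aligned already (automatic alignment is the hard part and is skipped here).

```python
from collections import Counter

# Hypothetical toy data: English/German cognate pairs, hand-aligned into
# rough spelling-based segments (NOT a phonemic transcription).
TOY_PAIRS = [
    (["b", "ea", "v", "er"], ["b", "i", "b", "er"]),   # beaver / Biber
    (["s", "e", "v", "en"], ["s", "ie", "b", "en"]),   # seven / sieben
    (["g", "i", "v", "e"], ["g", "e", "b", "en"]),     # give / geben
    (["l", "o", "v", "e"], ["l", "ie", "b", "e"]),     # love / Liebe
]


def count_correspondences(aligned_pairs):
    """Count how often each differing (segment_a, segment_b) pair lines up.

    Each item in aligned_pairs is a pair of equal-length segment lists,
    i.e. the alignment is assumed to have been done beforehand.
    """
    counts = Counter()
    for word_a, word_b in aligned_pairs:
        if len(word_a) != len(word_b):
            raise ValueError("word pairs must be pre-aligned to equal length")
        for seg_a, seg_b in zip(word_a, word_b):
            if seg_a != seg_b:  # identical segments tell us nothing new
                counts[(seg_a, seg_b)] += 1
    return counts


if __name__ == "__main__":
    for (seg_a, seg_b), n in count_correspondences(TOY_PAIRS).most_common():
        print(f"{seg_a} : {seg_b}  x{n}")
```

On this toy list the English v : German b pairing recurs in every word while each vowel pairing shows up only once, which is the kind of regularity the comparative method looks for. A real analysis would also record the surrounding context of each sound.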

So what are cognates? If you have studied a second language, you have no doubt heard the term. I propose the following two classifications for cognates. A loose cognate is a pair of words in two languages that are spelled or pronounced the same, with some minor variations. In this way, French résumé and English resumé would be considered cognates. Loose cognates have also been called orthographic cognates. A strict cognate is a pair of words in two related languages that descended from the same word in the ancestor language. Loan words are words that come into a language directly from another language, such as resumé. These words do not undergo the regular sound changes that are observed in strict cognates, so historical linguists do not consider them cognates at all.
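To make the loose definition concrete, here is a minimal Python sketch that treats two words as loose cognates when their spellings are close under normalized edit distance. This is one simple choice among many, not a method taken from the survey: the names `similarity`, `is_loose_cognate` and `LOOSE_THRESHOLD` are hypothetical, and the 0.7 cutoff is an arbitrary assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Return 1.0 for identical spellings, falling toward 0.0 as they diverge."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / longest


LOOSE_THRESHOLD = 0.7  # arbitrary cutoff, chosen only for illustration


def is_loose_cognate(a: str, b: str, threshold: float = LOOSE_THRESHOLD) -> bool:
    """Spelling-only test; it knows nothing about shared ancestry."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    for pair in [("resume", "resume"), ("beaver", "Biber")]:
        print(pair, round(similarity(*pair), 2), is_loose_cognate(*pair))
```

Under these assumptions, beaver and Biber are three edits apart over six letters, a similarity of 0.5, so the pair fails the loose test even though it is a strict cognate. A loan word with an identical spelling passes trivially.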

What effect would the distinction between these two definitions have on computational approaches to this task? I will look at this further in a future post, but feel free to post your thoughts in the comments.

Comments
  1. anileklavya says:

    I have been thinking about this for a long time and have still not reached a conclusion. However, right now I am inclined to say that perhaps the strict cognates may be more useful for theoretical studies of language change etc., whereas loose cognates may be more useful for solving practical (computational) problems.

  2. suranah says:

    Jason,

    I was a bit curious about the literature survey presentation. It could be a good start for anybody thinking of delving into computational historical linguistics.

    But the link you have provided is broken, and I could not find it on the LS2 wiki either. It would be helpful if you could check the path of that link again. Also, if you have a page for all the LS2 presentations and it could be made public, please provide that too. Thanks.

  3. Jason Adams says:

    I have fixed the link above to the presentation (and copied the file to this site), so hopefully there won’t be any more issues. It’s always dangerous linking to content someone else controls. :)

    You can also read our survey at http://www.cs.cmu.edu/~jmadams/adamsagarwal2007.pdf

  4. Taraka says:

    Jason,

    Is it fine if I use some of your slides for my presentation on Cognate Identification? I especially want to use the part of the presentation on the literature survey.

    • Jason Adams says:

      Please feel free to use the slides, Taraka. Just be sure to give credit. Also (though this is not necessary), I’d love to see your presentation when you’re done. Best of luck!
