<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Netflix Prize:  Good science or not?</title>
	<atom:link href="http://mendicantbug.com/2007/12/14/netflix-prize-good-science-or-not/feed/" rel="self" type="application/rss+xml" />
	<link>http://mendicantbug.com/2007/12/14/netflix-prize-good-science-or-not/</link>
	<description>Wanderings into computational linguistics, science, social media and life...</description>
	<pubDate>Fri, 09 Jan 2009 13:22:08 +0000</pubDate>
	<generator>http://wordpress.org/?v=MU</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<item>
		<title>By: MT Eval with Binary Comparisons &#171; The Mendicant Bug</title>
		<link>http://mendicantbug.com/2007/12/14/netflix-prize-good-science-or-not/#comment-815</link>
		<dc:creator>MT Eval with Binary Comparisons &#171; The Mendicant Bug</dc:creator>
		<pubDate>Tue, 13 May 2008 04:53:56 +0000</pubDate>
		<guid isPermaLink="false">http://mendicantbug.com/2007/12/14/netflix-prize-good-science-or-not/#comment-815</guid>
		<description>[...] 12 May 2008 in collaborative filtering, computational linguistics, machine learning, machine translation, machine translation evaluation, mt, mt eval, rankings, recommender systems   The standard way of doing human evaluations of machine translation (MT) quality for the past few years has been to have human judges grade each sentence of MT output against a reference translation on measures of adequacy and fluency.  Adequacy is the level at which the translation conveys the information contained in the original (source language) sentence.  Fluency is the level at which the translation conforms to the standards of the target language (in most cases, English).  The judges give each sentence a score for both in the range of 1-5, similar to a movie rating.   It became apparent early on that not even humans correlate well with each other.  One judge may be sparing with the number of 5&#8217;s he gives out, while another may give them freely.  The same problem crops up in recommender systems, which I have talked about in the past. [...]</description>
		<content:encoded><![CDATA[<p>[...] 12 May 2008 in collaborative filtering, computational linguistics, machine learning, machine translation, machine translation evaluation, mt, mt eval, rankings, recommender systems   The standard way of doing human evaluations of machine translation (MT) quality for the past few years has been to have human judges grade each sentence of MT output against a reference translation on measures of adequacy and fluency.  Adequacy is the level at which the translation conveys the information contained in the original (source language) sentence.  Fluency is the level at which the translation conforms to the standards of the target language (in most cases, English).  The judges give each sentence a score for both in the range of 1-5, similar to a movie rating.   It became apparent early on that not even humans correlate well with each other.  One judge may be sparing with the number of 5&#8217;s he gives out, while another may give them freely.  The same problem crops up in recommender systems, which I have talked about in the past. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ian&#8217;s Blog &#187; Netflix Prize - is RMSE a good measurement?</title>
		<link>http://mendicantbug.com/2007/12/14/netflix-prize-good-science-or-not/#comment-373</link>
		<dc:creator>Ian&#8217;s Blog &#187; Netflix Prize - is RMSE a good measurement?</dc:creator>
		<pubDate>Mon, 17 Dec 2007 18:20:06 +0000</pubDate>
		<guid isPermaLink="false">http://mendicantbug.com/2007/12/14/netflix-prize-good-science-or-not/#comment-373</guid>
		<description>[...] I think a user might have a hard time noticing such a difference in performance. I&#8217;m not the first to express concerns of this [...]</description>
		<content:encoded><![CDATA[<p>[...] I think a user might have a hard time noticing such a difference in performance. I&#8217;m not the first to express concerns of this [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Chris</title>
		<link>http://mendicantbug.com/2007/12/14/netflix-prize-good-science-or-not/#comment-367</link>
		<dc:creator>Chris</dc:creator>
		<pubDate>Sat, 15 Dec 2007 16:50:55 +0000</pubDate>
		<guid isPermaLink="false">http://mendicantbug.com/2007/12/14/netflix-prize-good-science-or-not/#comment-367</guid>
		<description>Bob Carpenter at Alias-i and Hal Daume at his NLPer blog also blogged about this.  I'll leave the quality assessment up to you guys since you're far better qualified, but I'll say this:  rarely does anything involving NLP get this kind of press.  There's no such thing as bad press, right?</description>
		<content:encoded><![CDATA[<p>Bob Carpenter at Alias-i and Hal Daume at his NLPer blog also blogged about this.  I&#8217;ll leave the quality assessment up to you guys since you&#8217;re far better qualified, but I&#8217;ll say this:  rarely does anything involving NLP get this kind of press.  There&#8217;s no such thing as bad press, right?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
