The big debate over RMSE vs. ranking in recommendation algorithms

No time to write a long piece of my own, so I am excerpting a passage from a blog post I read recently, RecSys 2016 - Part I. I strongly agree with the views below.

Ghosts of the past (10 years)

Before concluding this post, I’d like to highlight one of the long-standing problems of RecSys and recommender systems research in general. Some parts of this community are stuck in the past. In 2016, we still had papers that worked on explicit feedback data, did the rating prediction task, and thus evaluated w.r.t. RMSE or MAE. This is the classic task that was popularized by the Netflix Prize 10 years ago.
The goal of a recommender system is to rank the items for the user (or in a situation, or to an item, etc.) and show the most relevant ones. This task is usually referred to as the top-N recommendation task. Several research papers showed that good rating prediction doesn’t necessarily mean good top-N recommendation and vice versa. In fact, the order of algorithms on these two tasks can be quite the opposite. Rating prediction is pretty much useless in 99% of the cases because a good recommender has to solve the top-N task. Of course, solving the top-N task is just a part of the whole recommender system; there are also other things to consider. It is also true that the results of offline evaluation, in general, should be taken with a grain of salt; but as I wrote in my post on RecSys 2015: to do research, you need some kind of well-defined evaluation, even if it is just an approximation of the final goal. The thing is: rating prediction is not an approximation of the final goal and is therefore now obsolete. Any paper that focuses on this task in 2016 shows that its authors have no clue about real recommender systems. That’s why this practice is constantly called out by researchers in the industry.
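The claim that the two tasks can order algorithms oppositely is easy to see with a toy example. The sketch below (plain Python; the data and the two "models" are purely illustrative, not from any real system) compares a predictor whose scores carry a large but order-preserving bias against one with a smaller RMSE that nonetheless misranks a relevant item:

```python
import math

# Toy example: 4 items with ground-truth ratings; items rated >= 4
# count as "relevant" for the top-N task.
true_ratings = [5.0, 4.0, 2.0, 1.0]
relevant = {i for i, r in enumerate(true_ratings) if r >= 4.0}

# Model A: every prediction is off by exactly 2, but the item ORDER is perfect.
pred_a = [3.0, 2.0, 0.0, -1.0]
# Model B: predictions are closer on average, but it swaps a relevant
# and an irrelevant item near the top of the list.
pred_b = [4.8, 2.1, 3.9, 1.2]

def rmse(pred, true):
    """Root mean squared error of predicted vs. true ratings."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))

def precision_at_k(pred, relevant, k):
    """Fraction of the k highest-scored items that are relevant."""
    top_k = sorted(range(len(pred)), key=lambda i: pred[i], reverse=True)[:k]
    return len(set(top_k) & relevant) / k

print(rmse(pred_a, true_ratings), precision_at_k(pred_a, relevant, 2))  # 2.0, 1.0
print(rmse(pred_b, true_ratings), precision_at_k(pred_b, relevant, 2))  # ~1.35, 0.5
```

Model B wins on RMSE yet loses on precision@2, because ranking metrics are invariant to any order-preserving distortion of the scores, while RMSE penalizes exactly that distortion.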
Note that this doesn’t mean that explicit feedback is necessarily bad. You can do top-N recommendations based on explicit feedback as well. It will be less interesting for practitioners, because explicit feedback is usually hard to gather in the wild, and even if you have it in large quantities, you will only have it for a small portion of your user base. Now that there are several public implicit feedback datasets, everyone may choose to switch to those. But doing top-N recommendations on explicit data is fine.
The worst thing about this is not that rating prediction papers are written, per se. The authors might be outside of the community or just getting into the field. They might think – based on the vast literature on rating prediction – that this is the problem they should try to solve. The problem is that these papers receive good reviews and get accepted to conferences like RecSys or published in journals. This depends on the reviewers, who should know better. I hope that we won’t see any rating prediction papers next year. I’ll do my part by calling out this practice because I think the whole community would benefit from banishing rating prediction papers.
=========================== Divider =================================

