In this challenge, you will work on the famous MovieLens dataset. The goal is to predict, for a given user and film, the score that the user is most likely to award.
More and more information is available on the web: more music to listen to, more movies to watch and more things to buy. Companies such as Amazon, Netflix, Spotify and YouTube offer their users a very large catalogue of content. Developing systems that help users find items they may like is therefore crucial, and it is even better when the interesting content finds the user. With the emergence of the web, consumerism has flourished and buyers are presented with a much larger choice of products, so sellers have to adapt and improve their advertising strategies. As the amount of transactional data has exploded in recent years, enterprises can now gain a much better understanding of the links between customers and products. Recommender systems have become essential to both sellers and buyers by automating recommendation based on data analysis.
In this context, we want to introduce you to movie recommendation. If you choose this challenge, you will work on a subset of the famous MovieLens dataset that contains 1 million ratings from 6000 users on 4000 movies. Each user has at least 20 ratings, and the ratings are made on a 5-star scale (whole-star ratings only). The goal of the project is to implement a personalized recommendation system based on these data using machine learning tools. The system has to predict, for a given user and film, the score that the user is most likely to award.
Recommending items, and movies in particular, is a challenging task. Unlike "classical" machine learning, where you only have to predict a class given several features, recommendation implies using these predictions to recommend suitable movies to the right people. In addition, these preferences can be volatile and evolve from one period to another.
In the starting kit, we have implemented an algorithm that predicts, for a given user and movie, the score that the user is most likely to award. This will be your baseline.
Brought to you by the yellow team (yellow@chalearn.org)
Sihem ABDOUN, Stephen BATIFOL, Abdallah BENZINE, Abdelhak LOUKKAL, Clément THIERRY, Yaohui WANG
Data were provided by GroupLens
The problem is a regression problem. Each sample is characterized by 54 features. The goal is to predict the user ratings, which range from 1 to 5. For training, you are given a data matrix X_train of dimension num_training_samples x num_features and an array y_train of labels of dimension num_training_samples. You must train a model that predicts the labels for two test matrices, X_valid and X_test. The easiest way to prepare your submission is to use the starting kit.
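As a rough illustration of the setup described above, the following sketch fits an ordinary least-squares model on arrays shaped like X_train and y_train and predicts ratings for an X_valid-shaped matrix. It is not the starting-kit baseline: the data are synthetic stand-ins, the helper names fit_linear and predict_ratings are hypothetical, and clipping predictions to the 1 to 5 range is a choice made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the challenge arrays (assumption: the real ones
# are loaded with the starting kit, which is not reproduced here).
num_training_samples, num_features = 200, 54
X_train = rng.normal(size=(num_training_samples, num_features))
y_train = rng.integers(1, 6, size=num_training_samples).astype(float)
X_valid = rng.normal(size=(50, num_features))


def fit_linear(X, y):
    """Fit ordinary least squares with an intercept (hypothetical helper)."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights


def predict_ratings(weights, X):
    """Predict ratings and clip them to the valid 1..5 range."""
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.clip(X1 @ weights, 1.0, 5.0)


weights = fit_linear(X_train, y_train)
y_valid_pred = predict_ratings(weights, X_valid)
```

Any regressor could take the place of the least-squares fit; the point is only the shape of the inputs and outputs: one predicted rating per row of the test matrix.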
There are 2 phases:
This sample competition allows you to submit either:
The submissions are evaluated using the metric a_metric = 1 - MAE/MAD, that is, 1 minus the mean absolute error (MAE) divided by the mean absolute deviation (MAD).
Submissions must be made before 2017-04-30 14:23:00+00:00. You may make 5 submissions per day and 10 in total.
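The metric can be sketched in a few lines of Python. This is not the official scoring program: it assumes that MAD is the mean absolute deviation of the true ratings around their own mean, which the page does not spell out. Under that assumption, a perfect prediction scores 1 and always predicting the mean rating scores 0.

```python
def a_metric(y_true, y_pred):
    """Return 1 - MAE/MAD for two equal-length sequences of ratings.

    Assumption: MAD is the mean absolute deviation of y_true around its mean.
    The result is undefined (division by zero) if all true ratings are equal.
    """
    n = len(y_true)
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean_true = sum(y_true) / n
    mad = sum(abs(t - mean_true) for t in y_true) / n
    return 1.0 - mae / mad


y_true = [1, 2, 3, 4, 5]
perfect_score = a_metric(y_true, [1, 2, 3, 4, 5])   # MAE = 0
mean_score = a_metric(y_true, [3, 3, 3, 3, 3])      # MAE = MAD = 1.2
```

Here perfect_score evaluates to 1.0 and mean_score to 0.0, so a score above 0 means the model does better than a constant mean predictor under this reading of the metric.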
This competition is organized solely for test purposes. No prizes will be awarded.
The authors decline responsibility for mistakes, incompleteness or lack of quality of the information provided in the challenge website. The authors are not responsible for any contents linked or referred to from the pages of this site, which are external to this site. The authors intended not to use any copyrighted material or, if not possible, to indicate the copyright of the respective object. The authors intended not to violate any patent rights or, if not possible, to indicate the patents of the respective objects. The payment of royalties or other fees for use of methods, which may be protected by patents, remains the responsibility of the users.
ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS". THE ORGANIZERS DISCLAIM ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL ISABELLE GUYON AND/OR OTHER ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS, PUBLICATIONS, OR INFORMATION MADE AVAILABLE THROUGH THIS WEBSITE.
Participation in the organized challenge is non-binding and without obligation. Parts of the pages or the complete publication and information might be extended, changed or partly or completely deleted by the authors without notice.
Start: Nov. 26, 2016, midnight
Description: Development phase: tune your models and submit prediction results, trained model, or untrained model.
Start: April 30, 2017, midnight
Description: Final phase (no submission, your last submission from the previous phase is automatically forwarded).