WSDM Cup 2018

WSDM Cup 2018 will feature two tasks based on user history: song recommendation and churn prediction. Participants can choose to compete in one or both tasks. The data for the challenges come from KKBOX, a leading music streaming company from Taiwan.

The winner of each task will be awarded $2500. The second- and third-place teams will be awarded $1500 and $500, respectively. Four travel grants (US$500 per team) for attending the 2018 WSDM Cup Workshop held by KKBOX Group will be offered to the four highest-ranked teams that place in the top 10 and consist entirely of student members, with the exception of one advisor. The awards are kindly sponsored by KKBOX. Prizes as described will be awarded to the participants with the highest scores (based on the merits of the data science models submitted) who agree to submit and share their solutions at the 2018 WSDM Cup workshop held by the competition sponsors.

Task 1 - Churn Prediction

For a subscription business, accurately predicting churn is critical to long-term success. Even slight variations in churn can drastically affect profits. In this challenge, you are tasked with building an algorithm that predicts whether a user will churn after their subscription expires.

Task 2 - Recommendation System

While the public is now listening to all kinds of music, recommendation algorithms still struggle in key areas. Without enough historical data, how would an algorithm know whether listeners will like a new song or a new artist? And how would it know what songs to recommend to brand-new users? Your task is to solve these challenges and build a better music recommendation system.

Timeline

September 15, 2017: Competition begins
December 17, 2017 (23:59 PST): Competition ends
December 19, 2017: Winner announcement
January 9, 2018: Workshop paper submission deadline
February 9, 2018: WSDM Cup workshop

Competition and Workshop Organizers

Shou-De Lin, National Taiwan University

Xing Xie, Microsoft

Yian Chen, KKBOX

Yuh-Ming Chiu, KKBOX

WSDM Cup 2018 Workshop

Date: Friday, February 9, 2018 (9:00 am - 12:15 pm)
Location: Marina Vista

We will hold the WSDM Cup 2018 workshop on February 9, 2018, bringing top participants together to exchange their ideas. The challenge comprises two tasks: recommendation and churn prediction.

Recommendation systems help users discover content they might like but are not yet aware of. Furthermore, an effective recommendation system can potentially increase user retention and conversion rates. One critical challenge in building a recommender system is the cold-start problem, which arises when records for certain users or items are sparse: without enough rating data about a new song or a new user, the system must rely on auxiliary information to make effective recommendations. In the first task of WSDM Cup 2018, we ask participants to tackle these problems in building a music recommendation system.

The second task of the Cup focuses on churn prediction. For a subscription business, accurately predicting churn is critical to long-term success, as even a slight variation in churn can significantly affect profits. In this task, participants are asked to build an algorithm that predicts whether a user will churn after their subscription expires. The competition data and awards are provided by KKBOX, a leading music streaming service company from Taiwan.



Speaker: Craig Knoblock

Title: Extracting, Aligning, and Linking Data to Build Knowledge Graphs


There is a tremendous amount of data spread across the web and stored in databases that can be turned into an integrated semantic network of data, called a knowledge graph. However, exploiting the available data to build knowledge graphs is difficult due to the heterogeneity of the sources, the scale of the data, and noise in the data. In this talk I will present our approach to building knowledge graphs, including acquiring data from online sources, extracting information from those sources, aligning and linking the data across sources, and building and querying knowledge graphs at scale. We applied our approach, implemented in a system called DIG, to a variety of challenging real-world problems, including combating human trafficking by analyzing web ads, identifying illegal arms sales in online marketplaces, and predicting cyber attacks using data extracted from both the open and dark web.


Craig Knoblock is a Research Professor of both Computer Science and Spatial Sciences at the University of Southern California (USC), Research Director at the Information Sciences Institute, and Associate Director of the Informatics Program at USC. He received his Bachelor of Science degree from Syracuse University and his Master's and Ph.D. in computer science from Carnegie Mellon University. His research focuses on techniques for describing, acquiring, and exploiting the semantics of data. He has worked extensively on source modeling, schema and ontology alignment, entity and record linkage, data cleaning and normalization, extracting data from the Web, and combining these techniques to build knowledge graphs. He has published more than 300 journal articles, book chapters, and conference papers on these topics. Dr. Knoblock is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), Fellow of the Association for Computing Machinery (ACM), Senior Member of IEEE, past President and Trustee of the International Joint Conference on Artificial Intelligence (IJCAI), and winner of the 2014 Robert S. Engelmore Award.

09:50 - 10:05

Incorporating Field-aware Deep Embedding Networks and Gradient Boosting Decision Trees for Music Recommendation

Bing Bai, Yushun Fan


Truncated SVD-based Feature Engineering for Music Recommendation

Nima Shahbazi, Mohamed Chahhou, Jarek Gryz


KKBOX’s Music Recommendation Challenge Solution with Feature Engineering

Jianyu Zhang, Françoise Fogelman-Soulié


Could You Play That Song Again? – Reminding Users of Their Favorite Tracks Through Recommendations

Malte Ludewig, Dietmar Jannach


Prediction of repeated listening by means of GBDT-based approach

Vasiliy Rubtsov, Dmitry I. Ignatov, Anvar Kurmukov


Predicting Customer Churn: Extreme Gradient Boosting with Temporal Data

Bryan Gregory


A Hybrid Approach for Music Recommendation

Lin Zhu, Yihong Chen, Wen Jiang


A Practical Pipeline with Stacking Models for KKBox’s Churn Prediction Challenge

Zhinan Wang, Wenming Xiao, Jun Wang


Ensembling XGBoost and Neural Network for Churn Prediction with Relabeling and Data Augmentation

Chence Shi, Zheye Deng, Yewen Xu, Weiping Song, Yichun Yin, Jile Zhu, Ming Zhang


An ensemble approach to streaming service churn prediction

Hang Li, Quang Hieu Vu, Thanh Lam Pham, Tam T. Nguyen, Song Chen, Jeong-Yoon Lee


Closing: RecSys Challenge 2018: Automatic Playlist Continuation

Hamed Zamani