Code for love: algorithmic dating


One of the innovative Danish TV channels, DR3, has a history of dating programs, with Gift ved første blik as, I believe, the first. It is a program with, literally, an arranged marriage between two participants matched by what were supposed to be relationship experts. The format has been exported internationally as Married at First Sight, but the marriages have not been stable: very few of the couples have stayed together, if one is to trust the information on the English Wikipedia.

Now my colleagues at DTU Compute have been involved in a new program called Koden til kærlighed (the code for love). In contrast to Gift ved første blik, the participants are not going to get married during the program but will live together for a month, and, as perhaps the most interesting part, the matches are determined by a learning algorithm. If you view the streamed first episode, you will have the delight of seeing glimpses of data mining Python code with NumPy (note the intermixed camel case and underscores :).

The program seems to have been filmed with smartphone cameras for the most part. The participants are four couples of white heterosexual millennials. So far we have seen their expectations and first encounters, so we are far from knowing whether my colleagues have done a good job with the algorithmic matching.

According to the program, the producers and the Technical University of Denmark have collected information from 1’400 persons in “well-functioning” relationships. There must have been couples among the 1’400, so the data scientists can train the algorithm using real couples as the positive examples and pairs of persons who are not couples as the negative examples. The 350 new singles who signed up for the program can then be matched with the trained algorithm. Four couples, I suppose the top-ranking matches, were selected for the program.
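The program does not show the actual model, so the following is only a minimal sketch of how such a setup could look. Everything in it is my assumption: the absolute-difference pair features, the logistic regression, the toy answers on a 0-to-1 scale and all function names. It uses only the Python standard library to stay self-contained, whereas the code glimpsed in the program used NumPy.

```python
import math
import random


def pair_features(a, b):
    """Hypothetical symmetric features: absolute answer difference per question."""
    return [abs(x - y) for x, y in zip(a, b)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_training_set(couples, rng):
    """Real couples are positives; a person paired with someone else's partner is a negative."""
    X, y = [], []
    n = len(couples)
    for i, (a, b) in enumerate(couples):
        X.append(pair_features(a, b))
        y.append(1)
        j = rng.randrange(n - 1)
        if j >= i:
            j += 1  # guarantees j != i, so this is not a real couple
        X.append(pair_features(a, couples[j][1]))
        y.append(0)
    return X, y


def train_logistic(X, y, epochs=100, lr=0.1):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            g = p - yi
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def match_score(w, b, person, candidate):
    """Estimated probability that two persons form a 'well-functioning' couple."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, pair_features(person, candidate))))


# Toy data (assumption): partners give similar answers on all 5 questions.
rng = random.Random(0)
couples = []
for _ in range(200):
    a = [rng.random() for _ in range(5)]
    partner = [min(1.0, max(0.0, x + rng.uniform(-0.1, 0.1))) for x in a]
    couples.append((a, partner))

X, y = make_training_set(couples, rng)
w, b = train_logistic(X, y)

# Score a single against two hypothetical candidates.
single = [0.0] * 5
same = match_score(w, b, single, [0.0] * 5)
opposite = match_score(w, b, single, [1.0] * 5)
print(same, opposite)
```

With 350 singles, one would compute such a score for every candidate pair and keep the highest-ranking ones. Note that the toy data builds in similarity between partners, so the model can only rediscover what I put in; the real data may behave quite differently.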

Our Professor Jan Larsen was involved in the program and explained a bit more about the setup on the radio. The collected input was based on responses to 104 questions from 667 couples (that is, 1’334 persons, apparently not quite 1’400). Important questions may have been related to sleep and education.

It will be interesting to follow the development of the couples. There are 8 episodes in this season. It would have been nice with more technical background: What are the questions? How exactly is the match determined? How is the importance of the questions determined? Have the producers done any “editing” of the relationships? (For instance, why are all participants in the age range 20-25 years?) When two people are matched, how do their answers to a question relate: are the matching answers homophilic or heterophilic? During the program there are glimpses of questions that might have been used. Some examples are “Do you have a tv-set?”, “Which supermarket do you use?” and “How many relationships have you ended?” It is questionable whether a question such as “Do you have a tv-set?” is of any use. 667 couples is not much data for training a model on 104 questions, and one would think that less relevant questions could confuse the algorithm more than they help.
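One simple way to look at the homophilic/heterophilic issue, question by question, is to compare how often real couples give the same answer with how often randomly re-paired persons do. The sketch below does that on invented data. The three binary toy questions, the 90% agreement rates and the function names are all my assumptions and have nothing to do with the actual questionnaire.

```python
import random


def agreement_rate(pairs, q):
    """Fraction of pairs in which both persons gave the same answer to question q."""
    return sum(a[q] == b[q] for a, b in pairs) / len(pairs)


def homophily_lift(couples, q, rng):
    """Agreement among real couples minus agreement among shuffled pairs.

    Clearly positive: homophilic. Clearly negative: heterophilic.
    Near zero: the question says little about who forms a couple.
    """
    left = [a for a, _ in couples]
    right = [b for _, b in couples]
    rng.shuffle(right)  # break the real pairing to get a chance baseline
    baseline = agreement_rate(list(zip(left, right)), q)
    return agreement_rate(couples, q) - baseline


def toy_couple(rng):
    """Invented couple with three yes/no questions (purely an assumption)."""
    a = [rng.randint(0, 1) for _ in range(3)]
    b = [
        a[0] if rng.random() < 0.9 else 1 - a[0],  # question 0: partners mostly agree
        1 - a[1] if rng.random() < 0.9 else a[1],  # question 1: partners mostly differ
        rng.randint(0, 1),                         # question 2: unrelated to the partner
    ]
    return a, b


rng = random.Random(0)
couples = [toy_couple(rng) for _ in range(667)]
lifts = [homophily_lift(couples, q, rng) for q in range(3)]
for q, lift in enumerate(lifts):
    print(f"question {q}: lift {lift:+.2f}")
```

A question with a lift near zero, which is what I would guess for the tv-set question, carries little information about the match and mostly adds noise when there are only 667 couples to learn from.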