In my effort to keep up with and investigate social media, I got an account on one of the large Facebook-wannabe websites with social network facilities. I knew no one on the site and was therefore fairly surprised when its friend suggester came up with a person that I knew, and only that person! How was the website able to know I was connected to that person? There is exceedingly little information on the public Internet to connect me with that person. The person lives in another place, is another age and works in another business. If I google on the public web I find no pages that mention both me and the person. The way that I logged into the social website was independent of other social websites: I didn’t explicitly tell the website about my other accounts on Twitter, Facebook, MySpace, LinkedIn, Xing, Tumblr or Posterous. Thus it could not get access to my social network through me, so the website must have gotten this relatively private information from somewhere else. How?
I will come back to that. First a bit on other issues of privacy.
I recently went to the Extended Semantic Web Conference where Abe Hsuan gave one of the fine keynote talks. He focused on privacy on the Web and the “Data Valdez”. Among the topics he addressed were:
- The Dog Poop Girl from Seoul, who was photographed by an anonymous subway passenger. The girl’s dog had shit on the floor of the Seoul subway train and the girl was so embarrassed that she left it there. When the photo was released, an Internet storm arose against the poor girl, and her identity and personal details were revealed.
- The AOL Data Valdez, where the company released 20 million Web search queries in 2006. With a bit of compiling, journalists could reveal the identity of individual users, e.g., a 62-year-old woman and her search queries on “60 single men” and other personal searches.
- The Netflix Prize challenge, where researchers could break the anonymization of the video rental company’s data by correlating it with IMDb.
- Pandora’s Android app, which according to an analysis by Veracode appears to send users’ birth date, gender and GPS information to advertising companies.
Hsuan also pointed to a whole series of companies that specialize in correlating information across and beyond the Web: bluecava, Blue Kai, Epsilon/Abacus, TargusInfo, brilig.com, Sense Networks, Ingenix (prescription drug history, therapeutic outcomes and billing information), face.com (facial recognition technology).
In April 2011 researchers reported that Apple devices stored lists of locations with timestamps without the user’s acknowledgment. According to Apple, this is just to help the user get faster geolocation through Wi-Fi and mobile phone tower data rather than slow GPS. However, with access to the unencrypted backup of the device you will be able to observe the travels of the device user.
Google got itself into a lawsuit after collecting and transmitting location data on the Android platform.
Revealing too much about your location in public may give thieves a good opportunity, and a Danish insurance company advises users to remove the Facebook Places application. There is an asymmetry in knowledge: the thieves know when you are away from your house, but they are not willing to reveal when they are in your house.
The interesting website Social Clusters by Morten Barklund allows you to make intelligent visualizations of your friend network from Facebook. To enable that you have to reveal your friend network to the Web service. Though eager to try it out, I was reluctant to reveal my network. Regardless, I am probably already in the database, as some people in my friend network on Facebook have signed up, i.e., I am not among the presently 102 registered users but very likely among the presently 28,513 connected persons.
Registration may not be necessary to reveal your friend network. If one of your Facebook friends has an open profile, some information about you is revealed even if you have a closed profile. According to a study, a third of Danes on social networks regularly upload photos of people other than themselves, and among these a fourth have an open profile. That is roughly one in twelve. So if you have more than 12 friends there is a fair chance (or risk) that an image of you is accessible to non-friends even if you have a closed profile and never uploaded images of yourself.
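The back-of-the-envelope reasoning behind the 12-friend figure can be sketched in a few lines of Python. This is only an illustration under strong assumptions that are mine and not from the study: every friend is treated as an independent draw, every friend who uploads photos of others is assumed to upload one of you, and the function name `exposure_probability` is made up for the example.

```python
from fractions import Fraction

# Figures quoted from the study: a third upload photos of other people,
# and a fourth of those have an open profile.
P_UPLOADS_OTHERS = Fraction(1, 3)
P_OPEN_PROFILE = Fraction(1, 4)

# Probability that a single friend both uploads photos of others
# and has an open profile: 1/3 * 1/4 = 1/12.
P_PER_FRIEND = P_UPLOADS_OTHERS * P_OPEN_PROFILE


def exposure_probability(n_friends: int) -> float:
    """Chance that at least one of n friends exposes a photo of you.

    Assumption (not from the study): friends behave independently and
    each qualifying friend has actually uploaded a photo of you.
    """
    if n_friends < 0:
        raise ValueError("n_friends must be non-negative")
    return float(1 - (1 - P_PER_FRIEND) ** n_friends)


if __name__ == "__main__":
    for n in (1, 12, 50, 150):
        print(f"{n:4d} friends: {exposure_probability(n):.2f}")
```

With 12 friends the expected number of "exposing" friends is one, and under the independence assumption the chance of at least one is a bit under two thirds. With a more typical friend count it quickly approaches certainty.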
Now back to the social website that guessed right with its friend suggester. How did it do it? Here are some suggestions:
- Facebook could have revealed my friend network to the website. This option is unlikely given that Facebook and the website are competitors.
- The website could have obtained information from the public information on Facebook. I think this is also unlikely. Facebook would not allow a competitor to crawl its website to acquire the friend network.
- Before I understood that Facebook applications are actually run by third parties and do not just keep the data within Facebook, I added a few applications. One of them was the Friend Wheel. I do not know what the Friend Wheel does with my data, but I don’t think that it has reached the social website.
- A likely path for the data is that the other person logged in via Facebook, so the social website could get hold of that person’s Facebook friend network. As I was in this network and my name is pretty unique, the social website could match my name with the name in the friend network.
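The last suggestion, matching a new user's name against friend lists that other members have already handed over, needs very little machinery. The following Python sketch is purely hypothetical: I do not know how the website actually does it, and the function names, the data layout and the example names are all invented for illustration.

```python
import unicodedata


def normalize(name: str) -> str:
    """Reduce a name to a comparable form: strip accents, case and extra spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


def suggest_friends(new_user_name: str,
                    imported_friend_lists: dict[str, list[str]]) -> list[str]:
    """Return the members whose imported friend list contains the new user's name.

    imported_friend_lists is a hypothetical structure: it maps an existing
    member to the friend names that member's Facebook login exposed.
    """
    target = normalize(new_user_name)
    return sorted(
        member
        for member, friend_names in imported_friend_lists.items()
        if any(normalize(friend) == target for friend in friend_names)
    )


if __name__ == "__main__":
    # Invented example data, not real people.
    imported = {
        "Member A": ["Åse Example", "Bob Sample"],
        "Member B": ["Carol Test"],
    }
    print(suggest_friends("ase example", imported))
```

Exact matching on a normalized name only works well when the name is rare, which fits my case: with a common name the website would get many false matches and would need more evidence, such as an email address.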