Perhaps I shouldn’t be surprised, but many organizations are reluctant to share the demographics of their employees. And yet, how can we address equality in gender, ethnicity, or religion if we can’t measure it?
Inspired by the IT Counts application created by WAX Science (a spin-off of the CRI), in which people report the percentage of women attending and speaking at scientific conferences, Gayathri and I started wondering if there was a way to measure the demographics of a social network. We looked through the most popular networks (Facebook, Twitter, and LinkedIn) for ways to gather such data.
No Good APIs for Gender
Twitter offers the best API of the three, because much of its data is public. However, no gender information is available for users. Facebook does provide gender via an API, but only for users who install your application. LinkedIn has recently curtailed its API, restricting most of it to pre-approved partners, and it does not report the gender of its users either.
But LinkedIn does present a significant advantage for our use case: as a career-focused network, it gathers links between users, companies, and educational institutions. It has a powerful search function that allows a user to see near-complete lists of users who work at a given company or went to a particular school.
So how do we figure out the gender of users when that information is not provided? That’s where face recognition comes in! You may be familiar with how Facebook and Google can identify faces in a photo. Well, there are a bunch of free services, like Betaface and Face++, that can recognize multiple faces in photos and then (this is where it gets interesting) make fairly accurate guesses of the gender, age, and race of the person photographed. You can try it yourself using the Face++ demo.
Of course, none of this makes sense if the photos aren’t good quality. Once again, LinkedIn works well for us in this respect, because most users post nice-looking headshots that are appropriate for a CV, instead of the funny memes and line drawings that you can find on Facebook and Twitter. In our informal tests, we’ve found that around three-quarters of LinkedIn users post photos, and about three-quarters of those can be analyzed, giving analysis coverage of a little over half of all users (three-quarters of three-quarters is about 56%).
There are occasional mistakes in the photo analysis, such as women being classified as men, and vice versa. Age also seems difficult to estimate. And “race” is perhaps the toughest of all. Gayathri noticed that the racial classification of Indians seems almost random between “white”, “asian”, and “black”. This led us down a fascinating rabbit hole of trying to understand how Indians have been racially classified in the past. For example, the United States government has flip-flopped between considering Indians white or non-white at least four times. It has also rejected the citizenship of Indians, despite accepting that they are Caucasians, because “the average man knows perfectly well that there are unmistakable and profound differences.”
Extend the Browser
Back to LinkedIn. In addition to not providing user search via their API, they also require logging in to use the search function. However, the actual profile images are publicly available once you have the URL. These constraints led me to the idea of writing a browser extension that activates when the user visits a LinkedIn search page. The extension goes down the page, extracting the URLs of profile images, sending them off for analysis, and tallying the results. Since only around 10 profiles are shown at a time, the extension automatically moves from page to page of the results. A nice advantage of this approach is that it does not require making a fake LinkedIn account, nor do we need to store or transfer the images themselves, just their URLs.
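To make the tallying step concrete, here is a minimal sketch in TypeScript. It is not the extension's actual code: the `FaceResult` shape, the 0-to-1 confidence scale, and the `minConfidence` threshold are all assumptions made for illustration, and a real response from a service like Face++ would have to be mapped into this form first.

```typescript
// Hypothetical shape of one analysis result (not the real Face++ response).
type Gender = "female" | "male";

interface FaceResult {
  imageUrl: string;
  // null means no face could be analyzed in the photo.
  gender: Gender | null;
  // Confidence in the gender guess, assumed to range from 0 to 1.
  confidence: number;
}

interface GenderTally {
  female: number;
  male: number;
  unknown: number;
}

// Count results per gender; missing or low-confidence guesses count as unknown.
function tallyGenders(results: FaceResult[], minConfidence = 0.6): GenderTally {
  const tally: GenderTally = { female: 0, male: 0, unknown: 0 };
  for (const result of results) {
    if (result.gender === null || result.confidence < minConfidence) {
      tally.unknown += 1;
    } else {
      tally[result.gender] += 1;
    }
  }
  return tally;
}

// Add the tally of a new results page to the running total.
function mergeTallies(a: GenderTally, b: GenderTally): GenderTally {
  return {
    female: a.female + b.female,
    male: a.male + b.male,
    unknown: a.unknown + b.unknown,
  };
}

// Percentage of women among the profiles that could be analyzed.
function percentWomen(tally: GenderTally): number {
  const analyzed = tally.female + tally.male;
  return analyzed === 0 ? 0 : (100 * tally.female) / analyzed;
}
```

Keeping a separate `unknown` count matters here, since a large share of profiles have no photo or a photo that cannot be analyzed, and the percentage should only be computed over the profiles that were actually classified.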
The final step was to make some pretty graphs. For that, I chose the C3.js library, mostly for its pretty donut charts.
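As a rough illustration of the charting step, the sketch below builds the kind of plain configuration object that C3's `c3.generate` accepts for a donut chart (`bindto`, `data.columns`, `data.type`, and `donut.title`). The `Counts` shape, the labels, and the `buildDonutConfig` helper are hypothetical names chosen for this example, not taken from the project's source.

```typescript
// Hypothetical counts coming out of the analysis step.
interface Counts {
  female: number;
  male: number;
  unknown: number;
}

// Subset of the C3 chart options used for a donut chart.
interface DonutConfig {
  bindto: string;
  data: { columns: [string, number][]; type: "donut" };
  donut: { title: string };
}

// Build the chart options; each column is a label followed by its value.
function buildDonutConfig(counts: Counts, selector: string, title: string): DonutConfig {
  return {
    bindto: selector,
    data: {
      columns: [
        ["Women", counts.female],
        ["Men", counts.male],
        ["Not analyzed", counts.unknown],
      ],
      type: "donut",
    },
    donut: { title },
  };
}

// In a page where C3 is loaded, the chart would then be drawn with something like:
//   c3.generate(buildDonutConfig(counts, "#chart", "Gender"));
```

Building the options as a plain object keeps the counting logic testable without a browser, and leaves the single call to C3 as the only part that depends on the page.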
The source code for the project is available on GitHub. There are a few hoops to jump through when installing it, including signing up for a free account on Face++. Once we have tested our extension, we would like to put it up directly on the Chrome Web Store, which would make it trivial to install.
So far, In Your Face is not at all a game! Perhaps it could be transformed into a guessing game: for example, guessing which of two companies has more women, or more people smiling in their photos. Another option would be to compare the percentage of men and women in a man’s first-degree network with that in a woman’s.
Another important aspect is how to share the information gathered through In Your Face. We’ve thought of publishing the data on a website, and/or providing social media buttons. These are not surefire solutions, because we cannot guarantee that each user will see the same results on LinkedIn. But given how slow the analysis is, sharing results that way could help to spread the word.