Automated Fake Account Detection at LinkedIn
Although we prevent a large majority of bulk fake accounts from being created at registration, we sometimes don’t have enough information at that point to determine whether an account is fake. For this reason, we have other downstream models to catch smaller batches of fakes. First, we group accounts into clusters based on shared attributes. We then look for clusters whose data shows a statistically abnormal distribution, which is indicative of accounts created or controlled by a single bad actor. These are supervised machine learning models whose features are computed per cluster rather than per member. The models score each cluster and then propagate the cluster’s label to its individual accounts. This cluster-level approach lets us catch more fake accounts quickly, faster than we could if we waited for them to start taking abusive actions on the site.
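To make the cluster-level flow concrete, here is a minimal sketch of the general shape of this approach. It is an illustration using scikit-learn, not our production system: the per-cluster features (signup-time entropy, name-pattern similarity, IP diversity), the synthetic training data, and the 0.9 restriction threshold are all hypothetical stand-ins.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical per-cluster features:
#   0: entropy of registration timestamps (low => bursty, scripted signups)
#   1: fraction of accounts in the cluster sharing a name pattern
#   2: number of distinct IPs relative to cluster size
def synth_clusters(n, fake):
    if fake:  # bursty signups, templated names, few IPs
        return np.column_stack([rng.uniform(0.0, 0.3, n),
                                rng.uniform(0.7, 1.0, n),
                                rng.uniform(0.0, 0.2, n)])
    return np.column_stack([rng.uniform(0.5, 1.0, n),
                            rng.uniform(0.0, 0.3, n),
                            rng.uniform(0.5, 1.0, n)])

# Train a supervised model on historically labeled clusters.
X_train = np.vstack([synth_clusters(500, fake=False), synth_clusters(500, fake=True)])
y_train = np.array([0] * 500 + [1] * 500)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Score new clusters, then propagate each cluster's decision to its accounts.
new_clusters = {
    "cluster_a": {"accounts": ["u1", "u2", "u3"], "features": synth_clusters(1, fake=True)[0]},
    "cluster_b": {"accounts": ["u4", "u5"], "features": synth_clusters(1, fake=False)[0]},
}
THRESHOLD = 0.9  # hypothetical restriction threshold
for name, cluster in new_clusters.items():
    p_fake = model.predict_proba(cluster["features"].reshape(1, -1))[0, 1]
    if p_fake > THRESHOLD:
        # Every account in the cluster inherits the cluster-level label.
        print(f"{name}: p_fake={p_fake:.2f}, restricting {cluster['accounts']}")
```

The key design point is that a single decision per cluster applies to every account in it, so one bursty batch of registrations can be taken down in a single scoring pass rather than account by account.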
Fake accounts that are not created in bulk by a single bad actor are often detected by our activity-based models, which require more information about an account’s behavior to decide whether it is indeed fake. Many of these models look for specific types of bad behavior typical of abusive accounts; others look for behavior that is simply anomalous. Additionally, our systems have redundancy, which ensures that fake accounts not caught by the early stages of our defenses (the top of the funnel) are eventually caught by later ones (the bottom of the funnel).
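As a rough illustration of the anomaly-detection side, the sketch below fits an unsupervised outlier detector on typical member activity and flags accounts whose behavior falls far outside that distribution. The feature set (invitations sent, profile views, invitation accept rate) and the use of scikit-learn’s IsolationForest are assumptions for illustration, not a description of our production models.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)

# Hypothetical per-account activity features gathered after registration:
#   0: invitations sent per day
#   1: profile views per day
#   2: accept rate of invitations sent
normal = np.column_stack([rng.poisson(3, 1000),
                          rng.poisson(20, 1000),
                          rng.uniform(0.4, 0.9, 1000)])

# Fit the detector on typical member behavior; accounts whose activity is
# far outside this distribution get escalated for scoring and review.
detector = IsolationForest(contamination=0.01, random_state=1).fit(normal)

suspect = np.array([[400, 5, 0.02]])  # mass invitations, few views, tiny accept rate
print(detector.predict(suspect))      # -1 => anomalous, escalate to review
```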
A human element will always be necessary to catch fake accounts that evade our models. While we strive to take down fake accounts before they interact with our members, we also rely on signals from members who report suspicious activity on the site. These reports give us valuable information: accounts that our models have not caught go through additional scoring and review. Finally, we have a team of investigators who look for accounts that have evaded all levels of automated defense. These investigations also yield valuable signals that we subsequently incorporate into our models.
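One way to picture how member reports feed into additional scoring and review is the hypothetical routing sketch below. The re-score threshold, the priority formula, and the helper names are all invented for illustration; the point is simply that reports both trigger automated re-scoring and prioritize the human review queue.

```python
from dataclasses import dataclass, field
from queue import PriorityQueue

@dataclass(order=True)
class ReviewItem:
    priority: float                       # lower value = reviewed sooner
    account_id: str = field(compare=False)

def handle_report(account_id, model_score, report_count, review_queue):
    """Re-score a reported account; if the models are not confident,
    queue it for human review, weighted by score and report volume."""
    if model_score > 0.95:                # hypothetical auto-restrict threshold
        return "restricted"
    priority = -(model_score + 0.05 * report_count)
    review_queue.put(ReviewItem(priority, account_id))
    return "queued_for_review"

queue = PriorityQueue()
print(handle_report("u42", model_score=0.7, report_count=3, review_queue=queue))
```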
Because abuse is adversarial by nature, we are constantly improving our fake account models. The quicker we catch fake accounts, the less damage they can do to our members, which helps keep LinkedIn a safe, trusted, and professional community.