eHarmony’s “Big Data” Talk: Keeping it Real for the Non-Techies

David Gevorkyan, a Principal Software Engineer at eHarmony, recently gave a talk discussing “how Hadoop helps [eHarmony] to process over a billion possible matches into several highly compatible matches for each of our users per day.”  Sounds pretty technical, right?

I watched the whole talk (53 minutes!) and I’ve pulled out some pieces for the non-techies out there.  There were a lot of interesting tidbits about how eHarmony works.  You can see the talk, and the slides, on eHarmony’s engineering blog.

First off, I’m very pleased eHarmony put something out there that gives us a little bit more knowledge about how they work.  Transparency is a beautiful thing.  Also, thanks so much to David, who was kind enough to answer some of my questions about eHarmony and his talk.

Now, on to the good stuff!


Dr. Neil Clark Warren, founder of eHarmony, came up with a way to systematically match people, using “29 dimensions of compatibility”.  The exact 29 dimensions are not disclosed, but they include such things as humor, spirituality, sociability, and ambition.

Over 600,000 marriages have come from people meeting via eHarmony, or about 438 marriages per day (this accounts for about 5% of all new US marriages).  eHarmony currently has about 50 million registered users.

David mentioned a study conducted by Harris Interactive for eHarmony that analyzed divorce rates:  for the 7-year period eHarmony has been operating, the divorce rate was about 4.8%.  (Statistics about current national divorce rates vary, but some recent research puts it at about a 40-50% chance during one’s lifetime, so that figure covers marriages much longer than 7 years.)

David says that what differentiates eHarmony from other matchmaking sites like Match.com and OkCupid is eHarmony’s “compatibility matching system,” which has three parts:

  1. Compatibility matching: compatibility based on the personality and psychological profiles
  2. Affinity matching: historical data from the last 15 years that uses machine learning models to predict different things such as probability of communication between users
  3. Match distribution: ensuring we deliver the right matches at the right time to as many people as possible throughout the entire network

Step 1:  Compatibility Matching

When you join eHarmony, you provide criteria such as preferences on distance, income, age range, religion, smoking and drinking preferences, and others.  After that, you fill in a comprehensive relationship questionnaire (150 questions!), which is targeted to extract personality and psychological profiles.  These questions provide eHarmony with information about personality, values, attributes, and beliefs.  eHarmony then uses the “29 dimensions of compatibility” to make the matches.

[Image: Sample eHarmony question]

Based on a marital satisfaction survey of 5000 users, eHarmony identified the most highly satisfied couples and uses their compatibility scores to predict new matches.

When a new user joins, eHarmony runs them through “complex mathematical equations” that produce a score.  If the score is above the threshold for the highly satisfied couples from the survey, it considers them compatible.
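For the techies, here is a toy sketch of what "score above a threshold" could look like.  To be clear, this is not eHarmony’s actual math, which is not public.  The scoring formula, the 0-to-1 scale, the threshold value, and the sample people are all assumptions made up for illustration; only the four dimension names come from the talk.

```python
from statistics import mean

# Hypothetical cutoff. In the talk, the real threshold comes from the
# scores of highly satisfied couples in the survey; 0.8 is invented here.
THRESHOLD = 0.8


def compatibility_score(a, b):
    """Toy score: 1 minus the average gap across shared dimensions.

    Each profile maps a dimension name to a value between 0 and 1.
    This formula is an illustrative assumption, not eHarmony's equations.
    """
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    return 1.0 - mean(abs(a[d] - b[d]) for d in shared)


def is_compatible(a, b, threshold=THRESHOLD):
    """A pair counts as compatible when its score reaches the threshold."""
    return compatibility_score(a, b) >= threshold


# Made-up profiles using four of the dimensions named in the talk.
alice = {"humor": 0.9, "spirituality": 0.4, "sociability": 0.7, "ambition": 0.8}
bob = {"humor": 0.8, "spirituality": 0.5, "sociability": 0.6, "ambition": 0.9}
carol = {"humor": 0.1, "spirituality": 0.9, "sociability": 0.2, "ambition": 0.1}

print(is_compatible(alice, bob))    # True
print(is_compatible(alice, carol))  # False
```

The point is only the shape of the idea:  boil two profiles down to one number, then compare that number to a cutoff learned from couples who are known to be happy.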

David shared with me the link to one of eHarmony’s matching patents.

[Image: eHarmony matching patent]

On a technical note, eHarmony uses a data storage system called Voldemort (developed by LinkedIn) to store its one-billion+ potential matches per day.

Step 2: Affinity Matching

Based on 15 years of historical data, the system predicts the probability of communication between two users (among other things).  David says, “Even though the users are compatible with each other, you might not always decide to give that user as a match.”

And why not?  Well, it may be that the user has specified he/she will only communicate with someone within a certain distance, or a certain age range.  So the system won’t try to match these people.  David told me there is some flexibility with this, but if a person has listed something as “very important” then eHarmony won’t give them a match that doesn’t meet their criteria.
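Here is a small sketch of how "flexible unless marked very important" might work in code.  It is only a guess at the general idea:  the `Criteria` class, the 20% distance slack, and the 2-year age slack are hypothetical and were not described in the talk.

```python
from dataclasses import dataclass

# Hypothetical amounts of flexibility for criteria NOT marked "very important".
DISTANCE_SLACK = 0.20   # allow up to 20% beyond the stated distance
AGE_SLACK_YEARS = 2     # allow 2 years either side of the stated age range


@dataclass
class Criteria:
    """One user's stated preferences (illustrative fields only)."""
    max_distance_miles: float
    min_age: int
    max_age: int
    distance_very_important: bool = False
    age_very_important: bool = False


def meets_criteria(criteria, candidate_age, distance_miles):
    """Return True if a candidate fits, stretching only the flexible limits."""
    max_distance = criteria.max_distance_miles
    if not criteria.distance_very_important:
        max_distance *= 1 + DISTANCE_SLACK

    min_age, max_age = criteria.min_age, criteria.max_age
    if not criteria.age_very_important:
        min_age -= AGE_SLACK_YEARS
        max_age += AGE_SLACK_YEARS

    return distance_miles <= max_distance and min_age <= candidate_age <= max_age


# Same preferences (within 50 miles, ages 30 to 40), strict vs. flexible.
strict = Criteria(50, 30, 40, distance_very_important=True, age_very_important=True)
flexible = Criteria(50, 30, 40)

# A 41-year-old who lives 55 miles away:
print(meets_criteria(strict, 41, 55))    # False
print(meets_criteria(flexible, 41, 55))  # True
```

So two people can be highly compatible on paper and still never be shown to each other, simply because one of them drew a hard line on distance or age.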

He showed an interesting slide on how distance in miles affects the probability of communication:  most communication happens, not surprisingly, when users are nearer to each other.  However, at some point (over about 1000 miles) it doesn’t really matter any more–I guess long distance is long distance!

[Image: How distance affects probability of communicating]

David says most communication happens when the man is taller by 4 to 8 inches, and that men are more eager to talk to women who are taller than they are than women are to talk to men who are shorter.

Different words you use to describe yourself in your profile affect the probability of communication–that is, how likely you are to get a message from someone else.

For men, these words are likely to get more messages:  “perceptive, physically fit, passionate, intelligent, funny, optimistic”.  And for women:  “sweet, funny, ambitious, thoughtful, passionate”.

Each user has an average of about 1000 attributes, and altogether the users have answered about 4 billion questions.  eHarmony makes tens of millions of potential daily matches.  Now that’s a lot of data!

Technical note:  originally they were using Amazon Web Services, but one issue was that they could not predict when processing jobs (such as predicting matches) would finish.  Why does it matter?  They want to deliver potential matches “first thing in the morning”.

Step 3:  Match Distribution

eHarmony wants to make as many people on the system happy as possible, so it tries to maximize communication between users.  It uses machine learning to determine how many matches to send per day, at what time of day, etc.


Finally, someone in the audience asked why certain people are rejected by eHarmony.  David said they do have machine learning algorithms in place that are a part of that, but did not give details.


My Ten Favorite Recent Articles on Love and Relationships

There are a ton of great (and not so great) articles out there about love, online dating, and the science of relationships.  I’d love to write posts on all of the interesting ones, but since that’s not feasible, I will instead share ten of my recent favorites.  Enjoy!

In no particular order…

Online Dating: Love in the Time of the Internet
A look at how online dating doesn’t let us use one of our most powerful tools:  intuition.

Matchmaker, Matchmaker, Make Me a Spreadsheet
Since OkCupid founder Christian Rudder’s new book, Dataclysm: Who We Are When We Think No One’s Looking, just came out, he’s been doing a lot of interviews.  This one from FiveThirtyEight was very well done.

Matching Algorithms, A Work in Progress
A thoughtful look from Online Dating Insider David Evans about the state of matching algorithms in online dating.  I’m definitely in agreement.

Tinder And Evolutionary Psychology
Liraz Margalit says Tinder gives us everything we need “to make an informed first impression about a potential long-term mate”.

Esther Perel: The secret to desire in a long-term relationship
A 20-minute TED talk on how long-term relationships affect desire.

For the first time, there are more single American adults than married ones, and here’s where they live
What effect will this have on marriage?  For more insights on what this means for single women, check out this NPR clip.

T is for Turning
Zach Brittle from the Gottman Institute Relationship blog talks about one of the things successful couples are better at than their non-successful counterparts:  turning towards their partner.

Infographic: The 10 Most Interesting Dating Studies of 2014
From the Science of Relationships blog, a round-up in infographics of interesting dating studies from 2014.

Couples on Different Sleep Schedules Can Expect Conflict—and Adapt
Most interesting quote from this Wall Street Journal article:  “when women reported higher relationship satisfaction, they were more likely to have been asleep at the same time as their partner the night before, almost down to the minute”.

How I Rebuilt Tinder And Discovered The Shameful Secret Of Attraction
An imperfect study, but it opens the door to some fascinating–and disturbing–possibilities of why we’re attracted to some people and not others.


Read anything interesting lately?  Send links my way at lovedata@mcquaker.org, or on Twitter @lovedatablog.