Tinder Users More Likely to Tweet About Positive Things Than Negative

If I had to guess, I’d have said that people posting about Tinder would have a lot more negative things to say than positive. Creepy messages, no-shows for dates, fake profiles… there are plenty of negative things out there.

But after digging into the data, I was surprised to find that, on average, the tweets about Tinder contain more positive words than negative:

[Figure: Tinder sentiment scores]

I tagged words used in the tweets as either negative or positive, and increased the negative or positive score if the words had things like “very” in front of them.  The most negative words had a score of -1, and the most positive had a score of 1.

As you can see, most of the words fell into the slightly positive category, between 0.0 and 0.2.
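For the curious, the scoring idea can be sketched in a few lines of Python. This is only a rough illustration: the tiny lexicon, the list of intensifiers, and the 1.5 boost are all made up for the example, and none of it is the tagging set or the KNIME workflow behind the chart.

```python
# Hypothetical mini-lexicon: word -> score between -1 (most negative) and 1 (most positive).
LEXICON = {"love": 0.8, "amazing": 0.7, "weird": -0.4, "creepy": -0.6, "hate": -0.8}
INTENSIFIERS = {"very", "really", "so"}  # assumed list of intensifying words
BOOST = 1.5  # assumed multiplier applied when an intensifier comes right before a word
PUNCT = ".,!?\"'"


def score_words(text):
    """Return (word, score) pairs for every lexicon word found in the text."""
    tokens = [tok.strip(PUNCT) for tok in text.lower().split()]
    scored = []
    for i, word in enumerate(tokens):
        if word not in LEXICON:
            continue
        score = LEXICON[word]
        # Push the score further from zero if the previous word is an intensifier.
        if i > 0 and tokens[i - 1] in INTENSIFIERS:
            score *= BOOST
        # Keep every score inside the -1..1 range.
        scored.append((word, max(-1.0, min(1.0, score))))
    return scored


def tweet_score(text):
    """Average the scores of the tagged words; 0.0 if nothing was tagged."""
    scored = score_words(text)
    if not scored:
        return 0.0
    return sum(score for _, score in scored) / len(scored)


print(score_words("I really hate creepy messages"))
print(tweet_score("Tinder is very weird but I love it"))
```

Averaging per tweet is just one reasonable choice here; summing the scores or counting positive versus negative words would work too.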

I did the same thing for OkCupid and eHarmony tweets.  They had a generally similar trend, but were slightly more on the positive side than the Tinder tweets.

I wanted to know more.  What were the most common negative words in the tweets about Tinder, OkCupid, and eHarmony?  To find out, I created a word cloud.  The more common the word, the bigger it appears.

[Figure: Word cloud of the most common negative words]

(A brief note about why some of the words look funny:  I used a technique called “stemming,” which groups similar words together by chopping off the end.  For example, “desper” includes desperate, desperation, etc.)
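If you want a feel for what stemming does, here is a deliberately naive Python sketch. It just strips a handful of suffixes that I picked for this example; real stemmers (such as the Porter stemmer) use far more careful rules, and this is not the stemmer used for the word clouds.

```python
from collections import defaultdict

# Hypothetical suffix list chosen for this example only.
SUFFIXES = ("ation", "ately", "ate", "ing", "ed", "s")
MIN_STEM = 3  # never chop a word down to fewer than 3 letters


def naive_stem(word):
    """Strip the longest matching suffix, leaving at least MIN_STEM letters."""
    word = word.lower()
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def group_by_stem(words):
    """Group words that share the same stem."""
    groups = defaultdict(list)
    for word in words:
        groups[naive_stem(word)].append(word)
    return dict(groups)


print(group_by_stem(["desperate", "desperation", "desperately", "dating", "dated"]))
```

The three "desperate" variants all land under "desper", which is why the word cloud shows that odd-looking stem instead of a full word.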

People are tweeting about some scary stuff!  Attack, scary, devil, panic, death.

There are also plainly negative words like hate, weird, stupid, and desperate.

Others include garbage, fraud, freak, and drunk.

(By the way, the tagging set I used classifies the word ‘fun’ as positive and negative, presumably to include when people use it in the context of “making fun” of someone.)

What about the positive side?

[Figure: Word cloud of the most common positive words]

Positive words include “funni” (for ‘funny’, ‘funnily’, etc.) and, of course, love.

Friend, success, amazing, hot, strong, caring, genuine–we can see what people are hoping to find when they tweet about online dating.

On a technical note:  I used the open-source tool KNIME to collect the tweets and do the analysis.  For more on how I did it, check out my blog post on the KNIME website.

eHarmony’s “Big Data” Talk: Keeping it Real for the Non-Techies

David Gevorkyan, a Principal Software Engineer at eHarmony, recently gave a talk discussing “how Hadoop helps [eHarmony] to process over a billion possible matches into several highly compatible matches for each of our users per day.”  Sounds pretty technical, right?

I watched the whole talk (53 minutes!) and pulled out some pieces for the non-techies out there. There were a lot of interesting tidbits about how eHarmony works. You can see the talk, and the slides, on eHarmony’s engineering blog.

First off, I’m very pleased eHarmony put something out there that gives us a little bit more knowledge about how they work.  Transparency is a beautiful thing.  Also, thanks so much to David, who was kind enough to answer some of my questions about eHarmony and his talk.

Now, on to the good stuff!


Dr. Neil Clark Warren, founder of eHarmony, came up with a way to systematically match people, using “29 dimensions of compatibility”.  The exact 29 dimensions are not disclosed, but they include such things as humor, spirituality, sociability, and ambition.

Over 600,000 marriages have come from people meeting via eHarmony, or about 438 marriages per day (this accounts for about 5% of all new US marriages).  eHarmony currently has about 50 million registered users.

David mentioned a study on divorce rates that Harris Interactive conducted for eHarmony. For the 7-year period eHarmony has been operating, the divorce rate was about 4.8%. (Statistics about current national divorce rates vary, but some recent research puts the chance of divorce at about 40-50% during one’s lifetime, so that figure covers marriages much longer than 7 years.)

David says that what differentiates eHarmony from other matchmaking sites like Match.com and OkCupid is eHarmony’s “compatibility matching system,” which has three parts:

  1. Compatibility matching: compatibility based on the personality and psychological profiles
  2. Affinity matching: historical data from the last 15 years that uses machine learning models to predict different things such as probability of communication between users
  3. Match distribution: ensuring we deliver the right matches at the right time to as many people as possible throughout the entire network

Step 1:  Compatibility Matching

When you join eHarmony, you provide criteria such as preferences on distance, income, age range, religion, smoking and drinking preferences, and others.  After that, you fill in a comprehensive relationship questionnaire (150 questions!), which is targeted to extract personality and psychological profiles.  These questions provide eHarmony with information about personality, values, attributes, and beliefs.  eHarmony then uses the “29 dimensions of compatibility” to make the matches.

[Figure: Sample eHarmony question]

Based on a marital satisfaction survey of 5000 users, eHarmony identified the most highly satisfied couples and uses their compatibility scores to predict new matches.

When a new user joins, eHarmony runs them through “complex mathematical equations” that produce a score. If the score is above the threshold for the highly satisfied couples from the survey, eHarmony considers the pair compatible.

David shared with me the link to one of eHarmony’s matching patents.

[Figure: eHarmony matching patent]

On a technical note, eHarmony uses a data storage system called Voldemort (developed by LinkedIn) to store its one-billion+ potential matches per day.

Step 2: Affinity Matching

Based on 15 years of historical data, the system predicts the probability of communication between two users (among other things). David says, “Even though the users are compatible with each other, you might not always decide to give that user as a match.”

And why not? Well, it may be that the user has specified he/she will only communicate with someone within a certain distance, or a certain age range. So the system won’t try to match these people. David told me there is some flexibility with this, but if a person has listed something as “very important” then eHarmony won’t give that person a match that doesn’t meet the criteria.

He showed an interesting slide on how distance in miles affects the probability of communication:  most communication happens, not surprisingly, when users are nearer to each other.  However, at some point (over about 1000 miles) it doesn’t really matter any more–I guess long distance is long distance!

[Figure: How distance affects probability of communicating]

David says most communication happens when the man is taller by 4 to 8 inches–and that men are more eager to talk to women who are taller than them than women are to talk to men who are shorter than them.

Different words you use to describe yourself in your profile affect the probability of communication–that is, how likely you are to get a message from someone else.

For men, these words are likely to get more messages:  “perceptive, physically fit, passionate, intelligent, funny, optimistic”.  And for women:  “sweet, funny, ambitious, thoughtful, passionate”.

Each user has an average of about 1000 attributes, and altogether the users have answered about 4 billion questions.  eHarmony makes tens of millions of potential daily matches.  Now that’s a lot of data!

Technical note:  originally they were using Amazon Web Services, but one issue was that they could not predict when processing jobs (such as predicting matches) would finish.  Why does it matter?  They want to deliver potential matches “first thing in the morning”.

Step 3:  Match Distribution

eHarmony wants to make as many people on the system happy as possible, so it tries to maximize communication between users. It uses machine learning to determine how many matches to send per day, at what time of day, and so on.


Finally, someone in the audience asked why certain people are rejected by eHarmony.  David said they do have machine learning algorithms in place that are a part of that, but did not give details.

What are Tinder, OkCupid and eHarmony users Tweeting About?

I recently did some analysis to see which dating apps/sites people were Tweeting about the most.  Tinder won by a landslide, followed by OkCupid and Badoo.

Now I wanted to see what these Tweets were about.  Using the open-source data analytics tool KNIME, I fed in the Tweets, did some data cleanup, and created word clouds to get a picture of the most common topics.
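If you’d rather see the idea in code than in a KNIME workflow, here is a minimal Python sketch of the cleanup-and-count step behind a word cloud. The stopword list, the cleanup rules, and the sample tweets are assumptions made up for this example; they are not the exact steps in my workflow.

```python
import re
from collections import Counter

# Hypothetical, very short stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "was", "are", "you", "not", "but"}


def word_counts(tweets, extra_stopwords=()):
    """Count how often each word appears across a list of tweets."""
    stop = STOPWORDS | set(extra_stopwords)
    counts = Counter()
    for tweet in tweets:
        # Drop links and @mentions before splitting into words.
        text = re.sub(r"https?://\S+|@\w+", " ", tweet.lower())
        tokens = re.findall(r"[a-z0-9']+", text)
        # Skip stopwords and very short tokens.
        counts.update(t for t in tokens if t not in stop and len(t) > 2)
    return counts


sample = [
    "Tinder hacked by bots http://t.co/x",
    "Bots on Tinder again @someone",
    "Love the castle bots",
]
# Every tweet mentions the app, so its name is passed in as an extra stopword.
print(word_counts(sample, extra_stopwords={"tinder"}).most_common(3))
```

A word cloud is then just a picture of these counts: the higher the count, the bigger the word.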

Let’s start with Tinder:

[Figure: Tinder word cloud]

Some of the words had me scratching my head initially... castle? Bots? But a little digging led to this retweeted story: “Tinder Hacked By Bots Promoting Castle Clash Game Downloads”.

Other words, like dating, singles, sexy, hot, and matches, all fit in with Tinder’s reputation.

I took a look at the data on swiping right (saying ‘yes’ to a match) and swiping left (saying ‘no’) and people were Tweeting about swiping right almost twice as much as swiping left.
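One simple way to make a comparison like this is to count the tweets that mention each phrase. The Python sketch below assumes a plain pattern match on “swipe/swiped/swiping right” and “... left”; the patterns and the sample tweets are hypothetical, not the exact filter I used.

```python
import re

# Assumed patterns: any word starting with "swip" followed by "right" or "left".
RIGHT = re.compile(r"\bswip\w*\s+right\b")
LEFT = re.compile(r"\bswip\w*\s+left\b")


def swipe_counts(tweets):
    """Return (right, left): how many tweets mention swiping each way."""
    right = sum(1 for t in tweets if RIGHT.search(t.lower()))
    left = sum(1 for t in tweets if LEFT.search(t.lower()))
    return right, left


sample = [
    "Swiped right on everyone",
    "always swipe left",
    "Swiping right again",
    "no swipes today",
]
print(swipe_counts(sample))
```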

Next up, OkCupid:

[Figure: OkCupid word cloud]

As I mentioned in my previous post, about 60% of the OkCupid posts had to do with OkCupid’s boycott of Firefox, prompted by the fact that Mozilla’s CEO had donated to the anti-gay marriage Prop 8 campaign. That definitely skews the data! Another big story (for the words nightmare, steals, phone) had this headline: “Nightmare OkCupid Date Steals Girl’s Phone and Impersonates Her Online.” Interestingly, marriage made it into the word cloud (unlike in Tinder and eHarmony) but that also seems to be related to the Mozilla boycott!

Next up in number of Tweets was Badoo–but most of the data was in Spanish.

I decided to focus on the next one instead, eHarmony:

[Figure: eHarmony word cloud]

Apparently eHarmony had a commercial out that made a lot of people uncomfortable.   As for “job,” it included various job postings at eHarmony (I did get rid of all the Tweets by the dating app’s Twitter account, but other people posted the same information).

995pm refers to a deal eHarmony was running at the time:  $9.95 per month.

Other interesting words:  exclusive, “findlove”, senior, and matchmaker.

What About Love?

“Love” was the 42nd most common word in Tinder’s Tweets, 108th for OkCupid (people were too busy Tweeting about the boycott!) and the 11th most common at eHarmony.  I can’t say I’m surprised that people Tweeting about eHarmony are Tweeting the most about love!
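A ranking like this falls straight out of a word count. Here is a small Python sketch that assumes you already have a count for each word; the counts below are invented for illustration and are not the real Tweet data.

```python
from collections import Counter


def word_rank(counts, word):
    """Return the 1-based rank of a word by frequency, or None if it never appears."""
    if word not in counts:
        return None
    ranked = [w for w, _ in counts.most_common()]
    return ranked.index(word) + 1


# Made-up counts, only to show how the ranking works.
counts = Counter({"bots": 5, "dating": 3, "love": 2})
print(word_rank(counts, "love"))
print(word_rank(counts, "castle"))
```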

Now What?

The next thing I’d like to do with this data is sentiment analysis–that is, are people saying more negative or positive things about each of these sites?  I’d also like to get some more data for OkCupid since the Mozilla boycott has passed.

Technical Notes

You may have noticed that each word cloud contains the name of the dating site/app itself. I did remove the majority of those words (each Tweet has one so it’s not particularly relevant!) but a few stayed in because of the way the data was organized. I used KNIME’s Parts of Speech tagger, and interestingly it sometimes tagged a dating app’s name as a noun and sometimes as a verb (depending on where in the sentence it appeared). If I were to do this again, I’d remove those instances as well.