To wrap up the NBA Twitter Text Analysis series, we wanted to recap and review the tweets that were made during the NBA Championship series.
In Game 1, we looked at how hashtag usage changed before and after the game.
In Game 2, we looked at how sentiment changed for particular players.
In Game 3, we did topic clustering to group similar tweets.
In Game 4, we took a different look at hashtags, Twitter @-mentions, and emoji usage.
In Game 5, we looked at how sentiment changed over time and across geographical locations.
Game 6 Twitter Text Analysis: Pulling out Meaningful Words
During Game 6, we recorded 1,506,129 tweets over a span of about 13 hours. We wanted to capture the tweets before, during, and after the game on both coasts, as well as the conversations that ran late into the night. If you want to learn more about how we created the dataset using Google Cloud Platform (GCP) tools, see Game 4 of our NBA series.
Throughout the series, we learned a lot about the best ways to process tweets relatively quickly including:
- Processing tweets in batches of 1,000 was much faster than processing them one at a time in a loop. Batches larger than 1,000 slowed things down.
- VaderSentiment was 5x faster than TextBlob at sentiment analysis.
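As a rough illustration of the batching point above, here is a minimal sketch that splits a stream of tweets into chunks of 1,000. The `process_batch` function is a hypothetical placeholder for whatever per-batch work is being done (sentiment scoring, cleaning, an API call); it is not the code we ran.

```python
from itertools import islice


def batched(items, batch_size=1000):
    """Yield successive lists of at most `batch_size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def process_batch(tweets):
    # Hypothetical stand-in for the real per-batch work.
    return [tweet.lower() for tweet in tweets]


def process_all(tweets, batch_size=1000):
    """Process tweets batch by batch and collect the results in order."""
    results = []
    for batch in batched(tweets, batch_size):
        results.extend(process_batch(batch))
    return results
```

The best batch size depends on the workload, so it is worth timing a few values of `batch_size` on your own data rather than assuming 1,000 is always the sweet spot.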
To close out our Twitter Text Analysis series, we decided to focus on:
- Network Graph Analysis of players mentioned in the same tweet
- Tweet volume over time in 15-minute increments
- Positive and Negative Words used throughout the event
- Named Entity Recognition (NER)
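For the first item, a co-mention network comes down to counting how often two players appear in the same tweet. The sketch below shows one way to build those weighted edges. The `PLAYER_ALIASES` map is a small, made-up example, and matching on single lowercase tokens is an assumption for illustration rather than a description of our pipeline.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical alias map: lowercase token -> canonical player name.
PLAYER_ALIASES = {
    "kawhi": "Kawhi Leonard",
    "leonard": "Kawhi Leonard",
    "lowry": "Kyle Lowry",
    "klay": "Klay Thompson",
    "curry": "Stephen Curry",
}


def players_in(text, aliases=PLAYER_ALIASES):
    """Return the set of canonical player names mentioned in one tweet."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {aliases[token] for token in tokens if token in aliases}


def co_mention_edges(tweets, aliases=PLAYER_ALIASES):
    """Count how many tweets mention each pair of players together."""
    edges = Counter()
    for text in tweets:
        # Sort so that (A, B) and (B, A) are counted as the same edge.
        players = sorted(players_in(text, aliases))
        for pair in combinations(players, 2):
            edges[pair] += 1
    return edges
```

Each key is a pair of players and each value is the number of tweets that mention both, which can serve as the edge weight when the graph is drawn.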
Tweet volume rose sharply before the game, held at around 40,000 tweets per 15-minute increment, and then tapered off as time passed.
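Counting tweets in 15-minute increments can be sketched with the standard library alone. This example assumes each tweet's creation time has already been parsed into a `datetime`; the function names are illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta


def floor_to_interval(ts, minutes=15):
    """Round a timestamp down to the start of its interval.

    Assumes `minutes` divides 60 evenly (for example 5, 15, or 30).
    """
    discard = timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )
    return ts - discard


def volume_by_interval(timestamps, minutes=15):
    """Return (interval_start, tweet_count) pairs sorted by time."""
    counts = Counter(floor_to_interval(ts, minutes) for ts in timestamps)
    return sorted(counts.items())
```

Intervals with no tweets are simply absent from the result, so fill them with zeros before plotting if a continuous time axis is needed.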
In preparation for analysis, we removed punctuation and special characters from the text, converted it to lowercase, and stemmed it to keep only the root words.
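The cleaning steps can be sketched as follows. Note that `crude_stem` is a deliberately simple suffix stripper written for this example so that it runs without extra dependencies; it is a stand-in for a proper stemmer (such as a Porter stemmer) and will produce different stems than one.

```python
import re

# Suffixes removed by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, drop punctuation and special characters, then stem."""
    text = text.lower()
    # Keep only ASCII letters, digits, and whitespace. This also strips
    # '@', '#', apostrophes, emoji, and accented characters.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [crude_stem(token) for token in text.split()]
```

For example, `preprocess("Free throws hurting the Warriors!")` yields the tokens `free`, `throw`, `hurt`, `the`, and `warrior`.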
Positive Vs Negative Words
We found far more positive words than negative ones. Most people were likely happy for the Toronto Raptors and their win, and the negative tweets most likely came from a small minority who rooted for the Warriors.
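One simple way to produce positive and negative word counts is to look up each cleaned token in a sentiment lexicon. The sketch below uses two tiny, made-up word sets purely for illustration; a real analysis would use a full lexicon, and this is not necessarily how the counts in this post were produced.

```python
from collections import Counter

# Hypothetical mini-lexicons for illustration only.
POSITIVE = {"free", "win", "happy", "great", "love"}
NEGATIVE = {"hurt", "lost", "shit", "bad"}


def polarity_word_counts(token_lists, positive=POSITIVE, negative=NEGATIVE):
    """Count positive and negative words across tokenized tweets."""
    pos_counts, neg_counts = Counter(), Counter()
    for tokens in token_lists:
        for token in tokens:
            if token in positive:
                pos_counts[token] += 1
            elif token in negative:
                neg_counts[token] += 1
    return pos_counts, neg_counts
```

Calling `most_common(10)` on either counter gives the top words for that polarity. A lookup like this ignores context, which is why a word such as "hurt" can be counted as negative even in a playful tweet.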
Negative Words Used
The heavy use of the word ‘shit’ comes from tweets like this: “You see that shit? Yeah don’t do that tomorrow.”
The word ‘hurt’ came from a lot of tweets such as this:
RT @BR_NBA: Don’t hurt ’em with the Game 6 fit, Wardell.
Others simply commented on players being hurt:
RT @LucasJHann: Imagine the audacity, the game after you rushed KD back and he got hurt again, to say “we don’t want to give the raptors tw…
Klay getting hurt rn is the first time I lost faith in the Warriors
Free throws hurting the warriors big time
idgaf who gets hurt steve Kerr bet not put Jordan bell in the game
Scenario: Entire Warriors team hurt.
Mark Jackson: I don’t like this Warriors team on the floor defensively.
Who else you want them to play?
God please give the warriors strength ? they hurtin out there
Positive Words Used
Interestingly, the word that came out on top was ‘free’. A look at the dataset showed that it came from this tweet:
‘RT @miss9afi: Canada gave me what the Islamic regime took away from me…Canada gave me my freedom…’
Named Entity Recognition (NER)
Here are a few examples:
Here is a list of the top 10 entities found:
As expected, Toronto’s star players, such as Kawhi Leonard and Kyle Lowry, drew a lot of talk, and the injury that Klay Thompson suffered clearly generated plenty of conversation as well. The word “first” was used a lot, probably because this was Toronto’s first championship. As you can see, the Warriors appear at the bottom, since it looks like most people didn’t want that team to win again.
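Once an NER model has tagged each tweet, producing a top 10 list is a counting exercise. The sketch below assumes the entities have already been extracted into one list of strings per tweet; the extraction step itself is out of scope here, and the function name is illustrative.

```python
from collections import Counter


def top_entities(entities_per_tweet, n=10):
    """Return the n most frequent entity strings with their counts."""
    counts = Counter()
    for entities in entities_per_tweet:
        # Strip stray whitespace so "Toronto " and "Toronto" match.
        counts.update(entity.strip() for entity in entities)
    return counts.most_common(n)
```

Entities are compared as exact strings, so variants like "Kawhi" and "Kawhi Leonard" are counted separately unless they are merged first.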
This is a great dataset, and we hope to extract more insights from it in the future. It was also great working with teammates in Texas, California, Virginia, and our India offices to make this all happen.
If there is anything you would like to see in future text analysis posts, drop us a line at [email protected] or tweet us @springmlinc. We’d love to hear from you, including any ideas for a future text analysis series.