One of the most useful ways to deploy Natural Language Processing at the moment is to uncover meaningful structure within unstructured text.
While we’re nowhere near true machine comprehension of text, in this series of posts applying NLP to tweets about the NBA Finals, we’ve demonstrated a variety of techniques to extract meaningful data from text: sentiment analysis, topic clustering, numerical analysis of trends, time series analysis, etc.
In this post, we want to look at another kind of structure we can often pull from text: the network graph represented by co-mentions of players & coaches.
What is a graph?
To most people, a graph might mean a bar chart or a pie chart. In the world of data processing, that would be referred to more specifically as a ‘chart’ or a ‘plot’.
A graph in this context has a specific and different meaning: a collection of nodes and edges representing some kind of network.
Graphs can solve various problems in all manner of domains, from social media to physics, but today we’re going to look at the network defined by tweets about players & coaches during Game 6 of the NBA Finals.
Defining the Network
In a network analysis, defining what makes a valid connection between two points in the network is key. When cell phone companies analyze their networks, for example, there may be one network defined by voice calls between two phones, and another network defined by text messages. Comparing those two networks would probably reveal something about the age of various users.
In this network, an edge is created when two players are mentioned in the same tweet.
That decision will reveal interesting relationships, but it’s also important to consider the range of ambiguity that still remains. A co-mention like this could be teammates who are considered part of a unit or rivals on opposite teams being compared against one another. Or it could simply be a random connection of someone listing favorite players.
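The edge rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not the pipeline used for this series: the roster, the tweets, and the helper names `mentioned_names` and `co_mention_edges` are all hypothetical, and the name matching is a naive case-insensitive substring check.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-roster (name -> team); the real roster was much longer.
ROSTER = {
    "Kawhi Leonard": "Raptors",
    "Kyle Lowry": "Raptors",
    "Stephen Curry": "Warriors",
    "Klay Thompson": "Warriors",
}

def mentioned_names(tweet, roster):
    """Return the roster names found in a tweet, sorted for stable pair order."""
    text = tweet.lower()
    return sorted(name for name in roster if name.lower() in text)

def co_mention_edges(tweets, roster):
    """Count every unordered pair of names that appears in the same tweet."""
    edges = Counter()
    for tweet in tweets:
        for pair in combinations(mentioned_names(tweet, roster), 2):
            edges[pair] += 1
    return edges

# Made-up tweets for illustration, not taken from the dataset.
tweets = [
    "Kawhi Leonard and Kyle Lowry are locked in tonight",
    "Stephen Curry vs Kawhi Leonard is the matchup to watch",
    "Klay Thompson is on fire",
]
edges = co_mention_edges(tweets, ROSTER)
```

Note that the second made-up tweet pairs two rivals while the first pairs two teammates, and both produce the same kind of edge. That is exactly the ambiguity described above. A tweet naming only one person, like the third, adds no edge at all.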
Visualizing the Network
To visualize our network, we built an undirected graph and plotted it using NetworkX. Raptors players (and Coach Nick Nurse) are colored in red, and Warriors players (and Coach Steve Kerr) are colored in blue. If a tweet mentions two players (or coaches), a line is drawn between those entities.
We focused on tweets from one hour before the game to one hour after the game.
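A time window like this is simple to apply once each tweet carries a timestamp. The sketch below assumes tweets arrive as `(created_at, text)` pairs. The start and end times are placeholders, since this post does not state the actual ones, and `in_window` is a hypothetical helper.

```python
from datetime import datetime, timedelta

# Placeholder game times; the post does not give the real start and end.
GAME_START = datetime(2019, 6, 13, 18, 0)
GAME_END = datetime(2019, 6, 13, 20, 30)
PAD = timedelta(hours=1)

def in_window(created_at, start=GAME_START, end=GAME_END, pad=PAD):
    """True if the timestamp is within one hour before start to one hour after end."""
    return start - pad <= created_at <= end + pad

# Made-up tweets as (timestamp, text) pairs.
tweets = [
    (datetime(2019, 6, 13, 16, 30), "too early"),
    (datetime(2019, 6, 13, 17, 15), "pregame"),
    (datetime(2019, 6, 13, 21, 29), "postgame"),
    (datetime(2019, 6, 13, 22, 0), "too late"),
]
kept = [text for created_at, text in tweets if in_window(created_at)]
```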
Because visualizing everyone in one network graph can be a lot to process, let’s create additional plots focusing within each team. We’ll start with the Warriors:
Focusing on the Warriors
We could count the connections ourselves, but using NetworkX’s built-in degree-centrality algorithm, we can see what fraction of the other nodes any given node is connected to. So in this case, DeMarcus Cousins leads the pack by being co-mentioned with 93.75% of his teammates.
It’s interesting to see how many players ranked equal to or higher than Stephen Curry, whom we might expect to be seen as a central figure on the Warriors. Perhaps he is seen as a player who is discussed on his own, while other players are considered as part of a group.
We also see that poor Damion Lee didn’t seem to get any co-mentions within the tweets we saw, but our Twitter stream represented only a fraction of the total number of tweets that happened last night.
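To make the centrality score concrete, here is a plain-Python sketch of the calculation: each node’s number of distinct neighbors divided by the number of other nodes. For a simple graph with more than one node, this is the same normalization NetworkX’s `degree_centrality` applies. The toy graph uses placeholder labels rather than real players. Its size is chosen only because 15 of 16 possible neighbors gives 93.75%, which is one way to arrive at the figure quoted above. The post itself does not state the node count.

```python
def degree_centrality(nodes, edges):
    """Return degree / (n - 1) for each node of a simple undirected graph."""
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        if a != b:  # ignore self-loops in this sketch
            neighbors[a].add(b)
            neighbors[b].add(a)
    others = len(nodes) - 1
    return {
        node: (len(adjacent) / others if others > 0 else 0.0)
        for node, adjacent in neighbors.items()
    }

# Toy graph with placeholder labels: "hub" is linked to 15 of the 16 other nodes.
nodes = ["hub"] + [f"p{i}" for i in range(1, 17)]
edges = [("hub", f"p{i}") for i in range(1, 16)]
scores = degree_centrality(nodes, edges)
```

In this toy graph `p16` has no edges at all, so it scores zero, much like a player who never shares a tweet with anyone else.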
Focusing on the Raptors
And now it’s time to visualize the graph of our new champions, the Raptors!
And to discuss the results numerically, here is a table of their centrality scores:
Jeremy Lin received a co-mention with all other players who made it into the analysis (we accidentally dropped a couple of Raptors players due to a missing comma). This is surprising, because he averaged only 3.4 minutes per game during the Finals. But he is popular on Twitter, with his own hashtag (#linsanity), so it’s possible his popularity in the social sphere earned him his co-mentions.
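As an aside on that missing comma: the post does not show the offending code, so the following is only an assumption about how it could happen. In Python, two adjacent string literals are silently joined into one, so a roster list with a comma missing loses entries without raising an error. The list and the `check_roster` guard below are hypothetical.

```python
# Hypothetical roster list with the comma missing after "Kyle Lowry".
players = [
    "Kawhi Leonard",
    "Kyle Lowry"
    "Fred VanVleet",
    "Marc Gasol",
]
# Python concatenates the adjacent literals, so the list has three entries,
# one of which is a fused name that will never match any tweet.

def check_roster(names, expected_count):
    """Raise if the roster does not contain the expected number of names."""
    if len(names) != expected_count:
        raise ValueError(f"expected {expected_count} names, found {len(names)}")
```

A count check like this, run once when the roster is loaded, is one cheap way to catch the problem before any tweets are processed.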
Next up we see a four-way tie between Kawhi Leonard, Kyle Lowry, Norman Powell, and Jodie Meeks. It’s interesting to see that we have more even scores across the top of the Raptors’ roster. It would be too much of a stretch to say that this is strong evidence of more teamwork on the Raptors, but it may be evidence that their fans and others who discuss them consider them more in groups than we tended to see with the Warriors.
Focusing on the Cross-Team Mentions
Now let’s focus on cross-team mentions, meaning co-mentions with players on the opposite team. First let’s visualize:
And we’ll look at the centrality table here (showing only the top of the table, down through both coaches):
We see Marc Gasol receive the most cross-team mentions. He did not lead the field in any stats like rebounds or assists (although he made the top 5 players of the night in both categories). He was the tallest player on the court, but I doubt it was his imposing stature that made the difference, so we’d have to investigate the tweets more closely to try to understand his rank there.
We do see that coaches Steve Kerr and Nick Nurse, who both had co-mentions with well over 50% of their own players, received fewer co-mentions with players on the other squad. Note that we do see a line between the two of them, however, as they were being compared to one another throughout the game.
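Restricting the graph to cross-team mentions amounts to keeping only the edges whose two endpoints are on different teams. The sketch below assumes a name-to-team lookup and a list of edges. The edges are invented for illustration and are not results from our data, and `cross_team_edges` is a hypothetical helper.

```python
from collections import Counter

# Name -> team lookup for a handful of people mentioned in this post.
TEAM = {
    "Marc Gasol": "Raptors",
    "Nick Nurse": "Raptors",
    "Kyle Lowry": "Raptors",
    "Steve Kerr": "Warriors",
    "Stephen Curry": "Warriors",
}

def cross_team_edges(edges, team):
    """Keep only the edges that connect people on opposite teams."""
    return [(a, b) for a, b in edges if team[a] != team[b]]

# Invented edges for illustration only.
edges = [
    ("Marc Gasol", "Stephen Curry"),
    ("Marc Gasol", "Kyle Lowry"),  # same team, so it is dropped
    ("Nick Nurse", "Steve Kerr"),
    ("Marc Gasol", "Steve Kerr"),
]
cross = cross_team_edges(edges, TEAM)

# Number of cross-team connections per person.
cross_degree = Counter(name for edge in cross for name in edge)
```

Dividing each count by the number of people on the opposite team would give a share comparable to the within-team percentages, though that is only one reasonable way to normalize.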
This series has been a very fun exercise in getting to deploy some favorite NLP techniques on an interesting social dataset.
The team on these posts has been led by our enthusiastic and supportive Digital Marketing Manager Tamera Fall. On the data side, the team has included Data Scientist Baolin Liu, Senior Data Engineer Varshith Manchikanti, and me (Data Engineer Andrew Larimer). And further marketing support has come from SpringML co-founder Prabhu Palanisamy and our Senior Web Designer Abhishek Kumar. Thank you all for being a dream team to make this series happen!