Investigating the Twitter Interest Graph

Why Is Twitter All the Rage?

I’m presenting a short webcast entitled Why Twitter Is All the Rage: A Data Miner’s Perspective that is loosely adapted from material that appears early in Mining the Social Web (2nd Ed). I wanted to share out the content that inspired the topic. The remainder of this post is a slightly abridged reproduction of a section that appears early in Chapter 1. If you enjoy it, you can download all of Chapter 1 as a free PDF to learn more about mining Twitter data.

Why Is Twitter All the Rage?

How would you define Twitter?

There are many ways to answer this question, but let’s consider it from an overarching angle that addresses some fundamental aspects of our shared humanity that any technology needs to account for in order to be useful and successful. After all, the purpose of technology is to enhance our human experience.

As humans, what are some things that we want that technology might help us to get?

  • We want to be heard.
  • We want to satisfy our curiosity.
  • We want it easy.
  • We want it now.

In the context of the current discussion, these are just a few observations that are generally true of humanity. We have a deeply rooted need to share our ideas and experiences, which gives us the ability to connect with other people, to be heard, and to feel a sense of worth and importance. We are curious about the world around us and how to organize and manipulate it, and we use communication to share our observations, ask questions, and engage with other people in meaningful dialogues about our quandaries.

The last two bullet points highlight our inherent intolerance to friction. Ideally, we don’t want to have to work any harder than is absolutely necessary to satisfy our curiosity or get any particular job done; we’d rather be doing “something else” or moving on to the next thing because our time on this planet is so precious and short. Along similar lines, we want things now and tend to be impatient when actual progress doesn’t happen at the speed of our own thought.

One way to describe Twitter is as a microblogging service that allows people to communicate with short, 140-character messages that roughly correspond to thoughts or ideas. In that regard, you could think of Twitter as being akin to a free, high-speed, global text-messaging service. In other words, it’s a glorified piece of valuable infrastructure that enables rapid and easy communication. However, that’s not all of the story. It doesn’t adequately address our inherent curiosity and the value proposition that emerges when you have over 500 million curious people registered, with over 100 million of them actively engaging their curiosity on a regular monthly basis.

Besides the macro-level possibilities for marketing and advertising—which are always lucrative with a user base of that size—it’s the underlying network dynamics that created the gravity for such a user base to emerge that are truly interesting, and that’s why Twitter is all the rage. While the communication bus that enables users to share short quips at the speed of thought may be a necessary condition for viral adoption and sustained engagement on the Twitter platform, it’s not a sufficient condition. The extra ingredient that makes it sufficient is that Twitter’s asymmetric following model satisfies our curiosity. It is the asymmetric following model that casts Twitter as more of an interest graph than a social network, and the APIs that provide just enough of a framework for structure and self-organizing behavior to emerge from the chaos.

In other words, whereas some social websites like Facebook and LinkedIn require the mutual acceptance of a connection between users (which usually implies a real-world connection of some kind), Twitter’s relationship model allows you to keep up with the latest happenings of any other user, even though that other user may not choose to follow you back or even know that you exist. Twitter’s following model is simple but exploits a fundamental aspect of what makes us human: our curiosity. Whether it be an infatuation with celebrity gossip, an urge to keep up with a favorite sports team, a keen interest in a particular political topic, or a desire to connect with someone new, Twitter provides you with boundless opportunities to satisfy your curiosity.

Think of an interest graph as a way of modeling connections between people and their arbitrary interests. Interest graphs provide a profound number of possibilities in the data mining realm that primarily involve measuring correlations between things for the objective of making intelligent recommendations and other applications in machine learning. For example, you could use an interest graph to measure correlations and make recommendations ranging from whom to follow on Twitter to what to purchase online to whom you should date. To illustrate the notion of Twitter as an interest graph, consider that a Twitter user need not be a real person; it very well could be a person, but it could also be an inanimate object, a company, a musical group, an imaginary persona, an impersonation of someone (living or dead), or just about anything else.

For example, the @HomerJSimpson account is the official account for Homer Simpson, a popular character from The Simpsons television show. Although Homer Simpson isn’t a real person, he’s a well-known personality throughout the world, and the @HomerJSimpson Twitter persona acts as an conduit for him (or his creators, actually) to engage his fans. Likewise, although this book will probably never reach the popularity of Homer Simpson, @SocialWebMining is its official Twitter account and provides a means for a community that’s interested in its content to connect and engage on various levels. When you realize that Twitter enables you to create, connect, and explore a community of interest for an arbitrary topic of interest, the power of Twitter and the insights you can gain from mining its data become much more obvious.

There is very little governance of what a Twitter account can be aside from the badges on some accounts that identify celebrities and public figures as “verified accounts” and basic restrictions in Twitter’s Terms of Service agreement, which is required for using the service. It may seem very subtle, but it’s an important distinction from some social websites in which accounts must correspond to real, living people, businesses, or entities of a similar nature that fit into a particular taxonomy. Twitter places no particular restrictions on the persona of an account and relies on self-organizing behavior such as following relationships and folksonomies that emerge from the use of hashtags to create a certain kind of order within the system.

Editor’s Note: If you’re interested in learning more about Twitter and social media data mining techniques, check out Matthew’s webcast Why Twitter Is All the Rage: A Data Miner’s Perspective.

Related

Sign up for the O'Reilly Programming Newsletter to get weekly insight from industry insiders.