Instagram Scraper — How “Popular” Am I Compared to My Friends?
By Sophie Zhao and Mentor: Sining Chen
Did you know that your Instagram friends are, on average, more popular than you are? This is the Friendship Paradox, a mathematically proven characteristic of social networks. Being an avid Instagram user, I wanted to study this phenomenon in my own Instagram network. I started with the following question:
Am I less “popular” (have fewer followers) than the people I follow (my followees)?
Since I am not an influencer, I should expect to be “less popular”, as the Friendship Paradox suggests. Is this true for other users as well? In any case, let’s check it with numbers!
What you will see in this story:
- How to scrape data from Instagram, specifically using Instaloader.
- Workings of a virtual network through Instagram profiles
- Using scraped data and a coded algorithm to determine “popularity”
- Limitations and topics of further inquiry
Steps I took to answer my questions:
- Import Instaloader
- Access user profile
- Save followees
- Obtain followers of followees
- Calculate average followers
- Compare results
Step 1: Import Instaloader
The coding language I used was Python. I needed to obtain data from Instagram, including user profiles, numbers of followers, and numbers of followees. To do this, I used a Python package called Instaloader, which is free, open-source software written in Python.
import instaloader
Instaloader can download…
- …pictures and videos along with their captions and other metadata from Instagram
- …profiles (public and private), hashtags, user stories, feeds and saved media
- …comments, geotags and captions of each post
It can also…
- automatically detect profile name changes and rename the target directory accordingly
- allow fine-grained customization of filters and where to store downloaded media
For more information about instaloader: https://instaloader.github.io/
To install instaloader: https://instaloader.github.io/installation.html
*Important Update: As of May 2020, there have been issues with using Instaloader in the way shown in this story. More specifics are covered in the following closed discussion: https://github.com/instaloader/instaloader/issues/615
Step 2: Accessing a user profile
In the following code, <username> and <password> are the login info of the account wanting to access a desired user’s (<user>) data.
# Login info
L = instaloader.Instaloader()
L.login(<username>, <password>)

# Desired user
user = <user>
For example, when I wanted to access my own information, I would put my username “sz_713” as <username> and enter my password in <password>. My desired user, myself, would also be “sz_713”.
# Login info
L = instaloader.Instaloader()
L.login("sz_713", "my_password")

# Desired user
user = "sz_713"
*Note: To be able to access data from <user>, the account you log in from must have permission to see the profile. This means that if <user> has blocked you or is a private account you do not follow, you cannot access their information from your account.
The Anatomy of an Instagram User Profile
Each Instagram user profile contains the number of followers and followees, along with their usernames and links. This enables us to crawl through the network from any user represented by a node (see Step 4: Obtain followers of followees). Instaloader allows us to obtain such data.
Step 3: Save followees
Using methods in instaloader, we can gather a list of <user>’s followees. The list of followees is printed into a text file for later use.
# Print list of followees into txt file
profile = instaloader.Profile.from_username(L.context, user)
list_followees_txt = open("list_followees", "w")
for followee in profile.get_followees():
    list_followees_txt.write("%s \n" % followee.username)
list_followees_txt.close()
In the code above, I accessed a user profile (which was my own) and opened a text file called “list_followees”. I then wrote the usernames of the profiles I follow into “list_followees”.
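The same write-then-read pattern can be tried without logging in to Instagram. The sketch below is only an illustration: the usernames are made up and stand in for what `profile.get_followees()` would return, and the file goes into a temporary directory so nothing real is overwritten. It uses `with` blocks, which close the file automatically.

```python
import os
import tempfile

# Hypothetical usernames standing in for the followees Instaloader would return.
followee_usernames = ["alice", "bob", "carol"]

# Write into a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "list_followees")

# A with-block closes the file automatically, even if an error occurs.
with open(path, "w") as out_file:
    for username in followee_usernames:
        out_file.write(username + "\n")

# Read the usernames back, stripping the trailing newline from each line.
with open(path) as in_file:
    saved = [line.strip() for line in in_file]

print(saved)
```

Reading the file back this way gives a plain list of usernames, which is the form Step 4 works with.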
Basics of A Social Network
In a social network, the users are nodes or vertices. The relationships between the users are edges. A network is called “directed” when its edges have a direction, for example when user A follows user B without B necessarily following A. Twitter and Instagram are “directed” networks, as I might follow Taylor Swift, who does not necessarily follow me back. Facebook, on the other hand, is an “undirected” network, where every edge is a mutual friendship and therefore bi-directional.
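One common way to hold a directed network in code is a dictionary that maps each user to the list of accounts they follow. The sketch below is a toy illustration with invented usernames, not data from Instagram; the helper names `followees`, `followers`, and `is_mutual` are my own.

```python
# Hypothetical "who follows whom" data: each key follows every name in its list.
follows = {
    "user_a": ["user_b"],
    "user_b": [],
    "me": ["user_b", "taylor"],
    "taylor": [],
}

def followees(user):
    """Accounts that `user` follows (outgoing edges)."""
    return list(follows[user])

def followers(user):
    """Accounts that follow `user` (incoming edges)."""
    return [other for other, targets in follows.items() if user in targets]

def is_mutual(a, b):
    """True when the edge exists in both directions, like an undirected friendship."""
    return b in follows[a] and a in follows[b]

print(followers("user_b"))
print(is_mutual("me", "taylor"))
```

In this toy data "me" follows "taylor" but not the other way around, so the edge is one-directional, which is exactly what makes the network directed.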
Directed network
In this case from left to right, perhaps Grace follows Jack who follows Aries. Also note that a total of five other users follow Jack, and so on.
Un-directed network
We can visualize a social network in Instagram without the arrows as well, assuming that each numbered node represents a user.
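To see the Friendship Paradox on an undirected network, here is a small sketch on an invented five-person network (the names and friendships are assumptions for illustration only). It compares the average number of friends per person with the average number of friends that each person's friends have.

```python
from statistics import mean

# Hypothetical undirected friendships: every edge is listed in both directions.
friends = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "d"],
    "d": ["c"],
}

# Number of friends each person has.
degree = {person: len(neighbours) for person, neighbours in friends.items()}

# Average number of friends per person.
avg_degree = mean(list(degree.values()))

# For each person, the average number of friends their friends have,
# then averaged over everyone.
avg_friend_degree = mean(
    [mean([degree[f] for f in neighbours]) for neighbours in friends.values()]
)

print(avg_degree, avg_friend_degree)
```

In this toy network most people are connected to "hub", so the well-connected node is counted many times when we look at friends of friends, and the second average comes out higher than the first.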
Step 4: Obtain followers of followees
Next, I want the total number of followers across the followees in the list so that I can later calculate their average. Instead of fetching the follower count for every single followee, the following code samples about 30 followees, spaced evenly through the list, and uses only those in the calculation.
# Get number of followers of followees from txt file
with open('list_followees') as f:
    content = f.readlines()
total_followees = len(content)
content = [x.strip() for x in content]
sum = 0
# Number of followees to skip between samples, so about 30 are kept
count = int(total_followees/30)
num_followees = 0
for username in content:
    if count == 0:
        # Sample this followee and add its follower count to the total
        profile = instaloader.Profile.from_username(L.context, username)
        followers = profile.followers
        sum += followers
        num_followees += 1
        # Reset the skip counter
        count = int(total_followees / 30)
    else:
        count -= 1
I use a sample rather than the full set of followees because of the limits on scraping data from Instagram. If I request too much data, a request timeout terminates the script, which I don’t want.
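The sampling step can be tried on its own, without touching Instagram. The sketch below is an illustration under my own assumptions: `every_kth_sample` mimics the skip-counter loop above with a list slice, and `random_sample` is an alternative (not what the code above does) that draws names uniformly at random with Python's `random` module. Both function names and the made-up usernames are hypothetical.

```python
import random

def every_kth_sample(usernames, target=30):
    """Mimic the skip-counter loop: skip `step` names, keep one, repeat."""
    step = len(usernames) // target
    return usernames[step::step + 1]

def random_sample(usernames, target=30, seed=None):
    """Alternative: draw up to `target` names uniformly at random."""
    rng = random.Random(seed)
    return rng.sample(usernames, min(target, len(usernames)))

# Hypothetical list of 900 followee usernames.
usernames = ["user_%d" % i for i in range(900)]

print(len(every_kth_sample(usernames)))
print(len(random_sample(usernames, seed=1)))
```

Because the loop skips `step` names and then keeps one, the kept names are `step + 1` apart, so a 900-name list yields 29 names rather than exactly 30. A list shorter than 30 names is kept in full.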
Approach to a Network
Considering the undirected network of Instagram profiles above, finding the friends of my friends requires crawling the network from myself (the ego node) out to at least two degrees, which calls for an algorithm. Two degrees doesn’t sound like much. However, in my case, each number in the picture above is a follower count, and each of those followers has followers of their own, and so on. In reality, the further out we move, the faster the number of nodes to crawl grows, roughly exponentially.
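A two-degree crawl from the ego node can be sketched as a breadth-first search that stops after a fixed number of hops. The example below runs on a tiny invented network held in a dictionary; the function name `nodes_by_distance` and the node names are assumptions for illustration, and a real crawl would replace the dictionary lookup with requests to Instagram.

```python
from collections import deque

def nodes_by_distance(graph, start, max_depth=2):
    """Breadth-first crawl from `start`, grouping nodes by hop count."""
    levels = {0: [start]}
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand nodes at the depth limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                levels.setdefault(depth + 1, []).append(neighbour)
                queue.append((neighbour, depth + 1))
    return levels

# Hypothetical network: "me" has two friends, who have friends of their own.
graph = {
    "me": ["f1", "f2"],
    "f1": ["me", "x1", "x2", "x3"],
    "f2": ["me", "x3", "x4"],
}

levels = nodes_by_distance(graph, "me")
for depth, nodes in levels.items():
    print(depth, len(nodes), nodes)
```

Even in this toy network the second hop holds twice as many nodes as the first, which hints at how quickly a real crawl grows.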
Step 5: Calculate average followers
After obtaining the sum of the follower counts, I divide it by the number of followees sampled (num_followees, about 30) to get the average number of followers.
# Get average num followers
average = sum / num_followees
print("The average number of followers is " + str(average))
My results say “The average number of followers is 29178.48275862069”
Step 6: Compare results
Finally, to answer my original question, I compare the average number of followers I calculated (29178.48275862069) to the number of followers that <user> has. If <user> has more followers than its followees do on average, I declare it more popular than its followees. Otherwise, its followees are more popular than it is.
# Compare user num followers to average
user_followers = instaloader.Profile.from_username(L.context, user).followers
if average > user_followers:
    print("On average, " + user + "'s followees have more followers than " + user + ".")
else:
    print("On average, " + user + " has more followers than their followees.")
My output says “On average, sz_713’s followees have more followers than themselves.” It compared the 29178.48275862069 followers that my followees have on average to the 427 followers I have.
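The comparison itself does not need a network connection, so it can be wrapped in a small function and tried with made-up numbers. This is a sketch under my own assumptions: the function name `compare_popularity` and the follower counts in the list are hypothetical, and only the 427 figure comes from the text above.

```python
def compare_popularity(user, user_followers, followee_follower_counts):
    """Return a sentence saying whether `user` or their followees are more popular."""
    if not followee_follower_counts:
        raise ValueError("need at least one followee follower count")
    average = sum(followee_follower_counts) / len(followee_follower_counts)
    if average > user_followers:
        return "On average, %s's followees have more followers than %s." % (user, user)
    return "On average, %s has more followers than their followees." % user

# Hypothetical follower counts for three sampled followees.
sampled_counts = [350, 1200, 85000]
message = compare_popularity("sz_713", 427, sampled_counts)
print(message)
```

A single account with a very large following is enough to pull the average far above a typical user's count, which is worth keeping in mind when reading the result.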
Summary
I used Instaloader to scrape data from Instagram users. Using its methods, we can answer the question of whether a specific user is more or less popular than the people he or she follows. Instaloader lets us log in with an appropriate account and access the user we want to test. We can then gather the follower counts of that user’s followees and compare their average to the user’s own follower count, which answers the original question through a simple comparison.
Limitation
- To avoid request timeouts, the script limits the amount of data it gathers, so the results are an estimate rather than an exact figure.
Related topics for further investigation:
- How popular do I have to be for my results to differ?
- If we count all of the pairs who mutually follow each other as “friends”, does the Friendship Paradox hold in my immediate circle?
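As a starting point for the mutual-follow question, here is a sketch on invented data. It treats two users as “friends” only when each follows the other, then compares my number of friends with the average number of friends my friends have. The usernames, the `follows` dictionary, and the `mutual_friends` helper are all assumptions for illustration.

```python
from statistics import mean

# Hypothetical directed data: each key follows every name in its set.
follows = {
    "me": {"ann", "ben", "star"},
    "ann": {"me", "ben"},
    "ben": {"me", "ann", "cy"},
    "cy": {"ben"},
    "star": set(),
}

def mutual_friends(user):
    """Users that `user` follows and who follow `user` back."""
    return sorted(
        other for other in follows[user] if user in follows.get(other, set())
    )

my_friends = mutual_friends("me")
my_count = len(my_friends)

# Average number of mutual friends among my mutual friends.
friends_avg = mean([len(mutual_friends(f)) for f in my_friends])

print(my_friends, my_count, friends_avg)
```

In this toy circle "star" is followed but does not follow back, so it drops out of the friend list, and the comparison is made only over mutual pairs.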