TL;DR: I search Twitter bios for “@awscloud” and add the matching profiles to a public Twitter list.
One way to get fresh news from the AWS ecosystem is to follow current or former Amazon (Web Services) employees, known as Amazonians, on Twitter. But it can be difficult to stalk their profiles among the myriad of accounts talking about #AWS subjects…
Lately, I was working on a quick Python proof of concept (another excuse to ramp up my Python 🐍 skills) that queries the Twitter API to find profiles with specific text in their description/bio, and I wondered whether I could use it to find more and more AWS employees 🙄

    import pickle

    import tweepy as tw

    # `api`, `user`, `keywords`, `hits` and `target_list` are set up earlier
    # in the full script.

    # Load the profiles that were already crawled or added to the list
    with open('list.pkl', 'rb') as f:
        users_in_list = pickle.load(f)
    b = len(users_in_list)

    # api.followers(id=...) is the Tweepy 3.x naming
    for idx, follower in enumerate(tw.Cursor(api.followers, id=user).items(), start=1):
        print(idx, follower.screen_name)
        # Skip profiles that were already crawled (already in the list)
        if follower.screen_name in users_in_list:
            print("Already in list: ", follower.screen_name)
            continue
        bio = follower.description or ""
        for word in keywords:
            # Keyword in bio, not already matched, and not a protected profile
            if word in bio and follower not in hits and not follower.protected:
                print("====" * 10)
                print("matched!")
                print("screen_name: ", follower.screen_name)
                print("description: ", bio)
                hits.append(follower)
                # Add to the Twitter list
                api.add_list_member(screen_name=follower.screen_name, list_id=target_list)
                print("Added to list: ", follower.screen_name)
                print("====" * 10)
        # Remember this profile so it is not crawled again
        users_in_list.append(follower.screen_name)

    # Dump the result to the local pickle file: crawled and listed profiles
    with open('list.pkl', 'wb') as f:
        pickle.dump(users_in_list, f)

    # Job summary
    c = len(users_in_list)
    h = len(hits)
    d = c - b  # delta: profiles crawled during this run
    print("====" * 10)
    print("1-Number of users in list before crawling: ", b)
    print("2-Number of users in list after crawling: ", c)
    print("3-Crawled: ", d)
    print("4-Matched: ", h)
    print("====" * 10)

==> Full source code available on Github.
I know the code needs a lot of optimization and cleanup, but it’s a working sample. I’ve got a few ideas for a later, reusable version that could serve other purposes. Don’t hesitate to submit enhancements via a PR.
For now, my methodology is pretty naive: I only search for the “@awscloud” keyword in the bio and add the matching users to a specific Twitter list.
So far, the best fishing ratio has come from the followers of AWS CEO Andy Jassy. Yeah, if you are an AWS employee, you follow your boss. I will try other well-known personalities in this space. I’ve tried Jeff, but without great success.
Surprisingly, some Amazonians subscribed to this public Twitter list… We’ve come full circle here… 💪
One limitation is the throttling of the Twitter API, but in the end I just need to be patient, like when fishing… At the current limit, I’m able to parse 1,200 Twitter users per hour, which is 9,600 over eight hours of crawling (28,800 if it ran around the clock). It’s pretty slow.
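A quick back-of-the-envelope sketch of that throughput. The values of 15 requests per 15-minute window and 20 profiles per page are assumptions, chosen because they reproduce the 1,200 per hour figure; the function and parameter names are made up for illustration and are not part of the crawler.

```python
import math


def profiles_per_hour(requests_per_window=15, window_minutes=15, profiles_per_page=20):
    """Profiles crawled per hour under an assumed rate limit."""
    windows_per_hour = 60 / window_minutes
    return int(requests_per_window * profiles_per_page * windows_per_hour)


def hours_to_crawl(follower_count, per_hour):
    """Whole hours needed to walk through a follower list."""
    return math.ceil(follower_count / per_hour)


rate = profiles_per_hour()
print(rate)                           # 1200 profiles per hour
print(rate * 8)                       # 9600 after eight hours
print(rate * 24)                      # 28800 after a full day
print(hours_to_crawl(100_000, rate))  # 84 hours for a 100k-follower account
```

If the assumption holds, larger pages would be the cheapest speed-up, since the number of requests per window is the fixed part.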
I will enrich this list, and maybe other lists (5,000 profiles per list), with a scheduled job running in a Docker container to refresh them, and with a better stalking method:
- I think a good approach could be location-based: “Seattle” (to get in touch with Product Teams), more bio keywords (Former), and maybe people following @awscloud, as multiple conditions? What do you think?
- I will try an infinite loop over matched and crawled profiles, with auto-creation of a new list when I reach the 5,000 limit. (I think I’ll get banned before… 😲)
- Data Science folks, do you see any ML model that could fit my experiment? We’d need a large dataset of Amazon employees’ Twitter profiles to find recurring patterns.
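The multi-condition idea from the first bullet could look roughly like the sketch below. Everything in it is hypothetical: the `Profile` class stands in for whatever the Twitter API returns, `follows_awscloud` would have to be filled in by a separate lookup, and the keywords and the threshold of two matching conditions are placeholders to tune, not tested values.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    """Hypothetical stand-in for a Twitter user object."""
    screen_name: str
    description: str = ""
    location: str = ""
    follows_awscloud: bool = False
    protected: bool = False


# Placeholder conditions, taken from the ideas listed above
BIO_KEYWORDS = ("@awscloud", "former")
LOCATIONS = ("seattle",)


def match_score(profile):
    """Count how many of the three conditions a profile satisfies."""
    bio = profile.description.lower()
    location = profile.location.lower()
    score = 0
    if any(keyword in bio for keyword in BIO_KEYWORDS):
        score += 1
    if any(place in location for place in LOCATIONS):
        score += 1
    if profile.follows_awscloud:
        score += 1
    return score


def is_candidate(profile, min_score=2):
    """Keep public profiles that satisfy at least `min_score` conditions."""
    return not profile.protected and match_score(profile) >= min_score


alice = Profile("alice", "Solutions Architect @awscloud", "Seattle, WA")
bob = Profile("bob", "Coffee lover", "Paris", follows_awscloud=True)
print(is_candidate(alice))  # True: bio keyword and location both match
print(is_candidate(bob))    # False: only one condition matches
```

Requiring two conditions instead of one trades a few missed Amazonians for fewer false positives; with `min_score=1` it degrades to the current bio-only search plus two extra ways in.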
That’s all folks!