Scrape most reviewed news and tweet using Python — GeeksforGeeks

Many websites publish trending technology news, and an article's popularity can be judged by its review count. For example, if cryptocurrency news articles are scraped from cointelegraph.com, the review count of each news item can be collected easily and stored in a MongoDB collection.
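As a rough illustration of the data involved, the sketch below assumes each day's scraped news is stored as one document with a `time` field and a `News` list of `[headline, review count, link]` entries, which is the shape the program in this article reads from MongoDB. The headlines, counts, links and date are made-up sample values, and `top_reviewed` is a hypothetical helper name.

```python
# Hypothetical sample document; every value here is made up for illustration.
sample_doc = {
    "time": "2020-07-04",
    "News": [
        ["Headline A about bitcoin", "120", "https://example.com/a"],
        ["Headline B about mining", "340", "https://example.com/b"],
        ["Headline C about funding", "75", "https://example.com/c"],
        ["Headline D about altcoins", "210", "https://example.com/d"],
    ],
}


def top_reviewed(news, count=3):
    """Return the `count` entries with the highest review count."""
    # Review counts are assumed to be stored as strings, so convert before sorting.
    return sorted(news, key=lambda item: int(item[1]), reverse=True)[:count]


for headline, reviews, link in top_reviewed(sample_doc["News"]):
    print(headline, reviews, link)
```

Sorting on `int(item[1])` rather than on the raw string matters here: as strings, "75" would sort above "340".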
Modules Needed
- Tweepy: Tweepy is a Python client for the official Twitter API. Install it using the following pip command:
- pip install tweepy
- MongoClient: The MongoClient class from the pymongo package lets your code connect to a MongoDB server. Install pymongo using the following pip command:
- pip install pymongo
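- Pyshorteners: Pyshorteners is used to shorten, brand, share, or retrieve data from links programmatically. Install it using the following pip command (note that the implementation in this article imports the bitly_api module for URL shortening):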
- pip install pyshorteners
Authentication
To post tweets through the Twitter API, you need to register an app through your Twitter account. Follow these steps:
- Open https://apps.twitter.com/ and click the ‘Create New App’ button.
- Fill in the application details. You can leave the callback URL field empty.
- Once the app is created, you will be redirected to the app page.
- Open the ‘Keys and Access Tokens’ tab.
- Copy ‘Consumer Key’, ‘Consumer Secret’, ‘Access Token’ and ‘Access Token Secret’ and paste them into the code below.
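Instead of pasting the four values directly into the source, one option is to read them from environment variables so they stay out of version control. The sketch below is one way to do that using only the standard library. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_twitter_credentials` are assumptions made for this example, not names required by Twitter or Tweepy.

```python
import os

# Hypothetical environment variable names chosen for this example.
REQUIRED_VARS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


if __name__ == "__main__":
    try:
        creds = load_twitter_credentials()
        print("Loaded", len(creds), "credentials")
    except RuntimeError as err:
        print(err)
```

The returned values could then be passed to `tweepy.OAuthHandler` and `set_access_token` in place of string literals.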
Below is the implementation.
# Python program to tweet the top 3 trending news items
import tweepy  # Needed for Python to interact with the Twitter API
from datetime import date
from pymongo import MongoClient
from html.parser import HTMLParser
import re
# bitly_api shortens URLs through the Bitly API. A tweet can display only a
# limited amount of text, so the news links are shortened before tweeting.
import bitly_api

NewsArrayIndex = 0
NewsArray = [None] * 3
class MyHTMLParser(HTMLParser):
    # Collects the value of each href attribute and stores it in NewsArray;
    # NewsArrayIndex tracks the next free position.
    def handle_starttag(self, tag, attrs):
        global NewsArrayIndex
        # Only parse the 'anchor' tag.
        if tag == "a":
            # Check the list of defined attributes.
            for name, value in attrs:
                # If href is defined, store it.
                if name == "href":
                    NewsArray[NewsArrayIndex] = value
                    NewsArrayIndex += 1
# This function is the primary place to tweet the collected daily news.
# News is retrieved from the Coll_DailyNewsPlusReview MongoDB collection, which
# holds the news headline, its review count and the news link for each item.
# The three most reviewed items are selected based on the review count.
# As Twitter allows only 280 characters, each news link is shortened with the Bitly API.
# Hashtags related to the news are added underneath the top 3 news items
# (the whole tweet must fit in 280 characters).
# The top 3 news items are then tweeted using the configured credentials.
# Finally, the tweeted news is stored per day in another collection for audit
# purposes as well as for weekly posting.
def tweetDailyNews():
    try:
        # Coll_DailyNewsPlusReview is the collection name in MongoDB
        cursor_P = db1.Coll_DailyNewsPlusReview.find({"time": date_str})
        p0 = cursor_P[0]
        News = p0.get('News')
        # Each entry is [headline, review count, link]; highest review count first
        sortedNews = sorted(News, key=lambda x: int(x[1]), reverse=True)
        hyperlink_format = '<a href="{link}">{text}</a>'
        parser = MyHTMLParser()
        dailyNews = "Impactful News of the Day" + "\n"
        # Feed each link through the parser, which stores the href values in NewsArray
        for i in range(3):
            print(sortedNews[i][0], sortedNews[i][1])
            link = sortedNews[i][2]
            parser.feed(hyperlink_format.format(link=link, text=link))
        # Shorten each link with the Bitly API
        BITLY_ACCESS_TOKEN = "XXXXXXXX"  # Paste your own Bitly access token here
        b = bitly_api.Connection(access_token=BITLY_ACCESS_TOKEN)
        shortLinks = []
        for i in range(3):
            NewsArray[i] = re.sub('\n', '', NewsArray[i])
            shortLinks.append(b.shorten(NewsArray[i])['url'])
        # Each tweet line is the first three words of the headline plus the short link
        for i in range(3):
            fewWords = sortedNews[i][0].split()
            dailyNews += " ".join(fewWords[:3]) + "…." + shortLinks[i] + "\n"
        dailyNews += "#bitcoin #cryptocurrency #blockchain #investor #altcoins #fintech #investment"
        print(dailyNews)
        status = api.update_status(status=dailyNews)
        if status:
            # Store what was tweeted, one document per news item
            for i in range(3):
                datas = {}
                datas['time'] = str(date.today())
                datas['posted_as'] = i
                datas['news'] = sortedNews[i][0]
                datas['shortenedlink'] = shortLinks[i]
                datas['reviewcount'] = sortedNews[i][1]
                datas['link'] = sortedNews[i][2]
                # insert_one replaces the deprecated Collection.insert
                db1.Collection_tweeted_news.insert_one(datas)
    except Exception as e:
        print(e)
        print("Error in getting today news data", str(date_str))
# Main program starts here
date_str = str(date.today())
print("today", date_str)
client = MongoClient('mongodb://localhost:27017/')
db1 = client.xxxx  # Connect your database here
# Credentials to tweet: paste the values copied from the 'Keys and Access Tokens' tab
consumer_key = "XXXXXXXX"
consumer_secret = "XXXXX"
access_token = "XXXXX"
access_token_secret = "XXXX"
# Authentication of consumer key and secret
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
# Authentication of access token and secret
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit_notify is omitted because Tweepy 4.0 removed that parameter
api = tweepy.API(auth, wait_on_rate_limit=True)
tweetDailyNews()
Output: (The top 3 news items differ each day the program is run.)
Sample output is shown below:
Impactful News of the Day
Veteran Investor Says….https://bit.ly/2X1x51V
Bitcoin Hashrate Drops….https://bit.ly/2T83xyS
The VC Who….https://bit.ly/3czxVKb
#bitcoin #cryptocurrency #blockchain #investor #altcoins #fintech #investment
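The tweet text has to fit in Twitter's 280-character limit, which the program relies on but does not check. The sketch below rebuilds the same layout (title line, first three words of each headline plus the short link, then hashtags) and raises an error if the result is too long. `build_tweet` is a hypothetical helper, the headline text after the first three words is placeholder text, and `len()` only approximates Twitter's own counting, which weights links and some characters differently.

```python
TWEET_LIMIT = 280  # Character limit mentioned in the program's comments
HASHTAGS = "#bitcoin #cryptocurrency #blockchain #investor #altcoins #fintech #investment"


def build_tweet(items, title="Impactful News of the Day"):
    """Build the tweet text from (headline, short_link) pairs."""
    lines = [title]
    for headline, short_link in items:
        # Slicing keeps this safe for headlines shorter than three words.
        first_words = " ".join(headline.split()[:3])
        lines.append(first_words + "…." + short_link)
    lines.append(HASHTAGS)
    text = "\n".join(lines)
    if len(text) > TWEET_LIMIT:
        raise ValueError("Tweet is %d characters, limit is %d" % (len(text), TWEET_LIMIT))
    return text


# Links are taken from the sample output; the headline tails are placeholders.
items = [
    ("Veteran Investor Says more words here", "https://bit.ly/2X1x51V"),
    ("Bitcoin Hashrate Drops more words here", "https://bit.ly/2T83xyS"),
    ("The VC Who more words here", "https://bit.ly/3czxVKb"),
]
print(build_tweet(items))
```

Checking the length before calling the Twitter API means an over-long tweet fails locally with a clear message rather than as an API error.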
Originally published at https://www.geeksforgeeks.org on July 4, 2020.