In order to have a river-style (Dave Winer) feed of book reviews from a few great newspapers (NYT, The Economist, Le Monde, Japan Times), I created an RSSMix feed of the four and "borrowed" a script from [Parse RSS feed with Python](http://www.idiotinside.com/2017/06/08/parse-rss-feed-with-python/).
Here is the script:

    # coding: utf-8
    import os
    import sys
    import feedparser
    import console

    # Source: http://www.idiotinside.com/2017/06/08/parse-rss-feed-with-python/
    feedparser._HTMLSanitizer.acceptable_elements.update(['iframe'])

    # RSSMix of book reviews from: NYT, TE, LM, JT
    feed = feedparser.parse("http://www.rssmix.com/u/8265752/rss.xml")
    feed_title = feed['feed']['title']
    feed_entries = feed.entries

    for entry in feed.entries:
        article_title = entry.title
        article_link = entry.link
        article_published_at = entry.published  # Unicode string
        article_published_at_parsed = entry.published_parsed  # Time object
        article_description = entry.description
        article_summary = entry.summary
        # article_tags = entry.tags.label  # <--------- problem
        console.set_color(0, 1, 0)
        print("{}".format(article_title))
        console.set_color(1, 1, 1)
        print("{}".format(article_published_at))
        console.set_color(0, 0.75, 1)
        print("{}".format(article_link))
        console.set_color(1, 1, 1)
        print("{}".format(article_summary))
        # print("{}".format(article_tags))  # <--------- problem
        print(" ")
        print("....................")
        print(" ")

    file_name = os.path.basename(sys.argv[0])
    print(file_name)

All in all, it works.
Nevertheless, I have run into a few problems:
- I would like the view to land on the most recent entry (the top of the output), whereas after the script runs the console is positioned at the bottom.
- I cannot figure out how to read the entries' tags, which would let me filter some entries.
- The output seems to keep growing. How do I drop entries that are, e.g., older than 30 days?
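
To make the three points concrete, here is a rough sketch of the kind of helpers I have in mind. It is untested against the real feed and rests on assumptions: that each entry is dict-like, that `published_parsed` is a UTC time struct, and that `tags` (when present) is a list of dicts with a `term` key, which is how feedparser documents them. The names `entry_timestamp`, `entry_tags`, `recent_entries` and `MAX_AGE_DAYS` are made up for illustration.

```python
import calendar
import time

MAX_AGE_DAYS = 30  # hypothetical cutoff, taken from the question above


def entry_timestamp(entry):
    """Return the publication time in seconds since the epoch (UTC), or None."""
    parsed = entry.get('published_parsed')
    return calendar.timegm(parsed) if parsed else None


def entry_tags(entry):
    """Return the tag terms of an entry as a list (empty if it has no tags)."""
    # Assumption: 'tags' is a list of dicts, so entry.tags.label cannot work;
    # each tag has to be read individually.
    return [tag.get('term') for tag in entry.get('tags', []) if tag.get('term')]


def recent_entries(entries, max_age_days=MAX_AGE_DAYS, now=None):
    """Keep dated entries newer than the cutoff, sorted oldest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    dated = [(entry_timestamp(e), e) for e in entries]
    kept = [(ts, e) for ts, e in dated if ts is not None and ts >= cutoff]
    kept.sort(key=lambda pair: pair[0])  # oldest first, newest last
    return [e for ts, e in kept]
```

The loop would then become something like `for entry in recent_entries(feed.entries):`, with `entry_tags(entry)` available for filtering. Sorting oldest first does not reposition the console; it only sidesteps the issue by putting the most recent entry at the bottom, where the view ends up anyway.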
Thanks in advance for your help.