
#1 2015-10-07 17:34:59

BL Die Hard
From: Around the Bend
Registered: 2015-09-29
Posts: 1,057

A Python Script That Will Show the BunsenLabs Forums Atom Feed:

I have been working on a Python script to parse the BunsenLabs Atom feed for the last couple of days. Please excuse the length of the post in advance, as I have a fair bit to cover.

I was attempting to use the conky 1.10 rss builtin to parse BL's atom feeds and was getting nowhere with it. After a bit of a search I found out about feedparser, which will parse a number of different rss and atom feeds. It is available in the debian repositories. I have decided to go with a feedparser implementation using python.

The script is built against python 2.7, which is the default for BL. It also has a dependency on feedparser. Get it via:

sudo apt-get install python-feedparser

Feedparser has no other dependencies than python.

Breakdown of What the Script Actually Does:
This script uses feedparser to load the provided url, which is currently … &type=atom, into a dictionary of atom objects. Feedparser also sanitizes the html, removing potentially hazardous tags from the incoming Atom feed. Once that is accomplished, I extract a title and message from the dictionary and run them through a regular expression to remove the remaining html and bbcode tags, and attempt to deal with any unicode entities found. The unicode handling is likely still a WIP, as I have yet to find an efficient way to handle them that does not introduce odd behaviors, such as empty return strings, that will break the formatting.
The script will work on however many posts are specified in numposts at the top of the script. I currently have it set to 6. Messages are truncated to 255 characters in order to give a bit of predictability to long posts. Once the lists are set up, the script iterates through them and outputs them. It also includes some basic color and hr formatting that it sends back to conky, so we can have a bit of formatting for the posts. Once the script executes, I pipe its output back into bash for a final bit of formatting via the fmt command.
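
The clean-and-truncate step described above can be sketched on its own, without touching the network. This is only a minimal illustration, not the script itself: the function name summarize, the BBCode pattern, and the sample string are assumptions made for the example. Only the html tag pattern and the 255 character limit come from bl-rss.

```python
import re

# Same html tag pattern that bl-rss uses.
TAG_RE = re.compile(r'<[^>]+>')
# Hypothetical pattern for simple BBCode tags such as [b], [/b] or [url=...].
BBCODE_RE = re.compile(r'\[/?[a-zA-Z*]+(?:=[^\]]*)?\]')

def summarize(raw, limit=255):
    """Strip html and BBCode tags, collapse whitespace, cut to `limit` chars."""
    text = TAG_RE.sub("", raw)
    text = BBCODE_RE.sub("", text)
    # Collapse newlines and runs of spaces so the later wrapping is predictable.
    text = " ".join(text.split())
    return text[:limit]

sample = "<p>Hello [b]world[/b],\n this is a <a href='x'>post</a>.</p>"
print(summarize(sample))
```

Collapsing whitespace before truncating means the 255 characters are spent on actual text rather than on leftover line breaks from the feed.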

I am still very much learning python, so I am sure some of you will discover a more efficient way to accomplish this. Since the incoming feed is specified as utf-8, it is best to handle all the strings as utf-8 as well in order to avoid random errors. Unicode handling is very much a WIP. If anyone can think of a way to reliably match the unicode entities with a regular expression and look them up in a dictionary of entities and their remappings, I would love to hear about it. Currently, I am just using the .replace method to fix stuff that I notice breaks formatting.
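
One possible way to do the dictionary lookup asked about above is to give re.sub a function instead of a replacement string. The sketch below is an assumption-laden example rather than a drop-in fix: ENTITY_MAP, ENTITY_RE and unescape_entities are made-up names, and the table only holds a handful of entities. Numeric entities such as &#039; are decoded directly from their code point, and unknown named entities are left untouched instead of being turned into empty strings.

```python
import re

try:
    unichr              # Python 2
except NameError:
    unichr = chr        # Python 3

# Hypothetical remapping table; extend it as new entities show up in the feed.
ENTITY_MAP = {"quot": u'"', "amp": u"&", "lt": u"<", "gt": u">",
              "nbsp": u" ", "apos": u"'"}

# Matches decimal (&#039;), hex (&#x27;) and named (&quot;) entities.
ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")

def _replace(match):
    body = match.group(1)
    try:
        if body[:2] in ("#x", "#X"):
            return unichr(int(body[2:], 16))
        if body[0] == "#":
            return unichr(int(body[1:]))
    except (ValueError, OverflowError):
        # Code point out of range: keep the original text.
        return match.group(0)
    # Named entity: look it up, fall back to the original text if unknown.
    return ENTITY_MAP.get(body, match.group(0))

def unescape_entities(text):
    """Replace html entities in `text`, mapping non-breaking spaces to spaces."""
    return ENTITY_RE.sub(_replace, text).replace(u"\xa0", u" ")

print(unescape_entities("It&#039;s &quot;fine&quot;&#160;&amp; &bogus; ok"))
```

Note that this maps &quot; to a double quote, whereas bl-rss currently maps it to a single quote. On Python 3 the standard library function html.unescape covers the same ground, but the script here targets python 2.7.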

The script is actually composed of two files: bl-rss does the heavy lifting via python and run-rss executes bl-rss and pipes its output back into fmt before it goes to conky. This gives some word wrapping control over the output. Place both scripts into your ~/bin directory. Modify your conky execpi command to look something like this:

${execpi 1800 /home/tknomanzr/bin/run-rss}

Notice that I set the script to execute only once every thirty minutes (1800 seconds). I did this for a few reasons. One: it is fairly heavy and probably should not be running too often. Two: it reduces traffic to the server. Three: I am not sure how often the atom feed updates, but the interval seems fairly long, and there is no point in continually executing the script and hitting the server if there is no new info to obtain.
The conky config is written for conky 1.10, so if you are using the old conky you may need to either build your own or translate mine back to the old syntax.

I need to rebuild the script to have a main method and take arguments so that it could do stuff like:
1. Specify the colors to use in the output to conky.
2. Specify the url to process. This would allow it to run against other atom or rss feeds.
3. Specify the number of posts to display in the conky.
4. Potentially specify the message length, though I find 255 to be a sane default.
5. Set the thickness of the ${hr} output to conky.
Currently, all of these options are specified as global variables at the top of bl-rss.
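
The five options listed above could be exposed with argparse, which ships with python 2.7. This is just a rough sketch of what the command line might look like. The option names (--numposts, --length, --title-color, --text-color, --hr), the helper functions and the example url are all hypothetical. The defaults are the current globals from bl-rss.

```python
import argparse

def build_parser():
    """Build a parser whose defaults mirror the globals at the top of bl-rss."""
    parser = argparse.ArgumentParser(
        description="Print an atom or rss feed formatted for conky (sketch).")
    parser.add_argument("url", help="atom or rss feed url to parse")
    parser.add_argument("--numposts", type=int, default=6,
                        help="number of posts to display")
    parser.add_argument("--length", type=int, default=255,
                        help="maximum message length in characters")
    parser.add_argument("--title-color", default="0047ab",
                        help="hex color for titles and author names")
    parser.add_argument("--text-color", default="FF4500",
                        help="hex color for message text")
    parser.add_argument("--hr", type=int, default=1,
                        help="thickness of the ${hr} rule")
    return parser

def conky_settings(args):
    """Turn parsed arguments into the conky strings the script prints."""
    return {
        "color_1": "${color %s}" % args.title_color,
        "color_2": "${color %s}" % args.text_color,
        "hr": "${hr %d}" % args.hr,
    }

# Example run with a placeholder url; a real script would call parse_args()
# with no arguments so that it reads sys.argv.
args = build_parser().parse_args(["http://example.invalid/feed", "--numposts", "3"])
print(conky_settings(args))
```

Keeping the url as the only required argument would let the same script run against other feeds while everything else falls back to the present defaults.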

The Shell Script: run-rss

#! /bin/bash
$HOME/bin/bl-rss | fmt -t -w 68
exit 0
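
If you would rather not depend on the second script, the wrapping that run-rss hands off to fmt could be roughly approximated inside Python with the textwrap module. This is an alternative technique, not what the scripts above do, and it is not identical to fmt -t: it wraps every input line on its own instead of joining lines into paragraphs. The function name wrap_for_conky is made up for this example.

```python
import textwrap

def wrap_for_conky(text, width=68):
    """Wrap each line of `text` separately to at most `width` columns.

    Blank lines and short lines such as ${hr 1} pass through unchanged.
    """
    wrapped = []
    for line in text.splitlines():
        if line.strip():
            wrapped.append(textwrap.fill(line, width=width))
        else:
            wrapped.append("")
    return "\n".join(wrapped)

long_line = "word " * 40
print(wrap_for_conky("${hr 1}\n" + long_line))
```

As with fmt, conky variables such as ${color FF4500} count toward the line width here, so lines that start with one will wrap a little early.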

The Python Script: bl-rss

#! /usr/bin/python
# An rss and atom feedparser for BunsenLabs
# Hopefully ideal for running in a conky
# Author: William Bradley
# BunsenLabs Forum Handle: tknomanzr
# License: wtfpl. Use this script however you see fit.
# This is an exercise in learning python for me.
# This script can also be found at:
# TODO: build this into a proper command-line tool and allow the user to
# set options for colors, number of posts, hr thickness and the feed
# url to parse from the command line.
import sys
import feedparser
import re
bunsen_labs_url = ""
posts = []
color_1 = "${color 0047ab}"
color_2 = "${color FF4500}"
hr = "${hr 1}"
alignr = "$alignr"
regexp = "&.+?;"
numposts = 6
title_list = []
description_list = []
name_list = []
# Get the first numposts entries published by the feed and load them
# into posts, a list of dictionaries
def get_posts():
	feed = feedparser.parse(bunsen_labs_url)
	# slicing copes with a feed that holds fewer than numposts entries
	for entry in feed['entries'][:numposts]:
		posts.append({'title': entry.title,
		'description': entry.summary,
		'name': entry.author})
	return posts
# Clean a string up, removing html tags
# and unescaping the html entities seen so far
def clean_html(temp_str):
	#temp_str = temp_str.decode('utf-8')
	# strip html tags; one pass is enough, so no loop is needed
	temp_str = re.sub(r'<[^>]+>', "", temp_str)
	temp_str = temp_str.replace("#!", "Crunchbang")
	temp_str = temp_str.replace("&#039;", "'")
	temp_str = temp_str.replace("&#160;"," ")
	temp_str = temp_str.replace("&quot;", "'")
	return temp_str
# print the lists
def print_lists(title_list, description_list, name_list):
	print color_1
	# iterate over what was actually collected, not over numposts
	for i in range(len(title_list)):
		print title_list[i]
		print hr
		# print the description
		print color_2 + description_list[i]
		print alignr + color_1 + name_list[i]
		print color_1 + hr
posts = get_posts()
for post in posts:
	# pull a title out of the dictionary, clean it
	# and add it to the list of cleaned titles
	title_list.append(clean_html(post['title']))
	# pull a description out of the dictionary, clean it, truncate it
	# and add it to the list of cleaned descriptions
	description_list.append(clean_html(post['description'])[:255])
	# pull the author name out, clean it and add it to name_list
	name_list.append(clean_html(post['name']))
# print the output
print_lists(title_list, description_list, name_list)

The Conky: wtfbox-rss.conkyrc

${font Exo-Bold:size=9}${color 0047ab}BunsenLabs: ${hr 4}
${font monofur:size=9}${execpi 1800 /home/tknomanzr/bin/run-rss}

Final bits
All scripts will be published to my github shortly. Please see my signature for its url. Given that the script talks to the web in a non-secure manner, I will keep mine located in my home directory in the odd event that something goes south on me. I also cannot guarantee 100% correct formatting, as parsing anything web-based is a complex topic. I will continue to update the script as I notice issues.

