posterous

Prior posterous?


So Posterous has been acquired by Twitter. Great. And Posterous Spaces will remain up and running without disruption. Great.

But then I read “Is Twitter About to Ax Your Posterous Account?” and “Twitter has acquired shortform blogging company Posterous, Spaces will remain up and running for now”, which write:

“Twitter says that it will give users “ample notice” if it is going to make any changes to the service. We’ll take them at their word on this one, but if I was someone running a personal blog on Posterous, I would think about finding another place to host it soon.”

“So, in other words, Posterous will be available to you now, but we’ll let you know if we plan on shutting it down. That must be a fairly likely scenario to warrant that language being included in the initial announcement of the acquisition.”

hmmm…


Mining my Posterous blog: API, XML and plot


Nielsen2011python_posterous

In our Responsible Business in the Blogosphere project we are mining the blogosphere. So far we have mostly considered the microblogosphere as represented by Twitter. We have two research articles on that topic: “Good Friends, Bad News – Affect and Virality in Twitter” and “A new ANEW: Evaluation of a word list for sentiment analysis in microblogs”.

One of the reasons why we focus on Twitter is that the data we can get is structured. The Twitter web service returns a structure in the JSON format that is easy to handle. General blogs are somewhat more difficult to handle: first you need to find the blogs, and then you need to extract the relevant data from the webpage. This is not particularly easy, though some interesting tools exist for these tasks.

Some blog sites provide structured information that is relatively easy to get at. The blog site that I use, Posterous, provides an API that lets users and programmers download information. There are actually two versions: the old one returns information in the XML format, the newer one in JSON.

In my initial effort to make something useful I looked at my own blog using the old API. You need to call a URL with something like:

http://posterous.com/api/readposts?hostname=fnielsen&num_posts=50&page=1

XML is returned. I did not manage to parse the XML in a structured way (using standard Python libraries) but used an ad hoc approach to turn the XML into a JSON-like structure with the numerical fields converted to numbers and the ‘body’ field with the actual text maintained as HTML. Apart from the postings themselves there are substructures for comments and media files that you might want to handle.
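For comparison, a structured parse with the standard library could look like the sketch below. It is only an illustration: the element names in `SAMPLE_XML` (`rsp`, `post`, `title`, `date`, `views`, `body`, `comment`) are assumptions made for the example, since the actual response is not reproduced here, and `build_url` and `posts_from_xml` are hypothetical helper names. No request is sent; the sketch only builds the URL shown above and parses a hand-written sample.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_url(hostname, num_posts=50, page=1):
    """Build a readposts URL of the form shown above."""
    query = urlencode({'hostname': hostname, 'num_posts': num_posts, 'page': page})
    return 'http://posterous.com/api/readposts?' + query


# Hand-written sample; the element names are assumptions, not the documented format.
SAMPLE_XML = """<rsp stat="ok">
  <post>
    <title>First post</title>
    <date>Mon, 07 Feb 2011 10:00:00 -0800</date>
    <views>120</views>
    <body>&lt;p&gt;Hello&lt;/p&gt;</body>
    <comment><body>Nice</body></comment>
  </post>
  <post>
    <title>Second post</title>
    <date>Tue, 08 Feb 2011 09:30:00 -0800</date>
    <views>7</views>
    <body>&lt;p&gt;More text&lt;/p&gt;</body>
  </post>
</rsp>"""


def posts_from_xml(xml_text):
    """Turn each <post> element into a dictionary (a JSON-like structure)."""
    posts = []
    for post_element in ET.fromstring(xml_text).iter('post'):
        post = {}
        for child in post_element:
            if len(child):
                # Nested substructures (e.g. comments) are skipped in this sketch.
                continue
            value = child.text or ''
            # Purely numerical fields become integers; the body stays HTML text.
            post[child.tag] = int(value) if value.strip().isdigit() else value
        posts.append(post)
    return posts


posterous = posts_from_xml(SAMPLE_XML)
```

Note that the digit test converts any all-digit field, so a title such as "2011" would also become a number; listing the numerical field names explicitly would avoid that.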

In this first application I managed to plot the number of views of each blog post as a function of the date. The two articles that got the most views are one about the Milena Penkowa case and one about Natalie Portman, who was in the news due to her recent film Black Swan. Earlier articles that received a substantial number of views (more than normal) were more nerdish accounts of my problems with Ubuntu. My most recent articles have a fairly low number of views. I have several theories why that is.

How easy it is to crawl all Posterous blogs I do not yet know. Compared to Twitter the data you get are less social. In Twitter you have loads of retweets and direct messages between users that you can analyze. In Posterous you do have what corresponds to friends and followers by what is called ‘my subscriptions’. You also have comments.

The Python code that does the plotting is here:

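import datetime

import dateutil.parser
import pytz
from matplotlib.pyplot import plot, text, title, xlabel, ylabel

# 'posterous' is the list of post dictionaries built from the API response
views = [p['views'] for p in posterous]
now = datetime.datetime.now(pytz.utc)
dates = [dateutil.parser.parse(p['date']) for p in posterous]
since = [(now - date).days for date in dates]

plot(since, views, 'yo-')
ylabel('Views')
xlabel('Days since now')
title('Views on Posterous')

# Label only the posts that lie above a hand-tuned line in the plot
outliers = [(d, v, p) for d, v, p in zip(since, views, posterous)
            if v > -6.64 * d + 5000]
for (d, v, p), a in zip(outliers, ['left', 'left', 'center', 'left']):
    text(d, v, p['title'][:31] + '...', horizontalalignment=a)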

The last for-loop is at least PG-13 rated and should not be attempted at home.
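The selection hidden in that loop keeps only the posts whose view count lies above a straight line, views = -6.64 × days + 5000, so that only the outliers get a text label. A minimal sketch of that step on its own, with made-up numbers; `posts_above_line` is a hypothetical helper name, not part of the original code:

```python
def posts_above_line(since, views, posts, slope=-6.64, intercept=5000):
    """Return (days, views, post) triples lying above views = slope * days + intercept."""
    return [(d, v, p) for d, v, p in zip(since, views, posts)
            if v > slope * d + intercept]


# Made-up example data: days since posting, view counts and post dictionaries.
example_since = [10, 400, 700]
example_views = [6000, 1000, 500]
example_posts = [{'title': 'A'}, {'title': 'B'}, {'title': 'C'}]

selected = posts_above_line(example_since, example_views, example_posts)
selected_titles = [p['title'] for _, _, p in selected]
```

Because the line slopes downwards, an old post needs fewer views than a recent one to be labelled.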

This is a test: Python program in Posterous with Markdown


I am using Posterous as the blogging software and sometimes I include computer code in the blog post. Posterous doesn’t like that. I have included code in the ‘’ tag, but Posterous formats that in an awkward way. My previous post on Twitter retweet analysis was formatted wrongly. Posterous claims to support the Markdown language, so I tried to edit and insert the markdown tag in the raw HTML, but then it went completely wrong: My code and results were erased!

Now I will try to include the erased code in this blog post, submitted by email and formatted with Markdown. According to some Markdown documentation, code needs to be indented at least four characters, so that is what I will do:

from __future__ import division

from re import compile, search, IGNORECASE, UNICODE

import pymongo

# MongoClient replaces the Connection class of older pymongo releases
connection = pymongo.MongoClient()
db = connection.twitter
tweets = db.tweets

pattern_url = compile(r"http://", IGNORECASE)
# Retweet markers: "RT @" at the start, "RT" at the start, "RT" anywhere
stringpatterns_retweet = [r"^RT @", r"^RT\b", r"\bRT\b"]
patterns_retweet = [compile(s, UNICODE) for s in stringpatterns_retweet]

total = 0
withurls = 0
retweets = [0] * len(stringpatterns_retweet)
retweets_withurls = [0] * len(stringpatterns_retweet)

# Skip deletion notices and count URLs and retweet markers in the rest
for tweet in tweets.find({"delete": {"$exists": False}}):
    total += 1
    if search(pattern_url, tweet.get('text', '')):
        withurls += 1
        urlpresent = True
    else:
        urlpresent = False
    for n in range(len(patterns_retweet)):
        if search(patterns_retweet[n], tweet.get('text', '')):
            retweets[n] += 1
            if urlpresent:
                retweets_withurls[n] += 1
    # Print a progress report for every 10000 tweets
    if not total % 10000:
        print("""
Total         %23d    100.0%%
With URLs     %23d    %5.1f%%""" % (total, withurls, 100*withurls/total))
        for n in range(len(patterns_retweet)):
            print("""Retweet %20s  %7d    %5.1f%% of total
Retweet with URLs %10s  %7d    %5.1f%% of total
                                         %5.1f%% of retweets
                                         %5.1f%% of tweets with URLs""" % (
                    stringpatterns_retweet[n], retweets[n],
                    100*retweets[n]/total,
                    stringpatterns_retweet[n],
                    retweets_withurls[n], 100*retweets_withurls[n]/total,
                    100*retweets_withurls[n]/retweets[n],
                    100*retweets_withurls[n]/withurls))
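To see how the three retweet patterns differ, here is a small sketch that runs them on a few made-up tweets, with no database involved. It assumes the second and third patterns use the `\b` word boundary (`^RT\b` and `\bRT\b`); `matching_patterns` and the sample texts are hypothetical and only serve the illustration.

```python
import re

# Assumed patterns: the "\b" word boundaries are an assumption of this sketch.
RETWEET_PATTERNS = [r"^RT @", r"^RT\b", r"\bRT\b"]


def matching_patterns(text):
    """Return the retweet patterns that match the given tweet text."""
    return [p for p in RETWEET_PATTERNS if re.search(p, text, re.UNICODE)]


# Made-up tweet texts.
samples = [
    "RT @fnielsen: nice plot",
    "RT: interesting link http://example.com",
    "Nice plot RT @fnielsen: views",
    "ART is long",
]

for sample in samples:
    print("%-45s %s" % (sample, matching_patterns(sample)))
```

The patterns get progressively looser: the first requires the conventional "RT @" at the start, the second any "RT" word at the start, and the third an "RT" word anywhere, which is why words such as "ART" do not count.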

Yet another social media. That’s posterous.


I heard of Posterous from nitoen of overskrift.dk. It seems to be yet another social media website, but now with some kind of (tighter?) email integration. I created a website at fnielsen.posterous.com. Let’s see what happens when I send an email to post@posterous.com.