Content from 2013-10

An Introduction to SubToMe (for My Competitors)

posted on 2013-10-27

A few days ago I integrated TBRSS with SubToMe. The idea behind SubToMe is to “grow the RSS pie,” for the benefit of all feed readers, by creating an open equivalent to one of the most basic affordances of closed platforms like Tumblr, Twitter, &c. – the convenient “follow” button.

The clever thing here is that there is no server, no database, no protocol; SubToMe is a JavaScript application that runs in the browser, maintaining a list of feed readers in localStorage. When the user clicks a SubToMe button (or uses the SubToMe browser extension), a modal dialog appears with their list of feed readers; they choose, and SubToMe redirects them.

The process for registering an application with SubToMe is simple, but not quite as simple as it looks: I had to refer to the source code to settle some doubts. So, in the spirit of “growing the RSS pie,” here are some notes addressed to my present and future competitors.

SubToMe only needs two pieces of information: the name of your application, and a URL template to construct the redirect. In the case of TBRSS, for example:

name  TBRSS
URL   https://tbrss.com/subscribe?url={url}

The endpoint should take the URL given, and return a form for the end user to confirm their subscription. (Don’t omit the form: without the extra step you are performing a CSRF against yourself.)
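A minimal sketch of such a confirmation form, in Python. The field names, the POST target, and the idea of passing in a CSRF token are all assumptions for illustration, not a description of how TBRSS does it. The point is only that the GET request renders a form, and nothing is subscribed until the user submits it.

```python
from html import escape


def render_confirmation_form(page_url: str, csrf_token: str) -> str:
    """Return an HTML form asking the user to confirm a subscription.

    Hypothetical helper: the action "/subscribe" and the field names
    "url" and "csrf_token" are made up for this example.
    """
    safe_url = escape(page_url, quote=True)      # escape &, <, >, and quotes
    safe_token = escape(csrf_token, quote=True)
    return (
        '<form method="post" action="/subscribe">\n'
        f'  <input type="hidden" name="csrf_token" value="{safe_token}">\n'
        f'  <input type="hidden" name="url" value="{safe_url}">\n'
        f'  <p>Subscribe to {safe_url}?</p>\n'
        '  <button type="submit">Subscribe</button>\n'
        '</form>'
    )
```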

I trust your application already has a name.

The template must take at least one of three parameters: {url}, the location of the page where the end user used SubToMe; {feeds}, a comma-separated list of feeds extracted from the page; and {feed}, the first of {feeds}.
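To make the three parameters concrete, here is a rough Python sketch of how such a template could be filled in. SubToMe itself does this in JavaScript; the function name and the choice to percent-encode every value are my assumptions, not something taken from its source.

```python
from typing import Sequence
from urllib.parse import quote


def expand_template(template: str, page_url: str, feeds: Sequence[str]) -> str:
    """Fill {url}, {feeds}, and {feed} in a redirect template.

    Hypothetical helper; the encoding rules are an assumption.
    """
    values = {
        "{url}": page_url,                        # page where the button was used
        "{feeds}": ",".join(feeds),               # comma-separated list of feeds
        "{feed}": feeds[0] if feeds else "",      # the first of the feeds, if any
    }
    result = template
    for placeholder, value in values.items():
        # Percent-encode everything so the value survives as one query parameter.
        result = result.replace(placeholder, quote(value, safe=""))
    return result
```

With the TBRSS template, expand_template("https://tbrss.com/subscribe?url={url}", "http://example.com/blog", []) gives https://tbrss.com/subscribe?url=http%3A%2F%2Fexample.com%2Fblog.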

But, of the three, only {url} is useful. SubToMe’s feed extractor is simpleminded – equivalent to link[rel~=alternate][href] – and of course it has to be, since browsers and servers limit the length of URLs.

Presumably you can do better. Just fetch the {url} and do your own extraction.
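As one possible starting point, here is a sketch of feed autodiscovery using only the Python standard library. It is a step beyond the bare selector only in that it also checks the type attribute and resolves relative links; the set of accepted MIME types is an assumption, and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed list of feed MIME types; extend as needed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collect <link rel="alternate"> elements that point at feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").lower().strip()
        href = attr.get("href", "")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def extract_feeds(html_text, base_url):
    """Return absolute feed URLs found in html_text, in document order."""
    parser = FeedLinkParser(base_url)
    parser.feed(html_text)
    return parser.feeds
```

A real extractor would want to go further (feeds linked only from the body, pages that redirect, and so on), but that is where the competition comes in.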

Since SubToMe runs in the browser, you will have to register your application once per user, with an iframe in your HTML:

<iframe style="display:none;" src="https://www.subtome.com/register.html?name=<Name of your application>&url=<url of the subscription handler>"></iframe>

For TBRSS, for example:

<iframe style=display:none src="https://www.subtome.com/register.html?name=TBRSS&url=https%3A%2F%2Ftbrss.com%2Fsubscribe%3Furl%3D%7Burl%7D"></iframe>

(Remember to URL-encode your endpoint.)
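If you build the src attribute in code, the encoding can be left to a library. A small Python sketch; the function name is hypothetical, and the register.html address is the one given in the iframe in this post.

```python
from urllib.parse import quote

REGISTER_PAGE = "https://www.subtome.com/register.html"


def registration_src(name: str, url_template: str) -> str:
    """Build the src of the registration iframe.

    safe="" makes quote() encode ":", "/", "?", "=", "{" and "}" as well,
    so the whole template travels as a single query parameter.
    """
    encoded_name = quote(name, safe="")
    encoded_template = quote(url_template, safe="")
    return f"{REGISTER_PAGE}?name={encoded_name}&url={encoded_template}"
```

Calling registration_src("TBRSS", "https://tbrss.com/subscribe?url={url}") yields the same address used in the TBRSS iframe.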

You must serve the iframe repeatedly, since there is no way to check whether a user has already been registered. The iframe is cheap – SubToMe uses an appcache, so the only overhead should be a 304 Not Modified from the manifest. Still, a request is a request, so we only serve the iframe once per session.

Nothing in the design of SubToMe prevents you from simply registering every visitor. I’m not sure if this is intentional. The more feed readers are registered, the more potentially useful SubToMe becomes; but there is no mechanism for the end user to prefer one feed reader to another. TBRSS stops at registering logged-in users.
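The two rules TBRSS follows (logged-in users only, once per session) come down to a few lines. This is an illustrative Python sketch rather than the actual TBRSS code; the session is assumed to be a dict-like object, and the key name is made up.

```python
def should_serve_registration(session: dict, logged_in: bool) -> bool:
    """Decide whether to include the SubToMe registration iframe in a page.

    Hypothetical helper: "subtome_registered" is an invented session key.
    """
    if not logged_in:
        return False          # never register anonymous visitors
    if session.get("subtome_registered"):
        return False          # already served during this session
    session["subtome_registered"] = True
    return True
```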

The name, SubToMe (short for “Subscribe To Me” – nothing to do with tomes or submission), is awkward; but of course we live in latter days, and all the good names were taken long before we came.

Expanding entries

posted on 2013-10-16

I’ve added an experimental feature: a button to expand entries from truncated feeds inline. It is far from perfect, but is already very useful.

Extracting the content from the soup of credits, ads, and comments that is a modern blog entry or news article is much harder than it looks, but there is a very good, very general solution which treats the notional writer as a Markov process. The whole paper is worth reading, but I’ll quote the following, because it is one of the insights behind the whole project of TBRSS:

The use of full sentences usually means the author wants to make a more or less complex statement which needs grammatical constructs, long explanations etc. Extensive coding is required because the author (sender) does not expect that the audience (receivers) understand the information without explanation.

To put it another way: the intention to communicate is something that machines can recognize, and very reliably, because when we actually want to communicate, we have a lot to say.


Unless otherwise credited all material copyright by Paul M. Rodriguez