Xanga malformed RSS, Conversion filters to the rescue

Blogging | Semantic Web

I have been using Liferea (LInux FEed REAder) as my primary RSS/Atom news reader for quite a while. I am not certain that it is the best reader on Linux, especially since it's an early project in rapid development with frequent breakage, but generally it has provided more features than the others I have tried. I tried Straw, its more stable cousin, but was already too addicted to Liferea's group management and favicon eye candy. RSSOwl, a Java app that is quite featureful, has a clever interface that opens a new tab for each feed read. However, that made it harder for me to quickly arrow down through the news. Perhaps with some settings changes it could behave like Liferea, but I didn't have the patience for it. Another "problem" with RSSOwl was that it is quite strict in interpreting the XML from RSS feeds, which alerted me to a problem.

Many sites are using malformed XML for their RSS/Atom feeds! If you have ever tried to maintain a multiuser blogging site, perhaps you have also found out how difficult it is to declare your site XHTML 1.0 compliant if you allow your users to use HTML tags in their posts, as on this site. In those cases I would suggest adding a "check XHTML 1.0" link and encouraging advanced users to click it to verify their posts. It's starting to matter more: in a world with many alternative browsers and feed readers, standards compliance is essential.

That being said, a major offender is Xanga. Xanga is a blogging site that seems to appeal to an audience who perhaps didn't even know what blogging was when they started. Anyway, I had a couple of friends who were using it, and without the help of Google I would have been hard pressed to find out that it offered RSS feeds. I found that the feeds follow this convention:

http://www.xanga.com/rss.aspx?user=username

However, I could never get my news reader, Liferea, to update the feed. No error messages were shown; it just said "no new items". I ignored it for quite some time, and was also frustrated that friends who were using Straw didn't have any problem reading the posts. I finally decided to look into it this morning and found the problem. See if you can spot it in the following XML:

<rss version="0.91">
<channel>
    <title>User X Xanga Site</title>
    <link>http://www.xanga.com/UserX</link>
</channel>
<item>
    <title>7/24/2004 6:57:55 PM</title>
    <link>http://www.xanga.com/item.aspx?tab=weblogs&user=UserX&uid=0000000
    </link>
    <description>
    First post.
    </description>
</item>
</rss>

The <channel> tag is closed before the item list. According to the RSS 0.91 spec, everything pertaining to an individual channel, including its items, should be contained within the channel tag.
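To check whether some other feed has the same defect, something like the following Python sketch might help. It is only an illustration: the function name `items_outside_channel` is made up here, and the check is done on the raw text rather than with an XML parser so that it does not depend on the feed parsing cleanly.

```python
def items_outside_channel(feed_text: str) -> int:
    """Count <item> elements that start after the first </channel>."""
    close = feed_text.find("</channel>")
    if close == -1:
        # No closing channel tag at all; nothing to compare against.
        return 0
    # Only look at the text that follows the closing tag.
    return feed_text.count("<item>", close)


# A cut-down version of the feed shown above.
sample = """<rss version="0.91">
<channel>
    <title>User X Xanga Site</title>
</channel>
<item>
    <title>7/24/2004 6:57:55 PM</title>
</item>
</rss>"""

print(items_outside_channel(sample))  # prints 1
```

Any result other than 0 suggests the feed closes its channel too early.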

A workaround for Liferea.

I found a Xanga Conversion Filter that corrects the problem for Liferea. Liferea can use an external filter to access feeds in unsupported formats. Edit your feed, check "Use Conversion Filter", and point it to the Perl script saved from the above source. Perl to the rescue once again. I can now finally read Xanga feeds from Liferea. Should an RSS reader follow the rules strictly? Or should it attempt to correct bad RSS or XML? I'm not sure, but with auto-correction we may never find the problems and report them. I guess I should inform Xanga of this problem. I wonder whether they will care, considering that in 10 minutes of perusing their site, including their FAQ, I couldn't find any mention of RSS.
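For the curious, the kind of repair a conversion filter has to make can be sketched in a few lines of Python. This is not the Perl script mentioned above, just a rough illustration. It assumes the filter receives the raw feed on standard input and writes the corrected feed to standard output, and the name `fix_xanga_feed` is invented for this example.

```python
import sys


def fix_xanga_feed(text: str) -> str:
    """Move a prematurely closed </channel> to just before </rss>."""
    tag = "</channel>"
    close = text.find(tag)
    first_item = text.find("<item")
    # Leave the feed untouched unless </channel> precedes the first item.
    if close == -1 or first_item == -1 or close > first_item:
        return text
    # Drop the early closing tag...
    text = text[:close] + text[close + len(tag):]
    # ...and put it back right before the closing </rss>.
    end = text.rfind("</rss>")
    if end == -1:
        return text + tag + "\n"
    return text[:end] + tag + "\n" + text[end:]


if __name__ == "__main__":
    # Assumed filter convention: feed in on stdin, fixed feed out on stdout.
    sys.stdout.write(fix_xanga_feed(sys.stdin.read()))
```

The repair is plain string surgery on purpose: it only moves one closing tag and leaves everything else in the feed exactly as Xanga sent it, and a feed that is already correct passes through unchanged.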