Blogging

Xanga malformed RSS, Conversion filters to the rescue

Blogging | Semantic Web

I have been using Liferea (LInux FEed REAder) as my primary RSS/Atom news reader for quite a while. I am not certain that it is the best reader on Linux, especially since it's an early project in rapid development with frequent breakages, but generally it has provided more features than the others I have tried. I tried to use Straw, its more stable cousin, but was already too addicted to Liferea's group management and favicon eye candy. RSSOwl, a quite featureful Java app, has a clever interface that opens a new tab for each feed read. However, that made it harder for me to quickly arrow down through news. Perhaps with some settings changes it could behave like Liferea, but my patience for it wasn't there. Another "problem" with RSSOwl was that it was quite strict in interpreting the XML from RSS feeds, which alerted me to a problem.

Many sites are using malformed XML for their RSS/Atom feeds! If you have ever tried to maintain a multiuser blogging site, perhaps you have also found out how difficult it is to declare your site XHTML 1.0 compliant if you allow your users to use HTML tags in their posts, as on this site. In those cases I would suggest adding a "check XHTML 1.0" link and encouraging advanced users to click on it to verify their posts. It's starting to matter more: in a world with many alternate browsers and feed readers, standards compliance is essential.

That being said, a major offender is Xanga. Xanga is a blogging site that seems to appeal to an audience who perhaps didn't even know what blogging was when they started. Anyway, I had a couple of friends who were using it, and without the help of Google I would have been hard pressed to find that it offered RSS feeds. I found that it works with the following convention:

http://www.xanga.com/rss.aspx?user=username

However, I could never get my news reader, Liferea, to update the feed. No error messages were shown; it just said "no new items". I ignored it for quite some time, and was also frustrated that friends who were using Straw didn't have any problem reading the posts. I finally decided to look into it this morning and found the problem. See if you can find the problem with the following XML:

<rss version="0.91">
<channel>
    <title>User X Xanga Site</title>
    <link>http://www.xanga.com/UserX</link>
</channel>
<item>
    <title>7/24/2004 6:57:55 PM</title>
    <link>http://www.xanga.com/item.aspx?tab=weblogs&user=UserX&uid=0000000
    </link>
    <description>
    First post.
    </description>
</item>
</rss>

The <channel> tag is closed before the item list, so every <item> ends up outside the channel. According to the RSS 0.91 spec, everything pertaining to an individual channel should be contained within the channel tag.
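A quick way to spot this kind of feed is to count the items that sit directly under <rss> rather than inside <channel>. The following is only a minimal sketch in Python: the names `SAMPLE`, `misplaced_items`, and `items_in_channel` are made up for illustration, and the ampersands in the sample link are escaped as `&amp;` (an assumption) so that the sample parses as XML at all.

```python
import xml.etree.ElementTree as ET

# Sample modeled on the feed above. Assumption: the ampersands in the link
# are escaped as &amp; so that the document is parseable XML.
SAMPLE = """<rss version="0.91">
<channel>
    <title>User X Xanga Site</title>
    <link>http://www.xanga.com/UserX</link>
</channel>
<item>
    <title>7/24/2004 6:57:55 PM</title>
    <link>http://www.xanga.com/item.aspx?tab=weblogs&amp;user=UserX&amp;uid=0000000</link>
    <description>First post.</description>
</item>
</rss>"""


def misplaced_items(feed_xml: str) -> int:
    """Count <item> elements that are direct children of <rss>."""
    root = ET.fromstring(feed_xml)
    return len(root.findall("item"))


def items_in_channel(feed_xml: str) -> int:
    """Count <item> elements correctly nested inside <channel>."""
    root = ET.fromstring(feed_xml)
    return len(root.findall("channel/item"))


print("items outside <channel>:", misplaced_items(SAMPLE))
print("items inside <channel>:", items_in_channel(SAMPLE))
```

A reader that only looks for items inside <channel> would see an empty channel here, which may be why a feed like this shows up as "no new items" without any error.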

A workaround for Liferea.

I found a Xanga Conversion Filter which can be used to correct the problem with Liferea. Liferea can use an external filter to access feeds in unsupported formats. Edit your feed, check "Use Conversion Filter", and point it to the Perl script saved from the above source. Perl to the rescue once again. I can now finally read Xanga feeds from Liferea.

Should an RSS reader follow the rules strictly? Or should it attempt to correct bad RSS or XML? I'm not sure, but with auto-correction we may never find the problems and report them. I guess I should inform Xanga of this problem. I wonder if they will care, considering that in 10 minutes of perusing their site, including their FAQ, I couldn't find any mention of RSS.
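For the curious, here is roughly what such a conversion filter has to do. This is not the Perl script mentioned above, just a minimal Python sketch of the same idea. It assumes that the filter receives the downloaded feed on standard input and writes the converted feed to standard output, and that the feed is otherwise well-formed XML. The function names `move_items_into_channel` and `main` are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def move_items_into_channel(feed_xml: str) -> str:
    """Return the feed with any top-level <item> moved inside <channel>."""
    root = ET.fromstring(feed_xml)
    channel = root.find("channel")
    if channel is None:
        # Nothing sensible to repair; hand the feed back unchanged.
        return feed_xml
    # findall() returns a list, so removing while looping is safe.
    for item in root.findall("item"):
        root.remove(item)
        channel.append(item)
    return ET.tostring(root, encoding="unicode")


def main(src=sys.stdin, dst=sys.stdout) -> None:
    """Filter entry point: read the broken feed, write the repaired one."""
    dst.write(move_items_into_channel(src.read()))
```

To try it as a filter, save it as a script that ends with a call to `main()` and point "Use Conversion Filter" at that script. A feed with unescaped characters would still fail to parse, so a real filter would need to be more forgiving than this one.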

MKadera Enters the Semantic Web

Blogging

IMT developer Melissa Kadera has put up a weblog, and has recently discovered the joys of RSS feed readers or news aggregators. She wrote up some good information on resources for getting started in this space.

What is the Semantic Web you ask?

The semantic web is a universal medium for the exchange of data... smoothly interconnecting personal information management, enterprise application integration, and the global sharing of commercial, scientific and cultural data.

The principal technologies of the Semantic Web fit into a set of layered specifications. The current components of that framework are the RDF Core Model, the RDF Schema language and the Web Ontology language. These languages all build on the foundation of URIs, XML, and XML namespaces.

For more information, see the W3C's Semantic Web Activity Statement.

Blogs

Blogging | Help

You each will now have your own blog. Unlike what you may think about the world of teen angst blogging, this method of content creation is useful and is the main mechanism by which you will post intelligent content to this site. More than journaling, this is how you will write stories relating to areas of research and industry news that you want to share with the AWG or IMT in general. It is your platform to voice your expert opinion on various IT topics.

Simply select a Topic appropriate for your content; you can even select multiple topics if they apply. Just want to share a link and a blurb? Use the Web Link node type under "create content". Have an image of an architectural diagram that explains your design? Create an Image node type, or simply link to the image via HTML.

When you post a blog entry you can choose whether or not you want AWG members to be able to comment on what you posted. The comment system allows for threaded discussions below the topic. These comments are not nodes in themselves, but can be linked to with an anchor tag. If you have a long post, it is better to respond to a blog entry with another blog entry rather than a comment. This is especially true if you would like your post to be world viewable, since the comments are not world readable. But for conversations around a topic, the comment system is effective.
