udim ([info]udim) wrote,
@ 2006-11-06 12:28:00
Tiny Mix Tapes to ATOM
The ATOM feed: http://stuff.pulkes.org/tmt2atom.php
php source

So I wasted a Saturday creating a website-to-RSS PHP script for sites that don't have RSS. I went back and forth between using PHP's XML parser, writing my own HTML parser, and finding an already written HTML parser:

  • Trying to parse HTML as XML doesn't work, even if you strip most of the tags and add a dummy enclosing tag. XML parsers (at least PHP's) are just too anal, and most HTML is buggy (unescaped &'s, for instance).

  • HTML Tidy wasn't compiled into DreamHost's PHP, and when I tried rolling my own build I found they didn't have libtidy installed, so I gave up on it.

  • Writing my own parser, I couldn't shake the nagging feeling that I was reinventing the wheel. It also didn't take long (only a couple of hours, but I have MANY hours to spare) to reach the first hurdle: PHP is SLOW! Then I remembered that PHP's XML parser uses libexpat, which is written in C, and it all went downhill from there.
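
The first point is easy to reproduce. Here is a minimal sketch, written in Python rather than PHP (Python's ElementTree parser is also built on expat), so it is not the code from the script. The HTML fragment and the dummy `<root>` wrapper are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical scraped fragment: fine for a browser, but the bare "&" is not legal XML.
html_fragment = '<a href="/reviews/1">Simon & Garfunkel</a>'

def parses_as_xml(fragment):
    """Wrap the fragment in a dummy enclosing tag and attempt a strict XML parse."""
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False

print(parses_as_xml(html_fragment))                        # bare ampersand
print(parses_as_xml(html_fragment.replace("&", "&amp;")))  # escaped ampersand
```

A single unescaped ampersand is enough to make the strict parser reject the whole document, which is why stripping tags and adding a wrapper doesn't help much on real-world pages.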

I saw the light when I gave up on making a general script that could fit any website and resorted to old-fashioned regular expressions. Much faster, and it still ended up being a general script.
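
Roughly, the regex approach looks like the sketch below. It is in Python, not PHP, and it is not the actual tmt2atom source. The markup, the pattern, and the function names are all assumptions for illustration. The idea is that only the pattern changes from site to site, which is how the script stays general:

```python
import re
from xml.sax.saxutils import escape

# Hypothetical per-site configuration: one pattern with named groups per site.
ITEM_PATTERN = re.compile(
    r'<h2 class="title"><a href="(?P<link>[^"]+)">(?P<title>.*?)</a></h2>',
    re.S,
)

def extract_items(html, pattern=ITEM_PATTERN):
    """Return (link, title) pairs for every match of the site's pattern."""
    return [(m.group("link"), m.group("title").strip()) for m in pattern.finditer(html)]

def to_atom_entries(items):
    """Render the pairs as bare Atom <entry> elements, escaping text for XML.

    A valid Atom entry also needs <id> and <updated>; they are left out here.
    """
    return "\n".join(
        '<entry><title>{}</title><link href="{}"/></entry>'.format(
            escape(title), escape(link, {'"': "&quot;"})
        )
        for link, title in items
    )

# Made-up page fragment, including the kind of unescaped "&" that breaks XML parsers.
sample = '''
<h2 class="title"><a href="http://example.com/a?x=1&y=2">Rock & Roll</a></h2>
<h2 class="title"><a href="http://example.com/b">Second post</a></h2>
'''
print(to_atom_entries(extract_items(sample)))
```

The regex never cares whether the page is well-formed, so the buggy HTML only has to be escaped once, on the way out into the feed.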

Now I just have to wait for TMT to update to see if it really works.

Update: 1. It works. 2. JWZ already did this a long time ago.

keywords: atom, rss, feed, tinymixtapes, tiny mix tapes, tmt2atom

