November 29, 2007

Amazon Web Services

For the first time in a long time I have started to take a serious look at how the content for holdenweb.com is generated. The site is currently pretty much a testbed, and everything is generated from a database (even the occasional pieces of dynamic content, though most pages are simple static HTML). This allows us to ensure that all pages have a uniform style and that all the internal navigation links are consistent.

A couple of the content elements are generated programmatically. We create the Python news items by reading bookmarks I have tagged as python in my del.icio.us account (using methods to be described, I hope, in an upcoming Python Magazine article), and the books by searching for python using amazon.com's web services. Recent attempts to regenerate the content, however, have resulted in broken links to Amazon's site. Being Amazon, they don't throw up an error page; they just try to sell you other products. I don't want to piss our readers off with irrelevant links, though, so I had to fix the code.

The issue causing the error turned out to be a change in the way that Amazon's site names its graphics files. When I first wrote the code all image names began with the product's ASIN (Amazon's product code), but now they get random-seeming names like 21rxsZ884SL.jpg. Since I was grabbing the leading digits and assuming that gave me the ASIN, it's little wonder that the links were coming out wrong. During this effort I realized that I was using the API definition from November 10, 2004. Much to Amazon's credit, the calls still work, despite the fact that the API seems to have been revised about twenty times since then.
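To make the failure concrete, here is a minimal sketch of the kind of assumption that broke. The original code isn't shown here, so the function name and the old-style filename are hypothetical; only 21rxsZ884SL.jpg is a real example from above.

```python
import re

def asin_from_image_name(filename):
    """Hypothetical reconstruction: treat the leading digits as the ASIN."""
    match = re.match(r"\d+", filename)
    return match.group(0) if match else None

# Made-up old-style name that starts with a ten-digit product code.
old_style = asin_from_image_name("0123456789.01.jpg")

# New-style name: the leading digits are no longer a product code.
new_style = asin_from_image_name("21rxsZ884SL.jpg")

print(old_style, new_style)
```

With the new naming scheme the function still returns something, just not an ASIN, which is why the result was plausible-looking but broken links rather than an outright crash.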

The books have also become less relevant as more and more Monty Python content is issued, along with books about keeping snakes and the like, so I decided to tighten up the search a little. It turns out that all I needed to do was add BrowseNode=5 to my request to limit it to computer and Internet books. Lo and behold, my test site is now populated with relevant literature, so after a few content tweaks I can republish the site and hopefully see the Amazon commission start to climb again. (Hope springs eternal.) The new content probably won't be published until the weekend, as there are a couple of other tweaks I'd like to make.
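For anyone who wants to try the same restriction, a request along these lines can be built with nothing but the standard library. This is only a sketch: BrowseNode=5 is the one parameter taken from the text above, while the endpoint URL, the other parameter names, and the build_search_url helper are my assumptions about a typical REST-style query and should be checked against Amazon's current documentation. It is written for modern Python rather than the version in use when this was posted.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the current API documentation.
ENDPOINT = "http://webservices.amazon.com/onca/xml"

def build_search_url(access_key, keywords="python", browse_node="5"):
    """Build a book search URL restricted to one browse node."""
    params = {
        # Every name here except BrowseNode is an assumption.
        "Service": "AWSECommerceService",
        "AWSAccessKeyId": access_key,
        "Operation": "ItemSearch",
        "SearchIndex": "Books",
        "Keywords": keywords,
        # The one addition described above: computer and Internet books only.
        "BrowseNode": browse_node,
    }
    return ENDPOINT + "?" + urlencode(params)

url = build_search_url("YOUR-ACCESS-KEY")
print(url)
```

Keeping the browse node as a keyword argument makes it easy to compare results with and without the restriction while testing.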

Since I was revising the code anyway I threw out the old expat-based parser and replaced it with code that uses the friendlier ElementTree module, now a part of the Python standard library. This reduced the line count by about 35% while making the code simpler, more robust, and easier to understand. I was quite amused to note that Amazon provide web service libraries for Java, C# and Visual Basic programmers. In Python the standard library already contains everything you need. We don't need no stinking libraries (though I am sure it would be easy to provide the same features that Amazon's libraries do from a Python library module).
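To give a flavor of why ElementTree is friendlier, here is a small sketch rather than the site's actual code. The sample document is made up and heavily simplified (real responses carry an XML namespace and many more elements), and the element names and the parse_items helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified response; the element names are assumptions.
SAMPLE = """\
<ItemSearchResponse>
  <Items>
    <Item>
      <ASIN>0000000001</ASIN>
      <ItemAttributes><Title>First Example Book</Title></ItemAttributes>
    </Item>
    <Item>
      <ASIN>0000000002</ASIN>
      <ItemAttributes><Title>Second Example Book</Title></ItemAttributes>
    </Item>
  </Items>
</ItemSearchResponse>
"""

def parse_items(xml_text):
    """Return a list of (asin, title) pairs, one per Item element."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("ASIN"), item.findtext("ItemAttributes/Title"))
        for item in root.iter("Item")
    ]

books = parse_items(SAMPLE)
for asin, title in books:
    print(asin, title)
```

With expat you would write start, end, and character-data handlers and track the current element yourself; here the tree is simply navigated by path. Reading the ASIN from its own element also avoids guessing it from an image filename.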

Reading the updated specifications for Amazon web services has made me realize that there's a lot more you can do with them now than there was three years ago, so I shall be revisiting this topic before long. If you're doing neat things with Amazon and Python I'd be delighted to hear from you.

3 comments:

Bill Mill said...

Steve,

I write Amazon web service code all day. I recently shared my favorite AWS snippet on reddit: http://programming.reddit.com/info/5zcv3/comments/c02bz8v .

Steve said...

Oh. My. God.

I'm guessing the person who designed that response doesn't really understand XML at all.

Thanks for the warning!

Bill Mill said...

As long as you're not using the Inventory Management Service, which to be fair isn't really AWS but it is *an* Amazon Web Service, you should be fine :)