Monthly Archives: December 2005

All My RSS Subscriptions

“Hi, my name is John, and I’m addicted to news.” It’s true. I used Netscape’s old portal because it let you pull in RSS feeds and display them in boxes that had the same standing as the blocks of information Netscape itself supplied. After Netscape lost interest in that, I wrote my first RSS aggregator, HotSheet, because I wanted to pull a dozen or more feeds into one “stream of news”. Now I’m about five years down the road from the earliest RSS stuff I can remember, and RSS feeds are available for just about every website that updates regularly. Many people get them automatically with their weblogs simply because the tools they use (WordPress, Blogger, etc.) create them by default. So I monitor a large number of weblogs and other news sites via RSS, and occasionally people ask me for a complete list of everything I subscribe to. So now you are going to see the depth of my addiction…
I roughly divide these up into the following categories: Comics, Java, Podcasting, Weblogs, and Misc. I’m using JetBrains Omea Reader 2.0 to read them right now, but I’m not really enamoured with it; it’s just better than several others I’ve tried. How’s that for an endorsement? 🙂
Here’s the complete list of everything I’m subscribed to as well as an OPML file containing all the URLs to subscribe to the various RSS feeds:

Is it any wonder I want to create next-generation applications that filter this mess down to something you can actually read? Some channels barely have any traffic at all; others have a dozen or more items per day. In aggregate I’d guess there are more than a hundred items per day from them at the moment, and that’s with traffic slowed by the holidays. If you add a feed like JavaBlogs to the mix, you can figure on several hundred more from it alone, because it aggregates more than 1,500 weblogs itself.
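As a rough illustration of the “one stream of news” idea, here is a minimal sketch that merges items from several feeds into a single newest-first list and optionally keeps only items whose titles mention a keyword. It assumes the feeds have already been fetched and parsed; the `NewsStream` class, the `Item` record and the keyword filter are hypothetical names for this example, not HotSheet’s or Omea Reader’s actual design. It needs Java 16 or later for records.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: merge already-parsed feed items into one newest-first stream. */
final class NewsStream {

    /** One feed item; the field names are illustrative, not from any RSS library. */
    record Item(String feed, String title, Instant published) {}

    /**
     * Merges every feed's items into a single list, newest first, keeping only
     * items whose title contains one of the keywords (case-insensitive).
     * An empty keyword list keeps everything.
     */
    static List<Item> merge(List<List<Item>> feeds, List<String> keywords) {
        List<Item> all = new ArrayList<>();
        for (List<Item> feed : feeds) {
            for (Item item : feed) {
                if (keywords.isEmpty() || matches(item.title(), keywords)) {
                    all.add(item);
                }
            }
        }
        // Newest items first, regardless of which feed they came from.
        all.sort(Comparator.comparing(Item::published).reversed());
        return all;
    }

    /** True if the title contains any keyword, ignoring case. */
    private static boolean matches(String title, List<String> keywords) {
        String lower = title.toLowerCase(Locale.ROOT);
        for (String keyword : keywords) {
            if (lower.contains(keyword.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }
}
```

A keyword match is the crudest possible filter; the point of the sketch is only that merging and filtering happen after all feeds are collected, so a smarter ranking step could replace `matches` without touching the rest.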

Paper Bookmarks

I like these little page corner bookmarks. They are easy to make and you could do some clever custom variations with little effort.
I printed some on 110 lb. cardstock, and while they seem nice and very durable, I think they might be more likely to slip off the corner of a page because they are so stiff and solid. In this case a flimsier paper may be a better choice, and I intend to make a few that way to see.
While I’m talking about paper, I’d like to plug two more things related to paper projects. One is the awesome pair of scissors I use. They are made by Fiskars, and you can see them in the photo. There is nothing unique about small, sharp-bladed scissors intended for cutting small projects; what sets these apart is that the handles are actually large enough for my hands. I don’t know who most fine-point scissors are intended for, but for non-Lilliputian people they are positively medieval in their design. There is never any padding, just a tiny metal loop for you to squeeze a finger into while you cut, cut, cut. I’ve used little detail scissors like that before for projects like the paper automata cupid, but my beloved Fiskars would make that same experience faster and ten times less painful if I did it all over again.
The other thing I’d like to plug is Jaime Zollar’s Paper Forest. It’s a cool weblog devoted to paper automata, cardstock models, etc., with regular postings of neat projects. It might not be as complete as some of the link farms you see out there for paper projects, but he makes up for it by highlighting some of the most interesting things and offering pictures of most everything he talks about.

New Version of Audacity Will Be More Podcast Friendly And Thoughts On Podcasting “Helpers”

A lot of people already use Audacity for sound recording and/or for editing podcasts, so any new release, especially one with new features for podcasters, is likely to be a big deal. The new version is still considered very early and too crash-happy for real use; it’s intended more as a technology preview of new features, including:

  • Various changes to make editing easier and the UI friendlier
  • FTP upload of files directly from within Audacity

Sadly, though, there’s still no way to set the genre to “podcast” in the ID3v2 data from within Audacity when you are exporting your MP3 files. Most podcasters will therefore still end up using iTunes or some other program to set the genre and other tags, which undercuts the point of uploading via FTP directly from Audacity. It’s just not a one-stop shop.
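For the curious, the genre lives in an ID3v2 text frame called `TCON`, and the tag format itself is simple enough to write by hand. The sketch below builds a minimal ID3v2.3 tag containing only a genre frame: a 10-byte tag header (`ID3`, version 3.0, flags, and a “syncsafe” size that uses 7 bits per byte), followed by a 10-byte frame header (frame ID, a plain big-endian size in v2.3, two flag bytes) and the frame body (an encoding byte, 0 for ISO-8859-1, then the text). The `Id3GenreTag` class is a hypothetical helper, not part of Audacity or iTunes, and it assumes the MP3 has no existing ID3v2 tag; a real tool would merge with the frames already present.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: build a minimal ID3v2.3 tag holding only a TCON (genre) frame. */
final class Id3GenreTag {

    static byte[] buildTag(String genre) {
        // TCON frame body: encoding byte 0x00 (ISO-8859-1) followed by the genre text.
        byte[] text = genre.getBytes(StandardCharsets.ISO_8859_1);
        int bodySize = 1 + text.length;

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 10-byte tag header: "ID3", version 2.3.0, no flags.
        out.writeBytes("ID3".getBytes(StandardCharsets.ISO_8859_1));
        out.write(3);
        out.write(0);
        out.write(0);
        // Tag size (everything after the tag header) as a syncsafe integer.
        writeSyncsafe(out, 10 + bodySize);
        // 10-byte frame header: frame ID, plain big-endian size (v2.3), two flag bytes.
        out.writeBytes("TCON".getBytes(StandardCharsets.ISO_8859_1));
        writeInt(out, bodySize);
        out.write(0);
        out.write(0);
        // Frame body.
        out.write(0);
        out.writeBytes(text);
        return out.toByteArray();
    }

    /** Syncsafe: 28 bits spread over 4 bytes, 7 bits per byte, high bit always 0. */
    private static void writeSyncsafe(ByteArrayOutputStream out, int value) {
        out.write((value >> 21) & 0x7F);
        out.write((value >> 14) & 0x7F);
        out.write((value >> 7) & 0x7F);
        out.write(value & 0x7F);
    }

    /** Ordinary 32-bit big-endian integer. */
    private static void writeInt(ByteArrayOutputStream out, int value) {
        out.write((value >> 24) & 0xFF);
        out.write((value >> 16) & 0xFF);
        out.write((value >> 8) & 0xFF);
        out.write(value & 0xFF);
    }
}
```

Writing the result of `buildTag("Podcast")` in front of the untagged MP3 data would be the final step; that file handling is left out to keep the sketch focused on the tag layout.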
So that brings us to podcasting tools like EasyPodcast. You pick the MP3 file; it applies tags and a logo to the MP3, creates the RSS, and uploads both the RSS and the MP3 file.
Personally, I think this is a great idea, sort of. I don’t need the RSS generation, since Blogger and FeedBurner handle that for me, but I can see a whole host of little utilities that could be hooked together so you could progress through them, preventing errors and streamlining the process. Right now I record with Audacity, then edit (also with Audacity), use iTunes to apply ID3v2 tags to the resulting MP3 file, gather the times of various events in my show, create a Blogger entry to go with the show, upload my show via FTP, and finally post my new Blogger entry to my weblog, and the show is done. In the background, FeedBurner notices the new weblog entry and updates the RSS feed it provides to anyone interested in the show. Seven steps plus an automated one. I’ll bet others have even more.
Software to take you through all of that could be cool, especially if several people have to upload things or work on the same show. But do I want a special tool for podcasters that includes a dozen different podcasting steps, from which I select and order only the ones I want (not just the three that the EasyPodcast guy thought I needed), or would I be better off with a tool that is all about creating the workflow?
But then that thought leads to the Automator software that ships with Mac OS X, because, I mean, it’s all about the workflow. Apple realized that a simple linear workflow is not that complicated a thing, and with steps that can be written by anybody we can have everything from simple ones (rename this MP3 file) to highly sophisticated ones (scan the file for dead air and isolated “um” and “uh” noises and strip them out). Then I can mix those into any workflow I can come up with, perhaps even one that includes show prep or post-show activities. So why doesn’t a simple workflow tool like that exist? Doing one in Java that would work on any major OS would be fairly trivial, and tools like the Java Plugin Framework should make it pretty easy. Good question.
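To show how small the core of such a linear workflow tool could be, here is a minimal sketch: each step has a name and an action that reads and writes a shared context map, and the runner executes the steps in order, stopping at the first failure so later steps (say, posting the weblog entry) never run against a half-finished show. The `Workflow`, `Step` and `Action` names are hypothetical; this is not Automator’s model or the Java Plugin Framework’s API, just the simplest shape a plugin-friendly step interface might take. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a linear workflow: steps run in order over a shared context. */
final class Workflow {

    /** What a step actually does; a plugin would supply one of these. */
    @FunctionalInterface
    interface Action {
        void apply(Map<String, String> context) throws Exception;
    }

    /** A named step, e.g. "rename MP3", "apply ID3 tags", "upload via FTP". */
    record Step(String name, Action action) {}

    private final List<Step> steps = new ArrayList<>();

    /** Appends a step and returns this workflow so steps can be chained. */
    Workflow add(Step step) {
        steps.add(step);
        return this;
    }

    /**
     * Runs every step in order and returns the names of the steps that completed.
     * On the first failure, records the error in the context under "error" and stops.
     */
    List<String> run(Map<String, String> context) {
        List<String> completed = new ArrayList<>();
        for (Step step : steps) {
            try {
                step.action().apply(context);
                completed.add(step.name());
            } catch (Exception e) {
                context.put("error", step.name() + ": " + e.getMessage());
                break;
            }
        }
        return completed;
    }
}
```

A podcast workflow would then be a chain such as `new Workflow().add(new Workflow.Step("tag", ctx -> ctx.put("genre", "Podcast")))`, with each real step (tagging, FTP upload, posting) written independently and mixed in whatever order a given show needs.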

Celestron SkyScout Answers “What Star Is That?”

It will likely be insanely expensive (it incorporates optics, a small computer, a GPS, and additional sensors) and it might not even work, but the SkyScout is an insanely great idea for a product.
With it, you can just look at a star or other sky object and it will tell you about it using both text and audio, or you can reverse the process: pick the object you are interested in from a built-in list of several thousand, and the SkyScout will guide you to it. USB connectivity and support for SD memory cards allow it to be updated and expanded.

Nostalgia Looks Better On Paper Sometimes

Since the Atari 2600 represented not only the first real mega-selling game console but also thousands of hours of gameplay for me personally, I jumped at the chance to buy an Atari Flashback 2 recently (under $30 retail). It has two original-looking joysticks; the feel and weight don’t seem quite the same, but they look right. The new console is much smaller and replaces the old metal toggles with plastic buttons, but it even incorporates the small strip of faux wood that was on the original! In general, it resembles a 2600 that went through a shrinking machine.
It comes with 40 games built in (20 old and 20 new), and even though the various paddle games were some of the best, there are no paddles or paddle games. My first problem with the console is that it includes 20 new games rather than 20 more classics that would have improved the overall selection; more Activision titles and more 2600 classics would have boosted the quality. As it is, though, you can play Combat, Missile Command, Pitfall, Asteroids, Adventure and many others. The box doesn’t include a full manual, just a few pages telling you how to plug it in and get started, but Atari has a full manual in PDF form on their web page that describes each game individually.
Now, before I say what I’m about to say, I want you to know that there are lots of old games that are just as much fun today as they were 20+ years ago. Galaga, Ms. Pac-Man, Donkey Kong, etc. are still great games, so age alone doesn’t make a game bad. But most of the games on the 2600 are just bad. The console placed such profound limits on the size of game programs that gameplay really suffered. It’s not the crude sound capability or the even cruder graphics that hold it back; it’s the games themselves that are weak.
I played Sonic The Hedgehog recently, and it was just as much fun today as when it came out. Most of the 2600 games I played were not. I guess we were far more desperate for entertainment then, and we saw the potential of the medium more than we saw what was actually produced.
Sometimes something is just much better in your memory than it really was… My Flashback 2 went back to the store and I got my money back. I guess I’ll have to wait for the Super Nintendo in a box instead.

Data Processing On A Huge Scale: Google’s Story

Years ago, I naively thought that Google somehow had amazing machines and software that managed to do most everything in real time, even though the huge amounts of data they process pretty much preclude any such thing, as I would have realized if I had bothered to think about it rationally. I imagined that they processed each site they crawled as soon as they found it, and into the search engine it went. Each news item from RSS was similarly fed straight into an index and made available immediately, with no batch processing of reams of data.
Fortunately, such magical thinking has not persisted. Google does not use elves in a hollow tree to produce their results; they use intelligent engineers and many of the same tools available to you and me. They have developed all kinds of innovative solutions in order to deal with the huge amounts of data they have. Those solutions include:

  • Building a truly enormous array of commodity PCs running Linux to handle the computing needs for all of Google. When individual computers fail, their software simply shifts the workload to other functional machines. Supposedly, they buy large quantities of parts in bulk and make their purchases in a variety of ways to avoid being gouged by vendors.
  • Creating a distributed filesystem that keeps copies of every file on hard drives in three separate machines, reducing the chance that a failure causes data loss.
  • Building software that makes it easy to handle machine failures, distribute computing tasks across a large number of CPUs, etc.
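The three-copies idea in the filesystem bullet can be illustrated with a tiny placement routine: for each chunk of data, pick three distinct machines that are currently up, so that losing any one (or two) of them still leaves a copy. The sketch below uses a simple hash-and-walk scheme over a machine list; the `ReplicaPlacement` class and its strategy are hypothetical and much simpler than what the Google File System paper describes, where a master also weighs factors such as disk usage and spreading copies across racks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: choose three distinct live machines to hold copies of one chunk. */
final class ReplicaPlacement {

    static final int REPLICAS = 3;

    /**
     * Starts at a position derived from the chunk id and walks the machine list once,
     * skipping machines known to be down, until three distinct machines are chosen.
     */
    static List<String> place(String chunkId, List<String> machines, Set<String> down) {
        List<String> chosen = new ArrayList<>();
        int start = Math.floorMod(chunkId.hashCode(), machines.size());
        for (int i = 0; i < machines.size() && chosen.size() < REPLICAS; i++) {
            String machine = machines.get((start + i) % machines.size());
            if (!down.contains(machine)) {
                chosen.add(machine); // each index is visited once, so choices are distinct
            }
        }
        if (chosen.size() < REPLICAS) {
            throw new IllegalStateException("not enough live machines for " + REPLICAS + " copies");
        }
        return chosen;
    }
}
```

When a machine later fails, the same routine (with that machine in the `down` set) indicates where a replacement copy could go, which is the essence of “the software simply shifts the workload to other functional machines”.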

The best thing about all of this is that they haven’t been particularly quiet about how they do a lot of it. For example, if you go to their Research Publications site you’ll see papers about The Google File System and Web Search for a Planet: The Google Cluster Architecture.
Now, I’m not going to snow you on this: if you aren’t of a technical bent, this stuff is going to be a hard, boring slog. Michael Chabon it’s not. But if data analysis of truly ginormous data sets interests you, then you want to read their paper on MapReduce: Simplified Data Processing on Large Clusters [PDF].
It’s all about how they split up many kinds of data analysis so that it is easy to write the algorithm that processes the data without spending time worrying about hardware failures, how many machines you might be allocated to run your software, or how to use those machines optimally to process the data in the least amount of time. Instead, MapReduce forms a kind of support system that reminded me of using the genetic programming package JGAP. I’ll talk in a future entry about how JGAP can make it easy to find optimal or near-optimal solutions for problems that would be tedious or impossible for humans. But the important thing JGAP did was make it easy for me to focus on the specifics of my problem and not on the mechanics of a framework. MapReduce is one of Google’s means of achieving that same kind of focus, and I think it makes for a really interesting read.
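The programming model itself is easy to show in miniature. In MapReduce you write only two functions: a map function that turns one input record into intermediate key/value pairs, and a reduce function that combines all the values for one key. The framework does the grouping in between and, in the real system, spreads the work over many machines and reruns tasks on machines that fail. The sketch below runs everything in a single process using word counting, the classic illustrative example; the `MiniMapReduce` class is hypothetical and leaves out all of the distribution and fault tolerance that are the actual point of Google’s implementation. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical single-process sketch of the MapReduce model, counting words. */
final class MiniMapReduce {

    /** One intermediate key/value pair emitted by map. */
    record Pair(String key, int value) {}

    /** User-written map: one document in, a (word, 1) pair out for every word. */
    static List<Pair> map(String document) {
        List<Pair> out = new ArrayList<>();
        for (String word : document.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(new Pair(word, 1));
            }
        }
        return out;
    }

    /** User-written reduce: one word and all of its counts in, the total out. */
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    /** The "framework": map every input, group values by key (the shuffle), then reduce. */
    static Map<String, Integer> run(List<String> inputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String document : inputs) {
            for (Pair p : map(document)) {
                grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }
}
```

Notice that `map` and `reduce` know nothing about machines, failures or scheduling; everything of that kind lives in `run`. That separation is the kind of focus described above, and in the real framework `run` is where all the hard engineering goes.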
The Java-based Nutch project includes a Java version of MapReduce and a distributed file system that you could use as part of your own huge data set processing, so reading these papers isn’t just an academic exercise. You can actually put this to use if you have a project that needs it. Be sure to check out the wiki for the Nutch project for more helpful information.

Jive Messenger Becomes Wildfire Server And Gets A Speed Boost

I recently mentioned that we installed Jive Messenger at work to get a good instant messaging server that we could control and which didn’t result in important conversations leaving the building to talk to distant servers. In the time since I wrote that, Jive Messenger has been renamed to Wildfire Server and it has had a dramatic speed improvement. Jive Software: Wildfire Optimization is an article briefly detailing the optimization Jive Software did for the new version of the server and might be instructive if you haven’t done optimization on a Java project before.

It’s Not Highbrow Humor But Google Video Can Be Seriously Funny

The Internet has seen a large set of text, image, and video items that circulate through email and forums for years and years. Some of it is urban myth, some obscene, some funny, strange, or amazing, and everything in between. But videos typically don’t get passed around as much in email, just due to their sheer size. Google Video has become a catch-all for these videos, so you can just point to them and everybody can enjoy them. Here are a few of my favorites.