Google has a regular internal lecture series on mostly technical topics with guest speakers and some of their own employees. This is their TechTalks Series and the best part of it is that you and I can see them too. They record and digitize the talks (close to 300 of them to date) and make them available on Google Video, using a consistent tag on each one so you can easily search for them and see if any interest you. The link above will do the search and show you all the videos so far.
Some of the titles that caught my eye:
Privacy Preserving DataMining
Turning Email Upside Down: RSS/Email and IM2000
Strike Up The Brand: How to Design for Branding
Ruby And Google Maps
Ruby Sig: How To Design A Domain Specific Language
Note: I’m not endorsing any of these, I haven’t had a chance to view them yet. They just looked interesting to me.
Recently, there was a very popular weblog entry detailing the use of acts_as_ferret for adding search to your Ruby code. For a variety of reasons I detail below, I had considered acts_as_ferret and had decided to instead try acts_as_solr for the new version of LOL.com I’m writing.
Unlike Ferret, which is a port of the Lucene search engine to Ruby & C, Solr is built directly on Apache Lucene. It’s a separate server which runs under Java, and your application talks to it over HTTP, sending XML to add new data or remove data and making requests to run queries. acts_as_solr hides all the details of that, so searching appears as seamless as if it were all inside your application.
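To give a feel for what the plugin is hiding, here is a minimal sketch of the kind of XML update messages Solr accepts. The helper names (xml_escape, solr_add_xml, solr_delete_xml) and the field names are my own for illustration, not part of acts_as_solr, and the code only builds the request bodies without sending them anywhere.

```ruby
# Characters that must be escaped inside XML text and attribute values.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Hypothetical helper: escape a value for safe inclusion in XML.
def xml_escape(value)
  value.to_s.gsub(/[&<>"]/, XML_ESCAPES)
end

# Hypothetical helper: build the body of a Solr "add" message
# from a hash of field names to values.
def solr_add_xml(fields)
  field_tags = fields.map do |name, value|
    %(<field name="#{xml_escape(name)}">#{xml_escape(value)}</field>)
  end
  "<add><doc>#{field_tags.join}</doc></add>"
end

# Hypothetical helper: build the body of a Solr "delete by id" message.
def solr_delete_xml(id)
  "<delete><id>#{xml_escape(id)}</id></delete>"
end

# The field names here (id, title) are assumptions; they depend on your Solr schema.
puts solr_add_xml(id: "Post:1", title: "Cats & dogs")
puts solr_delete_xml("Post:1")
```

In a real setup these bodies would be POSTed to the Solr server's update handler, which is exactly the chore acts_as_solr takes off your hands.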
Installation of the plugin is as simple as:
script/plugin install http://opensvn.csie.org/acts_as_solr/trunk
You can include “acts_as_solr” in model objects you want to be searchable. You can even use similar syntax to what you would use for acts_as_ferret to perform basic operations:
Model.find_by_solr(query) or Model.find_id_by_solr(query)
The acts_as_solr page I linked to above will give you all the details, but it’s hardly any different from acts_as_ferret in either installation or actual use. Getting and installing Solr is quite easy too (launching the server is no more difficult than “java -jar start.jar”). The difference is all in how it is written and the details behind the scenes.
Some Advantages I See To acts_as_solr Vs. acts_as_ferret
- I first looked into Solr and acts_as_solr when I heard from an acquaintance that the project he was working on was using acts_as_ferret and their 3 GB database of search information had gotten corrupted. To my knowledge they have attempted to fix their problems at least twice since then but have still had corruptions occur. Lucene is used in a lot of places and it’s a pretty well-tested chunk of technology. Solr is simply a wrapper for existing Lucene technology, so it should be pretty stable.
- I posted a small bug with the installation to the Trac site for acts_as_solr a week or so ago and it and another that someone else posted have both been fixed since. Clearly it is in active development as are Lucene and Solr so you are depending upon projects which seem to be getting some development love. I have heard some complaints about Ferret not being updated in a while, though I have not confirmed that independently. The reason I haven’t been able to confirm it is because the Ferret website at http://ferret.davebalmain.com/trac has been down every time I tried to hit it in the last day or so. Hmmm…
- Because searching is a web service in this system, other websites or tools which need access to the same search database don’t have to be written in Ruby and you don’t have to graft a web service onto your own application to offer up remote queries. Just hook them up to the search server like you would hook them up to the same database.
- You can more easily move search onto another machine to transfer the searching load off the web servers running your main Ruby on Rails application. Again, this is treating your search engine as a self-contained unit like your database server.
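As a rough illustration of the last two points, any tool that can make an HTTP request can query the search server directly. The sketch below only builds the query URL; the host, port, and path are assumptions (localhost:8983 and /solr/select are what the example Solr server uses when started with “java -jar start.jar”), and solr_query_url is a name I made up, not something from acts_as_solr.

```ruby
require "uri"

# Assumed location of the Solr query handler; change the host to point
# at whichever machine runs your search server.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Hypothetical helper: build the URL for a Solr query.
# q is the Lucene query string, rows limits how many results come back.
def solr_query_url(query, rows: 10, base: SOLR_SELECT_URL)
  params = URI.encode_www_form({ "q" => query, "rows" => rows })
  "#{base}?#{params}"
end

puts solr_query_url("title:ruby maps")
puts solr_query_url("lolcats", rows: 5, base: "http://search.example.com:8983/solr/select")
```

Moving search to another box then comes down to changing the base URL, and a PHP page or a shell script with curl could fetch the same URL without touching your Rails code.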
If you have had good or bad experiences with acts_as_ferret or acts_as_solr, particularly in environments with large numbers of individual items being put into the search engine or large amounts of data overall going into it, please post them as comments to this entry. I’m hoping to get this on Digg and DZone so people are aware that there are alternatives and those with some experience with either one or the other can speak up.