01/19/10

Sphinx 0.9.9 Review, A Cautionary Tale

After my previous raves about Sphinx in general and Thinking Sphinx in particular, I was excited to get my hands on the new Sphinx 0.9.9 release that was finally made available at the beginning of December via the Sphinx Search site.

Given that our Sphinx usage is what I think would fall under the “advanced cases” heading, I expected probably a day or two of upgrade headaches before we’d be back on track. Worth it, said I, for the potential to get working index merge, which could set the stage for reindexing more often than once every four hours (our current index takes about 3 hours to build, plus time to transfer files between the building Sphinx machine and the search daemon machines).

Alas, our upgrade did not go according to plan.

This Monkey Patches Going to Heaven

Given how prompt Pat Allan (creator of Thinking Sphinx) has been in addressing and fixing bugs in the past, I don’t doubt that many of our upgrade headaches from the TS side will be fixed soon (if not already, since I emailed him most of our issues). That said, we required about five monkey patches to get the most recent version of TS with 0.9.9 working the same as our previous TS with 0.9.8 did. The patches ranged from fixing the total_entries method (which throws an exception if the search can’t be completed), to real-time updates not working (via client#update), to searches throwing an exception when they passed a string to TS where it expected an int.

This does not include “expected” differences, such as the fact that search is now lazily evaluated. If you previously wrapped your search statements in a begin-rescue block to catch possible errors, your paradigm needs to shift: the rescue has to cover the point where the results are first used, not just the call that builds the search.
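To make the lazy evaluation point concrete, here is a minimal sketch of the idea. It is written in JavaScript rather than Ruby, with try/catch standing in for begin-rescue. LazySearch, failingQuery and demo are hypothetical names made up for the illustration; none of this is the Thinking Sphinx API.

```javascript
// Hypothetical stand-in for a lazily evaluated search object.
// This is NOT the Thinking Sphinx API, only an illustration of the pattern.
class LazySearch {
  constructor(runQuery) {
    this.runQuery = runQuery; // stored, but not executed yet
    this.cached = null;
  }

  // The query only runs the first time the results are requested.
  results() {
    if (this.cached === null) {
      this.cached = this.runQuery();
    }
    return this.cached;
  }
}

// Hypothetical query that always fails, e.g. because the daemon is down.
function failingQuery() {
  throw new Error("search could not be completed");
}

function demo() {
  let search;
  let caughtAtBuild = false;
  try {
    search = new LazySearch(failingQuery); // nothing runs here, so nothing throws
  } catch (e) {
    caughtAtBuild = true;
  }

  let caughtAtUse = false;
  try {
    search.results(); // the error surfaces here instead
  } catch (e) {
    caughtAtUse = true;
  }

  return { caughtAtBuild, caughtAtUse };
}

console.log(demo());
```

The error handler around the line that builds the search never fires; only the one around the first use of the results does.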

It also appears that the after_commit plugin bundled with TS 0.9.8 has been modified such that it is not available to models in our project by default. I never figured out a fix for that bug, since by the time I noticed it, I had also become aware of an even bigger 0.9.9 detriment: overall performance. Reviewing our New Relic stats since we updated to 0.9.9, we have found an across-the-board slowdown of about 50% in our Sphinx calls. I parsed the Sphinx logs to try to ascertain whether the slowness was stemming from Sphinx or TS, and Sphinx appears to be the main culprit.

Performance

TS 0.9.8

Considering 290227 searches.
Average time is 0.0318751770166445.
Median time is 0.005.
Top 10% average is 0.193613017710702 across 29022 queries

TS 0.9.9

Considering 843569 searches.
Average time is 0.0430074540435297.
Median time is 0.006.
Top 10% average is 0.286621461425376 across 84356 queries

Many of our queries take 0.00 or 0.01, so the median looks much the same between the two, but the average time (which is what New Relic picks up on) is 35% slower in Sphinx alone (0.0430 vs. 0.0319), and about 50% slower once all is said and done. An action on our site that does a Sphinx search for similar items (and nothing else) consistently averaged 200 ms for weeks before our upgrade, and has averaged almost exactly 300 ms for the week since the upgrade.
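For anyone who wants to run the same kind of comparison on their own logs, the summary figures above can be computed roughly like this. It is a sketch under assumptions, not necessarily how the numbers above were produced: summarize and percentSlower are hypothetical helper names, the input is assumed to be a plain array of query times in seconds already pulled out of the log, and the “top 10%” is assumed to mean the slowest floor(n/10) queries (which matches the counts above, e.g. 29022 of 290227).

```javascript
// Summarize an array of query times (assumed to be in seconds).
function summarize(times) {
  const sorted = [...times].sort((a, b) => a - b); // numeric ascending sort
  const n = sorted.length;
  const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

  // Median: middle value, or the mean of the two middle values for even n.
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;

  // Slowest 10% of queries, rounded down (assumption, see the text above).
  const topCount = Math.floor(n / 10);
  const slowest = sorted.slice(n - topCount);

  return {
    count: n,
    average: mean(sorted),
    median: median,
    topCount: topCount,
    topAverage: topCount > 0 ? mean(slowest) : NaN,
  };
}

// Relative slowdown of `after` compared with `before`, in percent.
function percentSlower(before, after) {
  return ((after - before) / before) * 100;
}

// Tiny made-up sample, only to show the shape of the output.
console.log(summarize([0.01, 0.0, 0.02, 0.5, 0.01, 0.0, 0.01, 0.03, 0.01, 1.0]));
```

Plugging in the two averages above, percentSlower(0.0318751770166445, 0.0430074540435297) comes out at roughly 35, which is where the 35% figure comes from. The same calculation on the two top 10% averages gives roughly 48.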

Conclusion: Proceed with Caution

It would be nice to have more time to debug where this slowness comes from. But the bottom line for us is that, after spending about 3 days patching TS to get it to work in the first place, and with the “after_commit” anomaly still on our plate (not to mention overall memory usage increasing by about 20%), I have decided to return to TS 0.9.8 until a Sphinx release is made available that speaks directly to its performance compared to previous versions. I think the Sphinx team is doing a great job, but while juggling the numerous new features, it seems that performance testing relative to 0.9.8 didn’t make the final cut.

Or there could always be some terrible misconfiguration on our part. But given that we changed our configuration as little as possible in moving from 0.9.8->0.9.9, if we are screwing up, I would say it is for perfectly understandable reasons.

A three-day window of a pure search action in our app. The first two days, on TS 0.9.9, average 300 ms; yesterday, after reverting to 0.9.8, the average was about 200 ms.

01/10/10

Best Linux Git GUI/Browser

Since moving to Linux, I’ve also moved to Git (from SVN) and have found it to be a reliable friend that, as a technology, is a significant step up from SVN. But, as a usable productivity tool, I definitely felt the sting of Git’s “hardcore h4x0r” roots in its lack of a GUI that is in the same league as TortoiseSVN.

But there is hope. And it comes from the unlikeliest of sources: my IDE, Rubymine.

Rubymine’s Git integration is superb. It supports hierarchical browsing of your current branch, in exactly the manner of Tortoise. It also offers:

  • A directory-based, graphical way to revert changed files or directories
  • The ability to see a history of changes to a file (and return to a specific older version, if desired) along with one-click access to a visual diff
  • The ability to mass merge files in different branches by batch selecting them and choosing a merge method (e.g., “Use branch A version” or “Use branch B version”; it also supports manual merging via a graphical merge tool)
  • One-click comparison of your current file to the current repository file

In a nutshell, nearly all of the efficiencies that TortoiseSVN provided as a graphical source control tool for Subversion, Rubymine provides for Git. The one exception, which I have implored the creators of Rubymine to consider adding, is the ability to see the history for a directory (rather than a file) within your project. Knowing the crack team of Rubymine developers, the feature will probably be on the way soon, but even before its arrival, they’ve still managed to build the best pound-for-pound Git graphical interface I’ve been able to uncover.

01/8/10

Git move commit to another branch

Rubymine’s stellar Git integration means that I seldom have to tinker with Git on the command line, but an exception to this is when I switch branches and forget to switch back before making my next commit. D’oh!

The answer is git cherry-pick.

The syntax is simple. After switching to the branch the commit should have gone to, run:

git cherry-pick [commit ID]

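You can run “git log” on the branch where the commit landed to get its ID. You can also use a graphical tool like Giggle, which lets you see all commits to all branches. Keep in mind that cherry-pick copies the commit rather than moving it, so the original stays on the wrong branch until you remove it there.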

If you had the misfortune of your commit not landing on any branch at all, you can run “git reflog” to see the commits that HEAD has recently pointed to, including ones that no branch references, and then merge the stray commit into your master branch. See Stack Overflow for more details on what to do in this scenario.

01/7/10

Javascript play sound (wav, MP3, etc) in one line

Alright, enough with the 10-page tutorials from 2006 describing detailed browser-specific implementation tricks to get sound playing. Google needs a refresh. In my experience in the 2010s, here’s all you need to do to make a simple sound play in your web page:

Step one: add an element to your page

<div id="sound_element"></div>

Step two: play a sound via that element

document.getElementById("sound_element").innerHTML= 
"<embed src='"+sound_file_url+"' hidden=true autostart=true loop=false>";

Or, if you’re using jQuery:

$('#sound_element').html(
"<embed src='"+sound_file_url+"' hidden=true autostart=true loop=false>");

I’ve tried it in Firefox, Chrome and IE and it works like a charm for me. I’d imagine that your user has to have some basic sound software installed on their computer for the browser to play the file, but at this point, I’d reckon 99% of users do.
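If you play sounds from more than one place, the two steps can be wrapped in a small helper. This is only a sketch: buildSoundMarkup and playSound are hypothetical names of my own, the element id defaults to the sound_element div from step one, and the escaping is an extra precaution (not part of the one-liner above) so that a quote in the URL cannot break the markup.

```javascript
// Build the same <embed> markup as the one-liner above.
// The URL is escaped so characters like ' cannot break out of the attribute.
function buildSoundMarkup(soundFileUrl) {
  const safeUrl = String(soundFileUrl)
    .replace(/&/g, "&amp;")
    .replace(/'/g, "&#39;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return "<embed src='" + safeUrl + "' hidden=true autostart=true loop=false>";
}

// Play a sound through the placeholder element (browser only).
// Returns false if the element is missing from the page.
function playSound(soundFileUrl, elementId) {
  const element = document.getElementById(elementId || "sound_element");
  if (!element) {
    return false;
  }
  element.innerHTML = buildSoundMarkup(soundFileUrl);
  return true;
}
```

Calling playSound(sound_file_url) then does the same thing as the snippets above, and setting the markup again replays the sound.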

Feel free to add to the comments if you find any embellishments necessary to get this working.