Slow MySQL query? I'll give you a hint.

I have a confession to make: I’ve placed too much trust in MySQL’s query planner. By the phrase, “too much trust,” I mean to say, “any trust, at all, ever.”

Simple Schema

So, consider the case of… Oh, pretty much any Rails application, anywhere. For purposes of this discussion, let’s consider a simple blog, with articles and comments. Here’s the schema:

    ActiveRecord::Schema.define do
    
      create_table :articles, :force => true do |t|
        t.string   :title
        t.text     :body
        t.boolean  :published, :null => false, :default => false
        t.timestamps
      end
    
      create_table :comments, :force => true do |t|
        t.belongs_to :article
        t.string     :email
        t.text       :body
        t.timestamps
      end
    
      add_index :comments, :article_id
    
    end

No rocket science here. Article has_many :comments and Comment belongs_to :article. We put an index on the article_id column in the comments table. There’s a good chance we’ll end up doing something like chaining from an association scope, which will result in a query for comments by article_id.
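
The models themselves are exactly what you’d expect:

    class Article < ActiveRecord::Base
      has_many :comments
    end

    class Comment < ActiveRecord::Base
      belongs_to :article
    end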

Simple Select

Now, let’s say we want to do something really simple, like post a list of the 3 most recent comments on our blog’s homepage. But we only want to include comments on published articles, just in case we make a mistake and need to “un-publish” one later: we don’t want comments on articles a person can’t read showing up on our homepage!

    Comment.joins(:article).
            where(:articles => {:published => true}).
            order('comments.id desc').limit(3)

This results in a query that looks like:

    SELECT  `comments`.* FROM `comments` 
    INNER JOIN `articles` ON `articles`.`id` = `comments`.`article_id` 
    WHERE `articles`.`published` = 1 
    ORDER BY comments.id desc 
    LIMIT 3

Surprising Sluggishness!

All goes well for a few months. You post a bunch of articles, some of which end up with huge comment threads, because you’re a master of your craft (or a master troll). Next thing you know, you’ve got 100 articles, around 20,000 comments, and a big performance problem. That simple query you wrote above is taking over half a second to execute, every single time. All of that time, just to retrieve 3 records? Something has got to be wrong with that. You fire up an EXPLAIN, and you see this in your query plan:

    Using where; Using temporary; Using filesort

A temporary table and a filesort? Really?? Then you notice that the query planner is starting from the articles table and using the comments table’s index on article_id for the join. You wonder for a moment whether there’s any likely scenario in which that’s the fastest plan for this query, given the ORDER BY and LIMIT you supplied. Scanning comments in descending id order would turn up 3 comments on published articles almost immediately; starting from articles means joining every comment on every published article, sorting the whole lot in a temporary table, and discarding all but 3 rows.
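
If you want to poke at the full plan yourself, one way to get it from a Rails 3.1 console is to feed the relation’s SQL back through the connection (a sketch; ActiveRecord::Relation#explain didn’t arrive until Rails 3.2):

    # Build the relation, then run EXPLAIN on its SQL by hand.
    relation = Comment.joins(:article).
                       where(:articles => {:published => true}).
                       order('comments.id desc').limit(3)
    Comment.connection.select_all("EXPLAIN #{relation.to_sql}").each do |row|
      p row  # one hash per table in the plan; check the Extra column
    end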

Subselect or Supplementation

The fix is easy enough. You can split things into two queries, either using something like Valium to get the published article_ids first, eliminating the need for the join entirely (there’s a sketch of that approach after the next example), or (preferably) using a subquery in the DB:

    # Rails 3.1 only (or with Squeel)
    Comment.where(:article_id => Article.where(:published => true)).
            order('comments.id desc').
            limit(3)

which generates:

    SELECT  `comments`.* 
    FROM `comments`  
    WHERE `comments`.`article_id` 
      IN (
        SELECT `articles`.`id` 
        FROM `articles`  
        WHERE `articles`.`published` = 1
      ) 
    ORDER BY comments.id desc 
    LIMIT 3
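
For comparison, here’s a rough sketch of the two-query variant in plain ActiveRecord (Valium’s value_of would avoid instantiating full Article objects in the first step):

    # Two round trips, no join: grab the published article ids first...
    published_ids = Article.where(:published => true).select(:id).map(&:id)
    # ...then query comments directly against that list.
    Comment.where(:article_id => published_ids).
            order('comments.id desc').
            limit(3)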

An even easier and more performant way, in this case, is to just give MySQL a nudge in the right direction:

    Comment.from('comments use index (PRIMARY)').
            joins(:article).
            where(:articles => {:published => true}).
            order('comments.id desc').limit(3)

which generates:

    SELECT  `comments`.* 
    FROM comments use index (PRIMARY)
    INNER JOIN `articles` ON `articles`.`id` = `comments`.`article_id` 
    WHERE `articles`.`published` = 1 
    ORDER BY comments.id desc 
    LIMIT 3
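
(ActiveRecord, as of Rails 3.1, has no first-class API for index hints, which is why the hint sneaks in through from: whatever string you pass ends up in the FROM clause verbatim. That’s also why comments loses its backquotes in the generated SQL.)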

Rage Rising

It’s great that we have these workarounds available, but I find it more than a little disappointing that the results of this benchmark script comparing MySQL, PostgreSQL, and SQLite (yes, SQLite!) look like this:

                    user     system      total        real
    mysql       0.020000   0.010000   0.030000 (  0.617876)
    sqlite      0.000000   0.000000   0.000000 (  0.001234)
    postgresql  0.000000   0.000000   0.000000 (  0.002423)
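
For the curious, the shape of that script is roughly as follows. This is a sketch, not the original: it assumes database.yml defines mysql, sqlite, and postgresql connections, each already loaded with the schema and the ~20,000 comments described above.

    require 'benchmark'

    # Run the same 3-most-recent-comments query against each adapter.
    Benchmark.bm(10) do |x|
      [:mysql, :sqlite, :postgresql].each do |adapter|
        ActiveRecord::Base.establish_connection(adapter)
        x.report(adapter.to_s) do
          Comment.joins(:article).
                  where(:articles => {:published => true}).
                  order('comments.id desc').limit(3).to_a
        end
      end
    end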

If the query being described were something abnormal, I’d understand. I expect to need hints and goofy hacks when optimizing an edge case. Not when doing something as simple as outlined above.

Anyway, I hope the above explanation helps you implement workarounds in your own queries, or maybe gives you the last nudge you need to finally get around to checking out PostgreSQL. :)
