MySql Use “Distinct” and “Order by” with Multiple Columns AKA Apply “Order by” before “Group”

I’ve had a devil of a time trying to get Google to tell me how to write a Mysql query that allows us to somehow perform a MySql query that 1) filters rows on a distinct column 2) returns other columns in the query besides the distinct column and 3) allows us to order by a column.  In our case, we (and you, if you’re running Savage Beast!) have a list of most recent forum posts on the site — currently, if you list all recent posts, the search will just find all posts and order by date of creation, but this makes for some dumb-looking output since you often end up with a list where 10 of the 20 posts are all from the same forum topic.  All the user really wants to know is what topics have a new post in them, and to get a brief glimpse as to what that new post might be.

Thus, we want to create a query that returns the new posts, ordered by date of creation, that have a distinct topic_id.

Here’s the SQL that can make it happen:

Post.find_by_sql(”select posts.* from posts LEFT JOIN posts t2 on posts.topic_id=t2.topic_id AND posts.created_at < t2.created_at WHERE t2.topic_id IS NULL ORDER BY posts.created_at DESC”)

Hope that Google sees fit to lead other people here instead of struggling to get GROUP BY to order results beforehand (GROUP BY ‘posts.topic_id’ works, but it returns the first post in each distinct topic, rather than the last post as we desire), or get SELECT DISTINCT to return more than one column, as many forum posters unhelpfully suggested in all the results I was getting.

3 Responses to “MySql Use “Distinct” and “Order by” with Multiple Columns AKA Apply “Order by” before “Group””

  1. Bob Weber Says:

    I am facing a similar problem, but I’m having difficulty seeing how your query works.

  2. Bill Says:

    Hi Bob,

    Not quite sure what aspect of the query you find confusing, but I’ll break it down as best I understand it:

    Basically, there are two tables, a “posts” and a “topics” table. We are trying to get entries from the posts table where each post has a unique topic_id. Thus, the query selects all data fields from within the post table (select posts.* from posts), joins the post to other posts in the same the topic (LEFT JOIN posts t2 on posts.topic_id=t2.topic_id), picks the most recent post within the topics table (posts.created_at < t2.created_at). I’m still not clear myself what the “t2.topic_id IS NULL” clause is does, but without it, I believe you get duplicate entries. The ORDER BY posts.created_at DESC means that the final SQL result orders your list of unique posts starting with the most recently created.

    Hope this helps to clarify things some. If we’re lucky a SQL expert will stop by eventually who knows the purpose of the t2.topic_id IS NULL fragment.

  3. Yebo Says:

    Cool I was faced with a similar problem for a multi user blog and this post helped me

Leave a Reply