Tuesday, February 8, 2011

Get top results for each group (in Oracle)

How would I be able to get N results for several groups in an oracle query.

For example, given the following table:

|--------+------------+------------|
| emp_id | name       | occupation |
|--------+------------+------------|
|      1 | John Smith | Accountant |
|      2 | Jane Doe   | Engineer   |
|      3 | Jack Black | Funnyman   |
|--------+------------+------------|

There are many more rows with more occupations. I would like to get three employees (lets say) from each occupation.

Is there a way to do this without using a subquery?

  • I'm not sure this is very efficient, but maybe a starting place?

    select * from people p1 join people p2 on p1.occupation = p2.occupation join people p3 on p1.occupation = p3.occupation and p2.occupation = p3.occupation where p1.emp_id != p2.emp_id and p1.emp_id != p3.emp_id

    This should give you rows that contain 3 distinct employees all in the same occupation. Unfortunately, it will give you ALL combinations of those.

    Can anyone pare this down please?

    From Bill James
  • I don't have an oracle instance handy right now so I have not tested this:

    select *
    from (select emp_id, name, occupation
          rank() over ( partition by occupation order by emp_id) rank
          from employee)
    where rank <= 3
    

    Here is a link on how rank works: http://www.psoug.org/reference/rank.html

    AviD : Didnt he specify without a subquery...?
    Tony Andrews : Yes, but he may well have meant "without using a subquery that selects from the same table again". This solution uses a subquery but only accesses the table once.
    From jop
  • tested this in SQL Server (and it uses subquery)

    select emp_id, name, occupation from employees t1 where emp_id IN (select top 3 emp_id from employees t2 where t2.occupation = t1.occupation)

    just do an ORDER by in the subquery to suit your needs

  • This produces what you want, and it uses no vendor-specific SQL features like TOP N or RANK().

    SELECT e.name, e.occupation 
    FROM emp AS e 
      LEFT OUTER JOIN emp AS e2 
        ON (e.occupation = e2.occupation AND e.emp_id <= e2.emp_id) 
    GROUP BY e.emp_id 
    HAVING COUNT(*) <= 3 
    ORDER BY e.occupation;
    

    In this example it gives the three employees with the lowest emp_id values per occupation. You can change the attribute used in the inequality comparison, to make it give the top employees by name, or whatever.

    jop : I think this will fail on the GROUP BY clause. There are no aggregate functions in your select list.
    Bill Karwin : I did test this query before posting it. SQL doesn't require aggregate functions to use GROUP BY.

0 comments:

Post a Comment