Code Question: Get top results for each group (in Oracle)

How would I be able to get N results for several groups in an oracle query.

For example, given the following table:

|--------+------------+------------|
| emp_id | name       | occupation |
|--------+------------+------------|
|      1 | John Smith | Accountant |
|      2 | Jane Doe   | Engineer   |
|      3 | Jack Black | Funnyman   |
|--------+------------+------------|

There are many more rows with more occupations. I would like to get three employees (lets say) from each occupation.

Is there a way to do this without using a subquery?

From stackoverflow oneself

I'm not sure this is very efficient, but maybe a starting place?

select * from people p1 join people p2 on p1.occupation = p2.occupation join people p3 on p1.occupation = p3.occupation and p2.occupation = p3.occupation where p1.emp_id != p2.emp_id and p1.emp_id != p3.emp_id

This should give you rows that contain 3 distinct employees all in the same occupation. Unfortunately, it will give you ALL combinations of those.

Can anyone pare this down please?

From Bill James
I don't have an oracle instance handy right now so I have not tested this:
```
select *
from (select emp_id, name, occupation
      rank() over ( partition by occupation order by emp_id) rank
      from employee)
where rank <= 3
```
Here is a link on how rank works: http://www.psoug.org/reference/rank.html

AviD : Didnt he specify without a subquery...?

Tony Andrews : Yes, but he may well have meant "without using a subquery that selects from the same table again". This solution uses a subquery but only accesses the table once.

From jop
tested this in SQL Server (and it uses subquery)

select emp_id, name, occupation from employees t1 where emp_id IN (select top 3 emp_id from employees t2 where t2.occupation = t1.occupation)

just do an ORDER by in the subquery to suit your needs

From Leon Tayson
This produces what you want, and it uses no vendor-specific SQL features like TOP N or RANK().
```
SELECT e.name, e.occupation 
FROM emp AS e 
  LEFT OUTER JOIN emp AS e2 
    ON (e.occupation = e2.occupation AND e.emp_id <= e2.emp_id) 
GROUP BY e.emp_id 
HAVING COUNT(*) <= 3 
ORDER BY e.occupation;
```
In this example it gives the three employees with the lowest emp_id values per occupation. You can change the attribute used in the inequality comparison, to make it give the top employees by name, or whatever.

jop : I think this will fail on the GROUP BY clause. There are no aggregate functions in your select list.

Bill Karwin : I did test this query before posting it. SQL doesn't require aggregate functions to use GROUP BY.

From Bill Karwin

Code Question

Tuesday, February 8, 2011

Get top results for each group (in Oracle)

0 comments:

Post a Comment

Blog Archive