Sunday, March 27, 2011

SQL - min() gets the lowest value, max() the highest, what if I want the 2nd (or 5th or nth) lowest value?

The problem I'm trying to solve is that I have a table like this:

a and b refer to point on a different table. distance is the distance between the points.

| id | a_id | b_id | distance | delete |
| 1  |  1   |   1  |  1       |   0    |
| 2  |  1   |   2  |  0.2345  |   0    |
| 3  |  1   |   3  |  100     |   0    |
| 4  |  2   |   1  |  1343.2  |   0    |
| 5  |  2   |   2  |  0.45    |   0    |
| 6  |  2   |   3  |  110     |   0    |
....

The important column I'm looking is a_id. If I wanted to keep the closet b for each a, I could do something like this:

update mytable set delete = 1 from (select a_id, min(distance) as dist from table group by a_id) as x where a_gid = a_gid and distance > dist;
delete from mytable where delete = 1;

Which would give me a result table like this:

| id | a_id | b_id | distance | delete |
| 1  |  1   |   1  |  1       |   0    |
| 5  |  2   |   2  |  0.45    |   0    |
....

i.e. I need one row for each value of a_id, and that row should have the lowest value of distance for each a_id.

However I want to keep the 10 closest points for each a_gid. I could do this with a plpgsql function but I'm curious if there is a more SQL-y way.

min() and max() return the smallest and largest, if there was an aggregate function like nth(), which'd return the nth largest/smallest value then I could do this in similar manner to the above.

I'm using PostgeSQL.

From stackoverflow
  • Does PostgreSQL have the analytic function rank()? If so try:

    select a_id, b_id, distance
    from
    ( select a_id, b_id, distance, rank() over (partition by a_id order by distance) rnk
      from mytable
    ) where rnk <= 10;
    
    Rory : Postgres doesn't have this functon, but you're on the right track and your suggestion is helping me find the answer.
  • I love postgres, so it took it as a challenge the second I saw this question.

    So, for the table:

        Table "pg_temp_29.foo"
     Column |  Type   | Modifiers 
    --------+---------+-----------
     value  | integer |
    

    With the values:

     SELECT value FROM foo ORDER BY value;
     value 
    -------
         0
         1
         2
         3
         4
         5
         6
         7
         8
         9
        14
        20
        32
    (13 rows)
    

    You can do a:

    SELECT value FROM foo ORDER BY value DESC LIMIT 1 OFFSET X
    

    Where X = 0 for the highest value, 1 for the second highest, 2... And so forth.

    This can be further embedded in a subquery to retrieve the value needed. So, to use the dataset provided in the original question we can get the a_ids with the top ten lowest distances by doing:

    SELECT a_id, distance FROM mytable
     WHERE id IN
      (SELECT id FROM mytable WHERE t1.a_id = t2.a_id
       ORDER BY distance LIMIT 10);
     ORDER BY a_id, distance;
    
     a_id | distance 
    ------+----------
        1 |   0.2345
        1 |        1
        1 |      100
        2 |     0.45
        2 |      110
        2 |   1342.2
    
    Elijah : Dang, I should have given you an example from the dataset you provided.
    Rory : That won't work, because I want the lowest value for each value of a_id.
    Elijah : If you remove the DESC (for descending) you will get a list of the lowest values: SELECT value FROM foo ORDER BY value LIMIT 1 OFFSET 1; Anyways, my hope was to demonstrate the principal with a simpler dataset.
  • Try this:

    SELECT  *
    FROM    (
        SELECT a_id, (
         SELECT b_id
         FROM mytable mib
         WHERE mib.a_id = ma.a_id
         ORDER BY
          dist DESC
         LIMIT 1 OFFSET s
         ) AS b_id
        FROM (
         SELECT DISTINCT a_id
         FROM mytable mia
         ) ma, generate_series (1, 10) s
        ) ab
    WHERE   b_id IS NOT NULL
    

    Checked on PostgreSQL 8.3

0 comments:

Post a Comment