The problem I'm trying to solve is that I have a table like this:
a and b refer to point on a different table. distance is the distance between the points.
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 2 | 0.2345 | 0 |
| 3 | 1 | 3 | 100 | 0 |
| 4 | 2 | 1 | 1343.2 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
| 6 | 2 | 3 | 110 | 0 |
....
The important column I'm looking is a_id. If I wanted to keep the closet b for each a, I could do something like this:
update mytable set delete = 1 from (select a_id, min(distance) as dist from table group by a_id) as x where a_gid = a_gid and distance > dist;
delete from mytable where delete = 1;
Which would give me a result table like this:
| id | a_id | b_id | distance | delete |
| 1 | 1 | 1 | 1 | 0 |
| 5 | 2 | 2 | 0.45 | 0 |
....
i.e. I need one row for each value of a_id, and that row should have the lowest value of distance for each a_id.
However I want to keep the 10 closest points for each a_gid. I could do this with a plpgsql function but I'm curious if there is a more SQL-y way.
min() and max() return the smallest and largest, if there was an aggregate function like nth(), which'd return the nth largest/smallest value then I could do this in similar manner to the above.
I'm using PostgeSQL.
-
Does PostgreSQL have the analytic function rank()? If so try:
select a_id, b_id, distance from ( select a_id, b_id, distance, rank() over (partition by a_id order by distance) rnk from mytable ) where rnk <= 10;
Rory : Postgres doesn't have this functon, but you're on the right track and your suggestion is helping me find the answer. -
I love postgres, so it took it as a challenge the second I saw this question.
So, for the table:
Table "pg_temp_29.foo" Column | Type | Modifiers --------+---------+----------- value | integer |
With the values:
SELECT value FROM foo ORDER BY value; value ------- 0 1 2 3 4 5 6 7 8 9 14 20 32 (13 rows)
You can do a:
SELECT value FROM foo ORDER BY value DESC LIMIT 1 OFFSET X
Where X = 0 for the highest value, 1 for the second highest, 2... And so forth.
This can be further embedded in a subquery to retrieve the value needed. So, to use the dataset provided in the original question we can get the a_ids with the top ten lowest distances by doing:
SELECT a_id, distance FROM mytable WHERE id IN (SELECT id FROM mytable WHERE t1.a_id = t2.a_id ORDER BY distance LIMIT 10); ORDER BY a_id, distance; a_id | distance ------+---------- 1 | 0.2345 1 | 1 1 | 100 2 | 0.45 2 | 110 2 | 1342.2
Elijah : Dang, I should have given you an example from the dataset you provided.Rory : That won't work, because I want the lowest value for each value of a_id.Elijah : If you remove the DESC (for descending) you will get a list of the lowest values: SELECT value FROM foo ORDER BY value LIMIT 1 OFFSET 1; Anyways, my hope was to demonstrate the principal with a simpler dataset. -
Try this:
SELECT * FROM ( SELECT a_id, ( SELECT b_id FROM mytable mib WHERE mib.a_id = ma.a_id ORDER BY dist DESC LIMIT 1 OFFSET s ) AS b_id FROM ( SELECT DISTINCT a_id FROM mytable mia ) ma, generate_series (1, 10) s ) ab WHERE b_id IS NOT NULL
Checked on PostgreSQL 8.3
0 comments:
Post a Comment