Sunday, March 6, 2011

Group repeated rows in TSQL

I have the following table and data in SQL Server 2005:

create table LogEntries (
  ID int identity,
  LogEntry varchar(100)
)

insert into LogEntries values ('beans')
insert into LogEntries values ('beans')
insert into LogEntries values ('beans')
insert into LogEntries values ('cabbage')
insert into LogEntries values ('cabbage')
insert into LogEntries values ('beans')
insert into LogEntries values ('beans')

I would like to group repeated LogEntries so that I have the following results:

LogEntry  EntryCount
beans     3
cabbage   2
beans     2

Can you think of any way to do this in TSQL outside of using a cursor?

From stackoverflow
  • SQL's not exactly my strong point, but won't

    SELECT LogEntry, COUNT(1) AS Counter FROM LogEntries GROUP BY LogEntry
    

    do it?

    Dheer : Nope, this would result in beans - 5, cabbage - 2
    Paul : Ah, fell into the bear pit too. Oh well. At least I'm in good company :)
  • I don't think that you can do this with one query. In order to provide counts in a query, you need to group using the LogEntry column. However, this will just give you total counts per LogEntry, and not the counts of the numbers of entries in sequence that you are looking for. I think that a cursor is called for (or bring the whole dataset to your application, and use logic there to get the results that you want).

  • unless my brain has yet to boot up this morning

    SELECT LogEntry, COUNT(LogEntry) as EntryCount FROM LogEntries GROUP BY LogEntry

    kpollock : ah b*gger, fell into the same trap as Russ Cam...
    Dheer : Nope, this would result in beans - 5, cabbage - 2
    kpollock : worse, someone picked me up on it before I finished typing my retraction!!! :-)
  • Now I have looked at the actual question closely enough :-)

    Hmm, on reconsidering, why not just use a cursor? The performance isn't always worse than straight SQL - and it'd certainly be easy for other people to follow the code when they come to look at it. Wrap it in a stored proc or function and you'd be able to use it pretty much anywhere you might need it.

  • This is a set-based solution for the problem. The performance will probably suck, but it works :)

    CREATE TABLE #LogEntries (
      ID INT IDENTITY,
      LogEntry VARCHAR(100)
    )
    
    INSERT INTO #LogEntries VALUES ('beans')
    INSERT INTO #LogEntries VALUES ('beans')
    INSERT INTO #LogEntries VALUES ('beans')
    INSERT INTO #LogEntries VALUES ('cabbage')
    INSERT INTO #LogEntries VALUES ('cabbage')
    INSERT INTO #LogEntries VALUES ('carrots')
    INSERT INTO #LogEntries VALUES ('beans')
    INSERT INTO #LogEntries VALUES ('beans')
    INSERT INTO #LogEntries VALUES ('carrots')
    
    SELECT logentry, COUNT(*) FROM (
        SELECT logentry, 
        ISNULL((SELECT MAX(id) FROM #logentries l2 WHERE l1.logentry<>l2.logentry AND l2.id < l1.id), 0) AS id
        FROM #LogEntries l1
    ) AS a
    GROUP BY logentry, id
    
    
    DROP TABLE #logentries
    

    Results:

    beans   3
    cabbage 2
    carrots 1
    beans   2
    carrots 1
    

    The ISNULL() is required for the first set of beans: no earlier row has a different LogEntry there, so the subquery returns NULL.

    Lance Fisher : Brilliant! Thanks, this is a lot simpler than the cursor I wrote.
    kpollock : I am curious, how is the performance vs a cursor?
    kpollock : I am going senile too, as now I look at it, I have used this sort of set-based solution myself in the past.
  • I think this will do it... didn't check too thoroughly though

    select 
        COUNT(*),subq.LogEntry 
    from 
    (
        select 
         ROW_NUMBER() OVER(ORDER BY id)-ROW_NUMBER() OVER(PARTITION BY logentry ORDER BY id) as t,*
        from 
         LogEntries
    ) subq 
    group by 
        subq.t,subq.LogEntry 
    order by 
        MIN(subq.ID)
    
    Lance Fisher : Thanks, it does work. I haven't used the partition keyword before. That's interesting.
    Jonas Lincoln : This is a very elegant and cunning use of the OVER() function. Nice!
    kpollock : NEAT! I don't know partitioning well enough... should do!
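As a quick way to experiment with the ROW_NUMBER() difference trick from the last answer outside SQL Server, the same query runs against an in-memory SQLite database (SQLite 3.25+ supports window functions). This is a sketch added for illustration, not part of the original thread:

```python
import sqlite3

# Rebuild the question's table in memory; INTEGER PRIMARY KEY auto-assigns
# IDs 1..7 in insertion order, standing in for the IDENTITY column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE LogEntries (ID INTEGER PRIMARY KEY, LogEntry TEXT)")
conn.executemany(
    "INSERT INTO LogEntries (LogEntry) VALUES (?)",
    [(v,) for v in ["beans", "beans", "beans",
                    "cabbage", "cabbage", "beans", "beans"]],
)

rows = conn.execute("""
    SELECT LogEntry, COUNT(*) AS EntryCount
    FROM (
        SELECT LogEntry, ID,
               -- Rows in the same consecutive run get the same difference t:
               -- the global row number and the per-value row number advance
               -- in step within a run, and drift apart between runs.
               ROW_NUMBER() OVER (ORDER BY ID)
             - ROW_NUMBER() OVER (PARTITION BY LogEntry ORDER BY ID) AS t
        FROM LogEntries
    )
    GROUP BY t, LogEntry
    ORDER BY MIN(ID)
""").fetchall()
print(rows)  # [('beans', 3), ('cabbage', 2), ('beans', 2)]
```

The result matches the output the question asks for: consecutive runs stay separate, so beans appears twice.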
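One answer above also suggests pulling the whole dataset into the application and doing the grouping there. In Python that option is essentially a one-liner, since itertools.groupby collapses consecutive runs - a sketch using the question's sample data:

```python
from itertools import groupby

# Sample data from the question, in ID order.
entries = ["beans", "beans", "beans", "cabbage", "cabbage", "beans", "beans"]

# groupby() groups *consecutive* equal values only, which is exactly the
# run-length grouping the question asks for.
run_counts = [(value, len(list(run))) for value, run in groupby(entries)]
print(run_counts)  # [('beans', 3), ('cabbage', 2), ('beans', 2)]
```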
