Sunday, April 17, 2011

Find the last value in a "rolled-over" sequence with a stored procedure?

Suppose I had a set of alpha-character identifiers of a set length, e.g. always five letters, and they are assigned in such a way that they are always incremented sequentially (GGGGZ --> GGGHA, etc.). Now, if I get to ZZZZZ, since the length is fixed, I must "roll over" to AAAAA. I might have a contiguous block from ZZZAA through AAAAM. I want to write a sproc that will give me the "next" identifier, in this case AAAAN.

If I didn't have this "rolling over" issue, of course, I'd just ORDER BY DESC and grab the top result. But I'm at a bit of a loss now -- and it doesn't help at all that SQL is not my strongest language.

If I have to I can move this to my C# calling code, but a sproc would be a better fit.

ETA: I would like to avoid changing the schema (new column or new table); I'd rather just be able to "figure it out". I might even prefer to do it brute force (e.g. start at the lowest value and increment until I find a "hole"), even though that could get expensive. If you have an answer that does not modify the schema, it'd be a better solution for my needs.

From stackoverflow
  • You'd have to store the last allocated identifier in the sequence.

    For example, store it in another table that has one column & one row.

    CREATE TABLE CurrentMaxId (
        Id CHAR(6) NOT NULL
    );
    
    INSERT INTO CurrentMaxId (Id) VALUES ('AAAAAA');
    

    Each time you allocate a new identifier, you'd fetch the value in that tiny table, increment it, and store that value in your main table as well as updating the value in CurrentMaxId.

    The usual caveats apply with respect to concurrency, table-locking, etc.

  • I think I'd have tried to store the sequence as an integer, then translate it to string. Or else store a parallel integer column that is incremented at the same time as the alpha value. Either way, you could sort on the integer column.

  • A problem here is that you can't really tell from the data where the "last" entry is unless there is more detail as to how the old entries are deleted.

    If I understand correctly, you are wrapping around at the end of the sequence, which means you must be deleting some of your old data to make space. However if the data isn't deleted in a perfectly uniform manner, you'll end up with fragments, like below:

    ABCD   HIJKL NOPQRS   WXYZ
    

    You'll notice that there is no obvious next value...D could be the last value created, but it might also be L or S.

    At best you could look for the first or last missing element (use a stored procedure to perform a x+1 check just like you would to find a missing element in an integer sequence), but it's not going to provide any special result for rolled-over lists.

    Coderer : The sequence doesn't always start at the beginning. Sometimes, it starts at BBBB, while sometimes it starts at YYYY, so there's already "room" at the beginning of the alphabet
  • Since I don't feel like writing code to increment letters, I'd create a table of all valid IDs (AAAAAA through ZZZZZZ) with an integer from 1 to X for those IDs. Then you can use the following:

    SELECT @max_id = MAX(id) FROM Possible_Silly_IDs
    
    SELECT
        COALESCE(MAX(PSI2.silly_id), 'AAAAAA')
    FROM
        My_Table T1
    INNER JOIN Possible_Silly_IDs PSI1 ON
        PSI1.silly_id = T1.silly_id
    INNER JOIN Possible_Silly_IDs PSI2 ON
        PSI2.id = CASE WHEN PSI1.id = @max_id THEN 1 ELSE PSI1.id + 1 END
    LEFT OUTER JOIN My_Table T2 ON
        T2.silly_id = PSI2.silly_id
    WHERE
        T2.silly_id IS NULL
    

    The COALESCE is there in case the table is empty. To be truly robust you should calculate the 'AAAAAA' (SELECT @min_silly_id = silly_id WHERE id = 1) in case your "numbering" algorithm changes.

    If you really wanted to do things right, you'd redo the database design as has been suggested.

  • Here's code that I think will give you your Next value. I created 3 functions. The table is just my simulation of the table.column with your alpha ids (I used MyTable.AlphaID). I assume that it's as you implied and there is one contiguous block of five-character uppercase alphabetic strings (AlphaID):

    IF OBJECT_ID('dbo.MyTable','U') IS NOT NULL
        DROP TABLE dbo.MyTable
    GO
    CREATE TABLE dbo.MyTable (AlphaID char(5) PRIMARY KEY)
    GO
    -- Play with different population scenarios for testing
    INSERT dbo.MyTable VALUES ('ZZZZY')
    INSERT dbo.MyTable VALUES ('ZZZZZ')
    INSERT dbo.MyTable VALUES ('AAAAA')
    INSERT dbo.MyTable VALUES ('AAAAB')
    GO
    IF OBJECT_ID('dbo.ConvertAlphaIDToInt','FN') IS NOT NULL
        DROP FUNCTION dbo.ConvertAlphaIDToInt
    GO
    CREATE FUNCTION dbo.ConvertAlphaIDToInt (@AlphaID char(5))
    RETURNS int
    AS
    BEGIN
    RETURN 1+ ASCII(SUBSTRING(@AlphaID,5,1))-65
                  + ((ASCII(SUBSTRING(@AlphaID,4,1))-65) * 26)
                  + ((ASCII(SUBSTRING(@AlphaID,3,1))-65) * POWER(26,2))
                  + ((ASCII(SUBSTRING(@AlphaID,2,1))-65) * POWER(26,3))
                  + ((ASCII(SUBSTRING(@AlphaID,1,1))-65) * POWER(26,4))
    END
    GO 
    
    IF OBJECT_ID('dbo.ConvertIntToAlphaID','FN') IS NOT NULL
        DROP FUNCTION dbo.ConvertIntToAlphaID
    GO
    CREATE FUNCTION dbo.ConvertIntToAlphaID (@ID int)
    RETURNS char(5)
    AS
    BEGIN
    RETURN CHAR((@ID-1) / POWER(26,4) + 65)
          + CHAR ((@ID-1) % POWER(26,4) / POWER(26,3) + 65)
          + CHAR ((@ID-1) % POWER(26,3) / POWER(26,2) + 65)
          + CHAR ((@ID-1) % POWER(26,2) / 26 + 65)
          + CHAR ((@ID-1) % 26 + 65)
    
    END
    GO 
    IF OBJECT_ID('dbo.GetNextAlphaID','FN') IS NOT NULL
        DROP FUNCTION dbo.GetNextAlphaID
    GO
    CREATE FUNCTION dbo.GetNextAlphaID ()
    RETURNS char(5)
    AS
    BEGIN
        DECLARE @MaxID char(5), @ReturnVal char(5)
        SELECT @MaxID = MAX(AlphaID) FROM dbo.MyTable
        IF @MaxID < 'ZZZZZ'
            RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
        IF @MaxID IS NULL
            RETURN 'AAAAA'
        SELECT @MaxID = MAX(AlphaID) 
        FROM dbo.MyTable 
        WHERE AlphaID < dbo.ConvertIntToAlphaID((SELECT COUNT(*) FROM dbo.MyTable))
        IF @MaxID IS NULL
            RETURN 'AAAAA'
        RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
    END
    GO
    
    SELECT * FROM dbo.MyTable ORDER BY dbo.ConvertAlphaIDToInt(AlphaID)
    GO
    SELECT  dbo.GetNextAlphaID () AS 'NextAlphaID'
    

    By the way, if you don't want to assume contiguity, you can do as you suggested and (if there's a 'ZZZZZ' row) use the first gap in the sequence. Replace the last function with this:

    IF OBJECT_ID('dbo.GetNextAlphaID_2','FN') IS NOT NULL
        DROP FUNCTION dbo.GetNextAlphaID_2
    GO
    CREATE FUNCTION dbo.GetNextAlphaID_2 ()
    RETURNS char(5)
    AS
    BEGIN
        DECLARE @MaxID char(5), @ReturnVal char(5)
        SELECT @MaxID = MAX(AlphaID) FROM dbo.MyTable
        IF @MaxID < 'ZZZZZ'
            RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
        IF @MaxID IS NULL
            RETURN 'AAAAA'
        SELECT TOP 1 @MaxID=M1.AlphaID
        FROM dbo.Mytable M1
        WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable M2 
                          WHERE AlphaID = dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(M1.AlphaID) + 1 )
                         )
        ORDER BY M1.AlphaID
        IF @MaxID IS NULL
            RETURN 'AAAAA'
        RETURN dbo.ConvertIntToAlphaID(dbo.ConvertAlphaIDToInt(@MaxID)+1)
    END
    GO
    
  • To return the next ID for a given ID (with rollover), use:

    SELECT  COALESCE
            (
            (
            SELECT  TOP 1 id
            FROM    mytable
            WHERE   id > @id
            ORDER BY
                    id
            ),
            (
            SELECT  TOP 1 id
            FROM    mytable
            ORDER BY
                    id
            )
            ) AS nextid
    

    This query searches for the ID next to the given. If there is no such ID, it returns the first ID.

    Here are the results:

    WITH mytable AS
            (
            SELECT  'AAA' AS id
            UNION ALL
            SELECT  'BBB' AS id
            UNION ALL
            SELECT  'CCC' AS id
            UNION ALL
            SELECT  'DDD' AS id
            UNION ALL
            SELECT  'EEE' AS id
            )
    SELECT  mo.id,
            COALESCE
            (
            (
            SELECT  TOP 1 id
            FROM    mytable mi
            WHERE   mi.id > mo.id
            ORDER BY
                    id
            ),
            (
            SELECT  TOP 1 id
            FROM    mytable mi
            ORDER BY
                    id
            )
            ) AS nextid
    FROM    mytable mo
    
    id      nextid
    -----   ------
    AAA     BBB
    BBB     CCC
    CCC     DDD
    DDD     EEE
    EEE     AAA
    

    , i. e. it returns BBB for AAA, CCC for BBB, etc., and, finally, AAA for EEE which is last in the table.

    Coderer : Could you edit to explain what this actually does?
  • I think the lowest-impact solution for my needs is to add an identity column. The one thing I can guarantee is that the ordering will be such that entries that should "come first" will be added first -- I'll never add one with identifier BBBB, then go back and add BBBA later. If I didn't have that constraint, obviously it wouldn't work, but as it stands, I can just order by the identity column and get the sort I want.

    I'll keep thinking about the other suggestions -- maybe if they "click" in my head, they'll look like a better option.

0 comments:

Post a Comment