Friday, May 6, 2011

Database-Generated Human-Friendly Codes

I'd like to create some human-friendly codes to identify my objects.

I'm thinking about using the following rules:

  • 6-digit random number
  • the first character is not zero
  • each code has an edit distance value of 2 or greater* from every other such code
  • maybe a checksum too

I'd like my MS SQL database to enforce that the codes I use are not only each unique, but also conform to the above criteria too.

How would I write a database check constraint to enforce rules such as these?

How could I make the database use such numbers as default values for inserted rows?

*so a single keystroke typo won't retreive a different-than-intended record

From stackoverflow
  • Use a stored procedure to check whatever constraints you would like for inserted data.

    A stored procedure can also be used to generate these things if needed.

    Although letting the database do all this for you seems like a nice thing, doing it all in code can be easier to maintain in the long run(unless you have dedicated DBAs who love maintaining this kind of stuff).

    I think your idea and algorithm is smart. If you're going to go that far with the requirements, I would say a checksum is a great thing to have. The checksum alone can catch typo errors, regardless of edit distance.

    Zack Peterson : You're right. With a simple single-digit MOD 10 checksum added to a simple unique number, I'd end up with edit distances of 2 or greater.
  • Create a stored proc that calculates your numeric value; use that stored proc as the DEFAULT() value for the column definition in your table definition. Note: I haven't tried this, so I don't know if it's completely possible.

  • How many id's do you need?

    You could declare the column as an identity, and set the start value to 100000, and the increment to 12. That would produce a six digit number, with edit distance of 2.

    Also, as a bonus, this is pretty fast. But you may run out of numbers as this isn't all that dense.

    CREATE TABLE [Items]
    (
        [id] int IDENTITY(100000,12) NOT NULL primary key,
        [Data] varchar(50) NULL
    )
    
    Zack Peterson : I'll have to think about that. I may need another digit.
    Zack Peterson : I'd also rather that the numbers are in a random order so they don't imply unintended information such as the total number of records.
  • Write a one-time-use program to populate a table of all (or many) possible valid codes in a scrambled order with an integer primary key.

    Code table:

    Id   HumanFriendlyCode
    
    1    100124
    2    991302
    3    201463
    4    157104
    ...  ...
    

    Then just relate the objects table to the rows in that codes table with an auto-incrementing integer foreign key and a unique constraint.

    Thing table:

    Id                                    CodeId  ...
    
    e9d29b14-0ea6-4cfd-a49f-44bcaa7212eb  1       ...
    91906bb7-14ed-4acc-bf23-c4bd1631797f  2       ...
    41ace075-f9f8-46b7-b114-cb17765c4e76  3       ...
    2fba1a58-7a91-4da6-a4a2-7cacef8603db  4       ...
    

    Anyone ever done something like this?

  • This check constraint will enforce a checksum in a database column:

    ALTER TABLE tblCode
    ADD CHECK (
        CAST(SUBSTRING(CAST(Code AS VARCHAR), 6, 1) AS INTEGER) =
        (
            CAST(SUBSTRING(CAST(Code AS VARCHAR), 1, 1) AS INTEGER) +
            CAST(SUBSTRING(CAST(Code AS VARCHAR), 2, 1) AS INTEGER) +
            CAST(SUBSTRING(CAST(Code AS VARCHAR), 3, 1) AS INTEGER) +
            CAST(SUBSTRING(CAST(Code AS VARCHAR), 4, 1) AS INTEGER) +
            CAST(SUBSTRING(CAST(Code AS VARCHAR), 5, 1) AS INTEGER)
        ) % 10
    )
    

    The sixth digit must be modulo 10 of the sum of the first five digits.

  • Your enemy as far as UI goes is code length. I suggest you add alphabetic characters (omitting zero, oh, one, and eye). It will cut then length substantially, and make collisions less likely, especially for transpositions.

0 comments:

Post a Comment