Programming Answer: What to do when I want to use database constraints but only mark as deleted instead of deleting?

I am working in a project where database items are not deleted, but only marked as deleted. Something like this:

id   name     deleted
---  -------  --------
1    Thingy1  0
2    Thingy2  0
3    Thingy3  0

I would like to be able to define something like a UNIQUE constraint on the name column. Seems easy, right?

Let's imagine a scenario in which "Thingy3" is deleted, and a new one is created (perhaps years later). We get:

id   name     deleted
---  -------  --------
1    Thingy1  0
2    Thingy2  0
3    Thingy3  1
...
100  Thingy3  0

From the user's point of view, he deleted an item and created a new one. Much like deleting a file, and creating a new file. So it's obvious to him that the new item is unrelated and unattached to any data connected to the old item.

That's already handled, since the DB only cares about the id, and since the new item has an id of 100 instead of 3, they are utterly different.

My difficulty arises when I want to prevent the user from creating another "Thingy3" item. If I had a UNIQUE constraint that only looked at items that aren't marked deleted, then I would have solved one problem.

(Of course, then I'd have to deal with what happens when someone does an undo of the delete...)

So, how can I define that sort of a constraint?

From stackoverflow

Not sure about SQLServer2005, but you can define compound constrainst/indexes
```
CREATE UNIQUE INDEX [MyINDEX] ON [TABLENAME] ([NAME] , [DELETED])
```
As pointed out by SteveWeet, this will only allow you to delete/create twice.

Steve Weet : This will only allow one delete of a 'Thingy3' you can't create it, delete it, create it again and then delete it again.

Robert Gould : Good point, this won't allow N updates, fixing it
For example, you can add an illegal character (*) to the deleted name. But you still have problems undeleting a deleted item. So probably the better idea is to prohibit double names even if they are deleted.

You can clean the deleted records after an amount of time (or move them to a separate table).

belugabob : Modifying the user entered data is always going to be problematic, as you can't necessarily predict whether the data will end with an illegal character already. Admittedly, the character will only be added to 'deleted' records, so that introduces a certain element of predictability. Either way, I prefer Steve Weet's solution, as it support undeletes.
Create a unique constraint on (name, deleted). This will mean you can only have one deleted per name, however.

The obvious work-around for that works under ANSI-92 but not on MS SQLServer: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=299229
For a simple table called [Test] with columns ID(int), Filter(nvarchar), Deleted(bit)
```
ALTER TABLE [dbo].[Test] ADD 
    CONSTRAINT [DF_Test_Deleted] DEFAULT (0) FOR [Deleted],
    CONSTRAINT [IX_Test] UNIQUE  NONCLUSTERED 
    (
        [filter],
        [Deleted]
    )  ON [PRIMARY]
```
belugabob : This approach become problematic when a record is 'deleted' a record with the same 'Filter' value is added and the an attempt to 'delete' this record is made - as the constraint is then violated.
Add a random hash after the unique name. Something that is easily reversible. Possibly separate with an underscore or some other character.

Edit after comment: You could simply add underscore and the current timestamp.

scraimer : Brilliant! Heck, I could even change the name to be something like "Thingy3 (deleted on 1/1/2000 #1)" for extra info to the user!
You could add the id value to the end of the name when a record is deleted, so when someone deletes id 3 the name becomes Thingy3_3 and then when they delete id 100 the name becomes Thingy3_100. This would allow you to create a unique composite index on the name and deleted fields but you then have to filter the name column whenever you display it and remove the id from the end of the name.

Perhaps a better solution would be to replace your deleted column with a deleted_at column of type DATETIME. You could then maintain a unique index on name and deleted at, with a non-deleted record having a null value in the deleted_at field. This would prevent the creation of multiple names in an active state but would allow you to delete the same name multiple times.

You obviously need to do a test when undeleting a record to ensure that there is no row with the same name and a null deleted_at field before allowing the un-delete.

You could actually implement all of this logic within the database by using an INSTEAD-OF trigger for the delete. This trigger would not delete records but would instead update the deleted_at column when you deleted a record.

The following example code demonstrates this
```
CREATE TABLE swtest (  
    id   INT IDENTITY,  
    name  NVARCHAR(20),  
    deleted_at DATETIME  
)  
GO  
CREATE TRIGGER tr_swtest_delete ON swtest  
INSTEAD OF DELETE  
AS  
BEGIN  
    UPDATE swtest SET deleted_at = getDate()  
    WHERE id IN (SELECT deleted.id FROM deleted)
    AND deleted_at IS NULL      -- Required to prevent duplicates when deleting already deleted records  
END  
GO  

CREATE UNIQUE INDEX ix_swtest1 ON swtest(name, deleted_at)  

INSERT INTO swtest (name) VALUES ('Thingy1')  
INSERT INTO swtest (name) VALUES ('Thingy2')  
DELETE FROM swtest WHERE id = SCOPE_IDENTITY()  
INSERT INTO swtest (name) VALUES ('Thingy2')  
DELETE FROM swtest WHERE id = SCOPE_IDENTITY()  
INSERT INTO swtest (name) VALUES ('Thingy2')  

SELECT * FROM swtest  
DROP TABLE swtest
```
The select from this query returns the following
```
id      name       deleted_at
1       Thingy1    NULL
2       Thingy2    2009-04-21 08:55:38.180
3       Thingy2    2009-04-21 08:55:38.307
4       Thingy2    NULL
```
So within your code you can delete records using a normal delete and let the trigger take care of the details. The only possible issue (That I could see) was that deleting already deleted records could result in duplicate rows, hence the condition in the trigger to not update the deleted_at field on an already deleted row.

belugabob : Voted this up, because it was exactly what I was considering for my own application. Trying to think of any downsides, but it seems solid enough so far.

scraimer : I love the "deleted_at" idea!
The problem with a compound unique constraint is that it's not possible to have multiple records with the same name that are deleted. This means that the system will break once you delete a third record. I wouldn't recommend appending stuff to the names because, theoretically, a duplicate situation could arise. Also, by doing so, you are basically corrupting the data in the database as well as adding cryptic business logic to the data itself.

The only possible solution, database wide, is to add a trigger that checks that the inserted/updated data is valid. It's also possible to put the checks outside of the database, into code.
It might be worth considering using a "recycle bin" table. Instead of keeping the old records in the same table with a flag, move them to its own table with its own constraints. For instance, in the active table you do have a UNIQUE constraint on name, but in the recycle bin table you don't.
Instead of deleted column use end_date column. When user deletes a record add the current date in end_date column. Any records where end_date column is NULL are your current records. Define a unique constraint on two columns name and end_date. Due to this constraint you never have a scenario where valid record name is duplicated. Any time user wants to undelete a record, you need to set the end_date column to null and if this violates the unique constraint then you show a message user to user that the same name already exists.

scraimer : That's what Steve Weet already said at http://stackoverflow.com/questions/771197/what-to-do-when-i-want-to-use-database-constraints-but-only-mark-as-deleted-inste/771337#771337

The type of constraint you require is a table-level CHECK constraint i.e. a CHECK constraint consisting of a subquery which tests NOT EXISTS (or equivalent) for the table e.g.

CREATE TABLE Test 
(
   ID INTEGER NOT NULL UNIQUE, 
   name VARCHAR(30) NOT NULL, 
   deleted INTEGER NOT NULL, 
   CHECK (deleted IN (0, 1))
);

ALTER TABLE Test ADD
   CONSTRAINT test1__unique_non_deleted
      CHECK 
      (
         NOT EXISTS 
         (
            SELECT T1.name
              FROM Test AS T1
             WHERE T1.deleted = 0
             GROUP
                BY T1.Name
            HAVING COUNT(*) > 1
         )
      );

INSERT INTO Test (ID, name, deleted) VALUES (1, 'Thingy1', 0)
;
INSERT INTO Test (ID, name, deleted) VALUES (2, 'Thingy2', 0)
;
INSERT INTO Test (ID, name, deleted) VALUES (3, 'Thingy3', 1)
;
INSERT INTO Test (ID, name, deleted) VALUES (4, 'Thingy3', 1)
;
INSERT INTO Test (ID, name, deleted) VALUES (5, 'Thingy3', 0)
;
INSERT INTO Test (ID, name, deleted) VALUES (6, 'Thingy3', 0)
;

The last INSERT (ID = 6) will cause the constraint to bite and the INSERT will fail. Q.E.D.

...ooh, nearly forgot to mention: SQL Server doesn't yet support CHECK constraints with that contain a subquery (I tested the above on ACE/JET, a.k.a. ). While you could use a FUNCTION I've read this is unsafe due to SQL Server testing constraints on a row-by-row basis (see David Portas' Blog). Until this full SQL-92 feature is supported in SQL Server, my preferred workaround is to use the same logic in a trigger.

Programming Answer

Saturday, April 30, 2011

What to do when I want to use database constraints but only mark as deleted instead of deleting?

0 comments:

Post a Comment

Blog Archive