Advanced Methods for Ensuring Data Integrity in SQL

Halis Manaz
6 min readMar 9, 2023

--

Introduction

PostgreSQL is a powerful relational database management system that offers a wide range of advanced features to ensure data integrity. Data integrity refers to the accuracy and consistency of data over time, which is essential for any database system to function effectively. In my first post, I explained some of the main features of PostgreSQL that help maintain data integrity. If you haven’t read it yet, you can access it here.

In this article, we will be exploring some of the more advanced and lesser-known methods of maintaining data integrity in PostgreSQL. Specifically, we will be looking at cascade actions, triggers, and commits. These methods can help in ensuring that your data remains accurate and consistent, even as it evolves.

To better understand these methods, we will use the same real-life scenario we used in the previous article. By doing so, we can see how these methods can be used in practice to maintain data integrity. You can obtain the table layout from the previous article using the code below.

CREATE TABLE customers (
customer_id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
address VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE
);

CREATE TABLE orders (
order_id SERIAL PRIMARY KEY,
order_date DATE,
customer_id INTEGER REFERENCES customers(customer_id),
product_id INTEGER CHECK (product_id > 0)
);

You can also add the following dummy values to the tables to explain through examples.

INSERT INTO customers (name, address, email)
VALUES
('John Smith', '123 Main St', 'john@example.com'),
('Jane Doe', '456 Oak Ave', 'jane@example.com'),
('Bob Johnson', '789 Elm St', 'bob@example.com');

INSERT INTO orders (order_date, customer_id, product_id)
VALUES
('2022-01-01', 1, 100),
('2022-01-02', 2, 101),
('2022-01-03', 3, 102),
('2022-01-04', 1, 103),
('2022-01-05', 2, 104);

By the end of this article, you will have a comprehensive understanding of the advanced data integrity methods in PostgreSQL, and you will be able to use them to maintain the accuracy and consistency of your data. Let’s get started.

Cascading Actions

When a referenced row in the primary key table is deleted or updated, the corresponding rows in the foreign key table should also be deleted or updated automatically. This is known as a “cascading action”. The CASCADE keyword in PostgreSQL is used for foreign key constraints. CASCADE is used most commonly with ON DELETE and ON UPDATE statements to set rules for deletions and updates. Also,SET NULL, SET DEFAULT and RESTRICT statements are used with CASCADE. Before we add cascading actions, we should delete existing foreign key constraints to add a new one.

ALTER TABLE orders
DROP CONSTRAINT customers_customer_id_fkey;

And add a new one with ON DELETE and ON UPDATE

ALTER TABLE orders
ADD CONSTRAINT customers_customer_id_fkey_with_cascade
FOREIGN KEY orders.customer_id
REFERENCES customers customers.customer_id
ON DELETE CASCADE
ON UPDATE CASCADE;

Let’s check CASCADE with deleting and updating.

UPDATE customers SET customer_id = 11 WHERE customer_id = 1;
DELETE FROM customers WHERE customer_id = 2;

To see results

SELECT * FROM orders;

Notice that there are no records whose customer_id is 1 or 2. Because the customer_id value of 2 was automatically deleted, the customer_id values of 1 have been changed to 11.

Triggers

One way to maintain data integrity is to ensure that the data entered into the database is accurate and consistent. In the case of the orders table, it's important to prevent any future dates from being entered into the order_date column. This can be achieved using triggers.

Triggers are special functions that are executed automatically in response to certain events, such as an insert or an update. In this case, we can create two triggers — one for inserting new records and one for updating existing ones — to check whether the order_date is in the future. If it is, the trigger will raise an exception and prevent the data from being entered into the database.

By using triggers, we can ensure that the data in the orders table remains accurate and consistent over time. This is just one example of the advanced data integrity methods available in PostgreSQL.

Create check_order_date function

CREATE OR REPLACE FUNCTION check_order_date()
RETURNS TRIGGER AS $$
BEGIN
IF NEW.order_date > CURRENT_DATE THEN
RAISE EXCEPTION 'Order date cannot be in the future';
END IF;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;

Create two types of triggers: one for inserting and one for updating.

CREATE TRIGGER check_order_date_insert
BEFORE INSERT ON orders
FOR EACH ROW
EXECUTE FUNCTION check_order_date();
CREATE TRIGGER check_order_date_update
BEFORE UPDATE ON orders
FOR EACH ROW
EXECUTE FUNCTION check_order_date();

Please try to insert an invalid order with an order_date in the future to test if the triggers are working properly.

INSERT INTO orders (order_date, customer_id, product_id)
VALUES ('2100-03-10', 2, 101);

Result:

ERROR:  Order date cannot be in the future
CONTEXT: PL/pgSQL function check_order_date() line 4 at RAISE
SQL state: P0001

Voila!

Commits

In PostgreSQL, COMMIT is a method used to ensure that all the inserted data is correct, especially when you need to insert a large amount of data into a database. Rather than inserting the data one by one, which can be time-consuming and require a lot of effort, you can use the COMMIT method to insert all the data at once. This way, if there is an error, your database will not be accidentally affected, as all changes will be rolled back. COMMIT is a part of the SQL standard and is supported by most relational database management systems, including PostgreSQL.

In PostgreSQL, BEGIN is used to start a new transaction. A transaction is a sequence of SQL statements that are treated as a single unit of work. Transactions are used to ensure that all the statements in the transaction are executed successfully, or if there is an error, to roll back all the changes made by the statements in the transaction.COMMIT is used to end a transaction by committing all the changes made by the statements in the transaction to the database. Once a transaction is committed, the changes are permanent and cannot be rolled back. In short, BEGIN starts a transaction, and COMMIT ends it by committing all the changes made by the statements in the transaction to the database.

For example, if you wanted to insert multiple rows into a table and commit them all at once, you could use the following SQL statement:

BEGIN;
INSERT INTO orders (order_date, customer_id, product_id)
VALUES ('2022-01-06', 3, 105),
('2022-01-07', 1, 106),
('2022-01-08', 2, 107),
('2022-01-09', 3, 108),
('2022-01-10', 1, 109),
('2022-01-11', 2, 110),
('2022-01-12', 3, 111),
('2022-01-13', 1, 112),
('2022-01-14', 2, 113),
('2022-01-15', 3, 114);
COMMIT;

This will begin a transaction, insert the rows, and then commit the changes all at once. If any errors occur during the transaction, the database will automatically roll back the changes. It’s important to note that COMMIT should only be used when you are sure that all of the data being inserted is correct. If you are unsure about the data or if you need to make changes to it after insertion, it's best to avoid using COMMIT and instead, insert the data one row at a time so that you can make changes as needed.

That being said, when used correctly, COMMIT can be a powerful tool for maintaining data integrity in your PostgreSQL database.

Conclusion

In this article, we explored some of the more advanced methods of maintaining data integrity in PostgreSQL. We covered cascading actions, triggers, and commits, and saw how they can be used to ensure the accuracy and consistency of data over time.

Cascading actions allow us to automatically update or delete related data in the foreign key table when the primary key table is updated or deleted. Triggers allow us to automatically execute certain functions in response to certain events, such as inserting or updating data. Commits allow us to group transactions together to ensure that they are executed together or not at all.

By using these advanced data integrity methods, we can ensure that our databases remain accurate and consistent, even as they evolve over time. This is essential for any database system to function effectively and is a critical aspect of data management. I hope this article has been helpful in understanding the advanced data integrity methods in PostgreSQL.

Thanks For Reading, Follow Me For More

Previous Article

--

--

Halis Manaz

Hi, I am Halis. I am interested in Python, data analysis, and machine learning. I share what I learned, what I found interesting, and my portfolio projects