10 major SQL programming errors
SQL (Structured Query Language) is a simple language, but it is not as simple when you work with datasets that contain millions of records.
When you work with medium and large tables, writing the most efficient SQL statements is a priority. Before you start writing SQL statements for any platform – Oracle, SQL Server, MySQL, or any other – these are the 10 most common SQL programming errors you should avoid.
Forgotten primary keys
Each table requires a primary key for performance. If you don’t have a primary key in any table, your tables don’t meet standard SQL requirements and suffer from performance problems. Primary keys are automatically set as cluster indexes, which speeds up queries. They are always unique, so you can use an auto-incremental numeric value if you do not have another column in a table that meets the unique requirement.
Primary keys are your first step towards relational databases. They refer to the external keys in the relational tables. For example, if you have a table with a list of clients, the “CustomerId” column shall be unique for each client. This may be your column of the primary key. Your CustomerId value will then be placed in the “Order” table to link the two tables together. Always use the primary key in every table you create, regardless of its size.
Poorly managed data redundancy
Data redundancy is good for backups, but not for tabular data. Each table must contain a unique set of data that does not repeat the data elsewhere in the table. This is one of the most difficult ideas for a new SQL developer. It is very easy to forget the normalization rules and repeat data in several tables for convenience, but it is not necessary and represents a bad design of a table.
For example, suppose you have a client table that contains the client’s address. Since the address refers to the client, it is in the right place. Then you create an “Order” table and add the client address to the “Order” table. This type of data redundancy is poorly designed.
The Customer table and the “Order” table can communicate with each other using the primary and external keys. What happens if you forget to update the customer address in the order table? As a result, you now have two addresses for the customer, and you do not know which one is the exact one.
Bottom line: always store data in one place and use the relationship between primary and external keys to request data.
Avoid NOT IN or IN and use JOIN instead
NOT IN and IN operators are poorly optimized. They are convenient, but you can usually replace them with a simple JOIN operator. Take a look at a sample request.
SELECT *FROM Customer
WHERE NOT IN (SELECT CustomerId FROM Order)
The above application returns the data set of all customers who do not have an order. In this statement, the SQL database extracts all orders from the Order table and then filters the set of records based on the main external query in the Customer table. If you have millions of orders, this is an extremely slow query.
An alternative, more efficient option is as follows.
SELECT * FROM Customer c
LEFT JOIN Order o on c.CustomerId = o.CustomerId
WHERE o.CustomerId IS NULL
The LEFT JOIN operator returns the same data set as the previous operator, but it is much more optimized. It combines two tables by primary and external key, which increases the query speed and avoids NOT IN and IN sentences.
Forgotten NULL values and empty string values
Discussions between empty and empty lines between database administrators continued for decades. You can use NULL values if no values are available, or you can use actual literal values such as zero-length strings or 0 integer values.
What you use in the database must be the same for all tables, otherwise, queries may become chaotic. Remember that NULL values do not match, for example, a zero-length string, so your queries should take these values into account if there is no standard in the table design.
When you determine what you want to use, make sure that your queries take these values into account. For example, if you enable NULL for a username, you must execute a query using the NULL filter (NOT NULL or IS NULL) in your queries to include or exclude these entries.
The asterisk symbol in SELECT operators
Always define the columns that you want to return to your requests. This standard is for performance and security. Take the following request for example.
SELECT * FROM Customer
The request returns all customer values, including any confidential data you may have stored in the table. What if you keep a customer password or social security number on the table? We hope these values are encrypted, but even having a hashed value can help hackers. It is also a performance problem if you have dozens of columns in the table.
Instead of the above query, always define each column. The following statement is an example.
SELECT CustomerId, FirstName, LastName FROM Customer
The above operator defines each column and limits the size of the recordset and what the hacker can see in case of data hacking.
A cycle with too many cursors
Cursors, cyclic structures in SQL language, is the basis of database performance. They allow you to go through millions of records and run operators for each of them individually. While this may seem like an advantage, it can reduce database performance.
Circles are common in programming languages, but they are inefficient in SQL programming. Most database administrators reject SQL procedures with cursors embedded.
It is best to write the procedure differently to avoid a negative impact on database performance if possible. You can replace most cursors with a well-written SQL statement. If you cannot avoid this, the cursors should be saved for scheduled tasks that run during off-peak hours.
Cursors are used in query reports and data conversion jobs, so they cannot always be avoided. Simply limit them to as many as possible in the production databases that perform daily queries to your database.
Data inconsistencies in-field assignment procedures
When you declare the columns of a table, you must assign each column a data type. Make sure that this data type covers all the necessary values to be saved. Once you have defined the data type, you can only store this type of value in a column.
For example, you probably need a decimal precision of 2-3 points in the column that stores the total order value. If you assign this column as an integer, your database can only store integers without decimal values. What happens to decimal places depends on your database platform.
It may automatically crop the values or generate an error. Any alternative may cause a serious error in your application. Always keep in mind what you need to keep when developing your tables.
This includes writing queries – when you write your queries and pass parameter values to the stored procedure, the variable must be declared with the correct data type. Variables that do not represent the column data type will also generate errors or crop data in the process.
Logical OR and AND operations
It’s easy to miss a simple logical order when writing your requests. Operators AND and OR can change the data set significantly. You can avoid common SQL errors by using parentheses or by organizing your operators to represent the logic to be applied.
Let’s take a look at an SQL statement that mixes AND and OR.
SELECT CustomerId
FROM Customer
WHERE FirstName = 'Jack' AND LastName = 'Destroyer' OR CustomerId > 0
The purpose of the above statement is to obtain any customer with the name “Jack” and “Destroyer” and the customer ID is greater than zero.
However, since we have mixed the AND operator with OR, all records where the client ID is above zero are returned. We can overcome this logical obstacle by using parentheses. Let us add them to the above statement.
SELECT CustomerId
FROM Customer
WHERE (FirstName = 'Jack' OR LastName = 'Destroyer') AND CustomerId > 0
We changed the logic for this statement. Now the first set of brackets returns all entries where the client’s name is Jack or Destroyer. With this filter, we tell SQL to return only those values where CustomerId is greater than zero.
These types of logical statements must be well verified before they are released into production.
The subqueries should return a single record
Subqueries are not the optimal way to write SQL statements, but sometimes they cannot be avoided. When you use subqueries, they must always return one record, otherwise, your query will not be executed.
Let’s have a look at an example.
SELECT CustomerId,
(SELECT OrderId FROM Order o WHERE c.CustomerId = o.CustomerId)
FROM Customer c
In the above query, we get a list of customer IDs from the Customer table. Please note that we get the order identifier from the order table where the customer identifier matches. If there is only one order, this query works fine. However, if there is more than one order for a customer, the subquery returns more than one entry and the request is not executed. You can avoid this problem by using the “Top 1” operator.
Let’s change the request to the next one.
SELECT CustomerId,
(SELECT Top 1 OrderId FROM Order o WHERE c.CustomerId = o.CustomerId ORDER BY OrderDate)
FROM Customer c
In the above query, we extract only one record and organize the records by date. This request receives the first order placed by the customer.
JOIN to indexes
The table should always be well indexed and you can take advantage of these performance improvements by using the JOIN operator for the columns assigned to the index. In the previous examples, we used the primary and external keys of the table.
The column of the primary key is always an index, so you don’t have to worry about performance. However, the external key must also have an index.
Any JOIN operators that you use must have an index for the column. If you do not have an index, consider adding it to the table.
Conclusion
Relational databases are ideal for most internal procedures, but you need to create the right SQL statement and optimize tables and queries for maximum performance.
Avoid these ten SQL errors and you will be on your way to creating a fast and efficient database for any small, medium, or large online business.
Enteros
About Enteros
IT organizations routinely spend days and weeks troubleshooting production database performance issues across multitudes of critical business systems. Fast and reliable resolution of database performance problems by Enteros enables businesses to generate and save millions of direct revenue, minimize waste of employees’ productivity, reduce the number of licenses, servers, and cloud resources and maximize the productivity of the application, database, and IT operations teams.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.
Are you interested in writing for Enteros’ Blog? Please send us a pitch!
RELATED POSTS
Enhancing Anomaly Management in Banking & Insurance with Enteros: A Comprehensive Approach to Database Optimization
- 28 October 2024
- Database Performance Management
In the fast-evolving world of finance, where banking and insurance sectors rely on massive data streams for real-time decisions, efficient anomaly man…
Transforming Insurance Analytics: How Enteros Enhances Database Performance on AWS EC2
In today’s data-driven landscape, the insurance industry faces increasing demands for agility, efficiency, and customer-centricity. The rise of digita…
Optimizing Cost Allocation in Real Estate: Leveraging Enteros and Data Lakes for Efficient Database Management
- 27 October 2024
- Technology Business
The real estate sector operates on intricate financial models where effective cost allocation plays a vital role in maintaining profitability and scal…
Optimizing Healthcare Costs with Enteros: Managing Blended Rates and Cloud Billing for Improved Financial Health
In the healthcare sector, managing and forecasting costs associated with IT infrastructure is essential for balancing patient care with operational ef…