Key Azure SQL Database metrics
The primary metrics we’ll look at fall into the following categories:
- Performance metrics, which measure compute, storage, and worker utilization and limits
- Connectivity metrics, which measure active connections to a database

We’ll also look at Azure SQL Database audit logs, which you can use to monitor database activity and identify potential vulnerabilities in your database instances.
Performance metrics
Azure provides several purchasing models and service tiers to accommodate different types of database workloads. These let you specify how much compute power and storage your databases have, enabling you to better forecast costs and gauge how well your databases execute tasks and handle traffic.
Azure SQL single database instances and elastic pools can be purchased using either database transaction units (DTUs) or virtual cores (vCores). The DTU-based model offers preconfigured bundles of compute (measured in DTUs) and storage across Azure’s Basic, Standard, and Premium service tiers. The tier you select determines the resource limits for that database.
The vCore model is suited to more sophisticated workloads. It offers two compute tiers: provisioned and serverless. In the provisioned compute tier, you specify the exact amount of compute resources (measured in vCores) allocated to your workloads. In the serverless tier, Azure automatically scales available compute resources within a range you configure and automatically pauses databases when they are not in use. The serverless tier is a good starting point for database instances with unpredictable usage patterns, such as those with frequent periods of inactivity, and for newly created databases with no usage history, which are harder to size appropriately.
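As a concrete illustration, here is a minimal sketch of creating a serverless-tier database with Python’s azure-mgmt-sql SDK. The resource names and the vCore range are hypothetical, and the exact model fields may vary between SDK versions.

```python
# A minimal sketch of creating a serverless-tier database with azure-mgmt-sql.
# Resource names and the vCore range below are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import Database, Sku

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.databases.begin_create_or_update(
    resource_group_name="example-rg",        # hypothetical resource group
    server_name="example-sql-server",        # hypothetical logical server
    database_name="example-db",
    parameters=Database(
        location="eastus",
        # GP_S_* SKUs select the serverless compute tier; capacity is the max vCores.
        sku=Sku(name="GP_S_Gen5", tier="GeneralPurpose", family="Gen5", capacity=4),
        min_capacity=0.5,        # minimum vCores the database can scale down to
        auto_pause_delay=60,     # minutes of inactivity before auto-pause
    ),
)
database = poller.result()
```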
Monitoring the following performance metrics can help you understand how well your databases are using their resources and whether they remain available:
- Compute metrics, including CPU and DTU utilization and limits
- Storage metrics, including storage utilization and limits
- Request metrics, including the availability of database workers and sessions to process requests
Monitoring these metrics is essential for ensuring that you are using the right service tier for your workloads, whether you use serverless databases, single database instances, elastic pools, or a combination of these.
Compute metrics
| Name | Description | Metric type |
|---|---|---|
| CPU percentage | Percentage of vCores in use, based on the service tier | Resource: utilization |
| DTU percentage | Percentage of DTUs in use, based on the service tier | Resource: utilization |
| CPU limit | Total number of vCores allocated to a database | Resource: other |
| DTU limit | Total number of DTUs allocated to a database | Resource: other |
| App CPU percentage | Percentage of vCores in use, based on total number of vCores allocated in the serverless tier | Resource: utilization |
| Memory percentage | Percentage of memory in use | Resource: utilization |
| App CPU billed | Amount of compute billed (in vCore seconds) | Resource: utilization |
Metrics to watch: CPU percentage and DTU percentage vs. CPU limit and DTU limit
Depending on your purchasing model (DTU or vCore), CPU percentage or DTU percentage is the key utilization metric to track. Each reflects the percentage of compute resources a single database or elastic pool has consumed. Monitoring consumption, and alerting when utilization runs high, can help you determine whether you are close to exhausting the vCores or DTUs you have allotted to your instance or pool. Active queries will time out if a database reaches these capacity limits.
Inefficient database queries are one potential source of high CPU consumption. Inefficient queries consume more resources and can introduce delays, since your application must wait for the database to complete queries before it can continue processing. If you observe a pattern of consistently high CPU and DTU utilization, compare it to the percentage of workers in use: if that is also consistently high, inefficient queries may be the cause. In some cases you may need to increase a database’s or pool’s compute capacity to meet demand; in others, you may need to improve query performance. Azure SQL Database resource logs can provide additional information about query performance, such as runtime statistics and whether a query timed out, helping you better understand why a database instance is experiencing high utilization.
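To track these utilization metrics programmatically, you can retrieve them through Azure Monitor. Below is a minimal sketch using Python’s azure-monitor-query library; the resource ID is hypothetical, and the metric names shown (`cpu_percent`, `dtu_consumption_percent`) are the platform metric names Azure SQL Database emits, with the DTU metric applying only to DTU-based databases.

```python
# A minimal sketch of pulling recent CPU and DTU utilization for a database.
# The resource ID below is hypothetical.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())

resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/example-rg"
    "/providers/Microsoft.Sql/servers/example-sql-server/databases/example-db"
)

response = client.query_resource(
    resource_id,
    metric_names=["cpu_percent", "dtu_consumption_percent"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=["Average", "Maximum"],
)

for metric in response.metrics:
    for ts in metric.timeseries:
        for point in ts.data:
            print(metric.name, point.timestamp, point.average, point.maximum)
```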
Metrics to watch: app CPU percentage and memory percentage
When you create a serverless-tier database, you set a minimum and maximum number of vCores for it. The number of vCores the database uses then scales automatically in response to demand while staying within that range. The maximum also sets the service objective (also known as the compute size) that Azure uses for your database, which in turn determines the limits for other resources such as memory and workers. App CPU percentage and memory percentage measure vCore and memory utilization for serverless databases against the maximum number of vCores you have defined and the resulting maximum memory limit.
Monitoring app CPU percentage and memory percentage can help you detect whether your serverless databases are under- or over-utilizing their resource allocations. A consistently low vCore utilization rate, for example, may suggest that you have allocated too many vCores to a database and can safely scale it down. To evaluate whether high utilization is caused by a large number of running queries or by a few long-running queries that need optimization, compare app CPU percentage to other metrics, such as the percentage of workers in use.
Metric to watch: app CPU billed
For serverless databases, the app CPU billed metric is used to calculate billing charges. Azure bills serverless-tier databases based on the amount of compute (i.e., CPU and memory) used per second, as well as the amount of provisioned storage. App CPU billed measures compute utilization only while a serverless database is active; if a database is paused after a prolonged period of inactivity, only provisioned storage is billed.
The cost of hosting serverless databases fluctuates with database activity and resource usage, so it’s important to track this metric to stay aware of how much you’re spending to keep your serverless databases running. For example, you can compare app CPU billed to your database’s CPU and memory consumption to identify whether inefficient queries are consuming excessive CPU or memory.
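Because compute charges are proportional to this metric, a quick back-of-the-envelope estimate can help sanity-check your bill. The sketch below assumes a hypothetical per-vCore-second rate; actual rates vary by region and tier.

```python
# A back-of-the-envelope sketch of estimating serverless compute charges from
# the app CPU billed metric, which reports compute usage in vCore-seconds
# while the database is active. The unit price below is hypothetical; check
# your region's current rate card.
VCORE_SECOND_PRICE = 0.000145  # hypothetical $/vCore-second

def estimate_compute_cost(app_cpu_billed_vcore_seconds: float) -> float:
    """Estimate compute charges for a billing period from billed vCore-seconds."""
    return app_cpu_billed_vcore_seconds * VCORE_SECOND_PRICE

# Example: a database that billed 1.2 million vCore-seconds over a month.
print(f"${estimate_compute_cost(1_200_000):.2f}")  # -> $174.00
```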
Storage metrics
| Name | Description | Metric type |
|---|---|---|
| Storage percentage | Percentage of database space in use, based on the service tier | Resource: utilization |
| XTP storage percentage | Percentage of storage in use for in-memory online transaction processing | Resource: utilization |
| Storage | Amount of space allocated to a database | Resource: other |
Metrics to watch: storage percentage vs. storage size
Having enough database storage to update or create data is critical for keeping your applications operating correctly. If a database does not have enough space to accommodate INSERT or UPDATE statements, clients will receive an error and the database will not be updated.
To avoid running out of space, you should set alerts on a database’s storage utilization (the storage percentage metric), which will warn you when you are about to hit the capacity limits of your purchasing model (DTU or vCore). If utilization is high, you can reclaim space through mitigation steps such as shrinking database transaction logs, which can quickly consume available storage.
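As one way to set up such an alert, here is a hedged sketch using Python’s azure-mgmt-monitor SDK. The names, IDs, and the 80 percent threshold are hypothetical, and the model classes may differ slightly between SDK versions.

```python
# A minimal sketch of creating a metric alert on storage_percent.
# Names and IDs below are hypothetical; an action group can be attached
# via the `actions` parameter if you have one.
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    MetricAlertResource,
    MetricAlertSingleResourceMultipleMetricCriteria,
    MetricCriteria,
)

client = MonitorManagementClient(DefaultAzureCredential(), "<subscription-id>")

database_id = (
    "/subscriptions/<subscription-id>/resourceGroups/example-rg"
    "/providers/Microsoft.Sql/servers/example-sql-server/databases/example-db"
)

client.metric_alerts.create_or_update(
    resource_group_name="example-rg",
    rule_name="example-db-storage-high",
    parameters=MetricAlertResource(
        location="global",
        description="Storage utilization is above 80%",
        severity=2,
        enabled=True,
        scopes=[database_id],
        evaluation_frequency="PT5M",   # how often the rule is evaluated
        window_size="PT15M",           # lookback window for the aggregation
        criteria=MetricAlertSingleResourceMultipleMetricCriteria(
            all_of=[
                MetricCriteria(
                    name="storage-high",
                    metric_name="storage_percent",
                    operator="GreaterThan",
                    threshold=80,
                    time_aggregation="Average",
                )
            ]
        ),
    ),
)
```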
You can also compare this metric to your database’s storage size to aid in capacity planning: the process of evaluating the resources your workloads require in order to maximize cost efficiency and performance. If storage utilization remains low over time, you may be able to scale a database down to a service objective with a smaller data size limit, lowering costs in the long run.
Metric to watch: XTP storage percentage
Azure SQL Database supports in-memory online transaction processing (OLTP), which stores data in memory-optimized tables to improve the performance of transaction-heavy applications. Trading, gaming, and Internet of Things (IoT) services are examples of applications that benefit from this technology, which is available in the Premium and Business Critical service tiers.
As with other resources, databases that use in-memory OLTP have a limit on in-memory storage, which depends on their DTU or vCore limits. Monitoring in-memory OLTP consumption (the XTP storage percentage metric) can help ensure that memory-optimized tables do not run out of space. Single and pooled databases that exceed their in-memory storage limits will throw errors and abort any active operations (e.g., INSERT, UPDATE, CREATE). If that happens, you can recover space by deleting data from memory-optimized tables, or you can upgrade to a higher service tier.
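To see which memory-optimized tables are consuming that in-memory storage, you can query the sys.dm_db_xtp_table_memory_stats dynamic management view. Below is a minimal sketch using pyodbc; the connection details are hypothetical.

```python
# A minimal sketch of checking memory-optimized table consumption via the
# sys.dm_db_xtp_table_memory_stats DMV. Connection values are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=example-sql-server.database.windows.net;"
    "Database=example-db;Uid=example-user;Pwd=<password>;Encrypt=yes;"
)

rows = conn.execute(
    """
    SELECT OBJECT_NAME(object_id) AS table_name,
           memory_used_by_table_kb,
           memory_used_by_indexes_kb
    FROM sys.dm_db_xtp_table_memory_stats
    ORDER BY memory_used_by_table_kb DESC;
    """
).fetchall()

for table_name, table_kb, index_kb in rows:
    print(f"{table_name}: {table_kb} KB (table), {index_kb} KB (indexes)")
```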
Request metrics
| Name | Description | Metric type |
|---|---|---|
| Workers percentage | Percentage of available workers in use, based on service tier limits | Resource: utilization |
| Sessions percentage | Percentage of concurrent sessions in use, based on service tier limits | Resource: utilization |
| Deadlocks | Total number of deadlocks on a database | Resource: other |
Workers process incoming database requests (such as queries, logins, and logouts), while sessions are connections to an active database. The maximum number of concurrent workers and sessions each of your databases supports is determined by its service tier and compute size. For example, on a single database instance, the Basic tier permits a maximum of 300 concurrent sessions and 30 concurrent workers. When a database runs out of workers and cannot process new requests, clients receive an error, and the database rejects further queries until workers become available. Likewise, once a database reaches its session limit, it denies any additional connections.

Monitoring session and worker utilization alongside CPU consumption can help you ensure that your databases are always ready to execute requests, and can help you evaluate whether you need to make database queries more efficient. For example, high consumption of all three resources could indicate that a database is underprovisioned, increasing the time it takes to process each request. In that case, you may need to upgrade to a higher service tier so that queries run efficiently without hitting tier limits.

A database may also experience high worker or session utilization when it attempts to handle an excessive number of inefficient queries. These queries consume more resources, leaving workers unable to process new requests. For example, to fulfill an end-user request, a service might launch multiple workers to retrieve data: one to retrieve the first record and another to retrieve that record’s details. This pattern is often inefficient and can create latency on the network and downstream servers. Combining the two queries into a single stored procedure reduces the number of workers dedicated to one request and makes these operations more efficient.
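To make that example concrete, the sketch below contrasts the two-round-trip pattern with a single stored procedure call, using pyodbc. The table, column, and procedure names are hypothetical.

```python
# A minimal sketch of replacing two round trips with one stored procedure call.
# Table, column, procedure, and connection values are hypothetical.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=example-sql-server.database.windows.net;"
    "Database=example-db;Uid=example-user;Pwd=<password>;Encrypt=yes;"
)

# Inefficient: two round trips (and two workers) to serve one request.
order = conn.execute(
    "SELECT TOP 1 order_id FROM orders WHERE customer_id = ? "
    "ORDER BY created_at DESC;", 42
).fetchone()
details = conn.execute(
    "SELECT * FROM order_details WHERE order_id = ?;", order.order_id
).fetchall()

# More efficient: one stored procedure that performs both lookups server-side.
conn.execute("""
CREATE OR ALTER PROCEDURE dbo.get_latest_order_details @customer_id INT AS
    SELECT d.* FROM order_details d
    JOIN (SELECT TOP 1 order_id FROM orders
          WHERE customer_id = @customer_id
          ORDER BY created_at DESC) o ON o.order_id = d.order_id;
""")
conn.commit()
details = conn.execute(
    "EXEC dbo.get_latest_order_details @customer_id = ?;", 42
).fetchall()
```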
Metric to watch: deadlocks

If you notice an increase in deadlocks, you should investigate the queries creating them. Azure’s resource logs can tell you more about where a deadlock occurred, and Azure’s Query Performance Insight can identify the exact queries that caused it and offer recommendations for improving their performance. If a query locks a database resource for an extended period, remediation may include breaking large transactions into several smaller ones to reduce how long the resource stays locked.
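To find out when and where deadlocks occurred, you can query those resource logs. Below is a hedged sketch assuming deadlock resource logs are being routed to a Log Analytics workspace; the workspace ID is hypothetical, the "Deadlocks" diagnostic category must be enabled on the database, and column names may vary.

```python
# A hedged sketch of surfacing recent deadlock events from resource logs
# routed to a Log Analytics workspace. The workspace ID is hypothetical.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
AzureDiagnostics
| where Category == "Deadlocks"
| project TimeGenerated, Resource
| order by TimeGenerated desc
"""

response = client.query_workspace(
    "<log-analytics-workspace-id>", query, timespan=timedelta(days=7)
)

for table in response.tables:
    for time_generated, resource in table.rows:
        print(f"{time_generated}: deadlock on {resource}")
```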
By monitoring these key performance metrics, you can identify the earliest signs of a problem in a database and establish whether performance degradation is caused by inefficient queries or by instances that lack the compute power or storage to handle their workloads. Next, we’ll look at several key connectivity metrics that can help you monitor connections to your databases and identify potentially malicious activity on a database instance.

Connectivity metrics

The databases in your environment are managed through a database server, which acts as an administrative hub for logins, firewall and threat-detection policies, audit rules, and other aspects of your database infrastructure. This lets you monitor connections to your databases and apply appropriate access controls to protect them from unusual activity, such as unauthorized access or SQL injection.

Whenever a client requests data from a database, a gateway redirects or proxies the traffic to the appropriate database cluster; once inside the cluster, traffic is forwarded to the appropriate database. Gateways support redirect and proxy connection policies. Redirect policies are recommended for lower latency and are the default for traffic originating within Azure: the gateway handles a client’s initial connection, and subsequent connections then go directly to the node hosting the database. The proxy policy routes all traffic through the gateway, and it is the default for traffic that does not originate in Azure.
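If you need to change which policy a server uses, the azure-mgmt-sql SDK exposes server connection policies. Below is a hedged sketch with hypothetical names; the operation’s exact shape may vary between SDK versions.

```python
# A minimal sketch of setting a logical server's connection policy to Redirect.
# Names below are hypothetical.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient
from azure.mgmt.sql.models import ServerConnectionPolicy

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

policy = client.server_connection_policies.begin_create_or_update(
    resource_group_name="example-rg",
    server_name="example-sql-server",
    connection_policy_name="default",   # the only policy name Azure accepts
    parameters=ServerConnectionPolicy(connection_type="Redirect"),
).result()
print(policy.connection_type)
```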
Azure generates the following metrics to assist you in monitoring database traffic and ensuring that traffic is sent to the proper databases.
| Name | Description | Metric type |
|---|---|---|
| Connection failed | Total number of failed connections to a database at a given point in time | Other |
| Blocked by firewall | Total number of connections to a database blocked by your server’s firewall at a given point in time | Other |
| Connection successful | Total number of successful connections to a database at a given point in time | Resource: utilization |
Metrics to watch: failed connections and connections blocked by firewall
Azure tracks both the number of failed connections and the number of connections blocked by the firewall. Failed connections can result from transient faults or from a database exceeding its resource limits, so setting an alert on anomalies in this metric can help you address an issue before it affects users. An increase in blocked connections may result from a misconfigured firewall policy. You can take troubleshooting a step further by reviewing your audit logs to learn who is attempting to connect to a database and whether they should have access.
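As with the compute metrics, you can pull these connection metrics from Azure Monitor. The sketch below sums each metric over the past day; the resource ID is hypothetical, and the metric names are the platform metric names Azure SQL Database typically emits.

```python
# A minimal sketch of tracking connection health for a database, summing each
# connection metric over the last 24 hours. The resource ID is hypothetical.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())

resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/example-rg"
    "/providers/Microsoft.Sql/servers/example-sql-server/databases/example-db"
)

response = client.query_resource(
    resource_id,
    metric_names=["connection_successful", "connection_failed", "blocked_by_firewall"],
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
    aggregations=["Total"],
)

for metric in response.metrics:
    total = sum(p.total or 0 for ts in metric.timeseries for p in ts.data)
    print(f"{metric.name}: {total:.0f} connections in the last 24 hours")
```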
An unexpected increase in the number of successful connections could be a symptom of malicious activity, such as a distributed denial of service (DDoS) attack, which attempts to exhaust application resources by flooding the system with requests. To identify the source of such attacks (e.g., an IP address or geographic region), you can analyze your database audit logs and create a firewall rule to block them. Connection metrics provide a high-level view of database traffic, but they do not show who is accessing a database or why, which makes it difficult to distinguish legitimate from potentially malicious traffic. Audit logs, which we’ll examine in more depth next, give you more insight into the users and services accessing a database and their behaviors and actions.
Auditing and threat detection
Azure SQL Database offers several built-in security and compliance features, such as audit logging and threat detection, to help protect your databases from attack. Audit logs record actions and activity groups including:
- successful and failed logins
- executed queries and stored procedures
- changes to ownership, permissions, schemas, and database user passwords
When you enable auditing, the database instance uses a default audit policy, which you can modify to match your needs.
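If you route audit logs to a Log Analytics workspace, you can query them programmatically. Below is a hedged sketch using Python’s azure-monitor-query logs client; the workspace ID is hypothetical, and the category and column names are typical of SQL audit records in the AzureDiagnostics table but may differ in your environment.

```python
# A hedged sketch of reviewing failed logins from audit logs sent to a
# Log Analytics workspace. Workspace ID and column names are assumptions.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

client = LogsQueryClient(DefaultAzureCredential())

query = """
AzureDiagnostics
| where Category == "SQLSecurityAuditEvents"
| where succeeded_s == "false"
| summarize attempts = count() by server_principal_name_s, client_ip_s
| order by attempts desc
"""

response = client.query_workspace(
    "<log-analytics-workspace-id>", query, timespan=timedelta(days=1)
)

for table in response.tables:
    for principal, ip, attempts in table.rows:
        print(f"{principal} from {ip}: {attempts} failed logins")
```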
Threat detection alerts

When threat detection is enabled, Azure SQL Database can alert you to suspicious activity on your databases, such as potential SQL injection attacks, brute-force login attempts, or logins from unusual locations, so you can investigate and respond before the activity escalates.