Preamble
Microsoft Azure SQL Database is a platform-as-a-service (PaaS) database offering for modern cloud applications. It’s a fully managed service that runs on the most recent version of the SQL Server database engine, allowing you to build highly available and performant database instances without worrying about hardware upgrades, patches, or backups. SQL Database can store, consolidate, and manage relational and non-relational (i.e., NoSQL) data, letting you work with a variety of data formats (e.g., XML, JSON) in the same database instance.
Databases can be deployed as single database instances or as part of an elastic pool in Azure. Single databases are self-contained instances with resources suitable for applications requiring a single data source. Elastic pools are sets of databases that share resources, enabling more complex applications that handle a variety of workloads. Because pools share resources, they are well suited for grouping instances with similar usage patterns—typically low average utilization with occasional spikes—to save money and ensure the databases always have adequate resources to perform at their best. For every database, Azure provides a range of performance and security features, including automatic tuning and AI-powered performance insights.
Monitoring Azure SQL databases allows you to keep track of operating costs and spot performance issues that could jeopardize the operation of your applications. An underprovisioned database instance, for example, may be unable to process queries efficiently, resulting in slower application performance and a negative user experience. Azure SQL databases generate the following telemetry data to help you monitor performance and costs:
- Metrics that track resource utilization, database connections, available storage, deadlocks, and more.
- Resource logs, a type of Azure platform log, that record additional information about database performance, such as deadlocks.
- Audit logs, which record all database actions, including details about executed queries.
These telemetry sources give you a better picture of what’s going on in your databases, but which data you should watch depends on your database architecture. When you provision a database, you choose which pricing model and service tier to use. That choice determines how much compute, memory, I/O, and storage capacity your databases have, which directly impacts performance and cost. In this article, we’ll look at this data in more depth and see how your Azure SQL Database configuration determines which information you should monitor.
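As a concrete starting point, the sketch below shows one way to pull a couple of these platform metrics programmatically with the azure-monitor-query Python SDK. The resource ID, credentials, and metric selection are placeholders you would adapt to your own environment; treat it as a minimal sketch rather than a complete monitoring setup.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

# Hypothetical resource ID of a single database instance (placeholders).
RESOURCE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.Sql/servers/<server-name>/databases/<db-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# cpu_percent and storage_percent are two of the platform metrics discussed below.
response = client.query_resource(
    RESOURCE_ID,
    metric_names=["cpu_percent", "storage_percent"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=[MetricAggregationType.AVERAGE, MetricAggregationType.MAXIMUM],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(metric.name, point.timestamp, point.average, point.maximum)
```

The same pattern works for the other metrics covered in this article; only the metric names change.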
This tutorial uses the metric terminology established in our Monitoring 101 guide, which lays out a foundation for metric collection and alerting.
Key Azure SQL Database metrics
The primary metrics we’ll be looking at are divided into the following groups:
- Performance metrics: compute, storage, and worker utilization and limits.
- Connectivity metrics: active database connections.
We’ll also look at audit logs for Azure SQL Database, which can be used to track database activity and identify potential threats to your database instances.
Performance metrics
To accommodate different types of database workloads, Azure offers a variety of purchase models and service tiers. These allow you to specify the amount of processing power and storage your databases have, so you can better forecast expenses and assess how well your databases execute tasks and handle traffic.
Azure SQL single database instances and elastic pools offer two purchasing models: database transaction units (DTUs) and virtual cores (vCores). The DTU-based model offers bundled compute (measured in DTUs) and storage packages based on Azure’s Basic, Standard, and Premium service tiers. The tier you select determines the resource limits for your database.
The vCore model is recommended for more sophisticated workloads and offers provisioned and serverless compute tiers. With the provisioned tier, you specify the exact amount of compute resources (measured in vCores) to allocate for your workloads. The serverless tier, by contrast, auto-scales available compute within a configured range and can pause databases when they are not in use. This tier is a good fit for database instances with unpredictable usage patterns, such as frequent periods of inactivity, and for newly launched databases that have no usage history and are therefore harder to size appropriately.
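If you’re not sure which edition or compute size a database is currently running under, you can check from inside the database itself. Here is a minimal sketch using pyodbc; the connection string is a placeholder, and the printed values are examples only.

```python
import pyodbc

# Placeholder connection string; supply your own server, database, and credentials.
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<server-name>.database.windows.net,1433;"
    "Database=<db-name>;Uid=<user>;Pwd=<password>;Encrypt=yes;"
)

# DATABASEPROPERTYEX reports the edition and service objective (compute size).
row = conn.execute(
    "SELECT DATABASEPROPERTYEX(DB_NAME(), 'Edition') AS edition, "
    "DATABASEPROPERTYEX(DB_NAME(), 'ServiceObjective') AS service_objective"
).fetchone()

print(row.edition, row.service_objective)  # e.g., GeneralPurpose GP_S_Gen5_4 (serverless)
```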
Monitoring the following performance measures can help you understand your databases’ resource consumption and availability:
- Compute metrics: CPU and DTU utilization and limits
- Storage metrics: storage utilization and limits
- Request metrics: the number of database workers and sessions available to process requests
These metrics can help you ensure you’re utilizing the correct service tier for your workloads, whether using serverless databases, single database instances, elastic pools, or a mix.
Compute metrics
Name | Description | Metric type
---|---|---
CPU percentage | Percentage of vCores in use, based on the service tier | Resource: utilization
DTU percentage | Percentage of DTUs in use, based on the service tier | Resource: utilization
CPU limit | Total number of vCores allocated to a database | Resource: other
DTU limit | Total number of DTUs allocated to a database | Resource: other
App CPU percentage | Percentage of vCores in use, based on the total number of vCores allocated in the serverless tier | Resource: utilization
Memory percentage | Percentage of memory in use | Resource: utilization
App CPU billed | Amount of compute billed (in vCore seconds) | Resource: utilization
Metrics to watch: DTU and CPU percentage vs. DTU and CPU limits
Which utilization metric you should track depends on your purchase model (DTU or vCore). These metrics indicate how much of a provisioned or serverless database’s or elastic pool’s compute resources are being utilized. Monitoring utilization, and alerting when it gets too high, can help you determine whether you’re approaching the maximum number of vCores or DTUs you’ve allocated for your instance or pool (i.e., your DTU and CPU limits). Once a database reaches these limits, active queries will time out.
Inefficient database queries are one possible source of high CPU consumption. Inefficient queries consume additional resources (such as CPU, memory, and workers) and can add latency, since your application must wait for the database to execute them. If you regularly see high CPU or DTU utilization, compare it to the percentage of workers in use; if that is also high, inefficient queries may be at fault. In some cases you may need to increase the compute size of a database or pool to meet demand, or determine whether query performance needs to be improved. The Azure SQL Database resource logs provide more information about query performance, such as runtime statistics and whether the query timed out, giving you a better understanding of why a database instance is overburdened.
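You can also inspect recent compute utilization from inside the database. The sketch below queries sys.dm_db_resource_stats, which Azure SQL Database populates with roughly the last hour of resource usage at 15-second intervals; the connection string is a placeholder.

```python
import pyodbc

conn = pyodbc.connect("<connection-string>")  # placeholder

# Recent CPU, I/O, and worker utilization as percentages of the service tier limits.
query = """
SELECT TOP (20)
       end_time,
       avg_cpu_percent,
       avg_data_io_percent,
       avg_log_write_percent,
       max_worker_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;
"""

for row in conn.execute(query):
    print(row.end_time, row.avg_cpu_percent, row.max_worker_percent)
```

Comparing avg_cpu_percent against max_worker_percent over the same window is a quick way to spot the pattern described above.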
Metrics to watch: app CPU percentage and memory percentage
When you configure a serverless-tier database, you define a minimum and maximum number of vCores that can be allocated to it. The database’s vCores automatically scale based on demand while remaining within that range. This range also determines Azure’s service objective (i.e., compute size) for your database, which sets limits for other resources such as memory and workers. The app CPU percentage and memory percentage metrics represent vCore and memory consumption for a serverless database, based on the maximum vCores you’ve specified and the resulting memory limit.
Monitoring app CPU and memory percentages can show whether your serverless databases are using too many or too few resources. Low vCore utilization, for example, may signal that you have allocated too many vCores to your database and can safely scale down. To evaluate whether high usage is caused by too many running queries or by a few long-running queries that need improvement, compare the app CPU percentage to metrics such as the percentage of workers in use.
Metric to watch: app CPU billed
The app CPU billed metric is used to compute serverless database billing charges. Azure charges serverless-tier databases based on the amount of compute (i.e., CPU and memory) and provisioned storage consumed per second. App CPU billed accounts only for compute used while a serverless database is active; databases paused due to periods of inactivity are charged only for provisioned storage.
Because serverless costs vary with database activity and resource usage, it’s important to track this metric so you know how much you’re spending to host your serverless databases. If you’re experiencing higher-than-expected costs, compare app CPU billed to your database’s CPU and memory usage to determine whether there’s an underlying issue, such as inefficient queries consuming too much CPU or memory.
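As a back-of-the-envelope illustration, the sketch below converts an app CPU billed value (reported in vCore seconds) into an approximate compute charge. The unit price is a hypothetical placeholder; actual rates vary by region, hardware generation, and tier.

```python
# Hypothetical price per vCore second; look up the real rate for your region and tier.
VCORE_SECOND_PRICE_USD = 0.000145


def estimated_compute_cost(app_cpu_billed_vcore_seconds: float) -> float:
    """Approximate serverless compute charge for the billed vCore seconds."""
    return app_cpu_billed_vcore_seconds * VCORE_SECOND_PRICE_USD


# Example: the equivalent of 2 vCores busy for 6 hours in a day.
print(f"${estimated_compute_cost(2 * 6 * 3600):.2f}")
```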
Storage metrics
Name | Description | Metric type
---|---|---
Storage percentage | Percentage of database space in use, based on the service tier | Resource: utilization
XTP storage percentage | Percentage of storage in use for in-memory online transaction processing | Resource: utilization
Storage | Amount of space allocated to a database | Resource: other
Metrics to watch: storage percentage vs. storage size
You’ll need enough database storage for your applications to update or create data and function effectively. If a database does not have enough space to execute INSERT or UPDATE statements, clients will receive an error message and the data will not be updated.
Setting alerts on storage utilization (i.e., storage percentage) will help you anticipate when a database is about to run out of space. Azure SQL databases are subject to the storage caps (i.e., your storage size) defined by the DTU and vCore purchase models. If storage utilization is high, mitigation strategies such as reducing transaction log data, which can quickly consume available storage, can help reclaim space.
Comparing this metric to your database’s storage size can also help with capacity planning: the process of evaluating how many resources are required to support your workloads in order to save costs and improve performance. For example, if storage utilization remains low over time, you may be able to scale a database down to a service objective with a smaller data size limit, lowering expenses.
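To see where you stand between used space and the configured maximum, you can query the database directly. The sketch below is a rough equivalent of the storage percentage metric; the connection string is a placeholder.

```python
import pyodbc

conn = pyodbc.connect("<connection-string>")  # placeholder

# Used space (8 KB pages) vs. the database's configured maximum size.
query = """
SELECT
    SUM(used_pages) * 8.0 / 1024 AS used_space_mb,
    CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024.0 / 1024 AS max_size_mb
FROM sys.allocation_units;
"""

row = conn.execute(query).fetchone()
print(f"{row.used_space_mb:.0f} MB used of {row.max_size_mb:.0f} MB maximum "
      f"({100 * row.used_space_mb / row.max_size_mb:.1f}%)")
```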
Metric to watch: XTP storage percentage
In-memory online transaction processing (OLTP) in Azure SQL Database stores data in memory-optimized tables to improve application performance. This technology is offered in the Premium and Business Critical service tiers and is best suited for applications that need to ingest large amounts of data in a short period of time, such as trading, gaming, or IoT services.
In-memory OLTP storage is capped, just like other resources, based on DTU and vCore limits. You can monitor in-memory OLTP consumption (i.e., XTP storage percentage) to ensure that memory-optimized tables do not run out of space. If single or pooled databases exceed their in-memory OLTP storage limits, active operations (e.g., INSERT, UPDATE, CREATE) will be aborted. If this happens, you can either delete data from memory-optimized tables or upgrade your service tier to free up storage space.
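To see which memory-optimized tables are consuming that capped storage, you can query sys.dm_db_xtp_table_memory_stats, as in this minimal sketch (the connection string is a placeholder).

```python
import pyodbc

conn = pyodbc.connect("<connection-string>")  # placeholder

# Memory allocated and used per memory-optimized table, largest consumers first.
query = """
SELECT
    OBJECT_NAME(object_id) AS table_name,
    memory_allocated_for_table_kb,
    memory_used_by_table_kb
FROM sys.dm_db_xtp_table_memory_stats
ORDER BY memory_allocated_for_table_kb DESC;
"""

for row in conn.execute(query):
    print(row.table_name, row.memory_used_by_table_kb, "KB used")
```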
Request metrics
Name | Description | Metric type
---|---|---
Workers percentage | Percentage of available workers in use, based on service tier limits | Resource: utilization
Sessions percentage | Percentage of concurrent sessions in use, based on service tier limits | Resource: utilization
Deadlocks | Total number of deadlocks detected on a database | Resource: other
Metrics to watch: workers percentage and sessions percentage
Workers handle incoming database requests (queries, logins, and logouts), and sessions are active database connections. Your service tier and compute size determine the maximum number of concurrent workers and sessions each of your databases can serve. For example, on a single database instance, the Basic tier allows for 300 simultaneous connections and 30 concurrent workers. Clients will receive an error notice if your database runs out of workers to process new requests, and the database will reject further queries until workers become available. Similarly, additional connections are rejected when the database’s session limit is reached.
Monitoring session and worker usage alongside CPU consumption can help ensure that your databases are always able to process requests and can reveal whether database queries need to be optimized. For example, high utilization of all three resources could suggest that your database is under-provisioned, increasing the time required to process a request. In that case, you may need to upgrade to a higher service tier to ensure that queries execute promptly and that tier limits are not exceeded.
High worker or session consumption can also result from a database attempting to process too many inefficient queries. These kinds of queries can tie up many resources and prevent workers from picking up new requests. For example, a service might dispatch multiple workers to retrieve data for a single end-user request: one to fetch the initial record and another to fetch its details. This pattern is inefficient and can add network and downstream server latency. Such workflows can be made more efficient by consolidating the two queries into a stored procedure, reducing the number of active workers dedicated to a single request, as in the sketch below.
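The following is a hedged sketch of that consolidation using pyodbc. The table, column, and procedure names (dbo.orders, dbo.order_lines, dbo.get_order_with_details) are illustrative only; the point is that one call returns both result sets, so a single worker handles the whole operation.

```python
import pyodbc

conn = pyodbc.connect("<connection-string>")  # placeholder

# Hypothetical procedure that returns the order and its line items in one round trip.
conn.execute("""
CREATE OR ALTER PROCEDURE dbo.get_order_with_details
    @order_id int
AS
BEGIN
    SELECT order_id, customer_id, created_at
    FROM dbo.orders
    WHERE order_id = @order_id;

    SELECT line_id, product_id, quantity, unit_price
    FROM dbo.order_lines
    WHERE order_id = @order_id;
END
""")
conn.commit()

# One request, one worker: both result sets come back together.
cursor = conn.execute("EXEC dbo.get_order_with_details @order_id = ?", 42)
order = cursor.fetchall()
cursor.nextset()
lines = cursor.fetchall()
```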
Metric to watch: deadlocks
A database deadlock occurs when a transaction (a single unit of work, such as a request to connect to a database) holds a lock on a resource (e.g., a key, row, page, or table) that another transaction requires, causing both transactions to block each other. When this happens, the SQL Server database engine underlying Azure SQL Database terminates one transaction so the other can continue. An application can retry the terminated transaction once the other transaction has completed.
If you notice a spike in deadlocks, look into which queries are generating the problem. You can find more information about where the lock occurred in Azure’s resource logs, and you can use Azure’s Query Performance Insight to identify the specific queries that caused the deadlocks and review recommendations for optimizing them. Recommendations may include steps like breaking a lengthy transaction into several smaller ones to reduce the amount of time a query holds a lock on a database resource.
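On the application side, a simple retry loop covers the guidance above about retrying terminated transactions. In SQL Server and Azure SQL Database, a deadlock victim receives error 1205; the sketch below (the connection string and UPDATE statement are placeholders) detects that error heuristically from the exception text and retries with a short backoff.

```python
import time

import pyodbc

DEADLOCK_ERROR_CODE = "1205"  # SQL Server error number for a deadlock victim


def run_with_retry(conn, sql, params=(), attempts=3):
    """Execute a statement, retrying briefly if it is chosen as a deadlock victim."""
    for attempt in range(attempts):
        try:
            conn.execute(sql, *params)
            conn.commit()
            return
        except pyodbc.Error as exc:
            conn.rollback()
            if DEADLOCK_ERROR_CODE in str(exc) and attempt < attempts - 1:
                time.sleep(0.5 * (attempt + 1))  # brief backoff before retrying
                continue
            raise


conn = pyodbc.connect("<connection-string>", autocommit=False)  # placeholder
run_with_retry(conn,
               "UPDATE dbo.inventory SET quantity = quantity - 1 WHERE sku = ?",
               ("ABC-123",))
```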
By monitoring key performance metrics, you can spot early indicators of database trouble and determine whether performance degradations are caused by inefficient queries or by instances that lack the compute or storage to handle their workloads. Next, we’ll look at several critical connectivity metrics that can help you monitor database connections and detect potentially dangerous activity on a database instance.
Connectivity metrics
A database server manages each database instance or pool in your environment, acting as an administrative hub for logins, firewall and threat-detection policies, audit rules, and more. This allows you to keep track of database connections and apply appropriate access controls to protect databases from unusual activity such as unauthorized access and SQL injection.
When a client makes a database request, a gateway redirects or proxies traffic to the appropriate database cluster; once inside the cluster, traffic is directed to the appropriate database. Gateways support redirect and proxy connection policies. With the redirect policy, which is recommended for lower latency and is the default for traffic originating within Azure, the gateway routes the initial client connection to the node hosting the database and then routes subsequent connections directly to the appropriate cluster. The proxy policy, the default for connections originating outside of Azure, sends all traffic through the gateway.
Azure generates the following metrics to help you monitor database traffic and verify that the gateway is routing connections to the correct databases:
Name | Description | Metric type
---|---|---
Connection failed | Total number of failed connections to a database at a given point in time | Other
Blocked by firewall | Total number of connections to a database blocked by your server’s firewall at a given point in time | Other
Connection successful | Total number of successful connections to a database at a given point in time | Resource: utilization
Metrics to watch: failed connections and connections blocked by firewall
Azure keeps track of both failed connections and connections blocked by the firewall. Failed connections can be caused by transient faults or by a database hitting its resource limits, so setting up an alert to warn you of anomalies in this metric will help you rectify a problem before it affects users. An increase in blocked connections, on the other hand, could indicate a misconfigured firewall policy. You can troubleshoot further by looking through your audit logs to see who is attempting to connect to a database and whether or not they should have access.
Metric to watch: successful connections
A sudden drop in the number of successful connections may mean that services can no longer query your databases. To find the root cause, compare this metric to the number of failed connections or connections blocked by the firewall. If the number of blocked connections is significant, for example, a firewall or network misconfiguration could be the source of the problem.
A significant spike in the number of successful connections, on the other hand, could be a sign of malicious activity, such as a distributed denial of service (DDoS) attack, which tries to exhaust application resources by flooding them with requests. You can examine your database audit logs to determine the source of these attacks (e.g., IP address, geographic region) and then adjust your firewall rules to shut them out.
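As one way to act on that, the sketch below manages database-level firewall rules with the built-in sp_set_database_firewall_rule and sp_delete_database_firewall_rule procedures. Note that Azure SQL Database firewall rules are allow-lists: you "block" traffic by ensuring no rule covers the offending addresses. The rule names and IP ranges here are placeholders.

```python
import pyodbc

conn = pyodbc.connect("<connection-string>")  # placeholder

# Allow only a known, trusted address range (placeholder values).
conn.execute(
    "EXEC sp_set_database_firewall_rule N'trusted-clients', '203.0.113.0', '203.0.113.255';"
)

# Remove an overly broad rule that was letting the suspicious traffic through.
conn.execute("EXEC sp_delete_database_firewall_rule N'allow-all';")
conn.commit()
```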
Connection metrics give you a high-level picture of database traffic, but they don’t tell you who is accessing a database or why. That makes it harder to determine whether database traffic is coming from legitimate or potentially malicious sources. Audit logs, which we’ll look at next, provide extra context about the users and services accessing a database and the actions they take.
Threat detection and auditing
To help protect databases from attack, Azure SQL Database includes various built-in security and compliance capabilities, such as audit logging and threat detection. Audit logs record database actions and action groups, including:
- successful and unsuccessful logins
- executed queries and stored procedures
- changes to database ownership, permissions, schemas, and user passwords
When you enable auditing on a database instance, it uses a default audit policy, but you can customize it to meet your needs.
Alerts for threat detection
Audit logs provide richer information about potential security issues, and you can combine them with advanced features such as threat detection alerts to help neutralize attacks faster. These alerts display the client’s IP address, the name of the database accessed, the time of access, and whether or not the database contains sensitive data.
Better visibility into your Azure SQL databases
In this post, we looked at the key metrics and logs you can monitor to fully understand the health and performance of your Azure SQL databases.
About Enteros
IT organizations routinely spend days and weeks troubleshooting production database performance issues across multitudes of critical business systems. Fast and reliable resolution of database performance problems by Enteros enables businesses to generate and save millions of direct revenue, minimize waste of employees’ productivity, reduce the number of licenses, servers, and cloud resources and maximize the productivity of the application, database, and IT operations teams.
The views expressed on this blog are those of the author and do not necessarily reflect the opinions of Enteros Inc. This blog may contain links to the content of third-party sites. By providing such links, Enteros Inc. does not adopt, guarantee, approve, or endorse the information, views, or products available on such sites.