Azure SQL DB
Lesson Learned #533: Intermittent Azure SQL Database Connectivity and Authentication Issues
While working on a recent service request, we helped a customer troubleshoot intermittent connection and authentication failures when accessing Azure SQL Database with Microsoft Entra ID authentication from a Java application using HikariCP with JDBC/ODBC. They received the following error:

com.zaxxer.hikari.pool.HikariPool$PoolInitializationException: Failed to initialize pool: Failed to authenticate. Request was throttled according to instructions from STS. Retry in 29701 ms.
java.sql.SQLTransientConnectionException: HikariPool-application1 - Connection is not available, request timed out after

The first insight came from the error message itself: Request was throttled according to instructions from STS. Retry in 29701 ms. This message is returned by the Microsoft Entra ID Security Token Service (STS) when the client sends too many token requests in a short period of time, exceeding the allowed threshold. We don't have all the internal details, but in high-concurrency environments (for example, many threads or a large connection pool), each thread may independently request a new token, and if the connection pool retries frequently or authentication fails, the number of token requests can spike until the service limit is reached. This is exactly what happens when HikariCP tries to initialize or refresh connections quickly: many threads attempt to connect at once, all of them trigger token requests simultaneously, and STS throttling kicks in.

There are several ways to avoid this situation: ensure the application caches tokens and reuses them across threads, use Managed Identity, honor the retry-after delay, or, depending on the HikariCP configuration, pre-warm connections gradually (see the configuration sketch below). Discussing the throttling thresholds with your Entra ID administrators is another option.
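As a minimal illustration of the pool-side mitigation, the following HikariCP settings reduce the burst of simultaneous pool initializations, and therefore the burst of token requests. These are standard HikariCP properties, but the values are assumptions to illustrate the idea, not settings taken from the original case:

# Illustrative HikariCP settings (values are assumptions; tune for your workload)
# Cap the pool size to bound the number of concurrent token requests
maximumPoolSize=10
# Let the pool grow on demand instead of opening every connection at startup
minimumIdle=2
# Allow slow authentications to finish before the pool gives up (milliseconds)
connectionTimeout=30000
# A negative value disables fail-fast initialization, so connections are created lazily
initializationFailTimeout=-1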
Lessons Learned #534: Azure SQL Database Connections with Managed Identity and Python ODBC

We worked on a service request in which our customer was trying to enable their Python application, hosted on Azure App Service, to connect securely to Azure SQL Database using a user-assigned managed identity. They attempted to use the Microsoft ODBC Driver for SQL Server with the managed identity for authentication. During our troubleshooting we found several issues and error messages caused by incorrect settings in the connection string:

The initial connection string used the 'ActiveDirectoryInteractive' authentication method, which is not compatible with managed identities. The correct approach is to use 'ActiveDirectoryMsi' or 'ActiveDirectoryManagedIdentity' for system- or user-assigned managed identities.

Switching to 'ActiveDirectoryMsi' led to a pyodbc error: pyodbc.Error: ('FA005', '[FA005] [Microsoft][ODBC Driver 18 for SQL Server]Cannot use Access Token with any of the following options: Authentication, Integrated Security, User, Password. (0) (SQLDriverConnect)'). The FA005 error indicated a mismatch between the use of an access token and the connection string properties. Specifically, when passing an access token, the connection string must not include conflicting authentication parameters such as User, Password, or Integrated Security.
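For reference, a connection string that works with a user-assigned managed identity looks like the following sketch. The server, database, and client id are placeholders of mine, not values from the case; the key points are that the managed identity's client id is passed as UID, and no Password, Integrated Security, or externally supplied access token is combined with the Authentication keyword (for a system-assigned identity, omit UID entirely):

Driver={ODBC Driver 18 for SQL Server};Server=tcp:yourserver.database.windows.net,1433;Database=yourdatabase;Authentication=ActiveDirectoryMsi;UID=<client-id-of-the-user-assigned-identity>;Encrypt=yes;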
Lesson Learned #532: Power BI Refresh Failure Due to Connection Pool Exhaustion in Azure SQL Database

We worked on a service request in which a customer experienced frequent failures when refreshing Power BI reports connected to an Azure SQL Database. The error message indicated a problem acquiring connections from the data source pool, leading to unsuccessful report refreshes:

A request for a connection from the data source pool could not be granted. Retrying the evaluation may solve the issue. The exception was raised by the IDbCommand interface.

Analyzing the details of the case, we found that the issue occurred regardless of whether Entra ID or SQL authentication was used. There was no problem at the Azure SQL Database level and no login errors, but we identified a high number of simultaneous connection attempts from Power BI to Azure SQL Database. We also reviewed the Power BI Desktop configuration and noted that it loads multiple tables in parallel during refresh operations. This behavior triggers a surge of concurrent connections to the database, which in this scenario exhausted the connection pool at the application layer.

We suggested reducing the parallel table loading setting in Power BI Desktop (File > Options and settings > Options > Data Load, under "Parallel loading of tables"), and later adjusting the setting upward again to find a balance between performance and stability.
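To observe this kind of connection surge from the database side, a query like the following can help. It is a generic sketch, not taken from the original case; it groups the current user sessions by the connecting application:

SELECT program_name,
       COUNT(*) AS session_count,
       SUM(CASE WHEN status = 'running' THEN 1 ELSE 0 END) AS running_sessions
FROM sys.dm_exec_sessions
WHERE is_user_process = 1
GROUP BY program_name
ORDER BY session_count DESC;

Running it during a report refresh makes the spike of Power BI sessions visible and helps confirm whether the pool exhaustion correlates with parallel table loading.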
Lesson Learned #531: Scalar UDF vs Parallelism

Last week I worked on a support case where our customer reported that the exact same query, executed against two identical databases with the same resources, was taking significantly longer on one of them. Both databases had the same number of rows, up-to-date statistics, and identical indexes. We started by collecting the execution plans, and I'd like to share what we found.

Comparing both execution plans, in the XML of the slower plan we found the following line:

<QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="TSQLUserDefinedFunctionsNotParallelizable">

However, in the XML of the faster plan we found:

<QueryPlan DegreeOfParallelism="1" ContainsInlineScalarTsqlUdfs="true">

So, based on this difference, it is clear that the query uses a scalar UDF in both databases, but in one of them the definition of that scalar UDF makes a parallel plan impossible, while in the other it does not. As both databases use the same compatibility level (160), we analyzed what differs between them; here is an example that reproduces the behavior.

DROP TABLE IF EXISTS dbo.TestData;
GO
CREATE TABLE dbo.TestData
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    Value1 INT,
    Value2 INT
);

INSERT INTO dbo.TestData (Value1, Value2)
SELECT ABS(CHECKSUM(NEWID()) % 10000), ABS(CHECKSUM(NEWID()) % 10000)
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
WHERE a.object_id < 150 AND b.object_id < 150;

Let's create a scalar function that blocks parallel execution. It calls GETDATE() and SYSDATETIME(), and time-dependent intrinsic functions like these make a scalar UDF ineligible for inlining:

CREATE OR ALTER FUNCTION dbo.fn_BlockParallel (@v1 INT)
RETURNS INT
AS
BEGIN
    DECLARE @x INT;
    SELECT @x = DATEDIFF(MILLISECOND, GETDATE(), SYSDATETIME());
    RETURN ISNULL(@x, 0);
END;

When I executed the following query, I saw this in the plan XML:

<QueryPlan DegreeOfParallelism="0" NonParallelPlanReason="TSQLUserDefinedFunctionsNotParallelizable" CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="216">

SELECT ID, dbo.fn_BlockParallel(Value1)
FROM dbo.TestData
WHERE Value1 > 100
OPTION (MAXDOP 4);
GO

If I replace it with a scalar UDF that avoids those constructs, I see:

<QueryPlan DegreeOfParallelism="1" CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="272" ContainsInlineScalarTsqlUdfs="true">

CREATE OR ALTER FUNCTION dbo.fn_BlockParallel (@v1 INT)
RETURNS INT
AS
BEGIN
    DECLARE @x INT;
    SELECT @x = @v1 * 2;
    RETURN @x;
END;

So, even at compatibility level 160, certain constructs inside scalar UDFs can prevent inlining, which in turn blocks query parallelism. When performance varies between environments, one of the things to check is whether scalar UDFs are involved and whether they are eligible for inlining. To detect the issue quickly, look at the execution plan XML and check the DegreeOfParallelism, ContainsInlineScalarTsqlUdfs, and NonParallelPlanReason attributes; a quicker metadata check is sketched below.
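As a shortcut that avoids reading plan XML, the inlining eligibility of a scalar UDF is also exposed in catalog metadata. This is a small sketch using sys.sql_modules (the is_inlineable and inline_type columns are available in SQL Server 2019+ and Azure SQL Database):

SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id) AS function_name,
       is_inlineable,   -- 1 = the UDF is eligible for scalar UDF inlining
       inline_type      -- 1 = inlining is currently turned on for this module
FROM sys.sql_modules
WHERE object_id = OBJECT_ID('dbo.fn_BlockParallel');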
Lesson Learned #530: Comparing Execution Plans to Expose a Hidden Performance Anti-Pattern

One of the most powerful features of SSMS Copilot is how it lets you compare execution plans and immediately surfaces performance issues. In this case, I would like to share my lesson learned from comparing two queries that behave very differently inside the engine. We have the following queries against a table, _x_y_z_MS_HighCPU, that contains 4 million rows. The column TextToSearch is varchar(200):

-- Query 1
SELECT COUNT(*) FROM [MSxyzTest].[_x_y_z_MS_HighCPU] WHERE TextToSearch = N'Value: 9';

-- Query 2
SELECT COUNT(*) FROM [MSxyzTest].[_x_y_z_MS_HighCPU] WHERE TextToSearch = 'Value: 9';

Query 1 compares a varchar column against a Unicode literal (N'...'), which forces an implicit conversion of the column and can prevent an efficient index seek; Query 2 compares like with like. Since the query texts are different, each will have a different query ID in Query Store. By running the following T-SQL, I can identify the query IDs:

SELECT qsqt.query_sql_text, qsq.query_id, qsp.plan_id, qsp.query_plan_hash, qsp.last_execution_time
FROM sys.query_store_query_text qsqt
JOIN sys.query_store_query qsq ON qsqt.query_text_id = qsq.query_text_id
JOIN sys.query_store_plan qsp ON qsq.query_id = qsp.query_id
WHERE qsqt.query_sql_text LIKE '%SELECT COUNT(*)%' -- FROM [[MSxyzTest]].[[_x_y_z_MS_HighCPU]]%'
ORDER BY qsp.last_execution_time DESC;

Queries 1 and 2 can then be compared directly. Using Copilot, I ran the following prompt:

Compare the execution plans for the two queries (query id 1 and query id 2) using Query Store. Highlight any differences in operators, estimated vs actual row counts, or implicit conversions.

Running a second prompt:

CPU Usage: Please show the top resource-consuming queries in the current database using Query Store data. Include query text, execution count, duration, CPU time, and logical reads.

We could see the impact of using the anti-pattern.
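To hunt for this pattern across a whole workload without Copilot, one rough heuristic is to search the stored Query Store plan text for implicit conversions. This is a sketch of mine, not from the original post, and plain text matching can return false positives, so treat the results as candidates to review:

SELECT qsq.query_id, qsp.plan_id, qsqt.query_sql_text
FROM sys.query_store_plan qsp
JOIN sys.query_store_query qsq ON qsq.query_id = qsp.query_id
JOIN sys.query_store_query_text qsqt ON qsqt.query_text_id = qsq.query_text_id
WHERE qsp.query_plan LIKE '%CONVERT_IMPLICIT%';  -- plans containing implicit conversions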
Lesson Learned #528: Arithmetic overflow error converting IDENTITY to data type int

Some days ago, we were working on a service request where our customer was experiencing an issue with their application while reading a CSV file from Azure Blob Storage and writing the data into an Azure SQL Database table. They discovered that thousands of rows were being skipped. While analyzing the issue, we identified the following error message: Arithmetic overflow error converting IDENTITY to data type int. After investigation, we found the root cause: the target table in production had an INT-typed IDENTITY column, and the identity value had reached the maximum value of an INT (2,147,483,647). The application was configured with SET IDENTITY_INSERT ON, so it was trying to explicitly insert identity values from the CSV file. When it attempted to insert a value above the INT limit, it failed with an arithmetic overflow.

How to reproduce the issue in SQL:

CREATE TABLE dbo.TestDemo
(
    ID INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(100)
);

-- Set the identity value to the maximum possible value for INT
-- Only for testing...
DBCC CHECKIDENT ('dbo.TestDemo', RESEED, 2147483646);

SET IDENTITY_INSERT dbo.TestDemo ON;
INSERT INTO dbo.TestDemo (ID, Name) VALUES (2147483647, 'Last Row');  -- succeeds: this is the INT maximum
INSERT INTO dbo.TestDemo (ID, Name) VALUES (2147483648, 'Overflow');  -- fails with the arithmetic overflow error
SET IDENTITY_INSERT dbo.TestDemo OFF;
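To catch this before it happens, identity columns approaching their limit can be found through the catalog views. This is a generic sketch (the 90 percent threshold is an arbitrary choice of mine):

SELECT OBJECT_SCHEMA_NAME(ic.object_id) AS schema_name,
       OBJECT_NAME(ic.object_id) AS table_name,
       ic.name AS column_name,
       t.name AS data_type,
       CAST(ic.last_value AS BIGINT) AS last_value  -- NULL until the identity has been used at least once
FROM sys.identity_columns ic
JOIN sys.types t ON t.user_type_id = ic.user_type_id
WHERE t.name = 'int'
  AND CAST(ic.last_value AS BIGINT) > CAST(0.9 * 2147483647 AS BIGINT);  -- within 10% of the INT maximum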
Lesson Learned #359: TCP Provider: Error code 0x68 (104) (SQLExecDirectW)

Today, we got a service request where our customer faced the following error message connecting to the database: (pyodbc.OperationalError) ('08S01', '[08S01] [Microsoft][ODBC Driver 18 for SQL Server]TCP Provider: Error code 0x68 (104) (SQLExecDirectW)'). This customer is using Python in a Linux environment. On Linux, error code 104 (0x68) corresponds to ECONNRESET, meaning the TCP connection was reset by the peer while the command was executing. Below I would like to share my lessons learned about this error message.
Lesson Learned #527: Calling Azure OpenAI with Managed Identity via sp_invoke_external_rest_endpoint

A day ago, I was working on a service request where our customer got the following error message:

{"response":{"status":{"http":{"code":401,"description":""}},"headers":{"Date":"Mon, 07 Jul 2025 19:36:30 GMT","Content-Length":"297","Content-Type":"application\/json","apim-request-id":"cabfb91a-5ede-4371-91d5-XXX","x-ms-client-request-id":"Not-Set","strict-transport-security":"max-age=31536000; includeSubDomains; preload","x-content-type-options":"nosniff"}},"result":{"error":{"code":"PermissionDenied","message":"The principal `XXXX-YYYY-4d9e-8e70-13c98fb84e7a` lacks the required data action `Microsoft.CognitiveServices/accounts/OpenAI/deployments/chat/completions/action` to perform `POST /openai/deployments/{deployment-id}/chat/completions` operation."}}}

Below I would like to share my experience resolving this issue. The first step was to reproduce the issue in our lab, so I began integrating Azure OpenAI with Azure SQL Database to perform sentiment analysis on fictitious feedback. I created a SQL table called CustomersInfo that contains fictitious customer feedback:

CREATE TABLE CustomersInfo (
    CustomerID INT PRIMARY KEY IDENTITY(1,1),
    Name NVARCHAR(100),
    Feedback NVARCHAR(MAX),
    Sentiment NVARCHAR(20) NULL
);

INSERT INTO CustomersInfo (Name, Feedback) VALUES
('Anna', 'The product arrived damaged and no one responded to my messages.'),
('John', 'I loved the service, it was fast and the product is excellent.'),
('Emily', 'It was okay, but I think packaging could be better.'),
('David', 'I will never buy here again, terrible service.'),
('Sophia', 'Everything was perfect, thank you for the follow-up.'),
('Michael', 'Delivery time was average, but the product did not meet expectations.'),
('Laura', 'Great overall experience, I would recommend it.'),
('James', 'I expected more quality for the price I paid.'),
('Isabella', 'Easy to order, great customer support.'),
('Robert', 'I didn’t like it, but at least they gave me a refund.');

I configured Azure OpenAI and permissions by creating an Azure OpenAI resource:

Endpoint: https://openaiexample.openai.azure.com
Model: gpt-4
Roles granted to the Entra ID user connecting to the database:
- Cognitive Services OpenAI User
- Cognitive Services User

I then enabled the SQL database to call the OpenAI endpoint using Managed Identity:

CREATE DATABASE SCOPED CREDENTIAL [https://openaiexample.openai.azure.com]
WITH IDENTITY = 'Managed Identity',
     SECRET = '{"resourceid":"https://cognitiveservices.azure.com"}';

DECLARE @response NVARCHAR(MAX);
DECLARE @payload NVARCHAR(MAX) = '{
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Classify the sentiment of the following customer feedback as Positive, Negative, or Neutral. Feedback: The product arrived damaged and no one responded to my messages." }
  ],
  "max_tokens": 50
}';

EXEC sp_invoke_external_rest_endpoint
    @url = 'https://openaiexample.openai.azure.com/openai/deployments/gpt-4.1-jmjurado/chat/completions?api-version=2025-01-01-preview',
    @method = 'POST',
    @credential = [https://openaiexample.openai.azure.com],
    @payload = @payload,
    @response = @response OUTPUT;

DECLARE @json NVARCHAR(MAX) = @response;
SELECT JSON_VALUE(c.value, '$.message.content') AS Respuesta
FROM OPENJSON(@json, '$.result.choices') AS c;

To encapsulate this logic, we created a stored procedure:

CREATE OR ALTER PROCEDURE AnalizarSentimiento
    @Texto NVARCHAR(MAX),
    @Sentimiento NVARCHAR(50) OUTPUT
AS
BEGIN
    DECLARE @response NVARCHAR(MAX);
    DECLARE @payload NVARCHAR(MAX) = '{
      "messages": [
        { "role": "system", "content": "You are a helpful assistant." },
        { "role": "user", "content": "Classify the sentiment of the following customer feedback as Positive, Negative, or Neutral. Feedback: ' + @Texto + '" }
      ],
      "max_tokens": 50
    }';

    EXEC sp_invoke_external_rest_endpoint
        @url = 'https://openaiexample.openai.azure.com/openai/deployments/gpt-4.1-jmjurado/chat/completions?api-version=2025-01-01-preview',
        @method = 'POST',
        @credential = [https://openaiexample.openai.azure.com],
        @payload = @payload,
        @response = @response OUTPUT;

    SELECT @Sentimiento = JSON_VALUE(c.value, '$.message.content')
    FROM OPENJSON(@response, '$.result.choices') AS c;
END;

Now I was ready to execute the procedure and retrieve the data:

DECLARE @Sentimiento NVARCHAR(50);
EXEC AnalizarSentimiento
    @Texto = 'The product arrived damaged and no one responded to my messages.',
    @Sentimiento = @Sentimiento OUTPUT;
SELECT @Sentimiento AS Resultado;

However, I got the same 401 PermissionDenied response shown at the beginning of this post. Analyzing the error message, it appeared that the principal used by this Managed Identity to perform the call did not have access to the endpoint. Even adding the client_id keyword didn't resolve the issue:

CREATE DATABASE SCOPED CREDENTIAL [https://openaiexample.openai.azure.com]
WITH IDENTITY = 'Managed Identity',
     SECRET = '{"client_id":"YYYYY-ZZZZ-4b75-XXX-9ab8e8c14c1e", "resourceid":"https://cognitiveservices.azure.com"}';

After several troubleshooting steps and by analyzing the error message, I identified that the principal id referenced in the response belongs to the system-assigned managed identity of the Azure SQL server. Once I granted the necessary permissions to that identity, I was able to connect and successfully perform the operation.
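With the permission fixed, the procedure can be applied to every row of the table. The following cursor-based loop is a sketch of mine built only on the objects defined above, not part of the original post; note that each iteration makes one REST call, so it consumes OpenAI tokens per row:

DECLARE @id INT, @feedback NVARCHAR(MAX), @sentiment NVARCHAR(50);

DECLARE feedback_cursor CURSOR FOR
    SELECT CustomerID, Feedback FROM CustomersInfo WHERE Sentiment IS NULL;

OPEN feedback_cursor;
FETCH NEXT FROM feedback_cursor INTO @id, @feedback;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- One REST call per row via the stored procedure defined above
    EXEC AnalizarSentimiento @Texto = @feedback, @Sentimiento = @sentiment OUTPUT;

    UPDATE CustomersInfo SET Sentiment = @sentiment WHERE CustomerID = @id;

    FETCH NEXT FROM feedback_cursor INTO @id, @feedback;
END

CLOSE feedback_cursor;
DEALLOCATE feedback_cursor;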
Lesson Learned #525: Tracking Command Timeouts in Azure SQL: Beyond Query Store with Extended Events

A few days ago, we were working on a support case where our customer was intermittently experiencing command timeouts. What made the case interesting was that queries which usually completed in under one second suddenly started taking more than 10 seconds to execute. Since the application — developed in Python using the ODBC Driver 18 for SQL Server — had a command timeout set to 5 seconds, the following error was triggered every time the threshold was exceeded:

Error executing command, retrying in 5 seconds. Attempt 1 of 3 with new timeout 5. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)')

The application had built-in retry logic, dynamically increasing the timeout on each of the three retry attempts, to give the query time to complete and to log enough data for post-error analysis. Example logs from the retry logic:

(RunCommandTimeout) - Thread: 39808 - Error executing command, retrying in 5 seconds. Attempt 1 of 3 with new timeout 5. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)')
INFO:root:Connecting to the DB jmjuradotestdb1 - Thread id 39808 - (Attempt 1/3)
INFO:root:Connected to the Database in jmjuradotestdb1 - Thread id 39808 - 0.0445 seconds
---
(RunCommandTimeout) - Thread: 39808 - Error executing command, retrying in 9 seconds. Attempt 2 of 3 with new timeout 9. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)')
INFO:root:Connecting to the DB jmjuradotestdb1 - Thread id 39808 - (Attempt 1/3)
INFO:root:Connected to the Database in jmjuradotestdb1 - Thread id 39808 - 0.0532 seconds
---
(RunCommandTimeout) - Thread: 39808 - Error executing command, retrying in 13 seconds. Attempt 3 of 3 with new timeout 13. Error: ('HYT00', '[HYT00] [Microsoft][ODBC Driver 18 for SQL Server]Query timeout expired (0) (SQLExecDirectW)')
(RunCommandTimeout) - Thread: 39808 - Loop:2/5 Execution Time: 9.7537 seconds

My first prompt using SSMS Copilot was: "Review the queries that experienced a command timeout or were aborted in the last 30 minutes. Include query text, query id, duration, and the reason and code for the abort if available." The results all pointed to query 216 experiencing command timeouts. My next question was, for query ID 216, to show the total number of executions; it reported 28. But this number didn't match the counts of aborted and non-aborted executions observed in the application logs. Why the difference? Checking the table sys.query_store_runtime_stats, I found 10 rows, all with execution_type = 3, and 28 total executions. Query Store aggregates execution data over a fixed interval, so the execution_type value only indicates that at least one execution during that runtime interval was aborted — some of the executions in those intervals were aborted and others were not (a minimal breakdown query is sketched below).
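For reference, this is the kind of per-interval breakdown that exposes the aggregation. It is a generic sketch; query id 216 is the one from this case, and avg_duration is reported by Query Store in microseconds:

SELECT rs.runtime_stats_interval_id,
       rs.execution_type,
       rs.execution_type_desc,
       rs.count_executions,
       rs.avg_duration / 1000.0 AS avg_duration_ms
FROM sys.query_store_runtime_stats rs
JOIN sys.query_store_plan p ON p.plan_id = rs.plan_id
WHERE p.query_id = 216
ORDER BY rs.runtime_stats_interval_id;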
To obtain a more granular and accurate picture, I created an Extended Events session that captures attention events (the event raised when a client aborts a command), with a ring_buffer target:

CREATE EVENT SESSION [CommandAborted] ON DATABASE
ADD EVENT sqlserver.attention(
    ACTION (
        sqlserver.client_app_name,
        sqlserver.client_hostname,
        sqlserver.username,
        sqlserver.database_name,
        sqlserver.sql_text
    )
)
ADD TARGET package0.ring_buffer
WITH (MAX_MEMORY = 4096KB, EVENT_RETENTION_MODE = ALLOW_SINGLE_EVENT_LOSS);
GO

ALTER EVENT SESSION [CommandAborted] ON DATABASE STATE = START;

After reproducing the command timeout scenario again, I was able to see only the aborted executions: of the 28 executions, 7 were aborted.

WITH RingBufferXML AS (
    SELECT CAST(t.target_data AS XML) AS target_data
    FROM sys.dm_xe_database_session_targets t
    JOIN sys.dm_xe_database_sessions s ON t.event_session_address = s.address
    WHERE t.target_name = 'ring_buffer'
      AND s.name = 'CommandAborted'
)
SELECT
    x.value('@name', 'varchar(50)') AS event_name,
    x.value('@timestamp', 'datetime2') AS event_time,
    x.value('(action[@name="client_app_name"]/value)[1]', 'nvarchar(256)') AS client_app_name,
    x.value('(action[@name="sql_text"]/value)[1]', 'nvarchar(max)') AS sql_text,
    x.value('(data[@name="duration"]/value)[1]', 'bigint') AS duration_microseconds,
    CAST(x.value('(data[@name="duration"]/value)[1]', 'bigint') / 1000000.0 AS decimal(10,3)) AS duration_seconds
FROM RingBufferXML
CROSS APPLY target_data.nodes('//event') AS tab(x)
WHERE x.value('@name', 'varchar(50)') = 'attention'
  AND x.value('(action[@name="client_app_name"]/value)[1]', 'nvarchar(256)') = 'TEST-DataCon'
ORDER BY event_time DESC;
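One housekeeping detail worth adding (standard Extended Events cleanup, not part of the original post): stop and drop the session once the investigation is finished, so it does not keep consuming memory for the ring buffer:

ALTER EVENT SESSION [CommandAborted] ON DATABASE STATE = STOP;
GO
DROP EVENT SESSION [CommandAborted] ON DATABASE;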
Lesson Learned #524: Optimizing Power BI with DirectQuery

In some situations, customers using Power BI with DirectQuery report performance issues depending on how the query has been defined by Power BI. At DataCon 2025 this June in Seattle, I had the great opportunity to present some performance recommendations in this area, based on the following articles that we published on our blog some time ago:

Lesson Learned #247: All started with the phrase: In PowerBI Direct Query is slow - Indexed views | Microsoft Community Hub
Lesson Learned #249: All started with the phrase: In PowerBI Direct Query is slow - Partitioned table | Microsoft Community Hub
Lesson Learned #250: All started with the phrase: In PowerBI Direct Query is slow - ColumnStore Index | Microsoft Community Hub

In this folder you can find all the materials that we used to deliver the session. This lab helps us better understand where performance gains can be achieved in our database, making it easier to identify what to optimize and how. Also, using the Copilot feature added in SQL Server Management Studio v21, I would like to share some prompts that we used during the lab and that were very useful during the troubleshooting scenario, divided into three areas:

Analysis Phase:
- List all tables in the 'Fact' and 'Dimension' schemas with space usage in MB and number of rows.
- List all tables in the 'Fact' and 'Dimension' schemas with their structure, including data types, primary keys, foreign keys and indexes. Then provide optimization suggestions for DirectQuery scenarios in Power BI.
- Show the name of the tables and their relation among them.
- List all foreign key relationships between tables in the 'Fact' and 'Dimension' schemas, showing the cardinality and referenced columns.
- Could you please let me know what is the meaning of every table?
- Describe all schemas in this database, listing the number of tables and views per schema.
- Create a textual data model (ER-style) representation showing how all Fact and Dimension tables are connected.

Maintenance Plans:
- List all statistics in the database that have not been updated in the last 7 days, showing table name, number of rows, and last update date.
- List all indexes in the database with fragmentation higher than 30%. Provide the T-SQL to rebuild each table in the 'Dimension' and 'Fact' schemas in ONLINE mode, and another T-SQL statement for updating automatic statistics.
- List all tables with allocated space but zero rows, or with excessive reserved space not used by actual data.

Performance Troubleshooting Phase:
- I have this query, what are the improvements for better performance that we could apply?
- Please simplify the query and explain it.
- Explain in plain language what the following SQL query does, including the purpose of each subquery and the final WHERE clause.
- Show a histogram of data distribution for key columns used in joins or filters, such as SaleDate, ProductCategory, or Region.
- Can this query be transformed into a schemabound indexed view that pre-aggregates the sales by [Fiscal Month Label] to improve DirectQuery performance? (A sketch of such a view follows below.)
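As an illustration of that last prompt, here is a minimal sketch of a schemabound indexed view. The table and column names (Fact.Sale, Dimension.[Date], [Fiscal Month Label], Quantity, [Invoice Date Key], [Date Key]) are assumptions about the lab's schema, not objects confirmed by the original post:

CREATE VIEW Fact.vw_SalesByFiscalMonth
WITH SCHEMABINDING
AS
SELECT d.[Fiscal Month Label],
       SUM(s.Quantity) AS TotalQuantity,
       COUNT_BIG(*) AS RowCnt  -- COUNT_BIG(*) is mandatory in an indexed view that uses GROUP BY
FROM Fact.Sale s
JOIN Dimension.[Date] d ON d.[Date Key] = s.[Invoice Date Key]
GROUP BY d.[Fiscal Month Label];
GO

-- The unique clustered index is what materializes the view on disk
CREATE UNIQUE CLUSTERED INDEX IX_vw_SalesByFiscalMonth
ON Fact.vw_SalesByFiscalMonth ([Fiscal Month Label]);

On Azure SQL Database the optimizer can match queries to the indexed view automatically, so DirectQuery aggregations over the fiscal month can be served from the pre-aggregated data; on lower editions of SQL Server the view must be referenced explicitly with the NOEXPAND hint.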