info@techdevops.com
|
437-991-3573
|
Data Engineering Services
|
TechDevOps.com
|
| | | |
Experts in Microsoft SQL Server on Windows, Linux, Containers | Clusters, Always On, FCI | Migrations, Cloud, Performance
|
| | | |
|
Distribution Agent Undistributed Commands value is not changing
by BF (Principal Consultant; Architecture; Engineering)
2016-01-26
I had this issue recently with an On-Prem Microsoft SQL Server 2014 SP1 Server that was configured to replicate transactions real-time to an Azure SQL Database as-a-service. Both Replication Monitor and Jobs Activity Monitor were showing the Distribution Agent as running. However the number of Undistributed Command was 290,000 and holding for over an hour. So, was it actually doing anything?
First, identify the issue:
#1. Configure the Distribution Agent with Verbose Logging
-Subscriber [AZR01XXXX-DEV.DATABASE.WINDOWS.NET] -SubscriberDB [DBCloud] -Publisher [XXXXDB01] -Distributor [XXXXDB01] -DistributorSecurityMode 1 -Publication [PubDBDBcloud] -PublisherDB [DB] -Continuous -OutputVerboseLevel 3 -Output "Y:\ReplOutput\verbose.txt"
#2. Stop the Distribution Agent
#3. Start the Distribution Agent
At this point the verbose.txt log file is created and starting to populate. I noticed right away that it is processing a single Transaction. A very large single transactions with a lot of commands. As you can see below it was putting an entry in the verbose log for each CommandID it had processed. Essentially each separate CommandID was a replication stored procedure call to the Subscriber. Over 290000 SP calls to the Subscriber. That may seen like a lot but I have seen more before in the past and the Replication System was able to keep up much faster that this. Recall: The Subscriber here lives is in Azure.
2016-01-22 11:56:22.797 Last transaction timestamp: 0x00025cee00086bea000400000000 Transaction seqno: 0x00025cee00086bea0004 Command Id: 290754 Partial: 0 Type: 30 Command: <>
2016-01-22 11:56:22.797 Last transaction timestamp: 0x00025cee00086bea000400000000 Transaction seqno: 0x00025cee00086bea0004 Command Id: 290755 Partial: 0 Type: 30 Command: <>
2016-01-22 11:56:22.797 Last transaction timestamp: 0x00025cee00086bea000400000000 Transaction seqno: 0x00025cee00086bea0004 Command Id: 290756 Partial: 0 Type: 30 Command: <>
2016-01-22 11:56:22.812 Last transaction timestamp: 0x00025cee00086bea000400000000 Transaction seqno: 0x00025cee00088441000f Command Id: 1 Partial: 0 Type: 47 Command: -2147483642
Next:
Execute: sp_browsereplcmds @xact_seqno_start='0x00025cee00086bea0004', @xact_seqno_end='0x00025cee00086bea0004' Output: CALL [sp_MSupd_dboSEARCH_XXXX] (upd meaning update statement) select * from MSrepl_transactions select * from MSrepl_commands
I can now see it was was an UPDATE Statement executed that updated every record in the table (290000 records). Again, not a big deal with On_Prem to On-Prem Transactional Replication but amuch slower with Azure Subscriber. I have seen it in the millions before and handled within 30-45minutes.
At this point we know that the Distribution Agent is in fact performing work. The CommandID in the verbose log is increating which means it is updating the Subscriber. It is best to let that complete. It is not a good idea to Stop & Start the Distribution Agents during a large update statement processing as it will have to start that Transaction over at CommandID=1 and go again to CommandID=290000.
Note: Undistributed Commands value will not change until the update statement commands reach 290,000. Then it commits and updates the value.
Verbose Log Output:
This will indicate the processing details like Total Num Cmds: 290756 and Total Run Time (ms) : 7424969. Excellent information.
2016-01-22 11:56:22.812 sp_MSget_repl_commands timestamp returned: 0x0x00025cee00088478001d00000000, 290757, local rowcount: 290757 2016-01-22 11:56:26.640 Parameterized values for above command(s): {{1, 16486630}, {1, 16486631}, {1, 16486632}, {1, 16486633}, {1, 16486634}, {1, 16486635}, {1, 16486636}, {1, 16486637}, {1, 16486638}, {1, 16486639}, {1, 16486640}, {1, 16486641}, {1, 1, 16486642}, {1, 16486643}, {1, 16486644}, {1, 16486645}, {1, 16486646}, {1, 16486647}, {1, 16486648}, {1, 16486649}, {1, 16486650}, {1, 16486651}, {1, 16486652}, {1, 16486653}, {1, 16486654}, {1, 16486655}, {1, 16486656}, {1, 16486657, 1, 16486658, 1, 16486659, 1, 16486660, 1, 16486661, 1, 16486662, 1, 16486663, 1, 16486664, 1, 16486665, 1, 16486666, 1, 16486667, 1, 16486668, 1, 16486669, 1, 16486670, 1, 16486671, 1, 1, 16486672, 1, 16486673, 1, 16486674, 1, 16486675, 1, 16486676, 1, 16486677, 1, 16486678, 1, 16486679, 1, 16486680, 1, 1, 16486681, 1, 16486682, 1, 16486683, 1, 16486684, 1, 16486685, 1, 16486686, 1, 16486687, 1, 16486688, 1, 16486689, 1, 16486690, 1, 16486691, 1, 16486692, 1, 16486693, 1, 1, 16486694, 1, 16486695, 1, 1, 16486696, 1, 16486697, 1, 16486698, 1, 16486699, 1, 16486700, 1, 1, 16486701, 1, 16486702, 1, 16486703, 1, 16486704} 2016-01-22 11:56:26.734 The commit timestamp value is: 0x00025cee00086bea000400000000 ************************ STATISTICS SINCE AGENT STARTED *********************** 01-22-2016 06:56:33
Total Run Time (ms) : 7424969 Total Work Time : 7413547 Total Num Trans : 2 Num Trans/Sec : 0.00 Total Num Cmds : 290756 Num Cmds/Sec : 39.22 Total Skipped Cmds : 0 Total Idle Time : 0
Writer Thread Stats Total Number of Retries : 0 Time Spent on Exec : 7096320 Time Spent on Commits (ms): 1782 Commits/Sec : 0.00 Time to Apply Cmds (ms) : 7413547 Cmds/Sec : 39.22 Time Spent on Schema Changes (ms) : 0 Time Cmd Queue Empty (ms) : 313252 Empty Q Waits > 10ms: 8497 Total Time Request Blk(ms): 313252 P2P Work Time (ms) : 0 P2P Cmds Skipped : 0
Reader Thread Stats Calls to Retrieve Cmds : 1 Time to Retrieve Cmds (ms): 412919 Cmds/Sec : 704.15 Time Cmd Queue Full (ms) : 7007674 Full Q Waits > 10ms : 2271
Article Level Parameterized Command Stats Article Id: 18 Number of Inserts: 0, time (ms): 0 Number of Updates: 290756, time (ms): 316086 Number of Deletes: 0, time (ms): 0 Number of Other Cmds (e.g. custom commands): 0, time (ms): 0 2016-01-22 11:56:33.656 <stats state="1" work="7413" idle="0"><reader fetch="401" wait="7018"/><writer write="7413" wait="313"/><sincelaststats elapsedtime="7420" work="7413" cmds="290756" cmdspersec="39.000000"><reader fetch="401" wait="7018"/><writer write="7413" wait="313"/></sincelaststats><message>Normal events that describe both the reader and writer thread performance.</message></stats> *******************************************************************************
Performance settings for Replication Performance:
Distribution Agent Profile:
-CommitBatchSize 100 -CommitbatchThershold 1000 Means after 100 Transactions issue a commit on the Subscriber DB or after 1000 Commands issue a commit on the Subscriber DB (whichever happens first) If you know there is an update statement needed to be executed then you can adjust that to 1 and 10000. Then re-set it back to 100 and 1000 after. This requires updating the Distribution Agent Profile and of course a Stop & Start of the Distribution Agent.
-Subscription Streams - set to # of logical CPU's. Since each Logical CPU is a Scheduler inside SQL Server then you do not want the # of Subscription Streams to be more than the # of Logical CPU Cores on your Distributor. Increasing this value effectively allows for parallel update streams into the Subscriber.
We had great success with setting -CommitBatchSize=1000 -CommitBatchThreshold=1500 and -SubscriptionStreams=9 on our Production OnPrem Distribution Agent.
Azure:
select * from sys.dm_db_resource_stats select * from sys.dm_db_resource_stats select * from sys.dm_exec_requests0 dbcc inputbuffer(92) DBCloud.dbo.sp_MSupd_dboSEARCH_XXX;1
These are useful to see current connections & also if the as-a-service db has any resource issues like High CPU. This is known to cause TCP Connection issues with On-Prem Distribution Agent and also for any clients connecting using SSMS. Likely you need to move up a tier in the as-a-service - i.e. move your Azure SQL Database to bigger & better capacity VMs.
Misc T-SQL:
select * from distribution.dbo.MSrepl_commands with (nolock) where xact_seqno > (select transaction_timestamp from [AZRCLOUD].CloudDB.dbo.MSreplication_subscriptions with (nolock)) and publisher_database_id in (select id from distribution.dbo.MSpublisher_databases where publisher_db = 'OnPremDB')
exec distribution.dbo.sp_browsereplcmds @xact_seqno_start='0x00254A89000045B8000D', @xact_seqno_end='0x00254A89000045B8000D'
Job: Distribution clean up: distribution
dbcc inputbuffer(92)
select * from distribution.dbo.MSrepl_transactions select * from distribution.dbo.MSrepl_commands select * from publisherDB.dbo.sysarticles select * from distribution.dbo.MSpublisher_databases
sp_help_agent_profile 2 --Log Reader Agent
sp_help_agent_profile 3 --Distribution Agent
select * from msdb.dbo.MSagent_profiles
--Script Replication will show details on the Distribution Agent Profile -- Adding the agent profiles declare @config_id int exec sp_add_agent_profile @profile_id = @config_id OUTPUT, @profile_name = N'DEV2Azure', @agent_type = 3, @profile_type = 1, @description = N'' exec sp_change_agent_parameter @profile_id = @config_id, @parameter_name = N'-BcpBatchSize', @parameter_value = N'2147473647' exec sp_change_agent_parameter @profile_id = @config_id, @parameter_name = N'-CommitBatchSize', @parameter_value = N'1000' exec sp_change_agent_parameter @profile_id = @config_id, @parameter_name = N'-CommitBatchThreshold', @parameter_value = N'1500'
Performance Tuning SQL Server Transactional Replication – Part 1
SQL Server Fine – Tuning Transactional Replication Performance – Part 2
***Note:***
Log Reader Agent Profile location: Open Replication Monitor, go to Replication Monitor..My Publishers..select your Publisher and in the right pane select Agents tab and then the drop-down list select "Log Reader Agent".
Distribution Agent profile location: Open Replication Monitor, go to Replication Monitor..My Publishers..select your Publication, then Subscription(s) and right click select "Agent Profile".
|
|
|
|
|
|
|
|