info@techdevops.com | 437-991-3573 | Data Engineering Services
TechDevOps.com
Resources Tools
Experts in Microsoft SQL Server on Windows, Linux, Containers | Clusters, Always On, FCI | Migrations, Cloud, Performance



Distribution Agent Undistributed Commands value is not changing
by BF (Principal Consultant; Architecture; Engineering)
2016-01-26







I had this issue recently with an On-Prem Microsoft SQL Server 2014 SP1 Server that was configured to replicate transactions
real-time to an Azure SQL Database as-a-service. Both Replication Monitor and Jobs Activity Monitor were showing the Distribution
Agent as running. However the number of Undistributed Command was 290,000 and holding for over an hour. So, was it actually doing anything?


First, identify the issue:

#1. Configure the Distribution Agent with Verbose Logging

-Subscriber [AZR01XXXX-DEV.DATABASE.WINDOWS.NET] -SubscriberDB [DBCloud] -Publisher [XXXXDB01] -Distributor [XXXXDB01] -DistributorSecurityMode 1 -Publication [PubDBDBcloud] -PublisherDB [DB] -Continuous -OutputVerboseLevel 3 -Output "Y:\ReplOutput\verbose.txt"

#2. Stop the Distribution Agent

#3. Start the Distribution Agent

At this point the verbose.txt log file is created and starting to populate. I noticed right away that it is processing a single Transaction.
A very large single transactions with a lot of commands.
As you can see below it was putting an entry in the verbose log for each CommandID it had processed. Essentially each separate CommandID was a replication stored procedure call to the Subscriber. Over 290000 SP calls to the Subscriber. That may seen like a lot but I have seen more before in the past and the Replication System was able to keep up much faster that this. Recall: The Subscriber here lives is in Azure.

2016-01-22 11:56:22.797 Last transaction timestamp: 0x00025cee00086bea000400000000
Transaction seqno: 0x00025cee00086bea0004
Command Id: 290754
Partial: 0
Type: 30
Command: <>

2016-01-22 11:56:22.797 Last transaction timestamp: 0x00025cee00086bea000400000000
Transaction seqno: 0x00025cee00086bea0004
Command Id: 290755
Partial: 0
Type: 30
Command: <>

2016-01-22 11:56:22.797 Last transaction timestamp: 0x00025cee00086bea000400000000
Transaction seqno: 0x00025cee00086bea0004
Command Id: 290756
Partial: 0
Type: 30
Command: <>

2016-01-22 11:56:22.812 Last transaction timestamp: 0x00025cee00086bea000400000000
Transaction seqno: 0x00025cee00088441000f
Command Id: 1
Partial: 0
Type: 47
Command: -2147483642

Next:

Execute: sp_browsereplcmds @xact_seqno_start='0x00025cee00086bea0004', @xact_seqno_end='0x00025cee00086bea0004'
Output: CALL [sp_MSupd_dboSEARCH_XXXX] (upd meaning update statement)
select * from MSrepl_transactions
select * from MSrepl_commands

I can now see it was was an UPDATE Statement executed that updated every record in the table (290000 records). Again, not a big
deal with On_Prem to On-Prem Transactional Replication but amuch slower with Azure Subscriber. I have seen it in the millions before
and handled within 30-45minutes.


At this point we know that the Distribution Agent is in fact performing work. The CommandID in the verbose log is increating which
means it is updating the Subscriber. It is best to let that complete.
It is not a good idea to Stop & Start the Distribution Agents during a large update statement processing as it will have to start that
Transaction over at CommandID=1 and go again to CommandID=290000.

Note:
Undistributed Commands value will not change until the update statement commands reach 290,000. Then it commits and updates the value.


Verbose Log Output:

This will indicate the processing details like Total Num Cmds: 290756 and Total Run Time (ms) : 7424969. Excellent information.


2016-01-22 11:56:22.812 sp_MSget_repl_commands timestamp returned: 0x0x00025cee00088478001d00000000, 290757, local rowcount: 290757
2016-01-22 11:56:26.640 Parameterized values for above command(s): {{1, 16486630}, {1, 16486631}, {1, 16486632}, {1, 16486633}, {1, 16486634}, {1, 16486635}, {1, 16486636}, {1, 16486637}, {1, 16486638}, {1, 16486639}, {1, 16486640}, {1, 16486641}, {1, 1, 16486642}, {1, 16486643}, {1, 16486644}, {1, 16486645}, {1, 16486646}, {1, 16486647}, {1, 16486648}, {1, 16486649}, {1, 16486650}, {1, 16486651}, {1, 16486652}, {1, 16486653}, {1, 16486654}, {1, 16486655}, {1, 16486656}, {1, 16486657, 1, 16486658, 1, 16486659, 1, 16486660, 1, 16486661, 1, 16486662, 1, 16486663, 1, 16486664, 1, 16486665, 1, 16486666, 1, 16486667, 1, 16486668, 1, 16486669, 1, 16486670, 1, 16486671, 1, 1, 16486672, 1, 16486673, 1, 16486674, 1, 16486675, 1, 16486676, 1, 16486677, 1, 16486678, 1, 16486679, 1, 16486680, 1, 1, 16486681, 1, 16486682, 1, 16486683, 1, 16486684, 1, 16486685, 1, 16486686, 1, 16486687, 1, 16486688, 1, 16486689, 1, 16486690, 1, 16486691, 1, 16486692, 1, 16486693, 1, 1, 16486694, 1, 16486695, 1, 1, 16486696, 1, 16486697, 1, 16486698, 1, 16486699, 1, 16486700, 1, 1, 16486701, 1, 16486702, 1, 16486703, 1, 16486704}
2016-01-22 11:56:26.734 The commit timestamp value is: 0x00025cee00086bea000400000000
************************ STATISTICS SINCE AGENT STARTED ***********************
01-22-2016 06:56:33

Total Run Time (ms) : 7424969 Total Work Time : 7413547
Total Num Trans : 2 Num Trans/Sec : 0.00
Total Num Cmds : 290756 Num Cmds/Sec : 39.22
Total Skipped Cmds : 0
Total Idle Time : 0

Writer Thread Stats
Total Number of Retries : 0
Time Spent on Exec : 7096320
Time Spent on Commits (ms): 1782 Commits/Sec : 0.00
Time to Apply Cmds (ms) : 7413547 Cmds/Sec : 39.22
Time Spent on Schema Changes (ms) : 0
Time Cmd Queue Empty (ms) : 313252 Empty Q Waits > 10ms: 8497
Total Time Request Blk(ms): 313252
P2P Work Time (ms) : 0 P2P Cmds Skipped : 0

Reader Thread Stats
Calls to Retrieve Cmds : 1
Time to Retrieve Cmds (ms): 412919 Cmds/Sec : 704.15
Time Cmd Queue Full (ms) : 7007674 Full Q Waits > 10ms : 2271

Article Level Parameterized Command Stats
Article Id: 18
Number of Inserts: 0, time (ms): 0
Number of Updates: 290756, time (ms): 316086
Number of Deletes: 0, time (ms): 0
Number of Other Cmds (e.g. custom commands): 0, time (ms): 0
2016-01-22 11:56:33.656 <stats state="1" work="7413" idle="0"><reader fetch="401" wait="7018"/><writer write="7413" wait="313"/><sincelaststats elapsedtime="7420" work="7413" cmds="290756" cmdspersec="39.000000"><reader fetch="401" wait="7018"/><writer write="7413" wait="313"/></sincelaststats><message>Normal events that describe both the reader and writer thread performance.</message></stats>
*******************************************************************************


Performance settings for Replication Performance:

Distribution Agent Profile:

-CommitBatchSize 100
-CommitbatchThershold 1000
Means after 100 Transactions issue a commit on the Subscriber DB or after 1000 Commands issue a commit on the Subscriber DB (whichever
happens first)
If you know there is an update statement needed to be executed then you can adjust that to 1 and 10000. Then re-set it back to 100 and
1000 after. This requires updating the Distribution Agent Profile and of course a Stop & Start of the Distribution Agent.

-Subscription Streams - set to # of logical CPU's. Since each Logical CPU is a Scheduler inside SQL Server then you do not want
the # of Subscription Streams to be more than the # of Logical CPU Cores on your Distributor. Increasing this value effectively
allows for parallel update streams into the Subscriber.

We had great success with setting -CommitBatchSize=1000 -CommitBatchThreshold=1500 and -SubscriptionStreams=9 on our Production OnPrem
Distribution Agent.

Azure:

select * from sys.dm_db_resource_stats
select * from sys.dm_db_resource_stats
select * from sys.dm_exec_requests0
dbcc inputbuffer(92)
DBCloud.dbo.sp_MSupd_dboSEARCH_XXX;1

These are useful to see current connections & also if the as-a-service db has any resource issues like High CPU. This is known to
cause TCP Connection issues with On-Prem Distribution Agent and also for any clients connecting using SSMS. Likely you need to move up
a tier in the as-a-service - i.e. move your Azure SQL Database to bigger & better capacity VMs.


Misc T-SQL:

select * from distribution.dbo.MSrepl_commands with (nolock) where xact_seqno >
(select transaction_timestamp from [AZRCLOUD].CloudDB.dbo.MSreplication_subscriptions with (nolock))
and publisher_database_id in
(select id from distribution.dbo.MSpublisher_databases where publisher_db = 'OnPremDB')

exec distribution.dbo.sp_browsereplcmds @xact_seqno_start='0x00254A89000045B8000D', @xact_seqno_end='0x00254A89000045B8000D'

Job: Distribution clean up: distribution

dbcc inputbuffer(92)

select * from distribution.dbo.MSrepl_transactions
select * from distribution.dbo.MSrepl_commands
select * from publisherDB.dbo.sysarticles
select * from distribution.dbo.MSpublisher_databases


sp_help_agent_profile 2 --Log Reader Agent

sp_help_agent_profile 3 --Distribution Agent

select * from msdb.dbo.MSagent_profiles


--Script Replication will show details on the Distribution Agent Profile
-- Adding the agent profiles
declare @config_id int
exec sp_add_agent_profile @profile_id = @config_id OUTPUT, @profile_name = N'DEV2Azure', @agent_type = 3, @profile_type = 1, @description = N''
exec sp_change_agent_parameter @profile_id = @config_id, @parameter_name = N'-BcpBatchSize', @parameter_value = N'2147473647'
exec sp_change_agent_parameter @profile_id = @config_id, @parameter_name = N'-CommitBatchSize', @parameter_value = N'1000'
exec sp_change_agent_parameter @profile_id = @config_id, @parameter_name = N'-CommitBatchThreshold', @parameter_value = N'1500'


Performance Tuning SQL Server Transactional Replication – Part 1

SQL Server Fine – Tuning Transactional Replication Performance – Part 2


***Note:***

Log Reader Agent Profile location: Open Replication Monitor, go to Replication Monitor..My Publishers..select your Publisher and
in the right pane select Agents tab and then the drop-down list select "Log Reader Agent".

Distribution Agent profile location: Open Replication Monitor, go to Replication Monitor..My Publishers..select your Publication, then Subscription(s) and right click select "Agent Profile".