We recently had an issue with Replication Monitor displaying the error below in the Distributor to Subscriber History monitor. The error recurred continuously, every 10 minutes. In our replication topology, the Publisher and Distributor are on-premises in our local data centers, and all the Subscribers are in Microsoft Azure in US data centers.
The error was occurring only in 3 out of the 12 Subscribers. What was causing this error?
Agent is retrying after an error. 65 retries attempted. See agent job history in the Jobs folder for more details.
After some investigation, we discovered negative values in the run_duration column of the msdb sysjobhistory table. There were 3 records in total with negative values, and those 3 records were linked to the same 3 Subscribers that were showing this error. We removed those 3 records from the job history using the "Delete" functionality in the SSMS Job Agent GUI. As soon as those records were removed, the error message stopped appearing. Further investigation showed that those 3 Subscriber servers had a Daylight Saving Time change on the same day the errors first appeared. Problem solved & root cause identified!
T-SQL: Find the records that cause the error seen in Replication Monitor:
SELECT * FROM msdb.dbo.sysjobhistory WHERE run_duration < 0 ORDER BY run_date DESC
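To see which jobs the negative-duration records belong to, the history rows can be joined to the job definitions. This is only a sketch, assuming the replication agents run as ordinary SQL Server Agent jobs listed in msdb.dbo.sysjobs; the column selection is illustrative and not part of the original fix.

```sql
-- Sketch: list job history rows with a negative run_duration
-- together with the name of the job that produced them.
SELECT
    j.name AS job_name,      -- job name from sysjobs
    h.instance_id,           -- unique id of the history row
    h.step_id,
    h.step_name,
    h.run_date,              -- stored as an integer in YYYYMMDD form
    h.run_time,              -- stored as an integer in HHMMSS form
    h.run_duration,          -- stored as an integer in HHMMSS form; negative values are the suspect rows
    h.server
FROM msdb.dbo.sysjobhistory AS h
INNER JOIN msdb.dbo.sysjobs AS j
    ON j.job_id = h.job_id
WHERE h.run_duration < 0
ORDER BY h.run_date DESC, h.run_time DESC;
```

Comparing job_name and run_date in the output against the Subscribers shown in Replication Monitor should help confirm that the rows match the affected agents before anything is removed.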