info@techdevops.com | 437-991-3573 | Data Engineering Services
TechDevOps.com
Resources Tools
Experts in Microsoft SQL Server on Windows, Linux, Containers | Clusters, Always On, FCI | Migrations, Cloud, Performance



Using Extended Events to Monitor Microsoft SQL Server AlwaysOn Availability Groups
by BF (Principal Consultant; Architecture; Engineering)
2016-11-19








The Extended Events code below will be running against a 3-node WSFC Cluster containing Microsoft SQL Server 2014 AlwaysON Availability
Groups on Azure Resource Manager Virtual Machines.

The high-level Azure Cloud objects created for this project are below:

- A virtual network containing two subnets, including a front-end and a back-end subnet
- Two domain controllers in an availability set with an Active Directory (AD) domain
- Two SQL Server VMs in an availability set deployed to the back-end subnet and joined to the AD domain
- A 3-node WSFC cluster with the Node Majority quorum model
- An internal load balancer with one or more IP addresses to support one or more availability group listeners
- An availability group with two synchronous-commit replicas of an availability database






Using Extended Events to Monitor SQL Server 2014 AlwaysOn Availability Groups


Create a new Extended Events Session on SERVER:

CREATE EVENT SESSION [EE-AlwaysOn-AvaliabilityGroup] ON SERVER
ADD EVENT sqlserver.alwayson_ddl_executed,
ADD EVENT sqlserver.availability_replica_automatic_failover_validation,
ADD EVENT sqlserver.availability_replica_state,
ADD EVENT sqlserver.availability_replica_state_change,
ADD EVENT sqlserver.availability_group_lease_expired,
ADD EVENT sqlserver.availability_replica_manager_state_change,
ADD EVENT sqlserver.error_reported(
WHERE ([error_number]=(9691) OR [error_number]=(35204) OR [error_number]=(9693) OR [error_number]=(26024) OR [error_number]=(28047) OR [error_number]=(26023) OR [error_number]=(9692) OR [error_number]=(28034) OR [error_number]=(28036) OR [error_number]=(28048) OR [error_number]=(28080) OR [error_number]=(28091) OR [error_number]=(26022) OR [error_number]=(9642) OR [error_number]=(35201) OR [error_number]=(35202) OR [error_number]=(35206) OR [error_number]=(35207) OR [error_number]=(26069) OR [error_number]=(26070) OR [error_number]>(41047) AND [error_number]<(41056) OR [error_number]=(41142) OR [error_number]=(41144) OR [error_number]=(1480) OR [error_number]=(823) OR [error_number]=(824) OR [error_number]=(829) OR [error_number]=(35264) OR [error_number]=(35265))),
ADD EVENT sqlserver.hadr_undo_of_redo_log_scan,
ADD EVENT sqlserver.lock_redo_blocked
ADD TARGET package0.ring_buffer(SET max_memory=(1024))
WITH (MAX_MEMORY=1024 KB,EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,MAX_DISPATCH_LATENCY=30 SECONDS,MAX_EVENT_SIZE=0 KB,MEMORY_PARTITION_MODE=NONE,TRACK_CAUSALITY=OFF,STARTUP_STATE=OFF)
GO



Start Extended Events Session:

If NOT Exists (Select Name FROM sys.dm_xe_sessions where name = 'EE-AlwaysOn-AvaliabilityGroup')
Begin
ALTER EVENT SESSION [EE-AlwaysOn-AvaliabilityGroup] ON SERVER STATE = START;
End

--SELECT * FROM sys.dm_xe_sessions
--SELECT * FROM sys.dm_xe_session_targets
--WAITFOR DELAY '00:00:05';


Query captured Extended Events data:

Select XML raw data:

SELECT CAST(st.target_data AS XML)
FROM
sys.dm_xe_sessions AS se
INNER JOIN sys.dm_xe_session_targets AS st ON se.address like st.event_session_address
where se.name = 'EE-AlwaysOn-AvaliabilityGroup'

Populate @target_data(XML) variable from captured E.E. data:

DECLARE @target_data XML;
SELECT @target_data = CAST(st.target_data AS XML)
FROM sys.dm_xe_sessions AS se
INNER JOIN sys.dm_xe_session_targets AS st ON se.address like st.event_session_address
where se.name = 'EE-AlwaysOn-AvaliabilityGroup'

Select Total Events Processed:

SELECT @target_data.value('(RingBufferTarget/@totalEventsProcessed)[1]', 'int') AS TotalEventsProcessed

Select XML data as Rows & Columns:

SELECT --top 100
n.value('(@timestamp)[1]', 'datetime2') as 'datetime',
n.value('(data[@name="availability_group_name"]/value)[1]', 'varchar(max)') as [availability_group_name],
n.value('(data[@name="availability_replica_name"]/value)[1]', 'varchar(max)') as [availability_replica_name],
UPPER(n.value('(data[@name="ddl_action"]/text)[1]', 'varchar(max)')) as [ddl_action],
UPPER(n.value('(data[@name="ddl_phase"]/text)[1]', 'varchar(max)')) as [ddl_phase],
n.value('(data[@name="statement"]/value)[1]', 'varchar(max)') as [statement],
--n.value('(data[@name="previous_state"]/value)[1]', 'varchar(max)') as [previous_state],
n.value('(data[@name="previous_state"]/text)[1]', 'varchar(max)') as [previous_state],
--n.value('(data[@name="current_state"]/value)[1]', 'varchar(max)') as [current_state],
n.value('(data[@name="current_state"]/text)[1]', 'varchar(max)') as [current_state],
n.value('(data[@name="error_number"]/value)[1]', 'varchar(max)') as [error_number],
n.value('(data[@name="message"]/value)[1]', 'varchar(max)') as [error_message]
FROM @target_data.nodes('RingBufferTarget/event') AS q(n)
--WHERE not n.value('(data[@name="object_name"]/value)[1]', 'varchar(max)') like '%sp_reset_connection%'
ORDER BY n.value('(@timestamp)[1]', 'datetime2') asc


Extended Events Captured Data in the image below:

- Records in the green block indicate Server-1 failing over to Server-0. Server-1 becoming the SECONDARY Node.

- Records in the purple block indicate Server-0 failing over to Server-1. Server-1 becoming the PRIMARY Node.






Stop Event Session, Drop Event Target, Drop Event Session:

If Exists (Select Name FROM sys.dm_xe_sessions where name = 'EE-AlwaysOn-AvaliabilityGroup')
Begin
ALTER EVENT SESSION [EE-AlwaysOn-AvaliabilityGroup] ON SERVER STATE = STOP;

ALTER EVENT SESSION [EE-AlwaysOn-AvaliabilityGroup] ON SERVER DROP TARGET package0.ring_buffer

DROP EVENT SESSION [EE-AlwaysOn-AvaliabilityGroup] ON SERVER
End


Note: If you need the detailed Technical Specification on the Azure Cloud 3-Node Cluster setup see here



That's it!