How to Troubleshoot Scheduled Snapshots Issues

Article ID: CTX224169

Description

This article provides general guidance to troubleshoot VM Scheduled Snapshot (VMSS) issues.
 


Instructions

The VM Scheduled Snapshot (VMSS) feature lets customers configure their environment to create VM snapshots automatically at specified intervals.

How VMSS works

You configure a snapshot schedule policy by specifying the VMs involved, the snapshot type, the frequency, and the schedule. The VMSS plugin periodically goes through all policies to determine whether new VM snapshots need to be created. The feature involves three main components:
  • XAPI
  • VMSS plugin
  • Cron job
XAPI Changes
The XAPI database is extended for VMSS with new parameters:
Parameter Name       Description & Value
name-label           Name label for VMSS
name-description     Description for VMSS
enabled              Enable/Disable VMSS to take snapshot: true/false
type                 Type of snapshot VMSS takes:
                       • snapshot: Disk only snapshot
                       • snapshot_with_quiesce: Quiesce the VM before taking snapshot (Windows only)
                       • checkpoint: Disk and memory snapshot
retained-snapshots   Scheduled snapshots to be retained. Range: 1 to 10
frequency            Frequency of taking snapshot of VMs:
                       • hourly
                       • daily
                       • weekly
schedule               • hour: 0 to 23
                       • min: [0;15;30;45]
                       • days: [Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday]
last-run-time        Date and time of the last execution of VMSS
VMs                  VMs in the snapshot schedule: VM UUID
The VM record is also extended with the following two parameters:
Parameter Name       Description & Value
snapshot-schedule    The VMSS schedule policy a VM belongs to: <VMSS_UUID>
is-vmss-snapshot     If a snapshot was created from VMSS: true/false
With these changes, your configured snapshot schedules (policies) are stored in the XAPI database. You can use the xe command line interface (CLI) command xe vmss-list to check the details of a snapshot schedule policy.
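The value ranges listed for the VMSS parameters can be expressed as a small check. The following is a minimal sketch, not part of the VMSS plugin: the function name validate_policy and the dictionary layout are assumptions made for illustration, and only the ranges themselves come from the parameter list above.

```python
# Hypothetical helper: checks a policy description against the value
# ranges documented for the VMSS parameters. The dict layout is an
# assumption for illustration, not the plugin's internal format.
VALID_TYPES = {"snapshot", "snapshot_with_quiesce", "checkpoint"}
VALID_FREQUENCIES = {"hourly", "daily", "weekly"}
VALID_MINUTES = {0, 15, 30, 45}
VALID_DAYS = {
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
}


def validate_policy(policy):
    """Return a list of problems found in a policy dict (empty if none)."""
    problems = []
    if policy.get("type") not in VALID_TYPES:
        problems.append("type must be one of: " + ", ".join(sorted(VALID_TYPES)))
    if policy.get("frequency") not in VALID_FREQUENCIES:
        problems.append("frequency must be hourly, daily or weekly")
    retained = policy.get("retained-snapshots")
    if not isinstance(retained, int) or not 1 <= retained <= 10:
        problems.append("retained-snapshots must be between 1 and 10")
    # Only the schedule keys that are present are checked, because the
    # article does not state which keys each frequency requires.
    schedule = policy.get("schedule", {})
    if "hour" in schedule and schedule["hour"] not in range(24):
        problems.append("schedule hour must be between 0 and 23")
    if "min" in schedule and schedule["min"] not in VALID_MINUTES:
        problems.append("schedule min must be 0, 15, 30 or 45")
    for day in schedule.get("days", []):
        if day not in VALID_DAYS:
            problems.append("unknown day: %s" % day)
    return problems
```

An empty list means the policy respects every documented range. Each entry in a non-empty list names one parameter to correct.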


VMSS Plugin
VMSS is implemented as an XAPI plugin at /etc/xapi.d/plugins/vmss and logs to /var/log/VMSSlog. The log entries in VMSSlog illustrate the workflow:
  1. It goes through all the scheduled snapshot policies in the pool and checks whether any of them is due.
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] No of Policies: 2
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] Max number of threads allocated for policy : 1
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] Policy Batch: 0
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] Not processing policy: a42a265c-1490-f9ff-2076-ba1cdc4fd6ce
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] Policy Batch:1
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] Processing policy: 7b10ea29-ab0c-af14-a116-ae293560f404
 
  2. If a snapshot schedule policy is due, VMSS goes through the VM objects in XAPI associated with that policy one by one and creates a new snapshot of each.
  • If the snapshot operation fails, it will create a notification alert for the event and move to the next VM.
  • It checks whether an older snapshot now needs to be deleted to comply with the retained-snapshots value defined in the scheduled policy.
    • If there is a need to delete any existing snapshots, it will delete the oldest snapshot created via scheduled policy.
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] No of VMs: 2
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] Max number of threads for processing VM: 1
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] VM Batch: 0
May 21 10:15:01 xrtuk-09-11 VMSS: [10123] Processing VM: e2a06f32-d4ad-316f-ef20-4e4541a5a453
May 21 10:16:40 xrtuk-09-11 VMSS: [10123] Snapshot retention value reached for VM: e2a06f32-d4ad-316f-ef20-4e4541a5a453. Deleting the oldest snapshot: 8da13c01-e6f9-fd9f-9413-941c09f9ff56
May 21 10:16:44 xrtuk-09-11 VMSS: [10123] Completed processing VM: e2a06f32-d4ad-316f-ef20-4e4541a5a453
May 21 10:16:44 xrtuk-09-11 VMSS: [10123] VM Batch: 1
May 21 10:16:44 xrtuk-09-11 VMSS: [10123] Processing VM: 0e268e37-cbdc-5b6d-8098-89e740337c2a
May 21 10:17:59 xrtuk-09-11 VMSS: [10123] Snapshot retention value reached for VM: 0e268e37-cbdc-5b6d-8098-89e740337c2a. Deleting the oldest snapshot: f0e0aff6-14e5-47ba-2176-afd7f96bbb89
May 21 10:18:03 xrtuk-09-11 VMSS: [10123] Completed processing VM: 0e268e37-cbdc-5b6d-8098-89e740337c2a
  3. After processing all VMs in the policy, it sets the last-run timestamp in the scheduled policy.
May 21 10:18:03 xrtuk-09-11 VMSS: [10123] snapshot start time: 2017-05-21 10:15:01.899450, end time: 2017-05-21 10:18:03.363721, Last expected run time: 2017-05-21 10:15:00
May 21 10:18:03 xrtuk-09-11 VMSS: [10123] RETURN in create_structured_alert: <XCData><time>2017-05-21 10:18:03.369298</time><messagetype>info</messagetype><message>VMSS_SNAPSHOT_SUCCEEDED</message></XCData>
May 21 10:18:03 xrtuk-09-11 VMSS: [10123] Completed processing policy: 7b10ea29-ab0c-af14-a116-ae293560f404
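When a run takes longer than expected, it can help to see how long each VM took. The following is a minimal sketch that pairs the "Processing VM" and "Completed processing VM" entries shown above. It assumes the log lines keep exactly the format of these excerpts. The function names are hypothetical, and because the log omits the year, a fixed year is assumed purely to make the timestamps parseable.

```python
import re
from datetime import datetime

# Matches lines in the format shown in the excerpts above, for example:
# May 21 10:15:01 xrtuk-09-11 VMSS: [10123] Processing VM: <uuid>
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ VMSS: \[\d+\] "
    r"(?P<event>Processing VM|Completed processing VM): (?P<uuid>[0-9a-f-]+)"
)


def parse_time(stamp, year=2000):
    """Parse a syslog-style timestamp; the year is an assumption."""
    return datetime.strptime("%d %s" % (year, stamp), "%Y %b %d %H:%M:%S")


def vm_durations(lines):
    """Map VM UUID to the seconds between its start and completion lines."""
    started, durations = {}, {}
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # policy lines, retention lines, other entries
        when = parse_time(match.group("ts"))
        uuid = match.group("uuid")
        if match.group("event") == "Processing VM":
            started[uuid] = when
        elif uuid in started:
            durations[uuid] = (when - started.pop(uuid)).total_seconds()
    return durations
```

Applied to the excerpt above, this reports 103 seconds for VM e2a06f32-d4ad-316f-ef20-4e4541a5a453 and 79 seconds for VM 0e268e37-cbdc-5b6d-8098-89e740337c2a. On a host, the input would be the lines of /var/log/VMSSlog. Runs that cross a year boundary are not handled by this sketch.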

Cron Job
A cron job triggers the VMSS process every 15 minutes. Only one snapshot schedule run is active at a time: if the previous VMSS run has not completed, the new run is skipped.
The following log entry in VMSSlog marks the start of each run triggered by the cron job:
May 21 10:15:01 xrtuk-09-11 VMSS: [10097] ===Kicking cron job for VMSS===
You can find the cron job defined at /etc/cron.d/vmss.cron:
*/15 * * * * root python -c 'import imp; vmss = imp.load_source("vmss", "/etc/xapi.d/plugins/vmss"); vmss.trigger_schedule_snapshots();'
You can also check the cron job history from /var/log/cron:
May 18 14:00:01 xrtuk-12-07 CROND[17114]: (root) CMD (python -c 'import imp; vmss = imp.load_source("vmss", "/etc/xapi.d/plugins/vmss"); vmss.trigger_schedule_snapshots();')
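To confirm that the job fires every 15 minutes, the cron history can be scanned for gaps. The following is a minimal sketch that assumes the entries in /var/log/cron keep the format of the line shown above. The function names, the one-minute slack, and the fixed year (the log omits it) are assumptions for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches CROND entries that invoke the VMSS trigger, in the format
# shown in the excerpt above.
CRON_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ CROND\[\d+\]: "
    r"\(root\) CMD \(.*trigger_schedule_snapshots"
)


def vmss_cron_times(lines, year=2000):
    """Return the timestamps of all VMSS cron invocations, in file order."""
    times = []
    for line in lines:
        match = CRON_RE.match(line)
        if match:
            stamp = "%d %s" % (year, match.group("ts"))
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times


def find_gaps(times, interval=timedelta(minutes=15),
              slack=timedelta(minutes=1)):
    """Return (previous, next) pairs further apart than interval + slack."""
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > interval + slack]
```

Each pair returned by find_gaps brackets a period in which at least one expected invocation is missing from the log. Rotated log files would need to be read separately.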

In summary, the scheduled snapshot feature works as follows:
  1. The XAPI database stores the user-configured scheduled snapshot policies.
  2. The cron job triggers the VMSS plugin every 15 minutes.
  3. The VMSS plugin goes through all snapshot schedule policies and processes them one by one.
How to Check Run History of Snapshot Schedules (Policies)
You can check the run history of a snapshot schedule in the XenCenter Snapshot schedules wizard by expanding Show Run History for that schedule, or in XenCenter Alerts.

Possible Issues and Useful Logs
The possible issues you may encounter with the Scheduled Snapshots feature are:
  • VMSS plugin cannot log in to an XAPI session
VMSS is implemented as an XAPI plugin, so it needs to log in to an XAPI session to function. If the login fails, a VMSS_XAPI_LOGON_FAILURE error is written to /var/log/VMSSlog. Try running xe-toolstack-restart to see whether that resolves the issue, or check /var/log/xensource.log for problems with the XAPI service.
  • Snapshot related issues
The snapshot operation may fail for several reasons, for example an SR backend failure or VSS related issues.
Normally you get more detailed error information from XenCenter Alerts and a VMSS_SNAPSHOT_FAILED error in /var/log/VMSSlog.
At this point, refer to the detailed error information and check xensource.log and SMlog for possible root causes.
  • Snapshot schedule related issues
  1. The cron job triggers the VMSS process every 15 minutes. If there are many policies, or policies containing so many VMs that the snapshot operations cannot finish within 15 minutes, the next run of VMSS is skipped. You may see the following log entries in VMSSlog:
  • 'VMSS_SNAPSHOT_LOCK_FAILED': 'The snapshot phase is already executing for this snapshot policy. Please try again later'.
  • 'VMSS_SNAPSHOT_MISSED_EVENT': 'A scheduled snapshot event was missed due to another on-going scheduled snapshot run. This is unexpected behaviour, please re-configure your snapshot sub-policy'.
In this situation, review the snapshot policies and rearrange their schedules so that runs do not overlap.
  2. In very rare cases, the cron job may not start as expected. Check /var/log/cron to confirm whether the cron job was started.
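To judge whether a long run will cause skipped runs, you can list the 15-minute cron ticks that fall inside it. The following is a minimal sketch. It assumes cron fires exactly on the quarter hour and that every tick arriving while a run is still active is skipped, as described above. The function name skipped_ticks is hypothetical.

```python
from datetime import datetime, timedelta

TICK = timedelta(minutes=15)


def skipped_ticks(start, duration):
    """List the 15-minute cron ticks that fall while a run is still active."""
    end = start + duration
    # Round the start down to its quarter hour, then step to the next one,
    # which is the first tick strictly after the run started.
    tick = start.replace(second=0, microsecond=0)
    tick -= timedelta(minutes=tick.minute % 15)
    tick += TICK
    skipped = []
    while tick < end:
        skipped.append(tick)
        tick += TICK
    return skipped


# A run that starts at 10:15:01 and takes 40 minutes overlaps the
# 10:30 and 10:45 ticks.
overlapping = skipped_ticks(datetime(2017, 5, 21, 10, 15, 1),
                            timedelta(minutes=40))
```

A run of about three minutes, like the one in the log excerpt earlier in this article, overlaps no tick. A run that regularly overlaps one or more ticks suggests splitting the policy or spacing the schedules further apart.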
In summary, you can refer to the following logs to troubleshoot Scheduled Snapshots related issues:
  • XenCenter Alerts
  • /var/log/VMSSlog
  • /var/log/xensource.log
  • /var/log/SMlog
  • /var/log/cron
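As a first pass over VMSSlog, counting the error codes named in this article shows which class of issue is occurring. The following is a minimal sketch. It matches the codes as plain substrings, so it assumes nothing about the rest of the line format. The function name count_error_codes is hypothetical.

```python
from collections import Counter

# Error codes mentioned in this article.
KNOWN_CODES = (
    "VMSS_XAPI_LOGON_FAILURE",
    "VMSS_SNAPSHOT_FAILED",
    "VMSS_SNAPSHOT_LOCK_FAILED",
    "VMSS_SNAPSHOT_MISSED_EVENT",
)


def count_error_codes(lines):
    """Count how many log lines mention each known VMSS error code."""
    counts = Counter()
    for line in lines:
        for code in KNOWN_CODES:
            if code in line:
                counts[code] += 1
    return counts
```

Lines that report VMSS_SNAPSHOT_SUCCEEDED are ignored, so a clean log produces an empty counter. On a host, the input would be the lines of /var/log/VMSSlog.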
Useful commands
With the XAPI extension, several new xe commands are available for VMSS. The commonly used ones are:
  • xe vmss-list: List all VMSSs, filtering on the optional arguments.
  • xe vmss-create: Create a VM snapshot schedule
  • xe vmss-destroy: Destroy a VM snapshot schedule
  • xe vmss-param-set: Sets the parameter specified, can be used to edit a VM snapshot schedule
You can use xe help <command_name> to get detailed usage of these commands. Combining these commands can make some tasks quicker.
For example, to identify which snapshot schedule a VM belongs to, you can check either in XenCenter or with the xe CLI:
Option 1: Check from XenCenter
  1. Select the VM and click the Snapshots tab. The Show Details option is at the top right of this tab.
  2. Expand Show Details. The snapshot schedule that the VM belongs to is displayed at the bottom right of the page.
Option 2: Check from xe CLI
There are two ways to check from the xe CLI:
  1. Check directly using the command xe vmss-list VMs:contains=<VM_UUID>.
  2. Check from the VM side:
  • Use the command xe vm-list params=snapshot-schedule uuid=<VM_UUID> to get the snapshot schedule UUID (VMSS_UUID).
  • Then get more details of this schedule using the command xe vmss-list uuid=<VMSS_UUID>.
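If these lookups are needed often, the commands can be wrapped in a short script. The following is a minimal sketch that only assembles the xe command lines given above and runs them with subprocess. The function names are hypothetical, and the run helper assumes Python 3.7 or later, which may differ from the interpreter available on the host.

```python
import subprocess


def vmss_for_vm_cmd(vm_uuid):
    """Command that lists the snapshot schedule containing the given VM."""
    return ["xe", "vmss-list", "VMs:contains=%s" % vm_uuid]


def schedule_of_vm_cmd(vm_uuid):
    """Command that prints the snapshot-schedule parameter of the given VM."""
    return ["xe", "vm-list", "params=snapshot-schedule", "uuid=%s" % vm_uuid]


def vmss_details_cmd(vmss_uuid):
    """Command that lists the details of one snapshot schedule."""
    return ["xe", "vmss-list", "uuid=%s" % vmss_uuid]


def run(cmd):
    """Run an xe command and return its standard output as text."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, run(vmss_for_vm_cmd("<VM_UUID>")) returns the same text that the command prints in a terminal. The sketch does not parse the output, because the exact layout is not described in this article.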

 

 

Additional Information

How to Create Scheduled Snapshots
How to Manage Scheduled Snapshots
How to Revert VMs to Scheduled Snapshots