Building a Smart Offsite Sync and MD5 Verification Script with AzCopy
When syncing files to Azure Blob Storage, you need confidence that what lands in the cloud is identical to what you sent. Azure's AzCopy tool syncs files efficiently between a local directory and Azure Blob Storage, but it does not by itself confirm that each file arrived intact. Adding a layer of integrity verification using MD5 hashes helps confirm that the files were transferred correctly.
In this post, I will walk you through how I built a smart sync-and-verify backup script using AzCopy. The script syncs files to Azure Blob Storage and then verifies the integrity of the changed files by comparing MD5 hashes between the local and Azure-stored copies.
Problem Statement
When syncing large directories of files to cloud storage, especially during backups, it is essential to verify that transferred files are identical to the local files by comparing their MD5 hashes.
Solution
By integrating AzCopy's sync functionality with a custom shell script, we can:
- Sync files from a local directory to Azure Blob Storage.
- Extract details of the transferred files from the AzCopy job logs.
- Perform MD5 verification to ensure the files are consistent on both ends.
Tools Used
- AzCopy: A command-line tool that helps manage and transfer data to/from Azure Blob Storage.
- Bash: The script was built in Bash, enabling seamless integration with the Linux environment.
- Azure CLI: Used to retrieve MD5 hashes from Azure Blob Storage.
Shell Script Workflow
Step 1: Setting Up Variables
Before diving into the logic, we define key variables, including the local directory to be synced, Azure Storage account details, and the location for logs. Here's an example setup:
STORAGE_ACCOUNT="mystorageaccount"  # storage account names allow only lowercase letters and digits
STORAGE_CONTAINER="backup"
LOCAL_DIR="/media/backup"
LOG_FILE="/var/log/azure_backup_md5_check_$(date +%Y%m%d).log"
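Make sure to export $AZURE_BACKUP_SAS_TOKEN and $AZURE_BACKUP_ACCOUNT_KEY in the user's environment so the script can authenticate with Azure Blob Storage. The SAS token is used by AzCopy for the sync, and the account key is used by the Azure CLI when reading blob properties.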
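Because a missing credential only surfaces later as an authentication error, it can help to fail early. The following is a minimal sketch of such a check; `require_env` is a hypothetical helper name, not part of the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the original script): returns non-zero
# if any of the named variables is unset or empty.
require_env() {
    local name missing=0
    for name in "$@"; do
        # ${!name} is Bash indirect expansion: the value of the variable named by $name.
        if [ -z "${!name:-}" ]; then
            echo "Missing required environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Typical use near the top of the backup script:
# require_env AZURE_BACKUP_SAS_TOKEN AZURE_BACKUP_ACCOUNT_KEY || exit 1
```

The call is left commented out so the snippet can be pasted and tried without the real credentials in place.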
Step 2: Syncing Files Using AzCopy
The script uses the azcopy sync command to upload files that are new or have changed in the local directory. Here's the command that performs the sync:
sync_output=$(azcopy sync "$LOCAL_DIR" "https://$STORAGE_ACCOUNT.blob.core.windows.net/$STORAGE_CONTAINER?$AZURE_BACKUP_SAS_TOKEN" --delete-destination=true --put-md5)
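The --put-md5 flag tells AzCopy to compute each file's MD5 hash and store it as the blob's Content-MD5 property, which the verification step reads back later. The --delete-destination=true flag removes blobs whose local files no longer exist. After the sync, the script extracts the job ID from the AzCopy output: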
job_id=$(echo "$sync_output" | awk '/Job/{print $2; exit}')
This job ID is crucial for locating the corresponding job log file.
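Matching on the word "Job" works as long as the first such line is the one carrying the ID. A slightly more defensive variant, sketched below, looks for a UUID-shaped token instead and checks that something was found. The helper name `extract_job_id` and the sample text are assumptions for illustration; the sample is modeled on AzCopy's startup banner rather than copied from a real run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first UUID-shaped token found in the given text.
extract_job_id() {
    grep -oE '[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}' <<<"$1" | head -n 1
}

# Sample text modeled on AzCopy's startup banner (an assumption, not captured output).
sample_output='INFO: Scanning...
Job 3f2a9c1e-7b4d-4e8a-9c6f-1a2b3c4d5e6f has started
Log file is located at: /home/user/.azcopy/3f2a9c1e-7b4d-4e8a-9c6f-1a2b3c4d5e6f.log'

job_id=$(extract_job_id "$sample_output")
if [ -z "$job_id" ]; then
    echo "Could not find a job ID in the AzCopy output" >&2
else
    echo "Job ID: $job_id"
fi
```

In the real script, `"$sync_output"` would take the place of `"$sample_output"`, and an empty result would be a reason to stop before trying to open a log file.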
Step 3: Parsing the AzCopy Job Log
The job log provides details of the files that were transferred. We need to capture lines that indicate the start of a file transfer and extract the local source file path:
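changed_files=()
while IFS= read -r line; do
    # Pull the quoted path that follows 'Source "' on each matching log line
    local_file=$(echo "$line" | grep -oP 'Source "\K[^"]+')
    # Skip lines where no path could be extracted
    [ -n "$local_file" ] && changed_files+=("$local_file")
done < <(sudo grep "Starting transfer:" "$log_file")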
By using this method, we ensure we only consider the files that were actually transferred during the sync.
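To try the parsing step without running a real sync, the same extraction can be exercised against a small stand-in log. This is a sketch under assumptions: `list_transferred_files` is a hypothetical helper, the sample lines only imitate the "Starting transfer:" entries described above, and `sed` is used in place of `grep -oP` so it also works where PCRE support is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the source path of every transfer recorded in a job log.
list_transferred_files() {
    local log_file="$1"
    # -n plus the trailing p prints only lines where the substitution matched.
    sed -n 's/.*Starting transfer: Source "\([^"]*\)".*/\1/p' "$log_file"
}

# Stand-in log; the line layout is an assumption modeled on the entries the script greps for.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
2024/01/01 02:00:01 INFO: [P#0-T#0] Starting transfer: Source "/media/backup/photos/a.jpg" Destination "https://example.invalid/backup/photos/a.jpg"
2024/01/01 02:00:01 INFO: unrelated log line
2024/01/01 02:00:02 INFO: [P#0-T#1] Starting transfer: Source "/media/backup/docs/my notes.txt" Destination "https://example.invalid/backup/docs/my%20notes.txt"
EOF

changed_files=()
while IFS= read -r path; do
    changed_files+=("$path")
done < <(list_transferred_files "$sample_log")

printf 'Transferred: %s\n' "${changed_files[@]}"
rm -f "$sample_log"
```

Reading line by line into an array keeps paths with spaces intact, which matters for the later hashing step.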
Step 4: MD5 Hash Verification
Once we have the list of changed files, we compare the MD5 hash of each file locally and on Azure Blob Storage. To get the MD5 hash of the local file, we use the md5sum command:
calculate_md5() {
    local file="$1"
    md5sum "$file" | awk '{print $1}'
}
For the Azure Blob Storage file, we query the ContentMD5 property using the Azure CLI:
get_blob_md5() {
    local blob_name="$1"
    az storage blob show --account-name "$STORAGE_ACCOUNT" --container-name "$STORAGE_CONTAINER" --name "$blob_name" \
        --query properties.contentSettings.contentMd5 --output tsv --account-key "$AZURE_BACKUP_ACCOUNT_KEY"
}
azure_md5=$(get_blob_md5 "$blob_name")
azure_md5_hex=$(echo "$azure_md5" | base64 --decode | xxd -p)
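Azure returns the hash base64-encoded, while md5sum prints hex, which is why the value is decoded and converted to hex above. We then compare the two hex strings and log any mismatches.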
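The comparison itself can be sketched as follows. The helper names (`blob_name_for`, `md5_base64_to_hex`, `md5_matches`) are hypothetical, and the blob name is assumed to be the file's path relative to the local directory. So that the snippet runs without Azure access, a fixed base64 value stands in for the result of `get_blob_md5`, and `od` replaces `xxd` for the hex conversion.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the local root so the remainder can serve as the blob name.
blob_name_for() {
    local local_file="$1" local_dir="${2%/}"
    printf '%s\n' "${local_file#"$local_dir"/}"
}

# Hypothetical helper: convert a base64 Content-MD5 value to lowercase hex.
md5_base64_to_hex() {
    printf '%s' "$1" | base64 --decode | od -An -v -tx1 | tr -d ' \n'
}

# Hypothetical helper: succeed only when the file's MD5 equals the given base64 value.
md5_matches() {
    local file="$1" remote_b64="$2" local_hex remote_hex
    local_hex=$(md5sum "$file" | awk '{print $1}')
    remote_hex=$(md5_base64_to_hex "$remote_b64")
    # An empty remote hash (no Content-MD5 stored) is treated as a mismatch.
    [ -n "$remote_hex" ] && [ "$local_hex" = "$remote_hex" ]
}

# Demo with a temporary file; no Azure call is made here.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/docs"
printf 'hello\n' > "$demo_dir/docs/hello.txt"

mismatched_files=()
file="$demo_dir/docs/hello.txt"
remote_b64="sZRqySSS0jR8YjW00mERhA=="   # stands in for: get_blob_md5 "$(blob_name_for "$file" "$LOCAL_DIR")"
if md5_matches "$file" "$remote_b64"; then
    echo "MD5 match: $(blob_name_for "$file" "$demo_dir")"
else
    mismatched_files+=("$file")
fi
rm -rf "$demo_dir"
```

In the full script this check would run once per entry in `changed_files`, with mismatches collected in `mismatched_files` for the summary.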
Step 5: Logging and Summary
The script logs the entire process, ensuring transparency in file transfers and MD5 verifications. A summary is printed at the end, detailing which files matched and which had mismatches:
log_message "Summary:"
if [ ${#mismatched_files[@]} -eq 0 ]; then
    log_message "✅ All changed files matched successfully!"
else
    log_message "❌ The following changed files had MD5 mismatches:"
    for file in "${mismatched_files[@]}"; do
        log_message " - $file"
    done
fi
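The summary relies on a log_message helper whose definition is not shown in the excerpts above. One plausible shape for it, offered only as a sketch, timestamps each message and appends it to $LOG_FILE while still printing it to the terminal. The fallback to a temporary file is an assumption added so the snippet runs on its own.

```shell
#!/usr/bin/env bash
# For this demo only: fall back to a temp file when LOG_FILE is not already set.
LOG_FILE="${LOG_FILE:-$(mktemp)}"

# Hypothetical definition of log_message; the original helper may differ.
log_message() {
    local timestamp
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    # tee -a appends the line to the log file and also echoes it to stdout.
    printf '%s - %s\n' "$timestamp" "$*" | tee -a "$LOG_FILE"
}

log_message "Summary:"
```

Writing through `tee -a` means the same lines appear both in an interactive run and in the dated log file, which is convenient when the script runs from cron.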
Complete Script
Here’s the complete script after incorporating the sync, log parsing, and MD5 verification logic.
This project demonstrates how combining AzCopy's powerful sync functionality with Bash scripting can result in a robust solution for transferring and verifying file integrity in Azure Blob Storage. By leveraging job logs, we can accurately identify changed files and ensure that all transferred files match their local counterparts using MD5 verification.