Automating GitHub to Azure DevOps Backups with Bash and Pipelines
Keeping critical source code safe is a constant priority. GitHub is typically reliable, but organizations may still want an additional backup of their repositories on a separate platform, such as Azure DevOps, in case of accidental repository deletions, access problems, or unexpected downtime.
In this post, I will walk you through how to automate backing up GitHub repositories into Azure DevOps using:
- A Bash script (to handle repository cloning and mirroring).
- An Azure DevOps pipeline (to run the script on a schedule or on demand).
1. Overview of the Backup Approach
Bash Script
- Fetch repository names from GitHub using the GitHub API and personal access token (PAT).
- Exclude certain repositories if desired (e.g., test repos).
- Check if each repository already exists in Azure DevOps.
- Create new repositories in Azure DevOps if they do not exist.
- Clone (mirror) the GitHub repo, including large files (LFS).
- Push the mirrored repository to Azure DevOps.
- Clean up any temporary files and directories.
Azure DevOps Pipeline
- Trigger the script on a schedule (or on demand).
- Use environment variables securely stored in an Azure DevOps Variable Group.
- Run the script in an Ubuntu-based agent to automate the backups regularly.
By the end of this setup, Azure DevOps will function as an automatic, up-to-date mirror of your GitHub repositories.
2. Prerequisites
- GitHub account with a Personal Access Token (PAT).
  - You’ll need sufficient permissions (e.g., repo scope) to clone and read repositories.
- Azure DevOps account with a project set up.
  - You’ll need a PAT with permissions to create repositories in Azure DevOps.
- Bash shell (the script is designed for Unix/Linux environments).
- Utilities: curl, jq, git, git-lfs must be installed (the script will attempt to install them if missing, assuming you have sudo privileges).
- Azure DevOps Pipeline set up to run the script.
  - You can store your tokens (GITHUB_PAT, AZURE_PAT) as secure pipeline variables or in a Variable Group for added security.
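Before running anything, it can help to confirm that the utilities listed above are actually on the PATH. The following is a minimal sketch, not part of the original script; the function name missing_tools is hypothetical, and it only reports what is missing rather than installing anything.

```shell
# Print the name of every listed command that is not found on PATH.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command exists
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report (but do not install) anything that is missing.
missing=$(missing_tools git curl jq git-lfs)
if [ -n "$missing" ]; then
  echo "Missing tools: $(printf '%s' "$missing" | tr '\n' ' ')" >&2
fi
```

In a real preflight step you would probably exit with a non-zero status when the list is non-empty, so the problem surfaces before any cloning starts.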
3. The Backup Script
Below is the sample Bash script (github_backup.sh). It automates cloning repositories from GitHub and mirroring them into Azure DevOps. Adjust the environment variable defaults or pass them in at runtime through your pipeline or local shell.
#!/bin/bash
# Check environment variables or set defaults
GITHUB_USER="${GITHUB_USER:-your_github_username}"
GITHUB_PAT="${GITHUB_PAT:-your_github_token}"
AZURE_ORG="${AZURE_ORG:-your_azure_organization}"
AZURE_PROJECT="${AZURE_PROJECT:-your_azure_project}"
AZURE_PAT="${AZURE_PAT:-your_azure_devops_token}"
WORKDIR="${WORKDIR:-./gitbackup}"
# Create a working directory and stop if it cannot be entered
mkdir -p "$WORKDIR"
cd "$WORKDIR" || exit 1
# Define repositories to exclude (space-separated)
exclude_repos="Test1 Test2"
# Convert the exclusion list into an array
IFS=' ' read -r -a exclude_array <<< "$exclude_repos"
# Function to check and install required packages
check_install() {
  for pkg in "$@"; do
    # dpkg-query reports "install ok installed" only for fully installed packages
    if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"; then
      echo "$pkg is not installed. Installing..."
      # apt-get has a stable CLI, so no warning needs to be filtered out
      sudo apt-get update
      sudo apt-get install -y "$pkg"
    else
      echo "$pkg is already installed."
    fi
  done
}
# Install required packages
echo "Checking and installing required packages..."
required_packages=("git" "curl" "jq" "git-lfs")
check_install "${required_packages[@]}"
# Initialize an empty string to store all repository names
all_repos=""
# Page counter
page=1
# Loop to fetch all repository names, handling pagination
while : ; do
  # Fetch the current page of repositories
  response=$(curl -s -H "Authorization: token $GITHUB_PAT" "https://api.github.com/user/repos?type=all&per_page=100&page=$page")
  # Stop if the API returned an error object (e.g., bad credentials) instead of an array
  if ! echo "$response" | jq -e 'type == "array"' > /dev/null; then
    echo "Unexpected response from the GitHub API: $response" >&2
    exit 1
  fi
  # Count the repositories on this page
  count=$(echo "$response" | jq 'length')
  # Extract repository names and append to the all_repos string
  repo_names=$(echo "$response" | jq -r '.[] | .name')
  all_repos+="$repo_names\n" # Append this page's names followed by a newline
  # Increment the page number
  ((page++))
  # Break the loop if the current page contains fewer than 100 repositories
  [ "$count" -lt 100 ] && break
done
# Filter out excluded repositories
for exclude in "${exclude_array[@]}"; do
  # -x matches whole lines only; -F treats the name literally rather than as a regex
  all_repos=$(echo -e "$all_repos" | grep -vxF "$exclude")
done
# Print all repository names (for logging/debugging)
echo -e "$all_repos"
# Azure DevOps base API URL
azure_api_url="https://dev.azure.com/$AZURE_ORG/$AZURE_PROJECT/_apis/git/repositories?api-version=7.2-preview.1"
# Backup each repository
for repo_name in $all_repos; do
  echo "Processing $repo_name"
  # Check if repo exists in Azure DevOps (the list API returns {"count": N, "value": [...]})
  response=$(curl -s -u ":$AZURE_PAT" "$azure_api_url")
  if echo "$response" | jq -e --arg name "$repo_name" 'any(.value[]?; .name == $name)' > /dev/null; then
    echo "Repository $repo_name exists in Azure DevOps."
  else
    echo "Repository $repo_name does not exist in Azure DevOps, creating..."
    curl -s -X POST -u ":$AZURE_PAT" -H "Content-Type: application/json" -d "{\"name\":\"$repo_name\"}" "$azure_api_url"
  fi
  # Clone the repository (mirror)
  # Note: this URL assumes $GITHUB_USER owns the repository; repositories listed
  # from other owners or organizations will fail to clone
  git_clone_url="https://$GITHUB_PAT@github.com/$GITHUB_USER/$repo_name"
  if git clone --mirror "$git_clone_url"; then
    cd "$repo_name.git" || exit 1
    # Setup LFS if needed
    if git lfs ls-files | grep -q '.*'; then
      git lfs install
      git config http.version HTTP/1.1
      git config lfs.dialtimeout 120
      git config lfs.activitytimeout 120
      git lfs fetch --all
    fi
    # Adding Azure DevOps remote
    azure_repo_url="https://$AZURE_PAT@dev.azure.com/$AZURE_ORG/$AZURE_PROJECT/_git/$repo_name"
    git remote add azure "$azure_repo_url"
    # Push everything including LFS objects to Azure DevOps
    git push azure --mirror
    if [ "$(git lfs ls-files | wc -l)" -gt 0 ]; then
      git lfs push azure --all
    fi
    cd ..
    rm -rf "$repo_name.git" # Clean up
  else
    echo "Failed to clone $repo_name"
  fi
done
# Clean up working directory
cd ..
rm -rf "$WORKDIR"
What the Script Does
- Checks for required packages (git, curl, jq, git-lfs). If any are missing, it attempts to install them.
- Fetches repos from your GitHub account, handling pagination if you have many repositories.
- Excludes certain repositories (defined in exclude_repos).
- Creates the same-named repository in Azure DevOps if it doesn’t already exist.
- Clones (mirrors) the repository from GitHub and pushes it to Azure DevOps, including LFS objects if present.
- Removes the local mirrored copy to keep storage usage down.
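The clone and push steps place the PATs directly in the remote URLs. One alternative, sketched here as an assumption rather than as the script's own method, is to pass the credentials through git's http.extraHeader setting so the tokens are not written into the clone's stored remote URL. The helper names basic_auth_header and mirror_repo are hypothetical, the x-access-token user name for GitHub is an assumption based on common practice, and LFS handling is left out for brevity.

```shell
# Build an HTTP Basic "Authorization" header from a user name and a token.
# basic_auth_header is a hypothetical helper name.
basic_auth_header() {
  # tr removes the line wrapping that some base64 implementations add
  printf 'Authorization: Basic %s' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

# Hypothetical helper: mirror one repository without putting tokens in URLs.
# Relies on the same environment variables as the script above.
mirror_repo() {
  repo_name=$1
  git -c http.extraHeader="$(basic_auth_header x-access-token "$GITHUB_PAT")" \
    clone --mirror "https://github.com/$GITHUB_USER/$repo_name" "$repo_name.git" || return 1
  # Azure DevOps accepts a PAT as the password with an empty user name
  git -C "$repo_name.git" -c http.extraHeader="$(basic_auth_header "" "$AZURE_PAT")" \
    push --mirror "https://dev.azure.com/$AZURE_ORG/$AZURE_PROJECT/_git/$repo_name"
}
```

Calling mirror_repo my-repo would then clone and push a single repository. The header is supplied per command with -c, so nothing credential-related is saved in the mirror's config.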
4. Azure DevOps Pipeline Configuration
Below is an example of an Azure DevOps pipeline (azure-pipelines.yml) that runs this script. Place the pipeline file in the root of the repository that contains the script; the example expects the script at Scripts/github_backup.sh. Then create an Azure DevOps Pipeline in your project that points at this YAML file.
# Disable CI triggers; the pipeline runs only on the schedule or on demand
trigger: none
schedules:
- cron: "30 20 * * *"
  displayName: 'Every day at 6am'
  branches:
    include:
    - main
  always: true
pool:
  vmImage: ubuntu-latest
variables:
- group: DevOpsSettings
steps:
- script: |
    echo "---------------------"
    # Run bash script
    chmod +x ./Scripts/github_backup.sh
    ./Scripts/github_backup.sh
    echo "---------------------"
  env:
    # Map the pipeline variable to the script environment variable
    AZURE_ORG: $(AZURE_ORG)
    AZURE_PAT: $(AZURE_PAT)
    AZURE_PROJECT: $(AZURE_PROJECT)
    GITHUB_PAT: $(GITHUB_PAT)
    GITHUB_USER: $(GITHUB_USER)
    WORKDIR: $(WORKDIR)
  displayName: 'GitHub Backup to Azure DevOps'
Explanation of Key Sections
- Schedules: The YAML example above uses a cron schedule that runs every day at 20:30 UTC. Azure Pipelines evaluates cron schedules in UTC, so the "6am" in the display name corresponds to a local time zone of UTC+9:30; adjust the expression to suit your own time zone. Because always is set to true, the run happens even when the branch has no new commits.
- Variables:
  - We reference a Variable Group named DevOpsSettings. This Variable Group might contain AZURE_ORG, AZURE_PAT, AZURE_PROJECT, GITHUB_PAT, and GITHUB_USER, which are used in our script. The example also maps WORKDIR, so define it in the group as well or remove that mapping; an undefined $(WORKDIR) is passed to the script as literal text.
  - You can also define them directly in the pipeline as pipeline variables or as secret variables.
- steps: Runs the script by:
  - Setting executable permissions (chmod +x).
  - Executing github_backup.sh.
- env: Maps the pipeline variables to the script environment variables. Secret variables are not exposed to scripts automatically, so this explicit mapping is what passes GITHUB_PAT and AZURE_PAT to the script.
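Because a missing mapping tends to fail late (for example, as an authentication error halfway through the run), a small guard at the top of the script can catch it early. This is a minimal sketch under the assumption that an undefined pipeline variable arrives as the unexpanded text $(NAME); the function name require_env is hypothetical and not part of the original script.

```shell
# Return non-zero if any named variable is empty or still holds an
# unexpanded Azure Pipelines macro such as $(GITHUB_PAT).
# require_env is a hypothetical helper name.
require_env() {
  rc=0
  for name in "$@"; do
    # Indirect lookup of the variable called $name (POSIX-compatible)
    eval "value=\${$name:-}"
    case "$value" in
      "" | '$('*')')
        echo "Variable $name is not set" >&2
        rc=1
        ;;
    esac
  done
  return "$rc"
}

# Example use near the top of the backup script:
# require_env GITHUB_USER GITHUB_PAT AZURE_ORG AZURE_PROJECT AZURE_PAT || exit 1
```

Note that the default values in the script (such as your_github_token) would pass this check, so the guard is most useful once those defaults are removed.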
5. Setting Up Secure Variables
You’ll need to store your tokens (both GitHub and Azure DevOps) securely:
- Go to Azure DevOps → Pipelines → Library → Variable Groups.
- Create a new Variable Group (e.g., named DevOpsSettings), add your variables, and mark sensitive ones like GITHUB_PAT and AZURE_PAT as secret.
- Reference the Variable Group in your pipeline YAML (the - group: DevOpsSettings line).
6. Running the Backup
On Demand
To run the backup immediately, go to your Azure DevOps Pipeline → Run pipeline → select the appropriate branch. Your pipeline will execute the script, pulling down all GitHub repos and mirroring them into Azure DevOps.
Scheduled
The cron setting in your pipeline configuration runs the script on a defined schedule (once daily in the example above). Logs are stored in Azure DevOps, allowing you to verify that each backup step completed successfully.
7. Verifying the Backup
- Check Azure DevOps → Repos: You should see the repositories created or updated with recent commits from GitHub.
- Inspect the pipeline logs to confirm that each repository was cloned and pushed successfully.
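- Look for errors in the pipeline logs:
  - If a clone fails, the script logs “Failed to clone” followed by the repository name. Failed pushes and LFS pushes show up only as git's own error output, and because the script does not exit with a non-zero status, the pipeline run can still be reported as successful.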
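For a more direct check than reading logs, you can compare the branch and tag listings of the two remotes. The sketch below assumes you have saved the output of git ls-remote for each side to a file; the function name compare_ref_lists and the file names are hypothetical, and private repositories will need credentials for the ls-remote calls.

```shell
# Compare two saved `git ls-remote` listings (lines of "<sha><TAB><ref>").
# Prints the differing lines and returns non-zero when the listings differ.
# compare_ref_lists is a hypothetical helper name.
compare_ref_lists() {
  left=$(mktemp) || return 2
  right=$(mktemp) || return 2
  # Sort both sides so that ordering differences are ignored
  sort "$1" > "$left"
  sort "$2" > "$right"
  # diff exits 0 when the files are identical and 1 when they differ
  diff "$left" "$right"
  rc=$?
  rm -f "$left" "$right"
  return "$rc"
}

# Example use (listing files are assumptions):
# git ls-remote --heads --tags "https://github.com/$GITHUB_USER/$repo" > github.refs
# git ls-remote --heads --tags "https://dev.azure.com/$AZURE_ORG/$AZURE_PROJECT/_git/$repo" > azure.refs
# compare_ref_lists github.refs azure.refs || echo "$repo is out of sync"
```

Matching commit hashes for every branch and tag is a reasonable sign that the mirror is current, although it says nothing about LFS objects.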
8. Customizing the Script
You may want to modify the script’s behavior. Common customizations include:
- Changing the working directory (WORKDIR) to match your environment.
- Filtering additional repositories that you might not want to back up.
- Adjusting the installation logic if you’re on a system where sudo apt install is not available (e.g., non-Debian-based Linux).
If your repositories rely on submodules, you’ll need to adapt the script to handle --recurse-submodules or other submodule-specific logic since the example above does not address them. A mirror clone copies only the references to submodules, not the submodule repositories themselves, so each submodule repository needs its own backup.
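Another customization worth considering is making the script report failures through its exit status, so a failed backup marks the pipeline step as failed. The following is a sketch only: backup_all is a hypothetical wrapper, and the per-repository command it calls is assumed to be a function of your own that returns non-zero on failure.

```shell
# Run a per-repository command for each name and count the failures.
# $1 is the name of a function or command that backs up one repository;
# the remaining arguments are repository names.
# backup_all is a hypothetical helper name.
backup_all() {
  backup_cmd=$1
  shift
  failures=0
  for repo_name in "$@"; do
    if "$backup_cmd" "$repo_name"; then
      echo "OK: $repo_name"
    else
      echo "FAILED: $repo_name" >&2
      failures=$((failures + 1))
    fi
  done
  # A non-zero return lets the caller fail the pipeline step
  [ "$failures" -eq 0 ]
}

# Example use (my_backup_function is an assumption):
# backup_all my_backup_function $all_repos || exit 1
```

The loop keeps going after a failure, so one broken repository does not prevent the others from being backed up.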
9. Conclusion
By combining a Bash script with an Azure DevOps pipeline, you can reliably back up your GitHub repositories to Azure DevOps on a schedule or whenever you choose. This gives you a secondary, version-controlled copy of your code on another platform, which provides extra peace of mind and some protection against unexpected GitHub outages or accidents.
Next Steps:
- Test the script on a single repository before rolling it out to all repos.
- Maintain your tokens properly, and consider employing secret scanning or Azure Key Vault for enhanced security.
- Check your pipeline logs regularly to confirm backups are successful.
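For the first of these steps, one way to test against a single repository is an allowlist applied to the list of names before the backup loop. This is a minimal sketch; the ONLY_REPOS variable and the select_repos function are hypothetical additions, not part of the script above.

```shell
# Read repository names on stdin and keep only those listed in ONLY_REPOS
# (space-separated, a hypothetical variable). When ONLY_REPOS is unset or
# empty, every name is passed through unchanged.
select_repos() {
  while IFS= read -r name; do
    # Skip blank lines
    [ -n "$name" ] || continue
    if [ -z "${ONLY_REPOS:-}" ]; then
      printf '%s\n' "$name"
      continue
    fi
    for wanted in $ONLY_REPOS; do
      if [ "$name" = "$wanted" ]; then
        printf '%s\n' "$name"
        break
      fi
    done
  done
}

# Example use after the exclusion filter in the script:
# all_repos=$(echo -e "$all_repos" | select_repos)
```

Setting ONLY_REPOS to one repository name for a trial run, then clearing it, lets the same script serve both the test and the full backup.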
Feel free to contribute or adapt this script to your needs. Having a robust backup strategy is crucial in modern DevOps practices, and this approach is a relatively simple, flexible way to keep your GitHub repos safe in Azure DevOps.