Automating GitHub to Azure DevOps Backups with Bash and Pipelines

Keeping critical source code safe is a constant priority. While GitHub is typically reliable, it can be important for organizations to maintain an additional backup of their repositories in a separate platform—like Azure DevOps—in case of issues such as accidental repository deletions, access problems, or unexpected downtime.

In this post, I will walk you through how to automate backing up GitHub repositories into Azure DevOps using:

  • A Bash script (to handle repository cloning and mirroring).
  • An Azure DevOps pipeline (to run the script on a schedule or on demand).

1. Overview of the Backup Approach

Bash Script

  1. Fetch repository names from GitHub using the GitHub API and personal access token (PAT).
  2. Exclude certain repositories if desired (e.g., test repos).
  3. Check if each repository already exists in Azure DevOps.
  4. Create new repositories in Azure DevOps if they do not exist.
  5. Clone (mirror) the GitHub repo, including large files (LFS).
  6. Push the mirrored repository to Azure DevOps.
  7. Clean up any temporary files and directories.

Azure DevOps Pipeline

  1. Trigger the script on a schedule (or on demand).
  2. Use environment variables securely stored in an Azure DevOps Variable Group.
  3. Run the script in an Ubuntu-based agent to automate the backups regularly.

By the end of this setup, Azure DevOps will hold a mirror of your GitHub repositories that is refreshed on every pipeline run.


2. Prerequisites

  1. GitHub account with a Personal Access Token (PAT).
    • You’ll need sufficient permissions (e.g., repo scope) to clone and read repositories.
  2. Azure DevOps account with a project set up.
    • You’ll need a PAT with permissions to create repositories and push code in Azure DevOps (for example, the Code scope with Read, write, & manage).
  3. Bash shell (the script is designed for Unix/Linux environments).
  4. Utilities: curl, jq, git, and git-lfs must be installed. The script attempts to install any that are missing with apt, which assumes a Debian-based system and sudo privileges.
  5. Azure DevOps Pipeline set up to run the script.
    • You can store your tokens (GITHUB_PAT, AZURE_PAT) as secure pipeline variables or in a Variable Group for added security.
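
Before the first run, it can help to confirm that the tools and variables listed above are actually present. The following is a minimal sketch and not part of the original script; the helper names missing_tools and missing_vars are hypothetical.

```shell
#!/bin/bash
# Hypothetical preflight helpers; these names are illustrative and not part of github_backup.sh.

# Print each command from the argument list that is not found on PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Print each variable name from the argument list that is absent from
# the environment or empty (printenv only sees exported variables).
missing_vars() {
    local name
    for name in "$@"; do
        [ -n "$(printenv "$name")" ] || echo "$name"
    done
}

# Report anything the backup would need but cannot find.
missing_tools git curl jq git-lfs
missing_vars GITHUB_USER GITHUB_PAT AZURE_ORG AZURE_PROJECT AZURE_PAT
```

Empty output means nothing is missing. Checking the environment rather than the shell variables is deliberate here: the script falls back to placeholder defaults, so a variable that was never supplied would otherwise look set.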

3. The Backup Script

Below is the sample Bash script (github_backup.sh). It automates the repository cloning from GitHub and mirrors them into Azure DevOps. Adjust the environment variable defaults or pass them in at runtime through your pipeline or local shell.

#!/bin/bash

# Check environment variables or set defaults
GITHUB_USER="${GITHUB_USER:-your_github_username}"
GITHUB_PAT="${GITHUB_PAT:-your_github_token}"
AZURE_ORG="${AZURE_ORG:-your_azure_organization}"
AZURE_PROJECT="${AZURE_PROJECT:-your_azure_project}"
AZURE_PAT="${AZURE_PAT:-your_azure_devops_token}"
WORKDIR="${WORKDIR:-./gitbackup}"

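# Create a working directory and resolve it to an absolute path,
# so the final cleanup works regardless of how WORKDIR was given
mkdir -p "$WORKDIR"
WORKDIR="$(cd "$WORKDIR" && pwd)"
cd "$WORKDIR" || exit 1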

# Define repositories to exclude (space-separated)
exclude_repos="Test1 Test2"

# Convert the exclusion list into an array
IFS=' ' read -r -a exclude_array <<< "$exclude_repos"

# Function to check and install required packages
check_install() {
    for pkg in "$@"; do
        # dpkg-query reports "install ok installed" only for fully installed packages
        if ! dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"; then
            echo "$pkg is not installed. Installing..."
            # apt-get has a stable CLI, so there is no apt warning to filter out
            sudo apt-get update -qq
            sudo apt-get install -y "$pkg"
        else
            echo "$pkg is already installed."
        fi
    done
}

# Install required packages
echo "Checking and installing required packages..."
required_packages=("git" "curl" "jq" "git-lfs")
check_install "${required_packages[@]}"

# Initialize an empty string to store all repository names
all_repos=""

# Page counter
page=1

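# Loop to fetch all repository names, handling pagination
# Note: type=all also returns repositories owned by other accounts; the clone step
# below assumes $GITHUB_USER owns each one, so use type=owner to restrict the list
while : ; do
  # Fetch the current page of repositories
  response=$(curl -s -H "Authorization: token $GITHUB_PAT" "https://api.github.com/user/repos?type=all&per_page=100&page=$page")
  # Stop if the API returned an error object instead of a list (e.g., bad credentials)
  if ! echo "$response" | jq -e 'type == "array"' > /dev/null; then
    echo "Unexpected response from the GitHub API: $response" >&2
    exit 1
  fi
  # Count the repositories on this page
  count=$(echo "$response" | jq 'length')
  # Extract repository names and append them, followed by a real newline
  repo_names=$(echo "$response" | jq -r '.[].name')
  all_repos+="$repo_names"$'\n'
  # Increment the page number
  ((page++))
  # Break the loop if the current page contains fewer than 100 repositories
  [ "$count" -lt 100 ] && break
done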

# Filter out excluded repositories
for exclude in "${exclude_array[@]}"; do
    all_repos=$(echo -e "$all_repos" | grep -v "^$exclude$")
done

# Print all repository names (for logging/debugging)
echo -e "$all_repos"

# Azure DevOps base API URL
azure_api_url="https://dev.azure.com/$AZURE_ORG/$AZURE_PROJECT/_apis/git/repositories?api-version=7.2-preview.1"

# Backup each repository
for repo_name in $all_repos; do
    echo "Processing $repo_name"

    # Check if repo exists in Azure DevOps (compare against each repository's own
    # name field, not any "name" in the response, such as the project name)
    response=$(curl -s -u ":$AZURE_PAT" "$azure_api_url")
    if echo "$response" | jq -e --arg name "$repo_name" '.value[] | select(.name == $name)' > /dev/null; then
        echo "Repository $repo_name exists in Azure DevOps."
    else
        echo "Repository $repo_name does not exist in Azure DevOps, creating..."
        curl -s -X POST -u ":$AZURE_PAT" -H "Content-Type: application/json" -d "{\"name\":\"$repo_name\"}" "$azure_api_url"
    fi

    # Clone the repository (mirror); a mirror clone lands in <name>.git
    git_clone_url="https://$GITHUB_PAT@github.com/$GITHUB_USER/$repo_name"
    if git clone --mirror "$git_clone_url"; then
        cd "$repo_name.git" || continue

        # Setup LFS if needed
        if git lfs ls-files | grep -q '.*'; then
            git lfs install
            git config http.version HTTP/1.1
            git config lfs.dialtimeout 120
            git config lfs.activitytimeout 120
            git lfs fetch --all
        fi

        # Adding Azure DevOps remote
        azure_repo_url="https://$AZURE_PAT@dev.azure.com/$AZURE_ORG/$AZURE_PROJECT/_git/$repo_name"
        git remote add azure "$azure_repo_url"

        # Push everything including LFS objects to Azure DevOps
        git push azure --mirror
        if [ "$(git lfs ls-files | wc -l)" -gt 0 ]; then
            git lfs push azure --all
        fi

        cd ..
        rm -rf "$repo_name.git"  # Clean up
    else
        echo "Failed to clone $repo_name"
    fi
done

# Clean up working directory
cd ..
rm -rf "$WORKDIR"

What the Script Does

  1. Checks for required packages (git, curl, jq, git-lfs). If any are missing, it attempts to install them.
  2. Fetches repos from your GitHub account, handling pagination if you have many repositories.
  3. Excludes certain repositories (defined in exclude_repos).
  4. Creates the same-named repository in Azure DevOps if it doesn’t already exist.
  5. Clones (mirrors) the repository from GitHub and pushes it to Azure DevOps, including LFS objects if present.
  6. Removes the local mirrored copy to keep storage usage down.
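
The exclusion step (item 3) matches names with a regular expression, so a dot in an excluded name also matches any other character in that position. A minimal sketch of an exact-match alternative follows; filter_excluded is a hypothetical helper and not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): remove excluded names
# from a newline-separated list using exact string comparison.
# Usage: filter_excluded "<newline-separated names>" name1 name2 ...
filter_excluded() {
    local repo_list=$1
    shift
    printf '%s\n' "$repo_list" | awk -v excluded="$*" '
        BEGIN {
            # Repository names cannot contain spaces, so splitting on spaces is safe.
            count = split(excluded, names, " ")
            for (i = 1; i <= count; i++) skip[names[i]] = 1
        }
        # Keep non-empty lines whose full text is not an excluded name.
        $0 != "" && !($0 in skip)
    '
}

# Sample data: "my.repo" is excluded, "my-repo" must survive.
repos=$(printf 'app\nTest1\nmy-repo\nmy.repo\n')
filter_excluded "$repos" Test1 my.repo
```

With the sample list, only app and my-repo are printed. Calling the function with no exclusion names passes the list through unchanged.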

4. Azure DevOps Pipeline Configuration

Below is an example of an Azure DevOps pipeline (azure-pipelines.yml) that triggers this script. Place the pipeline file in the root of the repository that contains the script; the example expects the script at Scripts/github_backup.sh relative to that root. Then configure an Azure DevOps Pipeline in your project to pick it up.

trigger: none

schedules:
- cron: "30 20 * * *"
  displayName: 'Every day at 6am'
  branches:
    include:
    - main
  always: true

pool:
  vmImage: ubuntu-latest

variables:
- group: DevOpsSettings

steps:
- script: |
    echo "---------------------"
    # Run bash script
    chmod +x ./Scripts/github_backup.sh
    ./Scripts/github_backup.sh
    echo "---------------------"    
  env:
    # Map the pipeline variable to the script environment variable
    AZURE_ORG: $(AZURE_ORG)
    AZURE_PAT: $(AZURE_PAT)
    AZURE_PROJECT: $(AZURE_PROJECT)
    GITHUB_PAT: $(GITHUB_PAT)
    GITHUB_USER: $(GITHUB_USER)
    WORKDIR: $(WORKDIR)
  displayName: 'GitHub Backup to Azure DevOps'

Explanation of Key Sections

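  1. Schedules: The YAML example above uses a cron schedule that runs every day at 20:30 UTC. Azure DevOps always interprets cron schedules in UTC, so the "6am" in the display name only holds in a time zone where 20:30 UTC is 6 AM; adjust the cron expression or the display name for your own time zone. Because always is set to true, the run happens even when there have been no code changes since the last scheduled run.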
  2. Variables:
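    • We reference a Variable Group named DevOpsSettings. This Variable Group should contain AZURE_ORG, AZURE_PAT, AZURE_PROJECT, GITHUB_PAT, and GITHUB_USER, which are used in our script. The env block also maps WORKDIR, so either define WORKDIR in the group as well or remove that mapping; an undefined $(WORKDIR) is passed to the script as literal text rather than as an empty value, which bypasses the script's default.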
    • You can also define them directly in the pipeline as pipeline variables or as secret variables.
  3. steps: Runs the script by:
    • Setting executable permissions (chmod +x).
    • Executing github_backup.sh.
  4. env: Maps the pipeline variables to the script environment variables. Secret variables are not exposed to scripts automatically, so this explicit mapping is what makes the tokens available to the script.
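
Because Azure DevOps evaluates cron schedules in UTC, it is worth checking what a given entry means locally. The sketch below does that arithmetic. utc_to_local is a hypothetical helper, and the UTC+9:30 offset is only an assumption, chosen because it is one offset at which 20:30 UTC falls on 6:00; the source does not state a time zone.

```shell
#!/bin/bash
# Hypothetical helper: convert a UTC time of day to local time.
# Arguments: hour, minute, UTC offset in minutes (plain integers, no leading zeros).
utc_to_local() {
    # Adding 1440 keeps the sum positive for negative offsets before wrapping at one day.
    local total=$(( ($1 * 60 + $2 + $3 + 1440) % 1440 ))
    printf '%02d:%02d\n' $(( total / 60 )) $(( total % 60 ))
}

# cron "30 20 * * *" is 20:30 UTC; 570 minutes is the assumed UTC+9:30 offset.
utc_to_local 20 30 570   # prints 06:00
```

The helper ignores daylight saving time, so a schedule that must track local time through the year needs its cron expression adjusted when the offset changes.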

5. Setting Up Secure Variables

You’ll need to store your tokens (both GitHub and Azure DevOps) securely:

  1. In Azure DevOps, go to Pipelines → Library → Variable Groups.
  2. Create a new Variable Group (e.g., named DevOpsSettings), add your variables, and mark sensitive ones like GITHUB_PAT and AZURE_PAT as secret.
  3. Reference the Variable Group in your pipeline YAML (the - group: DevOpsSettings line).
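
Secret variables keep the tokens out of the pipeline definition, but the script places them in clone and push URLs, where Git can store them in the repository config or repeat them in error messages. One alternative, sketched here as an assumption and not as the original script's method, is to send the Azure DevOps PAT in an HTTP header through Git's http.extraHeader setting. The function name basic_auth_header is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build a Basic auth header for an Azure DevOps PAT.
# Azure DevOps accepts a PAT as the password with an empty user name.
basic_auth_header() {
    local encoded
    # tr removes the line breaks that some base64 implementations insert.
    encoded=$(printf ':%s' "$1" | base64 | tr -d '\n')
    printf 'Authorization: Basic %s\n' "$encoded"
}

# Intended use inside the backup loop (commented out: it needs a real PAT and network):
# git -c http.extraHeader="$(basic_auth_header "$AZURE_PAT")" \
#     push --mirror "https://dev.azure.com/$AZURE_ORG/$AZURE_PROJECT/_git/$repo_name"

# Demonstration with a dummy value instead of a real token.
basic_auth_header "abc"   # prints Authorization: Basic OmFiYw==
```

Passing the header with -c applies it to that one command only, so nothing is written to the repository's config file.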

6. Running the Backup

On Demand

To run the backup immediately, go to your Azure DevOps Pipeline → Run pipeline → select the appropriate branch. Your pipeline will execute the script, pulling down all GitHub repos and mirroring them into Azure DevOps.

Scheduled

The cron setting in your pipeline configuration will run the script on a defined schedule (once daily in the example above). Logs will be stored in Azure DevOps, allowing you to verify that each backup step completed successfully.
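
One caveat when reading those logs: the script's last command is rm -rf, so it exits with status 0 even when a clone or push failed, and the pipeline run is still marked as succeeded. A sketch of one way to surface failures follows. The names run_for_each and backup_one are hypothetical, and backup_one stands in for the clone-and-push steps of the real script.

```shell
#!/bin/bash
# Hypothetical wrapper: run an action for every name and remember which ones failed.
# Returns non-zero if any failed, so a pipeline step that ends with it is marked failed.
run_for_each() {
    local action=$1 failed="" name
    shift
    for name in "$@"; do
        "$action" "$name" || failed="$failed $name"
    done
    if [ -n "$failed" ]; then
        echo "Backup failed for:$failed" >&2
        return 1
    fi
}

# Stand-in for the real clone-and-push logic; fails for one name to show the effect.
backup_one() {
    [ "$1" != "broken-repo" ]
}

run_for_each backup_one app broken-repo docs || echo "exit status would be non-zero"
```

The loop keeps going after a failure, so one broken repository does not prevent the others from being backed up; only the final status changes.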


7. Verifying the Backup

  1. Check Azure DevOps → Repos: You should see the repositories created or updated with recent commits from GitHub.
  2. Inspect the pipeline logs to confirm that each repository was cloned and pushed successfully.
  3. Look for errors in the pipeline logs:
    • If a clone fails, the script logs “Failed to clone” followed by the repository name. Push and LFS failures are not logged by the script itself; they appear as Git's own error output.
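
To go beyond reading logs, the branch and tag hashes on both sides can be compared. The sketch below assumes the output of git ls-remote for each side has been saved to a file; stale_refs is a hypothetical helper.

```shell
#!/bin/bash
# Hypothetical helper: list refs that are missing or out of date in the backup.
# $1: file with `git ls-remote` output from GitHub ("<hash><TAB><ref>" per line)
# $2: file with `git ls-remote` output from Azure DevOps
stale_refs() {
    awk '
        # The first file on the awk command line is the backup: remember the hash per ref.
        FILENAME == ARGV[1] { backup[$2] = $1; next }
        # Then walk the source: print refs whose backup hash is absent or different.
        backup[$2] != $1 { print $2 }
    ' "$2" "$1"
}

# Producing the inputs needs network access and credentials, so it is shown commented out:
# git ls-remote --heads --tags "https://github.com/$GITHUB_USER/$repo_name" > source.txt
# git ls-remote --heads --tags "https://dev.azure.com/$AZURE_ORG/$AZURE_PROJECT/_git/$repo_name" > backup.txt
# stale_refs source.txt backup.txt
```

Empty output means every branch and tag on GitHub has the same hash in Azure DevOps. Restricting the comparison to --heads and --tags leaves out other ref namespaces that a mirror clone may also carry.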

8. Customizing the Script

You may want to modify the script’s behavior. Common customizations include:

  • Changing the working directory (WORKDIR) to match your environment.
  • Filtering additional repositories that you might not want to back up.
  • Adjusting the installation logic if you’re on a system where sudo apt install is not available (e.g., non-Debian-based Linux).
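
For the last point, a minimal sketch of detecting the available package manager is shown below. The helper names detect_pkg_manager and install_command are hypothetical, the list of managers is not exhaustive, and package names (git-lfs in particular) may differ or need extra repositories on some distributions.

```shell
#!/bin/bash
# Hypothetical helper: print the first package manager found on PATH.
detect_pkg_manager() {
    local candidate
    for candidate in apt-get dnf yum zypper apk; do
        if command -v "$candidate" > /dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Print (rather than run) the install command that would be used for a package.
install_command() {
    case "$(detect_pkg_manager)" in
        apt-get) echo "sudo apt-get install -y $1" ;;
        dnf)     echo "sudo dnf install -y $1" ;;
        yum)     echo "sudo yum install -y $1" ;;
        zypper)  echo "sudo zypper --non-interactive install $1" ;;
        apk)     echo "apk add $1" ;;
        *)       echo "No supported package manager found" >&2; return 1 ;;
    esac
}

install_command git-lfs || true
```

Printing the command first makes it easy to check the result on a new system before wiring it into check_install.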

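If your repositories rely on submodules, note that a mirror clone copies only the references to them, not the submodule repositories themselves. Each submodule repository needs its own backup, or you’ll need to adapt the script with submodule-specific logic (such as --recurse-submodules in a regular clone), since the example above does not address them.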


9. Conclusion

By combining a Bash script with an Azure DevOps pipeline, you can reliably back up your GitHub repositories to Azure DevOps on a schedule or whenever you choose. This ensures you have a secondary, version-controlled copy of your code in another platform—providing extra peace of mind and mitigation against unexpected GitHub outages or accidents.

Next Steps:

  • Test the script on a single repository before rolling it out to all repos.
  • Maintain your tokens properly, and consider employing secret scanning or Azure Key Vault for enhanced security.
  • Check your pipeline logs regularly to confirm backups are successful.

Feel free to contribute or adapt this script to your needs. Having a robust backup strategy is crucial in modern DevOps practices, and this approach is a relatively simple, flexible way to keep your GitHub repos safe in Azure DevOps.