GHCPCLI-Local: Running GitHub Copilot CLI on Your Own Hardware

The primary purpose of GHCPCLI-Local is to run the GitHub Copilot CLI entirely on your own hardware, wired to a local LLM backend, so no prompts, code, or telemetry ever leave your network.

Purpose

The GitHub Copilot CLI is a capable terminal assistant, but by default every request travels to GitHub’s cloud inference and telemetry endpoints. For anyone working under data-residency rules, on an air-gapped network, or who simply prefers to keep their code on their own machine, that is a problem.

GHCPCLI-Local solves it without forking or patching anything. It is a thin wrapper script that sets the COPILOT_PROVIDER_* environment variables the Copilot CLI already reads at startup, then hands off to the unmodified copilot binary. Every Copilot CLI feature works as normal - it is just powered by a model running on your own GPU or CPU.

TL;DR: install the Copilot CLI, point GHCPCLI-Local at Ollama, llama.cpp, or LM Studio, and you have a fully local Copilot with optional self-hosted web search.

Source Code

The project is on GitHub: https://github.com/CraigWilsonOZ/GHCPCLI-Local

Use Cases

  • Keeping source code and prompts on regulated or air-gapped networks where cloud inference is not permitted
  • Running Copilot CLI offline or in low-connectivity environments
  • Experimenting with different open-weight models (Qwen, Llama, and others) behind a familiar tool
  • Reducing reliance on cloud inference for cost or privacy reasons while keeping the Copilot workflow

Prerequisites

A GitHub Copilot subscription is still required - the CLI binary is licensed by GitHub, and this project only redirects its inference calls. Individual, Business, and Enterprise plans all work.

| Tool | Purpose | Required |
| --- | --- | --- |
| copilot CLI | The AI assistant being wrapped | Yes |
| bash 4.0+ | Runs all scripts | Yes |
| curl | Backend health checks | Yes |
| python3 3.11+ and uv | SearXNG MCP server runtime | For web search |
| docker + docker compose | Runs the SearXNG container stack | For web search |
| Ollama, llama.cpp, or LM Studio | Local LLM backend | At least one |
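You can confirm the listed tools before running setup with a quick bash check. This is a sketch, not the project's setup.sh, and the function names are hypothetical. It only tests that each command is on PATH, that bash is 4.0 or newer, and that the docker compose plugin responds. It does not probe for an LLM backend, because the backend may run on another machine.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; setup.sh performs its own checks.

# Return 0 if the named command is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Return 0 if the running bash is version 4.0 or newer.
bash_version_ok() {
  (( BASH_VERSINFO[0] >= 4 ))
}

# Print each missing required tool and return the number missing.
check_required() {
  local missing=0 tool
  for tool in copilot curl; do
    if ! have_tool "$tool"; then
      echo "missing required tool: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  if ! bash_version_ok; then
    echo "bash 4.0+ required, found $BASH_VERSION" >&2
    missing=$((missing + 1))
  fi
  return "$missing"
}

# Report optional web-search tools without failing.
check_optional() {
  local tool
  for tool in python3 uv docker; do
    if ! have_tool "$tool"; then
      echo "optional (web search) tool not found: $tool" >&2
    fi
  done
  if have_tool docker && ! docker compose version >/dev/null 2>&1; then
    echo "docker found but the 'docker compose' plugin is not available" >&2
  fi
}
```

Running check_required followed by check_optional prints anything missing. The first function's exit status is the number of missing required tools.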

A GPU is strongly recommended for models 14B and above. As a rough guide, a 7-8B model needs 6-8 GB of VRAM, a 32-35B model needs 20-24 GB, and a 70B model needs 40 GB or more.
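The rough guide above matches a back-of-envelope estimate: weight memory is about the parameter count times bits per weight divided by 8, plus headroom for the KV cache and runtime. The sketch below assumes roughly 4.8 bits per weight, about what a 4-bit Q4_K_M GGUF quantization uses, and 2 GB of headroom. Both are illustrative assumptions rather than figures from the project, and a long context window can need much more KV-cache memory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate VRAM (GB) for a quantized model.
# weights_gb = params_billions * bits_per_weight / 8, plus overhead_gb for
# the KV cache and runtime. Default bits and overhead are assumptions.
estimate_vram_gb() {
  local params_b=$1 bits=${2:-4.8} overhead_gb=${3:-2}
  awk -v p="$params_b" -v b="$bits" -v o="$overhead_gb" \
    'BEGIN { printf "%.1f\n", p * b / 8 + o }'
}

# estimate_vram_gb 8    -> 6.8
# estimate_vram_gb 35   -> 23.0
# estimate_vram_gb 70   -> 44.0
```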

How It Works

  1. setup.sh (or setup.ps1 on Windows) checks the required tools, prompts for backend URLs and default models, and installs the SearXNG MCP server.
  2. copilot-local.sh loads your .env file with the endpoint URLs and default models.
  3. The script validates that the chosen backend is reachable before launching.
  4. It checks the model supports tool use and has a sufficient context window.
  5. It exports the COPILOT_PROVIDER_* environment variables the Copilot CLI reads.
  6. It sets COPILOT_OFFLINE=true so the CLI cannot reach GitHub telemetry or authentication endpoints.
  7. It calls exec copilot, replacing the shell with the Copilot CLI process.
  8. Copilot talks to your local LLM server over an OpenAI-compatible HTTP API, and inference runs on your own CPU or GPU.
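The launch sequence above can be sketched in a few lines of bash. This is not copilot-local.sh itself. The exact COPILOT_PROVIDER_* names the CLI reads are not listed in this write-up, so COPILOT_PROVIDER_BASE_URL and COPILOT_PROVIDER_MODEL are hypothetical placeholders. Only COPILOT_OFFLINE=true and exec copilot come from the steps above.

```shell
#!/usr/bin/env bash
# Minimal sketch of the copilot-local.sh launch flow (not the real script).
# HYPOTHETICAL: COPILOT_PROVIDER_BASE_URL and COPILOT_PROVIDER_MODEL are
# placeholder names; COPILOT_OFFLINE and exec copilot follow the documented steps.

# Load KEY=VALUE lines from a .env file and export them.
load_env() {
  local env_file=${1:-.env}
  if [[ ! -f "$env_file" ]]; then
    echo "no $env_file found; run setup first" >&2
    return 1
  fi
  set -a                 # auto-export every variable assigned while sourcing
  # shellcheck disable=SC1090
  source "$env_file"
  set +a
}

# Export the provider settings the Copilot CLI reads at startup.
export_provider_env() {
  local base_url=$1 model=$2
  export COPILOT_PROVIDER_BASE_URL="$base_url"   # hypothetical name
  export COPILOT_PROVIDER_MODEL="$model"         # hypothetical name
  export COPILOT_OFFLINE=true                    # keep the CLI off GitHub endpoints
}

# Check the OpenAI-compatible endpoint, export settings, then exec copilot.
launch() {
  local base_url=${1%/} model=$2
  if ! curl -fsS --max-time 5 "$base_url/models" >/dev/null; then
    echo "backend not reachable at $base_url" >&2
    return 1
  fi
  export_provider_env "$base_url" "$model"
  exec copilot           # replaces this shell; nothing after this runs
}
```

A caller would run load_env first. It would then call launch with an OpenAI-compatible base URL, such as http://localhost:11434/v1 for Ollama, and the chosen model name. The tool-use and context-window check from step 4 is left out to keep the sketch short.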

With SearXNG configured in .mcp.json, Copilot can call its search tool to retrieve web results. Copilot CLI starts and manages the MCP server itself, communicating with it over the stdio transport declared in .mcp.json.
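A stdio MCP entry names the command that Copilot CLI should spawn for the server. The snippet below writes such an entry with a heredoc, assuming the common "mcpServers" layout. The server name, the uv invocation, the SearXNG-MCP entry point (server.py) and the SEARXNG_URL variable are all hypothetical, so compare them against the .mcp.json shipped in the repository.

```shell
#!/usr/bin/env bash
# Hypothetical: write a stdio MCP server entry for the SearXNG search tool.
# The "mcpServers" layout, server name, entry point and SEARXNG_URL variable
# are assumptions, not the repository's actual configuration.
write_mcp_config() {
  local out=${1:-.mcp.json} searxng_url=${2:-http://localhost:8080}
  cat > "$out" <<EOF
{
  "mcpServers": {
    "searxng": {
      "type": "stdio",
      "command": "uv",
      "args": ["run", "--directory", "SearXNG-MCP", "server.py"],
      "env": { "SEARXNG_URL": "$searxng_url" }
    }
  }
}
EOF
}
```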

Architecture

copilot-local.sh
    |
    +-- Loads .env  (endpoint URLs and default models)
    +-- Validates backend is reachable
    +-- Checks model supports tool use and context window
    +-- Exports COPILOT_PROVIDER_* environment variables
    |
    +-- exec copilot  (replaces this shell with the Copilot CLI)
              |
              |  OpenAI-compatible API (HTTP)
              v
    Local LLM server  (Ollama / llama.cpp / LM Studio)
              |
              |  model weights on local disk
              v
    LLM inference  (runs on your CPU/GPU)
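The "Validates backend is reachable" step can be as simple as one curl call per backend. The sketch below uses endpoints each server provides, with their usual default ports:

- Ollama: GET /api/tags on port 11434
- llama.cpp server: GET /health on port 8080
- LM Studio: the OpenAI-compatible GET /v1/models on port 1234

The backend labels llamacpp and lmstudio and the function names are hypothetical, and copilot-local.sh may probe different endpoints.

```shell
#!/usr/bin/env bash
# Hypothetical health-check helpers; the endpoint paths belong to each
# server, but copilot-local.sh may use different ones.

# Print the health-check URL for a backend, given an optional base URL.
health_url() {
  local backend=$1 base=${2%/}
  case "$backend" in
    ollama)   echo "${base:-http://localhost:11434}/api/tags" ;;
    llamacpp) echo "${base:-http://localhost:8080}/health" ;;
    lmstudio) echo "${base:-http://localhost:1234}/v1/models" ;;
    *)        echo "unknown backend: $backend" >&2; return 1 ;;
  esac
}

# Return 0 if the backend answers within 5 seconds.
backend_reachable() {
  local url
  url=$(health_url "$@") || return 1
  curl -fsS --max-time 5 "$url" >/dev/null
}
```

For example, backend_reachable ollama checks a local Ollama server. backend_reachable llamacpp http://gpu-box:8080 checks a llama.cpp server elsewhere on the network.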

The repository also ships a Python MCP server (SearXNG-MCP/) that exposes a search() tool, and a self-hosted SearXNG Docker stack (searxng/) for private web search with no third-party search API.
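A search() tool backed by SearXNG typically queries the instance's /search endpoint with format=json. SearXNG only serves JSON if "json" is added to the search formats in its settings.yml. The sketch below builds such a request URL, assuming the instance is reachable at the given base URL. The helper name is hypothetical, and the project's Python server may build its requests differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a SearXNG JSON search URL for a query.
# Uses python3 (already needed for the MCP server) for URL encoding.
searxng_query_url() {
  local base=${1%/} query=$2
  python3 -c '
import sys
from urllib.parse import urlencode
base, query = sys.argv[1], sys.argv[2]
print(f"{base}/search?" + urlencode({"q": query, "format": "json"}))
' "$base" "$query"
}

# Example request (requires a running SearXNG instance):
# curl -fsS "$(searxng_query_url http://localhost:8080 'local llm')"
```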

Getting Started

On Linux:

```shell
git clone https://github.com/CraigWilsonOZ/GHCPCLI-Local.git
cd GHCPCLI-Local
./setup.sh

# Start SearXNG (optional)
cd searxng && ./manage.sh start && cd ..

# Start your LLM backend
ollama serve

# Launch Copilot against a local model
./copilot-local.sh --backend ollama --model qwen3.6:35b
```

The Windows flow is the same, using setup.ps1 and copilot-local.ps1 in place of the shell scripts. If a .env file already exists, setup skips the configuration step; delete .env and re-run setup to reconfigure.

Security Considerations

  • COPILOT_OFFLINE=true is set by default, so the CLI cannot reach GitHub telemetry or authentication endpoints.
  • All inference runs on your hardware or local network - no prompts or code leave your network.
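  • API keys should be passed via COPILOT_PROVIDER_API_KEY in the shell environment, never on the command line with --api-key. Command-line arguments are visible to other local users through process listings such as ps, and they are usually saved in shell history.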
  • The SearXNG secret key must be a unique random value, generated with ./searxng/manage.sh secret-key.
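One way to follow the API-key advice is to read the key interactively with echo disabled and export it. That way the key never appears in the process list or shell history. This is a small sketch: the helper name is hypothetical, and only COPILOT_PROVIDER_API_KEY comes from the text above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read an API key without echoing it, then export it
# as COPILOT_PROVIDER_API_KEY so child processes inherit it.
prompt_api_key() {
  local key
  # -r: keep backslashes, -s: no echo on a terminal, -p: prompt (terminal only)
  IFS= read -rsp "Provider API key: " key || return 1
  if [[ -t 0 ]]; then
    echo >&2             # finish the prompt line after silent input
  fi
  if [[ -z "$key" ]]; then
    echo "empty key" >&2
    return 1
  fi
  export COPILOT_PROVIDER_API_KEY="$key"
}
```

Source the function into your interactive shell and run it once before copilot-local.sh. The exported variable is then inherited by the wrapper and the copilot process it execs.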

Limitations

  • A GitHub Copilot subscription is still required - this project redirects inference, it does not replace the licensed CLI.
  • Local model quality and speed depend entirely on your hardware; smaller models will not match cloud-hosted frontier models.
  • The chosen model must support tool use and have a context window large enough for Copilot’s agentic workflow.
  • Web search requires the additional Docker, Python, and uv dependencies for the SearXNG stack.
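For Ollama specifically, both model requirements can be read from its POST /api/show endpoint. The response includes a capabilities list, which contains "tools" for tool-capable models in recent Ollama releases. It also includes a model_info map with an architecture-prefixed context_length key. That value is the model's maximum; the context Ollama actually allocates at runtime (num_ctx) may be lower. The sketch below checks an /api/show response on stdin. The 32768-token minimum is an illustrative assumption, not a figure from the project, and llama.cpp and LM Studio expose this information differently.

```shell
#!/usr/bin/env bash
# Hypothetical check against an Ollama /api/show JSON response on stdin.
# Succeeds if the model lists the "tools" capability and its maximum
# context length is at least MIN_CTX tokens (default is an assumption).
model_meets_requirements() {
  local min_ctx=${1:-32768}
  python3 -c '
import json, sys
info = json.load(sys.stdin)
min_ctx = int(sys.argv[1])
has_tools = "tools" in info.get("capabilities", [])
ctx = max((v for k, v in info.get("model_info", {}).items()
           if k.endswith(".context_length")), default=0)
print(f"tools={has_tools} context_length={ctx}", file=sys.stderr)
sys.exit(0 if has_tools and ctx >= min_ctx else 1)
' "$min_ctx"
}

# Example against a live Ollama server:
# curl -fsS http://localhost:11434/api/show -d '{"model": "qwen3.6:35b"}' \
#   | model_meets_requirements 32768
```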

Future Work

  • Broader backend coverage as more OpenAI-compatible local servers mature
  • Model recommendation guidance tuned to common GPU tiers
  • Tighter validation and clearer diagnostics when a model lacks tool-use support

Conclusion

GHCPCLI-Local keeps the GitHub Copilot CLI workflow developers already know while moving every inference call onto hardware you control. It is a small, transparent wrapper rather than a fork, which makes it easy to audit and easy to drop. If you want the convenience of Copilot in the terminal without sending your code to the cloud, this is a practical way to get there.
