The Illusion of Ownership
Millions of developers trust GitHub with their most valuable creative work. But buried in its Terms of Service are provisions granting Microsoft remarkably broad rights over publicly hosted code — including the right to train commercial AI systems on it without attribution, compensation, or meaningful consent. Your open-source license is not a shield.
GitHub's TOS acknowledges that you retain copyright over uploaded code. In practice, that statement coexists with a much wider set of rights GitHub grants itself:
Highlighted: rights GitHub claims over your content
GitHub ToS — User-Generated Content
"You own content you create, but you allow us certain rights to it, so that we can display and share the content you post… the rights you grant us are limited to those we need to provide the service."
The phrase "provide the service" has quietly expanded to encompass AI product development. GitHub Copilot — sold to enterprise customers — was trained on all publicly available code on GitHub. That code includes yours.
Key problem: Your open-source license (MIT, GPL, Apache) may require attribution. Copilot reproduces code snippets verbatim without citing the original author or license. GitHub argues this falls within granted rights and fair use — a contested position most individual developers lack the resources to challenge.
Section D.8 — The AI Waiver Clause
In one of the most aggressive clauses in any major platform's TOS, Section D.8 doesn't merely permit AI training on your code — it compels you to waive your own protective terms:
Highlighted: rights you are forced to waive
GitHub ToS — Section D.8
"By using automated means to access, collect, or otherwise use any publicly accessible Content from the Service for the purpose of developing or training any commercially available artificial intelligence model… you hereby waive any and all policies, terms, conditions, or contractual provisions governing products, services, websites or datasets you own or operate that would otherwise prohibit, restrict, or place conditions upon GitHub's Access… You further agree not to impose technical or other targeted measures to restrict or retaliate against such Access."
Read that carefully. If GitHub decides to scrape your content for AI training, you are contractually prohibited from using technical means — robots.txt, rate limiting, access controls — to stop it. Any protective clauses in your own terms of service are pre-emptively voided.
Adding .github/copilot-ignore, custom TDM restriction language in your license, or a NoAI License variant does not protect you from GitHub's own AI systems under this clause.
Privacy: The Telemetry You Never Agreed To
GitHub collects substantial behavioural and personal data from every user interaction — even on public repositories. Copilot's Additional Products Terms are explicit:
Highlighted: data GitHub collects about you
GitHub Additional Products Terms — Copilot
"GitHub Copilot may collect and process data… this may include Prompts, Suggestions, and code snippets, and will collect additional usage information tied to your Account — this may include Service Usage Information, Website Usage Data, and Feedback Data. This may include personal data.
For GitHub Copilot Free users, the data collected by GitHub Copilot may be used for AI Model training where permitted and if you allow in your settings."
Not only your hosted code, but your coding behaviour — what you type, accept, reject, and how long you pause — becomes training signal for a commercial product. Copilot Business and Enterprise customers received a 2024 contractual update confirming GitHub will not use inputs or outputs for model training "unless instructed in writing to do so." Free and personal plan users receive no such protection. It is a two-tier system that monetises the privacy of developers who cannot afford paid plans.
Copyright Erosion in the Age of Copilot
The class action Doe v. GitHub argued that Copilot violated developer rights by reproducing licensed code without attribution, directly breaching MIT, GPL, and Apache 2.0 terms that require credit. Courts have not settled this — but individual developers cannot afford to litigate it.
U.S. copyright law protects original expression — including source code — even when publicly accessible. The EU Copyright Directive's Article 4 requires explicit authorisation for text and data mining; most commercial AI training falls outside its research exceptions. Public availability does not equal permission.
GitHub vs. SourceHut
| Criterion | GitHub (Microsoft) | SourceHut (sr.ht) |
|---|---|---|
| Business model | Free users = product; sell Copilot | Subscriptions — you are the customer |
| AI training on code | Yes — ToS Section D.8 | No AI products; no training use |
| Platform source | Proprietary | Fully open-source (AGPL) |
| Telemetry | Extensive (41+ scripts/page) | Minimal, privacy-first |
| Takedown risk | At Microsoft's discretion | Self-host a fork if needed |
| Workflow lock-in | Proprietary issues/PRs | Native git email workflow |
| Financial transparency | None | Annual public reports |
| Issue archive | Proprietary silo, no export standard | public-inbox.org compatible |
How to Migrate to SourceHut
SourceHut is a suite of open-source developer tools funded by subscriptions. The migration has four parts: repositories, issues, mailing list archival, and CI pipelines.
Part 1 — Migrate Repositories with gh2srht
gh2srht, written by Simon Ser (~emersion), automates bulk migration of GitHub repositories — including issues and labels — directly to SourceHut. Install it via Go:
go install git.sr.ht/~emersion/gh2srht@latest
Export your GitHub and SourceHut tokens as environment variables:
# github.com/settings/tokens — needs: repo, read:org
export GITHUB_TOKEN="ghp_xxxxxxxxxxxxxxxxxxxx"
# meta.sr.ht/oauth — needs: repos:write
export SRHT_TOKEN="xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Run gh2srht to migrate all repositories for a GitHub user:
# Migrate all public repos
gh2srht -github-user yourghuser -srht-user ~yoursrhtuser
# Or migrate a single repository
gh2srht -github-repo yourghuser/myrepo -srht-user ~yoursrhtuser
gh2srht creates the SourceHut repositories, pushes all branches and tags, and migrates issues with labels intact. For repositories needing manual handling:
- Clone as a bare mirror:

  git clone --mirror https://github.com/youruser/yourrepo.git
  cd yourrepo.git

- Add the SourceHut remote and push. SourceHut auto-creates the repo on first push if it does not exist:

  git remote add srht [email protected]:~youruser/yourrepo
  git push srht --mirror

- Update your local origin:

  git remote set-url origin [email protected]:~youruser/yourrepo

- Archive on GitHub via Settings → Danger Zone → Archive this repository, then update your README:

  ## Repository Moved
  Canonical development is now at: https://git.sr.ht/~youruser/yourrepo
  This GitHub mirror is archived and read-only.
Part 2 — Durable Archives with public-inbox
public-inbox.org is an archival and indexing system for public mailing lists that makes every thread permanently searchable over HTTPS, NNTP, and IMAP. GitHub Issues exist entirely within GitHub's proprietary silo — if GitHub removes your repository or goes offline, your issue history vanishes. public-inbox provides a durable, decentralised archive that no platform can take away.
# Clone source from the canonical repository
git clone https://public-inbox.org/public-inbox.git
cd public-inbox
# Install required Perl dependencies (Perl 5.12+, Git 1.8+)
# Core: DBD::SQLite (IMAP/NNTP/threading), URI (HTML/Atom output)
# Optional but recommended: Plack (HTTP interface), Xapian (full-text search)
#
# On Debian/Ubuntu:
# sudo apt install perl libdbd-sqlite3-perl liburi-perl \
# libplack-perl libsearch-xapian-perl
# On Alpine:
# apk add perl perl-dbd-sqlite perl-uri perl-plack
# Or install via cpan:
# cpan DBD::SQLite URI Plack Search::Xapian
# Build, test, and install to /usr/local
perl Makefile.PL
make
make test
sudo make install
# --- Unprivileged / minimal install (symlinks into $HOME/bin) ---
perl Makefile.PL
make symlink-install prefix=$HOME
# Initialise an inbox for your project list
public-inbox-init -V2 myproject \
/srv/public-inbox/myproject \
https://lists.sr.ht/~youruser/myproject-devel \
[email protected]
# Subscribe the inbox via LMTP — add to your MTA's aliases:
myproject-devel: "|/usr/bin/public-inbox-mda --no-precheck"
# Start the HTTPS read interface
public-inbox-httpd --listen 0.0.0.0:8080
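For anything beyond a quick test you will want the HTTP daemon supervised. A minimal systemd unit sketch — the binary path and listen address are assumptions matching the install above; the public-inbox source tree ships fuller examples under examples/:

```ini
# /etc/systemd/system/public-inbox-httpd.service (minimal sketch)
[Unit]
Description=public-inbox HTTP read interface
After=network.target

[Service]
ExecStart=/usr/local/bin/public-inbox-httpd --listen 0.0.0.0:8080
Restart=on-failure

[Install]
WantedBy=multi-user.target
```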
Part 3 — Export GitHub Issues to public-inbox
gh2srht migrates issues to todo.sr.ht automatically. To also inject them into your public-inbox archive:
# Export issues as JSON via the GitHub CLI
gh issue list --repo youruser/yourrepo --state all \
  --json number,title,body,state --limit 1000 > issues.json
# Convert to mbox format (issues2mbox.py is not a published tool —
# you supply your own conversion script)
python3 issues2mbox.py issues.json > issues.mbox
# Deliver one message at a time via the MDA: formail (from procmail)
# splits the mbox, and ORIGINAL_RECIPIENT tells public-inbox-mda
# which inbox to file into
ORIGINAL_RECIPIENT=[email protected] \
  formail -s public-inbox-mda --no-precheck < issues.mbox
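The issues2mbox.py conversion step is left to you. One possible sketch — the field names match the gh export above, while the sender address and Message-ID scheme are invented for illustration:

```python
#!/usr/bin/env python3
# issues2mbox.py — convert a `gh issue list --json` export to mbox.
# Hypothetical helper: addresses and the Message-ID scheme are arbitrary.
import json
import sys

def issue_to_message(issue):
    subject = f"[#{issue['number']}] {issue['title']} ({issue['state']})"
    body = (issue.get("body") or "(no description)").replace("\r\n", "\n")
    # mbox requires "From "-munging: escape body lines that start with "From "
    body = "\n".join(
        (">" + line) if line.startswith("From ") else line
        for line in body.splitlines()
    )
    return (
        "From issues2mbox Thu Jan  1 00:00:00 1970\n"
        "From: GitHub Issues <[email protected]>\n"
        f"Subject: {subject}\n"
        f"Message-ID: <issue-{issue['number']}@github-export>\n"
        "\n"
        f"{body}\n\n"
    )

def main():
    with open(sys.argv[1]) as f:
        issues = json.load(f)
    sys.stdout.write("".join(issue_to_message(i) for i in issues))

if __name__ == "__main__":
    main()
```

Real exports benefit from carrying over dates and comment threads (gh supports --json comments,createdAt), but the shape above is enough to seed a searchable archive.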
Part 3b — Decentralized Project Management with git-bug
git-bug is a platform-independent issue tracker that stores issues, comments, and metadata as native git objects inside your repository. There is no separate database or hosted service: the entire project-management history lives in the git object store, pushes and pulls over any standard remote, and works offline. A GitHub bridge exists today; a SourceHut bridge is an open feature request (issue #1024, area/bridge · kind/feature · priority/important-longterm), and PR #1499 is adding native todo.sr.ht GraphQL synchronisation. Until that lands, mirroring your repository still carries the issue data to SourceHut.
Installation is simple:
# Go 1.16+; the canonical module path is now under the git-bug org
go install github.com/git-bug/git-bug@latest
Basic usage:
git bug user create
git bug add "Issue title"
git bug ls # list issues
git bug termui # curses interface
git bug webui # browser UI
To import an existing GitHub project, configure the bridge and pull the data using your GitHub token:
git bug bridge configure github \
--repo youruser/yourrepo --token $GITHUB_TOKEN
# Fetch issues, comments, labels, and users
git bug bridge pull
Because issues are stored in refs, pushing to SourceHut carries them automatically: a simple git push srht --mirror includes the refs/git-bug/* namespace. The SourceHut bridge itself remains open work — see issue #1024 for the SourceHut-specific tracking and PR #1499 for the todo.sr.ht sync.
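If you prefer not to mirror everything, the issue data can travel on its own with an explicit refspec (remote name as configured earlier):

```shell
# Push only git-bug's data refs; branches and tags are unaffected
git push srht 'refs/git-bug/*:refs/git-bug/*'
```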
Part 4 — CI Pipelines on builds.sr.ht
Replace .github/workflows/*.yml with a .build.yml at your repository root:
image: alpine/edge
packages:
- make
- gcc
- git
sources:
- https://git.sr.ht/~youruser/yourrepo
tasks:
- build: |
cd yourrepo
make all
- test: |
cd yourrepo
make test
- lint: |
cd yourrepo
make lint
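Manifests can do more than run tasks. builds.sr.ht also supports uploading build artifacts and sending notifications; a sketch of optional additions to the same .build.yml (the path and address are placeholders):

```yaml
# Optional additions to .build.yml
artifacts:
  - yourrepo/yourapp        # uploaded and downloadable from the job page
triggers:
  - action: email           # notify when the job completes
    condition: failure      # always | success | failure
    to: [email protected]
```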
Part 4b — Bridge CI During Transition with hottub
If you are running a gradual migration and still have a GitHub mirror active, hottub — also by Simon Ser (~emersion) — acts as a CI bridge that forwards GitHub Check run events to builds.sr.ht. This lets you keep GitHub as a read-only mirror for contributors who haven't switched yet, while all actual CI runs execute on SourceHut. A public hosted instance is available if you don't want to self-host.
Step 1 — Register a GitHub App for the Checks API
Open the GitHub App registration page and configure it as follows:
- Set a name and homepage URL for your instance
- Leave the callback URL empty
- Set the Setup URL to https://<your-domain>/post-install
- Set the Webhook URL to https://<your-domain>/webhook
- Under Repository permissions, grant: Checks (read/write), Commit statuses (read/write), Contents (read-only), Metadata (read-only), Pull requests (read-only)
- Under Subscribe to events, check: Check run, Check suite, Pull request
Step 2 — Collect credentials
From the app settings page, note the App ID and optional Webhook secret. Then generate and download a PEM private key — you'll pass this file to hottub at startup.
Step 3 — Build and start hottub
# Clone and build
git clone https://git.sr.ht/~emersion/hottub
cd hottub
go build
# Run with your GitHub app credentials
./hottub \
-gh-app-id <app-id> \
-gh-private-key <path/to/private-key.pem> \
-gh-webhook-secret <webhook-secret>
Optionally, register an sr.ht OAuth2 client (redirection URI: https://<your-domain>/authorize-srht) to improve the user authorisation flow, then pass its credentials:
./hottub \
-gh-app-id <app-id> \
-gh-private-key <path/to/private-key.pem> \
-gh-webhook-secret <webhook-secret> \
-metasrht-client-id <client-id> \
-metasrht-client-secret <client-secret>
Step 4 — Install the app on your GitHub repositories
From the GitHub App page, click Install App and select the repositories you want to bridge. Once installed, any Check run triggered on GitHub will be forwarded to builds.sr.ht and results reported back to the GitHub PR or commit status — giving contributors a familiar green/red check while your CI runs entirely outside GitHub's infrastructure.
Once the migration is complete and contributors have followed to SourceHut, you can remove the GitHub App installation and archive the mirror entirely.
Part 5 — Harden Against AI Scrapers
Even on SourceHut, assert your rights explicitly at the license level. Append a TDM restriction clause to your LICENSE file:
AI TRAINING RESTRICTION
-----------------------
You may not use, reproduce, or exploit this software or any portion thereof
for text and data mining, including training, fine-tuning, evaluating, or
distilling artificial intelligence or machine learning models, without
explicit written permission from the copyright holder(s).
Add opt-out signals that some tooling and crawlers recognise — none are yet standardised or legally binding, but they document your intent:
# package.json
{ "ai-training-opt-out": true, "tdm-reservation": "1" }
# pyproject.toml
[tool.metadata]
ai-training-opt-out = true
# .well-known/ai.txt (at your domain root)
User-agent: *
Disallow: /
NoAITrain: true
Taking Your Code Back
GitHub's Section D.8 is not an accident or oversight. It is the monetisation mechanism: your publicly hosted code is raw material for Copilot, sold to enterprises at scale. The TOS pre-emptively voids any protective terms you write into your own licenses and prohibits the technical countermeasures you might otherwise deploy.
SourceHut offers a different compact. You pay a subscription, you receive a service, the platform's incentives align with yours. The source code is open. The financials are public. Your mailing list archive lives in public-inbox — a standard, self-hostable system independent of any corporate platform. There is no Copilot.
Migration takes an afternoon. The tools are solid: gh2srht handles repositories and issues in bulk; public-inbox gives every project a permanent, decentralised archive. The privacy, copyright integrity, and alignment of incentives you receive in return are worth considerably more than the time it takes.
Resources:
sourcehut.org — official site, documentation, and subscription
sr.ht/~emersion/gh2srht/ — GitHub → SourceHut migration tool by Simon Ser
public-inbox.org — self-hosted mailing list archival and indexing
github.com/git-bug/git-bug — git-bug distributed issue tracker (see issue #1024 for SourceHut bridge)
docs.github.com/site-policy — GitHub ToS (read Section D.8)
Issue #1024 — open feature request for SourceHut bridge
PR #1499 — todo.sr.ht GraphQL sync work in progress