The cloud is for when your own hardware isn't enough, or you want something always-on without running a machine at home. It gives you three things: (A) a rented computer to run the same tools, (B) managed model "backends" you call over the internet, and (C) rented GPUs for big models you can't run at home.
You don't need to be a programmer to follow this. Each piece is explained in plain language, and you only need the parts that match what you're trying to do.
Unlike the device guides, this isn't a strict six-step recipe. It's four self-contained blocks — pick the one that fits what you need. Most people only use one or two.
A VM (virtual machine) is a computer you rent in someone else's data center. The most common kind is an Ubuntu Linux server — once you connect to it, it behaves exactly like a Linux box sitting on your desk.
Why you want it: an always-on machine you don't have to keep powered at home — good for running an agent, a small server, or anything you want reachable around the clock.
Pick a providerAny of these rent you a VM:
In the provider's web console, create an Ubuntu VM (their setup wizard walks you through size, region, and a login key). Once it exists, you connect with SSH — and from that point it's identical to a machine at home, so just follow the Linux guide to install Claude Code, Ollama, Tailscale, and the rest.
To manage your cloud account from your own machine (start/stop VMs, check billing), install the provider's command-line tool and sign in once:
gcloud CLI from cloud.google.com/sdk/docs/install. Sign in with:
gcloud init
aws CLI from aws.amazon.com/cli. Sign in with:
aws configure
az CLI from learn.microsoft.com/cli/azure/install-azure-cli. Sign in with:
az login
A managed backend lets you call an AI model over the internet — there's no server for you to run or maintain. You send a request, you get an answer, and you're billed for what you use.
Why you want it: zero machines to babysit, and your AI usage can run through a company cloud account for billing and data control.
The big three providersOpenRouter gives you one API key that reaches hundreds of models from many providers. It's the simplest way to try lots of models without signing up for each one separately.
Point Claude Code at a cloud backendThis is the key tie-in: Claude Code can use these as its engine instead of the default. You set an environment variable before launching it:
export CLAUDE_CODE_USE_BEDROCK=1(plus your AWS region and credentials)
export CLAUDE_CODE_USE_VERTEX=1(plus your Google Cloud project and region)
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash), then run hermes --tui. Docs: hermes-agent.nousresearch.com/docs.If you want to run a big open model — the kind that won't fit on your home PC — you can rent a powerful GPU by the hour, just for as long as you need it.
Why you want it: occasional access to serious horsepower without buying a $5,000 graphics card you'd only use now and then.
Where to rent a GPU by the hour The patternIt's the same idea every time:
curl -fsSL https://ollama.com/install.sh | sh
That last part means the rented GPU shows up alongside your other devices — your laptop or phone can talk to it directly, no public ports exposed.
Most people overthink this. Match what you're trying to do to one row below and start there.
| If you want… | Use |
|---|---|
| A spare always-on Linux box | A small cloud VM (DigitalOcean / GCE / EC2) |
| Claude/Gemini billed through a company cloud account | Bedrock or Vertex AI |
| To try many models with one key | OpenRouter |
| To run a big open model occasionally | A by-the-hour GPU cloud + Ollama + Tailscale |
You know the three ways to use AI in the cloud — a rented Linux server, a managed model backend, or an hourly GPU — and how to point Claude Code at a cloud backend. Use only the parts you need, and keep that budget alert on. The single most important habit: turn it off when you're done.