Cloud — AI in Google Cloud, AWS & Azure

A cloud VM is just a Linux server

A VM (virtual machine) is a computer you rent in someone else's data center. The most common kind is an Ubuntu Linux server — once you connect to it, it behaves exactly like a Linux box sitting on your desk.

Why you want it: an always-on machine you don't have to keep powered at home — good for running an agent, a small server, or anything you want reachable around the clock.

Pick a provider

Any of these rent you a VM:

Google Cloud — the product is called Compute Engine.
AWS — the product is called EC2.
Microsoft Azure — the product is called Virtual Machines.
DigitalOcean — its VMs are called droplets (the simplest place to start).

Create the VM, then follow our Linux guide

In the provider's web console, create an Ubuntu VM (their setup wizard walks you through size, region, and a login key). Once it exists, you connect with SSH — and from that point it's identical to a machine at home, so just follow the Linux guide to install Claude Code, Ollama, Tailscale, and the rest.

💡Tip: the whole point is sameness. There's no special "cloud" version of these tools — once you SSH in, the Linux steps are exactly what you run.

Install the provider's CLI on your own computer

To manage your cloud account from your own machine (start/stop VMs, check billing), install the provider's command-line tool and sign in once:

Google Cloud — the gcloud CLI from cloud.google.com/sdk/docs/install. Sign in with:
```
gcloud init
```
AWS — the aws CLI from aws.amazon.com/cli. Sign in with:
```
aws configure
```
Azure — the az CLI from learn.microsoft.com/cli/azure/install-azure-cli. Sign in with:
```
az login
```

⚠Money note: a VM with a graphics card (GPU) costs real money per hour. Stop it when it's not in use. A small CPU-only VM is cheap; a GPU VM is not.

Managed model backends

A managed backend lets you call an AI model over the internet — there's no server for you to run or maintain. You send a request, you get an answer, and you're billed for what you use.

Why you want it: zero machines to babysit, and your AI usage can run through a company cloud account for billing and data control.

The big three providers

Google Cloud — Vertex AI. Its Model Garden hosts Gemini, Anthropic's Claude, Llama, and more. You enable the Vertex AI API in your project, then call the models.
AWS — Amazon Bedrock. Claude, Llama, and others all behind one API.
Azure — Azure AI Foundry / Azure OpenAI Service. Microsoft's managed model service.

The easiest way to experiment: OpenRouter

OpenRouter gives you one API key that reaches hundreds of models from many providers. It's the simplest way to try lots of models without signing up for each one separately.

Point Claude Code at a cloud backend

This is the key tie-in: Claude Code can use these as its engine instead of the default. You set an environment variable before launching it:

AWS Bedrock:
```
export CLAUDE_CODE_USE_BEDROCK=1
```
(plus your AWS region and credentials)
Google Vertex AI:
```
export CLAUDE_CODE_USE_VERTEX=1
```
(plus your Google Cloud project and region)

🏢Why a business would do this: routing through Bedrock or Vertex AI sends your billing and data through your own cloud account instead of a personal subscription — useful for companies. The full Claude playbook is at claude.wholetech.com.

Model-agnostic agents: Hermes Agent (Nous Research) is built to point at whichever model provider you want — Bedrock, Vertex AI, OpenRouter, or a direct API — so it pairs naturally with these cloud backends. Install it on a cloud VM via the Linux guide (it runs natively on Linux: curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash), then run hermes --tui. Docs: hermes-agent.nousresearch.com/docs.

GPU clouds for big local-style models

If you want to run a big open model — the kind that won't fit on your home PC — you can rent a powerful GPU by the hour, just for as long as you need it.

Why you want it: occasional access to serious horsepower without buying a $5,000 graphics card you'd only use now and then.

Where to rent a GPU by the hour

RunPod — runpod.io
Lambda — lambda.ai
Vast.ai — vast.ai

The pattern

It's the same idea every time:

Start a GPU instance in the provider's console.

Install Ollama on it:

curl -fsSL https://ollama.com/install.sh | sh

Pull a large model.
Install Tailscale on the instance too, so you can reach it from home as if it were on your own private network.

That last part means the rented GPU shows up alongside your other devices — your laptop or phone can talk to it directly, no public ports exposed.

⚠The most expensive thing on this page: GPU instances can cost several dollars an hour. When you finish, DESTROY the instance, not just stop it — on many GPU clouds a merely "stopped" instance still keeps its storage and can keep charging. Destroy it and you're back to zero.

Which should you use?

Most people overthink this. Match what you're trying to do to one row below and start there.

If you want…	Use
A spare always-on Linux box	A small cloud VM (DigitalOcean / GCE / EC2)
Claude/Gemini billed through a company cloud account	Bedrock or Vertex AI
To try many models with one key	OpenRouter
To run a big open model occasionally	A by-the-hour GPU cloud + Ollama + Tailscale

💡Tip: you can mix these. A common setup is a cheap always-on VM for everyday tasks, plus an occasional GPU instance you spin up only for the heavy jobs.

The AI stack in the cloud.

How to read this page

A cloud VM is just a Linux server

Managed model backends

GPU clouds for big local-style models

Which should you use?

What you have now

🗺️ The full tool map →

📋 Cheat sheet →