OS·WholeTech
☁️ Google Cloud · AWS · Azure · GPU clouds

The AI stack in the cloud.

The cloud is for when your own hardware isn't enough, or you want something always-on without running a machine at home. It gives you three things: (A) a rented computer to run the same tools, (B) managed model "backends" you call over the internet, and (C) rented GPUs for big models you can't run at home.

You don't need to be a programmer to follow this. Each piece is explained in plain language, and you only need the parts that match what you're trying to do.

Before you start

How to read this page

Unlike the device guides, this isn't a strict six-step recipe. It's four self-contained blocks — pick the one that fits what you need. Most people only use one or two.

🗺️New to the names? See the full map of AI tools and how they fit together at /landscape/.
Cost warning — read this first: the cloud bills by the hour or by usage. A forgotten server or GPU can quietly run up a real bill. Always turn things off when you're done, and in your provider's console set a budget alert so you're emailed before a surprise charge. This single habit matters more than anything else on this page.
1

A cloud VM is just a Linux server

A VM (virtual machine) is a computer you rent in someone else's data center. The most common kind is an Ubuntu Linux server — once you connect to it, it behaves exactly like a Linux box sitting on your desk.

Why you want it: an always-on machine you don't have to keep powered at home — good for running an agent, a small server, or anything you want reachable around the clock.

Pick a provider

Any of these rent you a VM:

  • Google Cloud — the product is called Compute Engine.
  • AWS — the product is called EC2.
  • Microsoft Azure — the product is called Virtual Machines.
  • DigitalOcean — its VMs are called droplets (the simplest place to start).
Create the VM, then follow our Linux guide

In the provider's web console, create an Ubuntu VM (their setup wizard walks you through size, region, and a login key). Once it exists, you connect with SSH — and from that point it's identical to a machine at home, so just follow the Linux guide to install Claude Code, Ollama, Tailscale, and the rest.

💡Tip: the whole point is sameness. There's no special "cloud" version of these tools — once you SSH in, the Linux steps are exactly what you run.
Install the provider's CLI on your own computer

To manage your cloud account from your own machine (start/stop VMs, check billing), install the provider's command-line tool and sign in once:

Money note: a VM with a graphics card (GPU) costs real money per hour. Stop it when it's not in use. A small CPU-only VM is cheap; a GPU VM is not.
2

Managed model backends

A managed backend lets you call an AI model over the internet — there's no server for you to run or maintain. You send a request, you get an answer, and you're billed for what you use.

Why you want it: zero machines to babysit, and your AI usage can run through a company cloud account for billing and data control.

The big three providers
  • Google Cloud — Vertex AI. Its Model Garden hosts Gemini, Anthropic's Claude, Llama, and more. You enable the Vertex AI API in your project, then call the models.
  • AWS — Amazon Bedrock. Claude, Llama, and others all behind one API.
  • Azure — Azure AI Foundry / Azure OpenAI Service. Microsoft's managed model service.
The easiest way to experiment: OpenRouter

OpenRouter gives you one API key that reaches hundreds of models from many providers. It's the simplest way to try lots of models without signing up for each one separately.

Point Claude Code at a cloud backend

This is the key tie-in: Claude Code can use these as its engine instead of the default. You set an environment variable before launching it:

  • AWS Bedrock:
    export CLAUDE_CODE_USE_BEDROCK=1
    (plus your AWS region and credentials)
  • Google Vertex AI:
    export CLAUDE_CODE_USE_VERTEX=1
    (plus your Google Cloud project and region)
🏢Why a business would do this: routing through Bedrock or Vertex AI sends your billing and data through your own cloud account instead of a personal subscription — useful for companies. The full Claude playbook is at claude.wholetech.com.
Model-agnostic agents: Hermes Agent (Nous Research) is built to point at whichever model provider you want — Bedrock, Vertex AI, OpenRouter, or a direct API — so it pairs naturally with these cloud backends. Install it on a cloud VM via the Linux guide (it runs natively on Linux: curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash), then run hermes --tui. Docs: hermes-agent.nousresearch.com/docs.
3

GPU clouds for big local-style models

If you want to run a big open model — the kind that won't fit on your home PC — you can rent a powerful GPU by the hour, just for as long as you need it.

Why you want it: occasional access to serious horsepower without buying a $5,000 graphics card you'd only use now and then.

Where to rent a GPU by the hour The pattern

It's the same idea every time:

  • Start a GPU instance in the provider's console.
  • Install Ollama on it:
    curl -fsSL https://ollama.com/install.sh | sh
  • Pull a large model.
  • Install Tailscale on the instance too, so you can reach it from home as if it were on your own private network.

That last part means the rented GPU shows up alongside your other devices — your laptop or phone can talk to it directly, no public ports exposed.

The most expensive thing on this page: GPU instances can cost several dollars an hour. When you finish, DESTROY the instance, not just stop it — on many GPU clouds a merely "stopped" instance still keeps its storage and can keep charging. Destroy it and you're back to zero.
4

Which should you use?

Most people overthink this. Match what you're trying to do to one row below and start there.

If you want…Use
A spare always-on Linux boxA small cloud VM (DigitalOcean / GCE / EC2)
Claude/Gemini billed through a company cloud accountBedrock or Vertex AI
To try many models with one keyOpenRouter
To run a big open model occasionallyA by-the-hour GPU cloud + Ollama + Tailscale
💡Tip: you can mix these. A common setup is a cheap always-on VM for everyday tasks, plus an occasional GPU instance you spin up only for the heavy jobs.
You're set

What you have now

You know the three ways to use AI in the cloud — a rented Linux server, a managed model backend, or an hourly GPU — and how to point Claude Code at a cloud backend. Use only the parts you need, and keep that budget alert on. The single most important habit: turn it off when you're done.