Roo Code is an extension for VS Code; to quote them verbatim:
Roo Code is an open-source, AI-powered coding assistant that runs in VS Code. It goes beyond simple autocompletion by reading and writing across multiple files, executing commands, and adapting to your workflow—like having a whole dev team right inside your editor.
I am very new to Roo Code, and frankly, my main interest is not in using Roo Code itself but in getting it up and running with locally hosted models. Roo Code supports Ollama, and that is likely the simplest way to get it working locally, since Ollama is very easy to set up. I have, however, chosen vLLM because I'm aiming for the highest throughput and because I want to check out vLLM. My initial test is with DeepSeek-R1-Distill-Qwen-14B, and the following docker compose file is what I use to get vLLM up and running.
services:
  vllm:
    container_name: vllm
    image: vllm/vllm-openai:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]
              capabilities: [gpu]
    runtime: nvidia
    ports:
      - "8001:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    environment:
      - HUGGING_FACE_HUB_TOKEN=<token here!>
    command: >
      --model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
      --tensor-parallel-size 2
      --gpu-memory-utilization 0.95
      --max-model-len 16384
      --allowed-origins [\"*\"]
      --dtype float16
    ipc: host
To fit the model on two RTX 3090s I'm using --dtype float16 and since I'm… TBC
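As a rough back-of-the-envelope check on why float16 and tensor parallelism are needed here (my own approximate numbers, not taken from the vLLM docs): at roughly 14.8 billion parameters and 2 bytes per parameter, the weights alone are close to 30 GB, which does not fit on a single 24 GB RTX 3090, while two cards at 95% memory utilization leave something like 16 GB for the KV cache and activations.

  # Rough VRAM estimate for serving DeepSeek-R1-Distill-Qwen-14B in float16
  # on two RTX 3090s. The parameter count is an approximation and the exact
  # KV-cache footprint depends on vLLM internals, so treat this as a sanity
  # check only, not an exact accounting.

  PARAMS = 14.8e9          # approximate parameter count of the 14B model
  BYTES_PER_PARAM = 2      # float16
  GPU_VRAM_GB = 24         # per RTX 3090
  NUM_GPUS = 2
  GPU_MEM_UTIL = 0.95      # matches --gpu-memory-utilization 0.95

  weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
  budget_gb = GPU_VRAM_GB * NUM_GPUS * GPU_MEM_UTIL
  kv_cache_gb = budget_gb - weights_gb

  print(f"weights:   ~{weights_gb:.1f} GB (split across {NUM_GPUS} GPUs)")
  print(f"budget:    ~{budget_gb:.1f} GB total")
  print(f"left over: ~{kv_cache_gb:.1f} GB for KV cache and activations")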

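Once the container is up, the easiest way to check that Roo Code will have something to talk to is to hit the OpenAI-compatible API that vLLM exposes on the mapped port. Below is a minimal sketch using the openai Python client, assuming the compose file above is running; the localhost URL follows from the 8001:8000 port mapping, and the dummy API key is just a placeholder since vLLM does not check one unless you start it with --api-key.

  # Minimal smoke test against the vLLM OpenAI-compatible endpoint started above.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://localhost:8001/v1",  # host port from the compose file
      api_key="not-needed",                 # no key required by default
  )

  response = client.chat.completions.create(
      model="deepseek-ai/DeepSeek-R1-Distill-Qwen-14B",
      messages=[{"role": "user", "content": "Write a one-line Python hello world."}],
      max_tokens=256,
  )
  print(response.choices[0].message.content)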