Roo Code is an extension for VS Code, and I quote their description verbatim:
Roo Code is an open-source, AI-powered coding assistant that runs in VS Code. It goes beyond simple autocompletion by reading and writing across multiple files, executing commands, and adapting to your workflow—like having a whole dev team right inside your editor.
I am very new to Roo Code, and frankly, my main interest is not in actually using Roo Code but rather in getting it up and running with locally hosted models. Roo Code supports Ollama, and that is likely the simplest way to run it locally, since Ollama is very easy to set up. I have, however, chosen vLLM because I'm aiming for the highest throughput and because I want to check out vLLM itself. My initial test is with DeepSeek-R1-Distill-Qwen-14B, and the following Docker Compose file is what I use to get vLLM up and running.
```yaml
services:
  vllm:
    container_name: vllm
    image: vllm/vllm-openai:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]
              capabilities: [gpu]
    runtime: nvidia
    ports:
      - "8001:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    environment:
      - HUGGING_FACE_HUB_TOKEN=<token here!>
    command: >
      --model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
      --tensor-parallel-size 2
      --gpu-memory-utilization 0.95
      --max-model-len 16384
      --allowed-origins [\"*\"]
      --dtype float16
    ipc: host
```
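Once the container is up, it is worth confirming the endpoint responds before pointing Roo Code at it. Here is a minimal sketch, assuming the port mapping above (host port 8001) and vLLM's standard OpenAI-compatible routes under /v1; the model name must match the --model argument.

```python
# Quick sanity check of the vLLM OpenAI-compatible server.
# Assumes the Compose file above, i.e. the API is reachable on the host at port 8001.
import requests

BASE_URL = "http://localhost:8001/v1"
MODEL = "deepseek-ai/DeepSeek-R1-Distill-Qwen-14B"

# List the models the server is serving; MODEL should appear once the weights are loaded.
print(requests.get(f"{BASE_URL}/models").json())

# Send a minimal chat completion, the same kind of request Roo Code will make.
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": "Write a one-line Python hello world."}],
        "max_tokens": 256,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```

If both calls succeed, Roo Code can be pointed at the same base URL using its OpenAI-compatible provider settings.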
To fit the model on two RTX 3090s I'm using --dtype float16 and since I'm… TBC
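As a rough back-of-the-envelope check (my own numbers, not from the vLLM docs): a 14B-parameter model in float16 needs about 2 bytes per parameter for the weights alone, which is why a single 24 GB RTX 3090 is not enough and --tensor-parallel-size 2 is used to split the weights across both cards.

```python
# Illustrative memory estimate for the weights only; ignores KV cache, activations,
# and CUDA overhead, which --gpu-memory-utilization 0.95 budgets for.
params = 14e9            # approximate parameter count of DeepSeek-R1-Distill-Qwen-14B
bytes_per_param = 2      # float16
total_gib = params * bytes_per_param / 1024**3
per_gpu_gib = total_gib / 2   # --tensor-parallel-size 2 splits weights across two GPUs
print(f"weights: {total_gib:.1f} GiB total, ~{per_gpu_gib:.1f} GiB per RTX 3090")
# -> roughly 26 GiB total, ~13 GiB per card, leaving headroom for the 16K-token KV cache
```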
Discussion