User Tools

Site Tools


tech:computer-stuff:roo-code-with-vllm

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

tech:computer-stuff:roo-code-with-vllm [2025/06/05 08:13] – created jon-dokuwikitech:computer-stuff:roo-code-with-vllm [2025/06/05 08:37] (current) – Add initial info jon-dokuwiki
Line 1: Line 1:
-Initial commit+[[https://roocode.com|Roo Code]] is an extension to VSCode, and I quote them verbatim: 
 + 
 +//Roo Code is an open-source, AI-powered coding assistant that runs in VS Code. It goes beyond simple autocompletion by reading and writing across multiple files, executing commands, and adapting to your workflow—like having a whole dev team right inside your editor.// 
 + 
 +I am very new to Roo Code, and frankly, my main interest is not in actually using Roo Code but rather to get it up-and-running with locally hosted models. Roo Code supports Ollama and that is likely the simplest way to get it to work locally as Ollama is super simple to get running. I have however chosen to go for vLLM because I'm aiming for the highest throughput and because I want to check out vLLM. My initial test is with //DeepSeek-R1-Distill-Qwen-14B// and the following docker compose is what I use to get vLLM up and running. 
 + 
 +<code> 
 +services: 
 +  vllm: 
 +    container_name: vllm 
 +    image: vllm/vllm-openai:latest 
 +    deploy: 
 +      resources: 
 +        reservations: 
 +          devices: 
 +            - driver: nvidia 
 +              device_ids: ["0", "1"
 +              capabilities: [gpu] 
 +    runtime: nvidia 
 +    ports: 
 +      - "8001:8000" 
 +    volumes: 
 +      - ~/.cache/huggingface:/root/.cache/huggingface 
 +    environment: 
 +      - HUGGING_FACE_HUB_TOKEN=<token here!> 
 +    command: > 
 +      --model deepseek-ai/DeepSeek-R1-Distill-Qwen-14B 
 +      --tensor-parallel-size 2 
 +      --gpu-memory-utilization 0.95 
 +      --max-model-len 16384 
 +      --allowed-origins [\"*\"
 +      --dtype float16 
 +    ipc: host 
 +</code> 
 + 
 +To fit the code on two RTX 3090s I'm using //--dtype float16// and since I'm... TBC 
tech/computer-stuff/roo-code-with-vllm.txt · Last modified: by jon-dokuwiki