[[https://...|Roo Code]]

//Roo Code is an open-source, AI-powered autonomous coding agent that lives in your editor.//

I am very new to Roo Code, and frankly, my main interest is not in actually using Roo Code but rather in getting it up and running with locally hosted models. Roo Code supports Ollama, and that is likely the simplest way to get it working locally, as Ollama is super simple to get running. I have, however, chosen vLLM because I'm aiming for the highest throughput and because I want to check out vLLM. My initial test is with a model from //deepseek-ai//, served with the Docker Compose file below:
<code yaml>
services:
  vllm:
    container_name: vllm
    image: vllm/vllm-openai:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ["0", "1"]
              capabilities: [gpu]
    runtime: nvidia
    ports:
      - "8000:8000"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
    environment:
      - HUGGING_FACE_HUB_TOKEN=<your Hugging Face token>
    command: >
      --model deepseek-ai/<model>
      --tensor-parallel-size 2
      --gpu-memory-utilization 0.95
      --max-model-len 16384
      --allowed-origins [\"*\"]
      --dtype float16
    ipc: host
</code>
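Once the container is up, vLLM exposes an OpenAI-compatible API on the mapped port 8000, which is the endpoint Roo Code can later be pointed at. A minimal sketch for checking that the server responds, assuming the //openai// Python package is installed (the API key is a dummy value, since the server above is started without //--api-key//):

<code python>
# Quick check that the vLLM container from the compose file above is serving.
from openai import OpenAI

# Dummy api_key: the client requires one, but this server does not check it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

# The served model id should match the --model argument in the compose file.
served_model = client.models.list().data[0].id
print("Serving:", served_model)

# A trivial completion to confirm end-to-end generation works.
response = client.chat.completions.create(
    model=served_model,
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
    max_tokens=32,
)
print(response.choices[0].message.content)
</code>

If this works, Roo Code should be able to use the same base URL and model id through an OpenAI-compatible provider configuration.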
To fit the model on two RTX 3090s I'm using //--dtype float16// and since I'm... TBC
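For a rough sense of the memory budget behind that choice (a back-of-the-envelope sketch only; the parameter count below is a placeholder, not the actual model size): two RTX 3090s provide 2 × 24 GB of VRAM, //--gpu-memory-utilization 0.95// lets vLLM claim about 95% of it, fp16 weights take roughly 2 bytes per parameter, and whatever is left over goes to the KV cache.

<code python>
# Back-of-the-envelope VRAM budget for the setup above.
# NOTE: NUM_PARAMS is a placeholder assumption, not the actual model size.
GIB = 1024**3

NUM_GPUS = 2             # two RTX 3090s
VRAM_PER_GPU = 24 * GIB  # 24 GiB each
GPU_MEM_UTIL = 0.95      # --gpu-memory-utilization 0.95

NUM_PARAMS = 16e9        # placeholder parameter count
BYTES_PER_PARAM = 2      # --dtype float16

usable = NUM_GPUS * VRAM_PER_GPU * GPU_MEM_UTIL
weights = NUM_PARAMS * BYTES_PER_PARAM

print(f"usable VRAM            : {usable / GIB:5.1f} GiB")
print(f"fp16 weights           : {weights / GIB:5.1f} GiB")
print(f"left for KV cache etc. : {(usable - weights) / GIB:5.1f} GiB")
</code>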