tech:computer-stuff:roo-code-with-vllm
Differences
This shows you the differences between two versions of the page.
| tech:computer-stuff:roo-code-with-vllm [2025/06/05 08:37] – Add initial info jon-dokuwiki | tech:computer-stuff:roo-code-with-vllm [2025/08/27 12:26] (current) – Update with info about gpt-oss-120b jon-dokuwiki | ||
|---|---|---|---|
| Line 34: | Line 34: | ||
| </ | </ | ||
| - | To fit the code on two RTX 3090s I'm using //--dtype float16// and since I'm... TBC | + | UPDATE: Roo Code requires complex reasoning and understanding. The closes |
| + | < | ||
| + | services: | ||
| + | vllm: | ||
| + | container_name: | ||
| + | image: vllm/ | ||
| + | restart: unless-stopped | ||
| + | deploy: | ||
| + | resources: | ||
| + | reservations: | ||
| + | devices: | ||
| + | - driver: nvidia | ||
| + | device_ids: [" | ||
| + | capabilities: | ||
| + | runtime: nvidia | ||
| + | ports: | ||
| + | - " | ||
| + | volumes: | ||
| + | - ~/ | ||
| + | environment: | ||
| + | - HUGGING_FACE_HUB_TOKEN=HFTOKEN_HERE | ||
| + | - TORCH_CUDA_ARCH_LIST=8.6 | ||
| + | - NCCL_IB_DISABLE=1 | ||
| + | - NCCL_P2P_DISABLE=0 | ||
| + | # vLLM stability/ | ||
| + | - VLLM_WORKER_MULTIPROC_METHOD=spawn | ||
| + | - CUDA_DEVICE_MAX_CONNECTIONS=1 | ||
| + | - VLLM_ATTENTION_BACKEND=TRITON_ATTN_VLLM_V1 | ||
| + | command: > | ||
| + | --model openai/ | ||
| + | --tensor-parallel-size 4 | ||
| + | --gpu-memory-utilization 0.90 | ||
| + | --dtype auto | ||
| + | --max-model-len 131072 | ||
| + | --allowed-origins [\" | ||
| + | --disable-fastapi-docs | ||
| + | --hf-overrides ' | ||
| + | --disable-custom-all-reduce | ||
| + | |||
| + | ipc: host | ||
| + | |||
| + | </ | ||
| + | |||
| + | You can probably push the GPU memory utilization past 0.90. I've gone as far as 0.95; more memory means a larger KV cache and higher throughput. In Roo I had to enable high (max) reasoning for it to understand complex requests. The current issue is that all code that // | ||
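
As a rough back-of-the-envelope check on the memory-utilization remark, the sketch below estimates how much extra GPU memory vLLM may claim when `--gpu-memory-utilization` goes from 0.90 to 0.95. It assumes four GPUs (matching `--tensor-parallel-size 4`) with 24 GiB each; the 24 GiB figure is an assumption based on the RTX 3090 mentioned in the earlier revision, and the helper name `extra_memory_gib` is purely illustrative. Since the model weights stay the same size, most of the extra budget should end up available to the KV cache, but the exact split depends on vLLM's own profiling.

```python
def extra_memory_gib(per_gpu_gib: float, num_gpus: int,
                     util_before: float, util_after: float) -> float:
    """Estimate the extra memory (GiB, summed over all GPUs) made available
    by raising the utilization cap from util_before to util_after."""
    if not (0.0 < util_before <= util_after <= 1.0):
        raise ValueError("expected 0 < util_before <= util_after <= 1")
    return per_gpu_gib * num_gpus * (util_after - util_before)


# Assumed hardware: 4 GPUs with 24 GiB each (not stated in the config itself).
extra = extra_memory_gib(per_gpu_gib=24.0, num_gpus=4,
                         util_before=0.90, util_after=0.95)
print(f"Extra memory budget: {extra:.1f} GiB")
```

Under these assumptions the gain is a few GiB in total, which is why the step from 0.90 to 0.95 can noticeably enlarge the KV cache without changing anything else in the setup.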
tech/computer-stuff/roo-code-with-vllm.1749112630.txt.gz · Last modified: by jon-dokuwiki
