llm

3 posts

Volcengine DeepSeek R1 Full-Featured API Free Usage Guide
2025-02-20
This guide walks you through connecting to Volcengine’s full-featured DeepSeek R1 API from scratch. It covers real-name registration, enabling models, creating API keys and inference endpoints, and calling them via Cherry Studio and ChatBox. You’ll also learn how to enable web search and translation with DeepSeek-V3, leveraging generous free quotas and high RPM/TPM limits.
706 words | 4 min
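The Volcengine workflow above (create an API key and inference endpoint, then call it from a client) reduces to an OpenAI-compatible chat-completion request against the Ark base URL. A minimal sketch, assuming the standard `openai` Python SDK; the `ep-...` endpoint ID and environment-variable name are placeholders:

```python
# Sketch of calling a Volcengine Ark inference endpoint via its
# OpenAI-compatible API. The "model" field takes the inference-endpoint ID
# created in the console, not a model name ("ep-xxxx" below is hypothetical).
import os

ARK_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"

def build_chat_request(endpoint_id: str, prompt: str) -> dict:
    """Assemble the chat-completion payload sent to the endpoint."""
    return {
        "model": endpoint_id,  # inference-endpoint ID, e.g. "ep-xxxx"
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(endpoint_id: str, prompt: str) -> str:
    # Requires `pip install openai`; only the base_url needs overriding.
    from openai import OpenAI
    client = OpenAI(api_key=os.environ["ARK_API_KEY"], base_url=ARK_BASE_URL)
    resp = client.chat.completions.create(**build_chat_request(endpoint_id, prompt))
    return resp.choices[0].message.content
```

The same base URL and endpoint ID are what Cherry Studio and ChatBox ask for when you add a custom OpenAI-compatible provider.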
Multi‑Node Private Deployment of DeepSeek-r1:671B Full Version on K8s + SGLang
This post walks through deploying the full DeepSeek-r1-671B model on Kubernetes with SGLang for production-grade, multi-node GPU inference. It explains how to orchestrate elastic multi-GPU workloads using LeaderWorkerSet and Volcano, optimize performance via RadixAttention and KV cache reuse, integrate Prometheus/Grafana for SLA-grade monitoring, and contrasts this K8s+SGLang stack with Ollama. A step-by-step environment and YAML guide is included.
1575 words | 8 min
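The orchestration described above pairs each leader pod with worker pods so one tensor-parallel SGLang instance can span nodes. A heavily abridged sketch of what such a manifest might look like, assuming the LeaderWorkerSet API; the image, model path, ports, and GPU counts are placeholder assumptions, not the post's actual values:

```yaml
# Hypothetical sketch: one replica spanning 2 nodes (8 GPUs each),
# forming a single 16-way tensor-parallel SGLang server.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: sglang-deepseek-r1
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 2                      # 1 leader pod + 1 worker pod per replica
    leaderTemplate:
      spec:
        containers:
          - name: sglang
            image: lmsysorg/sglang:latest   # placeholder image tag
            command: ["python", "-m", "sglang.launch_server"]
            args: ["--model-path", "/models/DeepSeek-R1",
                   "--tp", "16", "--nnodes", "2", "--node-rank", "0",
                   "--dist-init-addr", "$(LWS_LEADER_ADDRESS):20000"]
            resources:
              limits:
                nvidia.com/gpu: 8
    workerTemplate:
      spec:
        containers:
          - name: sglang
            image: lmsysorg/sglang:latest
            command: ["python", "-m", "sglang.launch_server"]
            args: ["--model-path", "/models/DeepSeek-R1",
                   "--tp", "16", "--nnodes", "2", "--node-rank", "1",
                   "--dist-init-addr", "$(LWS_LEADER_ADDRESS):20000"]
            resources:
              limits:
                nvidia.com/gpu: 8
```

The full post includes the complete, tested YAML; this fragment only illustrates the leader/worker split that a single-process runtime like Ollama cannot express.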
Running deepseek-r1 671B with Ollama on Ubuntu 22.04 + 8×A800
2025-01-21
This post shows how to run deepseek-r1 671B locally with Ollama on Ubuntu 22.04, on a server equipped with dual Xeon Platinum CPUs, 1 TB of RAM, NVMe storage, and 8×NVIDIA A800 GPUs. It walks through the hardware specs, Ollama installation, model-directory and environment configuration, exposing the service for remote access, Docker + nvidia-docker2 setup, and deploying Open WebUI for a complete large-model deployment workflow.
731 words | 4 min
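For the Ollama setup above, the model-directory and remote-access configuration are typically handled with a systemd drop-in override, since the installer registers Ollama as a systemd service. A sketch, assuming a systemd-managed install; the storage path is a placeholder:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
# Listen on all interfaces so remote clients (e.g. Open WebUI) can connect.
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Relocate model storage to the NVMe volume (path is hypothetical).
Environment="OLLAMA_MODELS=/data/ollama/models"
```

After editing, reload and restart with `sudo systemctl daemon-reload && sudo systemctl restart ollama`.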