llm

3 posts

Volcengine DeepSeek R1 Full-Featured API Free Usage Guide
2025-02-20
This guide walks you through connecting to Volcengine’s full-featured DeepSeek R1 API from scratch. It covers real-name registration, enabling models, creating API keys and inference endpoints, and calling them via Cherry Studio and ChatBox. You’ll also learn how to enable web search and translation with DeepSeek-V3, leveraging generous free quotas and high RPM/TPM limits.
706 words | 4 min
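The Volcengine workflow above (create an API key and inference endpoint, then call it from a client) reduces to an OpenAI-compatible chat-completion request against the Ark base URL. A minimal sketch, assuming the standard `openai` Python SDK; the `ep-...` endpoint ID and environment-variable name are placeholders:

```python
# Sketch of calling a Volcengine Ark inference endpoint via its
# OpenAI-compatible API. The "model" field takes the inference-endpoint ID
# created in the console, not a model name ("ep-xxxx" below is hypothetical).
import os

ARK_BASE_URL = "https://ark.cn-beijing.volces.com/api/v3"

def build_chat_request(endpoint_id: str, prompt: str) -> dict:
    """Assemble the chat-completion payload sent to the endpoint."""
    return {
        "model": endpoint_id,  # inference-endpoint ID, e.g. "ep-xxxx"
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(endpoint_id: str, prompt: str) -> str:
    # Requires `pip install openai`; only the base_url needs overriding.
    from openai import OpenAI
    client = OpenAI(api_key=os.environ["ARK_API_KEY"], base_url=ARK_BASE_URL)
    resp = client.chat.completions.create(**build_chat_request(endpoint_id, prompt))
    return resp.choices[0].message.content
```

The same base URL and endpoint ID are what Cherry Studio and ChatBox ask for when you add a custom OpenAI-compatible provider.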
Multi‑Node Private Deployment of DeepSeek-r1:671B Full Version on K8s + SGLang
This post walks through deploying the full DeepSeek-r1-671B model on Kubernetes with SGLang for production-grade, multi-node GPU inference. It explains how to orchestrate elastic multi-GPU workloads using LeaderWorkerSet and Volcano, optimize performance via RadixAttention and KV cache reuse, integrate Prometheus/Grafana for SLA-grade monitoring, and contrasts this K8s+SGLang stack with Ollama. A step-by-step environment and YAML guide is included.
1575 words | 8 min
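The orchestration described above pairs each leader pod with worker pods so one tensor-parallel SGLang instance can span nodes. A heavily abridged sketch of what such a manifest might look like, assuming the LeaderWorkerSet API; the image, model path, ports, and GPU counts are placeholder assumptions, not the post's actual values:

```yaml
# Hypothetical sketch: one replica spanning 2 nodes (8 GPUs each),
# forming a single 16-way tensor-parallel SGLang server.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: sglang-deepseek-r1
spec:
  replicas: 1
  leaderWorkerTemplate:
    size: 2                      # 1 leader pod + 1 worker pod per replica
    leaderTemplate:
      spec:
        containers:
          - name: sglang
            image: lmsysorg/sglang:latest   # placeholder image tag
            command: ["python", "-m", "sglang.launch_server"]
            args: ["--model-path", "/models/DeepSeek-R1",
                   "--tp", "16", "--nnodes", "2", "--node-rank", "0",
                   "--dist-init-addr", "$(LWS_LEADER_ADDRESS):20000"]
            resources:
              limits:
                nvidia.com/gpu: 8
    workerTemplate:
      spec:
        containers:
          - name: sglang
            image: lmsysorg/sglang:latest
            command: ["python", "-m", "sglang.launch_server"]
            args: ["--model-path", "/models/DeepSeek-R1",
                   "--tp", "16", "--nnodes", "2", "--node-rank", "1",
                   "--dist-init-addr", "$(LWS_LEADER_ADDRESS):20000"]
            resources:
              limits:
                nvidia.com/gpu: 8
```

The full post includes the complete, tested YAML; this fragment only illustrates the leader/worker split that a single-process runtime like Ollama cannot express.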
Running deepseek-r1 671B with Ollama on Ubuntu 22.04 + 8×A800
2025-01-21
This post shows how to run deepseek-r1 671B locally with Ollama on Ubuntu 22.04, on a server equipped with dual Xeon Platinum CPUs, 1 TB of RAM, NVMe storage, and 8×NVIDIA A800 GPUs. It walks through the hardware specs, Ollama installation, model-directory and environment configuration, exposing the service for remote access, Docker + nvidia-docker2 setup, and deploying Open WebUI for a complete large-model deployment workflow.
731 words | 4 min
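For the Ollama setup above, the model-directory and remote-access configuration are typically handled with a systemd drop-in override, since the installer registers Ollama as a systemd service. A sketch, assuming a systemd-managed install; the storage path is a placeholder:

```ini
# /etc/systemd/system/ollama.service.d/override.conf
[Service]
# Listen on all interfaces so remote clients (e.g. Open WebUI) can connect.
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Relocate model storage to the NVMe volume (path is hypothetical).
Environment="OLLAMA_MODELS=/data/ollama/models"
```

After editing, reload and restart with `sudo systemctl daemon-reload && sudo systemctl restart ollama`.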