
Dokploy Security and Monitoring Stack: A Complete Deployment Guide with CrowdSec + Traefik + Grafana

This is the third article in the Dokploy series. Below are the security rules and configurations I actually run.

This post documents my experience building a complete security and monitoring stack on Dokploy, covering CrowdSec threat detection, Traefik integration with Cloudflare, and an observability platform built on Grafana/Loki/VictoriaMetrics. Every configuration shown here can be copied and used as-is.


Architecture Overview#

The stack consists of three core modules:

Dokploy security and monitoring architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                                Internet                                 │
│        User request ──► Cloudflare CDN ──► Traefik (:80/:443)           │
└─────────────────────────────────────────────────────────────────────────┘
                  ┌─────────────────┴─────────────────┐
                  ▼                                   ▼
        ┌───────────────────┐               ┌─────────────────┐
        │Traefik middlewares│               │   Access log    │
        │ ┌───────────────┐ │               │  (JSON format)  │
        │ │Cloudflare-Only│ │               └────────┬────────┘
        │ │ IP allowlist  │ │                        │
        │ └───────────────┘ │                        ▼
        │ ┌───────────────┐ │               ┌─────────────────┐
        │ │   crowdsec    │◄├───────────────│    CrowdSec     │
        │ │threat detector│ │               │     Engine      │
        │ └───────────────┘ │               └────────┬────────┘
        └─────────┬─────────┘                        │
                  │              ┌───────────────────┼───────────────┐
                  ▼              ▼                   ▼               ▼
        ┌─────────────────┐ ┌─────────────┐ ┌─────────────┐ ┌───────────────┐
        │Docker containers│ │  Firewall   │ │   Traefik   │ │VictoriaMetrics│
        │ (app services)  │ │   Bouncer   │ │   Bouncer   │ │ (ban events)  │
        └─────────────────┘ │ (nftables)  │ │  (plugin)   │ └───────────────┘
                            └─────────────┘ └─────────────┘
┌─────────────────────────────────────────────────────────────────────────┐
│                            Monitoring stack                             │
│   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐               │
│   │    Alloy    │────►│    Loki     │────►│   Grafana   │               │
│   │ log capture │     │ log storage │     │visualization│               │
│   └─────────────┘     └─────────────┘     └──────┬──────┘               │
│   ┌─────────────┐     ┌───────────────┐          │                      │
│   │Node Exporter│────►│VictoriaMetrics│──────────┘                      │
│   │ host metrics│     │metric storage │                                 │
│   └─────────────┘     └───────────────┘                                 │
└─────────────────────────────────────────────────────────────────────────┘

Component Responsibilities#

| Component       | Role                                                          | Retention |
|-----------------|---------------------------------------------------------------|-----------|
| CrowdSec Engine | Threat-detection engine; analyzes logs for malicious behavior | -         |
| Firewall Bouncer| Host-level firewall; blocks malicious IPs via nftables        | -         |
| Traefik Bouncer | Application-layer protection; blocks requests in real time    | -         |
| Loki            | Log aggregation and storage                                   | 14 days   |
| VictoriaMetrics | Time-series metric storage                                    | 90 days   |
| Alloy           | Log collector supporting many log sources                     | -         |
| Node Exporter   | Host performance metric collection                            | -         |
| Grafana         | Unified visualization dashboards                              | -         |

Part 1: CrowdSec Security Protection#

1.1 What Is CrowdSec#

CrowdSec is an open-source, collaborative security engine. It detects malicious behavior by analyzing logs and enforces ban decisions through bouncers. Its key traits:

  • Behavioral analysis: scenario-based threat detection
  • Community sharing: global threat-intelligence exchange
  • Multi-layer defense: both firewall-level and application-level bouncers
  • Lightweight: written in Go, with a small resource footprint

1.2 Deploying CrowdSec on Dokploy#

Create a Compose service with the following configuration:

docker-compose.yml
services:
  crowdsec:
    image: crowdsecurity/crowdsec:latest
    environment:
      GID: "${GID-1000}"
      COLLECTIONS: "crowdsecurity/linux crowdsecurity/traefik crowdsecurity/http-cve"
    volumes:
      - crowdsec-db:/var/lib/crowdsec/data/
      - crowdsec-config:/etc/crowdsec/
      - ../files/acquis.yaml:/etc/crowdsec/acquis.yaml
      - ../files/victoriametrics.yaml:/etc/crowdsec/notifications/http_victoriametrics.yaml
      - ../files/profiles.yaml:/etc/crowdsec/profiles.yaml
      - /etc/dokploy/traefik/dynamic/access.log:/var/log/traefik/access.log:ro
      - /var/log/auth.log:/var/log/ssh/auth.log:ro
    security_opt:
      - no-new-privileges:true
    ports:
      - "127.0.0.1:8080:8080" # LAPI
      - "127.0.0.1:6060:6060" # Prometheus metrics
    labels:
      - traefik.enable=false
    restart: unless-stopped
    networks:
      - default
      - dokploy-network
networks:
  default:
  dokploy-network:
    external: true
volumes:
  crowdsec-db:
  crowdsec-config:

Key points:

  • COLLECTIONS: the detection collections to enable, here Linux system, Traefik web, and HTTP CVE scenarios
  • The Traefik access log is mounted so CrowdSec can analyze it
  • The SSH auth log is mounted to detect brute-force attempts
  • Ports are bound to 127.0.0.1 only, so the LAPI and metrics endpoints are never exposed publicly
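The loopback-only binding is worth re-checking whenever port mappings are edited. A minimal sketch of the check (the helper is hypothetical and handles the three-part IPv4 mapping syntax only):

```python
import ipaddress

def is_loopback_only(port_spec: str) -> bool:
    """Return True if a Compose port mapping binds only to a loopback address.

    "127.0.0.1:8080:8080" has three parts (host IP, host port, container
    port); a two-part mapping like "8080:8080" binds to all interfaces.
    """
    parts = port_spec.split(":")
    if len(parts) < 3:
        return False  # no host IP given, so Docker binds 0.0.0.0
    return ipaddress.ip_address(parts[0]).is_loopback

# The two mappings from the compose file above:
print(is_loopback_only("127.0.0.1:8080:8080"))  # True
print(is_loopback_only("8080:8080"))            # False
```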

1.3 Log Acquisition Configuration#

Create files/acquis.yaml to define the log sources CrowdSec should analyze:

acquis.yaml
---
# Traefik access log
filenames:
  - /var/log/traefik/access.log
poll_without_inotify: true
labels:
  type: traefik
---
# SSH auth log
filenames:
  - /var/log/ssh/auth.log
labels:
  type: syslog

1.4 Decision Profiles#

Create files/profiles.yaml to define the remediation policy:

profiles.yaml
name: default_ip_remediation
filters:
  - Alert.Remediation == true && Alert.GetScope() == "Ip"
decisions:
  - type: ban
    duration: 4h
notifications:
  - http_victoriametrics
on_success: break
---
name: default_range_remediation
filters:
  - Alert.Remediation == true && Alert.GetScope() == "Range"
decisions:
  - type: ban
    duration: 4h
notifications:
  - http_victoriametrics
on_success: break

What this does:

  • Malicious IPs are banned for 4 hours once detected
  • Both single IPs and IP ranges can be banned
  • Every decision is pushed to VictoriaMetrics for monitoring

1.5 Host-Level Firewall Bouncer#

Install the Firewall Bouncer on the host to enforce bans at the firewall level via nftables:

# Install the CrowdSec repository, then the bouncer
curl -s https://install.crowdsec.net | sudo sh
apt install crowdsec-firewall-bouncer-nftables

Configuration file /etc/crowdsec/bouncers/crowdsec-firewall-bouncer.yaml:

mode: nftables
update_frequency: 10s
log_mode: file
log_dir: /var/log/
log_level: info
log_compression: true
log_max_size: 100
log_max_backups: 3
log_max_age: 30
api_url: http://127.0.0.1:8080/
api_key: YOUR_API_KEY_HERE
insecure_skip_verify: false
disable_ipv6: false
deny_action: DROP
deny_log: false
supported_decisions_types:
  - ban
blacklists_ipv4: crowdsec-blacklists
blacklists_ipv6: crowdsec6-blacklists
ipset_type: nethash
iptables_chains:
  - INPUT
iptables_add_rule_comments: true
nftables:
  ipv4:
    enabled: true
    set-only: false
    table: crowdsec
    chain: crowdsec-chain
    priority: -10
  ipv6:
    enabled: true
    set-only: false
    table: crowdsec6
    chain: crowdsec6-chain
    priority: -10
nftables_hooks:
  - input
  - forward
prometheus:
  enabled: false
  listen_addr: 127.0.0.1
  listen_port: 60601

Generate the API key:

# Run inside the CrowdSec container
docker exec -it <crowdsec-container> cscli bouncers add firewall-bouncer

1.6 Traefik Bouncer Plugin#

The Traefik bouncer filters requests in real time at the application layer. First, enable the plugin in traefik.yml:

experimental:
  plugins:
    bouncer:
      moduleName: github.com/maxlerebourg/crowdsec-bouncer-traefik-plugin
      version: v1.4.6

Then reference it from a middleware definition:

http:
  middlewares:
    crowdsec:
      plugin:
        bouncer:
          enabled: true
          crowdsecMode: live
          crowdsecLapiKey: YOUR_TRAEFIK_BOUNCER_KEY
          crowdsecLapiHost: service-crowdsec-k5ws6c-crowdsec-1:8080

Generate the Traefik bouncer key:

docker exec -it <crowdsec-container> cscli bouncers add traefik-bouncer

1.7 VictoriaMetrics Alert Notifications#

Create files/victoriametrics.yaml to push CrowdSec decisions to VictoriaMetrics:

type: http
name: http_victoriametrics
log_level: debug
format: >
  {{- range $Alert := . -}}
  {{- $traefikRouters := GetMeta . "traefik_router_name" -}}
  {{- range .Decisions -}}
  {"metric":{"__name__":"cs_lapi_decision","instance":"dokploy-server","country":"{{$Alert.Source.Cn}}","asname":"{{$Alert.Source.AsName}}","asnumber":"{{$Alert.Source.AsNumber}}","latitude":"{{$Alert.Source.Latitude}}","longitude":"{{$Alert.Source.Longitude}}","iprange":"{{$Alert.Source.Range}}","scenario":"{{.Scenario}}","type":"{{.Type}}","duration":"{{.Duration}}","scope":"{{.Scope}}","ip":"{{.Value}}","traefik_routers":{{ printf "%q" ($traefikRouters | uniq | join ",")}}},"values": [1],"timestamps":[{{now|unixEpoch}}000]}
  {{- end }}
  {{- end -}}
url: http://service-victoriametrics-dhydty-victoriametrics-1:8428/api/v1/import
method: POST
headers:
  Content-Type: application/json

This produces a cs_lapi_decision metric carrying the attacker's geolocation, ASN, triggering scenario, and more, which makes the decisions easy to analyze in Grafana.
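VictoriaMetrics' /api/v1/import endpoint expects one standalone JSON object per line, so every line the template renders must parse on its own. A quick shape check (the field values below are invented for illustration):

```python
import json

# A hypothetical rendered line in VictoriaMetrics' JSON line import format:
line = (
    '{"metric":{"__name__":"cs_lapi_decision","instance":"dokploy-server",'
    '"country":"US","asname":"ExampleNet","asnumber":"64496",'
    '"scenario":"crowdsecurity/http-probing","type":"ban","duration":"4h",'
    '"scope":"Ip","ip":"203.0.113.50","traefik_routers":"my-router@docker"},'
    '"values":[1],"timestamps":[1734855281000]}'
)

payload = json.loads(line)  # must succeed line-by-line
print(payload["metric"]["__name__"])  # cs_lapi_decision
print(payload["values"], payload["timestamps"])
```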

Part 2: Traefik and Cloudflare Integration#

2.1 Why a Cloudflare IP Allowlist#

When Cloudflare sits in front as a CDN, every request arrives through a Cloudflare proxy. This creates two problems:

  1. Obscured source IPs: the client IP Traefik sees is a Cloudflare proxy address, not the real user's IP
  2. Direct-to-origin bypass: attackers can hit the origin IP directly and sidestep Cloudflare's protection entirely

The fix is twofold:

  • Add the Cloudflare ranges to forwardedHeaders.trustedIPs so Traefik recovers the real client IP
  • Add an ipAllowList middleware so that only Cloudflare addresses may connect
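The check an IP allowlist performs is plain CIDR membership. A minimal sketch with Python's stdlib ipaddress module, using only a subset of Cloudflare's published IPv4 ranges:

```python
import ipaddress

# Subset of Cloudflare's published IPv4 ranges (the configs use the full list)
CLOUDFLARE_V4 = [
    "173.245.48.0/20", "141.101.64.0/18", "104.16.0.0/13", "172.64.0.0/13",
]

def from_cloudflare(ip: str) -> bool:
    """Return True if ip falls inside any allowlisted CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(net) for net in CLOUDFLARE_V4)

print(from_cloudflare("172.71.99.180"))  # True: a Cloudflare edge address
print(from_cloudflare("203.0.113.50"))   # False: a direct-to-origin client
```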

2.2 Traefik Static Configuration#

The full /etc/dokploy/traefik/traefik.yml:

global:
  sendAnonymousUsage: false
experimental:
  plugins:
    bouncer:
      moduleName: github.com/maxlerebourg/crowdsec-bouncer-traefik-plugin
      version: v1.4.6
accessLog:
  filePath: "/etc/dokploy/traefik/dynamic/access.log"
  format: json
  bufferingSize: 100
providers:
  swarm:
    exposedByDefault: false
    watch: true
  docker:
    exposedByDefault: false
    watch: true
  file:
    directory: /etc/dokploy/traefik/dynamic
    watch: true
entryPoints:
  web:
    address: :80
    forwardedHeaders:
      trustedIPs:
        # Cloudflare IPv4
        - 173.245.48.0/20
        - 103.21.244.0/22
        - 103.22.200.0/22
        - 103.31.4.0/22
        - 141.101.64.0/18
        - 108.162.192.0/18
        - 190.93.240.0/20
        - 188.114.96.0/20
        - 197.234.240.0/22
        - 198.41.128.0/17
        - 162.158.0.0/15
        - 104.16.0.0/13
        - 104.24.0.0/14
        - 172.64.0.0/13
        - 131.0.72.0/22
    http:
      middlewares:
        - cloudflare-only@file
        - crowdsec@file
  websecure:
    address: :443
    http3:
      advertisedPort: 443
    forwardedHeaders:
      trustedIPs:
        # Cloudflare IPv4 (same list as web)
        - 173.245.48.0/20
        - 103.21.244.0/22
        - 103.22.200.0/22
        - 103.31.4.0/22
        - 141.101.64.0/18
        - 108.162.192.0/18
        - 190.93.240.0/20
        - 188.114.96.0/20
        - 197.234.240.0/22
        - 198.41.128.0/17
        - 162.158.0.0/15
        - 104.16.0.0/13
        - 104.24.0.0/14
        - 172.64.0.0/13
        - 131.0.72.0/22
    http:
      middlewares:
        - cloudflare-only@file
        - crowdsec@file
      tls:
        certResolver: letsencrypt
api:
  insecure: true
certificatesResolvers:
  letsencrypt:
    acme:
      email: your-email@example.com
      storage: /etc/dokploy/traefik/dynamic/acme.json
      httpChallenge:
        entryPoint: web

Key points:

  • forwardedHeaders.trustedIPs: trusting the Cloudflare ranges lets Traefik parse X-Forwarded-For and recover the real client IP
  • http.middlewares: an entrypoint-level middleware chain, so every request passes through the cloudflare-only and crowdsec middlewares
  • accessLog.format: json: JSON-formatted logs are easy for both CrowdSec and Alloy to parse

2.3 Dynamic Middleware Configuration#

/etc/dokploy/traefik/dynamic/middlewares.yml

http:
  middlewares:
    redirect-to-https:
      redirectScheme:
        scheme: https
        permanent: true
    cloudflare-only:
      ipAllowList:
        sourceRange:
          # Cloudflare IPv4
          - "173.245.48.0/20"
          - "103.21.244.0/22"
          - "103.22.200.0/22"
          - "103.31.4.0/22"
          - "141.101.64.0/18"
          - "108.162.192.0/18"
          - "190.93.240.0/20"
          - "188.114.96.0/20"
          - "197.234.240.0/22"
          - "198.41.128.0/17"
          - "162.158.0.0/15"
          - "104.16.0.0/13"
          - "104.24.0.0/14"
          - "172.64.0.0/13"
          - "131.0.72.0/22"
          # Cloudflare IPv6
          - "2400:cb00::/32"
          - "2606:4700::/32"
          - "2803:f800::/32"
          - "2405:b500::/32"
          - "2405:8100::/32"
          - "2a06:98c0::/29"
          - "2c0f:f248::/32"
    crowdsec:
      plugin:
        bouncer:
          enabled: true
          crowdsecMode: live
          crowdsecLapiKey: YOUR_TRAEFIK_BOUNCER_KEY
          crowdsecLapiHost: service-crowdsec-k5ws6c-crowdsec-1:8080
TIP

The Cloudflare IP ranges change occasionally; refresh them periodically from https://www.cloudflare.com/ips/.
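Refreshing can be scripted. A sketch that renders the sourceRange entries; it assumes Cloudflare's plain-text list endpoints (https://www.cloudflare.com/ips-v4 and /ips-v6, one CIDR per line), with the fetch left commented out so the snippet runs offline:

```python
import urllib.request  # only needed if you uncomment the fetch below

def render_source_range(cidrs, indent=" " * 10):
    """Render CIDRs as the quoted sourceRange YAML entries used above."""
    return "\n".join(f'{indent}- "{c}"' for c in cidrs)

# v4 = urllib.request.urlopen("https://www.cloudflare.com/ips-v4").read().decode().split()
# v6 = urllib.request.urlopen("https://www.cloudflare.com/ips-v6").read().decode().split()
print(render_source_range(["173.245.48.0/20", "2400:cb00::/32"]))
```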

2.4 How the Real Client IP Is Recovered#

As a request flows through the chain, the IP information changes as follows:

User (203.0.113.50)
  → Cloudflare (adds CF-Connecting-IP: 203.0.113.50)
  → Traefik (trusts the Cloudflare proxy IP, parses X-Forwarded-For)
  → CrowdSec (reads the real IP 203.0.113.50 from the access log)

A sample Traefik access-log entry:

{
  "ClientAddr": "172.71.99.180:11150",
  "ClientHost": "203.0.113.50",
  "RequestHost": "example.com",
  "RequestMethod": "GET",
  "RequestPath": "/api/data",
  "DownstreamStatus": 200,
  "RouterName": "my-router@docker"
}
  • ClientAddr: the Cloudflare proxy address
  • ClientHost: the real client IP (recovered via the trustedIPs configuration)
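The distinction is easy to check programmatically; parsing the sample entry above:

```python
import json

# The sample access-log entry from the section above
entry = json.loads(
    '{"ClientAddr": "172.71.99.180:11150", "ClientHost": "203.0.113.50", '
    '"RequestHost": "example.com", "RequestMethod": "GET", '
    '"RequestPath": "/api/data", "DownstreamStatus": 200, '
    '"RouterName": "my-router@docker"}'
)
proxy_ip = entry["ClientAddr"].rsplit(":", 1)[0]  # Cloudflare edge address
real_ip = entry["ClientHost"]                     # recovered client IP
print(proxy_ip, "->", real_ip)  # 172.71.99.180 -> 203.0.113.50
```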

2.5 Bypass Routes for Specific Services#

Some services (such as an analytics backend) may need to bypass CrowdSec entirely. Create a dedicated route for them:

rybbit-bypass.yml
http:
  routers:
    rybbit-backend-websecure:
      rule: Host(`analytics.example.com`) && PathPrefix(`/api`)
      service: rybbit-backend-service
      entryPoints:
        - websecure
      tls:
        certResolver: letsencrypt
      priority: 100 # high priority so this router matches first
  services:
    rybbit-backend-service:
      loadBalancer:
        servers:
          - url: http://rybbit-backend:3001
        passHostHeader: true

Note: these routers do not apply the crowdsec@file middleware, so they are not protected by CrowdSec.

Part 3: Monitoring and Logging#

3.1 Loki Log Storage#

Loki is a log aggregation system from Grafana Labs, designed for Kubernetes and container environments.

The files/loki-config.yaml configuration:

# Loki 3.x configuration, tuned for storage efficiency
auth_enabled: false
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info
common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
# Loki 3.x uses TSDB + the v13 schema
schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
storage_config:
  tsdb_shipper:
    active_index_directory: /loki/tsdb-shipper-active
    cache_location: /loki/tsdb-shipper-cache
    cache_ttl: 24h
  filesystem:
    directory: /loki/chunks
# Ingester tuning for better compression
ingester:
  chunk_idle_period: 30m
  chunk_block_size: 262144    # 256KB blocks
  chunk_target_size: 1572864  # 1.5MB chunks
  chunk_retain_period: 1m
  max_chunk_age: 2h
  flush_check_period: 30s
  flush_op_timeout: 10m
  wal:
    enabled: true
    dir: /loki/wal
    flush_on_shutdown: true
limits_config:
  retention_period: 336h # 14-day retention
  ingestion_rate_mb: 8
  ingestion_burst_size_mb: 16
  max_streams_per_user: 5000
  max_entries_limit_per_query: 5000
  max_line_size: 128KB
  allow_structured_metadata: true
  per_stream_rate_limit: 3MB
  per_stream_rate_limit_burst: 10MB
  max_label_name_length: 1024
  max_label_value_length: 2048
  max_label_names_per_series: 15
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
  retention_delete_delay: 1h
  delete_request_store: filesystem
  compaction_interval: 10m
  retention_delete_worker_count: 150
query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 100
  parallelise_shardable_queries: true
  cache_results: true
query_scheduler:
  max_outstanding_requests_per_tenant: 100
analytics:
  reporting_enabled: false

3.2 Alloy Log Collection#

Grafana Alloy is the next-generation log/metric collector that replaces Promtail.

The files/alloy-config.alloy configuration:

// Grafana Alloy configuration - log collection,
// tuned for storage efficiency

// ============================================
// Loki write endpoint
// ============================================
loki.write "default" {
  endpoint {
    url        = "http://loki:3100/loki/api/v1/push"
    batch_wait = "5s"   // batch sends to improve compression
    batch_size = "1MiB" // 1MB batches
  }
  external_labels = {
    cluster = "dokploy",
  }
}

// ============================================
// Docker container discovery
// ============================================
discovery.docker "containers" {
  host             = "unix:///var/run/docker.sock"
  refresh_interval = "30s"
}

// ============================================
// Docker container log collection
// ============================================
discovery.relabel "docker_logs" {
  targets = discovery.docker.containers.targets
  // extract the container name
  rule {
    source_labels = ["__meta_docker_container_name"]
    regex         = "/(.*)"
    target_label  = "container"
  }
  // extract the compose project
  rule {
    source_labels = ["__meta_docker_container_label_com_docker_compose_project"]
    target_label  = "compose_project"
  }
  // extract the compose service
  rule {
    source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
    target_label  = "compose_service"
  }
  // extract the image name
  rule {
    source_labels = ["__meta_docker_container_image"]
    target_label  = "image"
  }
}

loki.source.docker "containers" {
  host       = "unix:///var/run/docker.sock"
  targets    = discovery.relabel.docker_logs.output
  forward_to = [loki.process.docker.receiver]
}

loki.process "docker" {
  stage.static_labels {
    values = {
      job  = "docker",
      host = "dokploy-server",
    }
  }
  forward_to = [loki.write.default.receiver]
}

// ============================================
// Traefik access log (JSON format)
// ============================================
local.file_match "traefik" {
  path_targets = [{"__path__" = "/var/log/traefik/access.log"}]
  sync_period  = "15s"
}

loki.source.file "traefik" {
  targets       = local.file_match.traefik.targets
  forward_to    = [loki.process.traefik.receiver]
  tail_from_end = true
}

loki.process "traefik" {
  stage.static_labels {
    values = {
      job  = "traefik",
      host = "dokploy-server",
    }
  }
  // parse the JSON and promote fields to labels
  stage.json {
    expressions = {
      client_ip    = "ClientHost",
      method       = "RequestMethod",
      path         = "RequestPath",
      status       = "DownstreamStatus",
      router       = "RouterName",
      request_host = "RequestHost",
      entry_point  = "entryPointName",
    }
  }
  stage.labels {
    values = {
      client_ip    = "",
      method       = "",
      status       = "",
      router       = "",
      request_host = "",
      entry_point  = "",
    }
  }
  forward_to = [loki.write.default.receiver]
}

// ============================================
// SSH/auth logs
// ============================================
local.file_match "auth" {
  path_targets = [{"__path__" = "/var/log/host/auth.log"}]
  sync_period  = "30s"
}

loki.source.file "auth" {
  targets       = local.file_match.auth.targets
  forward_to    = [loki.process.auth.receiver]
  tail_from_end = true
}

loki.process "auth" {
  stage.static_labels {
    values = {
      job  = "auth",
      host = "dokploy-server",
    }
  }
  // parse the syslog format
  stage.regex {
    expression = "^(?P<timestamp>\\w+\\s+\\d+\\s+\\d+:\\d+:\\d+)\\s+(?P<hostname>\\S+)\\s+(?P<service>[^\\[:]+)(?:\\[(?P<pid>\\d+)\\])?:\\s+(?P<message>.*)$"
  }
  stage.labels {
    values = {
      service = "",
    }
  }
  forward_to = [loki.write.default.receiver]
}

// ============================================
// UFW firewall logs
// ============================================
local.file_match "ufw" {
  path_targets = [{"__path__" = "/var/log/host/ufw.log"}]
  sync_period  = "30s"
}

loki.source.file "ufw" {
  targets       = local.file_match.ufw.targets
  forward_to    = [loki.process.ufw.receiver]
  tail_from_end = true
}

loki.process "ufw" {
  stage.static_labels {
    values = {
      job  = "ufw",
      host = "dokploy-server",
    }
  }
  // extract the UFW fields
  stage.regex {
    expression = "\\[UFW (?P<action>\\w+)\\].*SRC=(?P<src_ip>[\\d\\.a-fA-F:]+).*PROTO=(?P<proto>\\w+)"
  }
  stage.regex {
    expression = "DPT=(?P<dst_port>\\d+)"
  }
  stage.labels {
    values = {
      action   = "",
      src_ip   = "",
      proto    = "",
      dst_port = "",
    }
  }
  forward_to = [loki.write.default.receiver]
}

// ============================================
// Kernel logs
// ============================================
local.file_match "kernel" {
  path_targets = [{"__path__" = "/var/log/host/kern.log"}]
  sync_period  = "30s"
}

loki.source.file "kernel" {
  targets       = local.file_match.kernel.targets
  forward_to    = [loki.process.kernel.receiver]
  tail_from_end = true
}

loki.process "kernel" {
  stage.static_labels {
    values = {
      job  = "kernel",
      host = "dokploy-server",
    }
  }
  forward_to = [loki.write.default.receiver]
}

// ============================================
// CrowdSec logs
// ============================================
local.file_match "crowdsec" {
  path_targets = [
    {"__path__" = "/var/log/crowdsec/crowdsec.log", "log_type" = "main"},
    {"__path__" = "/var/log/crowdsec/crowdsec_api.log", "log_type" = "api"},
    {"__path__" = "/var/log/crowdsec/firewall-bouncer.log", "log_type" = "firewall-bouncer"},
  ]
  sync_period = "30s"
}

loki.source.file "crowdsec" {
  targets       = local.file_match.crowdsec.targets
  forward_to    = [loki.process.crowdsec.receiver]
  tail_from_end = true
}

loki.process "crowdsec" {
  stage.static_labels {
    values = {
      job  = "crowdsec",
      host = "dokploy-server",
    }
  }
  forward_to = [loki.write.default.receiver]
}
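The auth and ufw extraction regexes above can be sanity-checked outside Alloy; Python's re accepts these patterns as-is (the log lines below are invented samples):

```python
import re

# Same patterns as in the Alloy pipelines, with the string escaping undone
AUTH_RE = re.compile(
    r"^(?P<timestamp>\w+\s+\d+\s+\d+:\d+:\d+)\s+(?P<hostname>\S+)\s+"
    r"(?P<service>[^\[:]+)(?:\[(?P<pid>\d+)\])?:\s+(?P<message>.*)$"
)
UFW_RE = re.compile(
    r"\[UFW (?P<action>\w+)\].*SRC=(?P<src_ip>[\d\.a-fA-F:]+).*PROTO=(?P<proto>\w+)"
)

m = AUTH_RE.match(
    "Dec 22 16:14:41 server sshd[1234]: Failed password for root from 203.0.113.50 port 22 ssh2"
)
print(m.group("service"), m.group("pid"))  # sshd 1234

m = UFW_RE.search(
    "[UFW BLOCK] IN=eth0 OUT= SRC=203.0.113.99 DST=10.0.0.1 PROTO=TCP SPT=4444 DPT=22"
)
print(m.group("action"), m.group("src_ip"), m.group("proto"))  # BLOCK 203.0.113.99 TCP
```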

3.3 VictoriaMetrics Metric Storage#

VictoriaMetrics is a high-performance, Prometheus-compatible time-series database.

docker-compose.yml

version: "3.8"
services:
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    restart: unless-stopped
    volumes:
      - vm-data:/storage
      - ../files/scrape.yml:/etc/victoriametrics/scrape.yml
      - ../files/blackbox:/etc/prometheus/blackbox
    ports:
      - "127.0.0.1:20001:8428"
    command:
      - "--storageDataPath=/storage"
      - "--retentionPeriod=90d"
      - "--httpListenAddr=:8428"
      - "--promscrape.config=/etc/victoriametrics/scrape.yml"
    networks:
      - default
      - dokploy-network
networks:
  default:
  dokploy-network:
    external: true
volumes:
  vm-data:

The files/scrape.yml scrape configuration:

scrape_configs:
  - job_name: 'crowdsec'
    scrape_interval: 15s
    static_configs:
      - targets: ['service-crowdsec-k5ws6c-crowdsec-1:6060']
  - job_name: 'logporter'
    scrape_interval: 15s
    static_configs:
      - targets: ['service-loki-wykkft-logporter-1:9333']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: 'docker_.*'
        action: keep
  - job_name: 'node-exporter'
    scrape_interval: 15s
    static_configs:
      - targets: ['service-loki-wykkft-node_exporter-1:9100']
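With the loopback port mapping from the VictoriaMetrics compose file, scrape health can be checked from the host through the Prometheus-compatible query API. A sketch that just builds the query URL (fetch it on the host with curl or urllib; a value of 1 for up means the target was scraped successfully):

```python
from urllib.parse import urlencode

# 127.0.0.1:20001 is the host-side mapping from the compose file above
base = "http://127.0.0.1:20001"
url = f"{base}/api/v1/query?" + urlencode({"query": 'up{job="crowdsec"}'})
print(url)
```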

3.4 Full Docker Compose Configuration#

The complete configuration for Loki + Alloy + Node Exporter + Logporter:

version: "3.8"
services:
  loki:
    image: grafana/loki:latest
    restart: unless-stopped
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - loki-data:/loki
      - ../files/loki-config.yaml:/etc/loki/local-config.yaml
    networks:
      - default
      - dokploy-network
  alloy:
    image: grafana/alloy:latest
    restart: unless-stopped
    command:
      - run
      - /etc/alloy/config.alloy
      - --server.http.listen-addr=0.0.0.0:12345
      - --storage.path=/var/lib/alloy/data
    volumes:
      - ../files/alloy-config.alloy:/etc/alloy/config.alloy
      - alloy-data:/var/lib/alloy/data
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # Traefik logs
      - /etc/dokploy/traefik/dynamic/access.log:/var/log/traefik/access.log:ro
      # System logs
      - /var/log/auth.log:/var/log/host/auth.log:ro
      - /var/log/ufw.log:/var/log/host/ufw.log:ro
      - /var/log/kern.log:/var/log/host/kern.log:ro
      # CrowdSec logs
      - /var/log/crowdsec.log:/var/log/crowdsec/crowdsec.log:ro
      - /var/log/crowdsec_api.log:/var/log/crowdsec/crowdsec_api.log:ro
      - /var/log/crowdsec-firewall-bouncer.log:/var/log/crowdsec/firewall-bouncer.log:ro
    depends_on:
      - loki
    networks:
      - default
      - dokploy-network
  logporter:
    image: lifailon/logporter:latest
    restart: unless-stopped
    environment:
      - DOCKER_LOG_METRICS=false
      - DOCKER_LOG_CUSTOM_METRICS=false
      - DOCKER_LOG_CUSTOM_QUERY=error|Error|ERROR|exception|Exception|EXCEPTION|fail|Fail|FAIL
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - default
      - dokploy-network
  node_exporter:
    image: quay.io/prometheus/node-exporter:latest
    restart: unless-stopped
    command:
      - '--path.rootfs=/host'
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/host:ro,rslave
    pid: host
    networks:
      - default
      - dokploy-network
networks:
  default:
  dokploy-network:
    external: true
volumes:
  loki-data:
  alloy-data:

3.5 Grafana Deployment#

Grafana provides a single pane of glass for all monitoring data:

version: "3.8"
services:
  grafana:
    image: grafana/grafana:latest
    restart: unless-stopped
    volumes:
      - grafana-storage:/var/lib/grafana
    ports:
      - "127.0.0.1:20000:3000"
    networks:
      - default
      - dokploy-network
networks:
  default:
  dokploy-network:
    external: true
volumes:
  grafana-storage:

Configure the data sources:

  1. Loki: set the URL to http://service-loki-wykkft-loki-1:3100
  2. VictoriaMetrics: set the URL to http://service-victoriametrics-dhydty-victoriametrics-1:8428

Dashboards#


Part 4: Troubleshooting#

4.1 Common CrowdSec Issues#

Issue: bouncers cannot connect

# Check the registered bouncers
docker exec -it <crowdsec-container> cscli bouncers list
# Check LAPI connectivity
curl -s http://127.0.0.1:8080/v1/decisions | head

Issue: no attacks are being detected

# Check the log acquisition status
docker exec -it <crowdsec-container> cscli metrics
# Check the installed scenarios
docker exec -it <crowdsec-container> cscli scenarios list

Issue: legitimate users get banned

# List the current ban decisions
docker exec -it <crowdsec-container> cscli decisions list
# Lift the ban on a specific IP
docker exec -it <crowdsec-container> cscli decisions delete --ip <IP>
# Install the whitelist parser
docker exec -it <crowdsec-container> cscli parsers install crowdsecurity/whitelists

4.2 Traefik Connectivity Issues#

Issue: 503 Service Unavailable

# Inspect the container network
docker network inspect dokploy-network
# Verify the service is attached to the right network
docker inspect <container> | grep -A 10 Networks

Issue: the wrong real client IP is reported

Check that forwardedHeaders.trustedIPs in traefik.yml includes all Cloudflare IP ranges.

4.3 Log Collection Issues#

Issue: no logs in Loki

# Check Alloy's status
docker logs <alloy-container> --tail 100
# Check Loki's readiness (run from a container on the same network; the
# compose file above does not publish Loki's port to the host)
curl -s http://loki:3100/ready

Issue: log labels are missing

Check the stage.labels and stage.json blocks in the Alloy configuration.

4.4 Missing Monitoring Data#

Issue: no data in VictoriaMetrics

# Check the scrape target status
curl -s http://127.0.0.1:20001/targets
# Query a metric manually
curl -s "http://127.0.0.1:20001/api/v1/query?query=up"

Issue: CrowdSec metric pushes fail

Check that the URL in victoriametrics.yaml is correct and that the container name matches the running container.

Summary#

This post walked through building a complete security and monitoring stack on Dokploy:

  1. Multi-layer CrowdSec protection: containerized engine + host Firewall Bouncer + Traefik plugin
  2. Traefik + Cloudflare integration: IP allowlisting + real client IP recovery
  3. Observability platform: Loki logs + VictoriaMetrics metrics + Grafana visualization
  4. Comprehensive log collection: Docker containers, system logs, Traefik access logs, and security logs

Together these provide defense in depth from the network layer to the application layer, plus full observability, and are suitable for production use.


References:

Dokploy Security and Monitoring Stack: A Complete Deployment Guide with CrowdSec + Traefik + Grafana
https://catcat.blog/2025/12/dokploy-security-monitoring-guide/
Author: 猫猫博客
Published: 2025-12-22
License: CC BY-NC-SA 4.0