mirror of
https://github.com/hpd840321/starRiverProperty.git
synced 2026-06-09 08:20:31 +08:00
docs: add ConsulServerList static/IP discovery analysis
Document the investigation into how ConsulServerList was replaced by ConfigurationBasedServerList with hardcoded IPs for Feign client service discovery. Covers V1 vs V2 config gap, three upstream services, and P0-P2 investigation items. Former-commit-id: 15a0d8567de43f8741d50cbcddc2599383942754
@@ -0,0 +1,256 @@
# After Pinning ConsulServerList to a Static List: Analysis of Direct-IP Service Discovery

> **Date**: 2026-05-05
> **Scope**: `maven-cw-elevator-application`, `maven-intelligent-cwoscomponent`
> **Status**: pending follow-up investigation

---

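## 1. Background

With `spring.cloud.consul.discovery.enabled=false`, the Feign clients that the elevator application (cw-elevator-application) uses for its three upstream services cannot discover service instances dynamically through Consul. The current workaround pins Ribbon's `ServerList` to `ConfigurationBasedServerList` and configures a static IP list directly, bypassing Consul.

However, V1 production (星中心) has **none** of these static IP settings and still runs normally. This discrepancy needs a thorough investigation.

---
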
## 2. Current Service Discovery Chain

### 2.1 Current V2 architecture

```
Business code → @FeignClient(name = "ninca-crk-std")
        │
        ▼
Ribbon LoadBalancer
        │
        ▼
ConfigurationBasedServerList   ← forced via application.properties
        │
        ▼
ninca-crk-std.ribbon.listOfServers   ← returns the hardcoded IP:Port directly
        │
        ▼
Target HTTP service
```
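To make the last two hops concrete, the following is a minimal Java sketch of how a comma-separated `listOfServers` value could be turned into host/port pairs. It is not Ribbon's source code: the class `ListOfServersSketch`, its `HostPort` holder, and the port-80 default for entries without a port are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: maps a comma-separated listOfServers value to host/port pairs. */
final class ListOfServersSketch {

    /** Minimal host/port holder (hypothetical; Ribbon has its own Server class). */
    static final class HostPort {
        final String host;
        final int port;

        HostPort(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    /** Splits on commas, trims whitespace, skips blanks; null yields an empty list. */
    static List<HostPort> parse(String listOfServers) {
        List<HostPort> servers = new ArrayList<>();
        if (listOfServers == null) {
            return servers;
        }
        for (String entry : listOfServers.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int colon = trimmed.lastIndexOf(':');
            // Assumption for this sketch: an entry without a port defaults to 80.
            String host = colon < 0 ? trimmed : trimmed.substring(0, colon);
            int port = colon < 0 ? 80 : Integer.parseInt(trimmed.substring(colon + 1));
            servers.add(new HostPort(host, port));
        }
        return servers;
    }

    public static void main(String[] args) {
        System.out.println(parse("10.128.161.95:16106")); // value from the V2 config
        System.out.println(parse(null));                  // property absent
    }
}
```

With the V2 value from this document, `parse` yields a single entry, while a missing property (`null`) yields an empty list, which is the situation a client without `listOfServers` would be in.
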

### 2.2 V1 production (星中心) architecture: an undocumented difference

```
Business code → @FeignClient(name = "ninca-crk-std")
        │
        ▼
Ribbon LoadBalancer
        │
        ▼
??? (no ConfigurationBasedServerList setting)
        │
        ▼
??? (no listOfServers setting)
```

**V1 production has no `ribbon.listOfServers`, yet Feign calls work normally. The reason is unknown.**

---

## 3. Upstream Services and Feign Clients Involved

### 3.1 `ninca-crk-std` (CRK face recognition, GPU)

| Feign client | Module | @FeignClient name | Current resolution |
|--------------|--------|-------------------|--------------------|
| `VisitorFeignClient.java` | elevator-service | `${feign.ninca-crk-std.name:ninca-crk-std}` | Ribbon → `listOfServers=10.128.161.95:16106` |
| `AcsRecordThreeSendFeignClient.java` | intelligent-cwoscomponent-rest | `${feign.ninca-crk-std.name:ninca-crk-std}` | Ribbon → same as above |

There is also a non-Feign path: `AcsElevatorRecordServiceImpl` connects to CRK directly through `@Value("${ninca-crk-std.ip}")` + `RestTemplate`, independently of Feign.

### 3.2 `ninca-common-component-organization` (organization component)

| Feign client | @FeignClient name | @FeignClient url | Current resolution |
|--------------|-------------------|------------------|--------------------|
| `PersonFeignClient.java` | `ninca-common-component-organization` | **`http://127.0.0.1:33011`** (hardcoded) | **bypasses Ribbon, direct URL** |
| `OrganizationFeignClient.java` | `${feign.component-organization.name}` | none | Ribbon → `listOfServers=127.0.0.1:33011` |
| `LabelFeignClient.java` | `${feign.component-organization.name}` | none | Ribbon → same as above |
| `ImageStorePersonFeignClient.java` | `${feign.component-organization.name}` | none | Ribbon → same as above |
| `ImageStoreFeignClient.java` | `${feign.component-organization.name}` | none | Ribbon → same as above |
| `ApplicationImageStoreFeignClient.java` | `${feign.component-organization.name}` | none | Ribbon → same as above |

### 3.3 `ninca-common` (common component)

| Feign client | @FeignClient name | Current resolution |
|--------------|-------------------|--------------------|
| `ZoneFeignClient.java` | `${feign.ninca-common.name:ninca-common}` | Ribbon → **no `listOfServers` configured** |
| `FileFeign.java` | `${feign.ninca-common.name:ninca-common}` | Ribbon → same as above |
| `SysettingAreaFeignClient.java` | `${feign.ninca-common.name:ninca-common}` | Ribbon → same as above |

> **⚠️ `ninca-common` has no `ribbon.listOfServers` configuration at all. Any Feign call that reaches this service will fail because the server list is empty.**

---

## 4. Configuration Difference Matrix

### 4.1 V2 deploy/v2-maven/application.properties (current verification environment)

```properties
# Feign service name mapping
feign.ninca-crk-std.name=ninca-crk-std
feign.component-organization.name=ninca-common-component-organization
feign.ninca-common.name=ninca-common

# ninca-crk-std: force ConfigurationBasedServerList + static IP
ninca-crk-std.ribbon.NIWSServerListClassName=com.netflix.loadbalancer.ConfigurationBasedServerList
ninca-crk-std.ribbon.listOfServers=10.128.161.95:16106

# ninca-common-component-organization: same as above
ninca-common-component-organization.ribbon.NIWSServerListClassName=com.netflix.loadbalancer.ConfigurationBasedServerList
ninca-common-component-organization.ribbon.listOfServers=127.0.0.1:33011

# ninca-common: ❌ no Ribbon configuration at all
```

### 4.2 V2 deploy/v1-legacy/application.properties

```properties
ninca-crk-std.ribbon.NIWSServerListClassName=com.netflix.loadbalancer.ConfigurationBasedServerList
ninca-crk-std.ribbon.listOfServers=10.128.161.95:16106
# ❌ no Ribbon configuration for component-organization or ninca-common
```

### 4.3 V1 production (星中心/cw-elevator-application-V1.0.0.20211103/application.properties)

```properties
feign.ninca-crk-std.name=ninca-crk-std
feign.component-organization.name=ninca-common-component-organization
feign.ninca-common.name=ninca-common
ninca-crk-std.ip=10.0.22.102:16106
# ❌ no ribbon.listOfServers configuration at all
# ❌ no NIWSServerListClassName configuration at all
# ✅ discovery.enabled=false (bootstrap.properties)
```
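The gap across the three listings can be checked mechanically. The sketch below, in plain Java with only `java.util`, reports which Ribbon client names have no `<name>.ribbon.listOfServers` entry. `MissingServerListCheck` and its method are hypothetical helpers, not part of the project, and the property values are copied from the V2 `deploy/v2-maven` listing in 4.1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: finds Ribbon clients that lack a static server list. */
final class MissingServerListCheck {

    /** Returns the client names with no non-blank "<name>.ribbon.listOfServers" entry. */
    static List<String> clientsWithoutServers(Properties props, List<String> clientNames) {
        List<String> missing = new ArrayList<>();
        for (String name : clientNames) {
            String value = props.getProperty(name + ".ribbon.listOfServers");
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Values copied from the V2 deploy/v2-maven listing (section 4.1).
        Properties v2 = new Properties();
        v2.setProperty("ninca-crk-std.ribbon.listOfServers", "10.128.161.95:16106");
        v2.setProperty("ninca-common-component-organization.ribbon.listOfServers",
                "127.0.0.1:33011");

        List<String> clients = Arrays.asList(
                "ninca-crk-std", "ninca-common-component-organization", "ninca-common");

        System.out.println(clientsWithoutServers(v2, clients)); // prints [ninca-common]
    }
}
```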

---

## 5. Deleted Source of NincaCrkStdRibbonConfiguration

| File | Status | Location |
|------|--------|----------|
| `NincaCrkStdRibbonConfiguration.java` | ❌ **source deleted** | the `cw-elevator-application-starter/src/main/java/cn/cloudwalk/ribbon/` directory exists but is empty |
| `NincaCrkStdRibbonConfiguration.class` | ✅ still present in the old fat JAR | `deploy/v2-maven/cw-elevator-application-2.0.0/BOOT-INF/classes/cn/cloudwalk/ribbon/` |
| `ElevatorApplication.java` | ❌ currently has no `@RibbonClient`/`@RibbonClients` | startup class already cleaned up |

**Impact**: new V2 builds no longer contain `NincaCrkStdRibbonConfiguration.class`. Because `application.properties` already sets `NIWSServerListClassName`, functionality is unaffected: the property configuration and the Java configuration serve the same purpose.

**Residual risk**: in the old JAR, `NincaCrkStdRibbonConfiguration` is referenced from `@RibbonClients`. If an `ElevatorApplication.class` carrying `@RibbonClients({@RibbonClient(configuration=NincaCrkStdRibbonConfiguration.class)})`, as the one in the old JAR may, ends up in a build that lacks this class, the result is a `ClassNotFoundException`. The commit in which `ElevatorApplication.java` fully dropped `@RibbonClients` still needs to be confirmed.

---

## 6. Hardcoded PersonFeignClient URL

Commit `ff9a9ed6`: `fix(test): hardcode PersonFeignClient url to local stub for test env`

```java
// Before:
@FeignClient(name = "${feign.component-organization.name:ninca-common-component-organization}",
        path = "/component/person", fallback = PersonFeignClientFallback.class)

// After:
@FeignClient(name = "ninca-common-component-organization",
        url = "http://127.0.0.1:33011",
        path = "/component/person", fallback = PersonFeignClientFallback.class)
```
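The difference between the two `name` forms comes down to placeholder resolution. The following sketch is a deliberately simplified stand-in for `${key:default}` handling; it is not Spring's implementation, and `PlaceholderSketch` and the `org-stub` value are hypothetical. It shows that a placeholder name follows the configured property, whereas a literal name ignores it.

```java
import java.util.Properties;

/** Simplified stand-in for "${key:default}" resolution (not Spring's implementation). */
final class PlaceholderSketch {

    /** Resolves a single "${key:default}" expression; other strings are returned unchanged. */
    static String resolve(String expr, Properties props) {
        if (!expr.startsWith("${") || !expr.endsWith("}")) {
            return expr; // literal name: configuration cannot change it
        }
        String body = expr.substring(2, expr.length() - 1);
        int colon = body.indexOf(':');
        String key = colon < 0 ? body : body.substring(0, colon);
        String fallback = colon < 0 ? null : body.substring(colon + 1);

        String value = props.getProperty(key);
        if (value != null) {
            return value;
        }
        if (fallback != null) {
            return fallback;
        }
        throw new IllegalArgumentException("Could not resolve placeholder: " + key);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical override, e.g. pointing the client at a differently named service.
        props.setProperty("feign.component-organization.name", "org-stub");

        String before = "${feign.component-organization.name:ninca-common-component-organization}";
        String after = "ninca-common-component-organization";

        System.out.println(resolve(before, props)); // follows the property: org-stub
        System.out.println(resolve(after, props));  // stays ninca-common-component-organization
    }
}
```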

**Impact of the change**:

- `name` changed from a configurable placeholder to a hardcoded value
- A new `url` forces the client to `127.0.0.1:33011` (local stub)
- With `name` hardcoded, the Ribbon client name no longer follows the `feign.component-organization.name` setting, even if `url` is removed

**Commit note**: `NOTE: PersonFeignClient url change is test-only, revert before production`

---

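## 7. Open Investigation Items

### P0: V1 production service discovery mechanism

In V1 production (星中心 `bootstrap.properties`), `spring.cloud.consul.discovery.enabled=false` and there is no `ribbon.listOfServers` configuration at all. To be confirmed:

- Are the Feign call paths from V1 production to CRK and the organization component actually executed?
- If they are, can `ConsulServerList` still obtain instances from the Consul HTTP API with `discovery.enabled=false`?
- How exactly does `ConsulServerList` in Spring Cloud Edgware.SR3 behave when `discovery.enabled=false`?

**Verification**:

```bash
# On the V1 production process, find the ServerList implementation Ribbon actually uses:
# via the RibbonLoadBalancerProbeRunner log (if present),
# or by analyzing the Feign call chain with jstack.

# Check the Spring Cloud Consul version (it determines ConsulServerList behavior).
# The version is part of the nested JAR's file name, so listing the entry is enough:
unzip -l V1.jar '*spring-cloud-consul-discovery-*.jar'
# A single `unzip -p` cannot read inside a nested JAR, so extract it first
# (assumes exactly one matching entry), then print its manifest:
unzip -p V1.jar '*spring-cloud-consul-discovery-*.jar' > /tmp/consul-discovery.jar
unzip -p /tmp/consul-discovery.jar META-INF/MANIFEST.MF
```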

### P1: ninca-common has no static IP configuration

`ZoneFeignClient` (zone service), `FileFeign` (file service), and `SysettingAreaFeignClient` call `ninca-common`, but no `listOfServers` is configured for it. Required:

- Confirm whether these Feign clients are actually invoked in V1/V2
- If they are, add a `ninca-common.ribbon.listOfServers` setting

### P1: PersonFeignClient production-readiness check

- Confirm that `url = "http://127.0.0.1:33011"` is reverted before the production JAR is built
- Restore the placeholder form `name = "${feign.component-organization.name:ninca-common-component-organization}"`

### P2: Clean up NincaCrkStdRibbonConfiguration leftovers

- Decide whether to restore the source (keeping the Java config approach) or rely entirely on properties
- If relying on properties, delete the empty `cn/cloudwalk/ribbon/` directory to clear the leftover

### P2: Document the V1 vs V2 Ribbon configuration alignment

- Record V1 production's real service discovery mechanism under `docs/architecture/`
- State the rationale and scope for adding `ribbon.listOfServers` in V2

---

## 8. Key File Locations

| File | Path |
|------|------|
| ElevatorApplication.java | `maven-cw-elevator-application/cw-elevator-application-starter/src/main/java/cn/cloudwalk/elevator/ElevatorApplication.java` |
| NincaCrkStdRibbonConfiguration.java | **deleted** (old path: `.../ribbon/NincaCrkStdRibbonConfiguration.java`) |
| RibbonLoadBalancerProbeRunner.java | `.../debug/RibbonLoadBalancerProbeRunner.java` |
| ElevatorUpstreamServiceNames.java | `.../debug/ElevatorUpstreamServiceNames.java` |
| PersonFeignClient.java | `maven-intelligent-cwoscomponent/intelligent-cwoscomponent-rest/.../person/feign/PersonFeignClient.java` |
| V2 deploy application.properties | `maven-cw-elevator-application/deploy/v2-maven/application.properties` |
| V2 deploy test properties | `maven-cw-elevator-application/deploy/v2-maven/application-test.properties` |
| V1 production config | `星中心/cw-elevator-application-V1.0.0.20211103/application.properties` |
| V1 production bootstrap | `星中心/cw-elevator-application-V1.0.0.20211103/bootstrap.properties` |
| V1 fat JAR internal props | `星中心/.../cw-elevator-application-V1.0.0.20211103.jar!/application.properties` |
| Commit: remove @RibbonClients | `373b5501` |
| Commit: hardcode PersonFeignClient url | `ff9a9ed6` |
| Commit: add ZK discovery dependency | `6b5898d0` |
| Commit: add NincaCrkStdRibbonConfiguration | `0a6ac955` |
| Architecture design document | `docs/superpowers/specs/2026-05-01-service-discovery-architecture-design.md` |

---

## 9. Appendix: Default Behavior of the Ribbon ServerList in Spring Cloud Edgware

In Spring Cloud Edgware.SR3 (the version V1 uses), when `spring.cloud.consul.discovery.enabled=false`, the chain expected from the code walkthrough (not yet confirmed at runtime) is:

```
ConsulDiscoveryClient → @ConditionalOnProperty(enabled=true) → bean NOT created
ConsulServerList → depends on DiscoveryClient → can't be created
Fallback → ConfigurationBasedServerList (Ribbon default)
ConfigurationBasedServerList → reads "{name}.ribbon.listOfServers" from Environment
                               if absent → returns empty list
```
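Read as code, the chain above amounts to the following decision. This is a sketch of the behavior as described in this appendix, not a reproduction of Ribbon or Spring Cloud internals. `ServerListFallbackSketch` and its methods are hypothetical, and the sketch assumes Consul discovery is already out of the picture.

```java
import java.util.Properties;

/** Hypothetical model of the fallback described in this appendix. */
final class ServerListFallbackSketch {

    static final String DEFAULT_IMPL =
            "com.netflix.loadbalancer.ConfigurationBasedServerList";

    /** Explicit "<client>.ribbon.NIWSServerListClassName" wins; otherwise the default applies. */
    static String serverListClassFor(String client, Properties props) {
        return props.getProperty(client + ".ribbon.NIWSServerListClassName", DEFAULT_IMPL);
    }

    /** Returns the trimmed static list, or an empty string when the key is absent. */
    static String staticServersFor(String client, Properties props) {
        String value = props.getProperty(client + ".ribbon.listOfServers");
        return value == null ? "" : value.trim();
    }

    public static void main(String[] args) {
        Properties v1 = new Properties(); // V1 production: neither key is set

        Properties v2 = new Properties(); // V2: both keys set explicitly
        v2.setProperty("ninca-crk-std.ribbon.NIWSServerListClassName", DEFAULT_IMPL);
        v2.setProperty("ninca-crk-std.ribbon.listOfServers", "10.128.161.95:16106");

        for (Properties p : new Properties[] {v1, v2}) {
            // Same implementation class in both cases; only the server list differs.
            System.out.println(serverListClassFor("ninca-crk-std", p)
                    + " -> [" + staticServersFor("ninca-crk-std", p) + "]");
        }
    }
}
```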

V2 (Spring Cloud Greenwich + Boot 2.1.x) behaves the same way.

If V1 production really has no `listOfServers` and Feign calls still succeed, then one of the following must hold:

1. The Feign call paths to CRK and the organization component are never actually exercised, or
2. `ConsulServerList` in Spring Cloud Edgware has special handling when `discovery.enabled=false` (decompile `ConsulServerList.class` to confirm), or
3. The V1 production runtime injects `listOfServers` by other means (for example Consul KV, `SPRING_APPLICATION_JSON`, or JVM arguments)

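
Hypothesis 3 can be partly tested from inside a JVM. The sketch below lists system properties ending in `.ribbon.listOfServers` and checks whether `SPRING_APPLICATION_JSON` mentions `listOfServers`. It only sees the JVM it runs in, so it would have to be launched with the same arguments and environment as the V1 process. `RuntimeOverrideScan` is a hypothetical helper, the JSON check is a plain substring test rather than real parsing, and Consul KV is not covered.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: looks for listOfServers injected outside application.properties. */
final class RuntimeOverrideScan {

    /** Collects JVM system properties whose key ends with ".ribbon.listOfServers". */
    static Map<String, String> systemPropertyOverrides() {
        Map<String, String> found = new TreeMap<>();
        for (String key : System.getProperties().stringPropertyNames()) {
            if (key.endsWith(".ribbon.listOfServers")) {
                found.put(key, System.getProperty(key));
            }
        }
        return found;
    }

    /** Substring check only: true when the given JSON text mentions listOfServers. */
    static boolean jsonMentionsServerList(String json) {
        return json != null && json.contains("listOfServers");
    }

    public static void main(String[] args) {
        System.out.println("system properties: " + systemPropertyOverrides());
        System.out.println("SPRING_APPLICATION_JSON mentions listOfServers: "
                + jsonMentionsServerList(System.getenv("SPRING_APPLICATION_JSON")));
    }
}
```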

---

*This document was produced from a code walkthrough and configuration analysis; it must be followed up with runtime evidence from the production environment.*