# Consul reachable and the service "listed", yet no IP/port resolved: walkthrough results (evidence run)

**Evidence root**: [`maven-cw-elevator-application/logs/evidence/`](../../maven-cw-elevator-application/logs/evidence/)
**Main log**: [`elevator-app.log`](../../maven-cw-elevator-application/logs/evidence/elevator-app.log)
**Config probe**: [`elevator-app-probe.log`](../../maven-cw-elevator-application/logs/evidence/elevator-app-probe.log)
**Consul snapshot (example)**: [`elevator-evidence-20260430-113912/consul-health-ninca-common-component-organization.json`](../../maven-cw-elevator-application/logs/evidence/elevator-evidence-20260430-113912/consul-health-ninca-common-component-organization.json)

This document covers the four checks in the walkthrough plan: **client name**, **Ribbon initialization timing**, **ConsulServerList vs ConfigurationBased**, and **bootstrap multi-source merging**. The conclusions agree with [`elevator-evidence-v1-v2-diff-20260430.md`](elevator-evidence-v1-v2-diff-20260430.md) §6–§10 and [`elevator-service-instance-missing-investigation.md`](elevator-service-instance-missing-investigation.md), and are refined here to line-level evidence that can be grepped.

### Whether this service is registered in Consul, and the probe logs

- **Failing to obtain an upstream IP** is usually caused by the **Ribbon client name / listOfServers / timing** (see below); it is **not equivalent to** this host being unregistered. Still, verify separately that the name **this process** registers in Consul matches `spring.application.name` and that its health status is **passing**.
- The diagnostic probes **always run by default** (there is no `elevator.*.probe` switch); for the initial delay before the first Consul HTTP call, see the source [`ElevatorProbeConstants`](../../maven-cw-elevator-application/cw-elevator-application-starter/src/main/java/cn/cloudwalk/elevator/debug/ElevatorProbeConstants.java). [`ConsulUpstreamHealthProbeRunner`](../../maven-cw-elevator-application/cw-elevator-application-starter/src/main/java/cn/cloudwalk/elevator/debug/ConsulUpstreamHealthProbeRunner.java) requests **`/v1/health/service/<spring.application.name>`** for this service (once with `passing=true` and once unfiltered, logging **each instance** at DEBUG), the same listing for each upstream service name, and **`/v1/agent/self`**.
- [`RibbonLoadBalancerProbeRunner`](../../maven-cw-elevator-application/cw-elevator-application-starter/src/main/java/cn/cloudwalk/elevator/debug/RibbonLoadBalancerProbeRunner.java) logs, within the same delay window, each client's **`ServerList` implementation class**, its `ILoadBalancer`, and the details of **`DiscoveryClient.getInstances`** (this service plus `ElevatorUpstreamServiceNames`).
- [`logback.xml`](../../maven-cw-elevator-application/cw-elevator-application-starter/src/main/resources/logs/logback.xml) **additionally writes** **`org.springframework.cloud.consul`**, **`com.netflix.loadbalancer`**, **`org.springframework.cloud.client.discovery`**, and **`cn.cloudwalk.elevator.debug`**, all at **DEBUG**, to **`${logging.file}-probe.log`** (so the project's own `consulProbe` / `ribbonProbe` / `discoveryProbe` lines can be compared with the framework's lines).

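
The same Consul endpoints can be queried by hand to cross-check what the probe logs. The following is a minimal sketch, not the probe's implementation: the `CONSUL_ADDR` variable, its default of `http://127.0.0.1:8500` (Consul's default HTTP port), and the function names are assumptions made for illustration.

```bash
#!/usr/bin/env bash
# Sketch: build the Consul HTTP API URLs that the probe is described as calling.
# CONSUL_ADDR and all function names are hypothetical, not taken from the project.
CONSUL_ADDR="${CONSUL_ADDR:-http://127.0.0.1:8500}"

# URL of the health listing for one service.
# Pass "passing" as the second argument to keep only passing instances.
consul_health_url() {
  local service="$1" filter="${2:-}"
  local url="${CONSUL_ADDR}/v1/health/service/${service}"
  if [ "$filter" = "passing" ]; then
    url="${url}?passing=true"
  fi
  printf '%s\n' "$url"
}

# URL of the agent's self-description.
consul_agent_self_url() {
  printf '%s\n' "${CONSUL_ADDR}/v1/agent/self"
}

# Fetch the passing-only and the unfiltered view of one service.
# Requires curl and a reachable Consul agent.
consul_health_both() {
  local service="$1"
  curl -sS "$(consul_health_url "$service" passing)"
  curl -sS "$(consul_health_url "$service")"
}
```

For example, with `CONSUL_ADDR` pointed at the agent the process actually uses, `consul_health_both ninca-common-component-organization` returns the two JSON listings that can be compared with the snapshot in the evidence directory.
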

---

## 1. Ribbon client name in the error stack vs. the Consul registered name

**How to check**: In the failing JVM's log, search for **`Load balancer does not have available server for client:`** or **`for client:`**, and compare the string that follows, **character by character**, with the **Service** name shown in the Consul UI or returned by `/v1/health/service/<name>` (including the `ninca-common-` prefix).

### Conclusions from this repository's evidence bundle

| Item | Finding |
|------|---------|
| `for client:` in this copy of `elevator-app.log` | **Only** `ninca-crk-std` appears (repeated many times); `component-organization` **does not appear** |
| Feign logical name `component-organization` | `feign.component-organization.name=ninca-common-component-organization` (probe: both `file:` and `classpath:` give this value) |
| Actual Ribbon client name | `Client: ninca-common-component-organization`, `ConsulServerList{serviceId='ninca-common-component-organization'}` |
| Consul | The health snapshot for `ninca-common-component-organization` shows **3** passing instances, matching the three `:17016` servers in the log |

**Wording to use when aligning with others**: If a WeChat group screenshot shows **`for client: component-organization`**, while this elevator process is configured so that Feign **resolves the service name** to **`ninca-common-component-organization`**, the two are **not** the same Ribbon client. An environment that looks up instances under **`component-organization`** gets **0 instances**, even though Consul has **`ninca-common-component-organization`** registered. Conclusion: **read the exact client string in the error first**, then compare it with the Consul registered name.

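
The character-by-character comparison can be scripted. This is a sketch under assumptions: the function names are hypothetical, and the list of registered names is expected as a plain text file with one Consul service name per line.

```bash
#!/usr/bin/env bash
# Sketch: compare Ribbon client names found in errors with Consul service names.
# Function names and the one-name-per-line input file are assumptions.

# Print every client name that follows "for client: " in a log file, deduplicated.
clients_in_errors() {
  sed -n 's/.*for client: \([^ ]*\).*/\1/p' "$1" | sort -u
}

# For each client name in the log ($1), report whether it matches a whole line
# of the registered-names file ($2) exactly.
check_clients_registered() {
  local log="$1" registered="$2" name
  clients_in_errors "$log" | while IFS= read -r name; do
    if grep -Fxq -- "$name" "$registered"; then
      printf '%s registered\n' "$name"
    else
      printf '%s NOT-registered\n' "$name"
    fi
  done
}
```

One way to produce the registered-names file, assuming `jq` is installed and the agent address is filled in, is `curl -sS http://<consul-host>:8500/v1/catalog/services | jq -r 'keys[]'`. Because `grep -Fx` matches whole lines literally, `component-organization` is reported as not registered when only `ninca-common-component-organization` exists.
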

---

## 2. DynamicServerListLoadBalancer initialization and the `Servers=[]` timeline

**How to check**: For the client of interest, search for **`DynamicServerListLoadBalancer for client <name> initialized`**, then look upward for the same client's **`current list of Servers=[...]`**. If the first Feign call happens while the list is still empty, **no available server** is thrown.

### `ninca-common-component-organization` (Consul path)

| Time (log) | Event |
|------------|-------|
| `11:16:33.087` | `Client: ninca-common-component-organization`, first `current list of Servers=[]`, `ServerList:null` |
| `11:16:33.097` | `DynamicServerListLoadBalancer for client ninca-common-component-organization initialized`, now with **3 servers** on `:17016`, `ConsulServerList{serviceId='ninca-common-component-organization'}` |

The list goes from empty to three servers within about **10 ms**; a first RPC that lands in this window can still fail (the "first request arrives before the list is populated" risk named in the plan).

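
The width of such a window can be computed directly from two log timestamps. A small sketch follows; the function name is hypothetical and it assumes both timestamps use the `HH:MM:SS.mmm` form seen in the table and fall on the same day.

```bash
#!/usr/bin/env bash
# Sketch: milliseconds elapsed from timestamp $1 to timestamp $2.
# Assumes HH:MM:SS.mmm on the same day; ms_between is a hypothetical helper name.
ms_between() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, /[:.]/)   # x[1]=hours, x[2]=minutes, x[3]=seconds, x[4]=millis
    split(b, y, /[:.]/)
    ta = ((x[1] * 60 + x[2]) * 60 + x[3]) * 1000 + x[4]
    tb = ((y[1] * 60 + y[2]) * 60 + y[3]) * 1000 + y[4]
    print tb - ta
  }'
}
```

`ms_between 11:16:33.087 11:16:33.097` prints `10`, the window quoted above.
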
### `ninca-crk-std` (ConfigurationBased path)

| Time (log) | Event |
|------------|-------|
| `11:16:34.109` | `Client: ninca-crk-std`, `Servers=[]`, `ServerList:null` |
| `11:16:34.111` | `DynamicServerListLoadBalancer for client ninca-crk-std initialized`, **still** `Servers=[]`, `ServerList:ConfigurationBasedServerList` |
| From `11:16:34.346` | Repeated **`no available server for client: ninca-crk-std`** |

In this evidence bundle, **every no-server error is attributable to `ninca-crk-std`**, and they all occur **after `ninca-common-component-organization` was populated with three servers**. The main cause is therefore not "Consul was queried with the wrong organization name" but **the static Ribbon list for `ninca-crk-std` not being configured** (see the next section).

After the **second startup** in the same full log (around `11:43`), `ninca-crk-std` can show a non-empty list; see [`elevator-evidence-v1-v2-diff-20260430.md`](elevator-evidence-v1-v2-diff-20260430.md) §6.2.

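
Clients that finish initialization with an empty list, as `ninca-crk-std` does here, can be singled out with one filter. This is a sketch that assumes each `initialized` line carries the client name and the `current list of Servers=[...]` text on the same line; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: list clients whose "initialized" log line still shows Servers=[].
# Assumes client name and server list appear on one line; helper name is hypothetical.
clients_initialized_empty() {
  grep 'DynamicServerListLoadBalancer for client' "$1" \
    | grep -F 'current list of Servers=[]' \
    | sed -n 's/.*for client \([^ ]*\) initialized.*/\1/p' \
    | sort -u
}
```

Run against `elevator-app.log`, any name it prints is a client whose first requests will fail until its list is filled, whether by a later refresh or by configuration.
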

---

## 3. ConsulServerList vs. ConfigurationBasedServerList; `ninca-crk-std` and `listOfServers` / `ip`

| Client | ServerList implementation | List source |
|--------|---------------------------|-------------|
| `ninca-common-component-organization` | `ConsulServerList` | Consul `serviceId=ninca-common-component-organization` |
| `ninca-crk-std` | `ConfigurationBasedServerList` | Ribbon properties (such as `ninca-crk-std.ribbon.listOfServers`); `ninca-crk-std.ip` is **not** used automatically |

Code side: [`NincaCrkStdRibbonConfiguration`](../../maven-cw-elevator-application/cw-elevator-application-starter/src/main/java/cn/cloudwalk/ribbon/NincaCrkStdRibbonConfiguration.java) registers a fixed `ConfigurationBasedServerList` for `ninca-crk-std`.

**Probe (`elevator-app-probe.log`):**

- `ninca-crk-std.ribbon.listOfServers value=null`
- `ninca-crk-std.ip=10.0.22.102:16106` (`file:` and `classpath:` agree)

So even if `ninca-crk-std` shows **passing=3** in Consul, Ribbon can still report **`Servers=[]`**. This is independent of whether Consul is healthy, unless **`ribbon.listOfServers`** is configured separately or the ServerList strategy is changed. Deployment template comparison: the V1 legacy template contains an explicit `ninca-crk-std.ribbon.listOfServers` section, while the V2 template relies mainly on `ninca-crk-std.ip`; see [`deploy/v1-legacy/application.properties`](../../maven-cw-elevator-application/deploy/v1-legacy/application.properties) and [`deploy/v2-maven/application.properties`](../../maven-cw-elevator-application/deploy/v2-maven/application.properties).

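
Whether a given properties file supplies the static list can be checked before deployment. The sketch below is a simplification rather than a full `.properties` parser: it only handles `key=value` lines, and both function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: check whether a properties file sets <client>.ribbon.listOfServers.
# Simplified parsing (key=value lines only); function names are hypothetical.

# Print the value of one key (last assignment wins; prints nothing if absent).
prop_value() {
  local file="$1" key="$2"
  awk -v k="$key" '
    /^[[:space:]]*[#!]/ { next }                      # skip comment lines
    index($0, "=") > 0 {
      name = substr($0, 1, index($0, "=") - 1)
      gsub(/^[[:space:]]+|[[:space:]]+$/, "", name)
      if (name == k) {
        found = substr($0, index($0, "=") + 1)
        gsub(/^[[:space:]]+|[[:space:]]+$/, "", found)
      }
    }
    END { if (found != "") print found }
  ' "$file"
}

# Report what a ConfigurationBasedServerList client would be given by this file.
ribbon_static_list_status() {
  local file="$1" client="$2" list
  list="$(prop_value "$file" "${client}.ribbon.listOfServers")"
  if [ -n "$list" ]; then
    printf '%s: listOfServers=%s\n' "$client" "$list"
  else
    printf '%s: listOfServers is unset, static list stays empty\n' "$client"
  fi
}
```

For instance, `ribbon_static_list_status deploy/v2-maven/application.properties ninca-crk-std` would report the list as unset if that template defines only `ninca-crk-std.ip`, which is the situation the probe output describes.
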

---

## 4. How `file:./bootstrap` and `classpath:/bootstrap` merge for Consul

**Source**: the `ConfigSourceProbeRunner` lines in [`elevator-app-probe.log`](../../maven-cw-elevator-application/logs/evidence/elevator-app-probe.log).

| Key | Merged Environment value | `file:./bootstrap.properties` | `classpath:/bootstrap.properties` |
|-----|--------------------------|-------------------------------|-----------------------------------|
| `spring.cloud.consul.host` | `10.0.22.102` | `10.0.22.102` | a hostname (differs from the on-disk file) |
| `spring.cloud.consul.discovery.enabled` | `true` | `true` | `false` |

**Walkthrough key points**: Use the merged `value=` of the **`spring.cloud.consul.*`** keys to check whether the Consul the process connects to is the same address that operations staff open in the browser. When the on-disk bootstrap and the one embedded in the jar **coexist**, the final Environment is authoritative (in this sample the host is **10.0.22.102** and discovery is **enabled**). The evidence directory also contains a [`bootstrap.properties`](../../maven-cw-elevator-application/logs/evidence/elevator-evidence-20260430-113912/bootstrap.properties) snapshot that can be compared with the site.

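
The two sources can also be compared offline, key by key. This sketch only reports whether the on-disk and embedded files disagree; it does not reproduce Spring's merge, and the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: show whether two bootstrap.properties files disagree on a key.
# Does not emulate Spring's property merging; function names are hypothetical.

# Print the value of a key from a file, matching "key=" literally at line start.
bootstrap_value() {
  awk -v k="$2" 'index($0, k "=") == 1 { v = substr($0, length(k) + 2) } END { print v }' "$1"
}

# Compare one key between the on-disk file ($2) and the embedded copy ($3).
compare_bootstrap_key() {
  local key="$1" disk="$2" embedded="$3" a b
  a="$(bootstrap_value "$disk" "$key")"
  b="$(bootstrap_value "$embedded" "$key")"
  if [ "$a" = "$b" ]; then
    printf '%s same file=%s classpath=%s\n' "$key" "$a" "$b"
  else
    printf '%s DIFFERS file=%s classpath=%s\n' "$key" "$a" "$b"
  fi
}
```

If the application is packaged as a Spring Boot fat jar, the embedded copy can usually be extracted first with `unzip -p <app>.jar BOOT-INF/classes/bootstrap.properties > embedded.properties`; the jar name and that internal path are assumptions about the packaging. Any key reported as `DIFFERS` is one where the probe's merged `value=` should be consulted.
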

---

## 5. Suggested reproducible greps (local)

```bash
# List every client name that follows "for client:" (deduplicated)
grep -E 'for client:' maven-cw-elevator-application/logs/evidence/elevator-app.log | sed -n 's/.*for client: \([^ ]*\).*/\1/p' | sort -u

# Initialization line of each client, showing whether its server list is non-empty
grep 'DynamicServerListLoadBalancer for client' maven-cw-elevator-application/logs/evidence/elevator-app.log

# Tell Consul-backed lists apart from configuration-based lists
grep -E 'ConsulServerList|ConfigurationBasedServerList' maven-cw-elevator-application/logs/evidence/elevator-app.log
```


---

**Further reading**: [`elevator-v1-v2-init-timing-config-audit.md`](elevator-v1-v2-init-timing-config-audit.md) (initialization order and probes), [`elevator-evidence-v1-v2-diff-20260430.md`](elevator-evidence-v1-v2-diff-20260430.md) §9 (`listOfServers` vs `ninca-crk-std.ip`).