3x-ui/internal/web/service/inbound_node_ips.go
Sanaei adc64bb804 fix(nodes): cloned-node attribution, node-hosted client display (online/speed/counts), and sync robustness (#5488)
* fix(nodes): keep cloned nodes (shared panelGuid) in separate attribution buckets

#4983 keys online/inbound attribution by panelGuid, assuming it is globally unique. Cloned node servers ship an identical panelGuid in their copied settings, so the master collapsed several physical nodes into one bucket: GetMergedNodeTrees merged their online sets under one key and every inbound on those nodes (same origin_node_guid) read that merged set, so the inbound page showed online cross-attributed and counts inflated.

Fall back to the node-unique synthNodeGuid(node.Id) whenever a node's panelGuid is shared by another of the master's direct nodes. Applied consistently at originGuidFor (origin_node_guid write), the online-tree key plus a self-key remap for nodes that report a GUID-keyed tree, effectiveNodeGuid, and recountByGuid's inbound bucketing. sharedNodeGuids computes the collision set. Online now works without node changes; making panelGuids unique restores real-GUID identity and also fixes GUID-keyed IP attribution.

* fix(nodes): extend duplicate-GUID hardening to master collisions, IP attribution, and a heartbeat warning

Builds on the node-vs-node fix: a node's GUID is now also treated as ambiguous when it equals the master's own panelGuid (a node cloned from the master), so the master's local clients and that node can't merge. Centralized as ambiguousNodeGuids(nodes, selfGuid) + effectiveNodeKey(node).

Applied the same node-unique fallback to the GUID-keyed IP attribution that #4983 added but the prior commit left collapsing: MergeClientIpsByGuid remaps a cloned node's own subtree to its node-unique key, nodeGuidNameMap resolves names by that key, and node deletion purges both keys. Added a throttled heartbeat warning so the operator is told to regenerate a duplicate panelGuid. Tests cover master-collision, effectiveNodeKey, and the IP remap.
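
The fallback described in the two entries above can be sketched as follows. This is a minimal illustration under stated assumptions, not the panel's actual code: the `Node` struct, the `synth-node-<id>` key format, and the bodies of `ambiguousGuids` and `effectiveKey` are hypothetical. Only the rule itself (a GUID shared by two direct nodes, or equal to the master's own, falls back to a node-unique key) comes from the commit message.

```go
package main

import "fmt"

// Node is a hypothetical stand-in for the panel's node model.
type Node struct {
	Id   int
	Guid string
}

// synthKey builds a node-unique key from the node's database id.
// The "synth-node-" prefix is an assumption made for this sketch.
func synthKey(id int) string {
	return fmt.Sprintf("synth-node-%d", id)
}

// ambiguousGuids returns the GUIDs that cannot identify a single node:
// those reported by more than one node, and the master's own GUID.
func ambiguousGuids(nodes []*Node, selfGuid string) map[string]bool {
	seen := make(map[string]int, len(nodes))
	for _, n := range nodes {
		if n != nil && n.Guid != "" {
			seen[n.Guid]++
		}
	}
	out := make(map[string]bool)
	for guid, count := range seen {
		if count > 1 || guid == selfGuid {
			out[guid] = true
		}
	}
	return out
}

// effectiveKey returns the real GUID when it is unique, and the
// node-unique fallback when it is empty or ambiguous.
func effectiveKey(n *Node, ambiguous map[string]bool) string {
	if n.Guid == "" || ambiguous[n.Guid] {
		return synthKey(n.Id)
	}
	return n.Guid
}

func main() {
	nodes := []*Node{
		{Id: 1, Guid: "g-clone"},
		{Id: 2, Guid: "g-clone"},
		{Id: 3, Guid: "g-unique"},
		{Id: 4, Guid: "g-master"},
	}
	amb := ambiguousGuids(nodes, "g-master")
	for _, n := range nodes {
		fmt.Println(n.Id, effectiveKey(n, amb))
	}
}
```

In this sketch, nodes 1, 2 and 4 resolve to their node-unique keys, while node 3 keeps its real GUID.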

* fix(node-sync): log the client-IP-attribution 404 once per node, not every cycle

Old-build nodes lack panel/api/clients/clientIpsByGuid and answer 404 on every IP-sync cycle (~10s), which floods the debug log now that the IP phase actually runs. Note the missing endpoint once per node (re-armed if the node later recovers or is upgraded) and keep logging genuine fetch errors.
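
A once-per-node notice of this kind could look like the sketch below. The type and method names (`notFoundNoter`, `shouldLog`) are hypothetical and the real node-sync code may track this state differently; the sketch only shows the "note once, re-arm on recovery" behaviour and is not safe for concurrent use.

```go
package main

import (
	"fmt"
	"net/http"
)

// notFoundNoter tracks, per node id, whether the missing-endpoint 404 has
// already been logged. Hypothetical helper; not safe for concurrent use.
type notFoundNoter struct {
	noted map[int]bool
}

func newNotFoundNoter() *notFoundNoter {
	return &notFoundNoter{noted: make(map[int]bool)}
}

// shouldLog reports whether a 404 for nodeID deserves a log line this cycle.
// Any non-404 status re-arms the notice for that node.
func (n *notFoundNoter) shouldLog(nodeID, status int) bool {
	if status != http.StatusNotFound {
		delete(n.noted, nodeID)
		return false
	}
	if n.noted[nodeID] {
		return false
	}
	n.noted[nodeID] = true
	return true
}

func main() {
	noter := newNotFoundNoter()
	for _, status := range []int{404, 404, 200, 404} {
		fmt.Println(status, noter.shouldLog(7, status))
	}
}
```

Genuine fetch errors (anything other than the 404) would be logged by the caller on every cycle and are outside this helper.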

* fix(nodes): remap a cloned node's own-panelGuid origin so the inbound page shows online

These nodes report their OWN inbounds with their own panelGuid as OriginNodeGuid, so originGuidFor returned the shared GUID verbatim and never remapped it. origin_node_guid stayed the shared GUID while online was keyed under the node-unique key, so the inbound page (which reads the stored origin_node_guid) looked up an empty bucket and showed everyone offline — even though the Nodes page (which derives the key live) was correct. Treat an origin equal to the node's own panelGuid as the node's own inbound and resolve it through selfKey; keep only a genuinely different (descendant) origin across hops.

* fix(node-sync): don't delete a node's central inbounds when its snapshot is empty

The central-inbound sweep deletes any central inbound whose tag is absent from the node's snapshot, with no guard for an empty snapshot. A node mid-restart or with a transient DB error (e.g. Postgres 57P01) can return an empty inbound list with success=true, which wiped all of that node's central inbounds and their clients (and reset traffic history on re-create) — observed on the Germany node: 0 clients but still 44 online (online survives because it comes from the snapshot's online tree, not the central inbound). Skip the sweep entirely when the snapshot reports zero inbounds; a real per-inbound deletion still sweeps via a non-empty snapshot that omits one tag.
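
The guard can be illustrated with a small, self-contained sketch. `staleTags` and its signature are hypothetical; the actual sweep works on database rows rather than string slices. The point shown is only that an empty snapshot yields nothing to delete, while a non-empty snapshot that omits a tag still sweeps that tag.

```go
package main

import (
	"fmt"
	"sort"
)

// staleTags returns the central inbound tags that are absent from the node's
// snapshot and may therefore be deleted. An empty snapshot is treated as
// "no information" and never deletes anything.
func staleTags(centralTags, snapshotTags []string) []string {
	if len(snapshotTags) == 0 {
		return nil // node mid-restart or transient DB error: skip the sweep
	}
	present := make(map[string]struct{}, len(snapshotTags))
	for _, tag := range snapshotTags {
		present[tag] = struct{}{}
	}
	var stale []string
	for _, tag := range centralTags {
		if _, ok := present[tag]; !ok {
			stale = append(stale, tag)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	central := []string{"in-a", "in-b", "in-c"}
	fmt.Println(staleTags(central, nil))                      // empty snapshot
	fmt.Println(staleTags(central, []string{"in-a", "in-c"})) // one tag omitted
}
```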

* fix(email): stay silent when SMTP notifications are disabled

The event subscriber is registered unconditionally and only checked the per-event list (smtpEnabledEvents, default login.attempt,cpu.high) — not the smtpEnable master toggle. Login events are always published, so a panel with smtpEnable=false still attempted a send on every login and logged 'email subscriber: send failed: smtp host not configured'. Gate HandleEvent on GetSmtpEnable() so a disabled-SMTP panel does nothing, matching the comment where the subscriber is registered.

* fix(nodes): count only expired/exhausted as 'ended', not disabled clients

The per-node depleted (ended) count folded disabled clients in with expired/exhausted (expired || exhausted || !Enable), so the Nodes page 'ended' chip was inflated and inconsistent with the inbound page, where disabled and depleted are separate buckets. Count only expired/exhausted in both GetAll and recountByGuid so 'ended' means the same thing on both pages.

* feat(nodes): show live speed for node-hosted inbounds

Inbound speed is computed on the dashboard from a 'traffics' delta feed, which only the local Xray poll produced, so node-hosted inbounds showed no speed. The node sync now diffs successive per-inbound cumulative totals (it polls every 5s, same as the local poll) and broadcasts the byte deltas as a separate 'nodeTraffics' field, keyed by the central tag the dashboard already matches. The frontend applies 'traffics' to local inbounds and 'nodeTraffics' to node inbounds within their own scope, so the two 5s polls don't clobber each other and idle inbounds still clear. Deltas clamp to 0 on a reset; a node that fails to sync keeps a stale total so its delta is 0 (no phantom speed).

* fix(nodes): normalize node-inbound speed by elapsed time to avoid recovery spikes

Adversarial review found that a node's cumulative inbound counter keeps climbing while the master can't reach it, so the first delta after a gap (node outage, skipped poll, slow node) spans more than one 5s window but was still divided by the dashboard's fixed 5s — rendering an impossible one-tick speed spike on recovery (and a 2x over-report after a skipped poll). Now each delta is normalized to the fixed window using the real elapsed time since the inbound's counter last changed, so a backlog shows the true average rate over the gap. The change timestamp advances only on actual movement, so idle stretches average correctly when traffic resumes; resets rebaseline. Also moves the maybePushGlobals doc comment back onto its function.

* fix(inbounds): keep last speed across page navigation instead of blanking

Speed is delta-derived, so it can't be recomputed until the first poll after mount. The websocket subscription and speed state are page-scoped (useWebSocket lives in InboundsPage), so leaving to another page and returning blanked the Speed column for up to one 5s poll. Cache the last speed map across mounts (module scope, 15s recency guard) and seed the state from it, so returning shows the last throughput immediately and the next poll refreshes it. Applies to both local and node-hosted inbound speed.

* fix(inbounds): rebalance table column widths so it fills width without gaps

Inbound list columns had small fixed widths summing far below the table's
full width, so AntD spread the leftover space evenly into wide empty gaps.
Widen the content-heavy columns (protocol, clients, traffic, node) so the
slack lands there, keep the small ones (id, port, enable) tight, and make
scroll.x track the visible columns' total so the table never collapses
below content and adapts when conditional columns are hidden.

* feat(nodes): show active/disabled client counts on the nodes page like inbounds

The nodes page only showed total/online/ended, and (since ended now excludes disabled) disabled clients were invisible there. Compute per-node active and disabled counts — in both GetAll and recountByGuid, with the same depleted-wins-over-disabled precedence the inbound page uses so the buckets stay mutually exclusive — and render total/active/disabled/ended/online chips matching the inbound page (table column + mobile stats modal).
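
The mutually exclusive buckets can be sketched as follows. `clientState` and `bucketOf` are hypothetical names; the real counts come from database queries. The sketch shows only the precedence described above, where an expired or exhausted client counts as ended even if it is also disabled.

```go
package main

import "fmt"

// clientState is a hypothetical, reduced view of one client.
type clientState struct {
	Enable    bool
	Expired   bool
	Exhausted bool
}

// bucketOf places a client in exactly one bucket. Depleted (expired or
// exhausted) wins over disabled, so the buckets never overlap.
func bucketOf(c clientState) string {
	switch {
	case c.Expired || c.Exhausted:
		return "ended"
	case !c.Enable:
		return "disabled"
	default:
		return "active"
	}
}

func main() {
	clients := []clientState{
		{Enable: true},
		{Enable: false},
		{Enable: false, Expired: true},
		{Enable: true, Exhausted: true},
	}
	counts := map[string]int{}
	for _, c := range clients {
		counts[bucketOf(c)]++
	}
	fmt.Println("total:", len(clients), "active:", counts["active"],
		"disabled:", counts["disabled"], "ended:", counts["ended"])
}
```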

* fix(nodes): count active/disabled/ended by client email, not stale inbound_id

The per-node client breakdown filtered client_traffics by inbound_id, but that column goes stale after an inbound is deleted and recreated (e.g. the Germany node), so almost every traffic row pointed at a dead inbound id and the counts collapsed: active showed ~5 instead of ~1100. Classify each node client via client_inbounds -> clients joined to client_traffics by EMAIL (the reliable key), deduped per node/guid, in both GetAll and recountByGuid. Now active/disabled/ended on the nodes page match the inbound page. Added a regression test that proves matching works with a deliberately stale inbound_id.

* style(nodes): widen Clients column so the count chips fit one tidy line

After adding the active/disabled chips, the 5 chips (total/active/disabled/ended/online) no longer fit the 160px Clients column and wrapped to two lines. Widen it to 220 and drop the Space wrap so they render on a single line like the inbound page, and zero the total tag's margin for even spacing. Same principle as 79ff283 (give the content column enough width).

* style(nodes): tighten Clients chip spacing to match the inbound page

AntD's default tag side-padding (~8px) put a wide gap between the count chips. Apply the inbound page's compact padding ('0 2px') + client-count-tag (tabular-nums) to each chip and narrow the column to 180 so the numbers sit close together like the inbound list instead of floating apart.
2026-06-22 20:20:55 +02:00


package service

import (
	"encoding/json"
	"sort"
	"time"

	"github.com/mhsanaei/3x-ui/v3/internal/database"
	"github.com/mhsanaei/3x-ui/v3/internal/database/model"

	"gorm.io/gorm/clause"
)
// inbound_node_ips.go implements per-node client-IP attribution. The flat
// inbound_client_ips table is a cluster-wide union (used for IP-limit counting
// and pushed back to every node), so it cannot tell which node a given IP is
// on. NodeClientIp keeps that attribution: each panel records its own Xray
// observations under its panelGuid, and the master merges every node's
// guid-keyed report — never mixing in IPs a parent pushed down.

// mergeModelClientIpEntries unions old and incoming observations, drops anything
// older than cutoff, keeps the newest timestamp per IP, and sorts newest-first.
// It mirrors mergeClientIpEntries but operates on the exported wire type.
func mergeModelClientIpEntries(old, incoming []model.ClientIpEntry, cutoff int64) []model.ClientIpEntry {
	ipMap := make(map[string]int64, len(old)+len(incoming))
	for _, e := range old {
		if e.IP == "" || e.Timestamp < cutoff {
			continue
		}
		ipMap[e.IP] = e.Timestamp
	}
	for _, e := range incoming {
		if e.IP == "" || e.Timestamp < cutoff {
			continue
		}
		if cur, ok := ipMap[e.IP]; !ok || e.Timestamp > cur {
			ipMap[e.IP] = e.Timestamp
		}
	}
	out := make([]model.ClientIpEntry, 0, len(ipMap))
	for ip, ts := range ipMap {
		out = append(out, model.ClientIpEntry{IP: ip, Timestamp: ts})
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Timestamp > out[j].Timestamp })
	return out
}

// upsertNodeClientIps folds a guid's per-email observations into NodeClientIp,
// merging with whatever is already stored for that (guid, email) and dropping
// stale entries. Empty merged results delete the row so the table stays bounded.
func upsertNodeClientIps(guid string, perEmail map[string][]model.ClientIpEntry) error {
	if guid == "" || len(perEmail) == 0 {
		return nil
	}
	db := database.GetDB()
	cutoff := time.Now().Unix() - clientIpStaleAfterSeconds

	var existing []model.NodeClientIp
	if err := db.Where("node_guid = ?", guid).Find(&existing).Error; err != nil {
		return err
	}
	existingByEmail := make(map[string]*model.NodeClientIp, len(existing))
	for i := range existing {
		existingByEmail[existing[i].Email] = &existing[i]
	}

	tx := db.Begin()
	if tx.Error != nil {
		// Begin itself can fail; surface that instead of writing through a dead tx.
		return tx.Error
	}
	defer func() {
		if r := recover(); r != nil {
			tx.Rollback()
		}
	}()

	for email, incoming := range perEmail {
		if email == "" {
			continue
		}
		var old []model.ClientIpEntry
		if cur, ok := existingByEmail[email]; ok && cur.Ips != "" {
			// A corrupt stored value is treated as "no previous observations".
			_ = json.Unmarshal([]byte(cur.Ips), &old)
		}
		merged := mergeModelClientIpEntries(old, incoming, cutoff)
		if len(merged) == 0 {
			// Nothing fresh: drop any stale row so attribution doesn't linger.
			if _, ok := existingByEmail[email]; ok {
				if err := tx.Where("node_guid = ? AND email = ?", guid, email).
					Delete(&model.NodeClientIp{}).Error; err != nil {
					tx.Rollback()
					return err
				}
			}
			continue
		}
		b, err := json.Marshal(merged)
		if err != nil {
			tx.Rollback()
			return err
		}
		row := model.NodeClientIp{NodeGuid: guid, Email: email, Ips: string(b)}
		if err := tx.Clauses(clause.OnConflict{
			Columns:   []clause.Column{{Name: "node_guid"}, {Name: "email"}},
			DoUpdates: clause.AssignmentColumns([]string{"ips"}),
		}).Create(&row).Error; err != nil {
			tx.Rollback()
			return err
		}
	}
	return tx.Commit().Error
}

// RecordLocalClientIps stores this panel's own Xray observations under its
// panelGuid. Called by check_client_ip_job each scan with the live per-email IPs
// the local core reported.
func (s *InboundService) RecordLocalClientIps(panelGuid string, observed map[string][]model.ClientIpEntry) error {
	return upsertNodeClientIps(panelGuid, observed)
}

// MergeClientIpsByGuid folds a node's guid-keyed attribution report (its own
// panelGuid subtree plus any descendants) into the local table, preserving which
// physical node each IP is on across a chain. When node is non-nil and its own
// panelGuid is ambiguous (shared with another node or the master, i.e. a cloned
// server), the node's own subtree is remapped to its node-unique key so two
// clones don't collapse into one attribution row; descendant subtrees keep their
// distinct GUIDs. A nil node merges the report verbatim.
//
// Note: the remap modifies trees in place, so callers must not rely on the
// original keys after this call.
func (s *InboundService) MergeClientIpsByGuid(node *model.Node, trees map[string]map[string][]model.ClientIpEntry) error {
	if node != nil && node.Guid != "" {
		if eff := effectiveNodeKey(node); eff != node.Guid {
			if sub, ok := trees[node.Guid]; ok {
				delete(trees, node.Guid)
				// Guard against a nil inner map, which would panic on assignment.
				if existing, ok := trees[eff]; ok && existing != nil {
					for email, ips := range sub {
						existing[email] = append(existing[email], ips...)
					}
				} else {
					trees[eff] = sub
				}
			}
		}
	}
	for guid, perEmail := range trees {
		if err := upsertNodeClientIps(guid, perEmail); err != nil {
			return err
		}
	}
	return nil
}

// GetClientIpsByGuid returns this panel's full attribution subtree (guid -> email
// -> fresh IPs), dropping stale entries. It is what the clientIpsByGuid endpoint
// serves to a parent panel.
func (s *InboundService) GetClientIpsByGuid() (map[string]map[string][]model.ClientIpEntry, error) {
	db := database.GetDB()
	var rows []model.NodeClientIp
	if err := db.Find(&rows).Error; err != nil {
		return nil, err
	}
	cutoff := time.Now().Unix() - clientIpStaleAfterSeconds
	out := make(map[string]map[string][]model.ClientIpEntry)
	for _, row := range rows {
		if row.NodeGuid == "" || row.Email == "" || row.Ips == "" {
			continue
		}
		var entries []model.ClientIpEntry
		if err := json.Unmarshal([]byte(row.Ips), &entries); err != nil {
			continue
		}
		fresh := mergeModelClientIpEntries(nil, entries, cutoff)
		if len(fresh) == 0 {
			continue
		}
		if out[row.NodeGuid] == nil {
			out[row.NodeGuid] = make(map[string][]model.ClientIpEntry)
		}
		out[row.NodeGuid][row.Email] = fresh
	}
	return out, nil
}

// GetClientIpNodeAttribution returns, for one client email, a map of IP -> the
// guid that most recently observed it (within the stale window). Used to label
// each IP in the panel with the node it is connecting to.
func (s *InboundService) GetClientIpNodeAttribution(email string) (map[string]string, error) {
	db := database.GetDB()
	var rows []model.NodeClientIp
	if err := db.Where("email = ?", email).Find(&rows).Error; err != nil {
		return nil, err
	}
	cutoff := time.Now().Unix() - clientIpStaleAfterSeconds
	ipGuid := make(map[string]string)
	ipTs := make(map[string]int64)
	for _, row := range rows {
		if row.NodeGuid == "" || row.Ips == "" {
			continue
		}
		var entries []model.ClientIpEntry
		if err := json.Unmarshal([]byte(row.Ips), &entries); err != nil {
			continue
		}
		for _, e := range entries {
			if e.IP == "" || e.Timestamp < cutoff {
				continue
			}
			if cur, ok := ipTs[e.IP]; !ok || e.Timestamp > cur {
				ipTs[e.IP] = e.Timestamp
				ipGuid[e.IP] = row.NodeGuid
			}
		}
	}
	return ipGuid, nil
}

// ClientIpInfo is one IP shown in the panel's per-client IP log, labelled with
// the node it is connecting through ("" = this local panel).
type ClientIpInfo struct {
	IP   string `json:"ip"`
	Time string `json:"time"`
	Node string `json:"node"`
}

// GetClientIpsWithNodes returns a client's recorded IPs (from the flat
// inbound_client_ips display set) annotated with the node each IP is on, using
// the per-node attribution table. Local IPs (and any IP without attribution)
// carry an empty Node.
func (s *InboundService) GetClientIpsWithNodes(email string) ([]ClientIpInfo, error) {
	raw, err := s.GetInboundClientIps(email)
	if err != nil || raw == "" {
		// Record-not-found (or empty) is "no IPs", not an error for the UI.
		return []ClientIpInfo{}, nil
	}
	var entries []model.ClientIpEntry
	if jerr := json.Unmarshal([]byte(raw), &entries); jerr != nil || len(entries) == 0 {
		// Legacy shape: a plain JSON array of IP strings.
		var oldIps []string
		if json.Unmarshal([]byte(raw), &oldIps) == nil {
			entries = entries[:0]
			for _, ip := range oldIps {
				entries = append(entries, model.ClientIpEntry{IP: ip})
			}
		}
	}
	if len(entries) == 0 {
		return []ClientIpInfo{}, nil
	}

	attr, _ := s.GetClientIpNodeAttribution(email)
	guidName := s.nodeGuidNameMap()
	localGuid, _ := (&SettingService{}).GetPanelGuid()

	out := make([]ClientIpInfo, 0, len(entries))
	for _, e := range entries {
		if e.IP == "" {
			continue
		}
		info := ClientIpInfo{IP: e.IP}
		if e.Timestamp > 0 {
			info.Time = time.Unix(e.Timestamp, 0).Local().Format("2006-01-02 15:04:05")
		}
		if guid, ok := attr[e.IP]; ok && guid != "" && guid != localGuid {
			info.Node = guidName[guid]
		}
		out = append(out, info)
	}
	return out, nil
}

// nodeGuidNameMap maps each known node's attribution key to its display name,
// keyed by effectiveNodeGuid so a cloned node's IPs (stored under its node-unique
// key) still resolve to the right name instead of colliding under a shared GUID.
func (s *InboundService) nodeGuidNameMap() map[string]string {
	db := database.GetDB()
	var nodes []model.Node
	if err := db.Model(&model.Node{}).Find(&nodes).Error; err != nil {
		return map[string]string{}
	}
	ptrs := make([]*model.Node, len(nodes))
	for i := range nodes {
		ptrs[i] = &nodes[i]
	}
	selfGuid, _ := (&SettingService{}).GetPanelGuid()
	ambiguous := ambiguousNodeGuids(ptrs, selfGuid)
	m := make(map[string]string, len(nodes))
	for i := range nodes {
		m[effectiveNodeGuid(&nodes[i], ambiguous)] = nodes[i].Name
	}
	return m
}

// DeleteNodeClientIpsByGuid removes all attribution rows for a guid (e.g. when a
// node is deleted) so its IPs stop being reported and counted.
func (s *InboundService) DeleteNodeClientIpsByGuid(guid string) error {
	if guid == "" {
		return nil
	}
	db := database.GetDB()
	return db.Where("node_guid = ?", guid).Delete(&model.NodeClientIp{}).Error
}
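
The merge rule used throughout this file (drop stale entries, keep the newest timestamp per IP, sort newest-first) can be tried in isolation with the standalone sketch below. It is not the file's code: `ClientIpEntry` here is a local stand-in that assumes the real `model.ClientIpEntry` carries at least an IP and a Unix timestamp, `mergeEntries` is a hypothetical name, and unlike the original it takes the maximum timestamp across both lists and breaks timestamp ties by IP so the output order is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// ClientIpEntry is a local stand-in for the panel's wire type (assumed shape).
type ClientIpEntry struct {
	IP        string
	Timestamp int64
}

// mergeEntries unions old and incoming, drops entries older than cutoff,
// keeps the newest timestamp per IP, and sorts newest-first (ties by IP).
func mergeEntries(old, incoming []ClientIpEntry, cutoff int64) []ClientIpEntry {
	newest := make(map[string]int64, len(old)+len(incoming))
	for _, list := range [][]ClientIpEntry{old, incoming} {
		for _, e := range list {
			if e.IP == "" || e.Timestamp < cutoff {
				continue
			}
			if cur, ok := newest[e.IP]; !ok || e.Timestamp > cur {
				newest[e.IP] = e.Timestamp
			}
		}
	}
	out := make([]ClientIpEntry, 0, len(newest))
	for ip, ts := range newest {
		out = append(out, ClientIpEntry{IP: ip, Timestamp: ts})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Timestamp != out[j].Timestamp {
			return out[i].Timestamp > out[j].Timestamp
		}
		return out[i].IP < out[j].IP
	})
	return out
}

func main() {
	old := []ClientIpEntry{{IP: "10.0.0.1", Timestamp: 100}, {IP: "10.0.0.2", Timestamp: 40}}
	incoming := []ClientIpEntry{{IP: "10.0.0.1", Timestamp: 120}, {IP: "10.0.0.3", Timestamp: 110}}
	for _, e := range mergeEntries(old, incoming, 50) {
		fmt.Println(e.IP, e.Timestamp)
	}
}
```

With a cutoff of 50, the entry for 10.0.0.2 is dropped as stale and 10.0.0.1 keeps its newer timestamp.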