The setting subcommand registers the -webCert and -webCertKey flags but
the "setting" case only calls updateSetting(), which ignores cert paths.
The flags were silently accepted and discarded, so a fresh panel stayed
HTTP-only (no webCertFile/webKeyFile written, "Panel is not secure with
SSL", browser ERR_SSL_PROTOCOL_ERROR). updateCert() was reachable only
through the separate "cert" case.
Call updateCert(webCertFile, webKeyFile) inside the "setting" case when
either flag is set, mirroring the "cert" subcommand. saveSetting() already
upserts, so this works on a fresh DB.
Co-authored-by: taov.rustam <taov.rustam@rwb.ru>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
The IP-sync phase shared a single 4s context with the traffic-snapshot fetch that runs before it. On high-latency nodes the snapshot's round-trips drained that budget, so FetchAllClientIps/PushAllClientIps/FetchClientIpsByGuid failed with 'context deadline exceeded' every cycle, silently breaking cross-node client-IP sync. Give the phase its own fresh context (nodeClientIpSyncTimeout=6s), mirroring maybePushGlobals.
Also convert node-name log lines to Warningf/Debugf: fmt.Sprint inserts no space between adjacent string args, so messages rendered as 'push client ips toUS1failed:'.
cron: SkipIfStillRunning stops a slow 5s/10s job from overlapping itself and racing the shared xrayAPI (grpc conn leak) and the StatsLastValues map (fatal concurrent map write). memlimit: auto-detect a Go soft memory limit from XUI_MEMORY_LIMIT, the cgroup limit, or system RAM (about 90 percent); opt-in pprof via XUI_PPROF. tgbot: userStates now goes through a mutex-guarded store with TTL pruning (was raced by worker-pool and delayed-delete goroutines). check_client_ip: prefilter inbounds by settings LIKE limitIp instead of loading and JSON-parsing all of them every scan. minor: prune StatsLastValues, RateLimiter.lastSent, reportedRemoteTagConflict. docker-compose: document the memory knobs.
* fix(node): never re-add a node's full counter on reset/restart (#5456, #5476, #5390)
When a node's per-client counter dips below the master's stored baseline
(node reboot, xray restart, or a reset propagated to the node), the delta
accounting clamped delta to the node's whole current counter and re-added it
to the master total — double-counting a client's lifetime usage in a single
sync and often pushing them over quota. Treat a backward-moving counter as a
reset: add 0 and rebaseline to the reported value, so only genuine post-reset
usage accrues.
Resets also now clear the per-node NodeClientTraffic baseline (ResetClient
TrafficByEmail, resetClientTrafficLocked, BulkResetTraffic, resetAllClient
TrafficsLocked), mirroring the delete paths. Without this the node's pre-reset
cumulative — including traffic it had counted but not yet synced — leaks back
onto the master after a reset, which is the 'reset reverts after a while'
report. The next sync then takes the clean delta=0 + rebaseline path regardless
of node state.
Updates TestNodeCounterReset (was _Clamped, now _NoReAdd) to assert rebaseline
instead of re-add, and adds TestCentralResetClearsNodeBaseline_NoLeak.
* fix(inbound): keep persisted node share strategy on edit (#5375)
Opening the edit modal silently reverted shareAddrStrategy from 'node' to
'listen'. The downgrade effect fires before the form settles: availableNodes
is an empty placeholder until /nodes/list resolves, and Form.useWatch('protocol')
is briefly empty on the first edit render — both transiently make the node
option look unavailable, so the effect clobbered the saved value.
Gate the downgrade on availableNodesFetched (threaded from useNodesQuery through
InboundsPage) and on the protocol watch being settled, so a persisted strategy
is only downgraded when the node option is genuinely unavailable. Adds a
rerender-based regression test covering the nodes-loading race.
* <3
* perf(traffic): skip cross-panel quota subquery when no globals exist (#5392, #5389)
disableInvalidClients ran a correlated EXISTS against client_global_traffics
on the full client_traffics table every 5s. On a panel no master pushes to,
that table is empty so the subquery can never match — yet it forced a full
scan that pegged Postgres at 100% CPU on large client counts. Probe the table
first and drop the EXISTS branch when it's empty (the common case), and add an
idx_client_global_email index so the subquery is an index lookup when globals
are present. Cross-panel enforcement is unchanged (TestGlobalUsage_DisablesClient).
This also relieves #5389 ('traffic writer queue full' / panel freeze): the
heavy query runs inside the serialized traffic write, so a slow DB backs the
shared writer queue up until request handlers block.
* fix(sub): don't advertise a leaked client IP for local wildcard inbounds (#5425)
For a local inbound with no node, no custom share address, and a wildcard/blank
listen, resolveInboundAddress fell straight through to the subscriber's request
host. Behind NAT/proxy/CDN that Host can be the requesting client's own IP, so
the subscription wrote the client's address into the inbound instead of the
server's — while the panel's own share link (which doesn't use the request host)
stayed correct.
Prefer the admin's configured public host (Sub/Web domain) over the raw request
host for this last-resort fallback. With no configured host the request host
still stands, so existing single-domain setups are unaffected.
Add three client-management actions to the Clients page More menu:
- Delete unattached clients: removes every client with no inbound
attachment, cascading its traffic rows, IP log, and external links
(POST /clients/delOrphans).
- Export clients: shows the {client, inboundIds} list in a read-only
CodeMirror viewer with copy/download (GET /clients/export returns the
array in the standard envelope).
- Import clients: pastes that JSON into an editable CodeMirror editor,
mirroring Import an Inbound (POST /clients/import takes a { data }
body). Attached clients go through the create-and-attach path; items
with no inboundIds are restored as bare records; existing emails are
never overwritten and are reported as skipped.
Document the new endpoints in api-docs and translate the new strings
into all supported languages.
GetRemoteCertHash shelled out to 'xray tls ping' and scraped its stdout, which swallowed the real failure (a refused dial surfaced only as 'no certificate hash found'). Replace it with a native uTLS Chrome handshake: dial/handshake errors now surface verbatim, host:port is honoured, and the leaf is taken from PeerCertificates[0] so IP-only self-signed certs (no DNS SANs) hash correctly. Mirrors alireza0/x-ui@1372ad0 without its nil-leaf panic.
The pin-from-remote button passed only the SNI to 'xray tls ping', which defaults to :443 — so it never reached a self-hosted inbound on another port and failed with a vague 'no certificate hash found'. Append the inbound's port when the SNI carries none, and surface the underlying ping failure (dial refused, timeout) in the error.
Certs without an OCSP responder URL (e.g. Let's Encrypt, which dropped OCSP in 2025) made xray log 'ignoring invalid OCSP: no OCSP server specified in cert' on every refresh. Default the per-cert ocspStapling interval to 0 (disabled) so new inbounds stay quiet; the field is kept for certs that do support stapling.
Resolve CodeQL go/path-injection (alert #96): the certFile path from
the getCertHash endpoint flowed straight into os.ReadFile, letting an
authenticated request read arbitrary files by path. Validate it against
an allow-list of certificate files the panel already references (inbound
TLS certificateFile values plus the panel's own web cert) and read the
config-sourced path rather than the caller-supplied one, breaking the
taint flow while preserving arbitrary cert locations.
Routing and Outbounds export now opens a TextModal showing the JSON with
copy/download buttons instead of auto-downloading the file. Routing import
and export are collapsed into a "More" dropdown to match the Outbounds tab.
The rule form Enabled field becomes a Switch instead of an Enabled/Disabled
Select.
TLS: add verifyPeerCertByName (vcn) to inbound settings + emit in both share-link generators (frontend + Go sub) and outbound parser; the allowInsecure replacement xray removed after 2026-06-01. Add server-side curvePreferences, masterKeyLog, echSockopt (passthrough + form) at tlsSettings top-level so they survive the panel-only settings strip.
REALITY: add limitFallbackUpload/Download (afterBytes/bytesPerSec/burstBytesPerSec) with per-field tooltips, plus masterKeyLog. Verified field names/semantics against pinned xray v1.260327.1 (bytesPerSec=0 disables).
Hosts: fix verify_peer_cert_by_name column bool->string (xray expects comma-separated names) with an idempotent, history-gate-free migration (SQLite typeof blank; Postgres ALTER once); emit vcn for hosts/external proxies.
Server: add getCertHash (local cert DER SHA-256) and getRemoteCertHash (xray tls ping) endpoints + api-docs; wire pinned-cert field buttons. Drop the meaningless random-hash button.
Xray UI: metrics endpoint (listen/tag) config in Basics; import/export for routing rules and outbounds.
Fallbacks card: compact empty state, header-aligned actions, responsive labeled grid rows.
i18n: add all new keys to every locale; drop unused generateRandomPin.
normalizeStreamSettings cleared StreamSettings for any protocol outside
its whitelist, and tunnel was missing. The frontend sent sockopt
correctly but the backend wiped it on every add/update. Tunnel relies on
sockopt (notably sockopt.tproxy for TProxy/redirect mode), so add it to
the whitelist.
* feat(sub): implement dynamic single-bracket remark variables with timezone-aware inline Jalali conversion
* Update .gitignore
* Update .gitignore
* merge: bring in origin/main commits to resolve conflict base
* fix(sub): address review issues in dynamic remark variables
- Add TIME_LEFT to unlimitedDropTokens so segments containing only
{TIME_LEFT} are dropped for unlimited clients (same as DAYS_LEFT)
- Remove dead uiSingleBraceRe variable (translateUISingleBrackets uses
a character scanner, not this regex)
- Change expireDateLabel to use time.Local instead of UTC, consistent
with jalaliExpireDateLabel
Co-authored-by: Sanaei <MHSanaei@users.noreply.github.com>
* fix
* fix
---------
Co-authored-by: MHSanaei <MHSanaei@users.noreply.github.com>
* feat(settings): add option to hide server settings in subscription
* chore: regenerate codegen and add translations for subHideSettings
- Update frontend/src/generated/{types,schemas,zod,examples}.ts to include
subHideSettings (bool) in AllSetting and AllSettingView
- Add subHideSettings / subHideSettingsDesc translation keys to all 11
remaining locales: ar-EG, fa-IR, es-ES, id-ID, ja-JP, pt-BR, uk-UA,
tr-TR, zh-TW, zh-CN, vi-VN
Co-authored-by: IgorKha <IgorKha@users.noreply.github.com>
Co-authored-by: Sanaei <MHSanaei@users.noreply.github.com>
* fix(sub): add subHideSettings default to settings map
Every other sub* setting has an entry in defaultValueMap; subHideSettings was missing, so GetSubHideSettings hit the 'key not in defaultValueMap' error path on a fresh install (only masked by the false fallback in sub.go). Add the default for consistency.
* feat(sub): add full XHTTP field mapping for Clash subscriptions
The Clash subscription generator only emitted path, host, mode in
xhttp-opts. Mihomo supports all XHTTP parameters including padding,
xmux (reuse-settings), session/seq placement, and more.
Add buildXhttpClashOpts() that maps all client-relevant XHTTP fields
from 3x-ui's camelCase JSON storage to Mihomo's kebab-case YAML format
using an explicit allowlist approach.
Field mapping (source-verified against Mihomo adapter/outbound/vless.go):
- String fields: xPaddingBytes→x-padding-bytes, sessionPlacement→
session-placement, etc. (10 fields with DPI default filtering)
- Bool fields: noGRPCHeader→no-grpc-header, xPaddingObfsMode→
x-padding-obfs-mode (with gated sub-fields)
- Nested: xmux→reuse-settings (6 sub-fields with kebab-case)
- Headers: pass through with Host key dropped
- Server-only fields automatically excluded (not in allowlist)
DPI defaults filtered: scMaxEachPostBytes="1000000",
scMinPostsIntervalMs="30" (known DPI fingerprint)
* test(sub): add comprehensive tests for buildXhttpClashOpts
9 test functions covering all field mapping categories:
- FullFieldMapping: every kebab-case key verified
- DPIDefaultsFiltered: scMaxEachPostBytes=1000000 and scMinPostsIntervalMs=30
- PaddingObfsGate: false/absent/true-with-no-gated-fields
- XmuxMapsToReuseSettings: full mapping, empty, int/float64/zero hKeepAlivePeriod
- ServerOnlyFieldsExcluded: noSSEHeader, scMaxBufferedPosts, etc.
- NilInput and EmptyInput: return nil
- HostFallbackFromHeaders: headers.Host, only-Host, case-insensitive drop
- NoGRPCHeaderFalsey: false and absent both produce no key
* fix(sub): clean up redundant skipValue check and add missing xhttp no-settings test
- In buildXhttpClashOpts, change string-field loop condition so that
skipValue == "" means "no filter" rather than redundantly comparing
v against "" twice (xPaddingBytes was the affected entry)
- Add TestApplyTransport_XHTTP_NoSettings to pin the behaviour when
xhttpSettings is absent: applyTransport returns true, network is set
to "xhttp", and xhttp-opts is not emitted
claude-code-action checks out the PR head branch and pushes Claude's
commits with `git push origin ...`. For PRs opened from a fork the head
branch lives on the contributor's repo, and the workflow GITHUB_TOKEN
cannot push there, so commits ended up as a stray branch on this repo
and never landed on the PR.
Redirect origin's push URL to the PR head repository (the fork for fork
PRs, this repo otherwise) using a PAT secret (CLAUDE_BOT_PAT) that has
push access; fetches still come from origin. persist-credentials is
disabled so the PAT in the push URL is used instead of the GITHUB_TOKEN
auth header. Requires the fork PR to have "Allow edits by maintainers"
enabled.
Rename claude-issue-bot.yml to claude-bot.yml and broaden it beyond
issues:
- handle-pr: review pull requests on open (read diff, label, post one
grounded review comment); review-only, no code changes.
- mention: allow committing. Add Edit/Write and git tools, contents:
write, and instruct it to make the smallest correct change and commit
to the current branch only on an explicit code-change request. Kept
default user gating (no allowed_non_write_users) so only write-access
users can trigger commits.
- Refresh the repository map (add internal/eventbus and the
service/email subpackage) across all three prompts.
- Raise max-turns.
@
Replace EventBusCheckboxes with card-based notification settings:
- Each event group gets its own card with responsive grid layout
- Master checkbox per group with indeterminate state
- Inline parameter inputs (CPU threshold) appear when enabled
- Theme-adaptive via Ant Design Card component
Components:
- NotificationLayout, NotificationCard, NotificationHeader, NotificationEvent
- TelegramNotifications, EmailNotifications with explicit event configs
A client linked to N inbounds has one ClientStats row per inbound, all
sharing the same email. getExhausted appended every row, so the admin
expiration/traffic report listed the same user once per inbound (N info
blocks and N buttons), which Telegram split into multiple messages.
Track seen emails and report each client once.
* perf(xray): compile log/traffic regexps once at package scope
GetTraffic recompiled two stats regexps on every traffic tick, and LogWriter.Write
recompiled two more on every log line. Hoist all four to package-level vars so they
compile once at load instead of per call on hot paths.
* fix(xray): guard LogWriter.lastLine against the GetResult reader race
Write is driven by the Xray process goroutine while Process.GetResult
reads lastLine from the caller's goroutine, so the unsynchronized field
is a data race under `go test -race`. Add an RWMutex and route every
write through setLastLine; GetResult reads via LastLine().
* fix(xray): bound handler gRPC calls with a deadline
AddInbound, DelInbound and the AddUser AlterInbound call used
context.Background(), so a hung core connection could block the caller
indefinitely (for example while the process restart lock is held). Give
them a 10s deadline (handlerRPCTimeout) and a nil-client guard, matching
the other handler operations.
getSetting (WHERE key=?) runs on nearly every subscription request and job
tick and had no index, so each lookup full-scans the settings table past the
large xrayTemplateConfig blob. Add an index on settings.key; AutoMigrate
creates it on existing DBs too. Includes a HasIndex test.
GetTraffic recompiled two stats regexps on every traffic tick, and LogWriter.Write
recompiled two more on every log line. Hoist all four to package-level vars so they
compile once at load instead of per call on hot paths.
Co-authored-by: Sanaei <ho3ein.sanaei@gmail.com>
Per SIP022, ss:// links for 2022-blake3-* methods must NOT base64-encode
the userinfo; method and password are percent-encoded instead. Clients
like Hiddify reject the base64 form. Fix both the server-side
subscription path and the client-side panel link, plus the matching
parsers for round-trip import.
Regression of #4364. Karing parses the `extra` JSON and ignores the
flat `mode=` param, so when extra was present without `mode` it stored
the transport with no mode and the handshake failed. The `mode` field
that #4365 added to buildXhttpExtra was dropped during the share-link
refactor; restore it in both the backend and frontend generators.
Issue 1: the host endpoint remark no longer substitutes the inbound remark
as the config name. {{INBOUND}} always resolves to the inbound's own remark
and {{HOST}} to the host remark, so both can be shown side by side instead
of the host name appearing twice. configName() drops hostRemark entirely;
token help text updated in all locales.
Issue 2: client_traffics.email is globally unique, so a client shared across
several inbounds of one subscription has a single traffic row owned by one
inbound. statsForClient only searched the current inbound's preloaded
ClientStats, missing on every other inbound's link and falling back to
Up=Down=0 -- so {{TRAFFIC_LEFT}} printed the full quota. Build a per-request
email->stats map from all the subscription's inbounds (no extra queries) and
fall back to it.
Local hot paths:
- autoRenewClients: replace the O(clients x expired) inner scan with an
email->traffic map lookup (quadratic at scale).
- node traffic sync: scope the client_traffics email-membership query to the
snapshot's emails instead of plucking the whole table every poll.
- add a (expiry_time, reset) index for the per-tick auto-renew filter.
- SQLite: add cache_size/mmap_size/temp_store pragmas (env-tunable); keep the
single-file DELETE journal and synchronous=FULL defaults.
- scale benchmarks now run on SQLite too via XUI_SCALE_TEST=1 (shared
setupScaleDB/resetScaleTables helpers), not just Postgres.
Node paths:
- bulk add/delete/adjust on a node-attached inbound folded one HTTP RPC per
client; above nodeBulkPushThreshold (32) mark the node dirty and let one
ReconcileNode push converge it instead of O(M) sequential round-trips.
Small ops keep the live per-client path. Also hoist nodePushPlan out of the
per-email delete loop.
- ReconcileNode skips inbounds whose wire payload is unchanged (per-tag
fingerprint on Remote), guarded by node-side tag presence so a restarted
node is still re-seeded.
Tests: auto-renew multi-inbound correctness, node-path dispatch (large ops
fold to dirty, small ops push live) via a manager runtime override seam, and
reconcile delta-skip.
Switch sqlite from WAL to DELETE journal mode so the database no longer
keeps -shm/-wal sidecar files; only x-ui.db remains at rest. Pair with
synchronous=FULL for crash-safe durability in rollback-journal mode.
The startup PRAGMA journal_mode=DELETE converts existing WAL databases
and removes their leftover sidecar files on first run, so upgrades need
no manual cleanup. busy_timeout and _txlock=immediate are unchanged.
* fix(routing): sync xray rules when panel inbound tags change or are deleted
When an auto-generated inbound tag changes (e.g. port edit), propagate the
rename into xrayTemplateConfig routing rules and loopback outbounds. On
inbound delete, drop rules that only matched that tag and strip the tag from
rules that also match on domain, IP, or other fields.
Run the template update after the inbound DB transaction commits so SQLite
WAL reads see the stored xray settings reliably.
* fix(inbounds): return needRestart after deferred routing tag sync
Use a named needRestart return in UpdateInbound so the post-commit PropagateInboundTagRename defer can signal callers to restart Xray.
---------
Co-authored-by: Sanaei <ho3ein.sanaei@gmail.com>
NodeService.Delete dropped the node row (and its per-node child rows) without
checking for inbounds still referencing it via node_id, leaving orphaned
inbounds with a dangling node_id that confuse node sync, subscriptions and
cleanup. Refuse the delete with a clear error when inbounds are still attached,
and remove the per-node child rows before the node row inside one transaction.
Delete stays tolerant of a missing node row so it can still clean up orphaned
rows. Regression test covers the blocked and clean-delete paths.
* fix(sub): preserve non-default scMinPostsIntervalMs in inbound wire payload
The frontend wire normalizer unconditionally deleted scMinPostsIntervalMs
from inbound configs before persisting to the database, so JSON
subscriptions could never include it — even when the admin set a
non-default value like "50-150".
Only strip the xray-core default ("30") or empty values. The literal
"30" is a known DPI fingerprint (#5141) and must still be removed, but
custom tuning knobs must survive the round-trip so that buildXhttpExtra
and the JSON subscription generator can propagate them to clients.
Add tests for non-default preservation and empty-value stripping.
* fix(sub): use per-inbound xmux instead of global subJsonMux in JSON subscriptions
The JSON subscription generator always used the global subJsonMux panel
setting for outbound.Mux, even when the inbound carried per-inbound xmux
inside xhttpSettings. This meant XHTTP outbounds that configured their own
multiplexing via xmux still got the legacy mux.cool block injected — and
the inbound's own xmux was silently ignored.
Now getConfig() checks whether xmux is present in the inbound's
xhttpSettings. When it is, the per-inbound xmux handles multiplexing
and the legacy outbound.Mux is suppressed. When xmux is absent, the
global subJsonMux is used as before.
The mux selection is threaded through genVless, genVnext, genServer,
and genHy as an explicit parameter so each protocol handler can decide
independently.
Add tests:
- xmux present → outbound.Mux suppressed, xmux survives streamData()
- no xmux → global subJsonMux used as outbound.Mux
* feat(ui): add scMinPostsIntervalMs to inbound XHTTP form
The inbound XHTTP form was missing scMinPostsIntervalMs, making it impossible
for admins to configure this client-only tuning knob through the panel. The
field already existed in the Zod schema and outbound form, and the wire
normalizer (PR #5393) now preserves non-default values for subscription
propagation.
Add Form.Item for scMinPostsIntervalMs in the packet-up section of the
inbound XHTTP form, after scMaxEachPostBytes. Use the existing translation
key and a placeholder that shows the range format without endorsing the
DPI-fingerprinted default (30).
Update the Zod schema comment to clarify that scMinPostsIntervalMs is now
preserved on inbound for subscriptions, while uplinkChunkSize and
noGRPCHeader remain outbound-only.
Add two integration tests:
- Non-default value (50-150) preserved through formValuesToWirePayload
- Default value (30) stripped through the full pipeline
* fix(ui): show packet-up fields for auto mode in inbound XHTTP form
When mode is 'auto', the server accepts all three XHTTP modes including
packet-up. The packet-up-specific fields (scMaxBufferedPosts,
scMaxEachPostBytes, scMinPostsIntervalMs) are therefore relevant and
should be configurable.
Change the conditional from 'packet-up' only to
'packet-up || auto' so admins using the default 'auto' mode can
configure these fields.
* fix(outbound): show scMinPostsIntervalMs for auto mode, update placeholder
- Show scMinPostsIntervalMs field when mode is 'auto' in addition
to 'packet-up', since auto+TLS resolves to packet-up client-side
- Change placeholder from '30' (DPI fingerprint) to 'e.g. 50-150'
for consistency with inbound form
* fix(inbound): show scMaxEachPostBytes for all modes, gate scMaxBufferedPosts behind packet-up/auto
scMaxEachPostBytes is used by xray-core in every mode (both handlePacketUp
and handleStreamUp validate it) and must be visible regardless of mode.
scMaxBufferedPosts is only used by handlePacketUp, so it remains gated
behind the packet-up/auto conditional.
Also show scMinPostsIntervalMs for auto mode in outbound form and change
placeholder from '30' (DPI fingerprint) to 'e.g. 50-150'.
Update snapshot to reflect the new field order.
* fix(inbound): correct XHTTP field visibility per xray-core source verification
- scMaxEachPostBytes: move behind packet-up/auto gate (server only checks
it in handlePacketUp, not handleStreamUp)
- scMaxBufferedPosts: show for packet-up, stream-up, and auto (server
uses uploadQueue in both handlePacketUp and handleStreamUp)
- scStreamUpServerSecs: already correct (stream-up only)
Verified against xray-core hub.go and dialer.go source code.
---------
Co-authored-by: w3struk <w3struk@gmail.com>
Co-authored-by: MHSanaei <ho3ein.sanaei@gmail.com>
The central panel stores node inbounds with an n<id>- prefix so tags stay
unique in its database, but pushes were sending that prefixed tag to the
remote node. A no-op save or reconcile could rename the remote inbound and
break Xray routing rules that still referenced the original tag.
Strip only this node's prefix in wireInbound before add/update so the remote
keeps its bare tag while central retains the aliased form locally.
Signed-off-by: aleskxyz <39186039+aleskxyz@users.noreply.github.com>
The public subscription http.Server set no timeouts, leaving the most exposed
listener open to slow-header/Slowloris exhaustion. Mirror the panel server
timeouts already used in internal/web/web.go.
Remote node HTTP responses were read with an unbounded io.ReadAll, so a
broken or hostile node could force the master panel to buffer an arbitrarily
large body. The single Remote.do choke point that all node calls funnel
through now:
- validates the HTTP status before reading any success payload (a non-OK
body is only read up to a small bounded diagnostic snippet, so a node
cannot make the master buffer a large body just to return an error);
- fast-fails on an honestly-declared oversize Content-Length;
- reads the success body through readCappedBody, an io.LimitReader cap
(64 MiB) that rejects oversize with a typed error.
The 64 MiB cap bounds one response's wire/decompressed size; it is documented
as not a process-wide memory bound (endpoint-specific caps and a concurrency
budget remain follow-ups).
Tests cover the cap+1 boundary, an oversize streamed body, a normal envelope,
and non-OK status precedence.
A panic in a goroutine without a recover takes the whole panel down. The
per-node heartbeat and traffic-sync goroutines run remote network I/O for
each node with no panic isolation, so one misbehaving node could crash the
master.
Add common.GoRecover(name, fn), which runs fn in a goroutine guarded by a
recover that logs the panic with a stack trace instead of crashing, and use
it for the per-node heartbeat, traffic-sync and global-push goroutines. The
deferred WaitGroup/semaphore releases still run during panic unwind, so the
group never stalls. Other background goroutines can adopt the same helper.
The scheduler was created without a panic recovery wrapper, so a panic in any
scheduled job (traffic write, IP check, etc.) propagated up and could take down
the whole panel process. Wrap jobs with cron.Recover so a panic is logged and
the scheduler keeps running.
* fix(xray): verify the release archive checksum before installing
UpdateXray downloaded the Xray-core release zip and installed the binary
from it after only a TLS fetch, an HTTP-200 check and a size cap — the
archive itself was never verified, so a corrupted or tampered release
asset would be extracted and run as the panel's xray binary.
Verify the downloaded archive against the SHA2-256 published in the
release's .dgst sidecar (which XTLS ships next to every asset) before
installing, and abort the update on mismatch, a missing/short SHA2-256
entry, or an unreachable .dgst. The digest parser and fetch are covered by
tests, including the real .dgst line format ("SHA2-256= <hex>").
* address review: clearer warning + re-download guidance on checksum mismatch
Per review feedback on the PR: on a SHA-256 mismatch, surface a plain-language
warning that the downloaded archive is corrupted or differs from the official
release and that the user should exit and re-download, instead of a terse
"checksum mismatch" error. The install still aborts so a mismatched binary is
never run; the message now tells the user the safe next step.
The process cmd, done and exitErr fields were written by Start/startCommand and
the waitForCommand goroutine while IsRunning/GetErr/GetResult/Stop read them
concurrently from other goroutines (the status endpoint and the check-xray
job) — a data race. Guard them with a RWMutex: writers take the write lock;
readers snapshot under the read lock and run any blocking syscall
(Wait/Signal/Kill) on the local copy without holding it. IsRunning now uses the
done channel as the exit signal instead of reading cmd.ProcessState, which
races with cmd.Wait. Adds a -race regression test.
Three related bugs caused inflated traffic counters and spurious quota
hits on multi-node setups, most visibly when a client email was renamed
while a node was offline or its PostgreSQL deadlocked.
**Fix 1 — phantom quota (root cause)** `setRemoteTrafficLocked`
new-row path: when master had no `client_traffics` row for an email
that a node reported, it seeded the row with `Up: cs.Up` — importing
the node's full accumulated counter as if it were fresh quota usage.
If the node retained stale data from a previously-deleted account (e.g.
a failed deletion during an outage), the ghost 50 GB appeared on the
new client immediately and triggered `disableInvalidClients` the same
tick. Fixed by seeding at `Up: 0`; the current node value still becomes
the baseline so only future increments count.
**Fix 2 — PostgreSQL deadlock** `addClientTraffic` did a
read-modify-write via `tx.Save(slice)`, issuing UPDATEs in slice order.
Two concurrent goroutines locking the same rows in opposite order
deadlock on PostgreSQL (SQLite avoids this with file-level
serialisation). Replaced with atomic per-email
`UPDATE SET up=up+?, down=down+?` statements. Also preserves the
delayed-start ExpiryTime conversion that `adjustTraffics` computes
in-memory but the old Save path persisted to the DB.
**Fix 3 & 4 — stale `inbound_id` filters** `autoRenewClients` used
`WHERE inbound_id NOT IN (node inbounds)` to skip node clients, but
`client_traffics.inbound_id` is set once on INSERT and never refreshed.
Replaced with an email-based subquery through `client_inbounds` (the
authoritative source). Also added a safe type assertion for
`settings["clients"].([]any)` that previously panicked on nil.
**Fix 5 — stale `inbound_id` in reset** `resetAllClientTrafficsLocked`
used `WHERE inbound_id = ?` to find which emails to reset; same staleness
problem. Replaced with the `client_inbounds` join for email lookup;
the `inbounds.last_traffic_reset_time` update still correctly uses the
inbound ID directly on the `inbounds` table.
Tests updated to reflect the new seeding-at-zero semantics and a new
`TestGhostData_NoPhantomTraffic` test reproduces the exact 50 GB
phantom scenario.