mirror of
https://github.com/MHSanaei/3x-ui.git
synced 2026-07-02 18:44:20 +00:00
9e13b32c34
* fix(script): download the live x-ui.sh script atomically before replacing it
update_menu(), update_shell(), and update.sh's update_x-ui() all overwrote
/usr/bin/x-ui in place via `curl -o`, truncating and rewriting the same
inode a currently-running x-ui process may still be reading from. A
network hiccup or slow write during that overwrite leaves a
half-old/half-new script on disk, which then fails with bogus syntax
errors on the next run. Download to /usr/bin/x-ui-temp and `mv -f` into
place instead, matching the atomic pattern install.sh already uses.
Also fixes update_menu() checking chmod's exit code instead of curl's,
which meant a failed download could still report "Update successful."
* fix(script): close remaining gaps in the atomic script-update path
Code review of the previous commit found the atomic mv fix was itself
incomplete:
- None of the mv -f calls checked their exit status, so a failed move
fell through to chmod and "success" messaging while /usr/bin/x-ui
stayed on the old file.
- update_shell()'s `[[ -s x-ui-temp ]]` guard couldn't tell "curl -z
got a 304, nothing to do" from "a stale temp file survived an
earlier crashed run" -- the latter could get moved into place with
no freshness check.
- update_menu(), update_shell(), and update_x-ui() all hardcoded the
same /usr/bin/x-ui-temp path, so two concurrent updates (e.g. a
cron auto-update racing an interactive menu update) could collide.
- update.sh's update_x-ui() was missing the non-empty-file guard
update_shell() already had.
x-ui.sh's update_menu() and update_shell() now share a
replace_xui_script() helper that uses a PID-suffixed temp path
(/usr/bin/x-ui-temp.$$), pre-cleans it before every attempt, and
checks the exit status of curl, the non-empty test, and mv before
treating the update as successful. update.sh's update_x-ui() gets the
same sequence inlined (it's fetched as a standalone script and can't
call x-ui.sh's function), closing the missing-guard gap and using its
own unique temp path.
* fix(script,panel): harden the remaining self-update download paths
install.sh had the same unguarded /usr/bin/x-ui-temp overwrite the two
already-fixed scripts had: no exit-status check on mv, and a fixed temp
name shared with x-ui.sh/update.sh's (now-unique) temp files. Give it
its own PID-suffixed temp path, an empty-file guard, and an mv
exit-status check, matching the pattern used there.
Audited the web dashboard's Go-native updater (panel.go) for the same
bug class: it already uses os.CreateTemp for a genuinely unique temp
file and cleans up via both a deferred Remove and a shell EXIT trap, so
it was never exposed to the fixed-path race. It was missing a check
for a zero-byte download (a 200 OK with an empty body would chmod +x
and exec an empty script) -- added that alongside the existing size
cap.
Not addressed here: once startUpdate()'s child process starts, the Go
service releases it and returns success immediately. If update.sh
fails partway through, the still-running old panel keeps answering
/status, so the frontend's poll can report success with no update
having happened. Fixing that needs update.sh to signal completion
status back and the frontend to check it -- a separate follow-up.
* feat(panel): report real completion status for the web self-update
Fixes the fire-and-forget gap flagged in the atomic-overwrite fix: once
startUpdate() launches update.sh detached, the Go service had no way to
learn whether it actually succeeded. If update.sh failed partway
(network drop, disk full, permission denied), the still-running old
panel kept answering /status, so the frontend's poll reported success
with nothing having changed.
update.sh now writes its outcome to a small JSON status file
(/etc/x-ui/update-status.json by default) via `trap ... EXIT`, which
covers every exit path in the script -- including the two bare `exit 1`
call sites that don't go through the existing _fail() helper. The Go
service generates a run ID before launching, passes it and the status
path to update.sh via XUI_UPDATE_RUN_ID/XUI_UPDATE_STATUS_FILE, and a
new GET /panel/api/server/getUpdateStatus endpoint reports it back. The
frontend now polls that instead of blindly trusting HTTP reachability,
and shows a distinct error or "couldn't confirm" message instead of
silently reloading into a false success.
Adversarial review of this surfaced three more issues, fixed here:
- No lock stopped two concurrent /updatePanel calls from launching two
update.sh runs that would race each other on the actual update work
(tar extraction, service unit swap). Added an in-memory guard with a
5-minute self-expiring window, so a run that never reaches a terminal
state doesn't lock out retries indefinitely.
- XUI_UPDATE_RUN_ID is read from the environment and was interpolated
unquoted into the status JSON; a malformed value would produce
invalid JSON. Now validated as digits-only before use.
- The run ID is a UnixNano timestamp (19 digits), sent as a raw JSON
number it would lose precision in JavaScript (past
Number.MAX_SAFE_INTEGER), letting two different runs round to the
same value on the wire and defeat the whole comparison. It's now a
decimal string end to end (Go, the status file, and the generated
frontend type).
install.sh's equivalent temp-file/mv path and the Go-native
downloadPanelUpdater() path were audited for the same bug classes
during this work; findings from that audit were addressed separately.
* fix(panel): release the update lock as soon as the run finishes
An exhaustive multi-angle review of the whole branch (12 finder angles,
3-vote adversarial verification, a fresh-eyes sweep) surfaced a real
bug in the concurrency guard added in the previous commit, plus several
smaller issues; this fixes what's actionable now.
The bug: acquireUpdateSlot only ever released on the 5-minute stale
timeout or if launching itself failed. If update.sh launched fine but
failed fast (bad GitHub API response, "x-ui not installed", any of its
early exit paths), the status file correctly reported "failed" within
seconds, but a retry was still rejected with "a panel update is
already in progress" for up to 5 more minutes -- the guard never
looked at the very status file this branch built to know a run was
done. It now tracks which run ID currently holds the slot and checks
that run's own status before falling back to the timeout, so a fast
failure clears the way for an immediate retry. Added a regression test
for this, plus one confirming a stale, unrelated runID can't be
mistaken for the current run finishing.
Also:
- Added a genuinely concurrent test for the guard: 200 goroutines
racing acquireUpdateSlot, asserting exactly one wins. The previous
tests only ever called it from one goroutine, so they gave no signal
if the mutex's check-then-set were silently broken -- verified this
by temporarily removing the lock and confirming the old tests still
passed while the new one caught it immediately under -race.
- Removed the redundant upfront "pending" status write: GetUpdateStatus
already defaults a missing/stale file to pending, and the frontend
matches by run ID regardless, so the write changed no observable
behavior. Deleted writeUpdateStatus entirely since that was its only
caller.
- Renamed replace_xui_script()'s unclear "conditional" parameter to
use_if_modified_since, matching what it actually controls.
- Added HTTP-level tests for the new getUpdateStatus endpoint,
including a regression test that the runId wire format is a JSON
string (decoding into a Go string field fails outright if it were
ever a bare number). updatePanel's actual launch path is not
covered: on a Linux test runner it would make a real network call
and could exec a real update.sh, so only its non-Linux guard path is
safely testable without mocking.
Not fixed here, tracked separately: the same unsafe-overwrite pattern
this branch eliminated for /usr/bin/x-ui is still present for the
systemd unit file install in update.sh and install.sh (lower severity
since systemd only reads it on daemon-reload, not continuously); and
startUpdate's systemd-run-vs-detached-fallback branching has no test
coverage since testing it safely needs dependency injection this fix
doesn't warrant bundling in.
* fix(script): make systemd unit file installation atomic
Same anti-pattern as the /usr/bin/x-ui overwrite fixed earlier: every
site that lands the systemd unit at ${xui_service}/x-ui.service --
copying it from the extracted release tarball, or falling back to a
GitHub download per distro family -- wrote straight onto the live
path via cp/curl, no temp file, no verification. A network drop
mid-download or an interrupted cp leaves the unit file truncated;
systemd then fails to parse it on the next daemon-reload/start,
leaving the panel unable to come up until an operator manually
re-copies a good unit file.
Lower severity than the /usr/bin/x-ui case (systemd only reads this
file on demand at daemon-reload time, not continuously the way bash
interprets a running script line by line), but it's the identical
gap, just left uncovered when that fix landed.
Added a small shared helper in both update.sh and install.sh --
_install_xui_service_unit() -- covering both source types (cp from
the tarball, curl from GitHub): write to a PID-suffixed temp file,
verify the copy/download succeeded and the result is non-empty, then
mv -f into place and check that exit status too, matching the pattern
already used for /usr/bin/x-ui. All 4 cp sites and the 3-way curl
fallback in each file now go through it; verified no other site
writes new content to the unit path (the remaining ${xui_service}
references are a pre-install existence check, an rm during old-version
cleanup, and the chown/chmod that already ran after the file is safely
in place -- none of those need atomicity).
Verified with bash -n on both files, plus a standalone scratch test
exercising cp-success, cp-with-missing-source, cp-with-empty-source,
and curl-failure paths: on every failure the previous, good unit file
content is left untouched and no temp file is leaked behind.
* fix(script): make Alpine's OpenRC init script install atomic; drop a stray comment
A final maximum-rigor review of the whole PR (12 finder angles including
a repo-wide sweep for any remaining instance of the bug class this PR
fixes) found two more real issues:
- Alpine's /etc/init.d/x-ui startup script is downloaded via a bare
`curl -fLRo` straight onto the live path in both update.sh and
install.sh -- the exact same unguarded-overwrite pattern already
fixed for /usr/bin/x-ui and the systemd unit file, just left
uncovered on the OpenRC side. A network drop mid-download truncates
the live init script; OpenRC then fails to source/execute it on the
next start, leaving the panel unable to come up. Fixed with the same
temp-file + non-empty check + mv -f (with its own exit-status check)
pattern used everywhere else in this PR. Verified with bash -n and a
standalone scratch-script test covering success, empty-download, and
destination-preserved-on-failure paths.
- internal/web/service/panel/panel_test.go had one line-level `//`
comment on a call site, which the root CLAUDE.md's hard rule ("No //
line comments in committed Go/TS... rename instead of annotating")
explicitly prohibits. The comment duplicated context already stated
in the test's own doc comment two lines above, so it's simply
removed rather than reworded.
Also flagged, deliberately not bundled here since it's a different
subsystem: x-ui.sh's update_geofiles() downloads Xray's live
geoip.dat/geosite.dat with the same unguarded curl -o pattern. Tracked
as its own follow-up.
* fix(script): make geo-data file downloads atomic
Same anti-pattern as /usr/bin/x-ui, the systemd unit file, and the
Alpine init script fixed in prior PRs: update_geofiles() downloaded
Xray's live geoip.dat/geosite.dat (and the IR/RU variants) with curl
writing straight onto the exact path Xray reads at runtime
(internal/xray/process.go's GetGeoipPath/GetGeositePath), no temp
file, no verification. The existing check only inspected the reported
HTTP status via -w '%{http_code}', not file integrity, so a network
drop mid-download could leave a truncated .dat file on disk that
passes the status check. Xray then fails to parse it on the next
restart/reload, breaking any routing rules that reference geoip:/
geosite:.
The -z conditional-GET usage needed care here: the original code
pointed both -z and -o at the same live path. Fixed by pointing -z at
the live file (to keep the "already current" freshness check) while
-o writes to a PID-suffixed temp file, matching the pattern already
proven in x-ui.sh's replace_xui_script(). Verified with a local HTTP
server that a 304 response leaves the temp file untouched/nonexistent
(so the existing "already up to date" branch still works unchanged),
and added a non-empty check plus a checked mv -f before treating a
download as installed.
Verified with bash -n and an end-to-end scratch test against a local
server covering: fresh download, 304-not-modified, empty response
body, and a 404 -- confirming a failure at any stage leaves the
previous good .dat file completely untouched and no temp file behind.
* fix(script): verify the release tarball extraction, not just the download
The final maximum-rigor review found the most significant remaining gap
in this whole effort: update.sh and install.sh check the tarball
download's exit status, but never check tar's exit status, and never
verify the extracted x-ui binary actually exists before continuing.
Worse, by the time extraction runs, the previous installation has
already been stopped and deleted -- there's no rollback. A truncated
download that still passes curl's own check, or a tar failure (disk
full, killed process), left the panel silently in a broken half-state:
chmod/config/service-install all continued to run against a missing or
empty binary, with no error surfaced anywhere. This is the same bug
class as everything else in this PR (unverified write to a path
something then depends on), just for the tarball itself rather than a
single file -- and it also covers the geo-data files this PR already
fixed once for the interactive/cron path, since they ship inside this
same tarball on every panel update.
Added: a non-empty check on the downloaded archive (both files, both
install.sh call sites) and a check that tar succeeded and produced a
non-empty x-ui binary before proceeding, failing loudly with a message
that explicitly says the previous install is already gone, since
silently continuing here is worse than anywhere else in this PR.
This doesn't make the multi-file extraction fully atomic (that would
mean extracting to a temp directory and atomically swapping the whole
install tree into place, a materially larger restructuring than
anything else in this PR) -- but it closes the "fails silently, user
discovers it days later when Xray can't start" gap, which was the
actual reported problem this whole effort traces back to.
Also fixed, all much smaller:
- replace_xui_script() in x-ui.sh implicitly returned chmod's exit
status instead of success, so a successful atomic install could be
reported as failed if chmod transiently failed after the mv already
landed the new script. Added an explicit `return 0`.
- update_geofiles() had no default case branch; an unrecognized
argument would silently reuse whatever dat_files/dat_source values a
previous call left in the un-scoped globals instead of failing.
Currently unreachable (all three call sites pass fixed literals) but
cheap, defensive, and worth having.
- internal/web/controller/server.go's updatePanel has one branch (an
unparseable "dev" form value) that's both untested and safe to test
on any platform, since it's rejected before any real exec/network
call. Added the missing test case.
Verified: bash -n on all three scripts; an empirical scratch test
covering an empty downloaded archive, a corrupt (non-gzip) archive,
and a successfully-extracting-but-empty archive, confirming each is
caught before the script proceeds; full go build/vet/test -race
across the whole module; frontend generation confirmed still in sync.
* fix(panel): base the update-slot staleness fallback on process liveness
Addresses the automated review on the upstream PR (MHSanaei/3x-ui#5711).
Blocking finding: acquireUpdateSlot's staleness fallback freed the
update slot purely on elapsed wall-clock time (5 minutes), with no
check on whether the update.sh process it launched was actually still
running. update.sh runs install_base() (apt-get/dnf/pacman update and
install) before update_x-ui even starts, plus several GitHub
downloads (release tarball, x-ui.sh, and possibly a service unit or
x-ui.rc) -- on a slow or throttled host, a small VPS being the typical
deployment target for this project, that alone can plausibly exceed 5
minutes with nothing wrong. A second /updatePanel call arriving in
that window (an admin retrying after the frontend's 90s poll times
out, or overlapping master-node bulk-update calls) would launch a
second update.sh, racing the exact rm/tar/mv/systemctl sequence this
whole PR exists to make safe.
Fixed by recording the launched process's PID (detached-fallback path
only; the systemd-run path's own process has already exited by the
time startUpdate returns, so it never learns update.sh's real PID) and
checking it via the standard POSIX kill(pid, 0) liveness probe before
treating a run as stale, following the existing panel_unix.go /
panel_other.go platform-split pattern already used for
setDetachedProcess. A confirmed-alive process now keeps the slot held
past updateStaleAfter (raised from 5 to 20 minutes as a safer baseline
for the systemd-run path, which still has no way to check liveness
directly). updateHardCeiling (2 hours) is an absolute backstop so a
genuinely wedged run can never lock out retries permanently even on
the PID-tracked path.
Added two regression tests exercising the new logic (gated to Linux,
since processAlive is a no-op stub elsewhere): a live PID keeps the
slot held past the stale window, and the hard ceiling overrides
liveness. Traced both by hand against the new acquireUpdateSlot logic;
could not execute-verify processAlive itself on this Windows dev
machine (no WSL distro installed, and installing one felt
disproportionate to validate kill(pid, 0), an extremely well-established
POSIX primitive), but cross-compiled clean for linux/amd64 and this
repo's CI runs the real test suite on Linux.
Also fixed, both suggestions from the same review:
- install.sh: two failure paths right after tarball extraction were
exiting without cleaning up the already-downloaded x-ui.sh temp file
(xui_script_temp), leaving it behind. Every other new failure branch
in this PR removes its temp file before exiting; these two now do
too.
- frontend/src/pages/api-docs/endpoints.ts: updatePanel's doc entry
did not reflect that a successful response now carries an obj with
runId. Added an inline response example matching the existing
pattern used for other ad hoc (non-schema-backed) responses like
getWebCertFiles.
Verified: go build/vet clean on both windows (native) and a linux/amd64
cross-compile; full go test ./... clean; go test -race on the panel
and controller packages; bash -n on all three shell scripts; npm run
gen confirms the openapi.json diff is exactly the new response example
with no stray changes to src/generated; TestAPIRoutesDocumented still
passes.