- remove the root of the LXC container after destroying it, with sudo as it may contain files owned by root while the runner id is not root
- os.RemoveAll only for native host runs, it is no longer needed for the LXC backend
- remove the CleanUp function that is an indirection with no use
Resolvesforgejo/runner#442
When running the test from a non-root user and without this fix, it fails as follow:
```
go test -v -count=1 -run='TestRunnerLXC' ./internal/app/run
=== RUN TestRunnerLXC
...
time="2025-10-03T15:05:12+02:00" level=debug msg=stopHostEnvironment
time="2025-10-03T15:05:13+02:00" level=debug msg="HostEnvironment.Remove /tmp/TestRunnerLXC1841090130/001/d29c1256e2912892/hostexecutor"
time="2025-10-03T15:05:13+02:00" level=error msg="Error while stop job container FORGEJO-ACTIONS-TASK-0_WORKFLOW-3ede81fbc69d42e6db70bef5820490fc3e7dc4d9dcbfb64981f2d00f08a30d6e_JOB-job: unlinkat /tmp/TestRunnerLXC1841090130/001/d29c1256e2912892/hostexecutor/some/directory/owned/by/root: permission denied"
=== NAME TestRunnerLXC
runner_test.go:469:
Error Trace: /home/earl-warren/software/runner/internal/app/run/runner_test.go:469
/home/earl-warren/software/runner/internal/app/run/runner_test.go:496
Error: Received unexpected error:
Error occurred running finally: unlinkat /tmp/TestRunnerLXC1841090130/001/d29c1256e2912892/hostexecutor/some/directory/owned/by/root: permission denied (original error: <nil>)
Test: TestRunnerLXC
Messages: OK
=== NAME TestRunnerLXC/OK
testing.go:1679: test executed panic(nil) or runtime.Goexit: subtest may have called FailNow on a parent test
=== NAME TestRunnerLXC
testing.go:1267: TempDir RemoveAll cleanup: unlinkat /tmp/TestRunnerLXC1841090130/001/d29c1256e2912892/hostexecutor/some/directory/owned/by/root: permission denied
--- FAIL: TestRunnerLXC (6.84s)
--- FAIL: TestRunnerLXC/OK (6.84s)
FAIL
FAIL code.forgejo.org/forgejo/runner/v11/internal/app/run 6.847s
FAIL
```
<!--start release-notes-assistant-->
<!--URL:https://code.forgejo.org/forgejo/runner-->
- bug fixes
- [PR](https://code.forgejo.org/forgejo/runner/pulls/1054): <!--number 1054 --><!--line 0 --><!--description Zml4OiByZW1vdmUgTFhDIGJhY2tlbmQgbGVmdG92ZXJzIHdoZW4gdGhlIGpvYiBjb21wbGV0ZXM=-->fix: remove LXC backend leftovers when the job completes<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://code.forgejo.org/forgejo/runner/pulls/1054
Reviewed-by: Mathieu Fenniak <mfenniak@noreply.code.forgejo.org>
Co-authored-by: Earl Warren <contact@earl-warren.org>
Co-committed-by: Earl Warren <contact@earl-warren.org>
Additional logging to support #1044.
Manual testing only. Cases tested:
Cancel a job from Forgejo UI; this seems like the most likely missing piece in #1044 as two jobs were simultaneously marked as "Failed". There are codepaths in Forgejo that can set this state to both cancelled and failed, but the runner didn't provide log output indicating that's why a job was stopping:
```
time="2025-10-02T13:22:53-06:00" level=info msg="UpdateTask returned task result RESULT_CANCELLED for a task that was in local state RESULT_UNSPECIFIED - beginning local task termination" func="[ReportState]" file="[reporter.go:410]"
```
Host-based executor hits step timeout in exec, or, is cancelled. This occurred but only logged the `err` from `exec`, not the context error indicating whether it was a timeout or a cancellation:
```
[Test Action/job1] this step has been cancelled: ctx: context deadline exceeded, exec: RUN signal: killed
[Test Action/job1] this step has been cancelled: ctx: context canceled, exec: RUN signal: killed
```
Unable to `ReportState` due to Forgejo inaccessible. If the runner isn't able to update state to Forgejo a job could be considered a zombie; this would trigger one of the codepaths where the job would be marked as failed. If connectivity was later restored, then the runner could identify it was marked as failed and cancel the job context. (This combination doesn't seem likely, but, I think it's reasonable to consider these failures as warnings because there may be unexpected errors here that we're not aware of).
```
time="2025-10-02T13:27:19-06:00" level=warning msg="ReportState error: unavailable: 502 Bad Gateway" func="[RunDaemon]" file="[reporter.go:207]"
```
Runner shutdown logging; just changed up to `Info` level:
```
time="2025-10-02T13:31:36-06:00" level=info msg="forcing the jobs to shutdown" func="[Shutdown]" file="[poller.go:93]"
[Test Action/job1] ❌ Failure - Main sleep 120
[Test Action/job1] this step has been cancelled: ctx: context canceled, exec: RUN signal: killed
```
<!--start release-notes-assistant-->
<!--URL:https://code.forgejo.org/forgejo/runner-->
- bug fixes
- [PR](https://code.forgejo.org/forgejo/runner/pulls/1048): <!--number 1048 --><!--line 0 --><!--description Zml4OiBpbXByb3ZlIGxvZ2dpbmcgdG8gZGlhZ25vc2UgbXlzdGVyeSBqb2IgdGVybWluYXRpb25z-->fix: improve logging to diagnose mystery job terminations<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://code.forgejo.org/forgejo/runner/pulls/1048
Reviewed-by: earl-warren <earl-warren@noreply.code.forgejo.org>
Co-authored-by: Mathieu Fenniak <mathieu@fenniak.net>
Co-committed-by: Mathieu Fenniak <mathieu@fenniak.net>
The working directory was not cleaned up upon completion of a LXC job because rc.stopJobContainer() -> rc.cleanUpJobContainer() -> rc.JobContainer.Remove() was never called for LXC containers.
- stopContainer() and closeContainer() must not call
rc.stopHostEnvironment(ctx) for LXC containers because
- it will needlessly be called twice
- it intercepts the call to
- rc.stopJobContainer()
- rc.JobContainer.Close()
- rc.stopHostEnvironment(ctx) must be called in rc.cleanUpJobContainer which is indirectly called by rc.stopJobContainer()
- since rc.JobContainer.Close() is a noop, not calling it for LXC containers had no consequence
Resolvesforgejo/runner#442
<!--start release-notes-assistant-->
<!--URL:https://code.forgejo.org/forgejo/runner-->
- bug fixes
- [PR](https://code.forgejo.org/forgejo/runner/pulls/1003): <!--number 1003 --><!--line 0 --><!--description Zml4OiByZW1vdmUgTFhDIHdvcmtpbmcgZGlyZWN0b3J5IHdoZW4gaXQgY29tcGxldGVz-->fix: remove LXC working directory when it completes<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://code.forgejo.org/forgejo/runner/pulls/1003
Reviewed-by: Mathieu Fenniak <mfenniak@noreply.code.forgejo.org>
Co-authored-by: Earl Warren <contact@earl-warren.org>
Co-committed-by: Earl Warren <contact@earl-warren.org>
the license change from MIT to GPLv3+ is a breaking change
Refs forgejo/runner#773
<!--start release-notes-assistant-->
<!--URL:https://code.forgejo.org/forgejo/runner-->
- other
- [PR](https://code.forgejo.org/forgejo/runner/pulls/940): <!--number 940 --><!--line 0 --><!--description Y2hvcmU6IGJ1bXAgdmVyc2lvbiB0byB2MTE=-->chore: bump version to v11<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://code.forgejo.org/forgejo/runner/pulls/940
Reviewed-by: Michael Kriese <michael.kriese@gmx.de>
Co-authored-by: Earl Warren <contact@earl-warren.org>
Co-committed-by: Earl Warren <contact@earl-warren.org>
If a --health-cmd is defined for a container, block until its status is healthy or unhealthy. The timeout is defined by the server internal logic based on associated --health-* defined delays. If it blocks indefinitely, the job timeout will eventually cancel it.
While waiting, the simplest solution would be to sleep 1 second until the container is healthy or unhealthy. To minimize log verbosity, the sleep interval is instead set to --health-interval and default to one second if it is not defined.
This logic does not apply to host containers as they do not support services. They are assumed to always be healthy.
If --health-cmd is set for the container running a job, the first step will start to run without waiting for the container to become healthy. There may be valid use cases for that but they are not the focus of this implementation.
<!--start release-notes-assistant-->
<!--URL:https://code.forgejo.org/forgejo/runner-->
- features
- [PR](https://code.forgejo.org/forgejo/runner/pulls/805): <!--number 805 --><!--line 0 --><!--description ZmVhdDogd2FpdCBmb3Igc2VydmljZXMgdG8gYmUgaGVhbHRoeSBiZWZvcmUgc3RhcnRpbmcgYSBqb2I=-->feat: wait for services to be healthy before starting a job<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://code.forgejo.org/forgejo/runner/pulls/805
Co-authored-by: Earl Warren <contact@earl-warren.org>
Co-committed-by: Earl Warren <contact@earl-warren.org>
It will be imported by Forgejo.
<!--start release-notes-assistant-->
<!--URL:https://code.forgejo.org/forgejo/runner-->
- other
- [PR](https://code.forgejo.org/forgejo/runner/pulls/777): <!--number 777 --><!--line 0 --><!--description Y2hvcmU6IHRvIGFsbG93IHRoZSBydW5uZXIgdG8gYmUgaW1wb3J0ZWQsIHY5IG5lZWRzIHRvIGJlIGluIHRoZSBnbyBtb2R1bGU=-->chore: to allow the runner to be imported, v9 needs to be in the go module<!--description-->
<!--end release-notes-assistant-->
Reviewed-on: https://code.forgejo.org/forgejo/runner/pulls/777
Reviewed-by: Michael Kriese <michael.kriese@gmx.de>
Co-authored-by: Earl Warren <contact@earl-warren.org>
Co-committed-by: Earl Warren <contact@earl-warren.org>
- upgrade to golangci-lint@v1.62.2
- make it renovate friendly
- remove most frequent lint check that are not of consequence (unused
args, etc.)
- fix remaining lint errors
- add renovate custom manager to update the Makefile variable
act PR https://github.com/nektos/act/pull/1682
* shell script to start the LXC container
* create and destroy a LXC container
* run commands with lxc-attach
* expose additional devices for docker & libvirt to work
* install node 16 & git for checkout to work
[FORGEJO] start/stop lxc working directory is /tmp
[FORGEJO] use lxc-helpers to create/destroy containers
[FORGEJO] do not setup LXC
(cherry picked from commit 5b94ff3226848791b93e72d2e0f0ee4bba29a989)
Conflicts:
pkg/container/host_environment.go
Conflicts:
pkg/container/host_environment.go
[FORGJEO] upgrade to node20
* feat: CopyTarStream
Prepare for new process and thread safe action cache
* fix unused param
---------
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
* refactor: share UpdateFromEnv logic
* Add test for GITHUB_OUTPUT
Co-authored-by: Ben Randall <veleek@gmail.com>
* Add GITHUB_STATE test
* Add test for the old broken parser
Co-authored-by: Ben Randall <veleek@gmail.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>