I had to re-run all jobs in the CI workflow at least 10 times in the
past 2 days. The problem is that when one job fails, all jobs need to
re-run, which sometimes results in different jobs failing. It would be
great if we could only re-run the jobs that failed, rather than all the
jobs in the CI workflow. Going forward, we should focus on fixing
flaky tests and speeding up the jobs which take the longest, but for now
this is a good start.
Before this change, we were wasting not only a lot of dev time - 2h in
total for my last PR #1476 - but also CI minutes. Some of us were even
tempted to ignore CI 😱. This is a very slippery slope, and while it may
feel liberating in the short-term, there are many "windmill monsters"
down this path - don't do it.
Have a look at the CI workflow before this change to see how many
failures we had:
https://github.com/dagger/dagger/actions/workflows/ci.yml
Without looking at the jobs that failed, can you guess which areas are
the flakiest and need our attention the most? Integration & Universe are
good guesses, and I wish we could see this without digging into the CI
workflow - this change does that.
There is a lot more that can be improved here, but I didn't want to get
too carried away. The biggest improvement that we can make is to switch
this to Dagger, which has some challenges, but I definitely intend to
tackle them because it feels worth it. This is good enough for now.
This is a ship & show PR. If all tests pass, this is a straight merge. I
am keeping it atomic so that we can revert it if we don't like it.
cc @aluzzardi @talentedmrjones @jlongtine @samalba @shykes @grouville
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Last time this ran, GoReleaser built an artefact with the wrong version
- it didn't bump it correctly. It was meant to build 0.1.0-alpha.33, but
it built 0.1.0-alpha.32 instead:
https://github.com/dagger/dagger/runs/4860126130?check_suite_focus=true#step:7:94
This new approach bumps the tag in a simpler and more explicit way by
using the semver-tool directly. A link to this utility is included in the
comments. We version it in this repository so that it is all
self-contained.
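For illustration only, here is a minimal sketch of the kind of
pre-release bump described above, written in plain POSIX shell so that it
runs on its own. It does not call semver-tool, and `bump_prerel` is a
hypothetical helper name, so the actual workflow step may look different.

```shell
#!/bin/sh
# bump_prerel is a hypothetical helper, not part of semver-tool.
# It increments the trailing numeric identifier of a pre-release version,
# e.g. 0.1.0-alpha.32 -> 0.1.0-alpha.33.
bump_prerel() {
  version=$1
  case $version in
    *-*.*) ;;  # expect <core>-<name>.<number>
    *) echo "no numeric pre-release in: $version" >&2; return 1 ;;
  esac
  prefix=${version%.*}   # everything before the last dot, e.g. 0.1.0-alpha
  number=${version##*.}  # everything after the last dot, e.g. 32
  case $number in
    ''|*[!0-9]*) echo "pre-release suffix is not numeric: $version" >&2; return 1 ;;
  esac
  # SemVer forbids leading zeros in numeric identifiers, so valid input
  # is never misread as octal here.
  echo "$prefix.$(($number + 1))"
}

bump_prerel 0.1.0-alpha.32   # prints 0.1.0-alpha.33
```

Being explicit like this makes the expected next tag easy to check by
hand, which is the failure we hit with the previous approach.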
We also use the gh CLI tool directly, instead of a GitHub Action that
hides the implementation details behind TypeScript. We now have two very
simple gh CLI invocations that do all that. While we still use the
https://github.com/lewagon/wait-on-check-action GitHub Action to wait
on running checks and abort if any check failed, I didn't want to
bundle replacing it into this PR - it's already big enough.
As a meaningful improvement, we should have a Dagger package that bumps
versions. It would have been so much easier to use that Dagger package.
That implies us switching our GitHub Actions to Dagger, which we should
totally do. Small steps ftw!
Step 1: run 0.1.0 release manually
Step 2: run 0.2.0-alpha.1 release manually
Step 3: wait for 0.2.0-alpha.2 to be produced automatically, tomorrow.
Pair: @aluzzardi
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
We fixed a few issues with @shykes & @jlongtine, and @talentedmrjones
gave us this great command to run:
cd pkg/universe.dagger.io/examples/changelog.com/highlevel
dagger up --europa ./gerhard --log debug --log-format plain
7:42PM DBG system | detected buildkit config haveHostNetwork=true isActive=true version=v0.9.3
7:42PM DBG system | loading plan args=[
"./gerhard/"
]
7:42PM DBG system | vendoring packages mod=/Users/gerhard/github.com/gerhard/dagger/pkg/universe.dagger.io
7:42PM DBG system | spawning buildkit job attrs=null localdirs={
"/Users/gerhard/github.com/thechangelog/changelog.com/": "/Users/gerhard/github.com/thechangelog/changelog.com"
}
7:42PM INF actions.test.db.pull._op | computing
7:42PM INF actions.test.run._exec | computing
7:42PM INF inputs.directories.app | computing
7:42PM INF actions.dev.build._dag."0"._op | computing
7:42PM INF actions.test.build._dag."0"._op | computing
7:42PM ERR actions.test.run._exec | failed: invalid FS at path "actions.test.run._exec.input": FS is not set duration=0s
7:42PM DBG inputs.directories.app | loading local directory path=/Users/gerhard/github.com/thechangelog/changelog.com/
7:42PM ERR actions.dev.build._dag."0"._op | canceled duration=0s
7:42PM ERR actions.test.db.pull._op | canceled duration=0s
7:42PM ERR actions.test.build._dag."0"._op | canceled duration=0s
7:42PM ERR inputs.directories.app | canceled duration=0s
7:42PM FTL system | failed to up environment: task failed: actions.test.run._exec: invalid FS at path "actions.test.run._exec.input": FS is not set
The next step is to figure out why this is failing, @jlongtine.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>