fb09e2a1c6
I had to re-run all jobs in the CI workflow at least 10 times in the past 2 days. The problem is that when one job fails, all jobs need to be re-run, which sometimes results in different jobs failing. It would be great if we could re-run only the jobs that failed, rather than all the jobs in the CI workflow. Going forward, we should focus on fixing flaky tests and speeding up the jobs that take the longest, but for now this is a good start. Before this change, we were wasting not only a lot of dev time - 2h in total for my last PR #1476 - but also CI minutes. Some of us were even tempted to ignore CI 😱. This is a very slippery slope, and while it may feel liberating in the short term, there are many "windmill monsters" down this path - don't do it. Have a look at the CI workflow before this change to see how many failures we had: https://github.com/dagger/dagger/actions/workflows/ci.yml Without looking at the jobs that failed, can you guess which areas are the flakiest and need our attention the most? Integration & Universe are good guesses, and I wish we could see this without digging into the CI workflow - this change does that. There is a lot more that can be improved here, but I didn't want to get too carried away. The biggest improvement that we can make is to switch this to Dagger, which has some challenges, but I definitely intend to tackle them because it feels worth it. This is good enough for now. This is a ship & show PR. If all tests pass, this is a straight merge. I am keeping it atomic so that we can revert it if we don't like it. cc @aluzzardi @talentedmrjones @jlongtine @samalba @shykes @grouville Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk> |
||
---|---|---|
auto-release.yml | ||
docs.yml | ||
lint.yml | ||
release.yml | ||
test-docs.yml | ||
test-integration.yml | ||
test-unit.yml | ||
test-universe.yml | ||
website.yml |