[WIP] Compatibility layer for EESSI version 2026.x #235

Draft: bedroge wants to merge 56 commits into EESSI:main from bedroge:2026.x

Conversation

Collaborator

bedroge commented May 8, 2026

Still WIP, but testing some new functionality already...

Collaborator Author

bedroge commented May 11, 2026

bot: build repo:eessi.io-2025.06-compat arch:x86_64/generic

Collaborator Author

bedroge commented May 11, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 11, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156325

date job status comment
May 11 19:48:57 UTC 2026 submitted job id 156325 awaits release by job manager
May 11 19:49:15 UTC 2026 released job awaits launch by Slurm scheduler
May 11 19:55:17 UTC 2026 running job 156325 is running
May 11 19:57:20 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156325.out
❌ some task failed
❌ no tarball found
Artefacts
Details
May 11 19:57:20 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-156325.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 11, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 11, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156326

date job status comment
May 11 21:20:19 UTC 2026 submitted job id 156326 awaits release by job manager
May 11 21:20:30 UTC 2026 released job awaits launch by Slurm scheduler
May 11 21:25:33 UTC 2026 running job 156326 is running
May 12 00:33:38 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156326.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1778545846.tar.gz
size: 1316 MiB (1380624192 bytes)
entries: 183549
May 12 00:33:38 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-156326.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 12, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 12, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156327

date job status comment
May 12 07:28:32 UTC 2026 submitted job id 156327 awaits release by job manager
May 12 07:29:10 UTC 2026 released job awaits launch by Slurm scheduler
May 12 07:35:13 UTC 2026 running job 156327 is running
May 12 07:56:42 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156327.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1778572503.tar.gz
size: 389 MiB (408295890 bytes)
entries: 173293
May 12 07:56:42 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-156327.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 12, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 12, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156328

date job status comment
May 12 08:17:53 UTC 2026 submitted job id 156328 awaits release by job manager
May 12 08:18:48 UTC 2026 released job awaits launch by Slurm scheduler
May 12 08:19:51 UTC 2026 running job 156328 is running
May 12 14:40:14 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156328.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1778596658.tar.gz
size: 1327 MiB (1391465593 bytes)
entries: 179430
May 12 14:40:14 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-156328.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 12, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 12, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/156329

date job status comment
May 12 15:00:39 UTC 2026 submitted job id 156329 awaits release by job manager
May 12 15:01:20 UTC 2026 released job awaits launch by Slurm scheduler
May 12 15:02:23 UTC 2026 running job 156329 is running
May 12 19:02:42 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-156329.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1778612402.tar.gz
size: 1442 MiB (1512120372 bytes)
entries: 196326
May 12 19:02:42 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-156329.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 26, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 26, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/160520

date job status comment
May 26 08:09:38 UTC 2026 submitted job id 160520 awaits release by job manager
May 26 08:10:17 UTC 2026 released job awaits launch by Slurm scheduler
May 26 08:11:20 UTC 2026 running job 160520 is running
May 26 12:08:19 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-160520.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1779797095.tar.gz
size: 1957 MiB (2052887348 bytes)
entries: 205922
May 26 12:08:19 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-160520.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 26, 2026

It got all the way to the end, but the tests failed when trying to install ReFrame due to DNS issues. This is because we had already replaced /etc/resolv.conf by a variant symlink, but that doesn't work in the container, meaning that it's an invalid symlink. Not sure how to fix that, other than skipping the tests in the playbook, and running them in the actual test step of the bot inside a container with a CVMFS mount (and, hence, actually using the variant symlinks). That will require a bit of work though.

Contributor

boegel commented May 27, 2026

> It got all the way to the end, but the tests failed when trying to install ReFrame due to DNS issues. This is because we had already replaced /etc/resolv.conf by a variant symlink, but that doesn't work in the container, meaning that it's an invalid symlink. Not sure how to fix that, other than skipping the tests in the playbook, and running them in the actual test step of the bot inside a container with a CVMFS mount (and, hence, actually using the variant symlinks). That will require a bit of work though.

Isn't there a default target for that variant symlink? There definitely should be...
Why doesn't it work in a container?
Can't we bind-mount the /etc/resolv.conf from the host?

Collaborator Author

bedroge commented May 27, 2026

> > It got all the way to the end, but the tests failed when trying to install ReFrame due to DNS issues. This is because we had already replaced /etc/resolv.conf by a variant symlink, but that doesn't work in the container, meaning that it's an invalid symlink. Not sure how to fix that, other than skipping the tests in the playbook, and running them in the actual test step of the bot inside a container with a CVMFS mount (and, hence, actually using the variant symlinks). That will require a bit of work though.
>
> Isn't there a default target for that variant symlink? There definitely should be... Why doesn't it work in a container? Can't we bind-mount the /etc/resolv.conf from the host?

The issue is that these variant symlinks are a CVMFS feature. You can create them without CVMFS, but then the target is just a weird-looking string / non-existing path like resolv.conf -> '$(EESSI_COMPAT_ETC_RESOLV_CONF):-/etc/resolv.conf'. So you'd really need the CVMFS client to turn that into an actual symlink, and this is done at mount time. I was hoping that I could make them work by adding them to the overlay-upper directory and launching the EESSI container, but obviously that will not work as they're not actually part of the CVMFS repo that way.
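
To illustrate the point about creating such a link without CVMFS, here is a minimal Python sketch. It assumes a plain local filesystem and reuses the target string exactly as written in this comment; the helper names `make_variant_symlink` and `describe` are hypothetical and not part of the playbook. It only shows that the kernel treats the target as an ordinary (non-existing) relative path.

```python
import os
import tempfile

# Target string copied from the comment above. Outside CVMFS nothing expands it,
# so it is just a relative path that does not exist.
VARIANT_TARGET = "$(EESSI_COMPAT_ETC_RESOLV_CONF):-/etc/resolv.conf"


def make_variant_symlink(directory: str, name: str, target: str) -> str:
    """Create a symlink whose target is the raw variant string; return its path."""
    link = os.path.join(directory, name)
    os.symlink(target, link)
    return link


def describe(link: str) -> dict:
    """Report what a plain (non-CVMFS) filesystem sees for this symlink."""
    return {
        "is_symlink": os.path.islink(link),
        "target": os.readlink(link),
        # os.path.exists follows the link, so it is False for a dangling link.
        "resolves": os.path.exists(link),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(describe(make_variant_symlink(tmp, "resolv.conf", VARIANT_TARGET)))
```

On a regular filesystem the link is created without complaint, but it is dangling, which matches the DNS failure seen when the tests run outside a CVMFS mount.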

I tried looking for some other FUSE-based tool to use variant symlinks in Apptainer, and found for instance https://github.com/cccheng/varlinkfs. But the ones that I found won't work in the way we need it to work. Varlinkfs, for instance, only allows for a single directory, and the naming of the variant symlinks is different.

What I could do in the Ansible playbook is to first make regular symlinks by stripping the $(EESSI_COMPAT_ETC_RESOLV_CONF):- part, and overwrite them with variant symlinks after the tests have run. I don't fully like that, but I can't really think of a better and easy solution.
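
A rough sketch of that two-phase idea, written in Python rather than as Ansible tasks. The function names `default_target` and `replace_symlink` are hypothetical, and the parsing assumes the variant targets look exactly like the `$(VAR):-/path` string quoted above; the real playbook may do this differently.

```python
import os
import tempfile


def default_target(variant_target: str) -> str:
    """Return the fallback path of a target written as '$(VAR):-/path'.

    Targets that do not follow that pattern are returned unchanged.
    """
    marker = "):-"
    if variant_target.startswith("$(") and marker in variant_target:
        return variant_target.split(marker, 1)[1]
    return variant_target


def replace_symlink(link: str, target: str) -> None:
    """Point `link` at `target`, replacing any existing link via a rename."""
    tmp_link = link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)  # renames the link itself, does not follow it


if __name__ == "__main__":
    variant = "$(EESSI_COMPAT_ETC_RESOLV_CONF):-/etc/resolv.conf"
    with tempfile.TemporaryDirectory() as prefix_etc:
        link = os.path.join(prefix_etc, "resolv.conf")
        # Phase 1: regular symlink to the default target, usable during the tests.
        replace_symlink(link, default_target(variant))
        print(os.readlink(link))
        # Phase 2: after the tests, switch to the variant symlink.
        replace_symlink(link, variant)
        print(os.readlink(link))
```

Renaming a temporary link over the old one avoids a window in which `resolv.conf` does not exist at all, which seemed safer than remove-then-create for a file that DNS lookups depend on.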

Collaborator Author

bedroge commented May 28, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 28, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/161100

date job status comment
May 28 13:29:54 UTC 2026 submitted job id 161100 awaits release by job manager
May 28 13:30:53 UTC 2026 released job awaits launch by Slurm scheduler
May 28 13:31:57 UTC 2026 running job 161100 is running
May 28 17:30:53 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-161100.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1779989158.tar.gz
size: 2382 MiB (2498289369 bytes)
entries: 283543
May 28 17:30:53 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-161100.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 28, 2026

> > > It got all the way to the end, but the tests failed when trying to install ReFrame due to DNS issues. This is because we had already replaced /etc/resolv.conf by a variant symlink, but that doesn't work in the container, meaning that it's an invalid symlink. Not sure how to fix that, other than skipping the tests in the playbook, and running them in the actual test step of the bot inside a container with a CVMFS mount (and, hence, actually using the variant symlinks). That will require a bit of work though.
> >
> > Isn't there a default target for that variant symlink? There definitely should be... Why doesn't it work in a container? Can't we bind-mount the /etc/resolv.conf from the host?
>
> The issue is that these variant symlinks are a CVMFS feature. You can create them without CVMFS, but then the target is just a weird-looking string / non-existing path like resolv.conf -> '$(EESSI_COMPAT_ETC_RESOLV_CONF):-/etc/resolv.conf'. So you'd really need the CVMFS client to turn that into an actual symlink, and this is done at mount time. I was hoping that I could make them work by adding them to the overlay-upper directory and launching the EESSI container, but obviously that will not work as they're not actually part of the CVMFS repo that way.
>
> I tried looking for some other FUSE-based tool to use variant symlinks in Apptainer, and found for instance https://github.com/cccheng/varlinkfs. But the ones that I found won't work in the way we need it to work. Varlinkfs, for instance, only allows for a single directory, and the naming of the variant symlinks is different.
>
> What I could do in the Ansible playbook is to first make regular symlinks by stripping the $(EESSI_COMPAT_ETC_RESOLV_CONF):- part, and overwrite them with variant symlinks after the tests have run. I don't fully like that, but I can't really think of a better and easy solution.

For now I've solved it by creating regular symlinks right after a new Prefix is built, and replacing them with variant symlinks at the very end. That should probably fix it.

I've also asked an AI friend if it would be possible to implement a FUSE-based tool that would allow us to use the actual variant symlinks. It actually came up with a Python script that does the trick and mounts a directory with variant symlinks at a given mount point (read-only though), where the variant symlinks can be configured by setting environment variables (before mounting, and you can only modify them by remounting). We could take that approach as well, and do the read-only mount for the bot's test step. But I found it a bit cumbersome, so I'm testing the other approach first.
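
For reference, the core of such a tool is the resolution step, which can be sketched without any FUSE code. This is not the script mentioned above; it is a hypothetical illustration that assumes the `$(VAR):-/path` notation used in this thread and assumes shell-like semantics in which an unset or empty variable falls back to the default. The class name `VariantResolver` is made up for this example.

```python
import re
from typing import Mapping

# Matches the notation used in this thread: '$(NAME):-/fallback/path'.
_VARIANT = re.compile(r"^\$\((?P<name>[A-Za-z_][A-Za-z0-9_]*)\):-(?P<default>.*)$")


class VariantResolver:
    """Resolve variant symlink targets against an environment snapshot."""

    def __init__(self, env: Mapping[str, str]):
        # Copy the environment once, mimicking "configured before mounting":
        # later changes to the caller's mapping have no effect.
        self._env = dict(env)

    def resolve(self, target: str) -> str:
        """Return the concrete path for `target`; plain targets pass through."""
        match = _VARIANT.match(target)
        if match is None:
            return target
        value = self._env.get(match.group("name"))
        # Assumption: an unset or empty variable falls back to the default.
        return value if value else match.group("default")


if __name__ == "__main__":
    import os

    resolver = VariantResolver(os.environ)
    print(resolver.resolve("$(EESSI_COMPAT_ETC_RESOLV_CONF):-/etc/resolv.conf"))
```

A FUSE layer would call something like `resolve` from its `readlink` handler; the snapshot in the constructor is what makes a remount necessary to pick up changed variables.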

Collaborator Author

bedroge commented May 28, 2026

The last build failed due to a download issue for an elfx86exts dependency (https://crates.io/api/v1/crates/anstyle-parse/0.2.2/download). Let's try again...

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 28, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/161102

date job status comment
May 28 17:47:25 UTC 2026 submitted job id 161102 awaits release by job manager
May 28 17:47:59 UTC 2026 released job awaits launch by Slurm scheduler
May 28 17:49:02 UTC 2026 running job 161102 is running
May 28 21:44:20 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-161102.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1780004393.tar.gz
size: 2382 MiB (2498302871 bytes)
entries: 283543
May 28 21:44:20 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-161102.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 29, 2026

Same issue...

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 29, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/161347

date job status comment
May 29 05:16:52 UTC 2026 submitted job id 161347 awaits release by job manager
May 29 05:16:56 UTC 2026 released job awaits launch by Slurm scheduler
May 29 05:20:59 UTC 2026 running job 161347 is running
May 29 09:21:51 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-161347.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1780046207.tar.gz
size: 2382 MiB (2498643892 bytes)
entries: 283543
May 29 09:21:51 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-161347.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 29, 2026

Failed again with a 403 Forbidden, and a manual curl on the AWS cluster results in

{"errors":[{"detail":"We are unable to process your request at this time. This usually means that you are in violation of our API data access policy (https://crates.io/data-access). Please email help@crates.io and provide the request id 4a52a4cd-c9cd-fc08-bfc1-7fb306d4548a"}]}

Looks like it's related to gentoo/gentoo@3929f73, so let me try a newer commit.

Collaborator Author

bedroge commented May 29, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 29, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/161416

date job status comment
May 29 18:54:49 UTC 2026 submitted job id 161416 awaits release by job manager
May 29 18:55:37 UTC 2026 released job awaits launch by Slurm scheduler
May 29 19:00:40 UTC 2026 running job 161416 is running
May 29 22:59:28 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-161416.out
❌ some task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1780095367.tar.gz
size: 1980 MiB (2076767570 bytes)
entries: 206581
May 29 22:59:28 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-161416.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 30, 2026

The build finally worked again; I just need to fix some failing tests (which are run as an Ansible task at the end of the build).

bedroge added 2 commits May 30, 2026 15:18
Add task to create regular symlinks after building Prefix.

Collaborator Author

bedroge commented May 30, 2026

The symlinks were skipped, so I moved that task to install_prefix.yml.

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=x86_64/generic


eessi-bot-aws Bot commented May 30, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: x86_64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/161706

date job status comment
May 30 13:19:51 UTC 2026 submitted job id 161706 awaits release by job manager
May 30 13:20:33 UTC 2026 released job awaits launch by Slurm scheduler
May 30 13:25:36 UTC 2026 running job 161706 is running
May 30 17:26:24 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-161706.out
✅ no task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-x86_64-1780161770.tar.gz
size: 1986 MiB (2082478502 bytes)
entries: 206407
May 30 17:26:24 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-161706.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 30, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-mc-aws for:arch=aarch64/generic


eessi-bot-aws Bot commented May 30, 2026

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-compat
Building on: generic
Building for: aarch64/generic
Job dir: /project/def-users/SHARED/jobs/2026.05/pr_235/161708

date job status comment
May 30 19:52:08 UTC 2026 submitted job id 161708 awaits release by job manager
May 30 19:52:58 UTC 2026 released job awaits launch by Slurm scheduler
May 30 19:58:01 UTC 2026 running job 161708 is running
May 31 00:02:37 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-161708.out
✅ no task failed
✅ found tarball
Artefacts
eessi-2026.06-compat-linux-aarch64-1780185524.tar.gz
size: 1939 MiB (2033491280 bytes)
entries: 206386
May 31 00:02:37 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
Failed for unknown reason
Details
✅ job output file slurm-161708.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

Collaborator Author

bedroge commented May 30, 2026

bot: build repo:eessi.io-2025.06-compat instance:eessi-bot-riscv for:arch=riscv64/generic


riscv-eessi-io-bot Bot commented May 30, 2026

New job on instance eessi-bot-riscv for repository eessi.io-2025.06-compat
Building on: generic
Building for: riscv64/generic
Job dir: /home/eessibot/shared/jobs/2026.05/pr_235/309526

date job status comment
May 30 21:18:13 UTC 2026 submitted job id 309526 awaits release by job manager
May 30 21:18:47 UTC 2026 released job awaits launch by Slurm scheduler
May 30 21:19:51 UTC 2026 running job 309526 is running


2 participants