SIG/HPC Meeting 2024-03-07
Attendees
- Forrest Burt
- Brian Phan
- Sherif Nagy
- Enrico Billi
- Neil Hanlon
- Jeremy Siadal
- Chris Stackpole
Old Business
- Intel Driver -
- Sherif is working on this, has a prototype, needs DKMS
- Used
make spec
script in the branch to create spec, and import from there
- We think that upstream should adopt a different format/packaging methodology
- Perhaps packit could be helpful?
- What branch/version to use?
- rhel-specific branches say not to use them; use the 'backports' branches instead
- sherif appears to be in the right place
- Next steps:
- Neil to bring dkms from epel into projects
- Sherif to upload to public location for review and testing
- Jeremy to work on testing with some latest hardware
- AI SIG
- where will userspace tools live? HPC? AI? Both?
- Neil: it should be reasonable for us to have the ability to easily release a package in multiple SIGs
- NVidia GPU driver Testing -
- Did not get time to review Chris's work - will try to review this cycle
- Kernel Cnode / MoS
- re-actioning - Jeremy to work on once he has some time
New Business
- Testing Warewulf - Brian
- Current plan: put the tests upstream into Warewulf repo, Testing team can pull from / engage with upstream
- What precisely are we going to test?
- Functional/E2E tests -- provision a small cluster, etc (see last week's discussions)
- Future work can include e.g. slurm
- Chris to check on status of slurm
- Packages to bring in
- List on the wiki; needs updating (along with the rest of the wiki)
- if anyone wants to bring something in, has questions, etc. Please ask/get in touch!
- Neil to update the wiki
Open Floor
- Vulnerability in lustre - related to user namespaces
- Sherif was working on lustre-server, but it's a beast
- DDN already builds RPMS, but... is it worth it to rebuild vs just use upstream?
- Sherif: thinks it makes sense to rebuild against our specific user/kernel space
- there are lustre-server for 8, but not 9, it appears.. why?
- documentation supports this but again.. why?
- Sherif to look into why lustre-server exists for 8 but not 9
- Next meeting in two weeks on Thursday, March 1
Action Items
- [ ] Chris to check on status of slurm
- [ ] Neil to update the wiki
- [ ] Sherif to look into why lustre-server exists for 8 but not 9
- [ ] Neil to bring dkms from epel into projects
- [ ] Sherif to upload to public location for review and testing
- [ ] Jeremy to work on testing with some latest hardware