SIG/HPC Meeting 2024-03-07

Attendees

  • Forrest Burt
  • Brian Phan
  • Sherif Nagy
  • Enrico Billi
  • Neil Hanlon
  • Jeremy Siadal
  • Chris Stackpole

Old Business

  • Intel Driver -
  • Sherif is working on this, has a prototype, needs DKMS
    • Used make spec script in the branch to create spec, and import from there
    • We think that upstream should adopt a different format/packaging methodology
    • Perhaps packit could be helpful?
  • What branch/version to use?
    • rhel-specific branches say not to use them; use the 'backports' branches instead
    • Sherif appears to be in the right place
  • Next steps:
    • Neil to bring DKMS from EPEL into projects
    • Sherif to upload the prototype to a public location for review and testing
    • Jeremy to work on testing with some of the latest hardware
  • AI SIG
    • where will userspace tools live? HPC? AI? Both?
    • Neil: it should be reasonable for us to have the ability to easily release a package in multiple SIGs
  • NVIDIA GPU driver testing -
  • Did not get time to review Chris's work - will try to review this cycle
  • Kernel Cnode / MoS
  • Re-actioning - Jeremy to work on this once he has some time

New Business

  • Testing Warewulf - Brian
  • Current plan: put the tests upstream into Warewulf repo, Testing team can pull from / engage with upstream
    • What precisely are we going to test?
    • Functional/E2E tests -- provision a small cluster, etc. (see last week's discussions)
    • Future work can include e.g. Slurm
    • Chris to check on status of Slurm
  • Packages to bring in
  • List on the wiki; needs updating (along with the rest of the wiki)
  • If anyone wants to bring something in or has questions, please ask/get in touch!
  • Neil to update the wiki

Open Floor

  • Vulnerability in Lustre - related to user namespaces
  • Sherif was working on lustre-server, but it's a beast
  • DDN already builds RPMs, but... is it worth it to rebuild vs. just using upstream?
    • Sherif: thinks it makes sense to rebuild against our specific user/kernel space
    • lustre-server appears to exist for 8 but not 9... why?
    • documentation supports this, but again... why?
    • Sherif to look into why lustre-server exists for 8 but not 9
  • Next meeting in two weeks on Thursday, March 21

Action Items

  • [ ] Chris to check on status of Slurm
  • [ ] Neil to update the wiki
  • [ ] Sherif to look into why lustre-server exists for 8 but not 9
  • [ ] Neil to bring DKMS from EPEL into projects
  • [ ] Sherif to upload the Intel driver prototype to a public location for review and testing
  • [ ] Jeremy to work on testing the Intel driver with some of the latest hardware