Job description
The HPC Infrastructure and Architecture Engineer will join the HPC team and contribute both to the operational services of the University's UL HPC facility (especially at the networking and storage level) and to research and knowledge in HPC, by analyzing user needs and tailoring solutions to match them.
Duties of the position include, but are not limited to:
Contributing to the support of the HPC facilities and associated research infrastructures, at both the first-line level (Computer Help Information Point) and the second-line level (HPC platform maintenance, R&D, SLA enforcement), to troubleshoot and debug problems in our production systems
Assisting the HPC management in planning and designing the future infrastructure (hardware and software), so that it continues to meet user needs in a consistent, flexible and scalable way
Contributing to the development of best practices and cutting-edge, robust technologies in the HPC and DevOps ecosystem of the University
In particular, (s)he is expected to play a leading role in the management of the HPC network (at both the Ethernet and InfiniBand levels) as well as the HPC storage (SpectrumScale / GPFS and Lustre)
Ensuring the quality of the work and meeting deadlines
Serving as a primary point of contact for the users of the UL HPC platform and contributing to the tutoring and training of UL staff members
The HPC Infrastructure and Architecture Engineer will report to the HPC management (head and deputy head). (S)he will also act as a research and development engineer on specific research projects.
For further information, please contact:
sebastien.varrette uni.lu (ENSIMAG Telecoms - 2003)
Master's degree in Computer Science, or an equivalent degree, ideally with a speciality in networking, security and/or distributed computing
Expert knowledge of Linux system administration (especially on Red Hat / CentOS distributions) and solid knowledge of, and experience in, the management of networked computing environments
Certifications in these domains (Red Hat, Cisco, etc.) are considered an asset
Application of system administration best practices in all actions. In particular, the following are considered assets (in addition to the above-mentioned qualifications):
expert-level knowledge of networking (Ethernet), high-speed interconnects (InfiniBand), and network security principles in an HPC environment
expert knowledge of security measures necessary to protect the facility and its data (firewalls, ACLs, network monitoring)
good knowledge of, and experience in, the management of parallel and distributed HPC filesystems (such as SpectrumScale / GPFS or Lustre)
ability to understand, implement, troubleshoot, and support job scheduling, resource management and workload management systems (ideally in a Slurm-based environment), including diagnosis of failed jobs, implementation of policies, and investigation of new features and services
experience in the management and provisioning of virtualized, containerized and cloud environments (Vagrant, Docker / Singularity / Sarus, KVM, OpenStack, etc.)
experience in the monitoring of systems and storage performance, up to and including network components
excellent scripting skills (Python, Ruby, shell) and knowledge of configuration management and monitoring tools (Puppet, Ansible, Icinga, Cacti, etc.)
Experience with algorithms, computational methodologies and software development in the field of computational science
Knowledge of machine learning, AI and / or GPU programming is desirable
Understanding and implementation of IT project management best practices. In particular, ability to manage multiple projects under strict timelines as well as the ability to work well in a demanding, dynamic environment and meet overall objectives
Commitment, teamwork, interpersonal skills and a critical mind
Fluent written and verbal communication skills in English are mandatory. The University of Luxembourg is set in a multilingual context, so knowledge of at least one of the two official languages of Luxembourg (French or German) is an asset
Research experience is a plus
See the attached file