HPC Support Engineer

General Dynamics Information Technology

Today
Public Trust
Rockville, MD (On-Site/Office)

Seize your opportunity to make a personal impact as an HPC Systems Engineer supporting NIH's National Institute of Allergy and Infectious Diseases (NIAID). GDIT is your place to make meaningful contributions to challenging projects and grow a rewarding career.

At GDIT, people are our differentiator. As an HPC Systems Engineer, you will help ensure today is safe and tomorrow is smarter. Our work depends on an HPC Systems Engineer joining our team to bridge the gap between our researchers and our high-performance computing resources. You will be one of the faces of our High Performance Compute (HPC) clusters to the NIAID research community, who will rely on you to help them get their important research work done. You will focus on supporting HPC hardware, installing scientific applications, optimizing submission scripts and running jobs, and monitoring the health of NIAID's HPC clusters: a GPU-focused cluster with 4,000+ cores and a second cluster with 1,500+ cores.

The position is primarily remote; however, you must be able to commute at your own expense to NIAID's datacenter in Rockville, Maryland, approximately once a week to meet contractual obligations.

How an HPC Systems Engineer Will Make an Impact:
  • Work with a GPU-focused 4,000+ core HPC cluster and a 1,500+ core HPC cluster, supporting the hardware and operating system environments
  • Support bioinformatics applications for a large and diverse research community with needs in genomics, cryo-electron microscopy, and AI/ML
  • Monitor the portfolio of software applications and be proactive in planning upgrades and license renewals
  • Monitor and report on cluster performance and generate data to show usage and trends
  • Triage support requests from the research community and work with others in the Scientific Infrastructure team to resolve issues and complete service requests
  • Collaborate with researchers to guide them in effective use of the HPC resources, such as job scheduler submission, data formats, and building data workflows
  • Engage with researchers to understand their HPC needs, including data life cycle management, integration of scientific instruments with HPC, and storage capacity and compute requirements
  • Provide input to the Scientific Infrastructure team leader for setting priorities for cluster operations, scheduling policies, resources needed, etc.
  • Attend and actively participate in daily standup meetings to provide updates on progress, discuss obstacles, and coordinate tasks with other team members
  • Work collaboratively in a team environment to achieve project goals
  • Engage in open communication, share knowledge, and support fellow teammates
  • Provide feedback and contribute to the continuous improvement of team processes


Required Skills and Experience:
  • BS/BA (or equivalent) and 5+ years of related experience
  • 5+ years of experience managing physical servers, datacenters, networking, and related technologies
  • 5+ years of experience managing Linux systems
  • Experience with the Spack package manager, including making packages from PyPI, R, and GitHub
  • Experience installing and packaging GPU applications and optimizing job submission scripts that are used for ML model training, data mining operations, or high-res graphics rendering
  • Experience with Python scripting
  • Experience using Git distributed workflows
  • Experience with Ansible to manage system configuration
  • Experience with Terraform for provisioning systems
  • Ability to translate technical concepts in HPC and research computing to scientists and other non-technical personnel
  • Ability to determine meaningful metrics and usage data for leadership
  • HPC scheduler experience (esp. SLURM)
  • Must be able to obtain an NIH Public Trust


Desired Skills and Experience:
  • 10 years managing Linux systems
  • 10 years managing hardware in datacenters
  • 3 years of experience using Amazon Web Services (AWS)
  • Experience developing Continuous Integration / Continuous Delivery workflows



About Us
We are GDIT: the people supporting and securing some of the most complex government, defense, and intelligence projects across the country. We ensure today is safe and tomorrow is smarter. Our work has meaning and impact on the world around us, but also on us, and that’s important.

GDIT is your place. You make it your own by embracing autonomy, seizing opportunity, and being trusted to deliver your best every day.

GDIT
Opportunity Owned


Clearance Level
Public Trust