HPC Systems Engineer

Mount Indie, LLC

Posted today
Top Secret
Unspecified
Polygraph
IT - Software
Huntsville, AL (On-Site/Office)

In this role, your daily impact spans the entire spectrum of systems engineering. One hour, you might be performing routine lifecycle maintenance (patching a fleet of RHEL workstations or managing user identities across a heterogeneous domain) to ensure the baseline stability of our enterprise. The next, you are diving into the high-performance fabric, debugging a latency spike on an InfiniBand card or fine-tuning the Slurm scheduler to prioritize a mission-critical simulation.

You aren't just managing boxes; you are the bridge between raw silicon and national security breakthroughs. Whether it's the methodical "hardening" of a standard server build to meet SAP requirements or the high-adrenaline optimization of a multi-petabyte Lustre filesystem, your work ensures that our researchers never have to wait on the infrastructure to catch up with their imagination. This position is 100% on-site.

Responsibilities
  • Architect & Deploy: Lead the design and lifecycle management of mission-critical Linux workstations, enterprise-grade servers, and high-performance computing (HPC) clusters.
  • Engineer Filesystems: Master the art of data movement. Administer complex local and distributed filesystems (Lustre, GPFS/Spectrum Scale) to ensure extreme-speed access across the fabric.
  • Infrastructure as Code (IaC): Treat the data center as a codebase. Develop sophisticated automation workflows using Python, Bash, and Ansible to eliminate manual toil and ensure drift-free configurations.
  • Defensive Engineering: Implement "Hardened by Design" security. Fine-tune SELinux policies and advanced firewall configurations to protect sensitive data without sacrificing computational performance.
  • Container Orchestration: Modernize scientific workflows by deploying and managing isolated environments using Podman while working to establish a Kubernetes environment.
  • HPC Performance Tuning: Push the limits of the silicon. Optimize cluster scheduling and management utilizing industry-leading tools like Bright Cluster Manager and Slurm.
  • Low-Latency Networking: Configure and optimize high-bandwidth networking, including InfiniBand fabrics, for seamless inter-node communication.
  • Technical Documentation: Author high-fidelity playbooks and strategic architectural diagrams that serve as the blueprint for our evolving infrastructure.

Required Experience
  • Bachelor's degree in a related field, or equivalent high-level professional experience in mission-critical environments
  • 1 to 10 years of related experience
  • U.S. citizenship required. Active DoD Top Secret security clearance with eligibility for SCI, along with successful completion of a CI Scope Polygraph within 180 days of hire
  • Ability and willingness to obtain and maintain Special Access Program (SAP) eligibility
  • Active DoD 8570.01-M baseline certification (Security+ CE, SSCP, or equivalent)
  • Deep professional experience in Linux systems engineering (RHEL/Rocky preferred)

Preferred Qualifications
  • Active TS/SCI clearance with a current CI Polygraph
  • Advanced Certification: RHCE, RHCSA, or similar
  • Direct experience tuning kernel parameters and MPI libraries for large-scale distributed computing
  • Expertise in VMware, Nutanix, or KVM within a heterogeneous environment that includes Windows integration