Today
Top Secret/SCI
Senior Level Career (10+ yrs experience)
$150,000 - $200,000
No Traveling
IT - Software
Bethesda, MD (On-Site/Office)
We are seeking a forward-leaning High Compute Engineer to lead the design, optimization, and integration of GPU-centric high-performance compute environments. The ideal candidate will be responsible for managing existing NVIDIA A100 and DGX-1 systems while designing scalable architectures to incorporate emerging GPU hardware as mission demands evolve.
This role is critical to our advanced compute initiatives, where performance, stability, and future-readiness drive every architectural decision. You'll work cross-functionally with data scientists, AI/ML developers, cybersecurity experts, and infrastructure teams to create a robust, secure, and performant GPU compute ecosystem.
This is a 100% on-site position. All work must be performed at the customer site in Bethesda at the Intelligence Community Campus.
Primary Responsibilities
• Manage, optimize, and monitor existing high-performance GPU systems including NVIDIA A100s and DGX-1 platforms.
• Architect integration plans for scaling GPU compute infrastructure, including newer platforms (e.g., H100, Grace Hopper, AMD Instinct).
• Collaborate with data science teams to fine-tune GPU workloads for AI/ML pipelines.
• Design and implement high-speed networking (InfiniBand/RDMA) and storage solutions optimized for GPU data flow.
• Develop automation workflows using infrastructure-as-code (IaC) tools (e.g., Ansible, Terraform, SaltStack).
• Ensure system security, compliance, and patch management in alignment with NIST, RMF, or agency-specific controls.
• Analyze compute performance metrics and provide strategic recommendations for system enhancements.
• Maintain documentation on system architectures, configurations, and operational procedures.
Basic Qualifications
• Bachelor's or higher degree in Computer Engineering, Computer Science, or a related field with at least 12 years of related technical experience. Additional years of experience may be considered in lieu of a degree.
• 5+ years of experience supporting GPU compute environments in mission-critical or enterprise settings.
• Proficiency with NVIDIA technologies: A100, DGX-1, CUDA, cuDNN, NCCL.
• Strong background in Linux (RHEL/CentOS/Ubuntu), kernel tuning, and HPC stack deployment.
• Experience with containerized GPU workloads using Docker, Kubernetes, and NVIDIA GPU Operator.
• Familiarity with distributed compute frameworks (e.g., SLURM, Kubernetes, Ray).
• Strong scripting skills: Bash, Python, or similar.
• Proven ability to plan and execute large-scale system upgrades and migrations.
• Candidate must, at a minimum, meet DoD 8570.01-M IAT Level II certification requirements (currently Security+ CE, CCNA-Security, GICSP, GSEC, or SSCP along with an appropriate computing environment (CE) certification). An IAT Level III certification would also be acceptable (CASP+, CCNP Security, CISA, CISSP, GCED, GCIH, CCSP).
Clearance
• Due to the nature of the government contracts we support, US Citizenship is required.
• TS/SCI clearance with polygraph required, or TS/SCI clearance with a willingness to obtain a polygraph.
Preferred Qualifications
• Experience with hybrid cloud GPU environments (AWS, GCP, or Azure with NVIDIA support).
• Familiarity with AI/ML tooling such as PyTorch, TensorFlow, ONNX, and RAPIDS.
• Experience integrating GPUs with storage systems (e.g., Lustre, BeeGFS, Ceph).
• Exposure to hardware acceleration platforms (e.g., FPGA, custom ASIC).
Company Benefits
Eligibility requirements apply to some benefits and may depend on your job classification and length of employment. Benefits are subject to change and may be subject to specific elections, plan, or program terms. If eligible, the benefits available for this temporary role may include the following:
• Medical, dental & vision
• Critical Illness, Accident, and Hospital
• 401(k) Retirement Plan – Pre-tax and Roth post-tax contributions available
• Life Insurance (Voluntary Life & AD&D for the employee and dependents)
• Short- and long-term disability
• Health Savings Account (HSA)
• Transportation benefits
• Employee Assistance Program
• Time Off/Leave (PTO, Vacation or Sick Leave)
Accelerating IT transformation in the public sector