Job Requirements
Bethesda, MD
Top Secret/SCI with CI Polygraph
$175,000 - $260,000
Job Description
GPU Systems Engineer (Advanced / Senior)
Overview
We are seeking a highly experienced systems engineer with deep expertise in high-performance computing environments, including GPU-based infrastructure, operating systems, and high-speed networking. This role focuses on designing, optimizing, and maintaining large-scale GPU-enabled environments that support advanced computational workloads, including AI/ML processing and data-intensive applications.
This position requires hands-on work within a secure, on-site environment supporting complex technical systems and mission-critical operations.
Key Responsibilities
Design, configure, and maintain GPU-based compute clusters supporting large-scale processing workloads
Collaborate with cross-functional engineering teams to define system architectures that meet performance, scalability, and efficiency requirements
Integrate GPU platforms into Linux-based environments, ensuring compatibility, reliability, and optimized performance
Analyze system and GPU performance, identify bottlenecks, and implement improvements across hardware and software layers
Develop and maintain tools for debugging, profiling, and performance analysis in Linux environments
Use scripting and automation tools such as Python, Bash, and configuration management frameworks to streamline operations
Maintain technical documentation including system architectures, configurations, and operational procedures
Support compliance efforts and ensure adherence to required security and operational standards
Required Qualifications
Extensive experience in systems engineering, with a focus on high-performance or GPU-enabled environments
Strong background working with GPU data center platforms, including modern accelerator technologies
Experience with enterprise server hardware components, including storage systems, network interfaces, and high-speed interconnects
Advanced knowledge of Linux operating systems, including common enterprise distributions
Proven ability to troubleshoot complex system and infrastructure issues across hardware and software layers
Strong collaboration and problem-solving skills within technical team environments
Relevant industry certification aligned with information assurance or system security requirements
Preferred / Nice-to-Have Skills
Experience managing containerized or orchestrated environments, including Kubernetes-based systems
Familiarity with AI/ML workflow orchestration tools or similar pipeline frameworks
Exposure to GPU virtualization or cloud-based GPU infrastructure
Experience implementing or supporting system monitoring and observability tools
Familiarity with distributed workload scheduling systems used in high-performance computing environments
Work Environment
Full-time, on-site role within a secure operational facility
Daily on-site presence required
Collaboration with multidisciplinary technical teams supporting advanced computing environments
We offer roles across all three clearance levels: Confidential, Secret and Top Secret. With a Top Secret Facilities clearance, a proven subcontractor track record and a deep understanding of agencies across Defense, Intelligence, Homeland, Justice and Federal Civilian Sectors, Kforce brings more than 20 years of experience to supporting critical missions at federal, state and local levels.