Job Requirements
Herndon, VA; Springfield, VA
Top Secret/SCI CI Polygraph
Mid Level Career (5+ yrs experience)
Salary not specified
Job Description
Description of Services/Responsibilities:
• Design and execute fine-tuning pipelines for Vision-Language Models (VLMs) on domain-specific imagery datasets, including data preprocessing, training orchestration, and hyperparameter optimization
• Develop and implement evaluation frameworks for multimodal model performance, including task-specific metrics for image understanding, visual question answering, and spatial reasoning
• Build scalable training infrastructure on AWS (SageMaker, EC2 GPU instances) for distributed fine-tuning of large multimodal models
• Engineer data pipelines for curating, annotating, and transforming geospatial imagery datasets into model-ready formats for supervised and instruction-tuning workflows
• Collaborate with applied scientists and solutions architects to iterate on model architectures, adapter strategies (LoRA/QLoRA), and inference optimization techniques
Basic Requirements
• TS/SCI clearance with CI polygraph required, along with current NGA eligibility and SBU/SECNet/COE accounts
• Must be willing to work in a SCIF daily or as needed
• 5+ years of professional machine learning engineering experience with a focus on deep learning
• 1+ years of hands-on experience fine-tuning large foundation models (LLMs or VLMs)
• Experience with parameter-efficient fine-tuning methods (LoRA, QLoRA, adapters)
• Familiarity with supervised fine-tuning, instruction tuning, and RLHF/DPO alignment techniques
• 4+ years of advanced Python development for ML workloads
• Strong proficiency with PyTorch and the HuggingFace ecosystem (Transformers, PEFT, Datasets, Accelerate)
• Experience with distributed training frameworks (DeepSpeed, FSDP, or Megatron)
• 3+ years of experience with computer vision or multimodal models
• Understanding of vision transformer architectures (ViT, CLIP, LLaVA-family models, or similar)
• Experience processing and augmenting image datasets at scale
• 3+ years of experience with AWS ML infrastructure
  • SageMaker Training jobs, Processing jobs, and endpoint deployment
  • GPU instance selection, multi-node training, and cost optimization on EC2 (P4/P5/G5/G6e)
  • S3 data management for large-scale training datasets
• 2+ years of experience building ML evaluation pipelines
  • Automated benchmarking, metric computation, and result analysis
  • Experience with both quantitative metrics and qualitative/human evaluation approaches
• Strong software engineering fundamentals (version control, testing, CI/CD for ML workflows)
Preferred Qualifications:
• 2+ years of experience with geospatial or remote sensing imagery
  • Familiarity with electro-optical and SAR satellite imagery formats and characteristics
  • Understanding of geospatial metadata, coordinate systems, and imagery preprocessing
• Experience with model quantization and inference optimization (vLLM, TensorRT, ONNX)
• Experience with MLOps and experiment tracking tools (MLflow, Weights & Biases, SageMaker Experiments)
• Familiarity with data annotation platforms and active learning workflows for imagery
• Experience with containerized ML workflows (Docker, ECR, ECS/EKS)
• 2+ years of experience with Authority to Operate (ATO) processes in government environments
  • Implementation of NIST 800-53 controls and security compliance for ML systems
• Experience deploying models in air-gapped or disconnected environments
• Familiarity with multimodal evaluation benchmarks (MMMU, MMBench, GQA, or domain-specific equivalents)
• Publications or demonstrated contributions in computer vision, VLMs, or multimodal AI
• Experience with synthetic data generation for training data augmentation