
AI/ML Engineer

PTFS

Posted today
Top Secret
Rockville, MD (On-Site/Office)

AI/ML Engineer - Local LLM & RAG Systems

PTFS is seeking an experienced AI/ML Engineer with strong expertise in deploying and managing locally hosted Large Language Models (LLMs) and building Retrieval-Augmented Generation (RAG) pipelines. The ideal candidate has hands-on experience with frameworks such as Ollama, LangChain, LlamaIndex, or vLLM, and is highly skilled in Python-based orchestration, vector search, and scalable data storage systems such as vector databases or Apache Solr. The role is responsible for designing, optimizing, and maintaining our on-premise or air-gapped GenAI infrastructure, integrating new models, and keeping our architecture modular and future-proof.

LLM Deployment & Orchestration

  • Deploy, run, and optimize locally hosted LLMs using frameworks such as Ollama, vLLM, GPT4All, or HuggingFace Transformers.
  • Build and maintain model-serving pipelines with Python, including GPU optimization, quantization, batching, and model switching.
  • Implement a flexible architecture allowing rapid integration of new open-source or proprietary models.
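For illustration, the model-switching responsibility above might be sketched as a small registry that decouples calling code from serving backends; the model names and backend callables here are hypothetical stand-ins for real Ollama or vLLM clients, not PTFS's actual stack:

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical backend signature: prompt in, completion out.
Backend = Callable[[str], str]

@dataclass
class ModelRegistry:
    """Maps model names to serving backends, so a new open-source or
    proprietary model can be registered without touching calling code."""
    backends: Dict[str, Backend]

    def register(self, name: str, backend: Backend) -> None:
        self.backends[name] = backend

    def generate(self, model: str, prompt: str) -> str:
        if model not in self.backends:
            raise KeyError(f"unknown model: {model}")
        return self.backends[model](prompt)

# Stub backends standing in for real Ollama/vLLM client calls.
registry = ModelRegistry(backends={})
registry.register("llama-stub", lambda p: f"[llama] {p}")
registry.register("mistral-stub", lambda p: f"[mistral] {p}")

print(registry.generate("llama-stub", "hello"))  # [llama] hello
```

In production the lambdas would be replaced by HTTP or in-process clients for the actual serving frameworks, with the registry keeping the integration surface to a single function signature.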

RAG Pipeline Development

  • Architect end-to-end Retrieval-Augmented Generation (RAG) systems.
  • Design and implement vector embedding, indexing, and retrieval layers, including chunking, metadata management, and routing logic.
  • Integrate RAG flows using LangChain or LlamaIndex, ensuring low latency and high retrieval accuracy.
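The embed-score-retrieve core of a RAG system can be sketched in a few lines; the hashed bag-of-words "embedding" below is a deliberately toy, deterministic placeholder for a real local embedding model, and the sample chunks are invented:

```python
import math
import zlib
from typing import List, Tuple

def embed(text: str, dim: int = 64) -> List[float]:
    """Toy hashed bag-of-words embedding (deterministic via crc32);
    a real pipeline would call a local embedding model instead."""
    v = [0.0] * dim
    for tok in text.lower().replace(".", "").split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    return v

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Embed the query, score every chunk, return the top-k."""
    q = embed(query)
    return sorted(((cosine(q, embed(c)), c) for c in chunks), reverse=True)[:k]

chunks = [
    "Ollama serves quantized models on local GPUs",
    "Apache Solr handles keyword indexing and search",
    "FAISS stores dense vectors for similarity lookup",
]
print(retrieve("keyword search with solr", chunks, k=1)[0][1])
```

A production retrieval layer would swap in real embeddings and a vector database, and add the metadata filtering and routing logic the bullet above describes, but the query-to-top-k shape stays the same.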


Data Storage and Retrieval

  • Develop and maintain Vector Databases such as:
    • Pinecone
    • Weaviate
    • Chroma
    • Milvus
    • FAISS
  • Alternatively, architect a schema and search strategy for a Solr-based system using traditional keyword indexing/search if vectors are not used.
  • Manage ingestion pipelines, embedding generation, and update workflows for newly added data sources.
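One common shape for the ingestion and update workflow above is overlap chunking plus content-hash deduplication, so unchanged documents are not re-embedded on every run. This is a minimal sketch with an in-memory dict standing in for the vector store; all names are illustrative:

```python
import hashlib
from typing import Dict, List

def chunk(text: str, size: int = 200, overlap: int = 40) -> List[str]:
    """Character-window chunking with overlap so context spans chunk edges."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step) if text[i:i + size]]

def upsert(store: Dict[str, str], doc_id: str, text: str, seen: Dict[str, str]) -> bool:
    """Re-ingest a document only when its content hash changed (update workflow).
    Simplification: stale chunks from a shrunken document are not deleted here."""
    digest = hashlib.sha256(text.encode()).hexdigest()
    if seen.get(doc_id) == digest:
        return False  # unchanged: skip re-chunking and re-embedding
    seen[doc_id] = digest
    for n, c in enumerate(chunk(text)):
        store[f"{doc_id}#{n}"] = c  # in production: embed c, write to the vector DB
    return True
```

The same hash-gate works whether the downstream store is Pinecone, Milvus, FAISS, or a Solr index.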


Application & API Development

  • Build backend services and APIs that interact with LLMs, embedding pipelines, and retrieval layers.
  • Integrate agents, tools, and orchestration flows using:
    • LangChain
    • OpenAI function-calling equivalents in local models
    • Custom Python toolchains
  • Deploy services using Docker, Kubernetes, or local orchestrators when needed.
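A function-calling equivalent for local models without native tool support usually means having the model emit a JSON "tool call" that a Python dispatcher parses and executes. A minimal sketch, with hypothetical tool names:

```python
import json
from typing import Any, Callable, Dict

# Registry of tools the model may call; names and signatures are illustrative.
TOOLS: Dict[str, Callable[..., Any]] = {
    "lookup_doc": lambda doc_id: f"contents of {doc_id}",
    "add": lambda a, b: a + b,
}

def dispatch(model_output: str) -> Any:
    """Parse a JSON tool call emitted by a local model and invoke the
    matching Python function, mimicking OpenAI-style function calling."""
    call = json.loads(model_output)
    fn = TOOLS.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown tool: {call['name']}")
    return fn(**call.get("arguments", {}))

print(dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}'))  # 5
```

In a real service the dispatcher would sit behind an API endpoint, validate arguments against a schema (e.g. with Pydantic), and feed the tool result back into the model's context.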

System Performance, Optimization & Monitoring
  • Optimize model performance, including:
    • Quantization (GGUF, GPTQ, AWQ)
    • Tensor parallelization
    • Caching strategies
  • Monitor system resources for memory, GPU/CPU utilization, and throughput.
  • Implement automated pipelines to update models, refresh embedding stores, and version datasets.
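Of the caching strategies mentioned above, the simplest is a bounded prompt-to-completion cache, so repeated prompts skip inference entirely. A minimal LRU sketch (a real deployment might key on a hash of prompt plus sampling parameters):

```python
from collections import OrderedDict
from typing import Optional

class LRUCache:
    """Bounded prompt->completion cache; repeated prompts skip inference,
    one of the cheapest latency wins available in LLM serving."""

    def __init__(self, capacity: int = 128) -> None:
        self.capacity = capacity
        self._data: "OrderedDict[str, str]" = OrderedDict()

    def get(self, prompt: str) -> Optional[str]:
        if prompt not in self._data:
            return None
        self._data.move_to_end(prompt)  # mark as most recently used
        return self._data[prompt]

    def put(self, prompt: str, completion: str) -> None:
        self._data[prompt] = completion
        self._data.move_to_end(prompt)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used

cache = LRUCache(capacity=2)
cache.put("What is RAG?", "Retrieval-Augmented Generation...")
```

KV-cache reuse and semantic (embedding-similarity) caching build on the same idea at lower levels of the stack.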


Collaboration & Architecture

  • Work with cross-functional teams to align the LLM capabilities with business needs.
  • Provide guidance on GenAI trends, limitations, and best practices.
  • Contribute to documentation and provide internal training when needed.


Required Skills & Experience
Technical Skills

  • 3-7+ years of experience in Machine Learning, MLOps, Backend Engineering, or AI Infrastructure.
  • Expert-level proficiency in Python and relevant libraries (FastAPI, Pydantic, PyTorch, HuggingFace Ecosystem).
  • Hands-on experience with LLM deployment via:
    • Ollama
    • vLLM
    • GPT4All
    • HuggingFace Transformers
    • LM Studio
  • Strong experience with RAG frameworks:
    • LangChain
    • LlamaIndex
  • Proficiency with vector databases (Pinecone, Chroma, Weaviate, FAISS, Milvus).
  • Experience with Solr, Elasticsearch, or OpenSearch (schema design, analyzers, indexing).
  • Experience developing embeddings pipelines, chunking strategies, and metadata retrieval.
  • Familiarity with containerization and orchestration (Docker, Kubernetes optional).
  • Strong experience with model inference optimization: quantization, batching, GPU acceleration.


ML/AI Knowledge

  • Understanding of foundational LLM mechanics: transformers, tokenization, context windows, prompt engineering.
  • Experience with model fine-tuning, LoRA adapters, or supervised fine-tuning (a plus).
  • Knowledge of GenAI architectural patterns, agents, routing, tool use, and document indexing strategies.

Preferred Qualifications

  • Experience working in air-gapped or on-premise environments.
  • Experience with CI/CD for ML systems.
  • Familiarity with:
    • Nvidia GPU stack (CUDA, cuBLAS, TensorRT)
    • DevOps tools (Terraform, Ansible, Helm charts)
  • Exposure to hybrid search systems combining vector + keyword retrieval (BM25 + embeddings).
  • Experience integrating LLMs into enterprise systems.

Education

  • Bachelor's degree in Computer Science, Data Science, Engineering, Mathematics, or related field.
  • Master's or higher preferred, but equivalent experience accepted.


Soft Skills
  • Strong problem-solving ability and comfort working with ambiguous or evolving requirements.
  • Excellent communication and ability to translate technical concepts for non-technical teams.
  • Self-driven with a passion for exploring new GenAI technologies and keeping current with evolving LLM tools.


Summary

This role is ideal for someone who enjoys building practical, production-ready AI systems, particularly local LLMs, and wants to work at the cutting edge of the GenAI landscape: integrating models, designing robust retrieval systems, and ensuring future scalability.
group id: RTL253009