Top Secret
Rockville, MD (On-Site/Office)
AI/ML Engineer - Local LLM & RAG Systems
PTFS is seeking an experienced AI/ML Engineer with strong expertise in deploying and managing locally hosted Large Language Models (LLMs) and building Retrieval-Augmented Generation (RAG) pipelines. The ideal candidate has hands-on experience with frameworks such as Ollama, LangChain, LlamaIndex, or VLLM, and is highly skilled in Python-based orchestration, vector search, and scalable data storage systems such as Vector Databases or Apache Solr. This role will be responsible for designing, optimizing, and maintaining our on-premise or air-gapped GenAI infrastructure, integrating new models, and keeping our architecture modular and future-proof.
LLM Deployment & Orchestration
Deploy, run, and optimize locally hosted LLMs using frameworks such as Ollama, VLLM, GPT4All, or HuggingFace Transformers.
- Build and maintain model-serving pipelines with Python, including GPU optimization, quantization, batching, and model switching.
- Implement flexible architecture allowing rapid integration of new open-source or proprietary models.
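As a minimal sketch of what serving a locally hosted model can look like, the snippet below calls Ollama's default local endpoint (`http://localhost:11434/api/generate`); the model name `llama3` in the usage note is illustrative, and a production pipeline would add retries, streaming, and model-switching logic on top:

```python
import json
import urllib.request

# Ollama's default local generate endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON payload for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def query_local_llm(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send a prompt to a locally hosted model and return the completion text."""
    payload = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Usage would be along the lines of `query_local_llm("llama3", "Summarize RAG in one sentence.")` against a running Ollama instance.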
RAG Pipeline Development
Architect end-to-end Retrieval-Augmented Generation (RAG) systems.
- Design and implement vector embedding, indexing, and retrieval layers, including chunking, metadata management, and routing logic.
- Integrate RAG flows using LangChain or LlamaIndex, ensuring low latency and high retrieval accuracy.
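The core retrieval step can be sketched in a few functions: chunk documents, embed each chunk, and rank chunks by similarity to the query. The bag-of-words `embed` below is a toy stand-in; a real pipeline would call a sentence-embedding model and store vectors in one of the databases listed later:

```python
import math
from collections import Counter

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (a common RAG chunking baseline)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real pipeline would use an embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by similarity to the query and return the top k."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

The retrieved chunks would then be stuffed into the model's context window along with the user's question.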
Data Storage and Retrieval
- Develop and maintain Vector Databases such as:
- Pinecone
- Weaviate
- Chroma
- Milvus
- FAISS
- Or architect a schema and search strategy for a Solr-based alternative using traditional indexing/search if vectors are not used.
- Manage ingestion pipelines, embedding generation, and update workflows for newly added data sources.
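One common pattern for the update workflow is content hashing: fingerprint each document and re-embed only the ones whose hash has changed since the last ingestion run. A stdlib-only sketch, with hypothetical doc-id/hash maps standing in for real index state:

```python
import hashlib

def content_hash(doc: str) -> str:
    """Stable fingerprint used to detect changed or newly added documents."""
    return hashlib.sha256(doc.encode("utf-8")).hexdigest()

def plan_updates(incoming: dict[str, str], index_state: dict[str, str]) -> list[str]:
    """Return the doc ids whose embeddings must be (re)generated.

    `incoming` maps doc id -> current text; `index_state` maps doc id -> the
    content hash already recorded alongside the vector index.
    """
    return [
        doc_id
        for doc_id, text in incoming.items()
        if index_state.get(doc_id) != content_hash(text)
    ]
```

Unchanged documents are skipped entirely, which keeps embedding costs proportional to what actually changed.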
Application & API Development
Build backend services and APIs that interact with LLMs, embedding pipelines, and retrieval layers.
- Integrate agents, tools, and orchestration flows using:
- LangChain
- OpenAI function-calling equivalents in local models
- Custom Python toolchains
- Deploy services using Docker, Kubernetes, or local orchestrators when needed.
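A sketch of the "function-calling equivalent" pattern for local models: the model is prompted to emit a JSON tool call, and a thin dispatch layer parses it and runs the matching tool. The `lookup_document` tool here is hypothetical; real registries would expose search, retrieval, and other toolchain functions:

```python
import json
from typing import Callable

# Hypothetical tool registry; real deployments register retrieval, search, etc.
TOOLS: dict[str, Callable[..., str]] = {
    "lookup_document": lambda doc_id: f"contents of {doc_id}",
}

def dispatch_tool_call(model_output: str) -> str:
    """Parse a JSON tool call emitted by a local model and run the matching tool.

    Expects output shaped like {"tool": "...", "arguments": {...}} -- the
    convention many local models are prompted to follow in lieu of native
    function calling.
    """
    call = json.loads(model_output)
    tool = TOOLS.get(call["tool"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['tool']}")
    return tool(**call["arguments"])
```

Frameworks like LangChain wrap this loop (plus retries and schema validation), but the underlying mechanism is this simple parse-and-dispatch step.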
System Performance, Optimization & Monitoring
- Optimize model performance, including:
- quantization (GGUF, GPTQ, AWQ)
- tensor parallelization
- caching strategies
- Monitor system resources for memory, GPU/CPU utilization, and throughput.
- Implement automated pipelines to update models, refresh embedding stores, and version datasets.
Collaboration & Architecture
- Work with cross-functional teams to align the LLM capabilities with business needs.
- Provide guidance on GenAI trends, limitations, and best practices.
- Contribute to documentation and provide internal training when needed.
Required Skills & Experience
Technical Skills
- 3-7+ years of experience in Machine Learning, MLOps, Backend Engineering, or AI Infrastructure.
- Expert-level proficiency in Python and relevant libraries (FastAPI, Pydantic, PyTorch, HuggingFace Ecosystem).
- Hands-on experience with LLM deployment via:
- Ollama
- VLLM
- GPT4All
- HuggingFace Transformers
- LM Studio
- Strong experience with RAG frameworks:
- LangChain
- LlamaIndex
- Proficiency with vector databases (Pinecone, Chroma, Weaviate, FAISS, Milvus).
- Experience with Solr, Elasticsearch, or OpenSearch (schema design, analyzers, indexing).
- Experience developing embeddings pipelines, chunking strategies, and metadata retrieval.
- Familiarity with containerization and orchestration (Docker; Kubernetes optional).
- Strong experience with model inference optimization: quantization, batching, GPU acceleration.
ML/AI Knowledge
- Understanding of foundational LLM mechanics: transformers, tokenization, context windows, prompt engineering.
- Experience with model fine-tuning, LoRA adapters, or supervised fine-tuning (a plus).
- Knowledge of GenAI architectural patterns, agents, routing, tool use, and document indexing strategies.
Preferred Qualifications
- Experience working in air-gapped or on-premise environments.
- Experience with CI/CD for ML systems.
- Familiarity with:
- Nvidia GPU stack (CUDA, cuBLAS, TensorRT)
- DevOps tools (Terraform, Ansible, Helm charts)
- Exposure to hybrid search systems combining vector + keyword retrieval (BM25 + embeddings).
- Experience integrating LLMs into enterprise systems.
Education
- Bachelor's degree in Computer Science, Data Science, Engineering, Mathematics, or related field.
- Master's or higher preferred, but equivalent experience accepted.
Soft Skills
- Strong problem-solving ability and comfort working with ambiguous or evolving requirements.
- Excellent communication and ability to translate technical concepts for non-technical teams.
- Self-driven with a passion for exploring new GenAI technologies and keeping current with evolving LLM tools.
Summary
This role is ideal for someone who enjoys building practical, production-ready AI systems around local LLMs and wants to work at the cutting edge of the GenAI landscape: integrating models, designing robust retrieval systems, and ensuring future scalability.
group id: RTL253009