Job Requirements
McLean, VA
Intel Agency (NSA, CIA, FBI, etc.) Full Scope Polygraph
Mid Level Career (5+ yrs experience)
Salary not specified
Job Description
We are seeking a Data Scientist with strong data engineering capabilities to design, build, and maintain scalable data infrastructure and analytics solutions. This role focuses on developing robust data pipelines, enabling data-driven decision-making, and supporting machine learning and advanced analytics use cases across structured and unstructured data environments.
The ideal candidate has strong programming skills, experience working across distributed data systems, and the ability to translate complex technical data problems into reusable, production-ready solutions for stakeholders.
Duties & Responsibilities:
Design, build, and maintain scalable data pipelines and infrastructure supporting analytics, reporting, and machine learning use cases
Develop and optimize ETL and ELT workflows for structured and unstructured data sources
Build and maintain data integration layers across cloud, web, and on-premises systems
Ensure data quality, consistency, security, and reliability across data pipelines and storage systems
Develop data processing solutions using Python, SQL, and Bash scripting in Linux environments
Construct and optimize complex queries across multiple data sources (e.g., PostgreSQL, MySQL, Neo4j, RDS)
Develop and manage ingestion pipelines using tools such as Apache NiFi
Process and transform large-scale datasets from diverse structured and unstructured sources
Develop reusable, tested, and reproducible data workflows and Python-based modules (see the illustrative sketch after this list)
Use Elasticsearch and Kibana for search, indexing, and data visualization use cases
Document technical solutions, data pipelines, and methodologies for both technical and non-technical stakeholders
Communicate findings to stakeholders through written reports, dashboards, and oral briefings
Collaborate across multiple teams to support data-driven decision-making and analytics initiatives
Support knowledge sharing by explaining complex data concepts to junior team members
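
For illustration, a minimal sketch of the kind of reusable Python workflow these duties describe: query a PostgreSQL source and bulk-index the results into Elasticsearch. It assumes the psycopg2 and elasticsearch client libraries; the connection string, query, table, and index name are hypothetical placeholders, not project specifics.

from typing import Iterator

import psycopg2
from elasticsearch import Elasticsearch, helpers

def fetch_records(dsn: str, query: str) -> Iterator[dict]:
    # Stream rows from PostgreSQL as dicts keyed by column name.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(query)
        columns = [col.name for col in cur.description]
        for row in cur:
            yield dict(zip(columns, row))

def to_actions(records: Iterator[dict], index: str) -> Iterator[dict]:
    # Wrap each record in the action format the Elasticsearch bulk API expects.
    for record in records:
        yield {"_index": index, "_source": record}

def run_pipeline(dsn: str, query: str, es_url: str, index: str) -> int:
    # Query the source, index the results, and return the document count.
    es = Elasticsearch(es_url)
    indexed, _ = helpers.bulk(es, to_actions(fetch_records(dsn, query), index))
    return indexed

if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    count = run_pipeline(
        dsn="postgresql://analyst@localhost/reports",
        query="SELECT id, title, body FROM documents",
        es_url="http://localhost:9200",
        index="documents",
    )
    print(f"indexed {count} documents")

Separating the fetch, transform, and load steps into small functions is what makes a workflow like this reusable and testable, which is the pattern the duties above call for.
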
Requirements:
Strong experience in data engineering and data pipeline development, including ETL/ELT design and implementation
Proficiency in Python programming for data processing and automation
Strong experience with SQL and relational database systems
Experience working in Linux environments with advanced Bash scripting
Experience building and managing data pipelines using Apache NiFi or similar tools
Experience processing both structured and unstructured data sources
Experience working with Elasticsearch and Kibana
Experience using Git-based version control systems
Experience using Jupyter Notebooks for analysis and prototyping
Experience delivering technical results through documentation and stakeholder briefings
Strong communication skills and experience working with multiple stakeholders
Experience creating reusable, tested, and maintainable data solutions (a brief testing sketch follows this list)
Academic or professional background in math, statistics, physics, computer science, data science, or related fields
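
As a concrete reading of "reusable, tested" above, a transformation function paired with a pytest-style unit test might look like the following. The function name and record fields are hypothetical.

def normalize_record(raw: dict) -> dict:
    # Lowercase and trim keys, and trim string values, so every downstream
    # step sees a consistent schema regardless of the source system.
    return {
        key.strip().lower(): value.strip() if isinstance(value, str) else value
        for key, value in raw.items()
    }

def test_normalize_record_cleans_keys_and_values():
    raw = {" Title ": "  Quarterly Report ", "PAGES": 12}
    assert normalize_record(raw) == {"title": "Quarterly Report", "pages": 12}
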
Preferred Qualifications:
Experience with cloud platforms such as AWS and cloud-based data architectures
Experience with big data processing frameworks such as Apache Spark or Trino
Experience applying machine learning algorithms and NLP techniques
Experience with containerization technologies such as Docker or Kubernetes
Experience with data visualization tools such as Tableau, Kibana, or Apache Superset
Experience working with or designing machine learning workflows and models
Experience creating training materials or technical curriculum in data or scientific domains
Familiarity with MLOps and production ML workflows
What we offer:
Flexible time off
Full medical coverage
401(k) with company match
Referral bonuses
Performance bonuses
Life insurance and disability coverage
Tuition and training reimbursement