Job Requirements
Washington, DC
Clearance: Dept of Homeland Security
Polygraph: Unspecified
Career Level not specified
$130,000 - $155,000
Job Description
Piper Companies is looking for a Data Engineer to join a government integrator in Washington, DC. This is a hybrid position and requires the candidate to be onsite 3 days a week and possess an active Top Secret or DHS clearance.
Essential Duties of the Data Engineer:
- Design, develop, and optimize data pipelines and architectures that support data-driven decision-making across AI and ML initiatives
- Collaborate with data scientists, analysts, and other stakeholders to ensure data availability, integrity, and quality
- Implement ETL (Extract, Transform, Load) processes to integrate data from various sources into centralized systems
- Design, develop, and maintain data models to support advanced analytics initiatives, AI/ML, Generative AI, and predictive analytics
- Work with relational databases (e.g., Oracle, PostgreSQL, MySQL, Redshift) to support data integrity and consistency
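As a purely illustrative sketch of the ETL duty described above (this example is not from the posting; the table name, columns, and sample data are hypothetical, and SQLite stands in for the relational databases named), a minimal extract-transform-load flow in Python might look like:

```python
# Illustrative ETL sketch (hypothetical data and schema): extract rows from a
# CSV source, transform them, and load them into a centralized relational table.
import csv
import io
import sqlite3

# Extract: read records from a CSV source (an in-memory sample here; in
# practice this would be a file, API, or database extract).
SOURCE = "id,amount\n1,10.5\n2,3.25\n3,8.0\n"
rows = list(csv.DictReader(io.StringIO(SOURCE)))

# Transform: cast types and derive a flag for downstream analytics.
records = [
    (int(r["id"]), float(r["amount"]), float(r["amount"]) > 5.0)
    for r in rows
]

# Load: insert into a relational table (SQLite stands in for
# Oracle/PostgreSQL/MySQL/Redshift).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txns (id INTEGER PRIMARY KEY, amount REAL, large INTEGER)"
)
conn.executemany("INSERT INTO txns VALUES (?, ?, ?)", records)
total = conn.execute("SELECT SUM(amount) FROM txns").fetchone()[0]
print(total)  # 21.75
```

Real pipelines add the concerns listed elsewhere in this posting (idempotency, retry logic, data quality checks, orchestration), but the extract/transform/load structure is the same.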
Qualifications of the Data Engineer:
- Bachelor's Degree in Mathematics, Computer Science, Information Systems, or a related discipline
- 6+ years of progressive experience in data science, advanced analytics, data visualization, and reporting, with demonstrated ownership of analytical solutions from concept through delivery and operationalization
- Proven ability to lead the design, development, and deployment of data-driven solutions, including AI/ML models, predictive analytics, and business intelligence products, in production environments
- Advanced proficiency in Python for data manipulation, automation, and development of scalable analytical workflows
- Strong expertise in SQL and relational databases (e.g., Oracle, PostgreSQL, MySQL), with the ability to design efficient data models and support complex data integration needs
- Extensive experience developing automated data pipelines and analytics workflows using Python, R, SQL, and related tools (e.g., Pandas, R Shiny), with a focus on scalability, reliability, and maintainability
Compensation for the Data Engineer:
- $130,000 - $155,000 (based on experience)
- Comprehensive benefits package: Cigna Medical, Cigna Dental, Vision, 401k w/ ADP, PTO, paid holidays, and sick leave as required by law
This job opens for applications on 4/6/26. Applications will be accepted for at least 30 days from the posting date.
#LI-HYBRID
#LI-BM2
data engineering, data pipeline design, end-to-end pipeline orchestration, data ingestion, batch ingestion, streaming ingestion, real-time processing, near-real-time processing, ETL, ELT, data extraction, data transformation, data loading, schema design, schema evolution, schema enforcement, schema-on-read, schema-on-write, data modeling, dimensional modeling, star schema, snowflake schema, fact tables, dimension tables, slowly changing dimensions, SCD Type 1, SCD Type 2, SCD Type 3, normalization, denormalization, data architecture, lakehouse architecture, data lake, data warehouse, data mart, operational data store, ODS, columnar storage, row-based storage, partitioning strategy, clustering, sharding, bucketing, data pruning, predicate pushdown, query optimization, cost-based optimizer, query execution plan, SQL performance tuning, indexing strategy, composite index, bitmap index, materialized views, window functions, CTEs, subqueries, joins, join optimization, broadcast join, shuffle join, aggregation optimization, incremental processing, change data capture, CDC, log-based CDC, snapshot-based CDC, idempotent pipelines, exactly-once semantics, at-least-once semantics, event-time processing, watermarking, late-arriving data handling, data consistency, data freshness, data latency, data availability, SLA management, pipeline reliability, retry logic, backpressure handling, fault tolerance, pipeline resilience, scalable pipelines, horizontal scaling, vertical scaling, distributed systems, task parallelism, data parallelism, partition-aware processing, orchestration frameworks, workflow orchestration, DAG design, dependency management, pipeline scheduling, backfill processing, pipeline reprocessing, parameterized pipelines, Python-based pipelines, SQL-based transformations, hybrid SQL-Python workloads, PySpark, Spark SQL, vectorized execution, pandas optimization, memory management, performance profiling, pipeline observability, logging, monitoring, alerting, 
metrics instrumentation, data quality checks, data validation, data profiling, anomaly detection, data completeness checks, data accuracy checks, data consistency checks, deduplication, data cleansing, data standardization, null handling, outlier handling, reference data management, master data management, MDM, data lineage, metadata management, data catalog, semantic layer, business metrics definitions, feature engineering pipelines, feature extraction, feature transformation, feature stores, offline feature store, online feature store, training-serving skew prevention, reusable features, ML-ready datasets, dataset versioning, data version control, reproducible pipelines, experiment tracking, data reproducibility, AI/ML data readiness, model training pipelines, model inference pipelines, model monitoring, data drift detection, concept drift detection, pipeline automation, CI/CD for data pipelines, data unit tests, pipeline integration testing, data contract testing, contract-driven development, schema contracts, backward compatibility, forward compatibility, versioned schemas, governance frameworks, data access control, row-level security, column-level security, data masking, tokenization, anonymization, PII handling, compliance enforcement, audit logging, privacy-by-design, cost optimization, query cost control, storage optimization, compression techniques, parquet optimization, partition pruning, workload isolation, multi-tenant data architecture, resource management, job scheduling optimization, SQL analytics, advanced SQL analytics, Python user-defined functions, UDF optimization, vectorized UDFs, data science enablement, analytics engineering, metrics layer development, self-service analytics enablement, decision intelligence pipelines
group id: 10430981