Posted today
Public Trust
Early Career (2+ yrs experience)
Unspecified
IT - Data Science
Remote/Hybrid•Atlanta, GA (Off-Site/Hybrid)
CODE Plus, Inc. is an experienced IT government contractor based in Fairfax, VA, with offices in Huntsville, AL and Oak Ridge, TN. We have been in business for 30 years, serving agencies across the Federal sector. Our mission is to deliver high-quality, cost-effective solutions that empower our clients to achieve their goals. At CODEplus, we value teamwork, integrity, and technical excellence, and we pride ourselves on maintaining long-standing partnerships built on trust and results.
Our CODEplus team is looking for a Cloudera Data Engineer in Atlanta, GA (Hybrid).
In this role, you’ll work with a variety of technologies across the Hadoop and Cloudera ecosystems to move, transform, and optimize data from multiple sources. You’ll partner with data engineers, analysts, and developers to make sure the data infrastructure is efficient, scalable, and secure—enabling smarter, faster decisions for our clients.
You’ll spend your days designing and deploying data solutions, improving data processing performance, and supporting the overall health of our data environment. This position is ideal for someone who enjoys hands-on technical work, problem-solving, and continuous learning in a collaborative, team-oriented setting.
What You’ll Do
Design and build data pipelines to extract, transform, and load (ETL) large data sets from multiple sources into the Cloudera environment.
Manage and optimize data infrastructure for high performance, reliability, and scalability across both on-premises and cloud environments.
Develop and maintain scripts and workflows using Python, Java, Scala, or Pig to automate data processing and integration tasks.
Collaborate with cross-functional teams to understand data requirements, develop solutions, and ensure data is accurate and available for analytics and reporting.
Monitor and troubleshoot Cloudera environments, leveraging tools such as Cloudera Manager and Hue for system health, tuning, and debugging.
Use generative AI tools (like GitHub Copilot, Codex, or Claude Code) to enhance development efficiency and code quality.
Basic Qualifications
3+ years of experience using Hadoop technologies (including Spark) to ingest, transform, and process data
Experience with Cloudera installation, configuration, tuning, and administration
Experience developing and managing NiFi pipelines for data ingestion and transformation
Experience leveraging generative AI coding tools to assist in development
Experience working in SQL (Hive, Spark SQL, or Impala) for querying and managing data within the Cloudera ecosystem
Experience with public cloud platforms such as AWS or Microsoft Azure
Knowledge of Python, Java, Scala, or Bash for data engineering and automation
Strong collaboration and communication skills, with the ability to work effectively in a cross-functional team environment
Ability to obtain and maintain a Public Trust or Suitability/Fitness determination based on client requirements
Bachelor’s degree
Nice to Have
Experience with Terraform for infrastructure automation and deployment
Experience with CI/CD tools and DevOps best practices
Knowledge of data governance, metadata management, and data catalog tools
Ability to optimize queries and resource usage for better performance and efficiency