Performance & Reliability Engineer

Dunhill Professional Search

Posted today
Public Trust
Engineering - Mechanical
San Antonio, TX (On-Site/Office)

Job Details

Performance & Reliability Engineer

San Antonio, TX: Hybrid

US Citizenship

We are seeking a Performance & Reliability Engineer to support the EDUCATION- DCC program. This is a great opportunity for someone who enjoys collaborating across teams, solving complex technical challenges, and improving system reliability.

Job Description: You will play a crucial role in maintaining and enhancing the reliability, availability, and performance of our applications and services. You will leverage your expertise in AWS operations, infrastructure as code, and deployment automation to streamline processes, reduce downtime, and improve overall system performance.

Key Responsibilities:

• Ensure the reliability, availability, and performance of applications and services through proactive monitoring, incident response, and capacity planning.

• Manage and optimize AWS cloud infrastructure to support scalable and resilient application operations.

• Develop, implement, and maintain infrastructure as code using tools such as Terraform, CloudFormation, or similar.

• Automate deployment processes to ensure consistent and reliable delivery of software updates and infrastructure changes.

• Collaborate with development teams to design and implement solutions that enhance system performance and reliability.

• Conduct root cause analysis for incidents and implement strategies to prevent recurrence.

• Establish and maintain monitoring, alerting, and logging frameworks to ensure visibility into system health and performance.

• Participate in on-call rotations to provide 24/7 support for critical systems and applications.

• Drive continuous improvement initiatives to enhance operational efficiency and reduce technical debt.

Minimum Qualifications
  • Bachelor's degree in Information Technology, Computer Science, or a related field, or equivalent relevant experience.
  • 0-3 years of experience in information technology, systems administration, or another IT-related field.


Job Qualifications:

• Strong expertise in AWS cloud services, including EC2, S3, RDS, Lambda, etc.

• Proficiency in infrastructure as code tools such as Terraform, CloudFormation, or similar.

• Experience with deployment automation tools and frameworks (e.g., Jenkins, Ansible, Puppet, Chef).

• Solid understanding of monitoring, alerting, and logging tools (e.g., Dynatrace, Splunk, Prometheus, Grafana, ELK Stack).

• Strong scripting and automation skills using languages such as Python, Bash, or PowerShell.

• Excellent problem-solving and troubleshooting skills.

• Strong communication and collaboration abilities.

Other Job Specific Skills
  • Strong knowledge of Microsoft operating systems and products, including Microsoft Windows, Windows Server, Microsoft Office 365, SharePoint, and Microsoft Teams.
  • Applies standard methodology, techniques, procedures and criteria.
  • Ability to analyze, troubleshoot, and resolve basic/routine system hardware, software, or networking-related problems.
  • Ability to plan and coordinate the deployment of new technology and resolve technical problems individually and as a project participant.
  • Ability to communicate effectively, both orally and in writing, and to translate technical terminology into terms understandable to non-technical employees.
  • Exceptional customer service skills.
  • Experience with cloud infrastructure, digital workspace, and storage technology preferred.

