Machine Learning Performance Engineer

Northeastern University

About the Opportunity

This job description is intended to describe the general nature and level of work being performed by people assigned to this classification. It is not intended to be construed as an exhaustive list of all responsibilities, duties and skills required of personnel so classified.

JOB SUMMARY

As part of the Research Computing (RC) team at Northeastern University (NU), the Machine Learning Performance Engineer (MLPE) supports the incredible growth of RC’s user base and computing infrastructure, as well as the ever-increasing need to support Artificial Intelligence/Deep Learning (AI/DL) workloads. The MLPE will help improve the overall reliability and efficiency of GPU-based software applications and optimize the per-watt performance of ML models on the NU High Performance Computing (HPC) resources. The MLPE will provide key support throughout the full development cycle of ML models to ensure their optimal performance for faculty research groups, and will help increase both the adoption and the efficient use of the university’s HPC resources for AI/DL workloads.

The key areas of focus will be to serve as a catalyst for the effective use of NU computing resources at the Massachusetts Green High Performance Computing Center (MGHPCC): classifying performance test results, predicting workload performance bottlenecks, detecting and monitoring anomalies, and contributing to the software development lifecycle (SDLC). In support of the research enterprise, the MLPE will deploy strategic AI/ML cloud technologies leveraging both commercial and open-source cloud platforms, as well as portable, performant solutions across NURC’s on-prem and cloud infrastructure.

This position will also work with faculty and researchers, advise members of the research community on best practices, and assist them in getting the most out of NU’s cloud and HPC service offerings. The MLPE will also participate in and provide assistance with research proposals related to computational science and the use of HPC solutions for a diverse range of scientific research.

At Team ITS, your success matters as much as the mission. Learn more about our flexible, highly dynamic, and values-first culture at careers.its.northeastern.edu.

This position is eligible for remote work.

MINIMUM QUALIFICATIONS

Expert knowledge of an object-oriented language (e.g., C++, Java). Proficiency in GPUs and GPU-based software applications; machine learning, modeling, measurement techniques, testing, and statistical methods; and Python. Ability to work with faculty to build technically focused proposals around cloud and HPC solutions. Excellent time management skills. Ability to manage multiple projects simultaneously, plan and implement project specifications, report project status, and identify delays or resource shortages. Ability to work with high-functioning teams to provide timely build/response for a variety of technology projects. Excellent verbal and written communication skills, with an ability to communicate solutions by providing both technical and non-technical interpretations of models and results.

Knowledge and skills required for this role are typically acquired through a combination of formal education and experience:

  • Master’s degree in a computational science or a related field.
  • Minimum of 2-3 years of experience in performance optimization.
  • Minimum of 1-2 years of experience in high performance computing.
  • Minimum of 1-2 years of research experience in a higher education or government setting.

Additionally, experience should include:

  • Linking the performance of hardware and software components of scientific applications, with emphasis on GPUs and GPU-based applications.
  • Applying optimization techniques, including SIMD (SSE, AVX), vectorization, loop dependencies, multithreading, multi-processor usage, and tensor cores.
  • Working with diverse communities regarding complex computing requirements and capabilities.
  • Working with batch management systems (e.g., Slurm, PBS, SGE), including cluster configuration and management tools.
  • Leveraging open-source or commercial cloud technologies (e.g., Open Science Grid, AWS, Azure, GCP).

KEY RESPONSIBILITIES & ACCOUNTABILITIES

  • Profile various algorithms to analyze performance and identify bottlenecks.
    • Profiling includes data loading, data movement, data caching, operation count, execution chipset, warm-up latency and others.
  • Implement solutions to the identified bottlenecks. 40%
  • Understand performance test result classification based on hardware problems, data problems, performance bugs in the code, configuration problems, new features causing regression, and/or framework problems. 20%
  • Apply Artificial Intelligence and Machine Learning to perform analysis of datapoints collected through execution of multiple tests over time and use results to predict performance bottlenecks. 20%
  • Enhance communication with the faculty about technology in support of their research. 10%
  • Forecast and anticipate the need for new HPC technologies and services to improve NU’s standing as a state-of-the-art educational and research environment. 10%
  • Ensure the maintenance and/or creation of documentation, training (internal and external), and communication in support of the high performance computing infrastructure.
  • Follow advancements on the cloud, research, and parallel computing fronts. 10%

Position Type

Research

Additional Information

Northeastern University is an equal opportunity employer, seeking to recruit and support a broadly diverse community of faculty and staff. Northeastern values and celebrates diversity in all its forms and strives to foster an inclusive culture built on respect that affirms inter-group relations and builds cohesion.

All qualified applicants are encouraged to apply and will receive consideration for employment without regard to race, religion, color, national origin, age, sex, sexual orientation, disability status, or any other characteristic protected by applicable law.

To learn more about Northeastern University’s commitment and support of diversity and inclusion, please see www.northeastern.edu/diversity.

To apply, visit https://northeastern.wd1.myworkdayjobs.com/en-US/careers/job/Boston-MA-Main-Campus/Machine-Learning-Performance-Engineer_R108800jeid-97b85b86efa7b14cb800e3a08ad7dbf6

Northeastern is an Equal Opportunity/Affirmative Action, Title IX educational institution and employer. Minorities, women, and persons with disabilities are strongly encouraged to apply.
