Optimizing and parallelizing algorithms in interpreted languages

Presented During: Maximizing scientific efficiency through sustainability, reproducibility, and FAIRness

Andrew Eck, Presenter
Washington University in St. Louis
St. Louis, MO
United States

Tuesday, Jun 24: 9:00 AM - 1:00 PM
Educational Course - Half Day (4 hours)

Brisbane Convention & Exhibition Centre

Room: P2 (Plaza Level)

Description

Interpreted languages are an excellent tool for prototyping algorithms, but a major tradeoff compared to compiled languages is slower performance. This talk will focus on techniques for identifying and addressing performance bottlenecks in Python and MATLAB specifically. Both languages offer a profiler that shows which sections of code use the most processing time, and profiling should come first, before any time is invested in optimization.

Once the targets for optimization are known, several families of approaches can speed up processing. One is to check whether loops in an algorithm can be replaced by “vectorized” operations. Although both languages are interpreted, they are supported in part by libraries of compiled code, and vectorizing an algorithm can replace repeated calls to a function on small amounts of data with a single call on all available data. Replacing a loop with a single function call reduces the overhead of loop management and of memory transfer between the interpreted client and the compiled functions, and these compiled functions are often already designed to minimize computational complexity on large blocks of data.

Both languages also offer parallelization, which in certain cases is another route to faster algorithms. Sometimes parallelization happens implicitly, and both Python and MATLAB also offer multiple options for parallelizing an algorithm explicitly. This talk will address when parallelization is a good candidate for reducing processing time, because its advantages must be weighed against its disadvantages. While parallelization lets users take advantage of as many processors as are available, there is also memory and processing overhead involved in initializing, running, and collecting results from the parallel workers. Both languages offer options to reduce this overhead, and the effectiveness of these options will be discussed.

All elements of this talk will be supported by interactive segments, and Jupyter notebooks with examples of these concepts will be made available in both MATLAB and Python.
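
As a rough illustration of the profile-first, then vectorize workflow described above, here is a minimal Python sketch. It is not taken from the course notebooks. It assumes NumPy is installed, and the names rms_loop, rms_vectorized, and profile_call are hypothetical helpers invented for this example. The loop version computes a root-mean-square one element at a time in the interpreter, the vectorized version makes a few whole-array calls into NumPy's compiled code, and both are run under the standard-library cProfile profiler so the reports can be compared.

```python
import cProfile
import io
import pstats

import numpy as np


def rms_loop(samples):
    """Root-mean-square computed one element at a time in the interpreter."""
    total = 0.0
    for value in samples:
        total += value * value
    return (total / len(samples)) ** 0.5


def rms_vectorized(samples):
    """Same quantity, computed with whole-array calls into compiled NumPy code."""
    return float(np.sqrt(np.mean(np.square(samples))))


def profile_call(func, *args):
    """Run func(*args) under cProfile and return (result, text report).

    Hypothetical helper for this sketch: the report lists the five entries
    with the largest cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


# Illustrative data only: 200,000 pseudo-random samples with a fixed seed.
rng = np.random.default_rng(seed=0)
samples = rng.standard_normal(200_000)

loop_result, loop_report = profile_call(rms_loop, samples)
vec_result, vec_report = profile_call(rms_vectorized, samples)

# Compare the cumulative time reported for each version.
print(loop_report)
print(vec_report)
```

The two functions should agree to within floating-point rounding. How much faster the vectorized version runs depends on the machine and the array size, which is why the profiler reports are printed rather than assumed.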