July 09, 2024
By Janea Systems
PyTorch is one of the most popular open-source deep learning frameworks, alongside TensorFlow. With a large, active community and many third-party tools built on top of it, the framework is used by the likes of Meta, Uber, and Microsoft, and it’s been widely adopted in academic research and mobile/edge deployments.
For the past two years, Janea Systems has been maintaining PyTorch on Windows on behalf of Microsoft, ensuring parity across different OSs so that everyone can benefit from PyTorch. In this article, we’ll explain how we ensured a crucial module worked on PyTorch for Windows, enabling seamless sparse matrix operations and expanding the framework's capabilities for Windows developers.
MKL (Math Kernel Library) is a software library developed by Intel that provides optimized mathematical functions for various computational tasks.
The MKL Sparse Matrix module is the part of this library that deals specifically with sparse matrices. Sparse matrices are matrices in which most entries are zero, with only a few non-zero entries. They are commonly used in scientific computing and data analysis to represent large datasets efficiently. The module offers the following:
Efficient storage: It stores only the non-zero values of a matrix, saving memory and computational resources.
Fast operations: It provides optimized algorithms for performing mathematical operations on sparse matrices, such as addition, multiplication, and solving linear equations.
Performance optimization: By using specialized techniques for sparse matrices, MKL can significantly speed up calculations compared to treating them as regular dense matrices.
Integration with other tools: It can be used from various programming languages and scientific computing environments to improve the performance of their sparse workloads.
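To make the storage and speed points above concrete, here is a minimal pure-Python sketch of the compressed sparse row (CSR) layout, the format referred to in the name `mkl_sparse_d_create_csr`. It is an illustration only and does not call MKL or PyTorch; the helper names `dense_to_csr` and `csr_matvec` are hypothetical, and MKL's own routines take a somewhat different set of arrays.

```python
from typing import List, Tuple


def dense_to_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Convert a dense matrix to CSR arrays (hypothetical helper, not an MKL call)."""
    values: List[float] = []     # non-zero entries, stored row by row
    col_indices: List[int] = []  # column index of each stored entry
    row_ptr: List[int] = [0]     # row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    for row in dense:
        for j, x in enumerate(row):
            if x != 0.0:
                values.append(x)
                col_indices.append(j)
        row_ptr.append(len(values))
    return values, col_indices, row_ptr


def csr_matvec(values: List[float], col_indices: List[int],
               row_ptr: List[int], vec: List[float]) -> List[float]:
    """Multiply a CSR matrix by a dense vector, visiting only stored entries."""
    result: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * vec[col_indices[k]]
        result.append(acc)
    return result


# A 4x4 matrix with 16 entries, only 4 of which are non-zero.
dense = [
    [5.0, 0.0, 0.0, 0.0],
    [0.0, 8.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 0.0],
    [0.0, 6.0, 0.0, 0.0],
]
values, col_indices, row_ptr = dense_to_csr(dense)
y = csr_matvec(values, col_indices, row_ptr, [1.0, 2.0, 3.0, 4.0])
print(values, col_indices, row_ptr, y)
```

The product loop touches 4 stored values instead of 16 entries. An optimized library relies on the same idea, with far more care for memory layout and vectorization.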
Sparse matrix computations are widely used in various fields, particularly where large datasets with many zero elements are involved.
MKL Sparse wasn't supported on Windows, so some functions could not be used. Specifically, ‘mkl_sparse_d_create_csr’ and ‘mkl_sparse_destroy’ use ‘malloc/new’ and ‘free/delete’ for memory management, which requires the Universal C Runtime (UCRT). PyTorch links against the static variant of MKL and is built with the ‘/MD’ option, meaning it uses the variant of the runtime library designed for dynamic linking. However, with the Microsoft Visual C++ compiler (MSVC), you cannot mix different runtime library options (/MD and /MT) in the same program. This conflict caused errors and prevented sparse matrix operations from working correctly. In simpler terms, two different variants of the same foundational runtime cannot coexist in one program.
We ensured that both PyTorch and MKL use the same C runtime library (/MD), avoiding conflicts that previously led to linker errors. Specifically, we resolved the "LNK2038: mismatch detected for 'RuntimeLibrary'" error, where the value 'MT_StaticRelease' didn't match the value 'MD_DynamicRelease'. This consistency is crucial for stability and successful compilation.
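The check behind this error can be pictured with a small toy model. MSVC records a 'RuntimeLibrary' value in each object file, and the linker reports LNK2038 when two inputs disagree. The Python sketch below only imitates that comparison; it is not how the linker is implemented, and the function name `check_runtime_library` and the file names are hypothetical.

```python
from typing import List, Tuple


def check_runtime_library(objects: List[Tuple[str, str]]) -> List[str]:
    """Toy model of the linker's mismatch check (hypothetical helper).

    `objects` is a list of (file name, RuntimeLibrary value) pairs. The first
    entry sets the expected value; every later entry that differs is reported.
    """
    errors: List[str] = []
    if not objects:
        return errors
    first_name, expected = objects[0]
    for name, value in objects[1:]:
        if value != expected:
            errors.append(
                f"{name}: LNK2038: mismatch detected for 'RuntimeLibrary': "
                f"value '{value}' doesn't match value '{expected}' in {first_name}"
            )
    return errors


# Hypothetical inputs: one object built with /MD, one library built with /MT.
mixed = [
    ("pytorch_module.obj", "MD_DynamicRelease"),
    ("mkl_static_part.lib", "MT_StaticRelease"),
]
# After aligning both on /MD, the check passes.
consistent = [
    ("pytorch_module.obj", "MD_DynamicRelease"),
    ("mkl_static_part.lib", "MD_DynamicRelease"),
]

mixed_errors = check_runtime_library(mixed)
consistent_errors = check_runtime_library(consistent)
for message in mixed_errors:
    print(message)
```

In this model, as in the real build, the only way to clear the error is to make every input agree on one runtime library option.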
Consistent use of the same C runtime library (/MD) for PyTorch and MKL resolves linker errors and streamlines the build process, making it easier to compile and maintain projects on Windows.
By removing conditional compilation for Windows-specific cases related to sparse operations, our pull request makes the code more maintainable and reduces the need for platform-specific workarounds, enhancing the overall development workflow and cross-platform consistency.
We’re incredibly proud of our continued work maintaining PyTorch on Windows. By enabling sparse matrix operations, this fix unlocks new potential, from self-driving cars to advanced physics, enhancing computational efficiency and performance. As we continue to push the boundaries of software engineering and data science, such advancements are crucial in driving innovation and solving complex problems. The journey of integrating MKL sparse operations into PyTorch on Windows underscores the collaborative spirit of the open-source community and the relentless pursuit of excellence in software development.
Take a look at the original issue and the fix on GitHub here:
https://github.com/pytorch/pytorch/issues/97352
https://github.com/pytorch/pytorch/pull/102604
Ioan Manta is a Senior Software Engineer at Janea Systems.