LAMA – development of fast and scalable software


  • High-performance framework for developing hardware-independent code for heterogeneous compute clusters
  • Reduces time-to-market for new software
  • Takes advantage of the latest hardware

High performance framework LAMA

With LAMA, algorithms for numerical problems based on sparse matrices can be implemented easily.
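To illustrate the kind of sparse-matrix kernel these algorithms are built on, the following C++ sketch stores a matrix in the common compressed sparse row (CSR) format and computes the matrix-vector product y = A·x. It is plain standard C++, not LAMA's API; the names `CsrMatrix` and `spmv` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Compressed Sparse Row (CSR) storage: only the nonzero entries are kept.
// Generic textbook layout, used here for illustration only (not LAMA's matrix class).
struct CsrMatrix {
    std::size_t numRows = 0;
    std::size_t numCols = 0;
    std::vector<std::size_t> rowPtr;  // size numRows + 1; row i occupies [rowPtr[i], rowPtr[i+1])
    std::vector<std::size_t> colIdx;  // column index of each stored value
    std::vector<double> values;       // nonzero values, stored row by row
};

// Computes y = A * x for a CSR matrix A.
std::vector<double> spmv(const CsrMatrix& a, const std::vector<double>& x) {
    assert(x.size() == a.numCols);
    assert(a.rowPtr.size() == a.numRows + 1);
    std::vector<double> y(a.numRows, 0.0);
    for (std::size_t i = 0; i < a.numRows; ++i) {
        double sum = 0.0;
        // Visit only the nonzeros of row i.
        for (std::size_t k = a.rowPtr[i]; k < a.rowPtr[i + 1]; ++k) {
            sum += a.values[k] * x[a.colIdx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

Iterative solvers such as Jacobi or conjugate gradient call a product like this in every iteration, so its speed often dominates their run time; this is where optimized, hardware-specific kernels pay off.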



LAMA is a framework for developing hardware-independent, high-performance code for heterogeneous computer systems. It enables the development of fast, scalable software that can run on virtually any type of system, from embedded devices to highly parallel supercomputers. By building their applications on LAMA, software developers benefit from higher productivity during implementation and from the latest hardware innovations, resulting in a shorter time-to-market.

The framework supports multiple target platforms within a distributed heterogeneous environment. It provides optimized device code in its back-ends and achieves high scalability across multiple nodes through latency hiding and asynchronous execution. LAMA's modular and extensible software design supports developers on several levels: whether they write their own portable code with the heterogeneous computing development kit or use the ready-made functionality of the linear algebra package, they always get high productivity and maximum performance.
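The idea of writing an algorithm once and running it on several target platforms can be sketched with a backend interface plus a registry that selects an implementation at runtime. This is a minimal sketch under stated assumptions: `Backend`, `HostBackend`, `BackendRegistry` and `portableUpdate` are hypothetical names that do not describe how LAMA actually organizes its kernels, and only a host (CPU) backend is shown.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical backend interface: each target platform implements the same kernels.
class Backend {
public:
    virtual ~Backend() = default;
    virtual std::string name() const = 0;
    // y = alpha * x + y
    virtual void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) const = 0;
};

// Hypothetical host (CPU) implementation of the kernel set.
class HostBackend : public Backend {
public:
    std::string name() const override { return "host"; }
    void axpy(double alpha, const std::vector<double>& x, std::vector<double>& y) const override {
        assert(x.size() == y.size());
        for (std::size_t i = 0; i < x.size(); ++i) y[i] += alpha * x[i];
    }
};

// Maps a backend name to its implementation; application code only asks the registry.
class BackendRegistry {
public:
    void add(std::unique_ptr<Backend> backend) {
        std::string key = backend->name();
        backends_[key] = std::move(backend);
    }
    const Backend& get(const std::string& name) const {
        auto it = backends_.find(name);
        if (it == backends_.end()) throw std::out_of_range("no backend registered: " + name);
        return *it->second;
    }
private:
    std::map<std::string, std::unique_ptr<Backend>> backends_;
};

// Portable algorithm code: written once against the interface, computes y += 2 * x.
void portableUpdate(const Backend& backend, std::vector<double>& y, const std::vector<double>& x) {
    backend.axpy(2.0, x, y);
}
```

A GPU backend would derive from the same interface and be registered under another name; `portableUpdate` itself would not need to change.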

The integration of LAMA into other software products is simple and industry-friendly due to the dual-license model: both the open-source AGPL and a commercial license are offered.


Productivity and execution performance are not mutually exclusive, and LAMA combines both. Its flexible software design introduces minimal overhead and preserves the full performance of the hardware vendors' BLAS implementations and of the highly optimized kernel back-ends. Performance comparisons with competing linear algebra libraries show comparable results for single-node runs.

In distributed systems, the asynchronous execution model ensures efficient overlap of computation, memory transfer, and communication, which achieves linear scaling across multiple GPUs.
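The overlap of communication and computation can be illustrated with a distributed matrix-vector product in which each process owns some rows and needs a few remote vector entries (the halo). The sketch below starts the halo transfer asynchronously, computes the purely local part in the meantime, and waits only when the remote values are actually needed. Assumptions: `std::async` and a short sleep stand in for real communication (an MPI code would typically use `MPI_Irecv` followed by `MPI_Wait`), dense blocks are used for brevity, and `receiveHalo` and `overlappedMatVec` are hypothetical names, not LAMA functions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for a non-blocking halo exchange:
// sleeps to mimic network latency, then returns the remote vector entries.
std::vector<double> receiveHalo(std::vector<double> remote) {
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    return remote;
}

// y = A_local * x_local + A_halo * x_halo (dense blocks for brevity).
std::vector<double> overlappedMatVec(const std::vector<std::vector<double>>& aLocal,
                                     const std::vector<std::vector<double>>& aHalo,
                                     const std::vector<double>& xLocal,
                                     const std::vector<double>& remoteEntries) {
    assert(aHalo.size() == aLocal.size());

    // 1. Start the "communication" asynchronously.
    std::future<std::vector<double>> halo =
        std::async(std::launch::async, receiveHalo, remoteEntries);

    // 2. Compute the local contribution while the halo data is in flight.
    std::vector<double> y(aLocal.size(), 0.0);
    for (std::size_t i = 0; i < aLocal.size(); ++i)
        for (std::size_t j = 0; j < xLocal.size(); ++j)
            y[i] += aLocal[i][j] * xLocal[j];

    // 3. Block only now, when the remote values are required.
    const std::vector<double> xHalo = halo.get();
    for (std::size_t i = 0; i < aHalo.size(); ++i)
        for (std::size_t j = 0; j < xHalo.size(); ++j)
            y[i] += aHalo[i][j] * xHalo[j];
    return y;
}
```

If the local work takes at least as long as the transfer, the communication latency is effectively hidden; this is the effect that makes near-linear scaling possible.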

Encapsulated modules enable clean software development

  • Basic module
  • Math kernel extension
  • Distributed extension
  • Linear algebra package