MaGriDo – Mathematics for Machine Learning Methods for Graph-Based Data with Integrated Domain Knowledge

© Fraunhofer SCAI

The aim of the project MaGriDo is to develop methods of machine learning that require significantly less data than before or can make predictions that are consistent with existing knowledge.

 

Background

We are currently experiencing a worldwide success story of machine learning (ML), especially of deep neural networks (DNNs), which is affecting not only various fields of application but also mathematical topics such as inverse problems. Large existing data resources and significantly increased computing power make the application and successful training of deep neural networks possible. So far, so-called "end-to-end" learning approaches have been used, which usually require very large amounts of structured data. However, these approaches can only be used to a limited extent in many applications in the natural sciences, medicine, and industry, since in most cases only complex and heterogeneous data sets are available and the generation of reliable data is expensive and time-consuming. This is where MaGriDo comes in: it pursues the goal of integrating existing domain knowledge, in particular to substantially reduce the amount of data required for sufficiently accurate DNNs.

MaGriDo focuses on practical problems from the materials sciences, especially polymers and glasses. For example, polyurethane coatings have been used for over 60 years for automobiles, furniture or parquet flooring. Depending on the specific requirements, several basic materials are used. It is precisely because these components can be varied so widely that their properties (hardness, solvent resistance, scratch resistance, hydrolysis resistance, gloss) can be configured as in a modular system. The prediction of the properties has so far been largely based on trial-and-error methods and years of experience of manufacturers and users. Another example is the production of glasses. Here, too, a large number of basic materials can be used and a wide range of process parameters can be varied. For these problems, learning methods are to be further developed, investigated and applied to data from the industrial partners. One focus is on regression problems.

Objectives

MaGriDo's goal is to (further) develop and analyze DNNs for industrial problems whose architecture allows existing domain knowledge to be incorporated. Such a hybrid approach can make use of the complementary strengths of "end-to-end" learning approaches and "a-priori models/rules". This approach promises more efficient solutions for many fields of application. For example, the amount of data required is reduced, or the predictions of the ML model are consistent with existing knowledge.

The focus of research and development in MaGriDo is on so-called graph networks, since complex systems can usually be represented very well as compositions of entities and their interactions. Graph networks include conventional fully-connected NNs, convolutional NNs and recurrent NNs as special cases, can be applied to relational structures, and make hierarchical processing of input data possible.
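As a rough illustration of how a graph network processes entities and their interactions, the following minimal sketch performs a single message-passing step in plain Python. It is not taken from the project: the function name, the mean aggregation, and the fixed mixing weights are assumptions chosen for readability, and a trained graph network would learn its update rule from data instead.

```python
# Minimal sketch of one message-passing step on a small graph.
# All names (message_passing_step, node_features, edges) are hypothetical
# and chosen for illustration only; this is not the MaGriDo implementation.

def message_passing_step(node_features, edges, self_weight=0.5, neighbor_weight=0.5):
    """Update each node from its own features and the mean of its neighbors'.

    node_features: dict mapping node id -> list of floats (the entities)
    edges: list of (source, target) pairs (the interactions), treated as undirected
    """
    # Collect the neighbors of every node from the edge list.
    neighbors = {node: [] for node in node_features}
    for source, target in edges:
        neighbors[source].append(target)
        neighbors[target].append(source)

    updated = {}
    for node, features in node_features.items():
        if neighbors[node]:
            # Aggregate: element-wise mean over the neighbors' feature vectors.
            count = len(neighbors[node])
            aggregated = [
                sum(node_features[n][i] for n in neighbors[node]) / count
                for i in range(len(features))
            ]
        else:
            # Isolated node: nothing to aggregate, fall back to its own features.
            aggregated = list(features)
        # Combine own state and aggregated messages with fixed weights
        # (assumption: a trained network would learn this update instead).
        updated[node] = [
            self_weight * own + neighbor_weight * agg
            for own, agg in zip(features, aggregated)
        ]
    return updated


# Toy system: three entities in a chain A - B - C with one feature each.
features = {"A": [1.0], "B": [0.0], "C": [3.0]}
interactions = [("A", "B"), ("B", "C")]
result = message_passing_step(features, interactions)
print(result)
```

Each node's new value depends only on itself and its direct neighbors, so stacking several such steps lets information travel further along the graph. Connecting every node to every other node would make each update depend on all inputs, which gives an intuition for why fully-connected layers can be viewed as a special case.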

Mathematical aspects

The mathematical research in the project will contribute to a better understanding of the architecture of DNNs. The mathematical contributions of the project include

  • the exploration and development of an expressivity analysis for graph networks,
  • the extension of interpretability algorithms and their theory to graph networks,
  • the exploration of efficient learning methods for the training of graph networks,
  • the analysis of graph network architectures, regularization and optimization methods,
  • the structured integration of domain knowledge through appropriate mathematical formalisms,
  • the development of mathematical concepts for transfer learning and active learning,
  • the development and exploration of systematically improved interaction potentials,
  • the investigation of generative graph networks.

The mathematical methods and results of MaGriDo are not limited to the field of materials science. Graph networks can be transferred to many other fields of application, for example to integrate knowledge graphs in systems biology and biomedicine into DNNs.

Economic relevance

The economic motivation for the approaches chosen in MaGriDo is reflected by the industrial partners involved in the project. New developments in industrial applications have so far mostly been based either on the laborious approach of trial and error or on the many years of experience of manufacturers and users. They are therefore usually quite expensive or only slightly innovative. The challenge is to develop suitable strategies and techniques to reduce the costs caused by the high number of in vitro and in silico experiments. MaGriDo's approach of integrating domain knowledge into graph networks and thus combining conventional approaches of materials research with data-based methods and multi-scale modeling promises a substantial contribution. Solutions developed in MaGriDo can be transferred to other applications. Acceleration and flexibility in development and production will be enhanced, so that innovative and sustainable products, technologies and application solutions can be provided by industry according to demand.

Role of Fraunhofer SCAI

Together with the project partners from industry, Fraunhofer SCAI provides the concrete requirements from the applications for the project. SCAI also takes care of the software engineering aspects, the engineering side and the later exploitation of the project results. 

Project partners, funding and project duration

The project partners are Fraunhofer SCAI, the University of Bonn (as coordinator), the Technische Universität Berlin, the Technische Universität Braunschweig, and, as associated partners from industry, Covestro AG and Schott AG.

MaGriDo is funded by the Federal Ministry of Education and Research (BMBF) as part of the "Mathematics for Innovations" program.

Duration of the project: 04/2020 - 03/2023