Abstract
Data-Intensive Domains (DIDs) are based on large and heterogeneous datasets, which hinder the development of efficient solutions and data analysis. How these datasets are built and analyzed determines the success and efficiency of DID systems in extracting knowledge from data. Existing DID systems provide mechanisms to analyze data, but integrating the different databases is not easy. Moreover, existing DID systems do not suggest solutions for the analysis tasks. There is evidence that foundational ontologies and conceptual modelling can help in the software development process of complex systems, such as DID systems. The contribution of this project is therefore twofold: i) the definition of a method to develop high-quality, efficient DID systems, starting from models that represent the different features at the different abstraction layers and proceeding through model-to-model and model-to-code transformations; and ii) the definition of a platform that implements the method for DID development. This approach differs from how current DID systems work, where dataset integration is done ad hoc to solve specific problems and data correlation and analysis depend on the analysts' experience. DID systems aim not only to store and report data but also to help in its analysis. Therefore, we aim to include in the DID systems produced with our method recommendations that help analysts make the most accurate decisions. To this end, in both the method and the platform definition we will use Artificial Intelligence, specifically Explainable Artificial Intelligence (XAI) and Machine Learning (ML) techniques, to help data scientists in data analysis and exploitation. An example of the application of XAI in DID is the development of systems that aim to accurately predict critical diseases before the first symptoms appear.
In a precision medicine context, and thanks to all the genomic information available, XAI and ML can help predict future diseases by taking this genomic information as input. For the success of this project, we have formed a research team spread across several European countries, with research centers that work with DID systems, XAI, ML and the human genome.