D2KAB is driven by five agri-food and ecosystem-biodiversity scenarios
Ontology-driven food packaging solutions (T4.1)
INRAE-IATE already develops an ontology for food material processing and food packaging characteristics (the Matter Transfer Ontology). It drives concrete solutions for farmers, packaging solution suppliers and packed food suppliers, such as an application that automatically selects the most appropriate package for a given food by considering multiple variables (food respiration, temperature, material to use, etc.). This decision support system was created during the FP7 EcoBioCap project (2011-2015), led by IATE, and is currently being extended with preference aggregation tools to take consumers' expectations into account during the H2020 NOAW project (2016-2020), also led by IATE [Ref]. The knowledge captured in the ontology reduces the cost and time needed to determine the best packaging solution for a given food. Scenario T4.1 consolidates these preliminary results by focusing on the scientific challenges raised by quality management of annotated data. IATE develops new methods and tools to manage constraints so that the quality of annotated data can be analyzed automatically. After inconsistencies are detected, curation will be done manually or semi-automatically on the @Web platform. Consistency constraint checking will be based on STTL / LDScript.
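To make the idea of consistency constraints on annotated data more concrete, here is a minimal Python sketch. It is not STTL or LDScript and does not reflect the @Web platform's actual data model: the `Annotation` fields, the `CONSTRAINTS` table, and the units and ranges are all hypothetical, chosen only to show how a unit constraint and a range constraint could flag inconsistent annotations for later curation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Annotation:
    """One annotated value; the field names are hypothetical."""
    quantity: str
    value: Optional[float]
    unit: Optional[str]


# Hypothetical constraints: expected unit and plausible range per quantity.
CONSTRAINTS = {
    "temperature": {"unit": "degC", "min": -30.0, "max": 60.0},
    "thickness": {"unit": "um", "min": 0.0, "max": 5000.0},
}


def check(annotation: Annotation) -> List[str]:
    """Return the list of constraint violations (empty if consistent)."""
    rule = CONSTRAINTS.get(annotation.quantity)
    if rule is None:
        return [f"unknown quantity: {annotation.quantity}"]
    issues = []
    if annotation.unit != rule["unit"]:
        issues.append(f"expected unit {rule['unit']}, got {annotation.unit}")
    if annotation.value is None:
        issues.append("missing value")
    elif not rule["min"] <= annotation.value <= rule["max"]:
        issues.append(
            f"value {annotation.value} outside [{rule['min']}, {rule['max']}]"
        )
    return issues


if __name__ == "__main__":
    samples = [
        Annotation("temperature", 4.0, "degC"),
        Annotation("temperature", 400.0, "K"),
    ]
    for item in samples:
        print(item, check(item) or "consistent")
```

Annotations for which `check` returns a non-empty list would be the candidates for manual or semi-automatic curation.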
Lifting of agricultural reference lists and datasets (T4.2)
Partners: INRAE-TSCF/ACTA/INRAE-DipSO/CNRS-I3S/UM-LIRMM + Elzeard + IFV + API-AGRO
Within this scenario, we lift agricultural datasets and transform data standards and reference lists (e.g., lists of pesticides and active substances, phenological stages, crops and varieties). We will publish these datasets and reference lists as linked open data to improve their interoperability and reuse. The transformation (also called lifting) from unstructured or legacy formats to semantic web technologies, typically RDF and SKOS, is a collaborative effort between the dataset provider and D2KAB partners. Different lifting strategies are applied depending on the original dataset format; in some cases, when the dataset is accessible via a web service, we will use SPARQL Microservices, developed by CNRS-I3S, to access the data as RDF. Some examples of datasets are:
Datasets will be incorporated into D2KAB’s knowledge graph, and standards/reference lists into AgroPortal. In this task, D2KAB also collaborates with API-AGRO, which is building a platform to gather open and private agricultural datasets.
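As an illustration of what lifting a reference list to SKOS can look like, the following Python sketch turns a small CSV list into Turtle. It is only a sketch under stated assumptions: the `http://example.org/crops/` namespace, the column names (`code`, `label_fr`, `parent`) and the sample rows are hypothetical, and the codes are assumed to be plain alphanumeric strings that are valid as Turtle local names. The project's actual lifting pipelines are not described here.

```python
import csv
import io

BASE = "http://example.org/crops/"  # hypothetical namespace

# Hypothetical reference list: code, French label, optional parent code.
SAMPLE = "code,label_fr,parent\nC1,céréales,\nC2,blé tendre,C1\n"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def lift_to_skos(csv_text: str, scheme_id: str = "scheme") -> str:
    """Serialize each CSV row as a skos:Concept in Turtle syntax."""
    lines = [
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .",
        f"@prefix ex: <{BASE}> .",
        "",
        f"ex:{scheme_id} a skos:ConceptScheme .",
    ]
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = escape_literal(row["label_fr"])
        lines.append(f"ex:{row['code']} a skos:Concept ;")
        lines.append(f"    skos:inScheme ex:{scheme_id} ;")
        if row.get("parent"):
            # A non-empty parent code becomes a skos:broader link.
            lines.append(f"    skos:broader ex:{row['parent']} ;")
        lines.append(f'    skos:prefLabel "{label}"@fr .')
    return "\n".join(lines)


if __name__ == "__main__":
    print(lift_to_skos(SAMPLE))
```

Generating Turtle by string formatting keeps the sketch dependency-free; a real pipeline would more likely rely on an RDF library to handle escaping and validation.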
Plant Health Bulletins augmented semantic reader (T4.3)
Partners: INRAE-TSCF/ACTA/INRAE-DipSO/CNRS-I3S + Elzeard + IFV
In this scenario, we address agricultural advisers and biovigilance stakeholders in agriculture by building an augmented, semantically enriched reading interface for the official Plant Health Bulletins (PHB, in French bulletins de santé du végétal). Our goal is to improve the integration of several datasets and to offer new, harmonized information to the different stakeholders. An archive of PHBs, published as linked open data, is constantly updated. It contains the original PDF files and their HTML versions, and a SPARQL endpoint is available for querying the data. Natural language processing is applied to produce a set of PHB annotations describing plot observations related to growth stages and pest attacks. An RDF annotation links an HTML text segment to an element of a semantic resource, typically taken from AgroPortal: the French Crop Usage thesaurus developed by INRAE-TSCF, TAXREF-LD developed by CNRS-I3S for the French MNHN, the GECO knowledge base managed by ACTA, and the “Serre des savoirs” ontology currently being developed by Elzeard. The PHB semantic reader developed within D2KAB will provide query capabilities over all these interlinked RDF datasets. T4.3 also works with the Institut Français de la Vigne et du Vin.
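The link between a text segment and a semantic resource can be sketched as follows. This Python example uses simple dictionary lookup rather than the natural language processing applied in the project, and it expresses each annotation with the W3C Web Annotation vocabulary (`TextPositionSelector`); whether D2KAB uses that vocabulary is an assumption made here for illustration. The lexicon, concept IRIs and bulletin URL are hypothetical.

```python
import json
import re


def annotate(text: str, lexicon: dict, source_url: str) -> list:
    """Link every occurrence of a lexicon label to its concept IRI."""
    annotations = []
    for label, concept_iri in lexicon.items():
        for match in re.finditer(re.escape(label), text, flags=re.IGNORECASE):
            annotations.append({
                "@context": "http://www.w3.org/ns/anno.jsonld",
                "type": "Annotation",
                "body": concept_iri,
                "target": {
                    "source": source_url,
                    "selector": {
                        "type": "TextPositionSelector",
                        "start": match.start(),  # character offsets in text
                        "end": match.end(),
                    },
                },
            })
    return annotations


if __name__ == "__main__":
    # Hypothetical bulletin sentence, lexicon and URLs.
    sentence = "Présence de pucerons sur blé tendre au stade épiaison."
    lexicon = {
        "blé tendre": "http://example.org/crop/ble_tendre",
        "pucerons": "http://example.org/pest/pucerons",
    }
    result = annotate(sentence, lexicon, "http://example.org/phb/1.html")
    print(json.dumps(result, ensure_ascii=False, indent=2))
```

Each annotation carries the concept IRI as its body and the character offsets of the matched segment as its target, which is enough for a reading interface to highlight the segment and look up the linked concept.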
Wheat phenotype data integration (T4.4)
This scenario will enable research, economic and societal outcomes through a better understanding and control of wheat phenotype expression. This includes improving wheat plant development and disease resistance in order to reduce the use of chemicals, and meeting new consumer requirements, such as low gluten content. Current datasets on wheat phenotypes (such as those managed by D2KAB’s partners: AgroLD, the Wheat@URGI portal or the WheatIS data discovery tool) range from experimental data to bibliographic citations and observations, and often use different reference vocabularies (cf. the wheat-related ontologies in AgroPortal). This prevents researchers and breeders from uniformly querying and reusing these datasets, and from building new bridges across different areas of research for fundamental knowledge acquisition and seed variety improvement. In this scenario, we integrate wheat-related information so that such uniform querying and reuse become possible. The main challenge is the mapping between low-level observation measures (e.g., weight of 1000 grains) and abstract qualitative properties (e.g., high yield). The scientific challenge is the formal representation and reasoning needed to convert data through complex rule-based linkages between ontology concepts, data lifting and text-to-concept mapping. Our results will be published on the Wheat@URGI portal, and we will then study how the approach generalizes to other crops.
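The mapping from a low-level measure to a qualitative property can be pictured with a very simple threshold rule. The Python sketch below classifies a thousand-grain weight into a qualitative class; the thresholds and class labels are hypothetical placeholders, not agronomic reference values, and the project's actual approach relies on formal ontology-based rules rather than this kind of lookup.

```python
from bisect import bisect_right

# Hypothetical thresholds (grams per 1000 grains) and class labels.
TGW_BOUNDS = [35.0, 45.0]
TGW_LABELS = ["low", "medium", "high"]


def qualify(value: float, bounds: list, labels: list) -> str:
    """Map a numeric measure to a label; a value equal to a bound
    falls into the upper class."""
    if len(labels) != len(bounds) + 1:
        raise ValueError("need exactly one more label than bounds")
    return labels[bisect_right(bounds, value)]


if __name__ == "__main__":
    for weight in (30.0, 40.0, 50.0):
        print(weight, qualify(weight, TGW_BOUNDS, TGW_LABELS))
```

Keeping the thresholds in data rather than in code means the same function could be reused for other measures, each with its own bounds and labels.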
New ontologies and thesauri in biodiversity and ecosystem research (T5.1)
AnaEE thesaurus (T5.1a)
To properly manage and share the varied and heterogeneous data produced by its experimental, analytical and modeling platforms, the AnaEE-France infrastructure adopted a semantic approach. In 2017, the AnaEE thesaurus was released to provide a controlled vocabulary for the semantic description of the study of continental ecosystems and their biodiversity. During D2KAB, we will build on this thesaurus to extend, in the field of ecosystems and experimentation, the Extensible Observation Ontology, which provides upper-level classes for capturing the semantics of scientific observations and measurements. Using this ontology and other specific ones, such as the Semantic Sensor Network ontology, we develop or consolidate pipelines for the semantic annotation of AnaEE-F databases and produce accessible linked data. We will also formally align our ontology with other relevant semantic resources such as GEMET.
TOP Thesaurus (T5.1b)
Here, the main objective is to upgrade the Thesaurus of Plant Characteristics (TOP) along three main lines: (i) complete and improve the transformation of the thesaurus into SKOS, (ii) formally align TOP with other trait vocabularies, including via the definitions of traits, and (iii) incorporate a larger number of root traits, defined more precisely, as root traits are an important research topic internationally. This upgraded version will replace the current version of TOP in AgroPortal. This work is done in partnership with the national SémanDiv GDR (supported by CNRS and headed by CNRS-CEFE), whose aim is to further the development of semantic resources for biodiversity.
Plant functional biogeography data integration in the Mediterranean Basin (T5.2)
This scenario focuses on ontology-driven data integration for functional biogeography, with the aim of understanding and generalizing trait-environment relationships (TER) across the Mediterranean Basin. This work is based on different data sources that have been developed but never combined. Within D2KAB, we: (i) develop the semantic foundations to enable the integration of these data by reviewing existing vocabularies, collaboratively filling the gaps for missing concepts, and integrating them into subsequent version(s) of the Thesaurus of Plant Characteristics; (ii) build databases of the variables required to identify TER for selected situations in the Mediterranean Basin, to serve as prototypes for data integration; (iii) annotate the concepts in these databases, based on the thesauri developed in (i); (iv) combine botanical, trait, climate and soil data using these semantic annotations, allowing us to run statistical analyses in a systematic and unified way and to test the generality of TER along the multiple temperature and drought gradients found across the Mediterranean Basin.
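Step (iv) can be pictured with a small Python sketch: once the columns of each database are annotated with shared concept identifiers, records from different sources can be renamed to those identifiers and merged on a common key. The concept IRIs, column names and values below are hypothetical, and the sketch ignores the unit conversion and conflict handling that a real integration would need.

```python
# Hypothetical concept IRIs shared across the annotated databases.
SITE = "http://example.org/concept/site"
SLA = "http://example.org/concept/specific_leaf_area"
TEMP = "http://example.org/concept/mean_annual_temperature"


def harmonize(rows: list, column_annotations: dict) -> list:
    """Rename annotated columns to their concept IRI; drop the others."""
    return [
        {column_annotations[col]: val
         for col, val in row.items() if col in column_annotations}
        for row in rows
    ]


def combine(datasets: list, key: str) -> list:
    """Merge harmonized rows that share the same value for `key`."""
    merged = {}
    for rows in datasets:
        for row in rows:
            merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


if __name__ == "__main__":
    # Two hypothetical sources with different column names.
    traits = [{"site": "S1", "SLA": 12.5}]
    climate = [{"plot_id": "S1", "temp_mean": 15.2, "comment": "draft"}]
    combined = combine(
        [
            harmonize(traits, {"site": SITE, "SLA": SLA}),
            harmonize(climate, {"plot_id": SITE, "temp_mean": TEMP}),
        ],
        key=SITE,
    )
    print(combined)
```

Because both sources annotate their site column with the same concept, the merge needs no knowledge of the original column names, which is the practical benefit of annotating against shared thesauri.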