New Results

Scalable methods to query data heterogeneity

PAX2GRAPHML: a Python library for large-scale regulation network analysis using BIOPAX [F. Morrews] 18.

  • The concept of regulated reactions, which allows connecting regulatory, signaling and metabolic levels, has been used to easily manipulate BioPAX source files as regulated reaction graphs. Biochemical reactions and regulatory interactions are homogeneously described by regulated reactions involving substrates, products, activators and inhibitors as elements.
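The regulated-reaction view described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the PAX2GRAPHML API: the class name `RegulatedReaction`, the helper functions and the toy entities (`A`, `B`, `C`, `E1`) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegulatedReaction:
    """One regulated reaction: substrates, products, activators, inhibitors."""
    name: str
    substrates: frozenset = frozenset()
    products: frozenset = frozenset()
    activators: frozenset = frozenset()
    inhibitors: frozenset = frozenset()


def to_edges(reactions):
    """Flatten regulated reactions into labelled directed edges."""
    edges = []
    for r in reactions:
        edges += [(s, r.name, "substrate") for s in sorted(r.substrates)]
        edges += [(r.name, p, "product") for p in sorted(r.products)]
        edges += [(a, r.name, "activator") for a in sorted(r.activators)]
        edges += [(i, r.name, "inhibitor") for i in sorted(r.inhibitors)]
    return edges


def downstream(edges, start):
    """Entities and reactions reachable from `start` along any edge."""
    successors = {}
    for src, dst, _label in edges:
        successors.setdefault(src, set()).add(dst)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


# Toy network (hypothetical): E1 activates r1 (A -> B); A inhibits r2 (B -> C).
reactions = [
    RegulatedReaction("r1", substrates=frozenset({"A"}),
                      products=frozenset({"B"}), activators=frozenset({"E1"})),
    RegulatedReaction("r2", substrates=frozenset({"B"}),
                      products=frozenset({"C"}), inhibitors=frozenset({"A"})),
]
edges = to_edges(reactions)
reached = downstream(edges, "E1")
```

Because metabolic conversions and regulatory influences end up as edges of the same graph, a single traversal such as `downstream` crosses both levels without special cases.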

Converting disease maps into heavyweight ontologies [O. Dameron] 15.

  • In the context of our participation in the IPL NeuroMarker, we designed the Disease Map Ontology (DMO), an ontological upper model based on systems biology terms. We then applied DMO to Alzheimer’s disease (AD). Specifically, we used it to drive the conversion of AlzPathway, a disease map devoted to AD, into a formal ontology called Alzheimer DMO.

Pharmaco-epidemiological queries over administrative healthcare databases [O. Dameron] 23, 27.

  • Chronicles are a relevant formalism for representing complex temporal queries over healthcare patient trajectories while retaining acceptable performance. However, they lack proper semantic support for handling generalization. Conversely, Semantic Web techniques adequately handle generalization and can represent temporal constraints, but the latter remain a performance bottleneck. We proposed a hybrid approach combining chronicles and Semantic Web queries and demonstrated its capacity to detect patients with venous thromboembolism in the French medico-administrative database 23.
  • Generating synthetic data for administrative healthcare databases makes it possible to perform research on healthcare data without compromising patients' privacy. We proposed a probabilistic relational model, fitted on publicly available datasets, that generates synthetic versions of the national database of French insured patients. These synthetic versions mimic the statistical distributions of the original but do not hold sensitive personal data 27.
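To make the hybrid chronicle idea more concrete, here is a minimal sketch, assuming a chronicle is a list of expected event types plus interval constraints between pairs of them, and assuming generalization is reduced to a toy is-a dictionary. It is not the implementation used in 23; the event names, the 30-day window and the `parent_of` hierarchy are hypothetical.

```python
from itertools import product


def generalisations(code, parent_of):
    """Yield `code`, then each of its ancestors in a toy is-a hierarchy."""
    while code is not None:
        yield code
        code = parent_of.get(code)


def matches(chronicle_events, constraints, trajectory, parent_of):
    """Return True if the trajectory contains an occurrence of the chronicle.

    chronicle_events: event types, one per chronicle node.
    constraints: {(i, j): (lo, hi)} meaning lo <= t_j - t_i <= hi.
    trajectory: (event_type, timestamp) pairs for one patient.
    parent_of: child -> parent mapping standing in for the ontology.
    """
    # Timestamps of every event that is (a specialization of) each node.
    candidates = [
        [t for etype, t in trajectory
         if wanted in generalisations(etype, parent_of)]
        for wanted in chronicle_events
    ]
    # Brute-force search over assignments; fine for a toy, not for scale.
    return any(
        all(lo <= times[j] - times[i] <= hi
            for (i, j), (lo, hi) in constraints.items())
        for times in product(*candidates)
    )


# Hypothetical example: an anticoagulant delivered within 30 days of a stay.
PARENT_OF = {"enoxaparin": "anticoagulant", "warfarin": "anticoagulant"}
CHRONICLE = ["hospital_stay", "anticoagulant"]
CONSTRAINTS = {(0, 1): (0, 30)}

patient_a = [("hospital_stay", 10), ("enoxaparin", 25)]
patient_b = [("hospital_stay", 10), ("warfarin", 90)]
hit_a = matches(CHRONICLE, CONSTRAINTS, patient_a, PARENT_OF)
hit_b = matches(CHRONICLE, CONSTRAINTS, patient_b, PARENT_OF)
```

The split mirrors the division of labour described above: the hierarchy lookup handles generalization, while the temporal test stays a cheap numeric comparison.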

Metabolism: from protein sequences to systems ecology


Detection of genomic recombinations by partial local alignment [B. Blanc, F. Coste] 33.

  • In collaboration with Marie-Agnès Petit (Phage team, MICALIS, Inrae), we investigated how paloma (the partial local multiple sequence alignment tool from the Protomata suite) could help study recombination in proteins from 32 phages, some of which are reported as recombined in the literature. Classical multiple sequence alignments are not suitable for this task. In contrast, the generated partial local alignments allowed us to find recombined regions in 8 phages (previously described for 3 of them), as well as 4 sequences conserved among these 8 phages around the recombined region, which could be recombination fingerprints 33.

Modeling proteins with crossing dependencies [F. Coste, H. Talibart] 19, 32

  • Motivated by their success on contact prediction, we proposed to use Potts models to represent proteins with direct couplings between positions (in addition to positional composition) and to compare them by optimally aligning these models using an Integer Linear Programming formulation of the problem. We worked on the inference of robust and more canonical Potts models. We assessed the approach on a non-redundant set of reference pairwise sequence alignments with low sequence identity, showing that Potts models representing proteins can be aligned in reasonable time and that considering couplings can significantly improve the alignments with respect to other methods 19, 32.
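As a reminder of what "positional composition plus direct couplings" means, the sketch below scores a sequence under a toy Potts model as the sum of per-position terms and pairwise terms. The parameter values are invented for illustration, the sign convention (higher score is better) is an assumption, and the code does not attempt the model inference or the Integer Linear Programming alignment described above.

```python
def potts_score(seq, fields, couplings):
    """Score of `seq`: sum of h_i(x_i) plus sum of J_ij(x_i, x_j).

    fields: list of dicts, fields[i][a] is the positional term h_i(a).
    couplings: {(i, j): {(a, b): value}} holding the pairwise terms J_ij(a, b).
    Missing entries are treated as 0.
    """
    score = sum(fields[i].get(a, 0.0) for i, a in enumerate(seq))
    for (i, j), table in couplings.items():
        score += table.get((seq[i], seq[j]), 0.0)
    return score


# Toy three-position model (hypothetical values).
fields = [
    {"A": 1.0, "C": 0.5},
    {"A": 0.25, "G": 0.75},
    {"T": 1.0},
]
# Positions 0 and 2 are coupled: A at 0 together with T at 2 is favoured.
couplings = {(0, 2): {("A", "T"): 0.5}}

score_agt = potts_score("AGT", fields, couplings)
score_cgt = potts_score("CGT", fields, couplings)
```

The two sequences differ only at position 0, yet the gap between their scores comes from both the positional term and the coupling with position 2, which is the kind of dependency a purely positional profile cannot express.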

Large-scale eukaryotic metabolic network and design of microbial communities [A. Siegel, A. Belcour, S. Blanquart, J. Got, N. Théret, M. Conan] 20, 14, 13, 30, 26, 25, 24, 37, 16.

  • Metabolic data analysis enhanced by large-scale metabolic network reconstruction. We used our tools for the reconstruction and analysis of large-scale metabolic networks to provide insights on Ulva compressa, a green tide-forming species, from transcriptome-wide gene expression profiles 20. We also took advantage of available genome data and of gas chromatography-mass spectrometry (GC-MS) sterol profiling, using a database of internal standards, to build a model of sterol biosynthesis in brown algae 14. Our results demonstrate that integrative approaches can already be used to infer experimentally testable models, which will be useful to further investigate the biological roles of those newly identified algal pathways.
  • Metabolic pathway inference from non-genomic data. We developed a modeling approach to predict all the possible metabolite derivatives of a xenobiotic. Our approach relies on the construction of an enriched and annotated map of derivative metabolites from an input metabolite. The pipeline assembles reaction prediction tools (SyGMa), site-of-metabolism prediction tools (Way2Drug, SOMP and Fame 3), a tool to estimate the ability of a xenobiotic to form DNA adducts (XenoSite Reactivity V1), and a filtering procedure based on a Bayesian framework. The method was applied to determine the enzyme profiles associated with the maximization of DNA adduct formation derived from each HAA 13, 30.
  • Design of synthetic microbiota. We presented the tool Metage2Metabo (microbiota-scale metabolic complementarity for the identification of key species) in several conferences 26, 25, 24, 37. Robustness analysis of metabolic predictions in algal microbial communities based on different annotation pipelines.
  • Impact of genome annotation procedures on the design of synthetic microbiomes 16. As there are multiple annotation pipelines available, the question arises of the extent to which differences between annotation pipelines impact the outcomes of genome-scale metabolic network reconstructions. We compared five commonly used pipelines (Prokka, MaGe, IMG, DFAST, RAST), from predicted annotation features to the metabolic network-based analysis of symbiotic communities (biochemical reactions, producible compounds, and selection of minimal complementary bacterial communities). The consortia generated yielded similar predicted producible compounds and could therefore be considered functionally interchangeable.
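The "selection of minimal complementary bacterial communities" step can be pictured as a covering problem. The sketch below is a deliberately simplified stand-in, assuming each organism comes with a fixed set of producible compounds and ignoring the metabolite exchanges that a real network-based analysis would account for. It is not how Metage2Metabo works internally; organism names and compounds are hypothetical.

```python
from itertools import combinations


def minimal_communities(producible, targets):
    """Return all smallest sets of organisms whose union covers `targets`.

    producible: {organism: set of compounds it can produce on its own}.
    targets: set of compounds the community must be able to produce.
    Exhaustive search by increasing size; only suitable for small inputs.
    """
    organisms = sorted(producible)
    for size in range(1, len(organisms) + 1):
        found = [
            set(combo)
            for combo in combinations(organisms, size)
            if targets <= set().union(*(producible[o] for o in combo))
        ]
        if found:
            return found
    return []


# Toy data (hypothetical organisms and compounds).
producible = {
    "bact1": {"a", "b"},
    "bact2": {"c"},
    "bact3": {"b", "c"},
    "bact4": {"a"},
}
communities = minimal_communities(producible, {"a", "b", "c"})
```

Several equally small communities can cover the same targets, which is one way to read the observation that consortia obtained from different annotation pipelines may be functionally interchangeable.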

Regulation and signaling: detecting complex and discriminant signatures of phenotypes

Learning Boolean controls in regulated metabolic networks: a case-study [A. Siegel, K. Thuillier] 22.

  • Many techniques have been developed to infer Boolean regulations from a prior knowledge network and experimental data. Existing methods are able to reverse-engineer Boolean regulations for transcriptional and signaling networks, but they fail to infer regulations that control metabolic networks. We provided a formalization of the inference of regulations for metabolic networks as a satisfiability problem with two levels of quantifiers, and introduced a method based on Answer Set Programming to solve this problem on a small-scale example.

Functional signature for ADAMTS [C. Belleannée, S. Blanquart, F. Coste, O. Dennler, N. Théret] 36.

  • Hepatic stellate cells produce a wide variety of molecules involved in ECM remodeling, such as adamalysins (hal-03215892). However, the discovery of new functions of these proteins is limited by experimental approaches that are difficult to implement because of the proteins' structure and biochemical features. In that context, we developed an original framework combining the identification of small modules in conserved regions, independent of known domains, with the concepts of phylogenomics (association of conservation and phenotype gained concurrently during evolution). The resulting evolutionary model of motif signatures and protein-protein interaction signatures of the ADAMTS family is validated by data from the literature and provides biologists with many new potential functional motifs.

Creation of predictive functional signaling networks [M. Bougueon, N. Théret] 29, 21.

  • The rule-based model approach: a Kappa model for hepatic stellate cell activation by TGFB1 29, 21. Kappa is a site graph rewriting language. It offers a rule-centric approach, inspired by chemistry, in which interaction rules locally modify the state of a system defined as a graph of components, connected or not. In this case study, the components are occurrences of hepatic stellate cells in different states and occurrences of the protein TGFB1. TGFB1 induces different behaviors in hepatic stellate cells, thereby contributing either to tissue repair or to fibrosis. A better understanding of the overall behavior of the mechanisms involved in these processes is key to identifying markers and therapeutic targets likely to promote the resolution of fibrosis rather than its progression.

Evidence of a microRNA signature for frontotemporal lobar degeneration and amyotrophic lateral sclerosis [E. Becker, V. Kmetzsch] 61.

  • In the context of our participation in the IPL NeuroMarker project, a joint study with Institut du Cerveau (Inserm/CNRS/Sorbonne Université) at the Pitié-Salpêtrière hospital and the Aramis team (Inria Paris) revealed a signature of four plasma microRNAs in presymptomatic and symptomatic subjects with frontotemporal dementia and amyotrophic lateral sclerosis associated with a C9orf72 mutation 13. The expression levels of these four microRNAs make it possible to discriminate between patients, presymptomatic individuals and healthy individuals. The study was conducted by Virgilio Kmetzsch during his PhD, supervised by Olivier Colliot (Aramis) and Emmanuelle Becker (Dyliss). Future steps will study how combining this signature with medical imaging can refine the classification or yield a score characterizing disease progression.

Characterizing gene structure with grammatical languages and conservation information [C. Belleannée, S. Blanquart, O. Dameron, N. Guillaudeux] 31

  • Based on syntactic models and graph formalisms, we compared the splicing structures of 2167 triplets of orthologous genes shared by human, mouse and dog. This resulted in the prediction of 6861 new coding transcripts (i.e. putative proteins) in these species, mainly for dog, an emergent model species. Every predicted transcript shares an identical exonic structure with a coding transcript already known in another species, which defines them as orthologs. Additionally, we identified a set of 253 gene triplets with strictly conserved exonic structures in human, mouse and dog, which therefore express the same proteome (i.e. the same isoform coding transcripts). These genes express a total of 879 groups of orthologous isoforms such that, in each group, the same splicing structure is shared by the gene of each of the three species. Although these genes express the same proteome, we showed that the expressed transcriptomes may differ, owing to the genes' propensity to express distinct alternatively transcribed mRNAs encoding the same protein.

Estimating ancestral phenotypes of halophilic enzymes using phylogenetic inferences [S. Blanquart] 12

  • Ancestral sequence reconstruction approaches aim at synthesizing ancient genes, which are estimated using phylogenetic methods, in order to experimentally measure the phenotypes of their products. In one such study, we investigated the adaptation of the ancestral malate dehydrogenase enzymes of extremely halophilic Archaea. Applying advanced phylogenetic approaches, we inferred and synthesized ancient enzyme sequences. We described the phenotype of a transferred enzyme, the evolutionary drift phenomenon and a secondary adaptation to an alkaliphilic lifestyle. The stabilization of the tetrameric assembly by ions appeared to modulate the enzymes' adaptation to extremely salty environments 12.

Establishing an inventory in human genome of a transposable element with help of grammatical patterns [A. Antoine-Lorquin, C. Belleannée] 11

  • Transposable elements are repeated DNA sequences that represent 45% of the human genome. They play a critical role in genome organization and evolution. Among them, MADE1 is an 80 bp element with a special structure: it is flanked on both ends by short sequences repeated in inverse orientation. The use of grammatical patterns with our Logol tool 2 helped to characterize the structural variants of MADE1 and to establish an exhaustive inventory of MADE1 elements 11.
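The structural feature mentioned above, a sequence flanked by short inverted repeats, can be checked with a few lines of Python. This is a plain exact-match sketch, assuming the inverted repeat is the reverse complement of the opposite end. It does not use Logol or its grammar syntax, it tolerates no mismatches, and the toy sequences and thresholds are invented rather than taken from MADE1.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tir_length(seq):
    """Length of the terminal inverted repeat (exact matches only).

    Counts how many leading bases equal the reverse complement of the
    trailing bases, stopping at the first mismatch or at mid-sequence.
    """
    rc = reverse_complement(seq)
    n = 0
    while n < len(seq) // 2 and seq[n] == rc[n]:
        n += 1
    return n


def find_candidates(genome, element_len, min_tir):
    """Start positions of windows of `element_len` with a long enough repeat."""
    return [
        start
        for start in range(len(genome) - element_len + 1)
        if tir_length(genome[start:start + element_len]) >= min_tir
    ]


# Toy element (hypothetical): 6-base inverted repeats around a spacer.
element = "GGTTCA" + "AAAAAAAA" + "TGAACC"
genome = "CCCC" + element + "CCCC"
repeat = tir_length(element)
hits = find_candidates(genome, len(element), 6)
```

A grammar-based pattern expresses the same constraint declaratively and can additionally allow substitutions or variable spacer lengths, which this fixed-window scan does not.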
