The liver plays a major role in the metabolic activation of xenobiotics (drugs and chemicals such as pollutants, pesticides, food additives, etc.). Among environmental contaminants of concern, heterocyclic aromatic amines (HAAs) are xenobiotics classified by IARC as probable or possible carcinogens (Group 2A or 2B), for which little information exists in humans. Thirty HAAs have been identified to date, but their bioactivation pathways, metabolites and DNA adducts have been fully characterised in the human liver for only three of them (MeIQx, PhIP, AαC). We have developed a modelling approach to predict metabolism (metabolites and reactions), DNA reactivity and the production probability of each metabolite. Our approach is based on the construction of enriched metabolism maps. It brings together tools for predicting reactions and metabolites (SyGMa), predicting sites of metabolism (Way2Drug SOMP, Fame3), predicting DNA reactivity (XenoSite Reactivity V1), and calculating a production probability score based on the properties of Bayesian networks. This prediction pipeline was evaluated and validated on caffeine and then applied to six HAAs. The main results show that our approach can predict the metabolism of xenobiotics and that the production probability score has several properties that can be used to filter the metabolism map or to determine the enzymatic profiles that maximise the formation of DNA adducts. This predictive toxicology approach opens up prospects for estimating the genotoxicity of various environmental contaminants in normal or pathophysiological situations.
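To give an intuition of what a production probability score over a metabolism map can look like, here is a minimal Python sketch. It is not the pipeline's actual scoring. It assumes a hypothetical map encoded as a directed acyclic graph rooted at the parent compound, with a made-up probability attached to each reaction, and propagates probabilities with a noisy-OR rule: a metabolite is produced if at least one incoming reaction fires, with substrates treated as independent. That rule is exact when every metabolite has a single substrate and only an approximation otherwise.

```python
from math import prod

def production_probabilities(root, reactions):
    """Propagate production probabilities through a hypothetical metabolism map.

    reactions: dict mapping (substrate, product) -> probability that the
    reaction occurs given the substrate is present (illustrative values).
    Assumes the map is a DAG in which every metabolite is reachable from root.
    """
    parents, children = {}, {}
    for (substrate, product), p in reactions.items():
        parents.setdefault(product, []).append((substrate, p))
        children.setdefault(substrate, []).append(product)

    # Kahn-style traversal: a metabolite is scored once all its substrates are.
    pending = {m: len(ps) for m, ps in parents.items()}
    prob = {root: 1.0}
    stack = [root]
    while stack:
        node = stack.pop()
        for product in children.get(node, []):
            pending[product] -= 1
            if pending[product] == 0:
                # Noisy-OR: the product is absent only if every incoming reaction fails.
                prob[product] = 1.0 - prod(1.0 - prob[s] * p for s, p in parents[product])
                stack.append(product)
    return prob

# Hypothetical map: parent P -> M1 and M2; both M1 and M2 -> M3.
reactions = {("P", "M1"): 0.8, ("P", "M2"): 0.5, ("M1", "M3"): 0.5, ("M2", "M3"): 0.4}
scores = production_probabilities("P", reactions)
for metabolite, score in sorted(scores.items()):
    print(f"{metabolite}: {score:.2f}")
```

With such scores, one simple way to filter a map would be to keep only metabolites above a chosen threshold.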
Hugo Talibart’s defense on “Comparison of homologous protein sequences using direct coupling information by pairwise Potts model alignments” will take place on Wednesday 24th February 2021 at 14:30 (UTC+1).
Sean EDDY, Professor at Harvard University, Cambridge, USA (Rapporteur)
Martin WEIGT, Professor at Sorbonne Université, Paris (Rapporteur)
Guillaume GRAVIER, Senior researcher CNRS, Rennes
Juliette MARTIN, Researcher CNRS, Lyon
Thomas SCHIEX, Senior researcher INRAE, Toulouse
Jacques NICOLAS, Senior researcher Inria, Rennes (thesis director)
François COSTE, Researcher Inria, Rennes (thesis supervisor)
To assign structural and functional annotations to the ever-increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods that find significant alignments of query sequences to annotated proteins or protein families. While powerful, existing approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, in this thesis we propose to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. This novel application of Potts models raised further requirements for their construction, and we identified several key points for building more comparable Potts models, approaching an ideal of canonicity. Because of non-local dependencies, the problem of aligning Potts models is NP-hard. We introduce a method based on an Integer Linear Programming formulation of the problem, which can be solved optimally in tractable time. Our first results suggest that taking pairwise couplings into account can improve the alignment of remote homologs and could thus improve remote homology detection.
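The alignment objective can be illustrated on toy models. The Python sketch below is an illustration under stated assumptions, not the thesis's method: it scores an order-preserving alignment between two Potts models as the sum of scalar products of aligned field vectors plus scalar products of the coupling matrices between every two aligned position pairs (no gap penalties), and it finds the best alignment by exhaustive enumeration rather than by Integer Linear Programming. The pairwise term is what makes the score non-local: the contribution of aligning position i with position k depends on every other aligned pair.

```python
from itertools import combinations

def dot(u, v):
    """Scalar product of two vectors given as lists."""
    return sum(x * y for x, y in zip(u, v))

def mat_dot(M, N):
    """Entrywise (Frobenius) scalar product of two matrices given as lists of rows."""
    return sum(dot(r, s) for r, s in zip(M, N))

def alignment_score(pairs, hA, hB, JA, JB):
    """Score aligned position pairs (i, k), increasing in both i and k.

    hA[i] is the field vector of position i in model A; JA[(i, j)] with i < j
    is the q x q coupling matrix between positions i and j. Same for model B.
    """
    # Single-site term: compare aligned field vectors.
    score = sum(dot(hA[i], hB[k]) for i, k in pairs)
    # Pairwise term: compare couplings between every two aligned pairs.
    for (i, k), (j, l) in combinations(pairs, 2):
        score += mat_dot(JA[(i, j)], JB[(k, l)])
    return score

def best_alignment(hA, hB, JA, JB):
    """Enumerate all order-preserving alignments (exponential: toy sizes only)."""
    best_score, best_pairs = 0.0, []
    for m in range(1, min(len(hA), len(hB)) + 1):
        for posA in combinations(range(len(hA)), m):
            for posB in combinations(range(len(hB)), m):
                pairs = list(zip(posA, posB))
                score = alignment_score(pairs, hA, hB, JA, JB)
                if score > best_score:
                    best_score, best_pairs = score, pairs
    return best_score, best_pairs

# Hypothetical toy models with q = 2 states per position.
C = [[0.0, 1.0], [1.0, 0.0]]
Z = [[0.0, 0.0], [0.0, 0.0]]
hA, JA = [[1.0, 0.0], [0.0, 1.0]], {(0, 1): C}
hB = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
JB = {(0, 1): Z, (0, 2): C, (1, 2): Z}
print(best_alignment(hA, hB, JA, JB))
```

In this toy example, fields alone cannot decide whether the second position of A should align with the second or third position of B; the couplings break the tie in favour of the third.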
– Emmanuelle BECKER (Université de Rennes 1)
– Charles BETTEMBOURG (Sanofi, Chilly-Mazarin)
– Laurence CALZONE (Institut Curie, Paris)
– Olivier DAMERON (Université de Rennes 1)
– Franck DELAPLACE (Université de Paris-Saclay, Évry)
– Fleur MOUGIN (LaBRI Bordeaux)
– Anne SIEGEL (IRISA Rennes)
– Vassili SOUMELIS (Hôpital Saint Louis, Paris)
– Emmanuel OGER (Université de Rennes 1)
Systemic lupus erythematosus is an example of a complex, heterogeneous and multifactorial disease. Identifying signatures that can explain the cause of a disease remains an important challenge for patient stratification. Classical statistical analyses are hard to apply when the population of interest is heterogeneous, and they do not highlight causes. This thesis presents two methods that address these issues. First, a transomic model is described to structure all the omics data using the Semantic Web (RDF). It is populated following a patient-centric approach. SPARQL queries over this model allow the identification of expression Individually-Consistent Trait Loci (eICTLs). An eICTL is an association, obtained by reasoning, between a SNP and a gene such that the presence of the SNP impacts the variation of the gene's expression. eICTLs reduce the dimensionality of omics data and prove more informative than genomic data. This first method is omics data-driven. The second method relies on existing regulatory dependencies in biological networks. By combining the dynamics of the biological system with formal concept analysis, the generated stable states are automatically classified. This classification enables the enrichment of the biological signatures that characterise a phenotype. Moreover, a new hybrid phenotype is identified.
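Formal concept analysis (FCA) groups objects that share attributes into formal concepts, each pairing a set of objects (the extent) with the attributes they all share (the intent). As a hedged illustration of how stable states could be classified this way, the Python sketch below treats each stable state as an object and the components active in it as its attributes; the state and component names are hypothetical. It enumerates every concept by closing the object intents under intersection, which is fine for small contexts but is not how a dedicated FCA tool would scale.

```python
def formal_concepts(context):
    """Return all formal concepts (extent, intent) of a binary context.

    context: dict mapping each object (here, a stable state) to the set of
    attributes it has (here, the components active in that state).
    """
    all_attrs = frozenset().union(*context.values())
    # Concept intents are exactly the intersections of object intents
    # (the empty intersection being the full attribute set).
    intents = {all_attrs}
    for attrs in context.values():
        intents |= {frozenset(attrs) & intent for intent in intents}
    concepts = []
    for intent in intents:
        # The extent is every object that has all attributes of the intent.
        extent = frozenset(obj for obj, attrs in context.items() if intent <= attrs)
        concepts.append((extent, intent))
    # Sort from most general (largest extent) to most specific.
    return sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[1])))

# Hypothetical stable states of a Boolean regulatory network.
stable_states = {
    "s1": {"A", "B"},
    "s2": {"A", "C"},
    "s3": {"A", "B", "C"},
}
for extent, intent in formal_concepts(stable_states):
    print(sorted(extent), sorted(intent))
```

Here the state s3, which combines the active components of s1 and s2, falls under both of their concepts; a concept lattice makes this kind of shared structure explicit.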
Marine Louarn’s PhD defense on “Analysis and integration of heterogeneous large-scale genomics data: application to B cell differentiation and Follicular Lymphoma non coding mutations” will take place on Thursday 26th November at 09:30 (UTC+1).
The defense will be broadcast live on YouTube: https://youtu.be/9kPjm6yERMM
– Alexandre TERMIER (Professor) University Rennes 1 Rennes
– Sarah COHEN-BOULAKIA (Professor) LRI Orsay
– Fabrice CHATONNET (IR) CHU Rennes
– Olivier DAMERON (Professor) University Rennes 1 Rennes
– Anne SIEGEL (DR CNRS) IRISA Rennes
– Thierry FEST (PU-PH) INSERM / CHU Rennes
Regulatory network inference from heterogeneous data is a computational step that aims to identify key regulators involved in differentiation processes leading to cancer. In this thesis I focus on B cell differentiation, from which follicular lymphoma emerges. The first contribution outlines the reproducibility and reusability limitations of a state-of-the-art method for network inference from genomic data. To overcome these limitations, I demonstrated that Semantic Web technologies can structure and integrate large-scale heterogeneous datasets in a systematic way. The outputs of the original analysis workflow could be reproduced as queries on a data graph, which could itself be layered and enriched with public databases. This demonstrates the technical relevance of this approach and underlines its benefits for reusability and reproducibility. As a fourth contribution, a new method for network inference was designed to take expert knowledge into account, both to extend the previous framework to the analysis of smaller, closely related datasets and to enrich the inferred networks with signs, thereby including inhibitory regulatory processes. Finally, the method was applied to B cell differentiation, leading to the discovery of 146 transcription factors (TFs) with a potentially large impact on the network.