Latest software by Dyliss
AskOmics is a solution to convert tabulated data into RDF and create SPARQL queries intuitively and "on the fly".
AskOmics aims at bridging the gap between end user data and the Linked (Open) Data cloud. It allows heterogeneous bioinformatics data (formatted as tabular files) to be loaded into an RDF triplestore and then transparently and interactively queried.
AskOmics is made of three software blocks: (1) a web interface for data import, allowing the creation of a local triplestore from the user's datasheets and standard data, (2) an interactive web interface allowing "à la carte" query building, (3) a server handling interactions with local and distant triplestores (query execution, management of user parameters).
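As an illustration of the tabular-to-RDF idea described above, the following minimal sketch turns the rows of a tab-separated table into N-Triples lines. The namespace `http://example.org/`, the function name `rows_to_ntriples` and the convention that the first column identifies the entity are assumptions made for this example; they are not AskOmics's actual conversion rules.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not an AskOmics URI


def rows_to_ntriples(tsv_text, base=BASE):
    """Convert a tab-separated table into a list of N-Triples lines.

    Assumption: the first column holds the entity identifier and every
    other column becomes a predicate with a plain literal value.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    key = reader.fieldnames[0]
    triples = []
    for row in reader:
        subject = f"<{base}{row[key]}>"
        for column in reader.fieldnames[1:]:
            value = row[column]
            if not value:
                continue  # skip empty or missing cells
            # Escape backslashes and double quotes inside the literal.
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} <{base}{column}> "{literal}" .')
    return triples


table = "gene\tchromosome\tstart\nAT1G01010\tchr1\t3631\n"
for line in rows_to_ntriples(table):
    print(line)
```

Once loaded into a triplestore, such triples can be queried with SPARQL; the interactive query builder spares the user from writing those queries by hand.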
AuReMe enables the reconstruction of metabolic networks from different sources based on sequence annotation, orthology, gap-filling and manual curation. The metabolic network is exported as a local wiki that makes it possible to trace back all the steps and sources of the reconstruction. It is highly relevant for the study of non-model organisms, or the comparison of metabolic networks for different strains or a single organism.
AuReMe is composed of five modules: 1) The Model-management PADmet module allows all metabolic data to be manipulated and traced via a local database. 2) The meneco Python package fills the gaps of a metabolic network with a topological approach that relies on logic programming to solve a combinatorial problem. 3) The shogen Python package aligns a genome and a metabolic network in order to identify genome units that contain a high density of genes coding for enzymes; it also relies on logic programming. 4) The manual curation assistance PADmet module allows the reported metabolic networks and their metadata to be curated. 5) The Wiki-export PADmet module exports the metabolic network and its functional genomic units as a local wiki platform for user-friendly investigation.
Bioquali for Cytoscape
BioQuali [input: a signed regulation network and one gene-expression dataset; output: consistency checking and gene-expression prediction] is a plugin for the Cytoscape environment. BioQuali analyses regulatory networks and expression datasets by checking the global consistency between the regulatory model and the expression data. It diagnoses a regulatory network by searching for the regulations that are not consistent with the expression data, and it outputs a set of genes whose expression is predicted in order to explain the input expression data. It also provides a visualization of this analysis in a friendly environment to encourage users from different disciplines to analyze their regulatory networks.
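To make the notions of consistency checking and prediction concrete, here is a minimal brute-force sketch. It assumes the usual sign consistency rule (the variation of a gene must be explained by at least one incoming regulation whose sign, multiplied by the variation of its source, gives the same sign), which may differ from what the plugin actually implements. The function names and the toy network are hypothetical, and the enumeration is only practical for very small networks.

```python
from itertools import product


def consistent_labelings(edges, observed):
    """Enumerate sign labelings (+1/-1) that satisfy the consistency rule.

    edges: list of (source, target, sign) with sign in {+1, -1}
    observed: dict mapping measured genes to +1 (up) or -1 (down)
    Genes without any regulator are treated as unconstrained inputs.
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)} | set(observed))
    free = [n for n in nodes if n not in observed]
    incoming = {n: [] for n in nodes}
    for source, target, sign in edges:
        incoming[target].append((source, sign))

    solutions = []
    for values in product((1, -1), repeat=len(free)):
        labeling = dict(observed)
        labeling.update(zip(free, values))
        ok = all(
            not preds
            or any(labeling[s] * sign == labeling[n] for s, sign in preds)
            for n, preds in incoming.items()
        )
        if ok:
            solutions.append(labeling)
    return solutions


def predict(edges, observed):
    """Return (is_consistent, predictions).

    A prediction is made for an unobserved gene only when it takes the
    same sign in every consistent labeling.
    """
    solutions = consistent_labelings(edges, observed)
    if not solutions:
        return False, {}
    predictions = {}
    for node in solutions[0]:
        if node in observed:
            continue
        signs = {solution[node] for solution in solutions}
        if len(signs) == 1:
            predictions[node] = signs.pop()
    return True, predictions


# Toy network: a activates b, b inhibits c; c is observed as up-regulated.
network = [("a", "b", 1), ("b", "c", -1)]
print(predict(network, {"c": 1}))
```

In this toy case the observation on `c` forces `b`, and then `a`, to be down-regulated, so both are predicted.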
Draw graphs using Answer Set Programming
Use ASP as a Domain Specific Language to specify dot-based visualizations.
Pattern Matching biological grammar language
Logol is a pattern matching grammar language and a set of tools to search for a pattern in a biological (nucleic or amino acid) sequence. It allows the design of sophisticated patterns (by way of a high-level grammatical formalism), and their search in large sequences. The LogolMatch tool takes as input a biological sequence (DNA, RNA or protein) and a grammar file. It returns a result file containing the matches with all required details.
Logol is composed of two modules. First, the graphical designer allows a complex pattern to be built iteratively from basic graphical patterns. The associated grammar file is an export of the graphical designer. Second, the LogolMatch parser takes as input a biological sequence and a grammar file. It returns an XML file containing all the occurrences of the pattern in the sequence with their parsing details. The input sequences can be genomes from biological banks.
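To give an idea of the kind of pattern that goes beyond a plain motif, the sketch below searches a DNA sequence for a simple hairpin: a word, a spacer of bounded length, then the reverse complement of that same word. It is written in plain Python, not in Logol's grammar syntax, and it does not reflect how LogolMatch works internally; the function names and parameters are chosen for this example only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpins(sequence, stem=4, min_loop=3, max_loop=6):
    """Find occurrences of: word, loop, reverse complement of word.

    Returns a list of (start, end, stem_word, loop_sequence) tuples,
    with 0-based start and exclusive end.
    """
    matches = []
    for start in range(len(sequence) - 2 * stem - min_loop + 1):
        word = sequence[start:start + stem]
        target = reverse_complement(word)
        for loop in range(min_loop, max_loop + 1):
            second = start + stem + loop
            # A slice running past the end is shorter than the stem
            # and therefore cannot match.
            if sequence[second:second + stem] == target:
                loop_seq = sequence[start + stem:second]
                matches.append((start, second + stem, word, loop_seq))
    return matches


print(find_hairpins("TTGACCAAAGGTCTT"))
```

The second occurrence of the word depends on what was matched first, which is the kind of constraint a grammatical formalism can express and a fixed motif cannot.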
Lombarde is a bioinformatics method that takes a gene regulatory network, determined from a set of predicted transcription factors and binding sites, and extracts from it a subnetwork explaining a given set of observed co-expressions, highlighting the regulations most likely involved in the co-regulation. Lombarde solves an optimization problem on a graph to select confident paths within the given regulatory network that join a putative common regulator to two co-expressed genes via regulatory cascades.
This tool is useful to enhance key causalities within a regulatory transcriptional network when it is challenged by several environmental perturbations.
Metabolic Network Completion
Meneco is a qualitative approach to elaborate the biosynthetic capacities of metabolic networks. Large-scale metabolic networks as well as measured datasets suffer from substantial incompleteness. Moreover, traditional formal approaches to biosynthesis require kinetic information, which is rarely available. Our approach builds upon formal systems for analyzing large-scale metabolic networks. Mapping its principles into Answer Set Programming allows us to address various biologically relevant problems. A new version of Meneco, based on Python 3 and gringo 4, became available in 2015.
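The topological view of network completion can be sketched as follows. A compound is considered producible if it is a seed or the product of a reaction whose substrates are all producible; completion then looks for the smallest sets of reactions from a reference database that make the targets producible. The code below is a brute-force Python illustration under these assumptions, with hypothetical function names and a toy network. It is not the ASP encoding used by the tool and does not scale to real networks.

```python
from itertools import combinations


def scope(reactions, seeds):
    """Compounds producible from the seeds.

    reactions: dict name -> (set_of_substrates, set_of_products)
    """
    reachable = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions.values():
            if substrates <= reachable and not products <= reachable:
                reachable |= products
                changed = True
    return reachable


def minimal_completions(draft, repair_db, seeds, targets):
    """Smallest sets of repair reactions making every target producible.

    Assumes reaction names in draft and repair_db do not collide.
    """
    targets = set(targets)
    names = sorted(repair_db)
    for size in range(len(names) + 1):
        found = []
        for subset in combinations(names, size):
            network = dict(draft)
            network.update({name: repair_db[name] for name in subset})
            if targets <= scope(network, seeds):
                found.append(set(subset))
        if found:
            return found  # all completions of minimal size
    return []  # targets cannot be produced even with the whole database


draft = {"r1": ({"A"}, {"B"})}
repair_db = {
    "r2": ({"B"}, {"C"}),
    "r3": ({"C"}, {"D"}),
    "r4": ({"B"}, {"D"}),
}
print(minimal_completions(draft, repair_db, seeds={"A"}, targets={"D"}))
```

Here adding `r4` alone is enough to produce `D`, so the two-reaction route through `r2` and `r3` is not reported. No kinetic information is needed at any point, which is what makes the approach qualitative.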
Graph visualization assistance through power graph analysis methods.
Implementation of graph compression methods oriented toward visualization, and based on power graph analysis.
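As a rough illustration of the compression idea, the sketch below implements only its simplest ingredient: nodes that share exactly the same neighbours are grouped into one power node, and all the edges between two such groups are replaced by a single power edge. Full power graph analysis also looks for more general motifs such as cliques and nested groups; the function names and the toy graph are assumptions of this example, not the tool's API.

```python
from collections import defaultdict


def group_by_neighbourhood(edges):
    """Group nodes of an undirected, loop-free graph by identical neighbour sets."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    groups = defaultdict(set)
    for node, adjacent in neighbours.items():
        groups[frozenset(adjacent)].add(node)
    return sorted(sorted(members) for members in groups.values())


def power_edges(edges):
    """Power edges between groups; each stands for |A| x |B| plain edges."""
    groups = group_by_neighbourhood(edges)
    owner = {node: tuple(group) for group in groups for node in group}
    return sorted({tuple(sorted((owner[u], owner[v]))) for u, v in edges})


# Complete bipartite graph between {a, b, c} and {x, y}: six plain edges.
graph = [(left, right) for left in "abc" for right in "xy"]
print(group_by_neighbourhood(graph))
print(power_edges(graph))
```

On this toy graph the six edges collapse into one power edge between two power nodes, which is the kind of reduction that makes a drawing easier to read.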
Software suite for the inference of automata modelling protein sequences
This tool is a grammatical inference framework suitable for learning the specific signature of a functional protein family from unaligned sequences by partial and local multiple alignment and automata modelling. It performs a syntactic characterization of proteins by identifying conservation blocks on sequence subsets and modelling their succession. Possible fields of application are the discovery of new members or the study (for instance, for site-directed mutagenesis) of possibly non-homologous functional families and subfamilies such as enzymatic, signalling or transporting proteins.
Given a sample of sequences belonging to a structural or functional family of proteins, Protomata-Learner infers an automaton characterizing the family by partial local alignment of the sequences. Automata are graphical models representing a (potentially infinite) set of sequences. Able to express alternative local dependencies between the positions, automata offer a finer level of expressivity than classical sequence patterns (such as PSSM, Profile HMM, or Prosite Patterns) and can model more than homologous sequences. They are well suited to get new insights into a family or to search for new family members in the sequence data banks, especially when approaches based on classical multiple sequence alignments are insufficient.
The three main modules integrated in the Protomata-Learner workflow are also available as stand-alone programs: 1) paloma builds partial local multiple alignments, 2) protobuild infers automata from these alignments and 3) protomatch and protoalign scan, parse and align new sequences with learnt automata. The suite is completed by tools to handle or visualize data and can be used online by biologists via a web interface on the Genouest Platform.
This ASP-based software aims at identifying every segment of consecutive genes in a bacterial genome that contains a maximum number of genes participating in a given metabolic pathway.
Through this selection, the shogen tool deciphers putative sets of genes that (1) take an active part in metabolic pathways while being closely connected via metabolic networks and (2) are consecutive on each of the genomes involved.
In practice, our approach connects genomic and metabolic knowledge by considering the genome organization and the biochemical reactions catalyzed by enzymes encoded by its genes. The underlying parsimony principle assumes that genes that must be jointly regulated to activate a metabolic reaction cascade should be close to each other in the genome organization.
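A much simplified version of the search for segments of consecutive genes can be sketched with a sliding window. The sketch assumes that the genome is given as an ordered list of gene identifiers and the pathway as a plain set of genes, and it bounds the segment length with a hypothetical `max_length` parameter. The actual tool reasons on reaction chains in the metabolic network and is encoded in ASP, which this example does not attempt to reproduce.

```python
def best_segments(genome, pathway_genes, max_length):
    """Return (best_count, segments) over windows of consecutive genes.

    genome: ordered list of gene identifiers
    pathway_genes: genes taking part in the pathway of interest
    max_length: maximal number of consecutive genes in a segment
    Each segment starts and ends on a pathway gene.
    """
    pathway_genes = set(pathway_genes)
    best_count, best = 0, []
    for start, gene in enumerate(genome):
        if gene not in pathway_genes:
            continue  # a segment must start on a pathway gene
        window = genome[start:start + max_length]
        hits = [i for i, g in enumerate(window) if g in pathway_genes]
        segment = tuple(window[:hits[-1] + 1])  # trim trailing non-pathway genes
        count = len(hits)
        if count > best_count:
            best_count, best = count, [segment]
        elif count == best_count and segment not in best:
            best.append(segment)
    return best_count, best


genome = ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]
pathway = {"g2", "g3", "g5", "g7"}
print(best_segments(genome, pathway, max_length=4))
```

With a window of four genes, the segment from `g2` to `g5` holds three pathway genes and is the only best candidate in this toy genome.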
In 2016, the Python package was integrated into a Docker container together with several scripts that facilitate the preprocessing of shogen inputs and the post-processing of its outputs.
Identification of capsid and tail viral protein sequences
VIRALpro is a predictor capable of identifying capsid and tail protein sequences using support vector machines (SVM) with an accuracy estimated to be between 90% and 97%. Predictions are based on the protein amino acid composition, on the protein predicted secondary structure, as predicted by SSpro, and on a boosted linear combination of HMM e-values obtained from 3,380 HMMs built from multiple sequence alignments of specific fragments – called contact fragments – of both capsid and tail sequences.
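Of the three feature families listed above, the amino acid composition is the simplest to illustrate. The sketch below computes a 20-dimensional composition vector for a protein sequence. The ordering of residues, the handling of non-standard letters and the function name are assumptions of this example; the secondary structure and HMM-based features, as well as the SVM itself, are not reproduced here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, alphabetical


def amino_acid_composition(sequence):
    """Fraction of each standard amino acid in a protein sequence.

    Non-standard letters (for instance X or B) are ignored, so the
    fractions sum to 1 whenever at least one standard residue is present.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for residue in sequence.upper():
        if residue in counts:
            counts[residue] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


vector = amino_acid_composition("MKVLAAGIVX")
print(dict(zip(AMINO_ACIDS, vector)))
```

Such a fixed-length vector can be fed to a classifier regardless of the length of the input sequence.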