Web supplement for:
Systematic identification of orthologous genes based on protein network comparison

Sourav Bandyopadhyay, Roded Sharan, Trey Ideker
sourav @ bioeng.ucsd.edu; roded @ post.tau.ac.il; trey @ bioeng.ucsd.edu

Abstract
Annotating protein function across species is an important task which is often complicated by the presence of large paralogous gene families. Here, we report a novel strategy for identifying functionally related proteins that supplements sequence-based comparisons with information on conserved protein-protein interactions. First, the protein interaction networks of two species are aligned by assigning proteins to sequence homology groups using the InParanoid algorithm. Next, probabilistic inference is performed on the aligned networks to identify pairs of proteins, one from each species, that are likely to retain the same function based on conservation of their interacting partners. Applying this method to D. melanogaster and S. cerevisiae, we analyze 121 cases for which orthology assignment is ambiguous when using sequence similarity alone. In 61 of these cases, the network supports a different protein pair than that favored by sequence comparisons alone. These results suggest that network analysis can be used to provide a key source of information for refining sequence-based homology searches.

In each figure, a node represents a potential functionally orthologous pairing of one yeast protein and one fly protein (Yeast | Fly). The shape of a node reflects its properties: definite orthologs are diamond-shaped, while ambiguous orthologs (orthologs with duplications) are elliptical. Nodes of the same color belong to the same InParanoid orthologous group. An edge between two nodes indicates that the interaction between the corresponding proteins is conserved to some degree in both species.
In each table the data is organized as follows:

Group: InParanoid group to which the node belongs.
Common Name: The "yeast | fly" protein pairing represented in the group.
Name: The "yeast | fly" ORF names.
E-value: BLAST E-value.
Mean Probability: Probability of orthology as determined by our algorithm.
Num Interactions Yeast: Number of underlying interactions in the single-species yeast data set.
Num Interactions Fly: Number of underlying interactions in the single-species fly data set.
Std Probability: Standard deviation of the probability estimate.

Figures were generated using Cytoscape software.
Data Sources
Inparanoid output: table.SC-DM
Interaction data from DIP: Yeast and Fly

Results
The whole network is provided as a SIF file, which can be viewed in Cytoscape together with the appropriate Synonyms file.
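For readers who want to inspect the network outside Cytoscape, the following Python sketch shows one way a SIF file might be read. It assumes the standard Simple Interaction Format (source node, interaction type, then one or more target nodes per line) and assumes that node names follow the "yeast|fly" pairing convention used in the figures. The function names `parse_sif` and `split_pair`, the interaction type `cc`, and the protein names in the usage example are hypothetical and are not taken from the supplement files.

```python
def parse_sif(text):
    """Parse SIF text into a set of node names and a list of edges.

    Assumption: as in Cytoscape, tabs are the delimiter if the text
    contains any tab character; otherwise any whitespace is used.
    """
    use_tabs = "\t" in text
    nodes = set()
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        raw = line.split("\t") if use_tabs else line.split()
        fields = [f.strip() for f in raw if f.strip()]
        source = fields[0]
        nodes.add(source)
        # A line holding only one name declares an isolated node.
        if len(fields) >= 3:
            interaction = fields[1]
            for target in fields[2:]:
                nodes.add(target)
                edges.append((source, interaction, target))
    return nodes, edges


def split_pair(node_name):
    """Split a 'yeast|fly' node name into its two protein names (assumed format)."""
    yeast, fly = (part.strip() for part in node_name.split("|", 1))
    return yeast, fly


# Usage with made-up node names (not real data from the supplement).
sample = "yeastA|flyA\tcc\tyeastB|flyB\tyeastC|flyC\nyeastD|flyD\n"
nodes, edges = parse_sif(sample)
for source, interaction, target in edges:
    print(split_pair(source), interaction, split_pair(target))
```

Each edge in the resulting list connects two candidate pairings, so the neighbors of a pairing can be collected by scanning the edge list for its name.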
The organization of the results file is as follows:

State: The type of node. Pairings labeled 'Ttrain' are true (definite) orthologs. 'NA' means that no conserved interaction is available. 'TEST' pairings are ambiguous orthologs that have a conserved network interaction.
Name: The "yeast | fly" protein pairing represented in the group.
Number of Conserved Interactions: The number of conserved interactions among orthologs.
Yeast Intxs: Number of interactions present in yeast.
Fly Intxs: Number of interactions present in fly.
Probability: Probability of orthology as determined by our algorithm, averaged over 100 runs.
Standard Deviation: The standard deviation of the estimate of the probability of orthology, based on 100 runs.
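The column layout above can be read with a short script. The Python sketch below is only an illustration under stated assumptions: it assumes that the results file is tab-delimited, has no header row, and lists the seven columns in the order given above, none of which is confirmed by this page. The names `Pairing`, `read_results`, and `rank_test_pairings`, as well as the sample rows, are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pairing:
    state: str
    yeast: str
    fly: str
    conserved: int
    yeast_intxs: int
    fly_intxs: int
    probability: Optional[float]
    std_dev: Optional[float]


def _to_float(value):
    """Return a float, or None for non-numeric entries such as 'NA'."""
    try:
        return float(value)
    except ValueError:
        return None


def read_results(handle):
    """Read rows from a tab-delimited results file (assumed layout, no header)."""
    rows = []
    for rec in csv.reader(handle, delimiter="\t"):
        if len(rec) < 7:
            continue  # skip blank or incomplete lines
        state, name, conserved, y_n, f_n, prob, sd = (f.strip() for f in rec[:7])
        yeast, fly = (p.strip() for p in name.split("|", 1))
        rows.append(Pairing(state, yeast, fly, int(conserved), int(y_n),
                            int(f_n), _to_float(prob), _to_float(sd)))
    return rows


def rank_test_pairings(rows):
    """Return the ambiguous ('TEST') pairings, highest probability first."""
    scored = [r for r in rows if r.state == "TEST" and r.probability is not None]
    return sorted(scored, key=lambda r: r.probability, reverse=True)


# Usage with made-up rows (not real values from the results file).
sample = (
    "TEST\tyeastA | flyA\t2\t10\t5\t0.80\t0.05\n"
    "TEST\tyeastA | flyB\t1\t10\t3\t0.35\t0.10\n"
    "NA\tyeastC | flyC\t0\t4\t2\tNA\tNA\n"
)
rows = read_results(io.StringIO(sample))
for r in rank_test_pairings(rows):
    print(r.yeast, r.fly, r.probability, r.std_dev)
```

Sorting by the averaged probability is only one convenient way to browse the ambiguous cases; the standard deviation column indicates how stable each estimate was across runs.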

Links
Inparanoid
Laboratory for Integrative Network Biology
Cytoscape
UCSD Bioengineering
UCSD Bioinformatics Graduate Program

Comments, concerns, and questions pertaining to this web page should be addressed to sourav @ bioeng.ucsd.edu