Web supplement for:
Systematic identification of orthologous genes based on protein network comparison
Sourav Bandyopadhyay, Roded Sharan, Trey Ideker
sourav @ bioeng.ucsd.edu; roded @ post.tau.ac.il; trey @ bioeng.ucsd.edu
Abstract
Annotating protein function across species is an important task which is often complicated by the presence of large paralogous gene families. Here, we report a novel strategy for identifying functionally related proteins that supplements sequence-based comparisons with information on conserved protein-protein interactions. First, the protein interaction networks of two species are aligned by assigning proteins to sequence homology groups using the InParanoid algorithm. Next, probabilistic inference is performed on the aligned networks to identify pairs of proteins, one from each species, that are likely to retain the same function based on conservation of their interacting partners. Applying this method to D. melanogaster and S. cerevisiae, we analyze 121 cases for which orthology assignment is ambiguous when using sequence similarity alone. In 61 of these cases, the network supports a different protein pair than that favored by sequence comparisons alone. These results suggest that network analysis can be used to provide a key source of information for refining sequence-based homology searches.
In each figure, a node represents a potential functionally orthologous pairing of one yeast protein and one fly protein (Yeast | Fly). The shape of a node indicates the type of pairing: definite orthologs are diamond-shaped, while ambiguous orthologs (orthologs with duplications) are elliptical. Nodes of the same color belong to the same orthologous group from InParanoid. An edge between two nodes represents some degree of interaction conservation across the two species.
In each table the data is organized as follows:
| Group | InParanoid group to which the node belongs. |
| Common Name | The "yeast | fly" protein pairing represented in the group. |
| Name | The "yeast | fly" ORF names. |
| E-value | BLAST e-values. |
| Mean Probability | Probability of orthology as determined by our algorithm. |
| Num Interactions Yeast | Number of underlying interactions in the single-species yeast data set. |
| Num Interactions Fly | Number of underlying interactions in the single-species fly data set. |
| Std Probability | Standard deviation of the probability estimate. |
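As an illustration of how the table columns can be read together, the following minimal sketch compares, within each InParanoid group, the pairing with the best BLAST e-value against the pairing with the highest Mean Probability. The rows, protein names, and the function `compare_rankings` are hypothetical and invented for this example; they are not taken from the supplement's data.

```python
from collections import defaultdict

# Hypothetical rows: (group, common_name, e_value, mean_probability).
# All values are invented for illustration only.
rows = [
    (1, "YA1 | FA1", 1e-50, 0.30),
    (1, "YA1 | FA2", 1e-40, 0.85),
    (2, "YB1 | FB1", 1e-60, 0.90),
    (2, "YB1 | FB2", 1e-20, 0.10),
]

def compare_rankings(rows):
    """For each group, return (best pairing by e-value, best pairing by probability)."""
    by_group = defaultdict(list)
    for group, name, e_value, prob in rows:
        by_group[group].append((name, e_value, prob))
    result = {}
    for group, members in by_group.items():
        best_by_sequence = min(members, key=lambda m: m[1])[0]  # lowest e-value
        best_by_network = max(members, key=lambda m: m[2])[0]   # highest probability
        result[group] = (best_by_sequence, best_by_network)
    return result

summary = compare_rankings(rows)
# Groups in which the network-based choice differs from the sequence-based choice.
disagreements = [g for g, (seq, net) in summary.items() if seq != net]
print(disagreements)
```

In this toy input, group 1 is a case where the two criteria disagree, which is the kind of situation the abstract describes for ambiguous orthology assignments.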
Figures were generated using Cytoscape software.
Data Sources
InParanoid output: table.SC-DM
Interaction data from DIP: Yeast and Fly
Results
The whole network can be viewed as a SIF file suitable for viewing in Cytoscape with the appropriate Synonyms file.
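For readers who want to inspect the network outside Cytoscape, a SIF file can be read with a few lines of code. The sketch below assumes the standard SIF layout (source node, interaction type, then one or more target nodes per line, with tabs taken as the delimiter whenever the file contains any). The node names and the interaction type `pp` in the sample are hypothetical, and `parse_sif` is an illustrative helper, not part of the supplement.

```python
def parse_sif(text):
    """Parse SIF text into a list of (source, interaction_type, target) edges."""
    # If any tab appears in the file, tabs are the delimiter; otherwise any whitespace.
    use_tabs = "\t" in text
    edges = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t") if use_tabs else line.split()
        if len(fields) < 3:
            continue  # a line with a single node declares an isolated node, not an edge
        source, interaction = fields[0], fields[1]
        # One line may list several targets for the same source and interaction type.
        for target in fields[2:]:
            edges.append((source, interaction, target))
    return edges

# Hypothetical sample content; real node names come from the supplement's files.
sample = "YA1|FA1\tpp\tYB1|FB1\tYC1|FC1\nYD1|FD1\n"
edges = parse_sif(sample)
print(edges)
```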
The organization of the results file is as follows:
| State | The type of node. Pairings labeled 'Ttrain' are true (definite) orthologs. NA means that no conserved interaction is available. TEST pairings are ambiguous orthologs that have a conserved network interaction. |
| Name | The "yeast | fly" protein pairing represented in the group. |
| Number of Conserved Interactions | The number of conserved interactions among orthologs. |
| Yeast Intxs | Number of interactions present in yeast. |
| Fly Intxs | Number of interactions present in fly. |
| Probability | Probability of orthology as determined by our algorithm, averaged over 100 runs. |
| Standard Deviation | The standard deviation of the estimate of the probability of orthology, based on 100 runs. |
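The Probability and Standard Deviation columns summarize repeated runs of the algorithm. A minimal sketch of that kind of summary is shown below. It assumes the per-run probabilities for one pairing are available as a list and uses the sample standard deviation; whether the reported values use the sample or the population form is not stated here, so that choice is an assumption. The input values and `summarize_runs` are hypothetical.

```python
import statistics

def summarize_runs(probabilities):
    """Return (mean, standard deviation) of per-run probability estimates."""
    mean = statistics.fmean(probabilities)
    # Sample standard deviation (n - 1 denominator); this choice is an assumption.
    std = statistics.stdev(probabilities)
    return mean, std

# Hypothetical per-run estimates for a single pairing (the supplement uses 100 runs).
runs = [0.8, 0.9, 0.7, 0.8]
mean, std = summarize_runs(runs)
print(round(mean, 3), round(std, 3))
```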
Links
InParanoid
Laboratory for Integrative Network Biology
Cytoscape
UCSD Bioengineering
UCSD Bioinformatics Graduate Program
Comments, concerns, or questions pertaining to this web page should be addressed to sourav @ bioeng.ucsd.edu