Informatics for Proteomics: Database, Algorithm, and Software Tool
Eunok Paek
Department of Mechanical and Information Engineering, University of Seoul, 90 Jeonnong-dong, Dongdaemun-gu, Seoul 130-743, Korea
e-mail: paek@uos.ac.kr
To understand complex signaling pathways and networks, it is necessary to develop a formal, structured representation of the available information in a format suitable for analysis by software tools. Because current biological knowledge about cell signaling is both complex and incomplete, such a representation must be able to describe cellular pathways at differing levels of detail: one level abstract enough to convey the essential signaling flow while hiding its details, and another detailed enough to explain the underlying mechanisms that account for that flow. Various protein states, defined for instance by post-translational modifications, conformational changes, and/or subcellular localization, must be explicitly represented in such a database. Post-translational modifications (PTMs), in particular, are known to play a key role in cell signaling, but their identification remains extremely limited with current high-throughput proteomics technologies such as tandem mass spectrometry. We have proposed an efficient algorithm called MODi that interprets a tandem mass spectrum of a peptide carrying multiple PTMs while taking into account hundreds of modification types published on www.unimod.org. To facilitate human validation of MODi interpretations, we have developed a software tool that first displays a list of candidate peptides that may match a given tandem mass spectrum. For each candidate sequence, the tool lists chains of partial sequences, called sequence tags, together with the gaps between them, where a gap represents a peptide segment suspected to contain PTMs. Based on the MODi results, the tool not only visualizes the spectral alignment of b-ions and y-ions for the sequence tags, but also displays the theoretical fragment ion peaks for each PTM interpretation of each gap.
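As an illustration of what "theoretical fragment ion peaks for a PTM interpretation" means, the following minimal sketch computes singly charged b-ion and y-ion m/z values for a peptide with optional modifications. It is not the MODi implementation; the function name `fragment_ions`, the `mods` argument (a mapping from zero-based residue position to a mass delta in Da), and the constant names are assumptions made for this example. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton, Da
WATER = 18.010565   # monoisotopic mass of H2O, Da


def fragment_ions(peptide, mods=None):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    mods is a hypothetical argument: {zero-based position: mass delta in Da}.
    b_ions[i] is b(i+1); y_ions[i] is y(i+1).
    """
    mods = mods or {}
    masses = [RESIDUE_MASS[aa] + mods.get(i, 0.0)
              for i, aa in enumerate(peptide)]

    # b-ions: N-terminal prefixes of length 1 .. n-1, plus one proton.
    b_ions, running = [], 0.0
    for m in masses[:-1]:
        running += m
        b_ions.append(running + PROTON)

    # y-ions: C-terminal suffixes of length 1 .. n-1, plus water and a proton.
    y_ions, running = [], 0.0
    for m in reversed(masses[1:]):
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    # Phosphorylation (+79.96633 Da) placed on the serine at position 2.
    b, y = fragment_ions("GAS", mods={2: 79.96633})
    print("b-ions:", [round(v, 4) for v in b])
    print("y-ions:", [round(v, 4) for v in y])
```

Comparing the peak lists produced for alternative placements of a modification within a gap shows which fragment ions distinguish one PTM interpretation from another.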
The tool can also be used to complete the sequencing of a gap manually: it infers all possible sequence tags of length one from fragment ion pairs in a designated region of the spectrum, helping the user sequence the peptide by hand so that the MODi interpretation can be extended to a complete peptide and PTM identification.
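The inference of length-one sequence tags from fragment ion pairs can be sketched as follows: two peaks of the same ion series support a tag of length one when their m/z difference matches the mass of a single amino acid residue. This is a minimal sketch under stated assumptions, not the tool's actual procedure; the function name `length_one_tags`, the default tolerance of 0.02 Da, and the restriction to unmodified residues and singly charged peaks are choices made for this example.

```python
# Monoisotopic residue masses in Da (standard values).
# Leucine and isoleucine are isobaric, so they share one entry.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "I/L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
MAX_RESIDUE_MASS = max(RESIDUE_MASS.values())


def length_one_tags(peaks, tol=0.02):
    """Return (low_mz, high_mz, residue) for every peak pair whose m/z
    difference matches one residue mass within tol (Da).

    peaks is a list of singly charged m/z values from the region of the
    spectrum under inspection; tol is an assumed default, not a value
    taken from the source.
    """
    peaks = sorted(peaks)
    tags = []
    for i, low in enumerate(peaks):
        for high in peaks[i + 1:]:
            diff = high - low
            if diff > MAX_RESIDUE_MASS + tol:
                break  # peaks are sorted, so later pairs are farther apart
            for residue, mass in RESIDUE_MASS.items():
                if abs(diff - mass) <= tol:
                    tags.append((low, high, residue))
    return tags


if __name__ == "__main__":
    # b1, b2, b3 of the peptide G-A-S (singly charged).
    region = [58.028736, 129.065846, 216.097876]
    for low, high, residue in length_one_tags(region):
        print(f"{low:.4f} -> {high:.4f}: {residue}")
```

A wider tolerance makes near-isobaric residues such as Q and K (about 0.036 Da apart) indistinguishable, in which case both appear as candidate tags for the same peak pair.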