Proteome Measures

Proteomics - Systems Biology - Mass Spectrometry - Peptide Pattern Recognition

Thursday, March 30, 2006

What is all this about labeling and signal integration? Can’t you just count the peptides?

Exponentially Modified Protein Abundance Index (emPAI) for Estimation of Absolute Protein Amount in Proteomics by the Number of Sequenced Peptides per Protein. Yasushi Ishihama, Yoshiya Oda, Tsuyoshi Tabata, Toshitaka Sato, Takeshi Nagasu, Juri Rappsilber, and Matthias Mann. Molecular & Cellular Proteomics 4:1265-1272, 2005.

A technique is presented that can be used to calculate protein concentrations from the results of database search engines such as Mascot or SEQUEST. It can therefore be applied to previously measured datasets and to cases where isotopes or labeling are not practical.

emPAI = 10^PAI - 1

The PAI is defined as the number of peptides observed divided by the number of peptides observable.
PAI can be used to estimate relative protein abundance. emPAI can be used to predict the absolute concentration.
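
As a quick illustration of the two definitions above, here is a minimal Python sketch. The function names and the peptide counts are hypothetical, made up for illustration only; they are not taken from the paper.

```python
def pai(n_observed, n_observable):
    """Protein abundance index: observed peptides / observable peptides."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return n_observed / n_observable


def empai(n_observed, n_observable):
    """Exponentially modified PAI: emPAI = 10^PAI - 1."""
    return 10 ** pai(n_observed, n_observable) - 1


# Hypothetical proteins: (peptides observed, peptides observable)
example_counts = {
    "protein_A": (31, 40),
    "protein_B": (5, 25),
    "protein_C": (0, 12),
}

for name, (observed, observable) in example_counts.items():
    print(f"{name}: PAI = {pai(observed, observable):.3f}, "
          f"emPAI = {empai(observed, observable):.3f}")
```

Note that a protein with no observed peptides gets an emPAI of exactly zero, which is the reason for the "- 1" in the formula.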


“At present, it is not clear why the logarithm of protein concentration correlates with the number of observed peptides, and in any case this relationship is likely to be due to a combination of processes and probably holds only approximately.”

In many ways this paper demonstrates the proteomics field's craving for simple quantitation. Proteomics is quickly graduating from qualitative to quantitative analysis. One thing that makes it hard for me to take this technique seriously is the numerous reports demonstrating abysmal peptide identification reproducibility. I am concerned that the number of observed peptides in a complex proteome is influenced by tandem MS ion selection, digestion and ion suppression.
Essentially, this technique builds the quantitation on the reproducibility of the identification. I do not mean to imply that I do not believe the results that are presented, but I do wonder whether a critical part of implementing this technique on a real-world sample has been glossed over and I have not been observant enough to identify it.

One aspect of the presented experiment that might be key to the application of this technique is that the experiment was run such that there was high coverage of the peptides. In the data presented, proteins had as many as 31 peptides identified.

Regardless of my doubts, their data is impressive. In conclusion, if tandem MS reproducibility, digestion and ion suppression can be controlled, this might be a great option for quantitation.

Tuesday, March 21, 2006

Precision: The Key to Proteomic / Peptidomic Pattern Recognition

Review of "Correcting Common Errors in Identifying Cancer-Specific Serum Peptide Signature"

Journal of Proteome Research 2005, 4, 1060-1072
Josep Villanueva, John Philip, Carlos A. Chaparro, Yongbiao Li, Ricardo Toledo-Crow, Lin DeNoyer, Martin Fleisher, Richard J. Robbins, and Paul Tempst


While describing early proteomic pattern recognition papers, the authors stated: “If these early results are as robust and reproducible as they seem, then serum proteomics will undoubtedly attain a prominent and lasting position in the future of cancer diagnostics. Despite initial excitement, skepticism about the methodology and the results is mounting in the scientific community.” In response to this skepticism, this paper addresses a very important part of the application of pattern recognition to proteomics or “peptidomics”, namely the clinical and analytical chemistry variables. I feel the skepticism surrounding the early proteomics pattern recognition work is valid, and thus the topic of this paper is very important. The clinical and analytical chemistry variables addressed in this paper can be major sources of bias in pattern recognition. It is important to realize how everything from blood collection and clotting, to serum storage and handling, automated peptide extraction, crystallization, spectral acquisition, and signal processing affects the measurement. This paper includes a clearly written table and diagram that illustrate the protocol for serum peptide sample preparation from the blood draw to the MALDI plate. It also includes a rather detailed recipe for data analysis, including smoothing, baseline correction, normalization, calibration/alignment, and peak labeling.

“In sum, any systematic bias in serum preparation and/or storage between two or more groups of samples can result in a statistically relevant, yet clinically useless diagnostic tool.”

The paper includes an experiment comparing the effects of clotting at room temperature for 5 min, 1 h and 5 h. In this experiment, some peak intensities diminished while others increased as clotting time increased. This implies a degradation of the plasma peptides. Also, the effect of freeze-thaw cycles on serum peptide profiling using RP magnetic particles and MALDI-TOF MS was shown to be dramatic.

The authors illustrated results that should be of great concern to anyone using bead-based RP extraction of serum. In their study, different batches of the same extraction media from the same manufacturer gave dramatically different results. The change was so pronounced that I personally would avoid bead-based extraction, although the authors did defend this method.

One criticism of this paper is that, after a very detailed look at sample handling, instrument operation and signal preprocessing, they described the final result of pattern recognition with terms like “A fairly good, but not perfect, segregation.” I wish that they had gone that extra mile and reported quantitative results like sensitivity, specificity or accuracy.
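
For readers who want the definitions, here is a minimal Python sketch of how those three numbers fall out of a two-class confusion matrix. The function name and the counts are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts.

    tp: diseased samples classified as diseased (true positives)
    fp: healthy samples classified as diseased (false positives)
    tn: healthy samples classified as healthy (true negatives)
    fn: diseased samples classified as healthy (false negatives)
    """
    sensitivity = tp / (tp + fn)            # fraction of diseased samples caught
    specificity = tn / (tn + fp)            # fraction of healthy samples cleared
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 50 cancer sera and 50 control sera
sens, spec, acc = classification_metrics(tp=45, fp=10, tn=40, fn=5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}, accuracy = {acc:.2f}")
```

Even a single table of these values, ideally from cross-validation or an independent test set, says far more than a qualitative description of how well the groups separate.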

The authors created most of their data analysis software in MATLAB. Although they give these routines brand names, they do not say how one might obtain most of them. Multivariate analyses such as ANOVA, PCA, hierarchical clustering, K-NN and SVMs were done using GeneSpring (Agilent; Palo Alto, CA).

The vocabulary word for today is sera. Sera is the plural of serum.

Friday, March 10, 2006

Insilicos Awarded Grant to Study Heart Disease

Insilicos today announced that the National Heart, Lung, and Blood Institute has awarded Insilicos a two-year, $400,000 grant to develop a blood test for heart disease. A successful test could identify people at risk for heart attacks, when there is still time to prevent a heart attack.
"Heart disease kills more Americans each year than any other disease, claiming almost a million people every year," said Insilicos Chief Scientist Dr. Bryan Prazen, Principal Investigator on the grant. "We're excited to be working with the NIH to address this serious health problem."
To conduct the research, Insilicos will use patent-pending techniques to sensitively analyze blood samples. "Current diagnostic techniques are expensive, unpleasant for the patient, and ultimately not very accurate," said Insilicos President Erik Nilsson. "A better test for heart disease is needed, and we're honored to have the chance to work on it."
Insilicos LLC develops life science software for pharmaceutical development, biological research and clinical diagnostics.
For more information, visit the Insilicos web site www.insilicos.com or contact Insilicos at info@insilicos.com. 'Insilicos' and 'Life Science Software' are trademarks of Insilicos LLC.

Thursday, March 09, 2006

Today's vocabulary word is exopeptidase.

An exopeptidase is an enzyme that catalyzes the removal of a single amino acid from the end of a polypeptide chain.

Proteomic classification of cancer that actually works?

Differential exoprotease activities confer tumor-specific serum peptidome patterns. Josep Villanueva et al. Journal of Clinical Investigation, 116, 1, 2006

Using an optimized peptide extraction and MALDI-TOF MS, serum peptide signatures provide accurate class discrimination between patients with 3 types of solid tumors. Despite a huge effort, few proteins have been validated as cancer biomarkers. This paper demonstrates that peptides in the serum of cancer patients that are generated as a result of tumor protease activity can be used for the detection and classification of cancer. The authors propose that the proteolytic degradation patterns in the serum peptidome might also distinguish indolent from aggressive tumors. The data set measured was pretty substantial: 106 serum samples from patients with advanced prostate cancer, bladder cancer or breast cancer.

“…this study provides a direct link between peptide marker profiles of disease and differential protease activity, and the patterns we describe may have clinical utility as surrogate markers for detection and classification of cancer.”

I like the way they tell their story. What this area needs is more pattern recognition papers written by critical biologists. This paper comes pretty close to the mark.

One fabulous aspect of this paper is the experimental design. By distinguishing types of cancer, they demonstrated that serum peptide signatures are not just indicators of a nonspecific inflammatory condition, such as arthritis or infection.

One area of concern is the authors’ feature selection. They selected only 61 masses from the entire MALDI spectrum for the pattern recognition. Assuming each mass is associated with a single peptide, 61 seems like a reasonable number of peptides that are both detected by MALDI and differentially expressed by cancers. But the thresholds used to select the 61 seemed pretty arbitrary. For this reason, I question how much user intervention would be necessary to apply this method to other diseases or even other data.

Wednesday, March 01, 2006

BMSorter: Yet another proteomics tool from Aebersold and Friends

Proteome analysis of Halobacterium sp. NRC-1 facilitated by the biomodules analysis tool BMSorter, by Rueichi R. Gan et al. Molecular & Cellular Proteomics Papers in Press. Published on February 23, 2006 as Manuscript M500367-MCP200

BMSorter is a biological networks analysis tool that incorporates protein identification results into biological networks. BMSorter pulls together information from the identification of proteins with the Trans-Proteomic Pipeline (PeptideProphet and ProteinProphet) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) database (http://www.genome.ad.jp/). BMSorter is written in Perl and is accessed through a web browser (Apache web server). BMSorter can be obtained by contacting Wailap Victor Ng at wvng@ym.edu.tw

This paper reports on the systems analysis of the Halobacterium sp. NRC-1 soluble proteome identified by 2-dimensional liquid chromatography coupled with tandem mass spectrometry. Halobacterium is a genus of archaea found in water saturated or nearly saturated with salt. Proteins were identified using SEQUEST in combination with the Trans-Proteomic Pipeline. BMSorter pulled the protein identification and the metabolic pathway information together. Cytoscape (www.cytoscape.org) was used to display the protein identification information on the amino acid metabolism and citrate cycle pathways in terms of enzyme-metabolite interaction networks.

The title of this paper uses the somewhat novel term biomodules. Biological modules (or 'biomodules') are loose associations of preferred molecular interaction partners that interact to perform a collective function.
One simple and fun thing that was included is an equation to estimate the number of false positive identifications from the PeptideProphet results. The number of false positive identifications among N proteins, each with a probability value of P, is equal to N x (1.0 - P). Thus, if you identified 100 proteins, all with a probability of 0.99, you can estimate that you have one false positive. In a realistic case the probability for each protein would be unique, and the estimate becomes the sum of (1.0 - P) over all proteins, but the calculation remains trivial.
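
To show just how trivial, here is a minimal Python sketch of that estimate for the realistic case where every protein has its own probability. The function name and the probability lists are hypothetical; the code only sums (1.0 - P) over the identifications.

```python
def expected_false_positives(probabilities):
    """Estimate the number of false positives as the sum of (1.0 - P)."""
    return sum(1.0 - p for p in probabilities)


# The uniform case from the text: 100 proteins, each with P = 0.99
uniform = [0.99] * 100
print(f"uniform case: {expected_false_positives(uniform):.2f}")

# A hypothetical mixed list, as a real search would produce
mixed = [1.00, 0.99, 0.97, 0.95, 0.92, 0.90]
print(f"mixed case: {expected_false_positives(mixed):.2f}")
```

When all N probabilities are equal, the sum reduces to N x (1.0 - P), the form given in the paper.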

It is exciting to see how the average number of proteins that are identified in a standard MudPIT style experiment continues to grow. In this study 888 proteins were identified with a ProteinProphet probability (P) > 0.9. It seems that it was only a year ago that a study of this type might only identify a couple hundred proteins.