Proteome Measures

Proteomics - Systems Biology - Mass Spectrometry - Peptide Pattern Recognition

Wednesday, October 11, 2006

Ascore: Taking on matching of Phosphorylated Peptides

A probability-based approach for high-throughput protein phosphorylation analysis and site localization Sean A Beausoleil, Judit Villen, Scott A Gerber, John Rush & Steven P Gygi

Search algorithms like SEQUEST or Mascot often identify the proper peptide sequence, but fail to provide information about the presence or absence of site-determining ions. As a result, users must manually inspect each spectrum to confirm proper site localization. The authors present a method named Ascore. Ascore works with the results of matching techniques like SEQUEST to calculate the probability of correct phosphorylation site localization based on the presence and intensity of site-determining ions in MS/MS spectra.
In a data set of known phosphorylation sites, the Ascore method matched two- to fourfold more phosphorylation sites with 99% certainty than SEQUEST or Mascot.
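
To make the idea of scoring site-determining ions concrete, here is a minimal sketch in Python. It assumes a cumulative binomial model in which the chance of a random match is the number of peaks kept per 100 m/z window divided by 100, and it takes the difference between the scores of the two best candidate sites. The function names and the example counts are hypothetical, and this is an illustration of the general approach rather than the authors' implementation.

```python
from math import comb, log10


def cumulative_binomial(n: int, k: int, p: float) -> float:
    """Probability of k or more successes in n trials, each with chance p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def match_score(n_ions: int, n_matched: int, peak_depth: int) -> float:
    """Score = -10 * log10(P), where P is the chance of matching at least
    n_matched of n_ions predicted ions at random.

    peak_depth is the (assumed) number of peaks kept per 100 m/z window,
    so the random-match probability per ion is peak_depth / 100.
    """
    if not 1 <= peak_depth <= 99:
        raise ValueError("peak_depth must be between 1 and 99")
    p = peak_depth / 100.0
    return -10.0 * log10(cumulative_binomial(n_ions, n_matched, p))


def site_score_difference(n_site_ions: int, matched_best: int,
                          matched_second: int, peak_depth: int) -> float:
    """Difference between the scores of the two top candidate sites,
    computed from site-determining ions only."""
    best = match_score(n_site_ions, matched_best, peak_depth)
    second = match_score(n_site_ions, matched_second, peak_depth)
    return best - second


if __name__ == "__main__":
    # Hypothetical example: 8 site-determining ions, 6 matched for the best
    # site, 2 for the runner-up, 5 peaks kept per 100 m/z.
    print(round(site_score_difference(8, 6, 2, 5), 1))
```

A larger difference means the spectrum supports one site much more strongly than the alternative; a difference near zero means the site-determining ions cannot tell the two apart.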

This paper not only offers a novel technique for identifying phosphorylated peptides, but also provides data and a formula for the optimization of SEQUEST and Mascot for the analysis of phosphorylated peptides.

This paper is well written and organized. The authors build a very sound argument by first testing the assumptions of their quality measurement. I also appreciate that they looked at both sensitivity and precision. Many peptide identification methods are touted for their sensitivity while neglecting their precision.

A website where spectra and .out files can be submitted for Ascore analysis is under construction (http://Ascore.med.harvard.edu).

Tuesday, June 13, 2006

Insilicos Releases Proteomics Data Analysis Pipeline

Insilicos announced the launch of Insilicos Proteomics Pipeline (IPP), a high-performance software system for the identification and analysis of proteomics data.
IPP is the result of an open-source collaboration led by the Institute for Systems Biology (ISB). "This offering by Insilicos will allow more people to take advantage of the proteomics approaches that we pioneered at the Institute for Systems Biology," said ISB cofounder and internationally renowned biologist Dr. Ruedi Aebersold. "The performance and ease of use of this product is a significant advance, and will contribute to the goal of consistent and transparent analysis of proteomics data."
The pipeline contains many highly regarded data analysis tools, including PeptideProphet, ProteinProphet and ASAPRatio. Development of additional tools for the pipeline is underway.
Insilicos developed IPP as a result of its collaboration on the ISB's proteomics project. IPP takes the groundbreaking work of the ISB and makes it run up to 20 times faster, fast enough to run on an inexpensive laptop. "Proteomics will revolutionize human health, but software has been a key bottleneck," said Insilicos president Erik Nilsson. "Now IPP brings that power within reach of anyone with a laptop."
The ISB originally developed the Trans-Proteomic Pipeline tools as part of a grant funded by the National Heart, Lung, and Blood Institute (NHLBI), part of the National Institutes of Health. In 2005, Insilicos and the ISB received a grant from the NHLBI to further develop the ISB's software for use by a broad range of scientists.
For more information, visit the Insilicos web site www.insilicos.com

Thursday, March 30, 2006

What is all this about labeling and signal integration? Can’t you just count the peptides?

Exponentially Modified Protein Abundance Index (emPAI) for Estimation of Absolute Protein Amount in Proteomics by the Number of Sequenced Peptides per Protein Yasushi Ishihama, Yoshiya Oda, Tsuyoshi Tabata, Toshitaka Sato, Takeshi Nagasu, Juri Rappsilber, and Matthias Mann Molecular & Cellular Proteomics 4:1265-1272, 2005.

A technique is presented that can be used to calculate protein concentrations from the results of database search engines such as Mascot or SEQUEST. Thus it is possible to apply this approach to previously measured datasets and in cases where isotopes or labeling are not practical.

emPAI = 10^PAI - 1

The PAI is defined as the number of peptides observed divided by the number of peptides observable.
PAI can be used to estimate relative protein abundance. emPAI can be used to predict the absolute concentration.
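
The two definitions above are easy to write down in code. The following is a minimal Python sketch; the function names and the peptide counts in the example are invented for illustration. The `molar_percent` helper assumes that each protein's share of the mixture is its emPAI divided by the sum of all emPAI values, which is how I read the paper, so treat it as an assumption rather than a reference implementation.

```python
def pai(observed: int, observable: int) -> float:
    """Protein abundance index: observed peptides / observable peptides."""
    if observable <= 0:
        raise ValueError("observable must be a positive peptide count")
    return observed / observable


def empai(observed: int, observable: int) -> float:
    """Exponentially modified PAI: emPAI = 10^PAI - 1."""
    return 10 ** pai(observed, observable) - 1


def molar_percent(empai_by_protein: dict) -> dict:
    """Assumed conversion: each emPAI as a percentage of the summed emPAI."""
    total = sum(empai_by_protein.values())
    if total <= 0:
        raise ValueError("at least one protein must have a nonzero emPAI")
    return {name: 100.0 * value / total
            for name, value in empai_by_protein.items()}


if __name__ == "__main__":
    # Hypothetical peptide counts: (observed, observable) per protein.
    counts = {"protein_A": (12, 20), "protein_B": (3, 15), "protein_C": (1, 25)}
    values = {name: empai(obs, total) for name, (obs, total) in counts.items()}
    for name, share in molar_percent(values).items():
        print(f"{name}: emPAI = {values[name]:.2f}, share = {share:.1f} mol %")
```

Note that only the counts reported by the search engine are needed, which is why the approach can be applied to data sets that were measured without any labeling.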


“At present, it is not clear why the logarithm of protein concentration correlates with the number of observed peptides, and in any case this relationship is likely to be due to a combination of processes and probably holds only approximately.”

In many ways this paper demonstrates the proteomics field’s craving for simple quantitation. Proteomics is quickly graduating from qualitative to quantitative analysis. One thing that makes it hard for me to take this technique seriously is the number of reports demonstrating abysmal peptide identification reproducibility. I am concerned that the number of observed peptides in a complex proteome is influenced by tandem MS ion selection, digestion and ion suppression.
Essentially this technique builds the quantitation on the reproducibility of the identification. I do not mean to imply that I do not believe the results that are presented, but I do wonder if a critical part of implementing this technique in a real world sample has been glossed over and I have not been observant enough to identify it.

One aspect of the presented experiment that might be key to the application of this technique is that the experiment was run such that there was high coverage of the peptides. In the data presented, proteins had as many as 31 peptides identified.

Regardless of my doubts, their data is impressive. In conclusion, if tandem MS reproducibility, digestion and ion suppression can be controlled, this might be a great option for quantitation.

Tuesday, March 21, 2006

Precision: The Key to Proteomic / Peptidomic Pattern Recognition

Review of "Correcting Common Errors in Identifying Cancer-Specific Serum Peptide Signature"

Journal of Proteome Research 2005, 4, 1060-1072
Josep Villanueva, John Philip, Carlos A. Chaparro, Yongbiao Li, Ricardo Toledo-Crow, Lin DeNoyer, Martin Fleisher, Richard J. Robbins, and Paul Tempst


While describing early proteomic pattern recognition papers, the authors stated: “If these early results are as robust and reproducible as they seem, then serum proteomics will undoubtedly attain a prominent and lasting position in the future of cancer diagnostics. Despite initial excitement, skepticism about the methodology and the results is mounting in the scientific community.” In response to this skepticism, this paper addresses a very important part of the application of pattern recognition to proteomics or “peptidomics”, namely the clinical and analytical chemistry variables. I feel the skepticism surrounding the early proteomics pattern recognition work is valid, and thus the topic of this paper is very important. The clinical and analytical chemistry variables addressed in this paper can be major sources of bias in pattern recognition. It is important to realize how everything from blood collection and clotting to serum storage and handling, automated peptide extraction, crystallization, spectral acquisition, and signal processing affects the measurement. This paper includes a clearly written table and diagram that illustrate the protocol for serum peptide sample preparation from the blood draw to the MALDI plate. This paper also includes a rather detailed recipe for data analysis, including smoothing, baseline correction, normalization, calibration/alignment, and peak labeling.
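
To give a feel for what the first steps of such a data analysis recipe involve, here is a minimal Python sketch of smoothing, baseline correction and normalization on a single spectrum. It is not the authors' MATLAB code: the moving-average smoother, the rolling-minimum baseline, the total-intensity normalization, the window sizes and the synthetic spectrum are all assumptions chosen for simplicity. Calibration/alignment and peak labeling are left out.

```python
import numpy as np


def smooth(intensity: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average smoothing (a simple stand-in for the paper's smoother)."""
    kernel = np.ones(window) / window
    return np.convolve(intensity, kernel, mode="same")


def subtract_baseline(intensity: np.ndarray, window: int = 51) -> np.ndarray:
    """Estimate the baseline as a rolling minimum and subtract it."""
    half = window // 2
    padded = np.pad(intensity, half, mode="edge")
    baseline = np.array(
        [padded[i:i + window].min() for i in range(len(intensity))]
    )
    return intensity - baseline


def normalize_total_intensity(intensity: np.ndarray) -> np.ndarray:
    """Scale the spectrum so that its intensities sum to 1."""
    total = intensity.sum()
    return intensity / total if total > 0 else intensity


def preprocess(intensity: np.ndarray) -> np.ndarray:
    """Apply the three steps in order: smooth, remove baseline, normalize."""
    return normalize_total_intensity(subtract_baseline(smooth(intensity)))


if __name__ == "__main__":
    # Synthetic spectrum: constant offset of 10 plus one peak at index 100.
    spectrum = np.full(200, 10.0)
    spectrum[100] += 50.0
    processed = preprocess(spectrum)
    print(processed.argmax(), round(float(processed.sum()), 6))
```

Every one of these choices (window width, baseline model, normalization target) changes the peak intensities that reach the classifier, which is part of why the paper's detailed recipe is valuable.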

“In sum, any systematic bias in serum preparation and/or storage between two or more groups of samples can result in a statistically relevant, yet clinically useless diagnostic tool.”

The paper includes an experiment comparing the effects of clotting at room temperature for 5 min, 1 h and 5 h. In this experiment, some peak intensities diminished while others increased as clotting time increased. This implies degradation of the plasma peptides. Also, the effect of freeze-thaw cycles on serum peptide profiling using RP magnetic particles and MALDI-TOF MS was shown to be dramatic.

The authors illustrated results that should be of great concern to anyone using bead-based RP extraction of serum. In their study, different batches of the same extraction media from the same manufacturer gave dramatically different results. The change was so pronounced that I personally would avoid bead-based extraction, although the authors did defend this method.

One criticism of this paper is that after a very detailed look at sample handling, instrument operation and signal preprocessing, they described the final result of pattern recognition with terms like “A fairly good, but not perfect, segregation.” I wish that they had taken the work that extra mile and reported quantitative results like sensitivity, specificity or accuracy.
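
For readers who want the definitions, the three measures mentioned above can be computed from the four cells of a two-class confusion matrix. This is a minimal Python sketch; the function name and the counts in the example are made up for illustration and do not come from the paper.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity and accuracy from confusion-matrix counts.

    tp: diseased samples classified as diseased
    fn: diseased samples classified as healthy
    tn: healthy samples classified as healthy
    fp: healthy samples classified as diseased
    """
    total = tp + fp + tn + fn
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must contain at least one sample")
    return {
        "sensitivity": tp / (tp + fn),   # fraction of diseased detected
        "specificity": tn / (tn + fp),   # fraction of healthy cleared
        "accuracy": (tp + tn) / total,   # fraction classified correctly
    }


if __name__ == "__main__":
    # Invented counts, for illustration only.
    for name, value in classification_metrics(tp=45, fp=5, tn=40, fn=10).items():
        print(f"{name}: {value:.3f}")
```

Reporting these numbers alongside the cluster plots would let readers compare one classifier against another directly.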

The authors created most of their data analysis software in MATLAB. Although they brand these routines, they do not include information on how one might obtain most of the routines. Multivariate analysis like ANOVA, PCA, hierarchical clustering, K-NN and SVMs was done using GeneSpring (Agilent; Palo Alto, CA).

The vocabulary word for today is sera. Sera is the plural of serum.

Friday, March 10, 2006

Insilicos Awarded Grant to Study Heart Disease

Insilicos today announced that the National Heart, Lung, and Blood Institute has awarded Insilicos a two-year, $400,000 grant to develop a blood test for heart disease. A successful test could identify people at risk for heart attacks, when there is still time to prevent a heart attack.
"Heart disease kills more Americans each year than any other disease, claiming almost a million people every year," said Insilicos Chief Scientist Dr. Bryan Prazen, Principal Investigator on the grant. "We're excited to be working with the NIH to address this serious health problem."
To conduct the research, Insilicos will use patent-pending techniques to sensitively analyze blood samples. "Current diagnostic techniques are expensive, unpleasant for the patient, and ultimately not very accurate," said Insilicos President Erik Nilsson. "A better test for heart disease is needed, and we're honored to have the chance to work on it."
Insilicos LLC develops life science software for pharmaceutical development, biological research and clinical diagnostics.
For more information, visit the Insilicos web site www.insilicos.com or contact Insilicos at info@insilicos.com. 'Insilicos' and 'Life Science Software' are trademarks of Insilicos LLC.

Thursday, March 09, 2006

Today's vocabulary word is exopeptidase.

An exopeptidase is an enzyme that catalyses the removal of a single amino acid from the end of a polypeptide chain.

Proteomic classification of cancer that actually works?

Differential exoprotease activities confer tumor-specific serum peptidome patterns. Josep Villanueva et al. Journal of Clinical Investigation, 116, 1, 2006

Using optimized peptide extraction and MALDI-TOF MS, serum peptide signatures provide accurate class discrimination between patients with 3 types of solid tumors. Despite a huge effort, few proteins have been validated as cancer biomarkers. This paper demonstrates that peptides in the serum of cancer patients that are generated as a result of tumor protease activity can be used for the detection and classification of cancer. The authors propose that the proteolytic degradation patterns in the serum peptidome might also be used to distinguish indolent from aggressive tumors. The data set measured was pretty substantial: 106 serum samples from patients with advanced prostate cancer, bladder cancer or breast cancer.

“…this study provides a direct link between peptide marker profiles of disease and differential protease activity, and the patterns we describe may have clinical utility as surrogate markers for detection and classification of cancer.”

I like the way they tell their story. What this area needs is more pattern recognition papers written by critical biologists. This paper comes pretty close to the mark.

One fabulous aspect of this paper is the experimental design. By distinguishing types of cancer, they demonstrated that serum peptide signatures are not just indicators of a nonspecific inflammatory condition, such as arthritis or infection.

One area of concern is the authors’ feature selection. They selected only 61 masses from the entire MALDI spectrum for the pattern recognition. Assuming each mass is associated with a single peptide, 61 seems like a reasonable number of peptides that are both detected by MALDI and differentially expressed by cancers. But the thresholds used to select the 61 seemed pretty arbitrary. For this reason, I question how much user intervention would be necessary to apply this method to other diseases or even other data.