Proteins belonging to the conserved and diversified Snf2 family provide the ATP-driven motor subunits for remodelling systems, which control the accessibility of chromatin DNA. The 41 proteins of this family encoded in the Arabidopsis genome fall into 19 distinct subfamilies. Although most of the plant Snf2 proteins studied so far retain the functional specialization of their yeast and animal homologues, some have been adapted for functions occurring only in plants. We present a comprehensive in silico characterization of the domain architecture of the complete set of Arabidopsis Snf2 proteins. In combination with recent data on the molecular mechanisms underlying the functions of some yeast and animal homologues, this offers an insight into the different roles of Snf2 proteins in plants.
The aerobic catabolism of nicotinic acid (NA) is considered a model system for degradation of N-heterocyclic aromatic compounds, some of which are major environmental pollutants; however, the complete set of genes as well as the structural-functional relationships of most of the enzymes involved in this process are still unknown. We have characterized a gene cluster (nic genes) from Pseudomonas putida KT2440 that is responsible for aerobic NA degradation both in this bacterium and when expressed in heterologous hosts. The biochemistry of NA degradation through the formation of 2,5-dihydroxypyridine and maleamic acid has been revisited, and some gene products are prototypes of new types of enzymes with unprecedented molecular architectures. Thus, the initial hydroxylation of NA is catalyzed by a two-component hydroxylase (NicAB) that constitutes the first member of the xanthine dehydrogenase family whose electron transport chain to molecular oxygen includes a cytochrome c domain. The Fe(2+)-dependent dioxygenase (NicX) converts 2,5-dihydroxypyridine into N-formylmaleamic acid and is the founding member of a new family of extradiol ring-cleavage dioxygenases. Further conversion of N-formylmaleamic acid to formic and maleamic acid is catalyzed by the NicD protein, the only deformylase described so far whose catalytic triad is similar to that of some members of the alpha/beta-hydrolase fold superfamily. This work allows exploration of the existence of orthologous gene clusters in saprophytic bacteria and some pathogens, where they might stimulate studies on their role in virulence, and it provides a framework to develop new biotechnological processes for detoxification/biotransformation of N-heterocyclic aromatic compounds.
We present here a neural network-based method for detection of signal peptides (abbreviation used: SP) in proteins. The method is trained on sequences of known signal peptides extracted from the Swiss-Prot protein database and is able to work separately on prokaryotic and eukaryotic proteins. A query protein is dissected into overlapping short sequence fragments, and then each fragment is analyzed with respect to the probability of it being a signal peptide and containing a cleavage site. While the accuracy of the method is comparable to that of other existing prediction tools, it provides a significantly higher speed and portability. The accuracy of cleavage site prediction reaches 73% on heterogeneous source data that contains both prokaryotic and eukaryotic sequences while the accuracy of discrimination between signal peptides and non-signal peptides is above 93% for any source dataset. As a consequence, the method can be easily applied to genome-wide datasets. The software can be downloaded freely from http://rpsp.bioinfo.pl/RPSP.tar.gz.
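The fragment-dissection step described above can be illustrated with a minimal Python sketch. It is not the RPSP implementation: the window length, the step size, and the toy_score function (a simple hydrophobic-residue fraction standing in for the trained neural network) are all assumptions chosen only for illustration.

```python
def overlapping_fragments(sequence, window=20, step=1):
    """Yield (start, fragment) pairs that cover the sequence with overlapping windows."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # A sequence shorter than the window yields a single, shorter fragment.
    last_start = max(len(sequence) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, sequence[start:start + window]


def toy_score(fragment):
    """Hypothetical stand-in for a trained classifier: fraction of hydrophobic residues."""
    hydrophobic = set("AVLIMFWC")
    return sum(residue in hydrophobic for residue in fragment) / len(fragment)


def best_fragment(sequence, score, window=20, step=1):
    """Return the (start, fragment) pair with the highest score (first one on ties)."""
    return max(overlapping_fragments(sequence, window, step), key=lambda pair: score(pair[1]))
```

In a real predictor each fragment would be passed to the trained model, which would return both a signal-peptide probability and a cleavage-site probability; here a single toy score is used to keep the sketch short.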
Abstract not available yet.
Abstract not available yet.
The MRF1 gene encodes the only class I release factor found in Saccharomyces cerevisiae mitochondria, mRF1. The previously isolated point mutation mrf1-13 caused respiratory deficiency due to inhibition of mitochondrial translation. In this study, we have isolated second-site suppressors of mrf1-13. Among over 200 respiratory positive suppressor colonies, ten nuclear dominant suppressors had a new mutation in the MRF1 gene. The suppressors in combination with the original mrf1-13 revealed increased levels of mitochondrially synthesized proteins, Cox2 and Atp6. One of the suppressor alleles was cloned on a plasmid and was found to support weaker respiratory competence than in combination with mrf1-13. Finally, the possible effects of the suppressor mutations are discussed based on a structural model of mRF1 protein built for its "open" and "closed" forms using known crystal structures of prokaryotic release factor RF1 as templates. The 3D models suggest that at least some suppressors switch the structure of mRF1 from the "closed" to a permanently "open" form causing stronger binding of the mRF1 protein to the ribosome and increasing the time of ribosome occupation. This explains how the suppressor mutants may facilitate translation termination despite a defect in decoding of the stop signal.
We present here the recent update of AutoMotif Server (AMS 2.0) that predicts post-translational modification sites in protein sequences. The support vector machine (SVM) algorithm was trained on data gathered in 2007 from various sets of proteins containing experimentally verified chemical modifications of proteins. Short sequence segments around a modification site were dissected from a parent protein and represented in the training set as binary or profile vectors. The updated efficiency of the SVM classification for each type of modification and the predictive power of both representations were estimated using leave-one-out tests for a model of general phosphorylation and for modifications catalyzed by several specific protein kinases. The accuracy of the method was improved in comparison to the previous version of the service (Plewczynski et al., "AutoMotif server: prediction of single residue post-translational modifications in proteins", Bioinformatics 21: 2525-7, 2005). The precision of the updated version reached over 90% for selected types of phosphorylation and was optimized at the cost of a lower recall of the classification model. The AutoMotif Server version 2007 is freely available at http://ams2.bioinfo.pl/ . Additionally, the reference dataset for optimization of prediction of phosphorylation sites, collected from UniProtKB, is also provided and can be accessed at http://ams2.bioinfo.pl/data.
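The binary representation of a segment around a modification site can be sketched in Python as follows. This is an illustrative one-hot encoding, not the AMS 2.0 code: the flank length, the padding character "X", and the residue ordering in AMINO_ACIDS are assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed residue ordering for the encoding


def segment_around(sequence, site, flank=4, pad="X"):
    """Return residues site-flank..site+flank (0-based site), padded at the termini."""
    padded = pad * flank + sequence + pad * flank
    centre = site + flank
    return padded[centre - flank:centre + flank + 1]


def binary_vector(segment):
    """Encode each residue as 20 zeros/ones; padding residues become all zeros."""
    vector = []
    for residue in segment:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector
```

A segment of length 2 * flank + 1 therefore becomes a vector of 20 * (2 * flank + 1) components, which could then be passed to any SVM implementation; a profile representation would replace the zeros and ones with position-specific scores.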
2007
Bacterial DUF199/COG1481 proteins including sporulation regulator WhiA are distant homologs of LAGLIDADG homing endonucleases that retained only DNA binding |
| Knizewski L, Ginalski K |
| Cell Cycle 6, 1666-1670 |
|
Homing endonucleases (HEnases) form a large and highly diverse class of proteins encoded by introns and inteins (as well as free-standing genes) that confer mobility to their host genetic elements. These fast-evolving enzymes catalyze site-specific, double-stranded breaks in intron/intein-less alleles. In the strand exchange during the repair process, the intron/intein encoding the homing endonuclease is incorporated into the previously intron/intein-free allele, thus promoting the survival of this selfish genetic element. Analysis carried out using our distant homology detection method Meta-BASIC (ref. 12), which exploits comparison of sequence profiles combined with predicted secondary structure, has mapped the N-terminal region of the DUF199 consensus sequence with above-threshold scores (predictions with Z-score > 12 have less than 5% probability of being incorrect; ref. 12) onto several LAGLIDADG HEnase structures.
BACKGROUND: PD-(D/E)XK nucleases constitute a large and highly diverse superfamily of enzymes that display little sequence similarity despite retaining a common core fold and a few critical active site residues. This makes identification of new PD-(D/E)XK nuclease families a challenging task as they usually escape detection with standard sequence-based methods. We developed a modified transitive meta-profile search approach and, to consider the structural diversity of the PD-(D/E)XK nuclease fold more thoroughly, also analyzed below-threshold Meta-BASIC hits to select potentially correct predictions placed among unreliable or incorrect ones. RESULTS: Application of modified transitive Meta-BASIC searches to updated PFAM families and PDB structures resulted in detection of five new PD-(D/E)XK nuclease families encompassing hundreds of so far uncharacterized and poorly annotated proteins. These include four families catalogued in the PFAM database as domains of unknown function (DUF506, DUF524, DUF1626 and DUF1703) and the YhgA-like family of putative transposases. Three of these families represent extremely distant homologs (DUF506, DUF524, and YhgA-like), while two are newly defined in the updated database (DUF1626 and DUF1703). In addition, we also confidently identified an extended AAA-ATPase domain in the N-terminal region of DUF1703 family proteins. CONCLUSION: The results suggest that detailed analysis of below-threshold Meta-BASIC hits may push the limits of distant homology detection further into the 'midnight zone' of homology. All identified families conserve the core evolutionary fold, secondary structure and hydrophobic patterns common to existing PD-(D/E)XK nucleases and maintain critical active site motifs that contribute to nucleic acid cleavage. Further experimental investigations should address the predicted activity and clarify potential substrates, providing further insight into the detailed biological role of these newly detected nucleases.
In many cases, at the beginning of a high throughput screening experiment some
information about active molecules is already available. Active compounds
(such as substrate analogues, natural products and inhibitors of related proteins)
are often identified in low throughput validation studies on a biochemical
target. Sometimes the additional structural information is also available from
crystallographic studies on protein and ligand complexes. In addition, the
structural or sequence similarity of various protein targets yields a novel
possibility for drug discovery. Co-crystallized compounds from homologous
proteins can be used to design leads for a new target without co-crystallized
ligands. In this paper we evaluate how far such an approach can be used
in a real drug campaign, with severe acute respiratory syndrome (SARS)
coronavirus providing an example. Our method is able to construct small
molecules as plausible inhibitors solely on the basis of the set of ligands
from crystallized complexes of a protein target, and other proteins from its
structurally homologous family. The accuracy and sensitivity of the method are
estimated here by the subsequent use of an electronic high throughput screening
flexible docking algorithm. The best performing ligands are then used for a very
restrictive similarity search for potential inhibitors of the SARS protease within
the million compounds from the Ligand.Info small molecule meta-database.
The selected molecules can be passed on for further experimental validation.
The RPSP: Web server for prediction of signal peptides |
| Plewczynski D, Slabinski L, Tkacz A, Kajan L, Holm L, Ginalski K, Rychlewski L |
| Polymer 48, 5493-5496 |
|
The RPSP is a fast web service for detection of signal peptides in proteins. The method uses neural networks trained on known signal peptides from the Swiss-Prot protein database. The web server can analyze prokaryotic and eukaryotic proteins separately or work without a specified organism type. The accuracy of the web server is similar to that of other available computational prediction web services, yet because of its speed and portability the method can be easily applied to whole proteomes. The RPSP web server is available at http://rpsp.bioinfo.pl.
SelT, SelW, SelH, and Rdx12: genomics and molecular insights into the functions of selenoproteins of a novel thioredoxin-like family |
| Dikiy A, Novoselov SV, Fomenko DE, Sengupta A, Carlson BA, Cerny RL, Ginalski K, Grishin NV, Hatfield DL, Gladyshev VN |
| Biochemistry 46, 6871-6882 |
|
Selenium is an essential trace element in many life forms due to its occurrence as a selenocysteine (Sec) residue in selenoproteins. The majority of mammalian selenoproteins, however, have no known function. Herein, we performed extensive sequence similarity searches to define and characterize a new protein family, designated Rdx, that includes mammalian selenoproteins SelW, SelV, SelT and SelH, bacterial SelW-like proteins and cysteine-containing proteins of unknown function in all three domains of life. An additional member of this family is a mammalian cysteine-containing protein, designated Rdx12, and its fish selenoprotein orthologue. Rdx proteins are proposed to possess a thioredoxin-like fold and a conserved CxxC or CxxU (U is Sec) motif, suggesting a redox function. We cloned and characterized three mammalian members of this family, which showed distinct expression patterns in mouse tissues and different localization patterns in cells transfected with the corresponding GFP fusion proteins. By analogy to thioredoxin, Rdx proteins can use catalytic cysteine (or Sec) to form transient mixed disulfides with substrate proteins. We employed this property to identify cellular targets of Rdx proteins using affinity columns containing mutant versions of these proteins. Rdx12 was found to interact with glutathione peroxidase 1, whereas 14-3-3 protein was identified as one of the targets of mammalian SelW, suggesting a mechanism for redox regulation of the 14-3-3 family of proteins.
A structure-based in silico virtual drug discovery procedure was assessed with severe acute respiratory syndrome coronavirus main protease serving as a case study. First, potential compounds were extracted from protein-ligand complexes selected from the Protein Data Bank database based on structural similarity to the target protein. Next, the set of compounds was ranked by docking scores using an Electronic High-Throughput Screening flexible docking procedure to select the most promising molecules. The set of best performing compounds was then used for a similarity search over the 1 million entries in the Ligand.Info Meta-Database. Selected molecules having a close structural relationship to 2-methyl-2,4-pentanediol may provide candidate lead compounds toward the development of novel allosteric severe acute respiratory syndrome protease inhibitors.
In many cases at the beginning of an HTS-campaign, some information about active molecules is already available. Often known active compounds (such as substrate analogues, natural products, inhibitors of a related protein or ligands published by a pharmaceutical company) are identified in low-throughput validation studies of the biochemical target. In this study we evaluate the effectiveness of a support vector machine trained on those compounds and used to classify a collection with unknown activity. This approach was aimed at reducing the number of compounds to be tested against the given target. Our method predicts the biological activity of chemical compounds based only on atom pair (AP) two-dimensional topological descriptors. The supervised support vector machine (SVM) method herein is trained on compounds from the MDL drug data report (MDDR) known to be active against a specific protein target. For detailed analysis, five different biological targets were selected including cyclooxygenase-2, dihydrofolate reductase, thrombin, HIV-reverse transcriptase and antagonists of the estrogen receptor. The accuracy of compound identification was estimated using the recall and precision values. The sensitivities for all protein targets exceeded 80% and the classification performance reached 100% for selected targets. In another application of the method, we addressed the absence of an initial set of active compounds for a selected protein target at the beginning of an HTS-campaign. In such a case, virtual high-throughput screening (vHTS) is usually applied by using a flexible docking procedure. However, the vHTS experiment typically contains a large percentage of false positives that should be verified by costly and time-consuming experimental follow-up assays. 
The subsequent use of our machine learning method was found to improve the speed (since the docking procedure was not required for all compounds from the database) and also the accuracy of the HTS hit lists (the enrichment factor).
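The evaluation measures named in this abstract (precision, recall and the enrichment factor) can be made concrete with a short Python sketch. The abstract does not give formulas, so the definitions below are the commonly used ones and should be read as an assumption; the compound identifiers in the usage note are hypothetical.

```python
def precision_recall(predicted_active, true_active):
    """Precision = TP / predicted; recall (sensitivity) = TP / all true actives."""
    predicted_active, true_active = set(predicted_active), set(true_active)
    true_positives = len(predicted_active & true_active)
    precision = true_positives / len(predicted_active) if predicted_active else 0.0
    recall = true_positives / len(true_active) if true_active else 0.0
    return precision, recall


def enrichment_factor(selected, true_active, library_size):
    """Hit rate in the selected subset divided by the hit rate in the whole library."""
    selected, true_active = set(selected), set(true_active)
    if not selected or not true_active or library_size <= 0:
        raise ValueError("need a non-empty selection, at least one active, and a positive library size")
    hit_rate_selected = len(selected & true_active) / len(selected)
    hit_rate_library = len(true_active) / library_size
    return hit_rate_selected / hit_rate_library
```

For example, if a classifier selects four compounds from a library of 100 that contains five actives, and two of the four are active, the precision is 0.5, the recall is 0.4, and the enrichment factor is 0.5 / 0.05 = 10.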
2006
The large and highly diverse Gcn5-related N-acetyltransferase (GNAT) superfamily includes proteins that perform various cellular functions in gene regulation, antibiotic resistance or hormonal regulation of circadian rhythms. These enzymes catalyze the transfer of the acetyl group from acetyl coenzyme A (AcCoA) to a wide variety of acceptor substrates including lysine residues in histones and other proteins as well as amino groups in arylalkylamines and aminoglycosides. Despite retaining a common core around which a more variable repertoire of structures is built, GNAT superfamily members display little sequence similarity and retain few invariant residues. In this work we used a variety of sophisticated fold recognition and distant homology detection methods and identified a novel family of eukaryotic GNATs so far catalogued in the PFAM database as a domain of unknown function, DUF738. Searches with standard sequence comparison tools using DUF738 protein sequences did not yield any significant hits to known protein domains. However, Meta-BASIC confidently mapped DUF738 family members to both the Esa1 histone acetyltransferase structure and the MOZ/SAS family, which has been suggested to be homologous to acetyltransferases. The Meta-BASIC prediction was further confirmed by the consensus fold recognition server 3D-Jury, which provided consistent matches to a variety of AcCoA N-acetyltransferase structures. Conservation of all characteristic GNAT superfamily motifs and good mapping of predicted and observed secondary structures are additional indicators of the correct but highly non-trivial assignment. Our detailed analyses suggest that DUF738 family members play an important role in chromatin modification or regulate transcription-related proteins.
DUF1613 is a novel family of eucaryotic AdoMet-dependent methyltransferases |
| Knizewski L, Ginalski K |
| Cell Cycle 5, 1580-1582 |
|
Crystal structure of the ApbE protein (TM1553) from Thermotoga maritima at 1.58 A resolution |
Han GW, Sri Krishna S, Schwarzenbacher R, McMullan D, Ginalski K, Elsliger MA, Brittain SM, Abdubek P, Agarwalla S, Ambing E, Astakhova T, Axelrod H, Canaves JM, Chiu HJ, DiDonato M, Grzechnik SK, Hale J, Hampton E, Haugen J, Jaroszewski L, Jin KK, Klock HE, Knuth MW, Koesema E, Kreusch A, Kuhn P, Miller MD, Morse AT, Moy K, Nigoghossian E, Oommachen S, Ouyang J, Paulsen J, Quijano K, Reyes R, Rife C, Spraggon G, Stevens RC, van den Bedem H, Velasquez J, Wang X, West B, White A, Wolf G, Xu Q, Hodgson KO, Wooley J, Deacon AM, Godzik A, Lesley SA, Wilson IA |
| Proteins 64, 1083-1090 |
|
The TM1553 gene of Thermotoga maritima
encodes a lipoprotein with a molecular weight of
39,409 Da (residues 1–352) and a calculated isoelectric
point of 5.6. Sequence analysis reveals that TM1553
belongs to a predominantly prokaryotic family of proteins
that are homologous to the ApbE family of periplasmic
lipoproteins. ApbE is involved in thiamine (vitamin B1)
biosynthesis and has been proposed to carry out the
conversion of aminoimidazole ribotide (AIR) to
4-amino-5-hydroxymethyl-2-methyl pyrimidine (HMP). Although
the precise biochemical function of ApbE is not known,
mutagenesis studies have indicated that ApbE is important
for Fe-S cluster metabolism. The exact role played
by ApbE in either of these activities remains unclear.
Herein, we report the crystal structure of TM1553, the
first structural representative of the ApbE family, which
was determined using the semiautomated, high-throughput
pipeline of the Joint Center for Structural Genomics
(JCSG).
Efficacy of 2-halogen substituted D-glucose analogs in blocking glycolysis and killing "hypoxic tumor cells" |
Lampidis TJ, Kurtoglu M, Maher JC, Liu H, Krishan A, Sheft V, Szymanski S, Fokt I, Rudnicki WR, Ginalski K, Lesyng B, Priebe W |
| Cancer Chemother Pharmacol 58, 725-734 |
|
Since 2-deoxy-D-glucose (2-DG) is currently in phase I clinical trials to selectively target slow-growing hypoxic tumor cells, 2-halogenated D-glucose analogs were synthesized for improved activity. Given the fact that 2-DG competes with D-glucose for binding to hexokinase, in silico modeling of molecular interactions between hexokinase I and these new analogs was used to determine whether binding energies correlate with biological effects, i.e. inhibition of glycolysis and subsequent toxicity in hypoxic tumor cells. METHODS AND RESULTS: Using a QSAR-like approach along with a flexible docking strategy, it was determined that the binding affinities of the analogs to hexokinase I decrease as a function of increasing halogen size as follows: 2-fluoro-2-deoxy-D-glucose (2-FG) > 2-chloro-2-deoxy-D-glucose (2-CG) > 2-bromo-2-deoxy-D-glucose (2-BG). Furthermore, D-glucose was found to have the highest affinity followed by 2-FG and 2-DG, respectively. Similarly, flow cytometry and trypan blue exclusion assays showed that the efficacy of the halogenated analogs in preferentially inhibiting growth and killing hypoxic vs. aerobic cells increases as a function of their relative binding affinities. These results correlate with the inhibition of glycolysis as measured by lactate inhibition, i.e. ID50 1 mM for 2-FG, 6 mM for 2-CG and > 6 mM for 2-BG. Moreover, 2-FG was found to be more potent than 2-DG for both glycolytic inhibition and cytotoxicity. CONCLUSIONS: Overall, our in vitro results suggest that 2-FG is more potent than 2-DG in killing hypoxic tumor cells, and therefore may be more clinically effective when combined with standard chemotherapeutic protocols.
Comparative modeling for protein structure prediction |
|
| Curr Opin Struct Biol 16, 172-7 |
|
With the progression of structural genomics projects, comparative modeling remains an increasingly important method of choice. It helps to bridge the gap between the available sequence and structure information by providing reliable and accurate protein models. Comparative modeling based on more than 30% sequence identity is now approaching its natural template-based limits and further improvements require the development of effective refinement techniques capable of driving models toward native structure. For difficult targets, for which the most significant progress in recent years has been observed, optimal template selection and alignment accuracy are still the major problems.
Human herpesvirus 1 UL24 gene encodes a potential PD-(D/E)XK endonuclease |
|
| J Virol 80, 2575-2577 |
|
Using Meta-BASIC, a highly sensitive method for detection of distant similarity between proteins, we have identified another potential PD-(D/E)XK endonuclease in human herpesvirus 1 (HHV-1) encoded by the UL24 gene. The universal presence of UL24 in completed herpesviral genomes of three major subfamilies, Alphaherpesvirinae, Betaherpesvirinae, and Gammaherpesvirinae, suggests a fundamental role for this predicted PD-(D/E)XK endonuclease activity in the viral life cycle.
PDB-UF: database of predicted enzymatic functions for unannotated protein structures from structural genomics |
|
| BMC Bioinformatics 7, 53 |
|
BACKGROUND: The number of protein structures from structural genomics centers in the Protein Data Bank (PDB) is increasing dramatically. Many of these structures are functionally unannotated because they have no sequence similarity to proteins of known function. However, it is possible to successfully infer function using only structural similarity. RESULTS: Here we present the PDB-UF database, a web-accessible collection of predictions of enzymatic properties using structure-function relationships. The assignments were conducted for three-dimensional protein structures of unknown function that come from structural genomics initiatives. We show that 4 hypothetical proteins (with PDB accession codes: 1VH0, 1NS5, 1O6D, and 1TO0), for which standard BLAST tools such as PSI-BLAST or RPS-BLAST failed to assign any function, are probably methyltransferase enzymes. CONCLUSION: We suggest that the structure-based prediction of an EC number should be conducted using different similarity score cutoffs for different protein folds. Moreover, performing the annotation using two different algorithms can reduce the rate of false positive assignments. We believe that the presented web-based repository will help to decrease the number of protein structures that have functions marked as "unknown" in the PDB file. AVAILABILITY: http://paradox.harvard.edu/PDB-UF and http://bioinfo.pl/PDB-UF.
Site-2 protease regulated intramembrane proteolysis: sequence homologs suggest an ancient signaling cascade |
|
| Protein Sci 15, 84-93 |
|
Site-2 proteases (S2Ps) form a large family of membrane-embedded metalloproteases that participate in cellular signaling pathways through sequential cleavage of membrane-tethered substrates. Using sequence similarity searches, we extend the S2P family to include remote homologs that help define a conserved structural core consisting of three predicted transmembrane helices with traditional metalloprotease functional motifs and a previously unrecognized motif (GxxxN/S/G). S2P relatives were identified in genomes from Bacteria, Archaea, and Eukaryota including protists, plants, fungi, and animals. The diverse S2P homologs divide into several groups that differ in various inserted domains and transmembrane helices. Mammalian S2P proteases belong to the major ubiquitous group and contain a PDZ domain. Sequence and structural analysis of the PDZ domain support its mediating the sequential cleavage of membrane-tethered substrates. Finally, conserved genomic neighborhoods of S2P homologs allow functional predictions for PDZ-containing transmembrane proteases in extra-cytoplasmic stress response and lipid metabolism.
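A motif such as GxxxN/S/G can be located in a sequence with a simple pattern search, sketched below in Python. The sketch reads the motif as a glycine, any three residues, then asparagine, serine or glycine; that reading, the function name, and the example sequences are assumptions, and a plain pattern match is far less sensitive than the profile-based similarity searches used in the study.

```python
import re

# G, three arbitrary residues, then N, S or G; the lookahead reports overlapping matches.
S2P_LIKE_MOTIF = re.compile(r"(?=(G.{3}[NSG]))")


def find_motif(sequence):
    """Return (0-based start, matched residues) for every motif occurrence."""
    return [(match.start(), match.group(1)) for match in S2P_LIKE_MOTIF.finditer(sequence)]
```

Because short motifs occur frequently by chance, hits of this kind are only meaningful in context, for example when they fall within a predicted transmembrane helix of a candidate homolog.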
2005
Bacillus subtilis YkuK protein is distantly related to RNase H |
|
| FEMS Microbiol Lett 251, 341-6 |
|
In addition to one hypothetical viral sequence from Bacteriophage KVP40, the PfamA family of unknown function DUF458 (Pfam Accession No. PF04308) encompasses several uncharacterized bacterial proteins including Bacillus subtilis YkuK protein. Using Meta-BASIC, a highly sensitive method for detection of distant similarity between proteins, we assign DUF458 family members to the ribonuclease H-like (RNase H-like) superfamily. DUF458 sequences maintain all core secondary structure elements of the RNase H-like fold and share several conserved, presumably active site residues with RNase HI, including an invariant DDE motif. In addition to providing a model structure for a previously uncharacterized protein family, this finding suggests that DUF458 proteins function as nucleases. The unusual phyletic pattern, together with the presence of DUF458 in several thermophilic organisms, may suggest a potential role of these proteins in DNA repair under stressful conditions such as extreme heat or other stress that causes spore formation.
Identification of novel restriction endonuclease-like fold families among hypothetical proteins |
|
| Nucleic Acids Res 33, 3598-3605 |
|
Restriction endonucleases and other nucleic acid cleaving enzymes form a large and extremely diverse superfamily that displays little sequence similarity despite retaining a common core fold responsible for cleavage. The lack of significant sequence similarity between protein families makes homology inference a challenging task and hinders new family identification with traditional sequence-based approaches. Using the consensus fold recognition method Meta-BASIC that combines sequence profiles with predicted protein secondary structure, we identify nine new restriction endonuclease-like fold families among previously uncharacterized proteins and predict these proteins to cleave nucleic acid substrates. Application of transitive searches combined with gene neighborhood analysis allows us to confidently link these unknown families to a number of known restriction endonuclease-like structures and thus assign folds to the uncharacterized proteins. Finally, our method identifies a novel restriction endonuclease-like domain in the C-terminus of RecC that is not detected with structure-based searches of the existing PDB database.
Practical lessons from protein structure prediction |
|
| Nucleic Acids Res 33, 1874-1891 |
|
Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in recent years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignments. Recent advances in assessment of the prediction quality are also discussed.
A comprehensive update of the sequence and structure classification of kinases |
|
| BMC Struct Biol 5, 6 |
|
BACKGROUND: A comprehensive update of the classification of all available kinases was carried out. This survey presents a complete global picture of this large functional class of proteins and confirms the soundness of our initial kinase classification scheme. RESULTS: The new survey found the total number of kinase sequences in the protein database has increased more than three-fold (from 17,310 to 59,402), and the number of determined kinase structures increased two-fold (from 359 to 702) in the past three years. However, the framework of the original two-tier classification scheme (in families and fold groups) remains sufficient to describe all available kinases. Overall, the kinase sequences were classified into 25 families of homologous proteins, wherein 22 families (approximately 98.8% of all sequences) for which three-dimensional structures are known fall into 10 fold groups. These fold groups not only include some of the most widespread protein folds, such as the Rossmann-like fold, ferredoxin-like fold, TIM-barrel fold, and antiparallel beta-barrel fold, but also all major classes (all alpha, all beta, alpha+beta, alpha/beta) of protein structures. Fold predictions are made for remaining kinase families without a close homolog with solved structure. We also highlight two novel kinase structural folds, riboflavin kinase and dihydroxyacetone kinase, which have recently been characterized. Two protein families previously annotated as kinases are removed from the classification based on new experimental data. CONCLUSION: Structural annotations of all kinase families are now revealed, including fold descriptions for all globular kinases, making this the first large functional class of proteins with a comprehensive structural annotation. Potential uses for this classification include deduction of protein function, structural fold, or enzymatic mechanism of poorly studied or newly discovered kinases based on proteins in the same family.
Protein domain of unknown function DUF1023 is an alpha/beta hydrolase |
|
| Proteins 59, 1-6 |
|
Pfam family DUF1023 consists entirely of uncharacterized proteins generated by sequencing the genomes of Actinobacteria (Bateman A., et al., Nucleic Acids Res. 2004;32 Database issue:D138-141). Utilizing sequence similarity detection methods, we infer homology between DUF1023 and alpha/beta hydrolases. DUF1023 proteins conserve the core secondary structures of the alpha/beta hydrolase fold and share catalytic machinery similar to that of alpha/beta hydrolases. We predict the spatial structure of DUF1023 proteins and deduce that they function as hydrolases utilizing a catalytic Ser-His-Asp triad with the serine as the nucleophile.
2004
Biochemical identification of Argonaute 2 as the sole protein required for RNA-induced silencing complex activity |
|
| Proc Natl Acad Sci USA 101, 14385-14389 |
|
RNA interference is carried out by the small double-stranded RNA-induced silencing complex (RISC). The RISC-bound small RNA guides the RISC complex to identify and cleave mRNAs with complementary sequences. The proteins that make up the RISC complex and cleave mRNA have not been unequivocally defined. Here, we report the biochemical purification of RISC activity to homogeneity from Drosophila Schneider 2 cell extracts. Argonaute 2 (Ago-2) is the sole protein component present in the purified, functional RISC. By using a bioinformatics method that combines sequence-profile analysis with predicted protein secondary structure, we found homology between the PIWI domain of Ago-2 and endonuclease V and identified potential active-site amino acid residues within the PIWI domain of Ago-2.
ECEPE proteins: a novel family of eukaryotic cysteine proteinases |
|
| Trends Biochem Sci 29, 524-526 |
|
Using a variety of fold-recognition methods, a novel eukaryotic cysteine proteinase (ECEPE) family has been identified. This family encompasses sequences from an uncharacterized KOG4621, including the Arabidopsis thaliana guanylyl cyclase-related protein AtGC1. ECEPE proteins are predicted to possess the papain-like cysteine proteinase fold and are evolutionarily linked to C39 peptidases. The presence of the invariant Cys-His-Asp/Asn catalytic triad and the oxyanion-hole glutamine residue characteristic of papain-like cysteine proteases indicates that ECEPE proteins might function as proteases.
Raptor protein contains a caspase-like domain |
|
| Trends Biochem Sci 29, 522-524 |
|
Using state-of-the-art sequence analysis and structure-prediction methods, a caspase-like domain in the N-terminal region of raptor proteins has been identified. This domain, which is characterized by the presence of an invariant catalytic Cys-His dyad, is evolutionarily and structurally related to known caspases and might have protease activity. This finding suggests several unexpected aspects of raptor function in the target of rapamycin (TOR) signaling pathway.
BTLCP proteins: a novel family of bacterial transglutaminase-like cysteine proteinases |
|
| Trends Biochem Sci 29, 392-395 |
|
Using sequence similarity searches and top-of-the-range fold-recognition methods, we have identified a novel family of bacterial transglutaminase-like cysteine proteinases (BTLCPs) with an invariant Cys-His-Asp catalytic triad and a predicted N-terminal signal sequence. This family of previously uncharacterized hypothetical proteins encompasses sequences of unknown function from DUF920 (in the Pfam database) and COG3672. BTLCPs are predicted to possess the papain-like cysteine proteinase fold and catalyze post-translational protein modification through transamidase, acetylase or hydrolase activity. Inspection of neighboring genes encoding BTLCPs suggests a link between this predicted activity and a type-I secretion system resembling ATP-binding cassette exporters of toxins and proteases involved in bacterial pathogenicity.
DCC proteins: a novel family of thiol-disulfide oxidoreductases |
|
| Trends Biochem Sci 29, 339-342 |
|
|