jdnax.blogg.se

Rapid search engine

Finding backbone substructures from the Protein Data Bank that match an arbitrary query structural motif, composed of multiple disjoint segments, is a problem of growing relevance in structure prediction and protein design. Although numerous protein structure search approaches have been proposed, methods that address this specific task without additional restrictions and on practical time scales are generally lacking. Here, we propose a solution, dubbed MASTER, that is both rapid, enabling searches over the Protein Data Bank in a matter of seconds, and provably correct, finding all matches below a user-specified root-mean-square deviation (RMSD) cutoff. We show that despite the potentially exponential time complexity of the problem, running times in practice are modest even for queries with many segments. The ability to explore naturally plausible structural and sequence variations around a given motif has the potential to synthesize its design principles in an automated manner, so we go on to illustrate the utility of MASTER for protein structural biology. We demonstrate its capacity to rapidly establish structure–sequence relationships, uncover the native designability landscapes of tertiary structural motifs, identify structural signatures of binding, and automatically rewire protein topologies. Given the broad utility of protein tertiary fragment searches, we hope that providing MASTER in an open-source format will enable novel advances in understanding, predicting, and designing protein structure.

The observed modularity of protein structure (that is, the frequent recurrence in nature of local structural patterns) has had a strong influence on methods of computational structural biology. Modularity is evident at several levels: in secondary structure, with reliable amino acid propensities emerging from structural databases [1]; in the assembly of secondary structural elements (SSEs) [2–5]; and even in conserved domains [6]. Computational methods have taken advantage of this in a multitude of ways. The observed recurrence of compact folds in unrelated native proteins gave rise to various template-based structure prediction approaches [7–9]. Conformational sampling based on previously observed contiguous structural fragments has revolutionized both structure prediction [10–12] and protein design [13–15]. As the Protein Data Bank (PDB) continues to grow, more ambitious uses of fragment-based data, incorporating both secondary and tertiary information, are being proposed for design and prediction.

Given the increased use of protein substructure statistics, the problem of rapidly finding close matches to a structural motif is growing in significance. We find that a particularly useful flavor of this general problem is the identification of precise atom-for-atom matches to arbitrary constellations of disjoint backbone fragments. We refer to this as the atomistic tertiary fragment search (ATFS) problem, and propose an efficient method to solve it here. We believe ATFS is of high value for many applications in structural biology, and in this study we carry out several computational experiments that demonstrate this. A key feature of ATFS is that it does not require sequence constraints, which enables the discovery of natural sequence–structure relationships, as we have shown previously [15, 20] and further demonstrate here.

Numerous methods have been proposed under the general umbrella of the protein structure search problem [21]. Most development has focused on searching for structural similarity on the level of whole proteins or domains [22–28], with intended applications including function prediction and evolutionary inference. This is a challenging problem, because one needs to find good query–target alignments while simultaneously choosing the best subset and permutation of query residues upon which the alignment is based. Methods for identifying matches to smaller protein substructures have also been proposed [20, 29–32]. The search techniques used fall into two broad classes: heuristics-based approaches and approaches that look for provably optimal matches under a given similarity score. Heuristic methods have dominated because of the computational demands of the problem [21]. Further, a number of ad hoc similarity scoring functions have been proposed as researchers have attempted to codify the intuition inherent in comparing structures manually. Such functions often assume the importance of certain structural features, such as the presence of canonical SSEs.
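
Every match in this search is defined by a user-specified RMSD cutoff, where RMSD is measured after the query and candidate atoms are optimally superimposed. As a minimal sketch of that quantity, assuming the matched backbone atoms are given as equal-length lists of (x, y, z) tuples in corresponding order, the Python below uses Horn's quaternion formulation: after centering both sets, the smallest achievable sum of squared deviations is G_A + G_B − 2λ_max, where G_A and G_B are the sums of squared centered coordinates and λ_max is the largest eigenvalue of a 4×4 matrix built from the cross-covariance. A small Jacobi eigenvalue routine keeps it to the standard library. The function names are hypothetical, and this illustrates the standard calculation rather than reproducing MASTER's source code.

```python
import math


def _centered(coords):
    """Translate a list of (x, y, z) points so their centroid sits at the origin."""
    n = len(coords)
    centroid = [sum(p[i] for p in coords) / n for i in range(3)]
    return [[p[i] - centroid[i] for i in range(3)] for p in coords]


def _largest_eigenvalue_sym(matrix, sweeps=30):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # columns: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rows: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def superposed_rmsd(coords_a, coords_b):
    """Minimal RMSD between two point sets over all proper rigid-body motions
    (rotation plus translation), following Horn's quaternion method."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("point sets must be non-empty and of equal size")
    a, b = _centered(coords_a), _centered(coords_b)
    n = len(a)
    g_a = sum(x * x for p in a for x in p)
    g_b = sum(x * x for p in b for x in p)
    # Cross-covariance entries cov[i][j] = sum_k a_k[i] * b_k[j].
    cov = [[sum(a[k][i] * b[k][j] for k in range(n)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = cov
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = _largest_eigenvalue_sym(key)
    # Minimal sum of squared deviations is g_a + g_b - 2 * lambda_max.
    return math.sqrt(max(0.0, (g_a + g_b - 2.0 * lam) / n))


query = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
rotated = [(5.0 - y, 5.0 + x, 5.0 + z) for x, y, z in query]  # 90 degrees about z, then shifted
mirrored = [(x, y, -z) for x, y, z in query]
print(round(superposed_rmsd(query, rotated), 6))   # 0.0
print(round(superposed_rmsd(query, mirrored), 6))  # 0.5
```

Restricting the fit to proper rotations matters here: the mirrored set matches the query only through a reflection, so its RMSD stays at 0.5 instead of dropping to zero.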
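
The claim that the search is provably correct (it returns every match below the cutoff) while staying fast can be illustrated with a standard lower bound. This is an assumption-labeled sketch, not MASTER's exact algorithm. Suppose a query has segments of n_i atoms (N atoms in total) and each candidate placement of segment i has its own optimal-superposition RMSD r_i. A single shared superposition can never fit segment i better than that segment's own optimal fit, so N · RMSD_joint² ≥ Σ n_i r_i². Any partial combination whose running sum already exceeds N · cutoff² can therefore be discarded without losing a true match. The function names and the (name, r_i) candidate format are hypothetical, the per-segment RMSDs are assumed to be precomputed, and overlap checks between segments are omitted.

```python
import math


def rmsd_lower_bound(segment_sizes, segment_rmsds):
    """Lower bound on the joint-superposition RMSD of a multi-segment match.

    A shared rigid-body transform can fit segment i no better than that
    segment's own optimal superposition, so
        N * RMSD_joint**2 >= sum_i n_i * r_i**2,   where N = sum_i n_i.
    """
    total = sum(segment_sizes)
    ssd = sum(n * r * r for n, r in zip(segment_sizes, segment_rmsds))
    return math.sqrt(ssd / total)


def surviving_combinations(segment_sizes, candidates, cutoff):
    """Pick one candidate per query segment, depth first, pruning with the bound.

    candidates[i] is a list of (name, r_i) pairs for query segment i, where r_i
    is that placement's own optimal-superposition RMSD (hypothetical input).
    Returns the combinations the bound cannot rule out.
    """
    total = sum(segment_sizes)
    limit = cutoff * cutoff * total  # compare sums of squares; no sqrt needed
    results = []

    def extend(i, chosen, ssd):
        if i == len(candidates):
            results.append(tuple(chosen))
            return
        for name, r in candidates[i]:
            # Segments not yet chosen add >= 0, so exceeding the limit now is final.
            new_ssd = ssd + segment_sizes[i] * r * r
            if new_ssd <= limit:
                chosen.append(name)
                extend(i + 1, chosen, new_ssd)
                chosen.pop()

    extend(0, [], 0.0)
    return results


sizes = [4, 4]  # two query segments of four backbone atoms each
cands = [[("A1", 0.3), ("A2", 1.5)], [("B1", 0.4), ("B2", 0.9)]]
print(surviving_combinations(sizes, cands, cutoff=0.8))
# [('A1', 'B1'), ('A1', 'B2')]; "A2" alone already violates the bound
```

Sorting each candidate list by r_i would let the loop stop at the first candidate that breaks the limit. Survivors still need a full joint superposition, because the bound is necessary but not sufficient for a match.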
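
The combinatorial side of the problem (choosing which target residues each query segment maps to, in any order) can be made concrete by counting placements. Assume a single target chain of length L, contiguous query segments of lengths l_1 through l_k, and no overlap between segments. The k labeled segments can then appear along the chain in k! orders, and the F = L − Σ l_i leftover residues can be distributed among the k + 1 gaps in C(F + k, k) ways. The sketch below computes that count in closed form and cross-checks it by brute force on small inputs; the function names are hypothetical, and the count describes what a naive exhaustive search would face, not how MASTER enumerates candidates.

```python
import itertools
import math


def placements_closed_form(target_length, segment_lengths):
    """Count ways to place labeled, non-overlapping contiguous segments on one chain.

    The k segments can appear along the chain in k! orders, and the
    F = L - sum(l_i) leftover residues fill the k + 1 gaps in C(F + k, k) ways.
    """
    k = len(segment_lengths)
    free = target_length - sum(segment_lengths)
    if free < 0:
        return 0
    return math.factorial(k) * math.comb(free + k, k)


def placements_brute_force(target_length, segment_lengths):
    """Same count by enumerating every tuple of start positions (small inputs only)."""
    ranges = [range(target_length - length + 1) for length in segment_lengths]
    count = 0
    for starts in itertools.product(*ranges):
        # Half-open residue intervals [start, start + length), sorted along the chain.
        spans = sorted((s, s + length) for s, length in zip(starts, segment_lengths))
        if all(spans[i][1] <= spans[i + 1][0] for i in range(len(spans) - 1)):
            count += 1
    return count


assert placements_closed_form(10, [3, 3]) == placements_brute_force(10, [3, 3]) == 30

for k in range(1, 5):
    # k query segments of 7 residues each against a 200-residue chain
    print(k, placements_closed_form(200, [7] * k))
# 1 194
# 2 35156
# 3 5929560
# 4 927141600
```

Each added segment multiplies the count by roughly the chain length, per chain, which illustrates why an exhaustive multi-segment search benefits from provably safe pruning.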