A model for recognition of 3'-processing sites

As the hexamer AATAAA is the most conserved element of 3'-processing sites, it was considered the main building block of our complex recognition function. Although the hexamer is highly conserved, variants of this signal are observed: in the training set, for example, 43 of 248 poly-A sites carry variants of AATAAA with one mismatch. To account for such variants, a position weight matrix was used to recognize this signal. The other characteristics such as content statistics of...
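How a position weight matrix tolerates one-mismatch variants can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation. It assumes a log-odds matrix built from aligned hexamers with a pseudocount of 0.5 and a uniform background of 0.25 per base. The training sites in `sites` are a hypothetical toy set, not the 248 poly-A sites mentioned above, and the helper names `build_pwm`, `score` and `scan` are likewise illustrative.

```python
import math

ALPHABET = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log-odds position weight matrix (log2 scale) from aligned sites of equal length."""
    pwm = []
    for pos in range(len(sites[0])):
        # Start every base at the pseudocount so unseen bases get a finite penalty.
        counts = {base: pseudocount for base in ALPHABET}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2(counts[base] / total / background) for base in ALPHABET})
    return pwm

def score(pwm, word):
    """Sum of per-position log-odds scores for a word as long as the matrix."""
    return sum(column[base] for column, base in zip(pwm, word))

def scan(pwm, sequence, threshold):
    """Return (position, word, score) for every window scoring at least threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        word = sequence[i:i + width]
        if set(word) <= set(ALPHABET):  # skip windows containing N or other symbols
            s = score(pwm, word)
            if s >= threshold:
                hits.append((i, word, s))
    return hits

# Hypothetical toy training set: mostly AATAAA plus two one-mismatch variants.
sites = ["AATAAA"] * 8 + ["ATTAAA", "AGTAAA"]
pwm = build_pwm(sites)
```

With this toy matrix, AATAAA scores higher than the one-mismatch variant ATTAAA, which in turn scores well above an unrelated hexamer such as GGGGGG. A threshold set between the variant and unrelated scores therefore admits the variants while rejecting unrelated words.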

Acknowledgments

The author thanks the editor Thomas Lengauer and Martin Stahl (F. Hoffmann-La Roche, Basel) for carefully reading the first version of this manuscript and for many valuable comments. This work was partially funded by the BMBF (Bundesministerium für Bildung und Forschung) under grant 0311620 (Project Relimo).

Software for automated molecular docking

1 J. M. Blaney and J. S. Dixon. A good ligand is hard to find: Automated docking methods. Perspectives...

Electrostatic effects

Both shape complementarity and electrostatic effects are important in the recognition process in protein complex formation. Accordingly, a treatment of electrostatics was introduced into the Fourier correlation approach. The charge-charge interaction is evaluated from point charges of the mobile molecule B interacting with the potential of the static molecule A. With this choice, the potential needs to be calculated only once (for the static molecule), whilst the charge calculation (see...
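The translational scan can be illustrated with NumPy. This is a minimal sketch under simplifying assumptions. The potential of molecule A and the point charges of molecule B are taken to be already mapped onto the same cubic grid. Shifts are cyclic; real implementations pad the grids to avoid wrap-around. Only one orientation of B is scored, so rotations would be handled by repeating the scan for each orientation. The function names are hypothetical, and the brute-force `direct_correlation` is included only to check the FFT result.

```python
import numpy as np

def electrostatic_correlation(phi_a, q_b):
    """E[t] = sum_x phi_a[(x + t) mod N] * q_b[x] for every cyclic shift t.

    phi_a: electrostatic potential of the static molecule A on a grid.
    q_b:   point charges of the mobile molecule B mapped onto the same grid.
    """
    fa = np.fft.fftn(phi_a)  # computed once for the static molecule
    fb = np.fft.fftn(q_b)
    # Cross-correlation theorem: corr = IFFT(FFT(phi) * conj(FFT(q))).
    return np.fft.ifftn(fa * np.conj(fb)).real

def direct_correlation(phi_a, q_b):
    """Brute-force reference computing the same sum shift by shift."""
    out = np.empty(phi_a.shape, dtype=float)
    axes = tuple(range(phi_a.ndim))
    for t in np.ndindex(phi_a.shape):
        # np.roll with a negative shift gives shifted[x] = phi_a[(x + t) mod N].
        shifted = np.roll(phi_a, shift=tuple(-s for s in t), axis=axes)
        out[t] = np.sum(shifted * q_b)
    return out
```

Because the interaction energy of opposite charges is negative, the most favourable translation is the index of the minimum, e.g. `np.unravel_index(np.argmin(E), E.shape)`. For an N × N × N grid, the FFT route costs O(N^3 log N) per orientation instead of O(N^6) for the direct sum over all shifts, which is what makes the correlation approach practical.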

Basic analysis of 2DE images

First, two-dimensional gels must be digitized. In most cases this is done with a laser densitometer, a CCD camera or a phosphor imager 32. These instruments typically produce images of around 2000 × 2000 pixels or more, with a depth of 12 or 16 bits, providing 4096 or 65 536 grey levels, respectively. Once the image is loaded into the software, spot detection and quantitation are performed automatically, producing a repository of all protein spots contained in the...
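Automatic spot detection and quantitation can be illustrated with a deliberately simple approach: threshold the image and group connected above-threshold pixels into spots. This is a sketch, not how commercial 2-DE packages work; they typically add background subtraction, smoothing and splitting of overlapping spots. The function `detect_spots` and its `threshold` and `min_pixels` parameters are hypothetical. The image is assumed to be a list of rows of non-negative grey values, e.g. 0 to 4095 for a 12-bit scan.

```python
from collections import deque

def detect_spots(image, threshold, min_pixels=1):
    """Group 4-connected pixels with intensity > threshold into spots.

    Returns one dict per spot with its pixel count, integrated
    intensity ("volume") and intensity-weighted centroid (row, col).
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    spots = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or image[r][c] <= threshold:
                continue
            # Breadth-first flood fill starting from this unvisited bright pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and image[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pixels) < min_pixels:
                continue  # discard specks smaller than the minimum spot size
            volume = sum(image[y][x] for y, x in pixels)
            cy = sum(y * image[y][x] for y, x in pixels) / volume
            cx = sum(x * image[y][x] for y, x in pixels) / volume
            spots.append({"pixels": len(pixels), "volume": volume, "centroid": (cy, cx)})
    return spots
```

The integrated intensity is the quantity usually compared between gels. A fixed global threshold is the weakest part of this sketch, because faint spots on a high background are either missed or merged.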

Dot plots

Dot plots are probably the oldest way of comparing sequences 34. A dot plot is a visual representation of the similarities between two sequences. Each axis of a rectangular array represents one of the two sequences to be compared. A window length is fixed, together with a criterion for deciding when two sequence windows are deemed similar. Whenever a window in one sequence resembles a window in the other sequence, a dot or short diagonal is drawn at the corresponding position of the array....
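The dot plot construction described above can be sketched as follows. The similarity criterion used here is one simple choice assumed for illustration: at least `min_matches` identical positions within a window of length `window`. Scoring windows with a substitution matrix is a common alternative. The function names are hypothetical.

```python
def dot_plot(seq_a, seq_b, window, min_matches):
    """Return the set of (i, j) where seq_a[i:i+window] and seq_b[j:j+window]
    agree in at least min_matches positions."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(a == b for a, b in zip(seq_a[i:i + window], seq_b[j:j + window]))
            if matches >= min_matches:
                dots.add((i, j))
    return dots

def render(dots, len_a, len_b, window):
    """Text rendering: one row per window of seq_a, '*' marks a dot."""
    rows = []
    for i in range(len_a - window + 1):
        rows.append("".join("*" if (i, j) in dots else "." for j in range(len_b - window + 1)))
    return "\n".join(rows)
```

Comparing a sequence with itself always yields the main diagonal. Repeats show up as additional dots or diagonals off the main one, such as the dots at (0, 4) and (4, 0) produced by the repeated ACGT in `dot_plot("ACGTACGT", "ACGTACGT", 4, 4)`.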

Structure Properties and Computer Identification of Eukaryotic Genes

3.1 Structural characteristics of eukaryotic genes
3.2 Classification of splice sites in mammalian genomes
3.3 Methods for the recognition of functional signals
3.3.1 Search for nonrandom similarity with consensus sequences
3.3.2 Position-specific sensors
3.3.3 Content-specific measures
3.3.4 Frame-specific measures for recognition of protein coding regions
3.3.6 Application of linear discriminant analysis
3.3.7 Prediction of donor and acceptor splice junctions
3.3.8...

Internal exon recognition

For internal exon prediction, we consider as potential internal exons all open reading frames in a given sequence that are flanked by AG (on the left) and GT (on the right) dinucleotides. The structure of such exons is presented in Figure 3.9. The components of the recognition function for internal exons...

[Figure: Different functional regions of the first (a), internal (b), last (c) and single exons corresponding to components of recognition functions.]
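Enumerating the candidate internal exons can be sketched as follows. The sketch assumes that a candidate is any segment beginning immediately after an AG and ending immediately before a GT. A candidate is kept if at least one of its three reading frames contains no stop codon. The length limits `min_len` and `max_len` are hypothetical parameters added to keep the enumeration manageable, and all names are illustrative. Scoring the candidates with the recognition function is not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def open_frames(segment):
    """Frames (offset 0, 1 or 2) in which the segment has no in-frame stop codon."""
    frames = []
    for f in range(3):
        codons = (segment[k:k + 3] for k in range(f, len(segment) - 2, 3))
        if not any(codon in STOP_CODONS for codon in codons):
            frames.append(f)
    return frames

def candidate_internal_exons(seq, min_len=3, max_len=300):
    """(start, end, frames) for each seq[start:end] preceded by AG and followed by GT."""
    seq = seq.upper()
    # Exon starts right after the AG of the acceptor site.
    acceptors = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # Exon ends (exclusive) right where the GT of the donor site begins.
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    candidates = []
    for start in acceptors:
        for end in donors:
            if not (min_len <= end - start <= max_len):
                continue
            frames = open_frames(seq[start:end])
            if frames:
                candidates.append((start, end, frames))
    return candidates
```

For example, in `CCAGATGAAACCCGTCC` the only candidate is the segment `ATGAAACCC` between the AG and the GT. It is open in frames 0 and 2 but not in frame 1, where TGA is read in frame.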

Introduction

Once the spots of interest in a 2-DE gel are selected, the next step is to identify the corresponding proteins in a database. Designing tools that match experimental data against sequence databases is another challenge for bioinformatics. Even if the amino acid sequence of a protein can be predicted with a reasonable degree of confidence, post-translational protein modifications cannot always be predicted from the DNA sequence and their presence or absence can be of paramount...

Integrating and Accessing Molecular Biology Resources

Hansen and Thure Etzold, LION Bioscience Ltd, Cambridge, U.K.

With biological research increasingly focusing on molecular biology and genetics, more and more resources are becoming available to this research community. These resources include the data being collected and stored in databases, and the applications that operate on these data, producing yet more data. Databases range from simple sequence databases to complex metabolic pathway databases, as well as chemical...

[Figure: Based on the Ugi reaction, combinatorial libraries with four different R-groups can be created. After the reaction, all library molecules share the core shown on the right but differ in the four R-groups attached to it.]

The second phase is to improve the overall ranking of the solutions and to identify the correct placement. The placements are first energy-minimized using the CHARMm...
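The combinatorial growth of a library like the Ugi-based one in the figure caption is easy to see by enumerating it. In this sketch the four positions R1 to R4 follow the caption. The fragment names in `r_groups`, however, are hypothetical placeholders rather than actual Ugi building blocks, and the function name is illustrative. The library size is simply the product of the number of choices at each position.

```python
from itertools import product
from math import prod

# Hypothetical building-block lists (placeholders, not real reagents).
r_groups = {
    "R1": ["methyl", "ethyl", "phenyl"],
    "R2": ["H", "methyl"],
    "R3": ["benzyl", "cyclohexyl", "isopropyl", "tert-butyl"],
    "R4": ["methyl", "phenyl"],
}

def enumerate_library(r_groups):
    """Yield every library member as a mapping position -> fragment on the common core."""
    names = list(r_groups)
    for combo in product(*(r_groups[name] for name in names)):
        yield dict(zip(names, combo))

# Library size is the product of the choices per position: 3 * 2 * 4 * 2.
library_size = prod(len(choices) for choices in r_groups.values())
```

With 100 building blocks at each of the four positions, the same product gives 100^4 = 10^8 molecules. This is why docking every library member independently quickly becomes impractical.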

Introduction and principles

Today, high-throughput genome sequencing projects are generating huge amounts of information in the form of nucleotide sequences, which are stored in dedicated databases. This information has to be analysed in order to complete its annotation. Annotation means describing the coding sequences, the precursor elements, the positions of DNA recognition sites, etc. In addition to this structural information, functional data has to be annotated,...

Analyzing Regulatory Regions in Genomes

General features of regulatory regions in eukaryotic genomes

Regulatory regions share several common features despite their obvious divergence in sequence. Most of these features are not evident directly from the nucleotide sequence but result from constraints imposed by functional requirements. Therefore, understanding the major components and events in the formation of regulatory DNA-protein complexes is crucial for the design and evaluation of algorithms for the analysis of...

Protein sequence databases

The most comprehensive source of protein information is found in protein sequence databases. These can be divided into universal databases, which store protein information from all types of biological sources, and specialised databases, which concentrate their efforts on restricted groups of protein families or organisms. Universal protein sequence databases can be categorised into databases that are simple repositories of sequence data, mostly translated from DNA sequences, and annotated...

Docking of combinatorial libraries

The development of combinatorial chemistry and its application to drug design 18, 19 has led to new search problems in the context of molecular docking. An example of a combinatorial library is given in Figure 7.6. The number of molecules that can be synthesized using combinatorial chemistry has increased dramatically compared to classical methods. Therefore, any screening methodology has to handle many more molecules. Probably more important for the development of docking methods is the...

Definition of terms

Ab initio approaches try to perform predictions based on first principles and the input sequence alone. In their pure form, they do not use database-derived potentials, knowledge-based approaches or transfer of information from homologous proteins 35. Recently, ab initio methods have been combined with fragment searching and threading to yield a new fragment assembly approach, which appears to be quite successful for small proteins 5, 36, 37. In contrast to homology prediction and protein...

Info

Complexes used to generate the empirical scoring function. The Protein Data Bank codes 1 of the complexes are given.

...by Jones et al. 14. For a review, see Vajda et al. 15. In addition, empirical functions have been used to evaluate complexes of proteins with low-molecular-weight ligands 16, 17, 18, 19. Here we present the derivation and application of pair potentials specifically designed...
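One widely used way to derive knowledge-based pair potentials is the inverse Boltzmann relation, W(r) = -kT ln(g_pair(r) / g_ref(r)). Here g_pair(r) is the observed distance distribution for one atom-pair type and g_ref(r) is a reference distribution. The sketch below assumes this formulation, histogrammed counts per distance bin and a pseudocount of 1. It is not necessarily the exact derivation used in this chapter, and the function name is hypothetical.

```python
import math

def pair_potential(pair_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Inverse Boltzmann potential per distance bin.

    pair_counts:      observed counts of one atom-pair type per distance bin.
    reference_counts: counts of the reference state (e.g. all pair types) per bin.
    Returns W(r) = -kT * ln(g_pair(r) / g_ref(r)) with normalized distributions.
    """
    # Pseudocounts keep empty bins finite; totals include them so g sums to 1.
    n_pair = sum(pair_counts) + pseudocount * len(pair_counts)
    n_ref = sum(reference_counts) + pseudocount * len(reference_counts)
    potential = []
    for c_pair, c_ref in zip(pair_counts, reference_counts):
        g_pair = (c_pair + pseudocount) / n_pair
        g_ref = (c_ref + pseudocount) / n_ref
        potential.append(-kT * math.log(g_pair / g_ref))
    return potential
```

Distance bins that are over-represented in the observed complexes relative to the reference state receive negative (favourable) values. Under-represented bins receive positive ones.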

Short history of 2DE analysis by computer

In the early days of two-dimensional electrophoresis, researchers visually examined and compared their 2-DE images in order to detect spots that might indicate differentially expressed proteins. However, it quickly became apparent that only specialised computer software would provide the means to efficiently extract and analyse the huge amount of data contained in even the simplest of these images. In the early 1980s, a handful of academic groups started to...

Annotation of sequences from genome sequencing projects

The first task in analyzing these sequences is finding the genes. Knowledge of the genes opens up a new way of performing biological studies, called 'functional genomics'. The next problem is to find out what all these new genes do, how they interact and how they are regulated 104. Comparisons between genes of different genomes can provide additional insights into the details of the structure and function of genes. We cannot predict all gene components exactly because of the limitations of our knowledge of the...

Protein identification with sequence tags: TagIdent

TagIdent 46 makes use of the high specificity of short amino acid sequence tags in molecularly well-defined organisms with small proteomes and a low degree of post-translational modification and, in particular, few N-terminally blocked proteins. The tool can match a short sequence tag of up to six amino acids, together with pI and Mw, against all SWISS-PROT/TrEMBL sequences from a species or taxonomic category. If more than one protein satisfies the user-specified tag and pI/Mw ranges, TagIdent produces an...
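The core filtering idea can be sketched as follows. This is not TagIdent's code. The `Protein` records, their precomputed pI and Mw values and the `n_terminal_only` switch are hypothetical. The sketch simply requires the tag to occur in the sequence (or at its N-terminus) and the protein to fall inside both user-specified ranges.

```python
from dataclasses import dataclass

@dataclass
class Protein:
    accession: str   # hypothetical identifiers in the example below
    sequence: str
    pi: float        # isoelectric point, assumed precomputed
    mw: float        # molecular weight in daltons, assumed precomputed

def tag_search(proteins, tag, pi_range, mw_range, n_terminal_only=False):
    """Accessions of proteins within both ranges whose sequence contains the tag."""
    hits = []
    for p in proteins:
        in_ranges = pi_range[0] <= p.pi <= pi_range[1] and mw_range[0] <= p.mw <= mw_range[1]
        if not in_ranges:
            continue
        found = p.sequence.startswith(tag) if n_terminal_only else tag in p.sequence
        if found:
            hits.append(p.accession)
    return hits
```

Example with hypothetical records: `tag_search(db, "KTAY", (7, 9), (1000, 1500))` keeps only proteins that both contain KTAY and fall in the pI and Mw windows. Widening the Mw range lets more tag-matching proteins through, which mirrors the situation in which several candidates satisfy the query.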

Structural characteristics of eukaryotic genes

The gene is a fragment of nucleic acid sequence that carries the information representing a particular polypeptide or RNA molecule. In eukaryotes, genes lie in a linear array on chromosomes, which consist of a long molecule of duplex DNA and chromatin proteins (mostly histones) that form structures called nucleosomes. The complex of DNA and proteins (chromatin) can maintain genes in an inactive state by restricting access to RNA polymerase and its accessory factors. To activate a gene, the chromatin...

Homology Modeling in Biology and Medicine

To understand basic biological processes such as cell division, cellular communication, metabolism, and organismal development and function, knowledge of the three-dimensional structure of the active components is crucial. Proteins are the key players in all of these processes, and the study of their diverse and elegant designs is a mainstay of modern biology. The Protein Data Bank (PDB) of experimentally determined protein structures 1, 2 now contains some 18 000 entries, which can be grouped into...

Identification with amino acid composition: AACompIdent

The AACompIdent tool 37, 45 (http://www.expasy.org/tools/aacomp) compares the experimentally determined amino acid composition (the number of each amino acid present in a molecule) of an unknown protein with the theoretical compositions of all proteins in SWISS-PROT/TrEMBL. A score quantifying the difference between the compositions of the query and database proteins is used to rank the candidate proteins. Estimated pI and Mw, as well as the name of an organism or a taxonomic range, and a keyword which...
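A ranking of this kind can be sketched with a simple squared-difference score over mole percentages. The score, the helper names and the toy database are assumptions for illustration, not AACompIdent's actual implementation. A real tool must also handle amino acids that standard composition analysis cannot distinguish or that are lost during hydrolysis. Here the `alphabet` parameter stands in for that choice, e.g. by restricting it to the residues actually measured.

```python
def composition(sequence, alphabet):
    """Mole percent of each residue in alphabet; residues outside it are ignored."""
    counts = {aa: 0 for aa in alphabet}
    for aa in sequence:
        if aa in counts:
            counts[aa] += 1
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}

def rank_by_composition(query_percent, database, alphabet):
    """Sort (score, name) pairs ascending; a lower score means a closer composition."""
    scored = []
    for name, sequence in database.items():
        theoretical = composition(sequence, alphabet)
        score = sum((query_percent[aa] - theoretical[aa]) ** 2 for aa in alphabet)
        scored.append((score, name))
    return sorted(scored)
```

Additional filters such as the pI/Mw windows and organism restriction mentioned above would simply remove database entries before scoring.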

Genotyping technologies

In general, SNP discovery and SNP genotyping take place on separate technological platforms. This is because SNP discovery technologies are optimized to identify variation at any position in a given target sequence for a few individuals, while SNP genotyping technologies are optimized to detect variation at a single position in many individuals. While some technologies (e.g. DHPLC and SSCP) can be used for both discovery and genotyping, they tend to be sub-optimal in both throughput and...

Peptide mass fingerprinting: PeptIdent, SmartIdent, Protein Prospector, Mowse, Mascot and PROWL

The increasing reproducibility of available separation techniques, the growing sensitivity and affordability of mass spectrometers, and the desire and need to automate the identification process have made peptide mass fingerprinting and MS/MS sequencing increasingly important and the method of choice for many proteomics laboratories. Several tools are available to assist users in the interpretation of mass spectrometry data. PeptIdent on the ExPASy server follows the concept of the other...
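The basic matching step behind peptide mass fingerprinting can be sketched as an in silico tryptic digest followed by a comparison of observed and theoretical peptide masses. The sketch makes three assumptions. Trypsin cleaves after K or R except before P. Masses are monoisotopic [M+H]+ values built from standard residue masses. A protein is scored by the plain number of observed masses matched within a tolerance `tol` in Da. This crude count is far simpler than the scoring schemes of Mowse, Mascot or PROWL, and all function names are hypothetical.

```python
# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

def tryptic_peptides(sequence, missed_cleavages=0):
    """Cut after K/R unless the next residue is P; optionally join adjacent pieces."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    peptides = []
    for i in range(len(pieces)):
        for j in range(i, min(i + missed_cleavages + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

def mh_mass(peptide):
    """Monoisotopic [M+H]+ mass: residues + water + proton."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

def count_matches(observed, sequence, tol=0.5, missed_cleavages=1):
    """Number of observed masses within tol of any theoretical peptide mass."""
    theoretical = [mh_mass(p) for p in tryptic_peptides(sequence, missed_cleavages)]
    return sum(any(abs(m - t) <= tol for t in theoretical) for m in observed)

def rank_proteins(observed, database, tol=0.5, missed_cleavages=1):
    """(matches, name) pairs sorted by decreasing number of matches."""
    scored = [(count_matches(observed, seq, tol, missed_cleavages), name)
              for name, seq in database.items()]
    return sorted(scored, key=lambda item: (-item[0], item[1]))
```

Allowing one missed cleavage, as in the defaults here, reflects the fact that digests are rarely complete; each extra allowed missed cleavage enlarges the theoretical peptide list and therefore the chance of random matches.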