  • Bioinformatics Glossary

    Annotation : A functional description of a clone, which may include identifying attributes such as locus name, keywords, and Medline references.

    BAC : Bacterial Artificial Chromosome; see cloning vector.

    BLAST  : The Basic Local Alignment Search Tool is a fast technique for detecting ungapped subsequences that match a given query sequence.

    BLIMPS : The BLocks IMProved Searcher is a search tool used to compare a DNA sequence against protein patterns in the Blocks database.

    Block 1 : The sequence editing portion of Incyte’s automated bioanalysis system. It takes raw sequences and masks or removes undesirable elements such as vector, nucleotide repeats, BAC contamination, and mitochondrial DNA. It then compares the sequences against the existing database and makes cluster assignments.

    Block 2 : The functional identification portion of Incyte’s automated bioanalysis system. It compares Incyte and WashU-Merck sequences against public gene-sequence and protein-pattern databases, looking for matches that indicate the identity and function of the expressed gene. If a match is found, the sequence inherits the appropriate annotations.


    Blocks database : A public database of protein patterns that correspond to the most highly conserved regions in proteins. It is used in Block 2 bioanalysis for comparison with Incyte and WashU-Merck clones that have no GenBank matches.

    Base Pair :  Two nitrogenous bases (adenine and thymine or guanine and cytosine) held together by weak bonds. Two strands of DNA are held together in the shape of a double-helix by the bonds between base pairs. The human genome contains an estimated 3 billion base pairs (bp). One million bp is often referred to as 1 Mb and one thousand as 1 kb.
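
The pairing rule and the bp/kb/Mb units in the entry above can be illustrated with a minimal Python sketch. The function names `complement` and `format_length` are hypothetical and chosen only for this illustration.

```python
# Watson-Crick pairing: adenine with thymine, guanine with cytosine.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand (order unchanged)."""
    return "".join(PAIRS[base] for base in strand.upper())


def format_length(bp: int) -> str:
    """Express a length in bp, kb, or Mb (1 kb = 1,000 bp; 1 Mb = 1,000,000 bp)."""
    if bp >= 1_000_000:
        return f"{bp / 1_000_000:g} Mb"
    if bp >= 1_000:
        return f"{bp / 1_000:g} kb"
    return f"{bp} bp"


print(complement("ATGC"))            # TACG
print(format_length(45_000))         # 45 kb
print(format_length(3_000_000_000))  # 3000 Mb
```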

    cDNA : Complementary DNA; synthesized from a mRNA template.

    CentiMorgans : The unit of distance between markers on a genetic map. Two markers are said to be 1 cM apart if they are separated by recombination 1% of the time; in the human genome this corresponds roughly to a distance of 1 million bp.
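
As a rough worked example of the definition above, the sketch below converts an observed recombination frequency into centiMorgans and then into an approximate physical distance. It assumes the 1 cM to 1 million bp rule of thumb quoted in the entry, and it treats map distance as equal to recombination frequency, which is only a reasonable approximation for closely linked markers. The function names are hypothetical.

```python
BP_PER_CM = 1_000_000  # rule of thumb from the entry above, not an exact constant


def map_distance_cm(recombinants: int, offspring: int) -> float:
    """Map distance in cM: the percentage of offspring that are recombinant."""
    if offspring <= 0:
        raise ValueError("offspring must be a positive count")
    return 100.0 * recombinants / offspring


def approx_physical_distance_bp(cm: float) -> float:
    """Very rough physical distance implied by a genetic distance."""
    return cm * BP_PER_CM


cm = map_distance_cm(recombinants=3, offspring=200)
print(cm)                               # 1.5
print(approx_physical_distance_bp(cm))  # 1500000.0
```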

    Clone ID : The unique numerical identifier for each LifeSeq clone. A single Clone ID may have more than one associated Sequence ID.

    Clones  : A group of cells derived from a single ancestor.

    Cloning vector : A DNA molecule originating from a virus, plasmid, cosmid, phage, bacteria, or yeast into which a foreign DNA fragment is integrated and then introduced into host cells, where it can be reproduced in large quantities (cloned).

    Cluster : A group of clones related to one another by sequence homology. Each cluster has a unique Cluster ID number for a given stringency.

    Codon : A sequence of three DNA bases within a gene that codes for a single amino acid.
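
To make the codon idea concrete, here is a small sketch that splits a coding sequence into three-base codons and looks each one up. The table is deliberately partial (six entries of the standard genetic code), and the helper names `codons` and `translate` are hypothetical; unknown codons are reported as `X`.

```python
# Partial excerpt of the standard genetic code (DNA codons -> one-letter amino acids).
CODON_TABLE = {
    "ATG": "M",  # methionine (start)
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "TAA": "*",  # stop
}


def codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets, dropping any trailing 1-2 bases."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq: str) -> str:
    """Translate each codon; codons missing from the partial table become 'X'."""
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons(seq))


print(codons("ATGTTTGGCAA"))      # ['ATG', 'TTT', 'GGC']
print(translate("ATGTTTGGCTAA"))  # MFG*
```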

    Cosmid: Artificially constructed cloning vector containing the cos site of phage lambda, which is used to infect E. coli, permitting cloning of DNA fragments up to 45 kb, larger than those possible using plasmid vectors.

    DNA: Deoxyribonucleic acid, the double-stranded molecule held together by weak bonds between base pairs of 4 different nucleotides. Encodes genetic information.

    EBI: European BioInformatics Institute (EMBL Outstation)

    EGCG: Extensions to the GCG package (See article by Rice et al.)

    EMBL: European Molecular Biology Laboratory

    EMBnet: European Molecular Biology network

    EPFL: Ecole Polytechnique Fédérale de Lausanne, Switzerland

    EST: Expressed Sequence Tag; a sampling of sequence from a cDNA.

    ETH: Eidgenoessische Technische Hochschule Zuerich, Switzerland

    Entrez : An online resource provided by the National Center for Biotechnology Information (NCBI). It organizes GenBank sequences and links them to the literature sources in which they originally appeared. From LifeSeq, you can reach an Entrez Document Report by clicking on a GI number in the Hit ID column of a Results screen.

    Exons: The protein-coding sequences of genes. Exons only comprise about 10% of the human genome. See introns.

    FASTA: A database search tool used to compare a nucleotide or peptide sequence to a sequence database. The program is based on the rapid sequence algorithm described by Lipman and Pearson.

    Functional genomics : Systematic analysis of gene activity in healthy and diseased tissues.

    GCG Assembly: A tool using the GCG Fragment Assembly System created by Genetics Computer Group, Inc. It is used to assemble nucleotide sequence fragments contained in a cluster and view how they overlap with each other.

    GenBank: The public DNA sequence database maintained by the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine.

    Gene : A specific DNA sequence which carries the information required for constructing proteins. The human genome is estimated to contain 100,000 to 150,000 genes.
    Genome : The total genetic information possessed by an individual organism. Each cell contains a complete copy of the genome.

    Genomics : Sequencing and characterization of the genome and analysis of the relationship between gene activity and cell function.

    Genotype : The unique genetic makeup of an individual organism.

    GI : GenBank Identifier, a unique number assigned to protein and nucleotide sequences in the GenBank database.

    HGMP : Human Genome Mapping Project, based in Cambridge/United Kingdom
    HPLC : High-Performance Liquid Chromatography, a method for characterizing proteins.

    HSPs : High-scoring Segment Pairs; two sequence fragments of arbitrary but equal length with an alignment that is locally maximal and for which the alignment score meets or exceeds a threshold (cutoff) score.
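
The HSP definition above can be sketched in a few lines. The example below is a simplified illustration, not BLAST itself: it compares two sequences position by position (a single diagonal with no offset), uses assumed scores of +1 for a match and -1 for a mismatch, and finds the ungapped segment with the highest score. The names `best_ungapped_segment` and `is_hsp` are hypothetical.

```python
def best_ungapped_segment(a: str, b: str, match: int = 1, mismatch: int = -1):
    """Return (score, start, end) of the best-scoring ungapped run of a[i] vs b[i].

    The segment is a[start:end] aligned with b[start:end]; both pieces have
    equal length, as the HSP definition requires.
    """
    best = (0, 0, 0)
    run_score, run_start = 0, 0
    for i in range(min(len(a), len(b))):
        run_score += match if a[i] == b[i] else mismatch
        if run_score <= 0:
            # A non-positive prefix can never help; restart after this position.
            run_score, run_start = 0, i + 1
        elif run_score > best[0]:
            best = (run_score, run_start, i + 1)
    return best


def is_hsp(a: str, b: str, cutoff: int) -> bool:
    """True if the best ungapped segment meets or exceeds the cutoff score."""
    return best_ungapped_segment(a, b)[0] >= cutoff


print(best_ungapped_segment("ACGTACGT", "TTGTACGA"))  # (5, 2, 7)
print(is_hsp("ACGTACGT", "TTGTACGA", cutoff=4))       # True
```

Here the segment `GTACG` (positions 2 to 6) scores 5, and extending it in either direction would lower the score, which is what "locally maximal" means in the entry.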

    HSSP : Database of homology-derived structures of proteins (Sander et al.)

    ICGEB: International Center for Genetic Engineering and Biotechnology, Trieste

    IG : Intelligenetics Inc.; commercial vendor of the Intelligenetics Suite

    INCBI : Irish National Centre for BioInformatics

    Introns: DNA sequences in genes which have no protein-coding function. Other non-coding regions include control sequences and intergenic regions whose functions are unknown.

    Library : A collection of expressed genes from a specific tissue sample, and their annotations.

    Master cluster : A “super cluster” that is formed by joining clusters and singletons that have representative clones with significant matches (a Product Score of 40 or more) to the same gene. The master cluster is named after the cluster (or singleton) with the highest Product Score.
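
The joining rule for master clusters can be sketched as a simple grouping step. This is only an illustration of the rule as worded above, not Incyte's actual implementation; the input format (tuples of cluster ID, matched gene, Product Score), the sample IDs, and the function name `build_master_clusters` are all hypothetical.

```python
from collections import defaultdict

SIGNIFICANT_SCORE = 40  # threshold quoted in the entry above


def build_master_clusters(hits):
    """Group clusters/singletons whose representative clones hit the same gene.

    hits: iterable of (cluster_id, gene, product_score) tuples.
    Returns {gene: {"name": id_with_highest_score, "members": sorted ids}}.
    """
    by_gene = defaultdict(list)
    for cluster_id, gene, score in hits:
        if score >= SIGNIFICANT_SCORE:  # keep only significant matches
            by_gene[gene].append((score, cluster_id))

    masters = {}
    for gene, members in by_gene.items():
        _, top_id = max(members)  # member with the highest Product Score
        masters[gene] = {
            "name": top_id,
            "members": sorted(cid for _, cid in members),
        }
    return masters


hits = [
    ("C1", "geneA", 55),
    ("C2", "geneA", 72),
    ("S9", "geneA", 38),  # below the threshold, so it is not joined
    ("C3", "geneB", 41),
]
print(build_master_clusters(hits)["geneA"])
# {'name': 'C2', 'members': ['C1', 'C2']}
```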

    MPsrch : Sequence searching application on the MasPar computer (Coulson et al.)

    mRNA : Messenger RNA; an expressed gene that is then translated into a protein.

    NCBI : National Center for Biotechnology Information, Washington, D.C., USA

    Normalized library : A cDNA library from which most of the highly expressed sequences have been removed in order to represent a greater proportion of low-abundance messenger RNAs. Normalized libraries are not an accurate reflection of a tissue’s gene-expression profile.

    Northern : A function of the LifeSeq Expression Database, this analysis reveals the presence and quantity of an expressed gene among the LifeSeq libraries. It is based on a laboratory experiment called a Northern Blot.

    Pharmacology : The science of the action of drugs and other chemicals on living biological systems. It seeks to discover the molecular mechanisms which regulate cells in order to understand how drugs act and to develop new drugs for the treatment of diseases.

    Phrap : Developed by Phil Green at the University of Washington, “PHil’s Revised Assembly Program” is a tool for assembling shotgun-sequenced DNA fragments.

    PHYLIP : Program package created by J. Felsenstein for inferring phylogenies

    PIR : Protein Identification Resource International, a protein database vendor

    Proteome : The complete profile of proteins expressed in a given tissue, cell or biological system at a given time.

    Proteomics : Systematic analysis of the protein expression of healthy and diseased tissues.

    PSB : Abbreviation for ‘Pacific Symposium on Biocomputing’

    PubEST : Abbreviation for a sequence from a public-domain source, such as the WashU-Merck EST Project or Banting Institute.

    RASMOL : Program package by R.Sayle to display protein structures

    Singleton: A sequence that has no nucleotide overlaps with other Incyte clones at a given stringency and is therefore not contained in any cluster.

    SRS : Program for biological database browsing created by T. Etzold (EMBL)

    SRSWWW : Version of the Sequence Retrieval System running in the WWW environment

    Stringency :  A parameter that lets you filter the results of a LifeSeq query based on how closely related the sequences in a cluster must be.

    Toxicology : The science of the harmful effects of chemicals (including drugs) on living biological systems. It seeks to determine the mechanisms by which chemicals produce adverse effects in cells and organisms.

    TREMBL : Translated EMBL, an SRS-based compilation of the EMBL DNA data library

    UniGene : A public database, maintained by NCBI, which brings together sets of GenBank sequences that represent the transcription products of distinct genes. LifeSeq uses this information in forming master clusters.

    Unique clone : A LifeSeq sequence that has no match in GenBank or other public databases.

    Unique singleton : A LifeSeq clone that does not cluster in LifeSeq and has no match in the public databases.

    WPI : Wisconsin Package Interface, graphical user interface to the GCG package

    Published on September 12, 2012 · Filed under: Research Reagents