P lab

The composition of an error-prone PCR library, estimated using Pedel.

GLUE, PEDEL and DRIVeR

In collaboration with Dr. Andrew Firth (University of Cambridge), we have investigated the statistics associated with constructing and sampling large, randomized protein-encoding libraries. Using fairly simple statistics, we have written algorithms for estimating the diversity in libraries generated by the most commonly used randomization methods (see below). These are available through a user-friendly web interface.

The web server also has tools (CodonCalculator and AA-Calculator) for designing fully- or partly-randomized oligonucleotides.

The Programmes

Glue – for libraries comprising equiprobable variants (constructed using site saturation mutagenesis, synthetic shuffling, etc). Given the total number of possible variants, Glue can tell you what size library you should aim for to sample them. If you’ve already made the library, Glue will calculate what fraction of all possible variants are in it. We’ve recently extended Glue to allow protein-level diversity to be estimated; the new programme is called Glue Including Translation (Glue-It).
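To give a feel for the kind of calculation involved, here is a minimal sketch based on the standard Poisson approximation for sampling equiprobable variants. It is an illustration only and is not taken from the Glue source code; the function names and the NNK example are our own assumptions.

```python
import math


def expected_distinct(num_variants, library_size):
    """Expected number of distinct variants present in the library.

    Assumes all variants are equally likely, so each one is absent from
    the library with probability of about exp(-library_size / num_variants).
    """
    return num_variants * -math.expm1(-library_size / num_variants)


def library_size_for_completeness(num_variants, fraction):
    """Library size at which, on average, `fraction` of all variants is present."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return -num_variants * math.log(1.0 - fraction)


# Hypothetical example: three NNK-randomized codons give 32**3 DNA variants.
V = 32 ** 3
L = library_size_for_completeness(V, 0.95)
print(f"Variants: {V}, library size for 95% completeness: {L:.0f}")
print(f"Expected distinct variants at that size: {expected_distinct(V, L):.0f}")
```

Under this approximation, a library the same size as the number of possible variants contains only about 63% of them, and roughly a three-fold excess is needed to reach 95%.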

Programme for Estimating Diversity in Error-Prone PCR Libraries (Pedel) – does what it says on the label. Given a mutation rate and a library size, Pedel will calculate the size and composition of each sub-library (comprising sequences with x = 0,1,2,3,... mutations). For the most accurate results (using the “PCR distribution” described by Drummond et al. (2005), J. Mol. Biol. 350:806-816), one needs to note the amount of template DNA in the error-prone PCR, the number of cycles in the reaction, and the final yield (as discussed here).
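The sub-library idea can be sketched as follows. This is a simplified illustration, not the Pedel implementation: it assumes the number of mutations per sequence follows a Poisson distribution (rather than the PCR distribution mentioned above), that every substitution is equally likely, and that the sequence length is supplied as an extra input. All function and parameter names are hypothetical.

```python
import math


def poisson_pmf(x, mean):
    """Probability of exactly x mutations under a Poisson model."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def sublibrary_stats(seq_length, mean_mutations, library_size, max_mutations=5):
    """Size and estimated diversity of each sub-library with x mutations.

    For each x, the sub-library size is library_size * P(x). The number of
    possible variants is C(seq_length, x) * 3**x (choose x positions, three
    alternative bases at each). Distinct variants are then estimated with
    the equiprobable-sampling approximation.
    """
    rows = []
    for x in range(max_mutations + 1):
        size = library_size * poisson_pmf(x, mean_mutations)
        possible = math.comb(seq_length, x) * 3 ** x
        distinct = possible * -math.expm1(-size / possible)
        rows.append({"x": x, "size": size, "possible": possible, "distinct": distinct})
    return rows


# Hypothetical example: 1000 bp gene, 2 mutations per sequence on average,
# library of one million clones.
for row in sublibrary_stats(1000, 2.0, 1e6):
    print(f"x={row['x']}: size={row['size']:.3g}, "
          f"possible={row['possible']:.3g}, distinct={row['distinct']:.3g}")
```

In this model the low-x sub-libraries are sampled exhaustively, while for higher x almost every clone is unique because the number of possible variants far exceeds the sub-library size.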

A major upgrade now enables the user to estimate protein-level diversity in epPCR libraries; the new programme is called Pedel-aa. Input includes the parent sequence, overall mutation rate, library size and a nucleotide substitution matrix. Output includes amino acid completeness and diversity statistics, including the number of unique protein variants in your library.

Diversity Resulting from In Vitro Recombination (Driver) – if two (and only two, for now) parent sequences that differ at known positions are shuffled, Driver will predict how many of the shuffled daughter sequences are represented in your library.