Groups Coordinate Gene Sequencing

When several research teams around the world announced plans in the fall of 1996 for full-length cDNA (gene) sequencing, investigators felt that the highly beneficial infrastructure provided since 1994 by the international Integrated Molecular Analysis of Genome Expression (I.M.A.G.E.) consortium [HGN 6(6), 3] should be extended to the challenges of complete cDNA sequencing. A subsequent workshop for I.M.A.G.E. participants was held in May 1997 in Gaithersburg, Maryland. The meeting was organized and chaired by Greg Lennon [then at Lawrence Livermore National Laboratory (LLNL) and now at Gene Logic Inc.] with Marvin Stodolsky coordinating for the meeting sponsor, the DOE Office of Biological and Environmental Research. Scientists attended from France, Germany, Italy, Japan, Sweden, the United Kingdom, and the United States.

Several workshop participants are members of the subgroup EURO-IMAGE, whose goals include generating and sequencing a master set of unique full-length cDNA clones (based on I.M.A.G.E. consortium resources) representing 3000 transcripts and 6 Mb of finished sequence. Other EURO-IMAGE goals are to obtain high-resolution and comparative functional mapping in human and model organisms of 1000 master-set genes and to develop the I.M.A.G.E. consortium database for easy access to an integrated view of the sequence, map, and expression data generated.

U.S. funding agencies represented at the workshop included DOE, NIH, and the recently established nonprofit Merck Genome Research Institute [HGN 8(3-4), 9]. Selected highlights follow of technical progress in complete cDNA sequencing, as reported at the workshop.

Highlights of Technical Progress

Attendees addressed a wide range of topics, including the status of cDNA sequencing projects, future targets, data- and clone-release policies, quality criteria and assessment, and mouse and other model organism cDNAs. Speakers projected that, with adequate support from funding agencies, participating laboratories could generate up to 15,000 full-length cDNA sequences in the following year. With average cDNA lengths of 2 kb, this represents some 30 Mb of total sequence.

Researchers have long recognized that expression of a single gene may culminate in the production of several different messenger RNA (mRNA) transcripts, depending both on the gene and the source tissue. Added to this biological complexity are the technical challenges of converting fragile mRNAs to the sturdier cDNAs. Standard methods involve use of poly dT as a primer on the 3' poly A end of purified mRNAs, with reverse transcriptase enzymes of viral origin polymerizing the synthesis of a single-stranded DNA complement of the mRNA. These initial DNA transcripts often fail to extend to the 5' end of longer mRNAs. With the use of more routine biochemistries, the single-stranded DNA is converted into duplex DNA and combined with a DNA vector to support its propagation and maintenance as a DNA clone. The double-stranded DNAs produced are much more stable and less susceptible to degradative processes than their single-stranded mRNA predecessors. However, because the initial reverse transcription is often shortened, cDNA libraries with abundant truncated products are the common result, particularly for the longer source mRNAs. Strategies devised for alleviating this truncation problem were described by Takao Isogai (Helix Research Institute, Japan), Nobuo Nomura (Kazusa DNA Research Institute, Japan), John Quackenbush [The Institute for Genomic Research (TIGR)], and M. Bento Soares (University of Iowa).

A protocol that takes advantage of the unusual nucleotide "cap" on the 5' end of mRNAs requires that the first cDNA strand’s extension be long enough to protect the cap as a contingency for final cDNA clone production. Soares reported, however, that about one-third of cDNA transcripts begin within the mRNA, as contrasted with preferred starts at the mRNA's 3' end, thus giving rise to 3' truncations. This problem can be alleviated substantially by size fractionating the mRNAs and later selecting out the cDNA products with lengths equal to the size-sorted mRNA templates. Hans Lehrach (Max Planck Institut für Molekulare Genetik, Germany) related the value of massively parallel oligomer fingerprinting of cDNAs. This is an economical way to screen a library for novel and longer, potentially full-length cDNAs. Optimal candidate cDNAs chosen by the Lehrach team at the Resource Center of the German Genome Project are being sequenced in the laboratory of Annemarie Poustka (Deutsches Krebsforschungszentrum).

More than one sequencing read commonly is necessary to display the complete sequence for cDNAs longer than a few hundred bases. Strategies for economical full-length sequencing were discussed by Lennon and Richard Gibbs (Baylor College of Medicine). Sequence reads beyond 1000 bases now are being obtained with improvements to sequencing systems by Wilhelm Ansorge’s team at the European Molecular Biology Laboratory. Ansorge suggested that, for cDNAs shorter than 2 kb, good coverage could be achieved by two overlapping reads on complementary strands.

Giuseppe Borsani (Telethon Institute of Genetics and Medicine) reported on the benefits of the easily manipulated Drosophila model for studies of development and function to reveal roles represented by human cDNAs.

Mark Boguski (National Center for Biotechnology Information) discussed the status of the dbEST cDNA sequence database and made recommendations for the evolution needed to meet the impending new demands of complete DNA sequencing. He observed that each group will have its own selection criteria and sequencing priorities, such as finding cancer genes, genes with Drosophila homologs, or genes that already have been mapped.

Boguski coined the expression "the slicing problem" to describe the difficulties in avoiding undesirable duplication and redundancy due to overlapping choice categories. A possible solution would be to establish a registration and tracking database modeled after the successful European Bioinformatics Institute's (EBI) RHAlloc-RHdb approach used in constructing the human transcript map. Patricia Rodriguez-Tomé (EBI) has accepted this responsibility. This data will include an investigator or center name and contact information, identifiers for the physical cDNA clones being sequenced and associated EST accession numbers, and sequencing status. When participants registered a clone that they intended to sequence, the database would detect and report overlaps with clones selected by other groups.

Attendees agreed that the I.M.A.G.E. consortium should convene every 6 months to maintain necessary coordination and efficiency. A subsequent meeting, organized by Quackenbush, was held in September 1997 in conjunction with the Ninth International Genome Sequencing and Analysis Conference in Hilton Head, South Carolina.
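
As an illustration only, the registration and tracking database proposed at the workshop might behave along the lines of the Python sketch below. The record fields follow the items listed in the article (investigator or center name, contact information, clone identifier, associated EST accession numbers, and sequencing status). The class names, the overlap rule (same clone identifier or at least one shared EST accession), and every identifier in the example are assumptions made for this sketch. It does not reproduce the EBI RHAlloc-RHdb system or any actual I.M.A.G.E. database.

```python
from dataclasses import dataclass


@dataclass
class Registration:
    """One group's declared intent to sequence a clone (fields follow the article)."""
    investigator: str          # investigator or center name
    contact: str               # contact information
    clone_id: str              # identifier of the physical cDNA clone
    est_accessions: frozenset  # associated EST accession numbers
    status: str = "planned"    # sequencing status


class CloneRegistry:
    """Hypothetical in-memory registry; an assumption, not the EBI implementation."""

    def __init__(self):
        self._records = []

    def register(self, new):
        """Store a registration and return earlier overlapping ones from other groups."""
        overlaps = [
            old for old in self._records
            if old.investigator != new.investigator
            and (old.clone_id == new.clone_id
                 or old.est_accessions & new.est_accessions)
        ]
        self._records.append(new)
        return overlaps


# Made-up identifiers, for illustration only.
registry = CloneRegistry()
first = registry.register(
    Registration("Center A", "a@example.org", "CLONE-0001",
                 frozenset({"EST-A1", "EST-A2"})))
clashes = registry.register(
    Registration("Center B", "b@example.org", "CLONE-0002",
                 frozenset({"EST-A2"})))
for hit in clashes:
    print(f"Overlap with {hit.investigator}: clone {hit.clone_id}")
```

In this sketch the second registration is flagged because it shares an EST accession with a clone already claimed by another group, which is the kind of duplication the proposed database was meant to report.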