Sonic Gem: November 2010

Monday, November 8, 2010

ASHG 2010, Post 3

Made it to the exome sequencing session this morning, and caught the end of the cancer sequencing project. Some interesting points:

For pure tumor, 30x is fine, but for heterogeneous samples/samples with ploidy!=2, much more sequencing is needed.
150x for their exome samples was typical.
Exome sequencing is now their hypothesis generating step
With sequencing of a large number of carcinoma and multiple myeloma samples, they found a number of mutations that would be unlikely to exist by chance, including about 10 that were not in their list of 6000 target cancer genes
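To get a feel for why impure or aneuploid samples need so much more depth, here is a minimal sketch. It assumes a simple binomial read-sampling model and a clonal mutation; the function names, the 30% purity and the "at least 5 variant reads" threshold are my own illustrative choices, not numbers from the talk.

```python
from math import comb


def expected_vaf(purity, tumor_copies=2, mutant_copies=1, normal_copies=2):
    """Expected fraction of reads carrying a clonal somatic variant.

    purity is the tumor cell fraction of the sample; tumor_copies is the
    local copy number in tumor cells (assumed values, for illustration).
    """
    mutant = purity * mutant_copies
    total = purity * tumor_copies + (1.0 - purity) * normal_copies
    return mutant / total


def prob_at_least(k, depth, vaf):
    """P(at least k variant reads) when `depth` reads are sampled binomially."""
    below = sum(comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    for purity in (1.0, 0.3):  # pure tumor vs. a hypothetical 30% pure sample
        vaf = expected_vaf(purity)
        for depth in (30, 150):
            print(purity, depth, round(vaf, 3), round(prob_at_least(5, depth, vaf), 3))
```

With a heterozygous mutation in a pure diploid tumor the expected variant fraction is 0.5, but at 30% purity it drops to 0.15, so far fewer variant reads show up at the same depth.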

In an iPS talk, they used a slightly more complicated exon capture:

Padlock method, designed >300,000 probes, 600,000 synthesized.
complemented with SureSelect
NS/Syn ratio: typically .82:1

Jay Shendure on exome sequencing

His group (with UW) has found 3 (2 novel) Mendelian disease genes
Total: about 10 found so far
Working on exome sequencing in Autism
In Autism

Simplifying assumptions

some % highly penetrant
some % de novo
more focused than looking for de novo CNV

Paradigm: trio-based exome sequencing
Very high SNR! by chance, expecting .59 mutations within coding regions--how to find this in the noise?
So far: 20 trios (60 exomes) -- data from sporadic autism
de novo SNV analysis pipeline: "Haystack"

look at bases called in all 3
ID Mendelian errors w/ discordant proband
Filter against >1000 other exomes (to eliminate false positive)
Annotate (SeattleSeq)
Manual review & Sanger confirmation
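The first three steps above can be sketched roughly as follows. This is not the actual "Haystack" code (which I haven't seen); the data layout, the function name and the site-level control filter are all assumptions for illustration.

```python
def candidate_de_novo(calls, seen_in_other_exomes):
    """Return sites where the proband genotype is a Mendelian error.

    calls: dict mapping (chrom, pos) -> {"proband": gt, "father": gt, "mother": gt},
           where gt is a two-letter genotype such as "AG", or None if not called.
    seen_in_other_exomes: set of (chrom, pos) sites variant in control exomes
           (a simplification; a real filter would compare alleles, not just sites).
    """
    candidates = []
    for site, gts in sorted(calls.items()):
        proband, father, mother = gts.get("proband"), gts.get("father"), gts.get("mother")
        # Step 1: keep only bases called in all three family members.
        if None in (proband, father, mother):
            continue
        # Step 2: a Mendelian error is a proband genotype that cannot be built
        # from one paternal allele plus one maternal allele.
        possible = {"".join(sorted(a + b)) for a in father for b in mother}
        if "".join(sorted(proband)) in possible:
            continue
        # Step 3: drop sites already seen in other exomes (likely false positives).
        if site in seen_in_other_exomes:
            continue
        candidates.append(site)
    return candidates
```

Annotation, manual review and Sanger confirmation would follow on whatever this returns.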

Single trio ex

268 candidate de novo (Mendelian errors)
other exome screening -> 214
manual review -> 18
manual review -> 2 confirmed de novo events
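A quick back-of-the-envelope on these numbers, as a sketch. I'm reading the ".59 mutations" figure as the expected number of de novo coding mutations per trio and assuming a Poisson model; both are my assumptions, not something stated in the talk.

```python
from math import exp

# Counts from the single-trio example in these notes.
funnel = [
    ("candidate Mendelian errors", 268),
    ("after screening against other exomes", 214),
    ("after manual review", 18),
    ("confirmed de novo", 2),
]


def retention(steps):
    """Fraction of the previous stage's calls that survive each filter."""
    return [(name, n / prev) for (_, prev), (name, n) in zip(steps, steps[1:])]


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    term, cdf = exp(-lam), 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return 1.0 - cdf


if __name__ == "__main__":
    for name, frac in retention(funnel):
        print(f"{name}: {frac:.1%} retained")
    # Chance of 2 or more true de novo coding events in one trio (assumed Poisson).
    print(round(poisson_tail(2, 0.59), 3))
```

The point is mostly that almost all of the raw Mendelian errors are artifacts, which is why the filtering and confirmation steps matter so much.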

Not significant so far--need more numbers--but good trend
Identified

GRIN2B

cause MR, epilepsy (Nature (genetics?) Nov 3)

FOXP1

related to FOXP2

SCN1A, LAMC3: following up on these

Parent of origin analysis

molecular haplotyping by long range PCR & sequencing
phase and determine parent of origin of each de novo point mutation
So far: 7 from father, 2 from mother
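Is 7 vs. 2 a real paternal bias? A minimal sketch of an exact two-sided binomial test against a 50:50 null, using only the counts above. The choice of test and the function name are mine, not the speaker's.

```python
from math import comb


def binomial_two_sided(k, n, p=0.5):
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count k out of n under success probability p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to floating-point noise.
    return sum(x for x in pmf if x <= observed * (1 + 1e-9))


if __name__ == "__main__":
    # 7 paternal out of 9 phased de novo mutations.
    print(binomial_two_sided(7, 9))
```

With only 9 events the test has very little power, so this is suggestive at best.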

Model potentially relevant for identifying large-effect non-coding mutations (when whole genome is cost-effective)
NHLBI Exome Sequencing Project (ESP)

Goal: 7000 exomes over 3 years

Private coding variants in 1000 exomes

several hundred per exome (combined European American and African)

Number of genes consistent with a dominant model

dominant: roughly 100 +- 28 (1000 exomes filter), ~400 (1% allele freq)
recessive: 2 +- 2 (1000 exomes filter), ~35 (1% allele freq)
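My understanding of how counts like these come about, as a sketch: after removing variants seen in the control exomes, a gene is a dominant candidate if it has at least one remaining variant allele and a recessive candidate if it has at least two. The input format and names below are hypothetical.

```python
from collections import defaultdict


def candidate_genes(variants, control_variants):
    """Split genes into dominant and recessive candidates for one exome.

    variants: iterable of (gene, variant_id, zygosity), zygosity "het" or "hom".
    control_variants: set of variant_ids seen in the control exomes (the filter).
    """
    alleles_per_gene = defaultdict(int)
    for gene, variant_id, zygosity in variants:
        if variant_id in control_variants:
            continue  # not private, so filtered out
        alleles_per_gene[gene] += 2 if zygosity == "hom" else 1
    dominant = {g for g, n in alleles_per_gene.items() if n >= 1}
    # Recessive: homozygous, or two hets (phase is ignored in this sketch).
    recessive = {g for g, n in alleles_per_gene.items() if n >= 2}
    return dominant, recessive
```

Requiring two hits per gene is what makes the recessive list so much shorter than the dominant one.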

Current/Future

50 nanogram exomes: fragmented with transposase (Andrew Adey)
Using Nimblegen EZ Exome (2nd generation works well)
Working toward multiplexing exomes on hiseq
Currently targeting only 8x/exome? (might have misread/misheard this)

Goncalo Abecasis on draft sequencing of 1000 Genomes in Sardinia

Day 4 (Friday, Nov 5)

Was under the weather, so missed the first morning session. Need to check out the cancer genomic session abstracts and info. I wandered around the vendors until lunch, and had some nice discussions with various people. I especially focused on analysis and annotation of coding variation from NGS data.

In the first afternoon session, I chose to sit in on the pop gen session. I found one talk on synonymous SNPs interesting. The talk argued that, as suggested in recent literature, synonymous SNPs are not always "silent", and that the differences can be explained by selection on translation efficiency. The question I have is how often this really has an effect on disease phenotype. It has been shown to be relevant for at least a few diseases, but...

Thought: dN/dS ratio is commonly used to measure evolutionary conservation of a region. Can this be used in a similar way to the ts/tv ratio for evaluating SNP discovery?
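For reference, the ts/tv check itself is simple to compute. A minimal sketch, assuming SNP calls are available as (ref, alt) base pairs; a dN/dS version would additionally need codon annotation, which is not attempted here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for purine<->purine or pyrimidine<->pyrimidine substitutions."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return False
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Transition/transversion ratio over an iterable of (ref, alt) pairs."""
    ts = tv = 0
    for ref, alt in snps:
        if ref.upper() == alt.upper():
            continue  # not a substitution
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else float("inf")
```

A call set whose ratio falls well below what is expected for the targeted region is usually a sign of excess false positives, since random errors tend toward transversions.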

Thursday, November 4, 2010

Genomic Software/Algorithms

Notes for myself on analysis software/algorithms to explore (from ASHG talks and poster sessions):

Aligner

Karma
Mosaik

SNP/INDEL Calling

GigaBayes SNP/short indel caller (Marth Lab, Boston College). Used in one of the 1000 Genome pipelines (NCBI?)
ATLAS-Indel2

Structural Variation

BreakSeq
Genome STRiP

Variant Annotation

Marvel (Otto Valladares, UPenn)
VARp Drive (???)
MutaREPORTER

Sequence Viewer

Gambit Viewer

Sequence Assembly

SOAPdenovo
CORTEX assembler

Imputation (see http://bioinformatics.oxfordjournals.org/content/25/11/1449.full)

Plink (http://pngu.mgh.harvard.edu/~purcell/plink/)
Merlin (http://www.sph.umich.edu/csg/abecasis/Merlin)
IMPUTE (http://www.stats.ox.ac.uk/~marchini/software/gwas/impute.html)
MACH (http://www.sph.umich.edu/csg/abecasis/MaCH/)
BEAGLE (http://faculty.washington.edu/browning/beagle/beagle.html)
fastPhase (http://depts.washington.edu/ventures/UW_Technology/Express_Licenses/fastPHASE.php)

Pipeline/Analysis

NCBI genome workbench
CIDRSeq Suite (not publicly available)

For reference, we use the following currently

bfast
bwa
novoalign
samtools
picard
SVA (minimally)
GATK
SeattleSeq

ASHG 2010, Post 1

I've been at ASHG for a couple of days, and this is my first chance to take some notes.

Day 1 (Tuesday, Nov. 2)

I didn't quite make it in time to hear the Distinguished Speakers, but fortunately, many of the talks are supposed to be online after Nov. 25 (at http://www.ashg.org/2010meeting). I'm especially looking forward to hearing Eric Lander's speech.

Fortunately, I did make it in time for the mixer :-) and was able to meet up with some old friends (hi Sibel!).

Day 2 (Wednesday, Nov. 3)

I have three areas of interest that I want to explore, and unfortunately, they sometimes conflict in my schedule. The first is cancer sequencing--this is most related to my research. Haven't seen much of that yet, though there is a session on Friday. The second is sequencing for rare variant discovery, and the third is population genetics/genomics.

So, I was a little late for Carlos Bustamante's pop gen talk, but I still enjoyed what I saw. It was really a more detailed version of his earlier work on measuring genetic variation across geography, but it's still quite cool. Some of his students/postdocs gave talks in later sessions.

Wasn't able to see much else in the early morning, as I was still getting my bearings and spent some time catching up with people I ran into. This is a good thing.

For the second morning session, I went to a section on Lessons from high throughput sequencing. There were a number of cool 1000 Genomes talks, a talk by a student of Carlos Bustamante on genomic variation in the Americas, a talk on whole genome sequencing of a Japanese individual, and a couple of others.

The afternoon talks were quite good. For the presidential address, Roderick McInnes' talk on cultural sensitivity was actually better than my expectations (I really didn't know what to expect when I saw the title). One of the most interesting was the talk on Global patterns of RNA editing in humans, where the authors suggest novel evidence of post-transcriptional RNA editing. At a cursory glance, this will need to be validated (how much of what they found were really errors/artifacts?), but if true, could have broad implications for sequencing. Another talk discussed the use of imputation in implicating a variant of FBN1 in ascending aortic aneurysms, which was rather impressive when I first heard it, but has lost a little impact (on me) with the amount I've heard about imputation in this conference since then.

I wandered around the poster sessions after that, and found some interesting NGS-related software to explore (see the next post), and met Chunlin Xiao, who works on the NCBI sequencing pipeline. Need to contact him again soon.

In the evening, I attended the 1000 Genome Tutorial. I was feeling a bit out of it, but it was actually quite a good overview of the project. Interesting note: the structural variation subgroup claims to detect deletions as small as 50 bases. This seems to suggest that the hard-to-detect deletions range from around 10 to 50 bases.

Day 3 (Thursday, Nov. 4)

This morning, I struggled between going to a session on rare variant discovery and a separate session on population genomics. Personally, I'm more interested in pop gen, but the rare variant discovery is more related to my work... or so I thought. Unfortunately for me, all of the talks discussed the use of pedigrees, and at the moment, none of my work is pedigree related. :-( I would have gotten more out of the pop gen talks. I did see most of the last one (on using HLA to map human variation), which seemed quite interesting.

For the second morning session, I attended the Statistical Analysis of Human Sequence Variation. This was more up my alley. I didn't get all of it, and not all of it was good, but it was mostly interesting. The talks were mostly on imputation, with a few other talks on SNP calling methods.

I attended the RainDance lunch presentation on targeted sequencing. They provided quite a nice lunch, and the presentations were somewhat interesting. RainDance might be an alternative to our current solution-based capture methods--it seems to be very accurate and a lot faster--but it's unclear if it would be worth the cost.

I'm taking the afternoon off, and this is where I am now... so I'll stop here.
