ochoa lab - relatedness and differentiation in arbitrary ......2018/01/22  · jew_ashkenazi cypriot...

61
Relatedness and differentiation in arbitrary population structures Alejandro Ochoa, John D. Storey Lab, Princeton University DrAlexOchoa viiia.org/research/ [email protected] 1 / 35

Upload: others

Post on 31-Mar-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●● ●

●● ●

0.2

0.3

0.4

Inbr

eedi

ng c

oeff.

Relatedness and differentiationin arbitrary population structures

Alejandro Ochoa, John D. Storey Lab, Princeton University7 DrAlexOchoa � viiia.org/research/ R [email protected]

1 / 35

Page 2: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

My research areas / Contributions1. Stratified False Discovery Rate (FDR)

I Finding: per-stratum local FDR maximizes power controlling overall FDRI Improved power in protein domain predictionI Identified protein classes with problematic statistics

2. Genetic and other factors controlling the placebo responseI Collaboration with Otsuka PharmaceuticalI Mixed-effects modeling of longitudinal response in drug trials for

Schizophrenia, Bipolar Disorder, and Major Depressive DisorderI Genome-wide association study for individual-specific placebo response

3. Kinship and FST for arbitrary population structuresI Motivation: world-wide human population structureI Generalized definitions and modelsI Novel bias calculations validated by simulationsI Novel non-parametric estimator with greatly improved accuracy

2 / 35

Page 3: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

My research areas / Contributions1. Stratified False Discovery Rate (FDR)

I Finding: per-stratum local FDR maximizes power controlling overall FDRI Improved power in protein domain predictionI Identified protein classes with problematic statistics

2. Genetic and other factors controlling the placebo responseI Collaboration with Otsuka PharmaceuticalI Mixed-effects modeling of longitudinal response in drug trials for

Schizophrenia, Bipolar Disorder, and Major Depressive DisorderI Genome-wide association study for individual-specific placebo response

3. Kinship and FST for arbitrary population structuresI Motivation: world-wide human population structureI Generalized definitions and modelsI Novel bias calculations validated by simulationsI Novel non-parametric estimator with greatly improved accuracy

2 / 35

Page 4: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

My research areas / Contributions1. Stratified False Discovery Rate (FDR)

I Finding: per-stratum local FDR maximizes power controlling overall FDRI Improved power in protein domain predictionI Identified protein classes with problematic statistics

2. Genetic and other factors controlling the placebo responseI Collaboration with Otsuka PharmaceuticalI Mixed-effects modeling of longitudinal response in drug trials for

Schizophrenia, Bipolar Disorder, and Major Depressive DisorderI Genome-wide association study for individual-specific placebo response

3. Kinship and FST for arbitrary population structuresI Motivation: world-wide human population structureI Generalized definitions and modelsI Novel bias calculations validated by simulationsI Novel non-parametric estimator with greatly improved accuracy

2 / 35

Page 5: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Why study relatedness?

Human geneticsis fascinating!

Pop. structureconfoundsassociationstudies (GWAS)

Heritability ofcomplex traits

Animal and plantbreeding

3 / 35

Page 6: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Why study relatedness?

Human geneticsis fascinating!

Pop. structureconfoundsassociationstudies (GWAS)

Heritability ofcomplex traits

Animal and plantbreeding

3 / 35

Page 7: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Why study relatedness?

Human geneticsis fascinating!

Pop. structureconfoundsassociationstudies (GWAS)

Heritability ofcomplex traits

Animal and plantbreeding

3 / 35

Page 8: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Why study relatedness?

Human geneticsis fascinating!

Pop. structureconfoundsassociationstudies (GWAS)

Heritability ofcomplex traits

Animal and plantbreeding

3 / 35

Page 9: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Single Nucleotide Polymorphism (SNP) data

Genotype xijCC 0CT 1TT 2

Individuals

Loci

0 2 2 1 1 0 10 2 1 0 12 ...

X

4 / 35

Page 10: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Single Nucleotide Polymorphism (SNP) data

Genotype xijCC 0CT 1TT 2

Individuals

Loci

0 2 2 1 1 0 10 2 1 0 12 ...

X

4 / 35

Page 11: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Single Nucleotide Polymorphism (SNP) data

Genotype xijCC 0CT 1TT 2

Individuals

Loci

0 2 2 1 1 0 10 2 1 0 12 ...

X4 / 35

Page 12: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Hardy-Weinberg Equillibrium (HWE): Binomial draws

xij = genotype at locus i for individual j .

pTi = frequency of reference allele at locus i , (ancestral) population T .

Under HWE:

Pr(xij = 2|pTi ) =(pTi)2,

Pr(xij = 1|pTi ) = 2pTi(1− pTi

),

Pr(xij = 0|pTi ) =(1− pTi

)2.

HWE not valid under population structure!

5 / 35

Page 13: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Hardy-Weinberg Equillibrium (HWE): Binomial draws

xij = genotype at locus i for individual j .

pTi = frequency of reference allele at locus i , (ancestral) population T .

Under HWE:

Pr(xij = 2|pTi ) =(pTi)2,

Pr(xij = 1|pTi ) = 2pTi(1− pTi

),

Pr(xij = 0|pTi ) =(1− pTi

)2.

HWE not valid under population structure!

5 / 35

Page 14: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Hardy-Weinberg Equillibrium (HWE): Binomial draws

xij = genotype at locus i for individual j .

pTi = frequency of reference allele at locus i , (ancestral) population T .

Under HWE:

Pr(xij = 2|pTi ) =(pTi)2,

Pr(xij = 1|pTi ) = 2pTi(1− pTi

),

Pr(xij = 0|pTi ) =(1− pTi

)2.

HWE not valid under population structure!

5 / 35

Page 15: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Goal: measure dependence structure of genotype matrix columns

IndividualsLo

ci0 2 2 1 1 0 10 2 1 0 12 ...

X

High-dimensional binomial data

Population structure⇒ dependence between individuals (columns)

Linkage disequilibrium⇒ dependence between loci (rows)

6 / 35

Page 16: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Goal: measure dependence structure of genotype matrix columns

IndividualsLo

ci0 2 2 1 1 0 10 2 1 0 12 ...

X

High-dimensional binomial data

Population structure⇒ dependence between individuals (columns)

Linkage disequilibrium⇒ dependence between loci (rows)

6 / 35

Page 17: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Goal: measure dependence structure of genotype matrix columns

IndividualsLo

ci0 2 2 1 1 0 10 2 1 0 12 ...

X

High-dimensional binomial data

Population structure⇒ dependence between individuals (columns)

Linkage disequilibrium⇒ dependence between loci (rows)

6 / 35

Page 18: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Model parametersIBD(T ): “Identical By Descent” for ancestral population T — shared coin flips

f Tj : Inbreeding coefficientPr. that the two alleles at a random locus of individual j are IBD(T )

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj )

ϕTjk : Kinship coefficient

Pr. that two alleles, one at random from each of individuals j and k , at onerandom locus are IBD(T )

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk

FST: Fixation indexPr. that two random alleles in a subpopulation at a random locus are IBD(T )

7 / 35

Page 19: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Model parametersIBD(T ): “Identical By Descent” for ancestral population T — shared coin flips

f Tj : Inbreeding coefficientPr. that the two alleles at a random locus of individual j are IBD(T )

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj )

ϕTjk : Kinship coefficient

Pr. that two alleles, one at random from each of individuals j and k , at onerandom locus are IBD(T )

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk

FST: Fixation indexPr. that two random alleles in a subpopulation at a random locus are IBD(T )

7 / 35

Page 20: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Model parametersIBD(T ): “Identical By Descent” for ancestral population T — shared coin flips

f Tj : Inbreeding coefficientPr. that the two alleles at a random locus of individual j are IBD(T )

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj )

ϕTjk : Kinship coefficient

Pr. that two alleles, one at random from each of individuals j and k , at onerandom locus are IBD(T )

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk

FST: Fixation indexPr. that two random alleles in a subpopulation at a random locus are IBD(T )

7 / 35

Page 21: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Model parametersIBD(T ): “Identical By Descent” for ancestral population T — shared coin flips

f Tj : Inbreeding coefficientPr. that the two alleles at a random locus of individual j are IBD(T )

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj )

ϕTjk : Kinship coefficient

Pr. that two alleles, one at random from each of individuals j and k , at onerandom locus are IBD(T )

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk

FST: Fixation indexPr. that two random alleles in a subpopulation at a random locus are IBD(T )

7 / 35

Page 22: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Existing approaches

1. FST estimationI For independent subpopulations only!I Weir-Cockerham (WC) estimator (1984) — 15K citations!I “Hudson” pairwise estimator (2013) tweaks WCI BayeScan (2008) — 1.2K citations

2. Kinship estimationI “Standard” kinship estimator (1950s)

I Used by most modern GWAS approaches that control for population structure(PCA, LMM, adj. χ2; top paper 6K citations)

I GCTA heritability estimation (2 papers: 4K citations)I Our novel finding: accuracy requires unstructured population

(a minority of closely-related individuals)

8 / 35

Page 23: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Existing approaches

1. FST estimationI For independent subpopulations only!I Weir-Cockerham (WC) estimator (1984) — 15K citations!I “Hudson” pairwise estimator (2013) tweaks WCI BayeScan (2008) — 1.2K citations

2. Kinship estimationI “Standard” kinship estimator (1950s)

I Used by most modern GWAS approaches that control for population structure(PCA, LMM, adj. χ2; top paper 6K citations)

I GCTA heritability estimation (2 papers: 4K citations)I Our novel finding: accuracy requires unstructured population

(a minority of closely-related individuals)

8 / 35

Page 24: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Dataset: Human Origins (Lazaridis et al. 2014, 2016)

●●

●●●

●●●

●●●

●●

●● ●

● ●

● ●●

●●

●●

●●● ●

●●●

●●●

●●

●●

●●●●

● ●●

●●●

●●●

●●

●●

●●

●●

●● ●

●●●●

●●●

●●

●●●

●●

●●

● ●

●●

● ●

●●

●●

●●

●●

● ●●

● ●

● ●

●●

●●

●●

●●

●● ●●

●●

●●

●●

Ju_hoan_North

Khomani

Mbuti

Biaka

BantuSA

BantuKenya

Luhya

YorubaEsan

MandenkaGambian

Mende

Luo

Hadza

AA

KikuyuMasai

Datog

Somali

Jew_Ethiopian

Saharawi

MoroccanMozabite

Algerian Tunisian

Libyan Egyptian

Jew_Libyan

Jew_TunisianJew_Moroccan

Jew_Yemenite

Yemeni

Saudi

BedouinBBedouinA

PalestinianJordanian

SyrianLebanese_Muslim

Lebanese_Christian

Jew_Turkish

Assyrian

Druze

Lebanese

Jew_iraqi

Iranian_Bandari

Iranian

Jew_Iranian

TurkishItalian_SouthSicilian

Maltese

SpanishSW

Canary_Islander

SpanishNE

FrenchS

Italian_North

Sardinian

Basque

FrenchN

Greek

BulgarianCroatian

English

Scottish

Orcadian

Icelandic

Hungarian

Norwegian

Czech

Belarusian

Estonian

Lithuanian

Ukrainian

RussianFinnish

MordovianJew_Ashkenazi

Cypriot

Romanian

Albanian North_Ossetian

Jew_GeorgianLezgin

KumykBalkar

Tajik

Nogai

Chuvash

Turkmen

ChechenAbkhasian

Georgian

Adygei

Armenian

BalochiBrahui

Makrani

Kalash

Pathan

Sindhi

Burusho

Punjabi

Hazara

Jew_Cochin

Gujarati Bengali

Uzbek

Mansi

Kusunda

Uygur

Altaian

Dolgan

Even

Kalmyk

Kyrgyz

Selkup

Tubalar Tuvinian

ThaiCambodian

Ami

Atayal

Dai

Han

Japanese

Kinh

Korean

Lahu

Miao

Mongola

Naxi She

Tujia

Xibo

Yi

Daur Hezhen

Tu

OroqenUlchi

Yakut

Nganasan

Yukagir

ItelmenKoryak

Chukchi

Eskimo

AleutAleut_Tlingit

Pima

Mixe

Mixtec

Zapotec

Mayan

Piapoco

Surui

Karitiana

Bolivian

Quechua

Australian

Nasoi

Papuan

Subpopulations

SAfricaNAfricaMiddleEastEuropeCaucasusSAsiaNAsiaEAsiaAmrOc

2,066 indivs. from 163 locs. — 595,911 loci — SNP chip

9 / 35

Page 25: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Ju_h

oan_

Nor

thK

hom

ani

Mbu

tiB

iaka

Ban

tuS

AB

antu

Ken

yaLu

hya

Yoru

baE

san

Man

denk

aG

ambi

anM

ende

Luo

Had

za AA

Kik

uyu

Mas

aiD

atog

Som

ali

Jew

_Eth

iopi

anS

ahar

awi

Mor

occa

nM

ozab

iteA

lger

ian

Tuni

sian

Liby

anE

gypt

ian

Jew

_Lib

yan

Jew

_Tun

isia

nJe

w_M

oroc

can

Jew

_Yem

enite

Yem

eni

Sau

diB

edou

inB

Bed

ouin

AP

ales

tinia

nJo

rdan

ian

Syr

ian

Leba

nese

_Mus

limLe

bane

se_C

hris

tian

Jew

_Tur

kish

Ass

yria

nD

ruze

Leba

nese

Jew

_ira

qiIr

ania

n_B

anda

riIr

ania

nJe

w_I

rani

anTu

rkis

hM

alte

seC

anar

y_Is

land

erIta

lian_

Sou

thS

icili

anF

renc

hNS

pani

shS

WS

pani

shN

EF

renc

hSIta

lian_

Nor

thS

ardi

nian

Bas

que

Sco

ttish

Orc

adia

nE

nglis

hIc

elan

dic

Nor

weg

ian

Cze

chC

roat

ian

Hun

garia

nA

lban

ian

Gre

ekB

ulga

rian

Lith

uani

anB

elar

usia

nU

krai

nian

Est

onia

nR

ussi

anF

inni

shM

ordo

vian

Rom

ania

nJe

w_A

shke

nazi

Cyp

riot

Arm

enia

nJe

w_G

eorg

ian

Geo

rgia

nA

bkha

sian

Ady

gei

Nor

th_O

sset

ian

Che

chen

Lezg

inK

umyk

Bal

kar

Nog

aiTa

jikK

alas

hP

atha

nB

aloc

hiB

rahu

iM

akra

niS

indh

iB

urus

hoG

ujar

ati

Pun

jabi

Ben

gali

Jew

_Coc

hin

Turk

men

Uzb

ekH

azar

aU

ygur

Chu

vash

Man

siS

elku

pTu

bala

rK

yrgy

zA

ltaia

nTu

vini

anN

gana

san

Dol

gan

Yaku

tX

ibo

Hez

hen

Oro

qen

Ulc

hiE

ven

Yuka

gir

Itelm

enK

orya

kC

hukc

hiE

skim

oA

leut

Ale

ut_T

lingi

tK

usun

daC

ambo

dian

Tha

iA

mi

Ata

yal

Kin

hD

aiLa

huN

axi

Yi

Tujia

Han

Mia

oS

he TuM

ongo

laK

alm

ykD

aur

Japa

nese

Kor

ean

Pim

aM

ixe

Mix

tec

Zap

otec

May

anP

iapo

coS

urui

Kar

itian

aB

oliv

ian

Que

chua

Aus

tral

ian

Nas

oiP

apua

n

Ju_hoan_NorthKhomani

MbutiBiaka

BantuSABantuKenya

LuhyaYoruba

EsanMandenka

GambianMende

LuoHadza

AAKikuyuMasaiDatog

SomaliJew_Ethiopian

SaharawiMoroccanMozabiteAlgerianTunisian

LibyanEgyptian

Jew_LibyanJew_Tunisian

Jew_MoroccanJew_Yemenite

YemeniSaudi

BedouinBBedouinA

PalestinianJordanian

SyrianLebanese_Muslim

Lebanese_ChristianJew_Turkish

AssyrianDruze

LebaneseJew_iraqi

Iranian_BandariIranian

Jew_IranianTurkishMaltese

Canary_IslanderItalian_South

SicilianFrenchN

SpanishSWSpanishNE

FrenchSItalian_North

SardinianBasque

ScottishOrcadian

EnglishIcelandic

NorwegianCzech

CroatianHungarian

AlbanianGreek

BulgarianLithuanianBelarusian

UkrainianEstonianRussianFinnish

MordovianRomanian

Jew_AshkenaziCypriot

ArmenianJew_Georgian

GeorgianAbkhasian

AdygeiNorth_Ossetian

ChechenLezginKumykBalkarNogai

TajikKalashPathan

BalochiBrahui

MakraniSindhi

BurushoGujaratiPunjabiBengali

Jew_CochinTurkmen

UzbekHazaraUygur

ChuvashMansi

SelkupTubalarKyrgyzAltaian

TuvinianNganasan

DolganYakutXibo

HezhenOroqen

UlchiEven

YukagirItelmenKoryak

ChukchiEskimo

AleutAleut_Tlingit

KusundaCambodian

ThaiAmi

AtayalKinh

DaiLahuNaxi

YiTujiaHan

MiaoShe

TuMongola

KalmykDaur

JapaneseKorean

PimaMixe

MixtecZapotec

MayanPiapoco

SuruiKaritianaBolivian

QuechuaAustralian

NasoiPapuan

SAfrica NAfrica MiddleEast Europe Caucasus SAsia NAsia EAsia Amr Oc

SA

fric

aN

Afr

ica

Mid

dleE

ast

Eur

ope

Cau

casu

sS

Asi

aN

Asi

aE

Asi

aA

mr

Oc

00.

10.

20.

3

Kin

ship

Indi

vidu

als

Our new kinshipestimatesGenotypes from “Human Origins”(Lazaridis et al. 2014, 2016)

Edited from Ephert [CC BY-SA 3.0], viaWikimedia Commons

*Inbreeding coeffs. on diagonal

10 / 35

Page 26: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Ju_h

oan_

Nor

thK

hom

ani

Mbu

tiB

iaka

Ban

tuS

AB

antu

Ken

yaLu

hya

Yoru

baE

san

Man

denk

aG

ambi

anM

ende

Luo

Had

za AA

Kik

uyu

Mas

aiD

atog

Som

ali

Jew

_Eth

iopi

anS

ahar

awi

Mor

occa

nM

ozab

iteA

lger

ian

Tuni

sian

Liby

anE

gypt

ian

Jew

_Lib

yan

Jew

_Tun

isia

nJe

w_M

oroc

can

Jew

_Yem

enite

Yem

eni

Sau

diB

edou

inB

Bed

ouin

AP

ales

tinia

nJo

rdan

ian

Syr

ian

Leba

nese

_Mus

limLe

bane

se_C

hris

tian

Jew

_Tur

kish

Ass

yria

nD

ruze

Leba

nese

Jew

_ira

qiIr

ania

n_B

anda

riIr

ania

nJe

w_I

rani

anTu

rkis

hM

alte

seC

anar

y_Is

land

erIta

lian_

Sou

thS

icili

anF

renc

hNS

pani

shS

WS

pani

shN

EF

renc

hSIta

lian_

Nor

thS

ardi

nian

Bas

que

Sco

ttish

Orc

adia

nE

nglis

hIc

elan

dic

Nor

weg

ian

Cze

chC

roat

ian

Hun

garia

nA

lban

ian

Gre

ekB

ulga

rian

Lith

uani

anB

elar

usia

nU

krai

nian

Est

onia

nR

ussi

anF

inni

shM

ordo

vian

Rom

ania

nJe

w_A

shke

nazi

Cyp

riot

Arm

enia

nJe

w_G

eorg

ian

Geo

rgia

nA

bkha

sian

Ady

gei

Nor

th_O

sset

ian

Che

chen

Lezg

inK

umyk

Bal

kar

Nog

aiTa

jikK

alas

hP

atha

nB

aloc

hiB

rahu

iM

akra

niS

indh

iB

urus

hoG

ujar

ati

Pun

jabi

Ben

gali

Jew

_Coc

hin

Turk

men

Uzb

ekH

azar

aU

ygur

Chu

vash

Man

siS

elku

pTu

bala

rK

yrgy

zA

ltaia

nTu

vini

anN

gana

san

Dol

gan

Yaku

tX

ibo

Hez

hen

Oro

qen

Ulc

hiE

ven

Yuka

gir

Itelm

enK

orya

kC

hukc

hiE

skim

oA

leut

Ale

ut_T

lingi

tK

usun

daC

ambo

dian

Tha

iA

mi

Ata

yal

Kin

hD

aiLa

huN

axi

Yi

Tujia

Han

Mia

oS

he TuM

ongo

laK

alm

ykD

aur

Japa

nese

Kor

ean

Pim

aM

ixe

Mix

tec

Zap

otec

May

anP

iapo

coS

urui

Kar

itian

aB

oliv

ian

Que

chua

Aus

tral

ian

Nas

oiP

apua

n

Ju_hoan_NorthKhomani

MbutiBiaka

BantuSABantuKenya

LuhyaYoruba

EsanMandenka

GambianMende

LuoHadza

AAKikuyuMasaiDatog

SomaliJew_Ethiopian

SaharawiMoroccanMozabiteAlgerianTunisian

LibyanEgyptian

Jew_LibyanJew_Tunisian

Jew_MoroccanJew_Yemenite

YemeniSaudi

BedouinBBedouinA

PalestinianJordanian

SyrianLebanese_Muslim

Lebanese_ChristianJew_Turkish

AssyrianDruze

LebaneseJew_iraqi

Iranian_BandariIranian

Jew_IranianTurkishMaltese

Canary_IslanderItalian_South

SicilianFrenchN

SpanishSWSpanishNE

FrenchSItalian_North

SardinianBasque

ScottishOrcadian

EnglishIcelandic

NorwegianCzech

CroatianHungarian

AlbanianGreek

BulgarianLithuanianBelarusian

UkrainianEstonianRussianFinnish

MordovianRomanian

Jew_AshkenaziCypriot

ArmenianJew_Georgian

GeorgianAbkhasian

AdygeiNorth_Ossetian

ChechenLezginKumykBalkarNogai

TajikKalashPathan

BalochiBrahui

MakraniSindhi

BurushoGujaratiPunjabiBengali

Jew_CochinTurkmen

UzbekHazaraUygur

ChuvashMansi

SelkupTubalarKyrgyzAltaian

TuvinianNganasan

DolganYakutXibo

HezhenOroqen

UlchiEven

YukagirItelmenKoryak

ChukchiEskimo

AleutAleut_Tlingit

KusundaCambodian

ThaiAmi

AtayalKinh

DaiLahuNaxi

YiTujiaHan

MiaoShe

TuMongola

KalmykDaur

JapaneseKorean

PimaMixe

MixtecZapotec

MayanPiapoco

SuruiKaritianaBolivian

QuechuaAustralian

NasoiPapuan

SAfrica NAfrica MiddleEast Europe Caucasus SAsia NAsia EAsia Amr Oc

SA

fric

aN

Afr

ica

Mid

dleE

ast

Eur

ope

Cau

casu

sS

Asi

aN

Asi

aE

Asi

aA

mr

Oc

−0.0

50

0.05

0.1

0.15

Kin

ship

Indi

vidu

als

Standard kinshipestimatesGenotypes from “Human Origins”(Lazaridis et al. 2014, 2016)

Edited from Ephert [CC BY-SA 3.0], viaWikimedia Commons

11 / 35

Page 27: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Only our new estimator is accurate in simulations

True KinshipA New estimateB Standard est.C

00.

10.

2

Kin

ship

Indi

vidu

als

12 / 35

Page 28: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Population-level inbreeding increases with distance from Africa

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●● ●

●● ●

0.2

0.3

0.4

Inbr

eedi

ng c

oeff.

13 / 35

Page 29: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Differentiation (FST) previously underestimated

0.0 0.1 0.2 0.3 0.4 0.5

020

40

Inbreeding Coefficient

Den

sity

Subpopulations

SAfricaNAfricaMiddleEastEuropeCaucasusSAsiaNAsiaEAsiaAmrOc

FST estimates

WCBayeScanHudsonKNew

14 / 35

Page 30: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Only our new method estimates generalized FST accurately

0.00

0.05

0.10

Ind. Subpops.A

Wei

r−C

ocke

rham

Hud

sonK

Bay

eSca

n

Sta

ndar

dK

insh

ip

New

− − −−

− AdmixtureB

Wei

r−C

ocke

rham

Hud

sonK

Bay

eSca

n

Sta

ndar

dK

insh

ip

New

− −−

True FST

FSTindep limit

Estimates−

FS

T e

stim

ate

FST estimator

15 / 35

Page 31: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Recently-admixed populations

African-Americans

Baharian et al. (2016)

Hispanics

Moreno-Estrada et al. (2013)16 / 35

Page 32: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Admixed siblings from different populations?

Lucy and Maria, UK

Ochoa brothers, MX

High Admixture LD:

Moreno-Estrada et al. (2013)

Solution: treat every individual as its own population!

17 / 35

Page 33: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Admixed siblings from different populations?

Lucy and Maria, UK

Ochoa brothers, MX

High Admixture LD:

Moreno-Estrada et al. (2013)

Solution: treat every individual as its own population!

17 / 35

Page 34: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Admixed siblings from different populations?

Lucy and Maria, UK

Ochoa brothers, MX

High Admixture LD:

Moreno-Estrada et al. (2013)

Solution: treat every individual as its own population!

17 / 35

Page 35: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Admixed siblings from different populations?

Lucy and Maria, UK

Ochoa brothers, MX

High Admixture LD:

Moreno-Estrada et al. (2013)

Solution: treat every individual as its own population!17 / 35

Page 36: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Dataset: 1000 Genomes Project (2013)

● ●

●●

●●

●●

ASW

ACB

ESN

GWD

LWK

MSLYRI

GBR

FIN

IBS

TSI

CEU

BEB

GIH

ITU

PJL

STU

CDX

CHB

JPT

KHV

CHS

CLM

MXL

PEL

PUR

Subpopulations

SAfricaEuropeSAsiaEAsiaAmr

2,504 indivs. from 26 locs. — 20,417,698 loci (asc. in YRI) — WGS trios, etc.

18 / 35

Page 37: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

00.

10.

20.

30.

4

Kin

ship

Indi

vidu

als

0.0

0.5

1.0

Individuals

Anc

estr

y fr

ac.

x

PU

RC

LMP

EL

MX

L

Pop

ulat

ion

x

AF

RA

MR

EU

R

Anc

estr

y

Kinship driven byadmixture in Hispanics

Our new kinship estimates

Genotypes from the 1000 Genomes Project (2013)

19 / 35

Page 38: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Comparison of population structures in simulation

T

S1

f TS1

S2

f TS2

...SK

f TSK

Indep. Subpops.T

S1 S2 ... SK

A1 A2 ... An

Admixture

00.

10.

2

Kin

ship

Indi

vidu

als

20 / 35

Page 39: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

FST in the independent subpopulation model

●●●

●●

0.4

0.6

Alle

le fr

eque

ncy

Illustration.

FST =Var(pSi∣∣T)

pTi(1− pTi

) .Here FST = proportion of variance explained by pop. structure

21 / 35

Page 40: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

FST in the independent subpopulation model

●●●

●●

0.4

0.6

Alle

le fr

eque

ncy

Illustration.

FST =Var(pSi∣∣T)

pTi(1− pTi

) .Here FST = proportion of variance explained by pop. structure

21 / 35

Page 41: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Wright’s FST

T = Total, S = Subpopulation, I = Individual.

Total inbreeding: FIT =1|S |∑j∈S

f Tj ,

Local inbreeding: FIS =1|S |∑j∈S

f Sj ,

Structural inbreeding: FST =FIT − FIS

1− FIS.

22 / 35

Page 42: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Our generalized FSTNeed new “local” subpopulations Lj (separates total from local inbreeding):(

1− f Tj)

=(1− f

Ljj

)(1− f TLj

).

Generalized FST: applicable to arbitrary population structures, equals previousdefinition for non-overlapping subpopulations:

FST =n∑

j=1

wj fTLj.

Mean heterozygosity in a structured population:

H̄i =1n

n∑j=1

Pr(xij = 1|T ) = 2pTi(1− pTi

)(1− FST) .

23 / 35

Page 43: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Our generalized FSTNeed new “local” subpopulations Lj (separates total from local inbreeding):(

1− f Tj)

=(1− f

Ljj

)(1− f TLj

).

Generalized FST: applicable to arbitrary population structures, equals previousdefinition for non-overlapping subpopulations:

FST =n∑

j=1

wj fTLj.

Mean heterozygosity in a structured population:

H̄i =1n

n∑j=1

Pr(xij = 1|T ) = 2pTi(1− pTi

)(1− FST) .

23 / 35

Page 44: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Our generalized FSTNeed new “local” subpopulations Lj (separates total from local inbreeding):(

1− f Tj)

=(1− f

Ljj

)(1− f TLj

).

Generalized FST: applicable to arbitrary population structures, equals previousdefinition for non-overlapping subpopulations:

FST =n∑

j=1

wj fTLj.

Mean heterozygosity in a structured population:

H̄i =1n

n∑j=1

Pr(xij = 1|T ) = 2pTi(1− pTi

)(1− FST) .

23 / 35

Page 45: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

FST measures population structure / differentiation

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●● ●

●● ●

0.1

0.2

0.3

0.4

Alle

le fr

eque

ncy

Median diff. SNP in Human Origins (rs7626601; given MAF ≥ 10%).

F̂WCST ≈ 0.0712 using Weir-Cockerham estimator and K = 163.

24 / 35

Page 46: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

FST measures population structure / differentiation

●●

●●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●

●●

●●

●●● ●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●● ●

●● ●

0.1

0.2

0.3

0.4

Alle

le fr

eque

ncy

Median diff. SNP in Human Origins (rs7626601; given MAF ≥ 10%).

F̂WCST ≈ 0.0712 using Weir-Cockerham estimator and K = 163.

24 / 35

Page 47: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Comparison of population structures in simulation

T

S1

f TS1

S2

f TS2

...SK

f TSK

Indep. Subpops.T

S1 S2 ... SK

A1 A2 ... An

Admixture

00.

10.

2

Kin

ship

Indi

vidu

als

25 / 35

Page 48: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Our admixture simulation (R package ‘bnpsd’ on CRAN)Intermediate subpop. diff.

FS

T

01

A

0.0

0.2

Intermediate subpop. spread

dens

ity

B

Admixture proportions

ance

stry

01

C

Discrete subpop. approx.

ance

stry

01

D

position in 1D geography

26 / 35

Page 49: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Kinship model for genotypes

symbol meaningT ref ancestral populationi locus index

j , k individual indexespTi ref allele frequencyxij genotype (num ref alleles)ϕTjk kinship of j , k

f Tj inbreeding of j

Statistical model:

E[xij |T ] = 2pTi ,

Var(xij |T ) = 2pTi(1− pTi

)(1 + f Tj ),

Cov(xij , xik |T ) = 4pTi(1− pTi

)ϕTjk .

(Wright 1921, 1951; Malécot 1948; Jacquard 1970).

27 / 35

Page 50: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Problem: common estimators not consistent under structure

Estimate of ancestral allele frequency:

p̂Ti =12

n∑j=1

wjxij

Variance asymptotically non-zero under population structure:

Var(p̂Ti∣∣T) = pTi

(1− pTi

)ϕ̄T

Therefore, naive estimators that use p̂Ti (next) are not consistent!

28 / 35

Page 51: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Bias in standard kinship estimator

ϕ̂T ,stdjk =

m∑i=1

(xij − 2p̂Ti

) (xik − 2p̂Ti

)4

m∑i=1

p̂Ti(1− p̂Ti

) , p̂Ti =12

n∑j=1

wjxij .

Bias varies by j , k :

ϕ̂T ,stdjk

a.s.−−−→m→∞

ϕTjk − ϕ̄T

j − ϕ̄Tk + ϕ̄T

1− ϕ̄T.

True KinshipA New estimateB Standard estimateC Limit of standardD

00.

10.

2

Kin

ship

Indi

vidu

als

29 / 35

Page 52: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Bias in standard kinship estimator

ϕ̂T ,stdjk =

m∑i=1

(xij − 2p̂Ti

) (xik − 2p̂Ti

)4

m∑i=1

p̂Ti(1− p̂Ti

) , p̂Ti =12

n∑j=1

wjxij .

Bias varies by j , k :

ϕ̂T ,stdjk

a.s.−−−→m→∞

ϕTjk − ϕ̄T

j − ϕ̄Tk + ϕ̄T

1− ϕ̄T.

True KinshipA New estimateB Standard estimateC Limit of standardD

00.

10.

2

Kin

ship

Indi

vidu

als

29 / 35

Page 53: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Bias in standard kinship estimator

ϕ̂T ,stdjk =

m∑i=1

(xij − 2p̂Ti

) (xik − 2p̂Ti

)4

m∑i=1

p̂Ti(1− p̂Ti

) , p̂Ti =12

n∑j=1

wjxij .

Bias varies by j , k :

ϕ̂T ,stdjk

a.s.−−−→m→∞

ϕTjk − ϕ̄T

j − ϕ̄Tk + ϕ̄T

1− ϕ̄T.

True KinshipA New estimateB Standard estimateC Limit of standardD

00.

10.

2

Kin

ship

Indi

vidu

als

29 / 35

Page 54: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Our new estimator (R package ‘popkin’ on CRAN)

Step 1: “pre-adjusted” kinship estimator with uniform bias.

ϕ̂T ,preadjjk =

m∑i=1

(xij − 1)(xik − 1)− 1

4m∑i=1

p̂Ti(1− p̂Ti

) + 1 a.s.−−−→m→∞

ϕTjk − ϕ̄T

1− ϕ̄T,

Step 2: Estimate minimum kinship, use to unbias “step 1” estimates.

ϕ̂T ,preadjmin

a.s.−−−→m→∞

− ϕ̄T

1− ϕ̄T, ϕ̂T ,new

jk =ϕ̂T ,preadjjk − ϕ̂T ,preadj

min

1− ϕ̂T ,preadjmin

a.s.−−−→m→∞

ϕTjk .

This yields consistent f̂ T ,newj , F̂ new

ST estimators!

30 / 35

Page 55: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Our new estimator (R package ‘popkin’ on CRAN)

Step 1: “pre-adjusted” kinship estimator with uniform bias.

ϕ̂T ,preadjjk =

m∑i=1

(xij − 1)(xik − 1)− 1

4m∑i=1

p̂Ti(1− p̂Ti

) + 1 a.s.−−−→m→∞

ϕTjk − ϕ̄T

1− ϕ̄T,

Step 2: Estimate minimum kinship, use to unbias “step 1” estimates.

ϕ̂T ,preadjmin

a.s.−−−→m→∞

− ϕ̄T

1− ϕ̄T, ϕ̂T ,new

jk =ϕ̂T ,preadjjk − ϕ̂T ,preadj

min

1− ϕ̂T ,preadjmin

a.s.−−−→m→∞

ϕTjk .

This yields consistent f̂ T ,newj , F̂ new

ST estimators!

30 / 35

Page 56: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Performance of new estimator

True KinshipA New estimateB Standard estimateC Limit of standardD

00.

10.

2

Kin

ship

Indi

vidu

als

31 / 35

Page 57: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Bias in FST estimators for independent subpopulationsPrevious estimator for n subpopulations, simplified for known AFs (πij):

F̂ indepST =

m∑i=1

σ̂2i

m∑i=1

p̂Ti(1− p̂Ti

)+ 1

nσ̂2i

,

p̂Ti =1n

n∑j=1

πij , σ̂2i =

1n − 1

n∑j=1

(πij − p̂Ti

)2.

Estimator is biased in dependent subpopulations:

F̂ indepST

a.s.−−−→m→∞

FST − 1n−1

(nθ̄T − FST

)1− 1

n−1

(nθ̄T − FST

) .

32 / 35

Page 58: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Bias in FST estimators for independent subpopulationsPrevious estimator for n subpopulations, simplified for known AFs (πij):

F̂ indepST =

m∑i=1

σ̂2i

m∑i=1

p̂Ti(1− p̂Ti

)+ 1

nσ̂2i

,

p̂Ti =1n

n∑j=1

πij , σ̂2i =

1n − 1

n∑j=1

(πij − p̂Ti

)2.

Estimator is biased in dependent subpopulations:

F̂ indepST

a.s.−−−→m→∞

FST − 1n−1

(nθ̄T − FST

)1− 1

n−1

(nθ̄T − FST

) .

32 / 35

Page 59: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Only our new method estimates generalized FST accurately

0.00

0.05

0.10

Ind. Subpops.A

Wei

r−C

ocke

rham

Hud

sonK

Bay

eSca

n

Sta

ndar

dK

insh

ip

New

− − −−

− AdmixtureB

Wei

r−C

ocke

rham

Hud

sonK

Bay

eSca

n

Sta

ndar

dK

insh

ip

New

− −−

True FST

FSTindep limit

Estimates−

FS

T e

stim

ate

FST estimator

33 / 35

Page 60: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

The future: improved kinship has repercussions across genetics!

Accurate andefficientestimation,admixturemodeling

Associationstudies, selectiontests

Bias inheritability ofcomplex traits

Animal and plantbreeding

34 / 35

Page 61: Ochoa Lab - Relatedness and differentiation in arbitrary ......2018/01/22  · Jew_Ashkenazi Cypriot Romanian Albanian North_Ossetian Jew_GeorgianLezgin Kumyk Balkar Tajik Nogai Chuvash

Acknowledgments

John D. StoreyAndrew BassIrineo CabrerosWei HaoRiley Skeen-Gaar

Neo Christopher ChungUniversity of Warsaw

Funding:National Institutes of HealthOtsuka Pharmaceutical

Lewis-Sigler Institute for Integrative Genomics

35 / 35