HaploReg, RegulomeDB and more on Python programming

Lin Liu, Yang Li

Post on 02-Apr-2015




• HaploReg retrieves the ENCODE annotation for the selected SNP, as well as other SNPs in LD

• Using the “Set Options” tab, the user can configure values such as the LD threshold and the 1000 Genomes population used to calculate LD


RegulomeDB


Python programming wrap-up

• if / else
• for and while loops
• Indexing starts from 0, unlike R (which starts from 1)
• Four important data structures:
  – list: a = [1, 2, 3, 4]; a.append(5)
  – tuple: a = ('cat', 'dog'); a = (a[1], a[0]) (tuples are immutable, so a swap has to build a new tuple)
  – dictionary: a = {'chr1': {10254: 'G', 13257: 'T'}}; a.keys()
  – set (built in; the old "from sets import Set" module no longer exists in Python 3):
    • species = set(['hs', 'mm', 'chimp'])
    • zoos = set(['mm', 'wolf', 'chimp'])
    • zoos | species (union)
    • zoos & species (intersection)
    • zoos - species (difference)
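The four data structures above can be tried out in a few lines. This is a minimal sketch for Python 3; the variable names are chosen here for illustration and do not come from the slides.

```python
# list: ordered and mutable
positions = [1, 2, 3, 4]
positions.append(5)
first = positions[0]          # indexing starts from 0

# tuple: ordered but immutable, so "swapping" builds a new tuple
pets = ("cat", "dog")
pets = (pets[1], pets[0])

# dictionary of dictionaries: chromosome -> position -> allele
alleles = {"chr1": {10254: "G", 13257: "T"}}
chromosomes = list(alleles.keys())

# set: unordered collection of unique items
species = {"hs", "mm", "chimp"}
zoos = {"mm", "wolf", "chimp"}
union = zoos | species        # in either set
shared = zoos & species       # in both sets
only_in_zoos = zoos - species # in zoos but not in species

print(first, pets, chromosomes)
print(sorted(union), sorted(shared), sorted(only_in_zoos))
```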


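• Some tricky facts:
  – Assignment, shallow copy and deep copy
    • Assignment does not copy: a = [1, 2, 3]; b = a; b[2] = 4; print(a) gives [1, 2, 4], because a and b name the same list
    • Shallow copy: b = a[:] or b = list(a) copies the outer list only; nested objects are still shared
    • Deep copy:
      – from copy import deepcopy
      – a = [1, 2, 3]; b = deepcopy(a); b[2] = 4; print(a) gives [1, 2, 3]
  – List comprehension:
    • As in R, explicit loops are slow
    • a = [1, 2, 3]; a = [b + 1 for b in a]; print(a) gives [2, 3, 4]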
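The difference between plain assignment, a shallow copy and a deep copy only shows up clearly with nested data. The sketch below uses a nested list (an example chosen here, not taken from the slides) and ends with a list comprehension.

```python
from copy import copy, deepcopy

nested = [[1, 2], [3, 4]]

alias = nested            # no copy: a second name for the same list
shallow = copy(nested)    # new outer list, but the inner lists are shared
deep = deepcopy(nested)   # fully independent copy

nested[0][0] = 99         # change an inner list
nested.append([5, 6])     # change the outer list

print(alias)    # sees both changes
print(shallow)  # sees the inner change only
print(deep)     # sees neither change

# List comprehension: build a new list without an explicit loop body
incremented = [value + 1 for value in [1, 2, 3]]
print(incremented)
```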


• How to read BAM (binary) files in Python?
  – import pybedtools
• How to perform numerical computation in Python?
  – import numpy as np
  – Includes array and matrix calculations; very useful
• How to do shell-style tasks in Python?
  – Get all files in a folder:
  – import os
  – os.listdir("yourdirectory")
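A short sketch of the NumPy and os.listdir points above. It assumes NumPy is installed; the file names and the temporary folder are made up for the example so that it runs anywhere.

```python
import os
import tempfile

import numpy as np

# Array and matrix calculations with NumPy
matrix = np.array([[1.0, 2.0], [3.0, 4.0]])
vector = np.array([1.0, 1.0])
product = matrix @ vector            # matrix-vector product
column_means = matrix.mean(axis=0)   # mean of each column

# Listing a folder: create a throwaway directory with a few empty files
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.bam", "b.bam", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    # os.listdir returns names in arbitrary order, so sort them
    bam_files = sorted(f for f in os.listdir(folder) if f.endswith(".bam"))

print(product, column_means, bam_files)
```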


Object oriented programming

• Class and objects in Python

    class HMM:
        # constructor
        # transition_probs[i, j] is the probability of transitioning to state i from state j
        # emission_probs[i, j] is the probability of emitting emission j while in state i
        def __init__(self, transition_probs, emission_probs):
            self._transition_probs = transition_probs
            self._emission_probs = emission_probs

        # accessors
        def emission_dist(self, emission):
            return self._emission_probs[:, emission]

        @property
        def num_states(self):
            return self._transition_probs.shape[0]

        @property
        def transition_probs(self):
            return self._transition_probs
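A usage sketch for the HMM class. The class is repeated so the example runs on its own; the two-state model and its probabilities are invented for illustration, and the arrays are assumed to be NumPy arrays because the class relies on [:, j] indexing and .shape.

```python
import numpy as np


class HMM:
    def __init__(self, transition_probs, emission_probs):
        self._transition_probs = transition_probs
        self._emission_probs = emission_probs

    def emission_dist(self, emission):
        return self._emission_probs[:, emission]

    @property
    def num_states(self):
        return self._transition_probs.shape[0]

    @property
    def transition_probs(self):
        return self._transition_probs


# Hypothetical two-state model.
# transition[i, j] = P(to state i | from state j), so each column sums to 1.
transition = np.array([[0.9, 0.2],
                       [0.1, 0.8]])
# emission[i, j] = P(emit symbol j | state i), so each row sums to 1.
emission = np.array([[0.5, 0.4, 0.1],
                     [0.1, 0.3, 0.6]])

hmm = HMM(transition, emission)
n_states = hmm.num_states        # a property: accessed without parentheses
dist = hmm.emission_dist(2)      # probability of symbol 2 in each state
print(n_states, dist)
```

Because num_states and transition_probs are declared with @property, they read like attributes, while emission_dist stays an ordinary method because it takes an argument.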


Interface with other programming languages

• Rpy (now rpy2): R and Python interface
• Cython: Python and C interface
• When to use Python?
  – Text manipulation
  – Simple machine learning implementations (like using MATLAB)
  – Some very well-written packages are available: PyStan (Bayesian MCMC sampler), matplotlib, pybedtools, etc.


• When not to use Python:
  – Large-scale simulation: most often you cannot get rid of loops
  – Statistical analysis: R is much better and well curated
  – Best strategy: write the slow parts in C and call them from Python
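To make the point about loops concrete, here is a small sketch that computes the same quantity with an explicit Python loop and with a vectorized NumPy expression. The function names are made up for this example. When a computation can be written in the second form, the loop runs in compiled code; when each step depends on the previous one and no such form exists, that is the case where moving the loop into C is worth considering.

```python
import numpy as np


def sum_of_squares_loop(n):
    """Explicit Python loop: every iteration is interpreted."""
    total = 0
    for k in range(n):
        total += k * k
    return total


def sum_of_squares_numpy(n):
    """Vectorized version: the loop happens inside NumPy's compiled code."""
    values = np.arange(n, dtype=np.int64)
    return int((values * values).sum())


print(sum_of_squares_loop(1000), sum_of_squares_numpy(1000))
```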


Some good reference code for Python

• Check the MACS14 Python script
• From MACS14 you can learn how to turn a Python script into an executable command-line program
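As a rough idea of what an executable Python script looks like, here is a minimal skeleton. It is not taken from MACS14; the line-counting task, the function names and the --skip-comments option are all hypothetical. The pattern itself (a shebang line, argparse for options, a main function and a __main__ guard) is standard Python.

```python
#!/usr/bin/env python
"""Hypothetical skeleton of a command-line script (not MACS14 code)."""
import argparse


def count_lines(path, skip_comments=False):
    """Count lines in a text file, optionally ignoring '#' comment lines."""
    count = 0
    with open(path) as handle:
        for line in handle:
            if skip_comments and line.startswith("#"):
                continue
            count += 1
    return count


def main(argv=None):
    parser = argparse.ArgumentParser(description="Count lines in text files.")
    parser.add_argument("paths", nargs="*", help="input text files")
    parser.add_argument("--skip-comments", action="store_true",
                        help="ignore lines starting with '#'")
    # argv=None makes argparse read the real command line (sys.argv[1:])
    args = parser.parse_args(argv)
    for path in args.paths:
        print(path, count_lines(path, args.skip_comments))


# Run main() only when executed as a script, not when imported as a module
if __name__ == "__main__":
    main()
```

Saved under a name such as count_lines.py (again hypothetical) and marked executable with chmod +x, the script could be run as ./count_lines.py --skip-comments somefile.txt.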