Introduction to data collection at XFELs and serial femtosecond crystallography data analysis
Jose M. Martin Garcia
Assistant Research Scientist
The Biodesign Center for Applied Structural Discovery
Arizona State University
Macromolecular Crystallography Workshop CSIC, Madrid, May 10th, 2017
Acknowledgements
Petra Fromme (CASD, ASU)
John Spence (CASD, ASU)
Henry Chapman (CFEL, DESY)
Anton Barty (CFEL, DESY)
Thomas White (CFEL, DESY)
Valerio Mariani (CFEL, DESY)
Nadia Zatsepin (CASD, ASU)
Rick Kirian (CASD, ASU)
Oleksandr Yefanov (CFEL, DESY)
Marius Schmidt
George Phillips
Lois Pollack
Collaborators
Robert Fischetti
Thomas Grant (HWMRI)
Outline
1. Introduction to XFELs and SFX
2. SFX: New challenges
3. Cheetah
4. CrystFEL
1. Introduction to XFELs and SFX
The Concept of a Free Electron Laser
The Linac Coherent Light Source (LCLS)
https://www.youtube.com/watch?v=RG-PYmeq2XE
Inside View: The Undulator Tunnel
Main Entrance to FEH
LCLS FEH
Control Room @ CXI
Downstream View: Hutch @ CXI
Chambers @ CXI
X-ray free electron lasers (XFELs)
World’s most powerful photon sources
Extremely bright X-ray beams (peak brilliance ~10^10 times that of synchrotrons)
Ultra-short duration
Beam property | XFELs | Synchrotrons
High flux | 10^13 | 10^13
Peak brilliance | 10^33 | 10^23
Max. electron energy | 6 - 20 GeV | 1 - 7 GeV
Repetition rate | 60 - 120 Hz (27,000 Hz @ EU-XFEL) | ~ 10 Hz
Pulse duration | 10 - 300 fs | < 100 ps
Review: Martin-Garcia JM, et al. “Serial Femtosecond Crystallography: A Revolution in Structural Biology”. Archives of Biochemistry and Biophysics, 602, 32-47 (2016)
Serial femtosecond crystallography (SFX)
One diffraction pattern per crystal per X-ray pulse
Nano / micro-crystals are delivered in a serial manner in random orientations
Diffraction before destruction
Plasma from femtosecond X-rays
X-ray Diffraction Pattern
Photosystem II
Crystallography @ synchrotrons | Crystallography @ XFELs
Need large crystals | Nano / micro-crystals (0.1 - 10 µm)
Crystals are usually frozen | Room temperature
Crystals are on a goniometer | Crystals are in a liquid or highly viscous jet, not on a goniometer
Many images per crystal | One image per crystal per pulse
Up to 1000 images per data set | At least 10,000 images per data set
1-10 crystals per data set | Hundreds of thousands of crystals
Full reflections | Partial reflections
Time-Resolved Crystallography at XFELs
Goals:
• extract kinetics and dynamics
• kinetic mechanism
• rate coefficient
• barriers of activation and
• molecular structures of reaction intermediates
for:
• reversible reactions and
• irreversible (catalyzed) reactions
from crystallographic data ONLY!
The ideal combination for TR-SFX: small crystals + XFELs
• Data collected in December 2015
• CXI instrument @ LCLS
• Protein: β-lactamase C from M. tuberculosis
• Substrate: Ceftriaxone
• Crystal size: 1-3 µm
• Sample delivery: Mixing-jet injector
• Time delay: 2 s
• Photon energy: 9 keV
• Pulse duration: 40 fs
• Repetition rate: 120 Hz
Study of enzymatic reactions
Petra Fromme, John Spence, Uwe Weierstall
Marius Schmidt
George Phillips, Lois Pollack
Kupitz C, et al, 2016. Structural Enzymology using X-ray free Electron Lasers. Structural Dynamics, 4(4):044003
The proof-of-principle
2. SFX: New Challenges
Sample delivery
Detector technology
CSPAD (Cornell-SLAC Pixel Array Detector) @ CXI, LCLS
• 4 quadrants independently movable to change the size of the hole in the center (no beam stop)
• 32 modules tiled to fill 1700 x 1700 pixels with gaps between modules
• 2.3 x 10^6 pixels
• Pixel size 110 µm x 110 µm
• Dynamic range of about 350 photons at 9.4 keV
• 120 Hz frame rate
Philipp, H. T., et al. (2011). Pixel array detector for X-ray free electron laser experiments. Nucl. Instrum. Methods Phys. Res. A, 649, 67–69.
Why do we need so many patterns?
Crystal size
Crystal shape
Crystal orientation
Crystal quality
XFEL beam position
XFEL beam energy spectrum
XFEL beam intensity
Partially recorded reflections (no crystal oscillation, monochromatic beam)
• In SFX every pulse is like a new experiment
• Need 10,000 - 100,000 indexed patterns (individual crystals) for one data set (compared with up to 1,000 images from one crystal at synchrotrons)
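To see why so many stills are needed, here is a minimal Python sketch of "Monte Carlo" merging under a deliberately simplified model (an assumption, not from the slides): each still records a uniformly random fraction of one reflection plus Gaussian noise. The plain mean converges to a fixed fraction of the full intensity (a constant that later scaling absorbs), and its error shrinks roughly as 1/√N.

```python
import numpy as np

def simulate_partial_measurements(i_full, n, rng, noise_sigma=0.1):
    """Hypothetical model of n stills that each record part of one reflection.

    Assumptions (illustration only): partiality is uniform on [0, 1] and the
    noise is Gaussian with standard deviation noise_sigma * i_full.
    """
    partiality = rng.uniform(0.0, 1.0, size=n)
    noise = rng.normal(0.0, noise_sigma * i_full, size=n)
    return partiality * i_full + noise

def monte_carlo_merge(measurements):
    """'Monte Carlo' merging: the plain mean of all partial measurements."""
    return float(np.mean(measurements))

rng = np.random.default_rng(seed=0)
i_full = 1000.0
expected = 0.5 * i_full  # mean partiality is 0.5 under the uniform assumption
for n in (10, 100, 1_000, 10_000, 100_000):
    estimate = monte_carlo_merge(simulate_partial_measurements(i_full, n, rng))
    rel_err = abs(estimate - expected) / expected
    print(f"N = {n:>7,d}: merged mean = {estimate:7.1f}, relative error = {rel_err:.4f}")
```

Real partialities depend on crystal size, mosaicity, orientation and the pulse spectrum (the list above), which is why this toy only shows the 1/√N trend.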
Data Handling
The need for new software
LCLS pulse structure (120 Hz; 7,200 pulses/min)
CSPAD detector @ CXI, LCLS: 2.3 x 10^6 pixels
4.5 MB/frame => 2 TB/hour => 120 TB/experiment (5 shifts, 1 shift = 12 h)
New type of data
Large amount of data
New, complicated detectors (hybrid pixel array detectors)
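A back-of-envelope check of the data-volume numbers above, as a Python sketch. The CSPAD layout (32 modules of 2 x 194 x 185 pixels) and the 2 bytes stored per pixel are assumptions not stated on the slides; with them, the quoted 2.3 x 10^6 pixels, roughly 4.5 MB/frame, 2 TB/hour and 120 TB/experiment all follow.

```python
# Back-of-envelope check of the CSPAD data volume at LCLS.
# Assumptions (not from the slides): 32 modules of 2 x 194 x 185 pixels,
# 2 bytes (16 bits) stored per pixel.
MODULES = 32
PIXELS_PER_MODULE = 2 * 194 * 185
BYTES_PER_PIXEL = 2
REP_RATE_HZ = 120
SHIFT_HOURS = 12
SHIFTS = 5

pixels = MODULES * PIXELS_PER_MODULE                 # ~2.3 x 10^6 pixels
frame_mb = pixels * BYTES_PER_PIXEL / 1e6            # ~4.6 MB per frame
pulses_per_min = REP_RATE_HZ * 60                    # 7,200 pulses/min
tb_per_hour = frame_mb * REP_RATE_HZ * 3600 / 1e6    # ~2 TB/hour
tb_per_experiment = tb_per_hour * SHIFT_HOURS * SHIFTS

print(f"{pixels:,} pixels, {frame_mb:.1f} MB/frame, {pulses_per_min:,} pulses/min")
print(f"{tb_per_hour:.2f} TB/hour, {tb_per_experiment:.0f} TB per experiment")
```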
SFX data analysis pipeline
OnDA
1. Online monitoring
2. Live hit rate and resolution estimate
3. Live saturating pixel tracking
Cheetah
1. Hit finding (data reduction)
2. Background subtraction
3. Clean diffraction patterns and metadata saved as HDF5 or CXI
4. Statistics and preliminary analysis
CrystFEL
1. Indexing
2. Integration
3. Merging
4. Post-refinement
DAQ: raw XTC files containing X-ray pulse parameters, pump laser signal, diagnostics, motor positions, etc.
CSPAD detector
3. Cheetah
http://www.desy.de/~barty/cheetah/Cheetah/Cheetah_GUI.html
What does Cheetah do?
Barty, A. et al. (2014). “Cheetah: software for high-throughput reduction and analysis of serial femtosecond X-ray diffraction data,” J Appl. Cryst., vol. 47, pp. 1118–1131.
2. Rapid feedback
Hit rate, resolution, diffraction quality
Quickly viewing images
3. Data reduction
Keeps only useful events (i.e. frames with crystal diffraction)
4. Data translation
XTC data is converted to a facility-independent format (HDF5 or CXI)
5. Data organization
Summarises what is in each run; makes it easy to group data by sample; summarises statistics
Cheetah: Background subtraction
After ‘running’ background subtraction
After ‘local’ background subtraction
1- Running background subtraction
• Uses the many blank frames interleaved between hits to provide an up-to-date estimate of the background signal in the data.
• Background = per-pixel median over the most recent blank frames.
2- Local background subtraction
• Advisable for samples delivered in a liquid or viscous jet.
• Uses the data from the current frame to estimate the background of the current frame.
• Background = median of all pixel values in a box of side length 2r+1 centred on each pixel.
• The box area is at least twice the area of any potential Bragg peak, and the box contains at least three times as many pixels as the peak.
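The two background estimates can be sketched in a few lines of NumPy. This illustrates the idea only (Cheetah itself is C++ and its code differs); the reflection padding at the image edges and the toy frame are assumptions of the sketch.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_background(frame, r):
    """'Local' background: median of the (2r+1) x (2r+1) box centred on each
    pixel, using the current frame only. Edges use reflection padding
    (an assumption of this sketch)."""
    padded = np.pad(frame, r, mode="reflect")
    windows = sliding_window_view(padded, (2 * r + 1, 2 * r + 1))
    return np.median(windows, axis=(-2, -1))

def running_background(blank_frames):
    """'Running' background: per-pixel median over a stack of recent blank
    frames with shape (n_frames, ny, nx)."""
    return np.median(np.asarray(blank_frames, dtype=float), axis=0)

# Toy frame: flat background of 10 counts plus one 3 x 3 "Bragg peak".
frame = np.full((32, 32), 10.0)
frame[15:18, 15:18] = 1000.0
r = 4  # 9 x 9 box = 81 pixels, well over three times the 9 peak pixels
corrected = frame - local_background(frame, r)
print(corrected[16, 16], corrected[0, 0])  # peak keeps its signal, background goes to 0
```

Because the median ignores the few bright peak pixels inside each box, the peak survives subtraction while the smooth background is removed.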
Cheetah: Hit finding
1- Identification of possible Bragg peaks.
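• Threshold (minimum pixel intensity, applied over the entire image)
• Min_Number_pixels (minimum number of connected pixels in a peak)
• Max_Number_pixels (maximum number of connected pixels in a peak)
• SNR (minimum signal-to-noise ratio, to pick up weak peaks relative to the background)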
• Peakmask (pixel mask identifying regions to exclude from peak searching)
2- Identification of sample hits.
• npeaks > 15 (minimum number required by CrystFEL)
Hit rates depend on the experiment and the sample delivery technique. Nanocrystal diffraction in solution typically has hit rates of 10–15% (30–40% with a high-viscosity injector), although extremes as low as 1% have been observed for dilute samples.
Current sample delivery techniques are far from achieving the goal of 100% useful data, and thus frame rejection strategies are currently very effective in reducing data volumes.
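A toy version of the two-step hit finder, assuming a background-subtracted image and a flat noise level. The parameter names (threshold, min_pixels, max_pixels, min_snr, mask) are illustrative stand-ins for the settings listed above, not Cheetah's actual configuration keys, and Cheetah's own peak finders are more sophisticated.

```python
from collections import deque
import numpy as np

def find_peaks(signal, threshold, min_pixels, max_pixels, min_snr, noise_sigma, mask=None):
    """Toy peak finder on a background-subtracted image.

    Groups 4-connected pixels above `threshold`, keeps groups whose size is in
    [min_pixels, max_pixels] and whose SNR, summed signal / (noise_sigma *
    sqrt(npix)), reaches min_snr. `mask` is True where peak search is allowed.
    Returns intensity-weighted (y, x) centroids.
    """
    above = signal > threshold
    if mask is not None:
        above &= mask
    seen = np.zeros(above.shape, dtype=bool)
    ny, nx = above.shape
    peaks = []
    for y0, x0 in zip(*np.nonzero(above)):
        if seen[y0, x0]:
            continue
        seen[y0, x0] = True
        queue, pixels = deque([(y0, x0)]), []
        while queue:  # flood fill one connected group of bright pixels
            y, x = queue.popleft()
            pixels.append((y, x))
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < ny and 0 <= xx < nx and above[yy, xx] and not seen[yy, xx]:
                    seen[yy, xx] = True
                    queue.append((yy, xx))
        if not min_pixels <= len(pixels) <= max_pixels:
            continue
        ys = np.array([p[0] for p in pixels])
        xs = np.array([p[1] for p in pixels])
        values = signal[ys, xs]
        total = values.sum()
        if total / (noise_sigma * np.sqrt(len(pixels))) < min_snr:
            continue
        peaks.append((float((ys * values).sum() / total), float((xs * values).sum() / total)))
    return peaks

def is_hit(peaks, min_peaks=15):
    """Step 2: call the frame a crystal hit when npeaks > min_peaks."""
    return len(peaks) > min_peaks

# Synthetic frame with 20 clean 2 x 2 peaks.
image = np.zeros((64, 64))
for i in range(20):
    y, x = 4 + (i // 5) * 12, 4 + (i % 5) * 12
    image[y:y + 2, x:x + 2] = 100.0
peaks = find_peaks(image, threshold=10, min_pixels=2, max_pixels=20, min_snr=5, noise_sigma=1.0)
print(len(peaks), is_hit(peaks))
```

Frames that fail `is_hit` would be discarded, which is where the data reduction described above comes from.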
Cheetah ‘quick start’
Cheetah ‘flash start’
Cheetah GUI
Newly collected data (new runs) appear automatically ready to process
Status of data collection
Cheetah GUI
One-click to start the processing of data sets
Cheetah GUI
Status of processing is continually updated
Contents of each run and associated data directory
Cheetah GUI
Hit rate
Resolution
Cheetah GUI
Virtual powder patterns
Hits
Blanks
4. CrystFEL
White, T. A., et al. (2012). "CrystFEL: a software suite for snapshot serial crystallography". J. Appl. Cryst. 45, 335–341.
White, T. A., et al. (2016). “Recent developments in CrystFEL”. J. Appl. Cryst. 49, 680-689.
http://www.desy.de/~twhite/crystfel/index.html
What is CrystFEL?
• Suite of programs for processing serial crystallography data acquired at XFELs (and synchrotrons too!).
• CrystFEL does indexing, integration, merging, scaling, viewing, and hit finding (too!).
• CrystFEL's merged reflection lists (.hkl files) can be converted to MTZ files and fed into Phenix, CCP4, etc.
• Unlike Cheetah, CrystFEL is run from the command line or from scripts. A CrystFEL GUI is on its way!
Latest version at the time of this workshop (May 2017): CrystFEL version 0.6.2
indexamajig
Rapid indexing, integration and data reduction program.
pattern_sim
A diffraction pattern simulation tool.
process_hkl
A tool for merging and scaling intensities from many patterns into a single reflection list, via the Monte Carlo method.
partialator
Full scaling and post-refinement process for accurate merging of data and outlier rejection.
ambigator
A tool for resolving indexing ambiguities.
get_hkl
A tool for manipulating reflection lists, such as performing symmetry expansion.
cell_explorer
A tool for examining the distributions of unit cell parameters.
compare_hkl and check_hkl
Tools for calculating figures of merit, such as completeness and R-factors.
partial_sim
A tool for calculating partial reflection intensities, perhaps for testing the convergence of Monte Carlo merging.
hdfsee
A simple viewer for images stored in HDF5 format.
render_hkl
A tool for rendering slices of reciprocal space in two dimensions.
geoptimiser
A program to refine and optimize detector geometry.
CrystFEL core programs
CrystFEL: Overall pipeline
Flow diagram of diffraction pattern processing in indexamajig
• Peak search methods: --peaks=hdf5 (peaks found by Cheetah and stored in its output files; --peaks=cxi for CXI files) or --peaks=zaef (CrystFEL's internal algorithm)
• Input files: diffraction patterns (HDF5 or CXI formats)
• Output file: “stream” file (long plain text)
• Geometry file (plain text file)
• Unit cell parameters (a PDB file containing the “CRYST1” line, or a CrystFEL unit cell file)
Indexing methods
$ indexamajig -i |....| --indexing=method1,method2,... |....|
Each method name combines an indexing program with modifiers. Four examples:
mosflm-raw-nolatt-nocell
mosflm-comb-latt-cell
mosflm-axes-latt-cell
mosflm-axes-latt-cell-retry
• mosflm: Invoke Mosflm. To use this option, 'ipmosflm' must be in your shell's search path.
• raw: Do not check the resulting unit cell against the target cell. This option is useful when you need to determine the unit cell ab initio.
• comb: Check linear combinations of the unit cell basis vectors to see if a cell can be produced which looks like your unit cell.
• axes: Check permutations of the axes for correspondence with your cell, but do not check linear combinations. This is useful to avoid a potential problem when one of the unit cell axis lengths is close to a multiple of one of the others.
• nolatt / latt: Do not use / use lattice type information to guide the indexing.
• nocell / cell: Do not use / use unit cell parameters as prior information for the core indexing algorithm.
This also applies to other indexing methods, such as DirAx and XDS.
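The method strings are built from a program name plus modifiers. The hypothetical helper below (not part of CrystFEL) decodes the four example names above into their meaning, covering only the modifiers described on these slides.

```python
def describe_indexing_method(name):
    """Split an indexamajig --indexing method name into its parts.

    Hypothetical helper for illustration: it covers only raw/comb/axes,
    latt/nolatt, cell/nocell and retry. Real CrystFEL accepts more options.
    """
    program, *modifiers = name.split("-")
    info = {"program": program, "cell_check": None,
            "use_lattice": None, "use_cell": None, "retry": False}
    for mod in modifiers:
        if mod in ("raw", "comb", "axes"):
            info["cell_check"] = mod          # how the result is compared with the target cell
        elif mod in ("latt", "nolatt"):
            info["use_lattice"] = mod == "latt"
        elif mod in ("cell", "nocell"):
            info["use_cell"] = mod == "cell"  # cell as prior for the core algorithm
        elif mod == "retry":
            info["retry"] = True
        else:
            raise ValueError(f"modifier not covered by this sketch: {mod!r}")
    return info

for method in ("mosflm-raw-nolatt-nocell", "mosflm-comb-latt-cell",
               "mosflm-axes-latt-cell", "mosflm-axes-latt-cell-retry"):
    print(method, describe_indexing_method(method))
```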
Indexamajig
1. Create a list of filenames to process:
$ find /reg/d/psdm/cxi/cxin5016/results/slab/jose/cheetah/hdf5/r0052-BlaC/ -name 'cxin5016-r0052-c00.cxi' -print > tutorial.lst
2. Rough estimation of the unit cell parameters:
$ indexamajig -i tutorial.lst -g cxil2316-nz1.geom --peaks=cxi --indexing=mosflm-raw-nolatt-nocell --int-radius=3,4,5 -o tutorial.stream
$ grep "Cell parameters" tutorial.stream
$ grep "centering" tutorial.stream
$ cell_explorer tutorial.stream
The Unit Cell Explorer tool
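The grep "Cell parameters" step can also be scripted to get a robust first guess of the cell before opening cell_explorer. A minimal Python sketch, assuming each indexed crystal contributes a stream line of the form `Cell parameters a b c nm, alpha beta gamma deg` (check this against your own stream file); the example lines are made up.

```python
import re
import statistics

# Assumed format of the per-crystal lines in a CrystFEL stream, e.g.
#   Cell parameters 5.00000 6.00000 7.00000 nm, 90.00000 95.00000 90.00000 deg
CELL_LINE = re.compile(
    r"Cell parameters\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+nm,"
    r"\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+deg"
)

def read_cells(lines):
    """Collect (a, b, c in Å, alpha, beta, gamma in degrees) from stream lines."""
    cells = []
    for line in lines:
        match = CELL_LINE.search(line)
        if match:
            a, b, c, alpha, beta, gamma = map(float, match.groups())
            cells.append((a * 10, b * 10, c * 10, alpha, beta, gamma))  # nm -> Å
    return cells

def median_cell(cells):
    """Per-parameter median: robust against the occasional mis-indexed pattern."""
    return tuple(statistics.median(values) for values in zip(*cells))

# Made-up example lines (not real data):
demo = [
    "Cell parameters 5.00000 6.00000 7.00000 nm, 90.00000 95.00000 90.00000 deg",
    "Cell parameters 5.10000 6.10000 7.10000 nm, 90.00000 96.00000 90.00000 deg",
    "Cell parameters 4.90000 5.90000 6.90000 nm, 90.00000 94.00000 90.00000 deg",
]
print(median_cell(read_cells(demo)))
# On a real file:
#   with open("tutorial.stream") as f:
#       print(median_cell(read_cells(f)))
```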
3. Index the patterns using Bravais lattice information only:
$ indexamajig -i tutorial.lst -g cxil2316-nz1.geom --peaks=cxi --indexing=mosflm-raw-latt-nocell -p blaC.cell --int-radius=3,4,5 -o tutorial.stream
4. Index the patterns using actual unit cell parameters:
$ indexamajig -i tutorial.lst -g cxil2316-nz1.geom --peaks=cxi --int-radius=3,4,5 -o tutorial.stream --indexing=mosflm-axes-latt-cell -p blaC.cell --integration=rings-sat --tolerance=2,2,2,1.5
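One way to read --tolerance=2,2,2,1.5 is sketched below: a, b and c may each differ from the target cell by up to 2 % and the angles by up to 1.5 degrees. This is an assumed interpretation for illustration; indexamajig's actual cell comparison may be more involved (for example, it also has to deal with different choices of axes).

```python
def within_tolerance(cell, target, tolerance=(2.0, 2.0, 2.0, 1.5)):
    """Check an indexed cell against a target cell.

    Assumed reading of --tolerance=2,2,2,1.5: a, b and c may deviate by up to
    2 % each and the angles by up to 1.5 degrees. Cells are
    (a, b, c, alpha, beta, gamma) with lengths in Å and angles in degrees.
    """
    *length_pct, angle_deg = tolerance
    for value, ref, pct in zip(cell[:3], target[:3], length_pct):
        if abs(value - ref) > ref * pct / 100.0:  # relative check for lengths
            return False
    # absolute check for angles
    return all(abs(value - ref) <= angle_deg for value, ref in zip(cell[3:], target[3:]))

target = (50.0, 60.0, 70.0, 90.0, 90.0, 90.0)  # made-up target cell
print(within_tolerance((50.5, 60.9, 70.2, 90.0, 90.5, 90.0), target))
print(within_tolerance((52.0, 60.0, 70.0, 90.0, 90.0, 90.0), target))
```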
5. Evaluate the quality of indexing:
$ ./check-near-bragg tutorial.stream -g cxil2316-nz1.geom
$ ./check-peak-detection --not-indexed tutorial.stream -g cxil2316-nz1.geom
$ ./check-peak-detection --indexed tutorial.stream -g cxil2316-nz1.geom
CrystFEL: Overall pipeline
1- process_hkl
Takes a data stream, such as that from indexamajig, and merges the many individual intensities together to form a single list of reflection intensities which are useful for crystallography. Merging is done by the Monte Carlo method, otherwise known as taking the mean of the individual values. The --even-only and --odd-only runs merge alternate crystals into two half-data sets (tutorial.hkl1 and tutorial.hkl2), which compare_hkl uses below.
$ process_hkl -i tutorial.stream -o tutorial.hkl -y 2/m --lowres=40 --highres=2.0 --nshells=25
$ process_hkl -i tutorial.stream -o tutorial.hkl1 -y 2/m --lowres=40 --highres=2.0 --nshells=25 --even-only
$ process_hkl -i tutorial.stream -o tutorial.hkl2 -y 2/m --lowres=40 --highres=2.0 --nshells=25 --odd-only
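A minimal Python sketch of Monte Carlo merging with symmetry. It assumes point group 2/m with b as the unique axis (as for -y 2/m) and picks the lexicographically largest equivalent as each reflection's representative; process_hkl's own choice of asymmetric unit and its extra options are not reproduced, and the scaling and post-refinement done by partialator are left out.

```python
from collections import defaultdict

def equivalents_2m(h, k, l):
    """Symmetry-equivalent indices in point group 2/m (unique axis b, assumed)."""
    return [(h, k, l), (-h, k, -l), (-h, -k, -l), (h, -k, l)]

def asymmetric_key(hkl):
    """One representative per set of equivalents (here: the largest tuple)."""
    return max(equivalents_2m(*hkl))

def merge_mean(observations):
    """Monte Carlo merging: mean of all observations of each unique reflection.

    observations: iterable of ((h, k, l), intensity) pooled from many crystals.
    Returns {unique (h, k, l): (mean intensity, number of observations)}.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for hkl, intensity in observations:
        key = asymmetric_key(hkl)
        total[key] += intensity
        count[key] += 1
    return {key: (total[key] / count[key], count[key]) for key in total}

# Made-up observations: four equivalents of (1, 2, 3) and one (0, 0, 1).
obs = [((1, 2, 3), 10.0), ((-1, 2, -3), 20.0), ((-1, -2, -3), 30.0),
       ((1, -2, 3), 40.0), ((0, 0, 1), 5.0)]
print(merge_mean(obs))
```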
2- Partialator is the alternative to process_hkl.
Merging and scaling the intensities
Symmetry Classification for SFX experiments
Reflection quality check
1- check_hkl
It calculates figures of merit for reflection data, such as completeness and average signal strength, in resolution shells. check_hkl accepts a single reflection list in CrystFEL's format, and you must also provide a unit cell (in a PDB file or CrystFEL unit cell format).
$ check_hkl tutorial.hkl -y 2/m -p blaC.cell --lowres=40 --highres=3 --nshells=25
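check_hkl reports its figures of merit in resolution shells. The sketch below bins reflections of a monoclinic cell (consistent with -y 2/m) by resolution, using the standard monoclinic (b-unique) d-spacing formula, and reports the mean I/σ(I) per shell. The shell definition (equal reciprocal-space volume between --lowres and --highres) and the input layout are assumptions for illustration and may not match check_hkl exactly.

```python
import math

def inv_d_monoclinic(h, k, l, a, b, c, beta_deg):
    """1/d (in 1/Å) for a monoclinic, b-unique cell; lengths in Å, beta in degrees."""
    beta = math.radians(beta_deg)
    sin2 = math.sin(beta) ** 2
    inv_d2 = (h * h / a**2 + k * k * sin2 / b**2 + l * l / c**2
              - 2 * h * l * math.cos(beta) / (a * c)) / sin2
    return math.sqrt(inv_d2)

def shell_snr(reflections, cell, lowres, highres, nshells):
    """Mean I/sigma(I) per resolution shell.

    reflections: {(h, k, l): (intensity, sigma)}; cell: (a, b, c, beta_deg).
    Assumption: shells of equal reciprocal-space volume between lowres and
    highres (both in Å). Returns (d_low, d_high, count, mean I/sigma) per shell.
    """
    v_lo, v_hi = (1 / lowres) ** 3, (1 / highres) ** 3
    sums, counts = [0.0] * nshells, [0] * nshells
    for hkl, (intensity, sigma) in reflections.items():
        v = inv_d_monoclinic(*hkl, *cell) ** 3
        if not v_lo <= v <= v_hi or sigma <= 0:
            continue  # outside the resolution range or unusable sigma
        i = min(int((v - v_lo) / (v_hi - v_lo) * nshells), nshells - 1)
        sums[i] += intensity / sigma
        counts[i] += 1
    shells = []
    for i in range(nshells):
        d_low = (v_lo + (v_hi - v_lo) * i / nshells) ** (-1 / 3)
        d_high = (v_lo + (v_hi - v_lo) * (i + 1) / nshells) ** (-1 / 3)
        mean = sums[i] / counts[i] if counts[i] else float("nan")
        shells.append((d_low, d_high, counts[i], mean))
    return shells

# Made-up cell and reflections:
cell = (50.0, 60.0, 70.0, 90.0)
refl = {(1, 0, 0): (100.0, 10.0), (25, 0, 0): (20.0, 10.0)}
for d_low, d_high, n, snr in shell_snr(refl, cell, lowres=60.0, highres=2.0, nshells=3):
    print(f"{d_low:6.2f} - {d_high:5.2f} Å: {n} reflections, <I/sigma> = {snr}")
```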
2- compare_hkl
It compares two sets of reflection data and calculates figures of merit such as R-factors or CC1/2. Reflections are considered equivalent according to your choice of point group. You need to provide a unit cell, as a PDB file or a CrystFEL unit cell file.
$ compare_hkl tutorial.hkl1 tutorial.hkl2 -y 2/m -p blaC.cell --lowres=40 --highres=3 --nshells=25
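The two half-set figures of merit can be written out directly. A minimal Python sketch of CC1/2 (Pearson correlation between the half-data sets) and R_split, using the definition R_split = (1/√2) Σ|I_even − I_odd| / (½ Σ|I_even + I_odd|), computed over reflections present in both halves. It ignores resolution shells and does not reproduce compare_hkl's exact conventions; the intensities are made up.

```python
import math

def _common(half1, half2):
    """Intensities of the reflections present in both half-data sets."""
    keys = sorted(set(half1) & set(half2))
    return [half1[k] for k in keys], [half2[k] for k in keys]

def cc_half(half1, half2):
    """CC1/2: Pearson correlation coefficient between the two half-data sets."""
    x, y = _common(half1, half2)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_split(half1, half2):
    """R_split = (1/sqrt(2)) * sum|I1 - I2| / (0.5 * sum|I1 + I2|)."""
    x, y = _common(half1, half2)
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = 0.5 * sum(abs(a + b) for a, b in zip(x, y))
    return num / (math.sqrt(2) * den)

# Made-up half-set intensities keyed by (h, k, l):
even = {(1, 0, 0): 10.0, (0, 1, 0): 20.0, (0, 0, 1): 30.0}
odd = {(1, 0, 0): 12.0, (0, 1, 0): 18.0, (0, 0, 1): 30.0}
print(f"CC1/2 = {cc_half(even, odd):.4f}, Rsplit = {100 * r_split(even, odd):.2f} %")
```

Identical half-sets give CC1/2 = 1 and R_split = 0; both degrade as the halves disagree, which is why they are usually reported per resolution shell.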