

Published by: International Chinese Statistical Association and Department of Statistics, Colorado State University

Photographer for the front cover: Ya Zhang


24th Applied Statistics Symposium and 13th Graybill Conference

2015

CONFERENCE INFORMATION, PROGRAM AND ABSTRACTS

June 14 - 17, 2015

Colorado State University: Lory Student Center

Fort Collins, Colorado, USA

Organized by

International Chinese Statistical Association and Department of Statistics, Colorado State University

© 2015 International Chinese Statistical Association and Department of Statistics, Colorado State University


Professor Franklin A. Graybill

Department of Statistics Colorado State University

The following graduates of the Department of Statistics at Colorado State University completed their degrees under the guidance of Professor Franklin A. Graybill:

Mohamed H. Albohali (MS '79)
Robert A. Ahlbrandt (MS '87)
Carmen E. Arteaga (MS '80)
James H. Baylis (MS '77)
David C. Bowden (MS '65, PhD '68)
Brent D. Burch (MS '93, PhD '96)
James A. Calvin (PhD '85)
Terrence L. Connell (MS '63, PhD '66)
Ruth Ann Daniel (MS '80)
Ali Mashat Deeb (MS '81)
Richard M. Engeman (MS '75)
Rana S. Fayyad (PhD '95)
Mark J. Grassl (MS '80)
Rongde Gui (PhD '92)
Paul A. Hatab (MS '77)
William C. Heiny (MS '81)
Sakthivel Jeyaratnam (PhD '78)
Dallas E. Johnson (PhD '71)
Thomas A. Jones (MS '67)
Yongsang Ju (MS '92)
Adam Kahn (MS '78)
M. Kazem Kazempour (PhD '88)
Albert Kingman (PhD '69)
Stephen L. Kozarich (PhD '71)
Ricardo A. Leiva (MS '82)
Tai-Fang Chen Lu (MS '79, PhD '85)
Sandra Mader (MS '77)
Farooq Maqsood (MS '84)
Louise R. Meiman (MS '67)
Ronald R. Miller (MS '76)
George A. Milliken (MS '68, PhD '69)
Michael E. Mosier (PhD '92)
William B. Owen (PhD '65)
Antonio Reverter-Gomez (MS '94)
Robert C. Rounding (PhD '65)
Bhabesh Sen (PhD '88)
Jeanne Simpson (MS '78)
Syamala Srinivasan (MS '84, PhD '86)
R. Kirk Steinhorst (MS '69, PhD '71)
Naitee Ting (PhD '87)
N. Scott Urquhart (MS '63)
Antonia Wang (MS '82)
Chih-Ming (Jack) Wang (PhD '78)


Contents

Welcome
Conference Information
  Committees
  Acknowledgements
  Conference Venue Information
  Program Overview
  Keynote Lectures
  Graybill Plenary Lectures
  Leadership Forum Lectures
  Student Paper Awards
  Social Events
  Banquet Speaker
  Short Courses
  ICSA 2016 in Atlanta, GA
  ICSA 2016 China Statistics Conference
  ICSA Banquet at JSM 2015
  New Journal Announcement

Scientific Program
  Monday, June 15. 8:20 AM - 9:40 AM
  Monday, June 15. 10:00 AM - 11:40 AM
  Monday, June 15. 1:00 PM - 2:40 PM
  Monday, June 15. 3:00 PM - 4:40 PM
  Monday, June 15. 4:40 PM - 6:00 PM
  Tuesday, June 16. 8:40 AM - 9:40 AM
  Tuesday, June 16. 10:00 AM - 11:40 AM
  Tuesday, June 16. 1:00 PM - 2:40 PM
  Tuesday, June 16. 3:00 PM - 4:00 PM
  Wednesday, June 17. 8:40 AM - 10:20 AM
  Wednesday, June 17. 10:40 AM - 12:20 PM

Abstracts
  Session 1: Best Practices for Delivery of Adaptive Clinical Trials Illustrated with Case Studies
  Session 2: Chemistry, Manufacturing, and Controls (CMC) in Pharmaceuticals: Current Statistical Challenges I
  Session 3: Chemistry, Manufacturing, and Controls (CMC) in Pharmaceuticals: Current Statistical Challenges II
  Session 4: New Techniques for Functional and Longitudinal Data Analysis
  Session 5: Recent Advancements in Statistical Machine Learning
  Session 6: Recent Advances in Analyzing Genomic Data
  Session 7: Scalable Multivariate Statistical Learning with Massive Data
  Session 8: New Statistical Advance in Genomics and Health Science Applications
  Session 9: SII Special Invited Session on Modern Bayesian Statistics I
  Session 10: SII Special Invited Session on Modern Bayesian Statistics II
  Session 11: Emerging Issues in Time-to-Event Data
  Session 12: Taiwan National Health Database
  Session 13: Recent Advance in Longitudinal Data Analyses
  Session 15: Innovative Statistical Approaches in Nonclinical Research
  Session 16: Statistical Advances for Genetic Data Analysis



  Session 18: Statistical Methods for Sequencing Data Analysis
  Session 19: Recent Developments in the Theory and Applications of Spatial Statistics
  Session 20: Risk Prediction Modeling in Clinical Trials
  Session 21: The Application of Latent Variable and Mixture Models to the Biological Sciences
  Session 22: Clinical Trials with Multiple Objectives: Maximizing the Likelihood of Success
  Session 23: Issues Related to Subgroup Analysis in Confirmatory Clinical Trials: Challenges and Opportunities
  Session 24: Recent Developments in Missing Data Analysis
  Session 25: Spatial and Spatio-Temporal Modeling in Environmental and Ecological Studies
  Session 26: Challenges in Analyzing Complex Data Using Regression Modeling Approaches
  Session 27: Bayesian Applications in Biomedical Studies
  Session 28: Go/No Go Decision Criteria and Probability of Success in Pharmaceutical Drug Development
  Session 29: Machine Learning for Big Data Problems
  Session 30: Tensor-Structured Statistical Modelling and Inferences
  Session 31: Adaptive Designs for Early-Phase Oncology Clinical Trials
  Session 32: Recent Development on Next Generation Sequencing Based Data Analysis
  Session 33: Challenges of Quantile Regression in High-Dimensional Data Analysis: Theory and Applications
  Session 34: Recent Advances in Genomics
  Session 35: Novel Designs and Applications of Adaptive Randomization in Medical Research
  Session 36: Lifetime Data Analysis
  Session 37: Statistical Methods for Large Computer Experiments
  Session 38: New Approaches for Analyzing Time Series Data
  Session 39: Statistica Sinica Special Invited Session on Spatial and Temporal Data Analysis
  Session 40: Use of Biomarker and Genetic Data in Drug Development
  Session 41: New Frontier of Functional Data Analysis
  Session 42: New Methodology in Spatial and Spatio-Temporal Data Analysis
  Session 44: Funding Opportunities and Grant Applications
  Session 45: Advances and Case Studies for Multiplicity Issues in Clinical Trials
  Session 46: Recent Advances in Integrative Analysis of Omics Data
  Session 47: New Development in Nonparametric Methods and Big Data Analytics
  Session 48: Trends and Innovation in Missing Data Sensitivity Analyses
  Session 49: Multi-Regional Clinical Trial Design and Analysis
  Session 50: Biostatistics and Health Sciences
  Session 51: Recent Developments in Analyzing Censored Survival Data
  Session 52: Advances in Survey Statistics
  Session 53: Innovative Statistical Methods in Genomics and Genetics
  Session 54: Recent Development in Epigenetic Research
  Session 55: New Method Development for Survival Analysis
  Session 56: Recent Developments in Statistical Learning Methods
  Session 57: Recent Developments on High-Dimensional Inference in Biostatistics
  Session 58: Blinded and Unblinded Evaluation of Aggregate Safety Data during Clinical Development
  Session 59: Design and Analysis of Non-Inferiority Clinical Trials
  Session 60: Toward More Effective Identification of Biomarkers and Subgroups for Development of Tailored Therapies
  Session 61: Design and Analysis Issues in Clinical Trials
  Session 62: Statistical Challenges in Economic Research Involving Medical Costs
  Session 63: Adaptive Design and Sample Size Re-Estimation
  Session 64: Recent Development in Personalized Medicine and Survival Analysis
  Session 65: New Strategies to Identify Disease Associated Genomic Biomarkers
  Session 66: Recent Advances in Empirical Likelihood Method
  Session 67: New Advances in Adaptive Design and Analysis of Clinical Trials
  Session 68: Design and Analysis in Drug Combination Studies
  Session 69: Recent Developments in Empirical Likelihood Methodologies: Diagnostic Studies, Goodness-of-Fit Testing, and Missing Values
  Session 70: Use of Simulation in Drug Development and Decision Making
  Session 71: Next Generation Functional Data
  Session 73: Non-Parametrics and Semi-Parametrics: New Advances and Applications


  Session 74: Empirical Likelihoods for Analyzing Incomplete Data
  Session 75: Model Selection in Complex Data Settings
  Session 76: Advances in Statistical Methods of Identifying Subgroup in Clinical Studies
  Session 77: Recent Innovative Methodologies and Applications in Genetics & Pharmacogenomics (GpGx)
  Session 78: Analysis and Classification of High Dimensional Data
  Session 79: Recent Developments on Combining Inferences and Hierarchical Models
  Session 80: Recent Advances in Development and Evaluation of Predictive Biomarkers
  Session 81: What Are the Expected Professional Behaviors After Statistics Degrees
  Session 82: The Jiann-Ping Hsu Invited Session on Biostatistical and Regulatory Sciences
  Session 83: Dose Response/Finding Studies in Drug Development
  Session 84: Design More Efficient Adaptive Clinical Trials Using Biomarkers
  Session 85: Advances in Nonparametric and Semiparametric Statistics
  Session 86: Cutting-Edge New Tools for Statistical Analysis and Modeling
  Session 87: Advanced Methods for Graphical Models
  Session 88: Advanced Development in Big Data Analytics Tools
  Session 89: Recent Advances in Biostatistics
  Session 90: Adaptive Designs and Personalized Medicine
  Session 91: Recent Developments of High-Dimensional Data Inference and Its Applications
  Session 92: Issues in Probabilistic Models for Random Graphs
  Session 93: Negotiation Skills Critical for Statistical Career Development
  Session C01: Disease Models, Observational Studies, and High Dimensional Regression
  Session C02: Design and Analysis of Clinical Trials
  Session C03: Functional Data, Semi-parametric and Non-parametric Methods
  Session C04: Multiple Comparisons, Meta-analysis, and Mismeasured Outcome Data
  Session P01: Poster Session

Index of Authors


2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17 | 1

Welcome

2015 ICSA/Graybill Joint Conference

June 14-17, Fort Collins, Colorado, USA

Welcome to the 2015 Joint 24th International Chinese Statistical Association (ICSA) Applied Statistics Symposium and 13th Graybill Conference!

The Executive Committee of this Joint Conference has been working hard to put together a very strong program, including seven short courses, two plenary presentations, one Plenary Leadership Forum, 94 scientific sessions, and social events. The keynote speech will be delivered by Professor Susan Murphy (University of Michigan), and the Graybill Plenary speech will be presented by Professor Richard Davis (Columbia University). Three panelists (Professor Xiao-Li Meng representing academia, Dr. Greg Campbell representing government, and Dr. Janet Wittes representing industry/business) will participate in the Plenary Leadership Forum moderated by Dr. Wei Shen, the 2015 President of ICSA. The scientific program covers two broad areas of statistical application: topics relating to bio-pharmaceutical applications, and topics outside the bio-pharmaceutical domain. We hope this Joint Conference will provide abundant opportunities for you to engage, learn, and network. We also hope you will find inspiration to advance old research ideas and to develop new ones. We sincerely believe this will be a memorable and worthwhile learning experience for you.

Social events at this 2015 Joint Conference include a mixer (Sunday evening, June 14), a banquet (Tuesday, June 16; the banquet speaker will be Dr. Howard Wainer), and three excursion programs. If you come to Colorado, you won't want to miss the opportunity to enjoy the various activities below.

In June, Fort Collins has daily high temperatures that range from 74°F to 85°F, with evening temperatures in the low 50s. Nestled next to the foothills, it is home to Colorado State University and is only an hour away from Rocky Mountain National Park. Often called the "Napa Valley of Craft Beer," Fort Collins is home to a number of microbreweries that together produce one of the largest city volumes of craft-brewed beer in the country. Convenient access, clear water, challenging rapids, and beautiful scenery make the Cache la Poudre a rafter's paradise from May through September.

Thanks for coming to the 2015 ICSA/Graybill Joint Conference at Fort Collins, Colorado.

Naitee Ting, Chair, Executive Committee, 2015 ICSA/Graybill Joint Conference



Committee

Naitee Ting (Chair) Boehringer-Ingelheim Pharmaceuticals, Inc.

Scott Evans (Program Committee Chair) Harvard School of Public Health

Jim zumBrunnen (Local Committee Co-Chair) Colorado State University

Haonan Wang (Local Committee Co-Chair) Colorado State University

Yingqi Zhao (Treasurer) University of Wisconsin-Madison

Xiaoming Li (Fund Raising Chair) Gilead Pharmaceuticals

Local Committee
Jim zumBrunnen (Co-Chair) Colorado State University

Haonan Wang (Co-Chair) Colorado State University

Program Committee
Scott Evans (Chair) Harvard School of Public Health

Invited Session Committee
Greg Wei (Co-Chair) SynteractHCR

Jay Breidt (Co-Chair) Colorado State University

Chunming Zhang University of Wisconsin-Madison

Jun Zhu University of Wisconsin-Madison

Contributed Session Committee
Peng-Liang Zhao (Chair) Sanofi-aventis U.S. LLC.

Poster Session Committee
Jun Yan (Chair) University of Connecticut

Yu Cheng University of Pittsburgh

Haoda Fu Eli Lilly and Company

Rui Song North Carolina State University

Chengguang Wang Johns Hopkins University

Student Paper Award Committee
Shuangge Ma (Chair) Yale University

Gang Li University of California, Los Angeles

Tiejun Tong Hong Kong Baptist University

Ray Liu Takeda

Lili Yu Georgia Southern University

Richard McNally Covance Inc.

Short Courses Committee
Brian Wiens (Chair) Portola Pharmaceuticals

Ivan Chan Merck & Co.



Program Book Committee
Fang Yu (Chair) University of Nebraska Medical Center

Fang Qiu University of Nebraska Medical Center

Casey Blaser University of Nebraska Medical Center

Fei Jiang University of Nebraska Medical Center

Zhezhen Jin Columbia University

Tian Zheng Columbia University

Mengling Liu New York University

Proceeding Book Committee
Jianchang Lin (Chair) Takeda

Bushi Wang Boehringer-Ingelheim Pharmaceuticals, Inc.

Xiaowen Hu Colorado State University

Kun Chen University of Connecticut

Lan Huang U.S. Food and Drug Administration

Fund Raising Committee
Xiaoming Li (Chair) Gilead Pharmaceuticals

Guojun Yuan EMD Serono, Merck KGaA

Ranye Sun Bank of America

Jingyang Zhang Fred Hutchinson Cancer Research Center

Treasurers
Yingqi Zhao (Treasurer) University of Wisconsin-Madison

Wen Zhou (Assistant Treasurer) Colorado State University

Webmasters
Jim zumBrunnen (Conference Web) Colorado State University

Simon Gao (ICSA Web) BioPier Inc.


The 2015 ICSA/Graybill joint conference program committees gratefully acknowledge the invaluable and generous support of our sponsors and exhibitors.

Sponsors

Exhibitors

CRC Press — Taylor & Francis Group

The Lotus Group LLC

Springer Science & Business Media


Lory Student Center, Colorado State University

Floor Plans (2nd Floor)


Lory Student Center, Colorado State University

Floor Plans (3rd Floor)


Short Program

Sunday June 14, 2015

Time Room Session

7:30 AM – 5:00 PM 3rd floor Foyer Registration

8:00 AM – 5:00 PM 304 Short Course: Measurement error

8:00 AM – 5:00 PM 306 Short Course: Prevention and treatment of missing data: turning guidance into practice

8:00 AM – 5:00 PM 308 Short Course: Practical Bayesian computation

8:00 AM – 12:00 PM 310 Short Course: Graphical approaches to multiple test problems

8:00 AM – 12:00 PM 300 Short Course: Network based analysis of big data

9:45 AM – 10:15 AM Grand Ballroom C/D Break

12:00 PM – 1:00 PM LSC Food Court Lunch for Full-Day Course Attendees

1:00 PM – 5:00 PM 300 Short Course: Classification and regression trees and forests

1:00 PM – 5:00 PM 310 Short Course: Patient-Reported Outcomes: measurement, implementation and interpretation

2:45 PM – 3:15 PM Grand Ballroom C/D Break

6:00 PM – 8:30 PM Longs Peak ICSA Board Meeting (Dinner – Invited only)

7:00 PM – 9:00 PM Grand Ballroom C/D Opening Mixer

Monday June 15, 2015

8:00 AM – 5:00 PM 3rd floor Foyer Registration

8:20 AM – 8:40 AM Grand Ballroom A/B Welcome Naitee Ting, Conference Chair

Wei Shen, President ICSA

Dean Jan Nerger, College of Natural Sciences, CSU

8:40 AM – 9:40 AM Grand Ballroom A/B Keynote: Susan Murphy, University of Michigan

9:40 AM – 10:00 AM Grand Ballroom C/D Break

10:00 AM – 11:40 AM See Program Parallel sessions

11:40 AM – 1:00 PM Lunch on own

1:00 PM – 2:40 PM See Program Parallel sessions

2:40 PM – 3:00 PM Grand Ballroom C/D Break

3:00 PM – 4:40 PM See Program Parallel sessions

4:40 PM – 6:00 PM Grand Ballroom C/D Poster presenters

Tuesday June 16, 2015

8:00 AM – 5:00 PM 3rd floor Foyer Registration

8:40 AM – 9:40 AM Grand Ballroom A/B Graybill Plenary: Richard Davis, Columbia University

9:40 AM – 10:00 AM Grand Ballroom C/D Break

10:00 AM – 11:40 AM See Program Parallel sessions

11:40 AM – 1:00 PM Lunch on own

1:00 PM – 2:40 PM See Program Parallel sessions

2:40 PM – 3:00 PM Grand Ballroom C/D Break

3:00 PM – 4:00 PM Grand Ballroom A/B Leadership Forum

6:00 PM Grand Ballroom C/D Cash Bar

6:30 PM – 9:00 PM Grand Ballroom C/D Banquet (fee event)

Wednesday June 17, 2015

8:00 AM – 12:30 PM 3rd floor Foyer Registration

8:40 AM – 10:20 AM See Program Parallel sessions

10:20 AM – 10:40 AM Grand Ballroom C/D Break

10:40 AM – 12:20 PM See Program Parallel sessions

1:30 PM Excursion (fee event)


Keynote Speaker

Susan Murphy
H.E. Robbins Professor of Statistics & Professor of Psychiatry
University of Michigan

Susan Murphy's research focuses on improving sequential, individualized decision making in health, in particular on clinical trial design and data analysis to inform the development of adaptive interventions (e.g., treatment algorithms). She is a leading developer of the Sequential Multiple Assignment Randomized Trial (SMART) design, which has been and is being used by clinical researchers to develop adaptive interventions in depression, alcoholism, treatment of ADHD, substance abuse, HIV treatment, obesity, diabetes, and autism. She collaborates with clinical scientists, computer scientists, and engineers, and mentors young clinical scientists on developing adaptive interventions. Susan is currently working as part of several interdisciplinary teams to develop clinical trial designs and learning algorithms for settings in which patient information is collected in real time (e.g., via smart phones or other wearable devices) and thus sequences of interventions can be individualized online. She is a Fellow of the IMS, the ASA, and the College on Problems of Drug Dependence, a former editor of the Annals of Statistics, a member of the Institute of Medicine, and a 2013 MacArthur Fellow.

Title: Experimental Design, Data Analysis Methods for Mobile Interventions

Location and Time: Grand Ballroom A/B, Monday June 15, 8:40-9:40 AM

Abstract: Micro-randomized trials are trials in which individuals are randomized hundreds or thousands of times over the course of the study. The goal of these trials is to assess the impact of momentary interventions, i.e., interventions that are intended to impact behavior over small time intervals. A fast-growing area of mHealth concerns the use of mobile devices for collecting real-time data, for processing these data, and for providing momentary interventions. We discuss the design and analysis of these types of trials.
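The design described in the abstract can be illustrated with a toy simulation. This is not code from the talk: the participant count, number of decision points, randomization probability, and effect size below are all hypothetical, chosen only to show how per-decision-point randomization yields many randomizations per participant and a simple estimate of the momentary effect.

```python
import numpy as np

# Illustrative sketch of a micro-randomized trial: every participant is
# re-randomized at every decision point, so a single person contributes
# hundreds of randomizations over the study. All values are hypothetical.
rng = np.random.default_rng(0)

n_participants = 50
n_decision_points = 500      # e.g., 5 decision points per day over 100 days
p_treat = 0.5                # randomization probability at each point

# 1 = momentary intervention delivered, 0 = not delivered
treatment = rng.binomial(1, p_treat, size=(n_participants, n_decision_points))

# Hypothetical proximal outcome, measured shortly after each decision point:
# a small positive momentary effect (0.2) plus noise.
outcome = 0.2 * treatment + rng.normal(0, 1, size=treatment.shape)

# A crude momentary-effect estimate: difference in mean proximal outcome
# between randomized-on and randomized-off decision points.
effect = outcome[treatment == 1].mean() - outcome[treatment == 0].mean()
print(f"estimated momentary effect: {effect:.3f}")
```

With 25,000 person-decision-points, the crude difference in means recovers the simulated effect closely; real analyses use weighted, centered estimators to handle time-varying randomization probabilities.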


Graybill Plenary Speaker

Richard Davis
Chair, Howard Levene Professor of Statistics
Columbia University

Richard Davis is Chair and Howard Levene Professor of Statistics at Columbia University. He is currently president-elect of the Institute of Mathematical Statistics. He received his Ph.D. in Mathematics from the University of California at San Diego in 1979 and has held academic positions at MIT and Colorado State University, as well as visiting appointments at numerous other universities. Recently he was Hans Fischer Senior Fellow at the Technical University of Munich and Villum Kann Rasmussen Visiting Professor at the University of Copenhagen. Davis is a fellow of the Institute of Mathematical Statistics and the American Statistical Association, and is an elected member of the International Statistical Institute. He is co-author (with Peter Brockwell) of the bestselling books "Time Series: Theory and Methods" and "Introduction to Time Series and Forecasting," and of the time series analysis software package ITSM2000. Together with Torben Andersen, Jens-Peter Kreiss, and Thomas Mikosch, he co-edited the "Handbook of Financial Time Series." In 1998, he won (with collaborator W.T.M. Dunsmuir) the Koopmans Prize for Econometric Theory.

He has served on the editorial boards of major journals in probability and statistics, most recently as Editor-in-Chief of the Bernoulli Journal (2010-2012). He has advised or co-advised 31 PhD students and has presented short courses on time series and heavy-tailed modeling. His research interests include time series, applied probability, extreme value theory, and spatial-temporal modeling.

Title: Sparse Vector Autoregressive Modeling

Location and Time: Grand Ballroom A/B, Tuesday June 16, 8:40-9:40 AM

Abstract: The vector autoregressive (VAR) model has been widely used for modeling temporal dependence in a multivariate time series. For large (and even moderate) dimensions, the number of VAR parameters can be prohibitively large, resulting in noisy estimates and difficult-to-interpret temporal dependence. As a remedy, we propose a methodology for fitting sparse VAR models (sVAR) in which most of the autoregressive coefficients are set equal to zero. The first step in selecting the nonzero coefficients is based on an estimate of the partial squared coherency (PSC) together with the use of BIC. The PSC is useful for quantifying conditional relationships between marginal series in a multivariate time series. A refinement step is then applied to further reduce the number of parameters. The performance of this two-step procedure is illustrated with both simulated data and several real examples. The inclusion of a reduced-rank covariance estimator of the noise will also be discussed. (This is joint work with Pengfei Zang and Tian Zheng.)
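The two-stage structure described in the abstract (rank candidate coefficients, then decide how many to keep via BIC) can be sketched in a toy form. This is not the authors' method: their ranking uses the frequency-domain partial squared coherency, whereas the stand-in below ranks by the magnitude of ordinary least-squares estimates, purely to show the selection loop; the dimensions and parameter values are hypothetical.

```python
import numpy as np

# Toy sketch of a two-stage sparse VAR(1) fit: rank candidate coefficients,
# then choose how many to keep by BIC. Ranking by |OLS coefficient| is a
# crude stand-in for the PSC-based ranking described in the abstract.
rng = np.random.default_rng(1)

# Simulate a 4-dimensional VAR(1) with a sparse coefficient matrix A.
d, T = 4, 400
A = np.zeros((d, d))
A[0, 0], A[1, 1], A[2, 0], A[3, 3] = 0.6, 0.5, 0.4, -0.5
X = np.zeros((T, d))
for t in range(1, T):
    X[t] = A @ X[t - 1] + rng.normal(0, 1, d)

Y, Z = X[1:], X[:-1]                      # regress X_t on X_{t-1}
A_ols, *_ = np.linalg.lstsq(Z, Y, rcond=None)
A_ols = A_ols.T                           # full (dense) OLS estimate

# Stage 2: keep the k largest-magnitude coefficients; choose k by BIC.
order = np.argsort(-np.abs(A_ols), axis=None)   # flat indices, descending
n = Y.shape[0]
best_bic, best_A = np.inf, None
for k in range(d * d + 1):
    A_k = np.zeros(d * d)
    A_k[order[:k]] = A_ols.reshape(-1)[order[:k]]
    A_k = A_k.reshape(d, d)
    resid = Y - Z @ A_k.T
    # Gaussian log-likelihood up to constants, plus a BIC penalty on k
    bic = n * np.log(np.mean(resid ** 2)) + k * np.log(n)
    if bic < best_bic:
        best_bic, best_A = bic, A_k

print("nonzero coefficients kept:", int(np.count_nonzero(best_A)))
```

The sketch only mirrors the selection loop; the PSC-based first stage operates on the inverse spectral density and extends to higher lag orders, and the refinement stage described in the talk further prunes the model.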


Leadership Forum Panelists Gregory Campbell Director, Division of Biostatistics, Office of Surveillance and Biometrics, Center for Devices

and Radiological Health, FDA

Gregory Campbell is the Director of the Division of Biostatistics in the Office of Surveillance and

Biometrics (OSB) of Center for Devices and Radiological Health (CDRH) of the Food and Drug

Administration (FDA) since he came to FDA in 1995 from a tenured scientist positions at NIH. Dr.

Campbell leads a group of about 60 statisticians at the FDA that provides statistical support to CDRH

as a whole and, in particular, the statistical reviews of FDA’s pre-market device submissions. With

the help of statisticians in his Division, he pioneered the implementation in a regulatory environment

of Bayesian statistics (and more recently propensity scores and adaptive designs). He is an Associate

Editor for the journal Statistics in Biopharmaceutical Research. He has been the recipient of the FDA’s

Commendable Service Award, Award of Merit and Outstanding Service Award as well as the CDRH

Outstanding Scientific Award for Excellence in Analytical Science and the CDRH Diversity Award. For over ten years he has been a member of the Senior Biomedical Research Service in the Department of Health and Human Services, and he has been a Fellow of the American Statistical Association since 1998. He has served in leadership positions for the Eastern North American Region of the International

Biometric Society and on the Board of Directors of the Society for Clinical Trials and has been instrumental in the recent establishment

of the Medical Device and Diagnostics Section of the American Statistical Association. He gave a keynote address at ICSA Applied

Statistics Symposium in Indianapolis in 2010.

Xiao-Li Meng
Dean, Harvard University Graduate School of Arts and Sciences

Xiao-Li Meng, Dean of the Harvard University Graduate School of Arts and Sciences (GSAS),

Whipple V. N. Jones Professor and former chair of Statistics at Harvard, is well known for his depth

and breadth in research, his innovation and passion in pedagogy, and his vision and effectiveness in

administration, as well as for his engaging and entertaining style as a speaker and writer. Meng has

received numerous awards and honors for the more than 120 publications he has authored in at least a

dozen theoretical and methodological areas, as well as in areas of pedagogy and professional

development; he has delivered more than 400 research presentations and public speeches on these

topics, and he is the author of “The XL-Files," a regularly appearing column in the IMS (Institute of

Mathematical Statistics) Bulletin. His interests range from the theoretical foundations of statistical

inferences (e.g., the interplay among Bayesian, frequentist, and fiducial perspectives; quantifying ignorance via invariance principles; multi-phase and multi-resolution inferences) to statistical methods

and computation (e.g., posterior predictive p-value; EM algorithm; Markov chain Monte Carlo; bridge

and path sampling) to applications in natural, social, and medical sciences and engineering (e.g.,

complex statistical modeling in astronomy and astrophysics, assessing disparity in mental health services, and quantifying statistical

information in genetic studies). Meng received his BS in mathematics from Fudan University in 1982 and his PhD in statistics from

Harvard in 1990. He was on the faculty of the University of Chicago from 1991 to 2001 before returning to Harvard as Professor of

Statistics, where he was appointed department chair in 2004 and the Whipple V. N. Jones Professor in 2007. He was appointed GSAS

Dean on August 15, 2012.

Janet Wittes
President, Statistics Collaborative, Inc.

Janet Wittes, PhD, is President of Statistics Collaborative, Inc., which she founded in 1990. One of the

main activities of Statistics Collaborative is to serve as the statistical reporting group for independent

data monitoring committees. Previously, she was Chief, Biostatistics Research Branch, National Heart,

Lung, & Blood Institute (1983–89). The 2006 monograph “Statistical Monitoring of Clinical Trials – A Unified Approach,” which she co-authored with Proschan and Lan, deals with sequential trials. Her research has

focused on design of randomized clinical trials, capture-recapture methods in epidemiology, and

sample size recalculation. She has served on a variety of advisory committees and data monitoring

committees for government (NHLBI, the VA, and NCI) and industry. For the FDA, she has been a

regular member of the Circulatory Devices Advisory Panel and has served as an ad hoc member of

several other panels. Currently, she is a regular member of the Gene Therapy Advisory Committee.

She was formerly Editor-in-Chief of Controlled Clinical Trials (1994-98). She received her Ph.D. in

Statistics from Harvard University.


Student Award Winners

Jiann-Ping Hsu Pharmaceutical and Regulatory Sciences Student Paper Award

Yang Ni, Rice University
Title: Bayesian Nonlinear Model Selection for Gene Regulatory Networks
Session 82: The Jiann-Ping Hsu Invited Session on Biostatistical and Regulatory Sciences (Virginia Dale, level 3)
Time: Tuesday, June 16th, 10:00 AM - 11:40 AM

ASA Biopharmaceutical Awards

Qingning Zhou, University of Missouri
Title: A Sieve Semiparametric Maximum Likelihood Approach for Regression Analysis of Bivariate Interval-censored Failure Time Data
Session 64: Recent Development in Personalized Medicine and Survival Analysis (ASCSU Senate Chamber, Level 2)
Time: Monday, June 15th, 1:00 PM - 2:40 PM

Wei Ding, University of Michigan
Title: Composite Likelihood Approach in Gaussian Copula Regression Models with Missing Data
Session 24: Recent Developments in Missing Data Analysis (386, level 3)
Time: Wednesday, June 17th, 10:40 AM - 12:20 PM

ICSA Student Paper Awards

Yinfei Kong, University of Southern California
Title: Innovated Interaction Screening for High-Dimensional Nonlinear Classification
Session 7: Scalable Multivariate Statistical Learning with Massive Data (386, level 3)
Time: Monday, June 15th, 1:00 PM - 2:40 PM

Yuan Huang, Pennsylvania State University
Title: Projection Test for High-Dimensional Mean Vectors with Optimal Direction
Session 91: Recent Developments of High-Dimensional Data Inference and Its Applications (306, level 3)
Time: Wednesday, June 17th, 10:40 AM - 12:20 PM

Weichen Wang, Princeton University
Title: Projected Principal Component Analysis in Factor Models
Session 91: Recent Developments of High-Dimensional Data Inference and Its Applications (306, level 3)
Time: Wednesday, June 17th, 10:40 AM - 12:20 PM

Chongliang Luo, University of Connecticut
Title: Canonical Variate Regression
Session 27: Bayesian Applications in Biomedical Studies (310, level 3)
Time: Wednesday, June 17th, 10:40 AM - 12:20 PM

Yanping Liu, Temple University
Title: A New Approach to Multiple Testing of Grouped Hypotheses

Congratulations!


Social Events

Opening Mixer: Sunday, June 14, 7-9 PM, Grand Ballroom C/D

Cash Bar: Tuesday, June 16, 6 PM, Grand Ballroom C/D

Banquet: Tuesday, June 16, 6:30-9:00 PM, Grand Ballroom C/D

Excursions

Departure location and time: CSU/Hilton, Wednesday, June 17, 1:30 PM

Option 1 (5-hour tour): Rocky Mountain National Park
The excursion by bus includes admittance into Rocky Mountain National Park. Spectacular views are available along the highest continuous paved road in North America, which crests at the Continental Divide. The highest point on Trail Ridge Road is 12,183 feet.

Price: $50 (transportation and box lunch included)

Option 2 (3-hour tour): Brewery Tour in Fort Collins
The tour and tasting is at New Belgium Brewery, which is Fort Collins’ largest microbrewery. Fort Collins has been called the Napa Valley of Craft Beer, and is home to the most brewers and microbreweries per capita in all of Colorado.

Price: $25 (Transportation included)

Option 3 (5-hour tour): Rafting the Poudre River
This is whitewater rafting on Colorado’s only Wild & Scenic river, the Cache la Poudre. Our outfitter/guide will be Wanderlust Adventure of Fort Collins. Many fun and continuous rapids like Pinball, Roller Coaster, the Squeeze, Slideways, and Headless will be traversed.
Price: $95 (transportation, box lunch, paddle jackets, fleece pullovers, life jackets & helmets included) + $10-12 (optional, for wetsuit rental)


Banquet Speaker

Howard Wainer
Distinguished Research Scientist, National Board of Medical Examiners

Howard Wainer was born in Brooklyn, New York, on October 26, 1943. He

received a BS in mathematics from Rensselaer Polytechnic Institute in 1965

and an AM and PhD from Princeton in psychometrics in 1967 and 1968

respectively. He taught at The University of Chicago before moving to the

Bureau of Social Science Research during the Carter administration. He was

a Principal Research Scientist at ETS for 21 years before assuming his

current position as Distinguished Research Scientist at the National Board of

Medical Examiners. From 2003 until 2013 he was (adjunct) Professor of

Statistics at the Wharton School of the University of Pennsylvania. He has

published more than 400 articles and chapters in scholarly journals and

books; his 20th book, Medical Illuminations: Using Evidence, Visualization & Statistical Thinking to Improve Healthcare, was published by Oxford

University Press in 2014 and was a finalist for the Royal Society Winton

Book Prize. His next book, Truth or Truthiness: Distinguishing Fact from Fiction by Learning to Think Like a Data Scientist, will be published by Cambridge University Press next year.

He is a Fellow of the American Statistical Association and the American Educational Research Association and has been the recipient of numerous awards, including:

• ACT/AERA E. F. Lindquist Award for Outstanding Research in Testing & Measurement, 2015.

• American Educational Research Association Significant Contribution to Educational Measurement & Research Methodology Award, 2014.

• Psychometric Society Lifetime Achievement Award, 2013.

• The Samuel J. Messick Award for Distinguished Scientific Contributions from Division 5 of the American Psychological Association, 2009.

• Career Achievement Award for Contributions to Educational Measurement, National Council on Measurement in Education, April 2007.

• Award for Scientific Contribution to a Field of Educational Measurement for the development of Testlet Response Theory, National Council on Measurement in Education, April 2006.

• Senior Scientist Award, Educational Testing Service, 1990-1992.

Title: Pictures at an exhibition: Sixteen visual conversations about one thing

Time and Location: Grand Ballroom C/D, Tuesday, June 16, 6:30 PM

Abstract: In 1951 the famous graphic designer Will Burtin presented a graphic showing the efficacy of three

antibiotics in treating 16 bacteria. In this talk we will examine Burtin’s solution as well as 15 others. We will find that there are many paths to salvation, but that had the data been displayed differently, important discoveries could have been accelerated by decades.


Short Courses

1. Measurement Error

Presenter: John Buonaccorsi, Professor Emeritus, Dept. of Mathematics and Statistics, University of Massachusetts Amherst.

Email: [email protected]

Course length: One day

Outline/Description

Measurement error is ubiquitous and it is well known that the

inability to exactly measure predictors in regression problems

often leads to biased estimators and invalid inferences. This

problem has a long history in linear models but has seen an explosion of interest over the last twenty years as methods were extended both to handle more complex models and to address a number of problems that arise in practice. The methodology has been successfully and widely used across a wide range of disciplines, most notably (but certainly not limited to) epidemiology.

This course will present an introductory, and relatively applied,

look at measurement error in regression settings including

linear and nonlinear models, the latter including generalized

linear models and, more specifically, logistic regression. The goal

of the course is to introduce attendees to models used for

measurement error, the impacts of measurement error on so-called naive analyses, which ignore it, and to provide an extensive

overview of the myriad techniques available to correct for it,

along with associated inferences. We deal both with additive error, where the measurement error parameters are usually estimated through replication, and with non-additive error, where validation data (either internal or external)

is exploited to correct for measurement error. Detailed

examples will be provided from a variety of disciplines and,

although the course does not have a computer component

associated with it, an overview of available software and its use

will be presented. Time permitting, we will briefly discuss

measurement error in mixed/longitudinal models and time

series.
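The attenuation bias that a naive analysis suffers under additive measurement error is easy to demonstrate by simulation. The sketch below is not course material; it assumes the reliability ratio is known, whereas in practice it would be estimated (e.g., from replicates):

```python
import random

def ols_slope(x, y):
    # Simple-regression slope: Sxy / Sxx.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)
n, beta, sigma_u = 5000, 2.0, 1.0
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [beta * xt + rng.gauss(0, 0.5) for xt in x_true]
w = [xt + rng.gauss(0, sigma_u) for xt in x_true]   # error-prone predictor

naive = ols_slope(w, y)                   # attenuated toward zero (about beta/2 here)
reliability = 1.0 / (1.0 + sigma_u ** 2)  # var(x) / (var(x) + var(u)) = 0.5
corrected = naive / reliability           # classical reliability-ratio correction
```

With these settings the naive slope lands near 1.0 rather than the true 2.0, and dividing by the reliability ratio recovers it.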

Students should have some prior exposure to basic

mathematical statistics and have familiarity with regression

models, including seeing models and methods expressed in

matrix-vector form.

References

Buonaccorsi, J.P. (2010). Measurement Error: Models, Methods and Applications. Chapman & Hall/CRC.

About the presenter

John Buonaccorsi is Professor Emeritus of Mathematics and

Statistics at the University of Massachusetts-Amherst. He

received his M.S. and Ph.D. degrees from Colorado State

University and has been at the University of Massachusetts

since 1982. He was a long-time member of the University’s

Statistical Consulting Center and coordinator of the graduate

options in Statistics for many years. He is the author of over 70

articles and book chapters and is author of the 2010 book

“Measurement Error: Models, Methods and Applications”, part

of the Chapman-Hall series on interdisciplinary statistics. His

original research interests were in optimal experimental design,

estimation of ratios and calibration, followed by a focus on

measurement error, an area he has worked in for over 25 years.

He has also published extensively in various applied areas

including quantitative ecology, with a recent emphasis on

population dynamics. He has a long-standing collaboration with

colleagues at the University of Oslo Medical School addressing

measurement error methods in epidemiologic contexts.

2. Prevention and Treatment of Missing Data: Turning

Guidance into Practice

Presenters: Craig Mallinckrodt, Geert Molenberghs, Bohdana Ratitch, Lei Xu, et al.

Email: [email protected]

Course Length: One day

Outline/Description:

Recent research has fostered new guidance on preventing and

treating missing data in clinical trials. This short course is based

on work from the Drug Information Association’s Scientific

Working Group (DIASWG) on Missing Data. The first half-

day of the course begins with an overview of the research and

other background that fostered the new guidance, including a

brief history of the work by the National Research Council

Expert Panel on missing data that provided detailed advice to

FDA on the prevention and treatment of missing data. The first

half day will also distill common elements from recent guidance

into 3 pillars: 1) setting clear objectives; 2) minimizing missing

data; and 3) pre-specifying a sensible primary analysis and

appropriate sensitivity analyses. Specific means for putting the

guidance into action are proposed, including detailed coverage

of developing an overall analytic road map. In the second half-

day, specific software tools developed by the DIASWG to

implement the analytic road map will be demonstrated on an

example data set. Attendees will be provided with these

programs at no cost and encouraged to run the programs concurrently with the demonstration. Several DIASWG

members will be available to assist attendees in running the

programs.
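One widely used family of sensitivity analyses in this literature is delta adjustment (the "tipping point" idea): shift the imputed values for dropouts by progressively less favorable amounts until the treatment effect disappears. The DIASWG tools are SAS macros; the sketch below is only a language-neutral illustration with made-up numbers:

```python
import statistics

def delta_adjusted_differences(treat_obs, n_missing, control_mean, deltas):
    # For each delta, impute every missing treatment value as the observed
    # treatment mean shifted by delta, then recompute the treatment-minus-
    # control difference in means.
    obs_mean = statistics.mean(treat_obs)
    return {d: statistics.mean(treat_obs + [obs_mean + d] * n_missing) - control_mean
            for d in deltas}

# Hypothetical example: 4 completers and 4 dropouts on treatment.
diffs = delta_adjusted_differences([3.0, 2.0, 2.5, 2.5], 4, 2.0, [0.0, -1.0, -2.0])
```

In this toy case the apparent benefit of 0.5 vanishes once dropouts are assumed to do one unit worse than completers: that is the tipping point.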

Learning objectives

1) Understand the three pillars of preventing and treating

missing data, with emphasis on developing a complete

analytic road map that includes a sensible primary analysis

and appropriate sensitivity analyses.

2) Be able to apply the three-pillars principles to their own research.


3) Understand the theory behind key sensitivity analyses and

be able to run the macros developed by the DIASWG that will

be given free of charge to attendees.

About the presenters

Dr. Mallinckrodt received his PhD in 1993 from Colorado State

University, where he subsequently held a joint appointment in

the departments of statistics and clinical sciences. Craig joined

Lilly in 1998 and has extensive drug development experience

covering all four clinical phases in multiple therapeutic areas.

Dr. Mallinckrodt has published extensively on missing data. He

led the PhRMA expert team and currently leads the Drug

Information Association Scientific Working Group on missing

data. Dr. Mallinckrodt is a Fellow of the American Statistical

Association and recently won the Royal Statistical Society’s

award for excellence in the pharmaceutical industry for his

book titled A Practical Guide to the Prevention and Treatment

of Missing Data.

Dr. Geert Molenberghs is Professor of Biostatistics at the

Universiteit Hasselt and Katholieke Universiteit Leuven in

Belgium. He received the B.S. degree in mathematics (1988)

and a Ph.D. in biostatistics (1993) from the Universiteit

Antwerpen. Dr. Molenberghs has published methodological work

on surrogate markers in clinical trials, categorical data,

longitudinal data analysis, and on the analysis of non-response

in clinical and epidemiological studies. He served as Joint

Editor for Applied Statistics (2001-2004), Co-editor for Biometrics (2007-2009), and as President of the International Biometric Society (2004-2005). He currently is Co-editor for Biostatistics (2010-). He was elected Fellow of the American

Statistical Association and received the Guy Medal in Bronze

from the Royal Statistical Society. He has held visiting

positions at the Harvard School of Public Health (Boston, MA).

He is founding director of the Center for Statistics at Hasselt

University and currently the director of the Interuniversity

Institute for Biostatistics and statistical Bioinformatics, I-

BioStat, a joint initiative of the Hasselt and Leuven universities.

Geert Molenberghs and Geert Verbeke are editor and author of

several books on longitudinal data analysis, possibly subject to

missingness (Springer Lecture Notes 1997, Springer Series in

Statistics 2000, Springer Series in Statistics 2005, Chapman

Hall/CRC 2007), and they have taught well over a hundred

short and longer courses on the topic in universities as well as

industry, in Europe, North America, Latin America, and

Australia. Geert Verbeke and Geert Molenberghs received

several Excellence in Continuing Education awards for courses

offered at the Joint Statistical Meetings.

Dr. Bohdana Ratitch is Senior Statistical Scientist at Quintiles.

Bohdana received a Ph.D. degree in computer science/statistical

learning from McGill University in Montreal, Canada in 2005.

Bohdana has been working in biostatistics and clinical trials for

over 10 years. Missing data in clinical trials is one of her areas

of special interest and she is working actively to advance and

promote knowledge in this field in the clinical research community. She is a member of the DIA Scientific Working

Group on Missing Data and a co-author of the book “Clinical

Trials with Missing Data: A Guide for Practitioners” (Wiley,

2014).

Dr. Lei Xu is Principal Biostatistician at Biogen. Lei received

his Ph.D. degree in Statistics from the University of Wisconsin-Madison and has 8 years of experience in late-phase drug development. He is a member of both ICSA and ASA. He previously served as missing data Hub leader at Eli Lilly and has actively served in the Drug Information Association Scientific Working Group on missing data since 2012.

3. Practical Bayesian Computation

Presenters: Fang Chen and Bob Lucas, SAS Institute, Inc.

Email: [email protected]

Course length: One Day

Outline/Description

This one-day course reviews the basic concepts of Bayesian

inference and focuses on the practical use of Bayesian

computational methods. The objectives are to familiarize

statistical programmers and practitioners with the essentials of

Bayesian computing, and to equip them with computational

tools through a series of worked-out examples that demonstrate

sound practices for a variety of statistical models and Bayesian

concepts.

The first part of the course will review differences between

classical and Bayesian approaches to inference, fundamentals

of prior distributions, and concepts in estimation. The course

will also cover MCMC methods and related simulation

techniques, emphasizing the interpretation of convergence

diagnostics in practice. The rest of the course will take a topic-driven approach that introduces Bayesian simulation and analysis, and illustrates the Bayesian treatment of a wide range of

statistical models using software with code explained in detail.

The course will present major application areas and case

studies, including multi-level hierarchical models, multivariate

analysis, non-linear models, meta-analysis, latent variable

models, and survival models. Special topics that are discussed

include Monte Carlo simulation, sensitivity analysis, missing

data, model assessment and selection, variable subset selection,

and prediction. The examples will be done using SAS (PROC

MCMC), with a strong focus on technical details.

Attendees should have a background equivalent to an M.S. in

applied statistics. Previous exposure to Bayesian methods is

useful but not required. Familiarity with material at the level of

this textbook is appropriate: Probability and Statistics (Addison-Wesley) by DeGroot and Schervish.
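The course's examples use SAS (PROC MCMC); as a language-neutral reminder of what such samplers do under the hood, here is a minimal random-walk Metropolis sketch for a normal-mean posterior (all settings, including the step size and data, are illustrative):

```python
import math
import random

def metropolis(log_post, x0, n_iter=20000, step=1.0, seed=42):
    # Random-walk Metropolis: propose x' = x + N(0, step^2) and accept
    # with probability min(1, post(x') / post(x)).
    rng = random.Random(seed)
    x, lp = x0, log_post(x0)
    draws = []
    for _ in range(n_iter):
        prop = x + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        if math.log(rng.random() + 1e-300) < lp_prop - lp:
            x, lp = prop, lp_prop
        draws.append(x)
    return draws

# N(0, 10^2) prior on mu; data y_i ~ N(mu, 1).
y = [1.2, 0.8, 1.5, 1.1]
def log_post(mu):
    return -mu ** 2 / 200.0 - sum((yi - mu) ** 2 for yi in y) / 2.0

draws = metropolis(log_post, 0.0)
post_mean = sum(draws[5000:]) / len(draws[5000:])  # discard burn-in
```

Because this model is conjugate, the sampler can be checked against the analytic posterior mean (about 1.15 here), which is exactly the kind of validation exercise convergence diagnostics complement.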

About the presenter

Fang Chen (PhD in Statistics from Carnegie Mellon University

in 2004) is Senior Manager of Bayesian Statistical Modeling in


Advanced Analytics Division at SAS Institute Inc. Among his

responsibilities are the development of Bayesian analysis

software and the MCMC procedure. He has written about

Bayesian modeling using the MCMC procedure and taught

courses and tutorials on practical Bayesian computation.

4. Graphical Approaches to Multiple Test Problems

Presenter: Dong Xi, Statistical Methodologist, Novartis.

Email: [email protected]

Course length: Half Day

Outline/Description

Methods for addressing multiplicity are becoming increasingly

more important in clinical trials and other applications.

Examples of such study objectives include investigation of

multiple doses or regimens of a new treatment, multiple

endpoints, subgroup analyses or any combination of these. This

short course will provide a practical guidance on how to

construct multiple testing procedures (MTPs) for such

hypotheses with an emphasis on graphical approaches.

Course outline

1. Introduction to multiple testing procedures

In the first part of this course, we will introduce the concept of

multiplicity and its impact on scientific research. To deal with

multiplicity issues, we will discuss basic concepts of MTPs

including the error rate, adjusted p-values and single-step and

stepwise procedures. Common MTPs such as Bonferroni,

Holm, Hochberg and Dunnett will be introduced and compared.

We will describe the closure principle and closed testing

procedures as an important way to construct MTPs.
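As a small concrete anchor for the step-down idea, Holm-adjusted p-values can be computed in a few lines (a generic sketch, not course code):

```python
def holm_adjust(pvals):
    # Holm step-down adjusted p-values: the running maximum of
    # (m - rank) * p over the p-values in increasing order, capped at 1.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adj, running = [0.0] * m, 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * pvals[i]))
        adj[i] = running
    return adj

adjusted = holm_adjust([0.01, 0.04, 0.03])  # compare each to the alpha level
```

Here the smallest p-value is multiplied by 3, the next by 2, and so on, with monotonicity enforced; each adjusted value can then be compared directly to alpha.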

2. Graphical approaches to multiple testing

In the second part of the course, we will focus on graphical

approaches that can be applied to common multiple test

problems. Using graphical approaches, one can easily construct

and explore different test strategies and thus tailor the test

procedure to the given study objectives. The resulting multiple test procedures are represented by directed, weighted graphs, where each node corresponds to an elementary hypothesis, together with a simple algorithm that updates the graph as the individual hypotheses are sequentially tested. We also present

sequentially testing the individual hypotheses. We also present

one case study to illustrate how the approach can be used in

clinical practice. The presented methods will be illustrated

using the graphical user interface from the gMCP package in R,

which is freely available on CRAN.
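The sequential test-and-update algorithm behind these graphs is compact enough to sketch. The gMCP package implements it properly; the version below is an illustrative re-implementation only:

```python
def graphical_mtp(p, w, g, alpha=0.05):
    # p: raw p-values; w: initial local weights (sum <= 1);
    # g[j][l]: fraction of H_j's weight passed to H_l when H_j is rejected.
    m = len(p)
    active, rejected = set(range(m)), set()
    while True:
        j = next((i for i in active if w[i] > 0 and p[i] <= w[i] * alpha), None)
        if j is None:
            return rejected
        active.discard(j)
        rejected.add(j)
        for l in active:                      # propagate H_j's weight
            w[l] += w[j] * g[j][l]
        g_new = [row[:] for row in g]
        for l in active:                      # rewire the remaining graph
            for k in active:
                if l == k:
                    continue
                denom = 1.0 - g[l][j] * g[j][l]
                g_new[l][k] = (g[l][k] + g[l][j] * g[j][k]) / denom if denom > 0 else 0.0
        g, w[j] = g_new, 0.0

# Two hypotheses, equal weights, full weight transfer (this graph is Holm):
rej = graphical_mtp([0.01, 0.04], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
```

With p-values 0.01 and 0.04 at alpha = 0.05, the first hypothesis is rejected at level 0.025, its weight passes to the second, and the second is then rejected at the full 0.05.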

Textbook/References

• Bretz, F., Hothorn, T., and Westfall, P. (2010) Multiple

Comparisons with R. Chapman and Hall, Boca Raton.

• Bretz, F., Maurer, W., and Maca, J. (2014). Graphical approaches to multiple testing. In Young, W. and Chen, D. (eds.), Clinical Trial Biostatistics and Biopharmaceutical Applications. Taylor & Francis.

• Dmitrienko, A., Tamhane, A. C. and Bretz, F. (Eds.) (2009)

Multiple Testing Problems in Pharmaceutical Statistics.

Chapman & Hall/CRC Biostatistics Series, Boca Raton.

About the presenter

Dong Xi is a statistical methodologist in the Statistical

Methodology and Consulting Center at Novartis

Pharmaceuticals Corporation. He received his Ph.D. in

Statistics from Northwestern University before joining

Novartis. He has been supporting the design and analysis of

clinical trials across different therapeutic areas. His research

interests include multiplicity issues, dose finding, and missing data.

5. Network-Based Analysis of Big Data

Presenter: Shuangge (Steven) Ma, Yale University.

Email: [email protected]

Course Length: Half day

Outline/Description

With the fast development in data collection and storage

techniques, big data are now routinely encountered in

biomedicine, engineering, social science, and many other

scientific fields. In many of the existing analyses, the

interconnections among functional units have not been

sufficiently accounted for, leading to a loss of efficiency or even

failures of many statistical models. Recently, network-based

analysis has emerged as an effective analysis tool for modeling

big data. In this short course, we will survey the newly

developed network-based analysis methods for big data, with

an emphasis on methodological development and applications.

Topics to be covered will include:

• Background of network analysis, including motivating examples from biomedicine and social science.

• A brief survey of network construction methods. Experiment-based, statistical, and hybrid methods will be introduced, along with network construction algorithms, their rationale, and software implementation.

• Incorporating network information in statistical modeling. With big data, the two main analysis paradigms are marginal analysis and joint analysis. For each paradigm, we will introduce multiple recently developed statistical methods, their rationale, and software implementation. Demonstration examples from biomedicine will be provided, showing the practical impact of network analysis.


• Network analysis of samples. A representative example is social networks; another is the recently proposed concept of the “human disease network”.
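Among the statistical construction methods, one of the simplest is to threshold partial correlations read off the inverse sample covariance matrix. The sketch below is illustrative only; practical versions use regularized estimators such as the graphical lasso:

```python
import numpy as np

def partial_correlation_network(X, threshold=0.3):
    # Build a network by thresholding partial correlations, obtained
    # from the precision (inverse sample covariance) matrix.
    prec = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(prec))
    pcor = -prec / np.outer(d, d)
    np.fill_diagonal(pcor, 1.0)
    adj = (np.abs(pcor) > threshold) & ~np.eye(len(d), dtype=bool)
    return pcor, adj

# Toy chain x0 -> x1 -> x2: x0 and x2 are linked only through x1.
rng = np.random.default_rng(0)
x0 = rng.normal(size=2000)
x1 = x0 + rng.normal(size=2000)
x2 = x1 + rng.normal(size=2000)
pcor, adj = partial_correlation_network(np.column_stack([x0, x1, x2]))
```

On this chain the estimated network recovers the two true edges (0-1 and 1-2) while the marginal correlation between the end nodes, which is substantial, does not survive conditioning on the middle node.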

We will introduce concepts and analysis methods and show data

analysis examples. After taking the course, attendees are expected to have a good understanding of (a) the “big picture” of analyzing big data using network-based methods, (b) a set of recently proposed methods, and (c) their software implementation. The demonstration examples will be drawn from multiple scientific fields and are expected to be closely related to the daily practice of the audience.

Intended audience will include researchers from academia,

pharmaceutical companies, consulting firms, and government

agencies as well as advanced graduate students. Prerequisites:

master-level training in statistics or a related field; generic knowledge of big data; knowledge of statistical software, especially R, will be a plus but is not required.

About the presenter

Shuangge (Steven) Ma, who received his PhD in Statistics from the University of Wisconsin in 2004, is an associate professor in biostatistics at

Yale University. His research interests include analysis of

high-dimensional data, genetic epidemiology, cancer studies,

and health economics. He has published two books and over

100 journal articles. He has served as associate editor of

multiple journals. He is an elected member of ISI and fellow

of ASA.

6. Classification and Regression Trees and Forests

Presenter: Wei-Yin Loh, University of Wisconsin

Email: [email protected]

Course length: Half day

Outline/Description

It has been more than 50 years since AID (Morgan and Sonquist, 1963) and more than 30 years since CART (Breiman et al., 1984) appeared. Rapidly

increasing use of trees among practitioners has led to great

advances in algorithmic research over the last two

decades. Modern tree models have higher prediction accuracy

and do not have selection bias. They can fit linear models in

the nodes using GLM, quantile, and other loss functions;

response variables may be multivariate, longitudinal, or

censored; and classification trees can employ linear splits and

fit kernel and nearest-neighbor node models.
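At the heart of all of these algorithms is the search for a good split. For a regression tree under squared-error loss, a single-variable exhaustive split search reduces to a few lines (a didactic sketch; real implementations use sorted order statistics and incremental sums for speed):

```python
def best_split(x, y):
    # Return the threshold on x minimizing the total within-node
    # sum of squared errors of y (the CART regression criterion).
    pairs = sorted(zip(x, y))
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    best_sse, best_thr = float("inf"), None
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        sse = sum((v - sum(left) / len(left)) ** 2 for v in left) \
            + sum((v - sum(right) / len(right)) ** 2 for v in right)
        if sse < best_sse:
            best_sse, best_thr = sse, (xs[i - 1] + xs[i]) / 2
    return best_thr

thr = best_split([1, 2, 3, 10, 11, 12], [0, 0, 0, 5, 5, 5])  # splits at 6.5
```

Growing a tree applies this search recursively to each resulting node; the algorithms covered in the course differ mainly in how the split variable is chosen (which is where selection bias arises) and in the node models fitted.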

The course begins with examples to compare tree and

traditional models. Then it reviews the major algorithms,

including AID, CART, C4.5, CHAID, CRUISE, CTREE,

GUIDE, M5, MOB, and QUEST. Real data are used to illustrate

the features of each, and results on prediction accuracy and

model complexity versus forests and some machine learning

methods are presented. Examples are drawn from business,

science, and industry, and include applications to subgroup

identification for personalized medicine, missing value

imputation in surveys, and differential item functioning in

educational testing. Relevant software is mentioned where

appropriate. Attendees should be familiar with multivariate

analysis at the level of Johnson and Wichern's "Applied

Multivariate Statistical Analysis."

Course Outline:

The target audience is statistical researchers and practitioners

from academia, business, government, and industry. The

course is particularly useful for people who routinely analyze

large and complex datasets and who want to know the latest

advances in algorithms and software for classification and

regression tree methods.

About the presenter

Wei-Yin Loh is Professor of Statistics at the University of

Wisconsin, Madison. He has been developing algorithms for

classification and regression trees for thirty years. He is the co-

author (with his students) of the FACT, QUEST, CRUISE, and

LOTUS algorithms and the author of GUIDE

(www.stat.wisc.edu/~loh/guide.html). Versions of his QUEST

algorithm are implemented in IBM SPSS and Dell Statistica.

Dr. Loh is a fellow of the American Statistical Association and

the Institute of Mathematical Statistics and a consultant to

government and industry. He is a recipient of the Benjamin

Reynolds Award for teaching, the U.S. Army Wilks Award for

statistics research and application, and an Outstanding Science

Alumni Award from the National University of Singapore. He

has supervised the thesis research of twenty-nine PhDs to date.

7. Patient-Reported Outcomes: Measurement,

Implementation and Interpretation

Presenter: Joseph C. Cappelleri, Pfizer, Inc.

Email: [email protected]

Course length: Half day

Outline/Description:

This half-day short course provides an exposition on health measurement scales – specifically, on patient-reported outcomes – based on the instructor's co-authored book. Some key elements in the development of a patient-reported outcome (PRO) instrument are noted. Highlighted here is the importance of the conceptual framework used to depict the relationship between items in a PRO instrument and the concepts measured by it. The core topics of validity and reliability are discussed. Validity, assessed in several ways, provides evidence on the extent to which the PRO taps into the concept it purports to measure in a particular setting. Reliability of a PRO instrument involves its consistency or reproducibility, as assessed by internal consistency and test-retest reliability. Exploratory factor analysis and confirmatory factor analysis are described as techniques to understand the underlying structure of a PRO measure with multiple items.

While most of the presentation centers on psychometrics from a classical test theory perspective, attention is also given to item response theory as an approach to scale development and evaluation. Cross-sectional and longitudinal analyses of PRO scores are covered. Also covered is mediation modeling as a way to identify and explain the mechanism that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third explanatory variable, known as a mediator variable. Ways of handling missing data for PRO measures are highlighted, as is the topic of multiple testing. Finally, approaches to interpreting PRO results are elucidated in order to make these results useful and meaningful. Illustrations are provided mainly through real-life examples and also through simulated examples using SAS.
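As a concrete illustration of the mediation idea sketched above (an illustrative sketch on simulated data, not material from the course; all variable names are hypothetical), the product-of-coefficients method fits two regressions and multiplies the X→M path by the M→Y path:

```python
import numpy as np

# Hedged sketch of simple mediation analysis (product-of-coefficients method).
# Simulated data with hypothetical names: x (independent), m (mediator), y (outcome).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)                       # independent variable
m = 0.5 * x + rng.normal(size=n)             # mediator: true path a = 0.5
y = 0.7 * m + 0.2 * x + rng.normal(size=n)   # outcome: b = 0.7, direct effect c' = 0.2

def ols(design, response):
    """Least-squares slope coefficients for a design matrix plus intercept."""
    X = np.column_stack([np.ones(len(response)), design])
    beta, *_ = np.linalg.lstsq(X, response, rcond=None)
    return beta[1:]  # drop the intercept

a_hat = ols(x[:, None], m)[0]                    # path X -> M
b_hat, c_hat = ols(np.column_stack([m, x]), y)   # paths M -> Y and direct X -> Y
indirect = a_hat * b_hat                         # mediated (indirect) effect, near 0.35
total = ols(x[:, None], y)[0]                    # total effect of X on Y

# For linear OLS models the decomposition total = indirect + direct holds exactly.
print(indirect, c_hat, total)
```

In this linear setting the total effect decomposes exactly into the indirect effect a·b plus the direct effect c', which is the identity the course's mediation discussion builds on; inference on the indirect effect (e.g., by bootstrap) is a separate step not shown here.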

Textbook/References:

Cappelleri JC, Zou KH, Bushmakin AG, Alvir JMJ, Alemayehu D, Symonds T. Patient-Reported Outcomes: Measurement, Implementation and Interpretation. Boca Raton, FL: Chapman & Hall/CRC Press; 2013.

Cappelleri JC, Bushmakin AG. Interpretation of patient-reported outcomes. Statistical Methods in Medical Research. 2014; 23:460-483.

Cappelleri JC, Althof SE, Siegel RL, Shpilsky A, Bell SS, Duttagupta S. Development and validation of the Self-Esteem And Relationship (SEAR) questionnaire in erectile dysfunction. International Journal of Impotence Research. 2004; 16:30-38.

Cella D, Li JZ, Cappelleri JC, Bushmakin A, Charbonneau C, Kim ST, Chen I, Michaelson MD, Motzer RJ. Quality of life in patients with metastatic renal cell carcinoma treated with sunitinib versus interferon-alfa: Results from a phase III randomized trial. Journal of Clinical Oncology. 2008; 26:3763-3769.

Fairclough DL. Design and Analysis of Quality of Life Studies in Clinical Trials. 2nd edition. Boca Raton, FL: Chapman & Hall/CRC Press; 2010.

Food and Drug Administration (FDA). Guidance for industry on patient-reported outcome measures: Use in medical product development to support labeling claims. Federal Register. 2009; 74(235):65132–65133. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM193282.pdf.

Patrick DL, Burke LB, Gwaltney CH, Kline Leidy N, Martin ML, Molsen E, Ring L. Content validity—Establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: Part 1—Eliciting concepts for a new PRO instrument. Value in Health. 2011; 14:967–977.

Patrick DL, Burke LB, Gwaltney CH, Kline Leidy N, Martin ML, Molsen E, Ring L. Content validity—Establishing and reporting the evidence in newly developed patient-reported outcomes (PRO) instruments for medical product evaluation: ISPOR PRO good research practices task force report: Part 2—Assessing respondent understanding. Value in Health. 2011; 14:978–988.

Russell IJ, Crofford LJ, Leon T, Cappelleri JC, Bushmakin AG, Whalen E, Barrett JA, Sadosky A. The effects of pregabalin on sleep disturbance symptoms among individuals with fibromyalgia syndrome. Sleep Medicine. 2009; 10:604-610.

About the presenter:

Joseph C. Cappelleri earned his M.S. in statistics from the City University of New York (Baruch College), Ph.D. in psychometrics from Cornell University, and M.P.H. in epidemiology from Harvard University. In June 1996, Joe joined Pfizer Inc as a statistical scientist collaborating with Outcomes Research; he is now a senior director of biostatistics at Pfizer. He is also an adjunct professor of biostatistics at Brown University, adjunct professor of statistics at the University of Connecticut, and adjunct professor of medicine at Tufts Medical Center. A Fellow of the American Statistical Association and chair of its Health Policy Statistics Section, Joe has delivered numerous conference presentations and has published extensively on clinical and methodological topics, including regression-discontinuity designs, meta-analysis, and health measurement scales. He is the lead author of the book "Patient-Reported Outcomes: Measurement, Implementation and Interpretation."

25th ICSA Applied Statistics Symposium

June 12-15, 2016, Hyatt Regency Hotel, 265 Peachtree Street NE, Atlanta, GA 30303

Keynote Speakers: Bin Yu, University of California, Berkeley; David Madigan, Columbia University

For inquiries, please email Dr. Yichuan Zhao at [email protected]


The 10th ICSA International Conference

Shanghai, China, December 19-22, 2016

The 10th ICSA International Conference will be held at the Xuhui campus of Shanghai Jiao Tong University (SJTU), Shanghai, China, during December 19-22, 2016. The theme of this conference is to promote the global growth of modern statistics in the 21st century. The purpose of this conference is to bring statisticians from all over the world to Shanghai, China, which is the financial, trade, information and shipping center of China, to share cutting-edge research, discuss emerging issues in the field of modern probability and statistics with novel applications, and network with colleagues from all parts of the world.

James O. Berger of Duke University, Tony Cai of the University of Pennsylvania, Kai-Tai Fang of Beijing Normal University – Hong Kong Baptist University United International College (UIC), Zhi-Ming Ma of the Academy of Mathematics and Systems Science, CAS, Marc A. Suchard of the UCLA Fielding School of Public Health and David Geffen School of Medicine at UCLA, Lee-Jen Wei of Harvard University, and C. F. Jeff Wu of Georgia Institute of Technology will deliver keynote presentations. There will be a special session in honor of the recipient(s) of the second Pao-Lu Hsu Award. In addition, there will be ample invited and contributed sessions. All participants, including invited speakers, are responsible for paying registration fees and for booking hotel rooms directly with the hotels listed on the conference website.

The scientific program committee of the 2016 ICSA International Conference, co-chaired by Ming-Hui Chen of the University of Connecticut, Zhi Geng of Peking University, and Gang Li of the University of California at Los Angeles, welcomes the submission of invited session proposals. The deadline for submitting invited session proposals is May 1, 2016. Invited session proposals should be sent via email to Ming-Hui Chen at [email protected]. For conference logistics, please directly contact Dong Han and Weidong Liu, the co-chairs of the local organizing committee. All inquiries should be sent to Ms. Limin Qin at [email protected]. Please visit the conference website http://www.math.sjtu.edu.cn/conference/2016icsa/ for more detailed information.

All are welcome to participate in this important ICSA conference and to visit Shanghai, one of the most beautiful and historic cities in the world, during December 19-22, 2016.


ICSA Banquet at JSM 2015

Announcement: ICSA Annual Members Banquet at JSM 2015

The ICSA will hold a banquet on August 12, 2015, at 7:30pm, at the China Harbor Restaurant, 2040 Westlake Avenue North, Seattle, WA 98109 (206-286-1688, http://www.chinaharborseattle.com/). China Harbor is a family-owned restaurant on the shores of Lake Union that offers the best in authentic Chinese cuisine. Charter buses will be provided to transport ICSA members from the Convention Center to the restaurant. The banquet menu will include:

Appetizer Plate (海景四拼盘) / Seafood Hot and Sour Soup (海鲜酸辣汤) / Honey Walnut Prawns (核桃虾) / General Tso's Chicken (左宗鸡) / Sweet & Sour Pork (甜酸肉) / Broccoli Beef (芥兰牛) / Mixed Vegetables (素什锦) / Fu Chow Seafood Fried Rice (福州海鲜烩饭) / Steamed Fish Filet in Wine Sauce (清蒸鱼片). Complimentary soda and tea will be served, as well as seasonal fresh fruit. A full-service bar will be available, and the restaurant has live music and dance.

Ying Q. Chen ([email protected])

CONTEMPORARY CLINICAL TRIALS COMMUNICATIONS

www.elsevier.com/locate/issn/24518654

EDITORS: Dr. Zhezhen Jin, Columbia University, New York, NY, USA; Dr. Zheng Su, Deerfield Institute, New York, NY, USA

A NEW OPEN ACCESS SISTER JOURNAL OF CONTEMPORARY CLINICAL TRIALS, ACCEPTING RESEARCH ON BOTH RANDOMIZED AND NON-RANDOMIZED TRIALS

AIMS AND SCOPE

Contemporary Clinical Trials Communications is an international peer-reviewed open access journal that publishes articles pertaining to all aspects of clinical trials, including, but not limited to, design, conduct, analysis, regulation and ethics. Manuscripts submitted should appeal to a readership drawn from a wide range of disciplines including medicine, life science, pharmaceutical science, biostatistics, epidemiology, computer science, management science, behavioral science, and bioethics.

Contemporary Clinical Trials Communications is unique in that it is outside the confines of disease specifications, and it strives to increase the transparency of medical research and reduce publication bias by publishing scientifically valid original research findings irrespective of their perceived importance, significance or impact. Both randomized and non-randomized trials are within the scope of the Journal. Some common topics include trial design rationale and methods, operational methodologies and challenges, and positive and negative trial results. In addition to original research, the Journal also welcomes other types of communications including, but not limited to, methodology reviews, perspectives and discussions. Through timely dissemination of advances in clinical trials, the goal of Contemporary Clinical Trials Communications is to serve as a platform to enhance the communication and collaboration within the global clinical trials community that ultimately advances this field of research for the benefit of patients.

TO SUBMIT A MANUSCRIPT AND FOR MORE INFORMATION, VISIT www.elsevier.com/locate/issn/24518654

EDITORS-IN-CHIEF: Zhezhen Jin, Columbia University, New York, NY, USA; Zheng Su, Deerfield Institute, New York, NY, USA

ASSOCIATE EDITORS: Julian Abrams, Columbia University, New York, NY, USA; Vance Berger, National Institutes of Health, Rockville, MD, USA; Cindy Cooper, University of Sheffield, Sheffield, UK; Brian Everitt, King's College London, London, UK; Luis Garcia-Ortiz, University of Salamanca, Salamanca, Spain; Pei He, Genentech Inc, South San Francisco, CA, USA; Li-Shan Huang, National Tsing Hua University, Hsinchu City, Taiwan; Yunzhi Lin, AbbVie, North Chicago, IL, USA; Xiaolong Luo, Celgene Corporation, Summit, NJ, USA; Prakash Satwani, Columbia University, New York, NY, USA; Consolato Sergi, University of Alberta, Edmonton, AB, Canada; Yu Shen, The University of Texas, Houston, TX, USA; Say Beng Tan, National University of Singapore, Singapore; Corrine Voils, Duke University, Durham, NC, USA; Xiaonan Xue, Yeshiva University, Bronx, NY, USA; Anny-Yue Yin, Roche (China) Holding Ltd., Shanghai, China; Ming Zhu, AbbVie, North Chicago, IL, USA; Christos Zouboulis, Dessau Medical Center, Dessau, Germany

Scientific Program (June 15th - June 17th)

(* denotes presenting author)

Monday, June 15. 8:20 AM - 9:40 AM

Keynote Session (Keynote)
Room: Grand Ballroom A/B, Level 2
Organizers: Executive Committee of the 2015 ICSA/Graybill Joint Conference.
Chair: Naitee Ting, Boehringer-Ingelheim Pharmaceuticals Inc.

8:20 AM Welcome
Naitee Ting, Conference Chair
Wei Shen, President, ICSA
Dean Jan Nerger, College of Natural Sciences, Colorado State University

8:40 AM Keynote Lecture
Susan Murphy. University of Michigan

9:40 AM Floor Discussion.

Monday, June 15. 10:00 AM-11:40 AM

Session 4: New Techniques for Functional and Longitudinal Data Analysis (Invited)
Room: 300, level 3
Organizer: Guanqun Cao, Auburn University.
Chair: Guanqun Cao, Auburn University.

10:00 AM Variable Selection Methods for Functional Regression Models
Nedret Billor. Auburn University

10:25 AM Structured Functional Principal Component Analysis in Multilevel Functional Mixed Models for Physical Activity Data
*Haochang Shou¹, Vadim Zipunnikov², Ciprian Crainiceanu² and Sonja Greven³. ¹University of Pennsylvania ²Johns Hopkins University ³Ludwig-Maximilians-Universität München

10:50 AM Exploration of Diurnal Patterns in Maize Leaf with RNA-sequencing Data
Wen Zhou¹, *Peng Liu², Lin Wang³ and Thomas Brutnell⁴. ¹Colorado State University ²Iowa State University ³Monsanto Company ⁴Donald Danforth Plant Science Center

11:15 AM Partial and Tensor Quantile Regressions in Functional Data Analysis
*Dengdeng Yu, Linglong Kong and Ivan Mizera. University of Alberta

11:40 AM Floor Discussion.

Session 5: Recent Advancements in Statistical Machine Learning (Invited)
Room: 304, level 3
Organizers: Jinyuan Chang, University of Melbourne; Wen Zhou, Colorado State University.
Chair: Jinyuan Chang, University of Melbourne.

10:00 AM Sparse CCA: Minimax Rates and Adaptive Estimation
*Chao Gao¹, Zongming Ma² and Harrison Zhou¹. ¹Yale University ²University of Pennsylvania

10:25 AM Asymptotic Normality in Estimation of Large Ising Graphical Models
*Zhao Ren¹, Cun-Hui Zhang² and Harrison Zhou³. ¹University of Pittsburgh ²Rutgers University ³Yale University

10:50 AM Optimal Tests of Independence with Applications to Testing More Structures
*Fang Han¹ and Han Liu². ¹Johns Hopkins University ²Princeton University

11:15 AM Bootstrap Tests on High Dimensional Covariance Matrices with Applications to Understanding Gene Clustering
*Wen Zhou¹, Jinyuan Chang² and Wenxin Zhou². ¹Colorado State University ²University of Melbourne

11:40 AM Floor Discussion.

Session 15: Innovative Statistical Approaches in Nonclinical Research (Invited)
Room: 308, level 3
Organizer: Alan Chiang, Eli Lilly and Company.
Chair: Grace Li, Eli Lilly and Company.

10:00 AM Identifying Predictive Biomarkers in a Dose-Response Study
*Yuefeng Lu, Xiwen Ma and Wei Zheng. Sanofi-aventis U.S. LLC.

10:25 AM Bayesian Integration of In Vitro Biomarker Data to In Vivo Safety Assessment
*Ming-Dauh Wang and Alan Chiang. Eli Lilly and Company

10:50 AM Functional Structural Equation Model for DTI Derived Responses in Twin Study
*Shikai Luo¹, Hongtu Zhu² and Rui Song¹. ¹North Carolina State University ²The University of North Carolina at Chapel Hill

11:15 AM Estimating Contamination Rates from Matched Tumor-normal Exome Sequencing Data
*Hyonho Chun¹ and Xiwen Ma². ¹Purdue University ²Eli Lilly and Company

11:40 AM Floor Discussion.

Session 37: Statistical Methods for Large Computer Experiments (Invited)
Room: Grey Rock, level 2
Organizer: Thomas Lee, University of California, Davis.
Chair: Chun-Yip Yau, The Chinese University of Hong Kong.

10:00 AM Uncertainty Propagation using Dynamic Discrepancy for a Multi-scale Carbon Capture System
*K. Sham Bhat¹, Curt Storlie¹, David Mebane² and Priyadarshi Mahapatra³. ¹Los Alamos National Laboratory ²West Virginia University ³URS Corporation

10:25 AM Bayesian Calibration of Computer Models with Informative Failures
*Peter Marcy and Curtis Storlie. Los Alamos National Laboratory

10:50 AM A Frequentist Approach to Computer Model Calibration
*Raymond K. W. Wong¹, Curtis B. Storlie² and Thomas C. M. Lee³. ¹Iowa State University ²Los Alamos National Laboratory ³University of California, Davis

11:15 AM Floor Discussion.

Session 40: Use of Biomarker and Genetic Data in Drug Development (Invited)
Room: 310, level 3
Organizer: Xiaoxi Li, Biogen Idec.
Chair: Yuan Wang, M. D. Anderson Cancer Center.

10:00 AM An Overview of Statistical Methods in Biomarker Evaluation
*Dawei Liu, John Zhong, Kimberly Crimin, Lakshmi Amaravadi, Xiaoxi Li, Stacy Lindborg and Donald Johns. Biogen Idec

10:25 AM A Case Study of Integrating Scientific Knowledge with Statistical Biomarker Analysis
Sheng Feng. Biogen Idec

10:50 AM Intratumor Genetic Heterogeneity Analysis and Its Implications in Personalized Medicine
*Ronglai Shen and Venkatraman Seshan. Memorial Sloan-Kettering Cancer Center

11:15 AM Floor Discussion.

Session 42: New Methodology in Spatial and Spatio-Temporal Data Analysis (Invited)
Room: 386, level 3
Organizer: Yehua Li, Iowa State University.
Chair: Zhengyuan Zhu, Iowa State University.

10:00 AM Estimation of Spatial Variation in Disease Risk from Uncertain Locations Using SIMEX
Dale Zimmerman. University of Iowa

10:25 AM Bayesian Estimates of CMB Gravitational Lensing
Ethan Anderes. University of California, Davis

10:50 AM Bayesian Functional Data Models for Coupling High-dimensional LiDAR and Forest Variables over Large Geographic Domains
*Andrew Finley¹, Sudipto Banerjee², Yuzhen Zhou¹ and Bruce Cook³. ¹Michigan State University ²University of California, Los Angeles ³National Aeronautics and Space Administration

11:15 AM Spatial Bayesian Hierarchical Model for Small Area Estimation of Categorical Data
Xin Wang¹, Emily Berg¹, *Zhengyuan Zhu¹, Dongchu Sun² and Gabriel Demuth¹. ¹Iowa State University ²University of Missouri-Columbia

11:40 AM Floor Discussion.

Session 48: Trends and Innovation in Missing Data Sensitivity Analyses (Invited)
Room: 312, level 3
Organizer: Craig Mallinckrodt, Eli Lilly and Company.
Chair: Lei Xu, Biogen Idec.

10:00 AM Missing Data Sensitivity Analyses for Continuous Endpoints Using Controlled Imputations
Craig Mallinckrodt. Eli Lilly and Company

10:25 AM Sensitivity Analysis for Time-to-event Endpoints
*Bohdana Ratitch, Ilya Lipkovich and Michael O'Kelly. Quintiles, Inc.

10:50 AM Analysis and Sensitivity Analysis of Incomplete Categorical Data
Geert Molenberghs¹,². ¹Universiteit Hasselt ²Katholieke Universiteit Leuven

11:15 AM Discussant: H.M. James Hung, U.S. Food and Drug Administration

11:40 AM Floor Discussion.

Session 60: Toward More Effective Identification of Biomarkers and Subgroups for Development of Tailored Therapies (Invited)
Room: 324, level 3
Organizer: Lei Shen, Eli Lilly and Company.
Chair: Yu Kong, Eli Lilly and Company.

10:00 AM Confidence Intervals for Assessing SNP Effects on Treatment Efficacy
*Jason Hsu¹, Ying Ding², Grace Li³ and Steve Ruberg³. ¹The Ohio State University ²University of Pittsburgh ³Eli Lilly and Company

10:25 AM Identification of Biomarker Signatures Using Adaptive Elastic Net
*Xuemin Gu¹, Lei Shen² and Yaoyao Xu³. ¹Bristol-Myers Squibb Company ²Eli Lilly and Company ³AbbVie Inc.

10:50 AM Correcting Ascertainment Bias in Biomarker Identification
Shengchun Kong. Purdue University

11:15 AM Analysis Optimization for Biomarker and Subgroup Identification
Lei Shen. Eli Lilly and Company

11:40 AM Floor Discussion.

Session 69: Recent Developments in Empirical Likelihood Methodologies: Diagnostic Studies, Goodness-of-Fit Testing, and Missing Values (Invited)
Room: 372/374, level 3
Organizers: Dongliang Wang, SUNY Upstate Medical University; Lili Tian, University at Buffalo.
Chair: Dongliang Wang, SUNY Upstate Medical University.

10:00 AM Jackknife Empirical Likelihood Confidence Regions for the Evaluation of Continuous-scale Diagnostic Tests with Verification Bias
Binhuan Wang¹ and *Gengsheng Qin². ¹New York University ²Georgia State University

10:25 AM Jackknife Empirical Likelihood Goodness-of-Fit Tests for Vector U-statistics
Fei Tan¹, Qun Lin², Wei Zheng¹ and *Hanxiang Peng¹. ¹Indiana University-Purdue University ²Eli Lilly and Company

10:50 AM Jackknife Empirical Likelihood Interval Estimators for the Gini Index
*Dongliang Wang¹, Yichuan Zhao² and Dirk Gilmore². ¹SUNY Upstate Medical University ²Georgia State University

11:15 AM Jackknife Empirical Likelihood Inference with Regression Imputation and Survey Data
*Ping-Shou Zhong¹ and Sixia Chen². ¹Michigan State University ²Westat

11:40 AM Floor Discussion.

Session 79: Recent Developments on Combining Inferences and Hierarchical Models (Invited)
Room: 306, level 3
Organizer: Min-Ge Xie, Rutgers University.
Chair: Cen Wu, Yale University.

10:00 AM Statistical Issues in Health Related Quality of Life Research
Mounir Mesbah. University Pierre et Marie Curie

10:25 AM ROC-based Meta Analysis with Individual Level Information
Lu Tian¹, *Ying Lu¹, Peter Countryman², Julie Dicarlo² and Charles Peterfy². ¹Stanford University ²Spire Sciences

10:50 AM Combining Nonparametric Inferences Using Data Depth and Confidence Distribution
*Dungang Liu¹, Regina Liu² and Minge Xie². ¹University of Cincinnati ²Rutgers University

11:15 AM Latent Quality Models for Document Networks
*Linda Tan¹, Aik Hui Chan² and Tian Zheng¹. ¹Columbia University ²National University of Singapore

11:40 AM Floor Discussion.

Session 80: Recent Advances in Development and Evaluation of Predictive Biomarkers (Invited)
Room: 376/378, level 3
Organizers: Haiwen Shi, U.S. Food and Drug Administration; Jingjing Ye, U.S. Food and Drug Administration.
Chair: Xiaojing Wang, University of Connecticut.

10:00 AM Identifying Optimal Biomarker Combinations for Treatment Selection through Randomized Controlled Trials
Ying Huang. Fred Hutchinson Cancer Research Center

10:25 AM The Challenge in Making Inference about a Biomarker's Predictive Capacity
Holly Janes. Fred Hutchinson Cancer Research Center

10:50 AM A Potential Outcomes Framework for Evaluating Predictive Biomarkers
*Zhiwei Zhang¹, Lei Nie¹, Guoxing Soon¹ and Aiyi Liu². ¹U.S. Food and Drug Administration ²National Institutes of Health

11:15 AM Discussant: Greg Campbell, U.S. Food and Drug Administration

11:40 AM Floor Discussion.

Session 81: What Are the Expected Professional Behaviors After Statistics Degrees (Invited Panel)
Room: 382, level 3
Organizers: Bin Yu, University of California, Berkeley; Haoda Fu, Eli Lilly and Company.
Chair: Haoda Fu, Eli Lilly and Company.

Panelists: Richard Davis, Columbia University
Susan Murphy, University of Michigan
Jean Opsomer, Colorado State University

11:40 AM Floor Discussion.

Session 90: Adaptive Designs and Personalized Medicine (Invited)
Room: 322, level 3
Organizer: Yingqi Zhao, University of Wisconsin-Madison.
Chair: Yingqi Zhao, University of Wisconsin-Madison.

10:00 AM Interpretable and Parsimonious Treatment Regimes Using Decision Lists
Yichi Zhang, *Eric Laber, Anastasios Tsiatis and Marie Davidian. North Carolina State University

10:25 AM Regression Analysis for Cumulative Incidence Function under Two-stage Randomization
Idil Yavuz¹, *Yu Cheng² and Abdus Wahed². ¹Dokuz Eylul University ²University of Pittsburgh

10:50 AM Optimal, Two Stage, Adaptive Enrichment Designs for Randomized Trials, using Sparse Linear Programming
*Michael Rosenblum¹, Xingyuan (Ethan) Fang² and Han Liu². ¹Johns Hopkins University ²Princeton University

11:15 AM Floor Discussion.

Session C02: Design and Analysis of Clinical Trials (Contributed)
Room: Virginia Dale, level 3
Organizer: Peng-Liang Zhao, Sanofi-aventis U.S. LLC.
Chair: Bo Huang, Pfizer Inc.

10:00 AM Sample Size Re-Estimate of BE Studies with Adaptive Design
Peng Roger Qu. Pfizer China R&D Center

10:15 AM Sequential Phase II Clinical Trial Design for Molecularly Targeted Agents
*Yong Zang¹ and Ying Yuan². ¹Florida Atlantic University ²M. D. Anderson Cancer Center

10:30 AM On Sensitivity Analysis for Missing Data using Control-based Imputation
Frank Liu. Merck & Co.

10:45 AM Choosing Covariates for Adjustment in Non-Inferiority Trials Based on Influence and Disparity
*Katherine Nicholas, Viswanathan Ramakrishnan and Valerie Durkalski. Medical University of South Carolina

11:00 AM Statistical Assessment for Establishing Biosimilarity in Follow-On Biological Product
*Jung-Tzu Liu¹,², Hsiao-Hui Tsou²,³, Chin-Fu Hsiao², Yi-Hsuan Lai⁴, Chi-Tian Chen², Wan-Jung Chang² and Chyng-Shyan Tzeng¹. ¹National Tsing Hua University ²National Health Research Institutes ³China Medical University ⁴Foxconn International Company

11:15 AM Floor Discussion.

Monday, June 15. 1:00 PM - 2:40 PM

Session 1: Best Practices for Delivery of Adaptive Clinical Trials Illustrated with Case Studies (Invited)
Room: 382, level 3
Organizer: Zoran Antonijevic, Cytel Inc.
Chair: Zoran Antonijevic, Cytel Inc.

1:00 PM DIA Adaptive Design Scientific Working Group Best Practices Team: Objectives and Case Studies
Eva Miller. inVentiv Health Clinical

1:25 PM An Adaptive Phase 3 Trial Resulting in FDA Approval of Crofelemer
*Lingyun Liu¹, Zoran Antonijevic¹, Cyrus Mehta¹, Pravin Chaturvedi² and Scott Harris². ¹Cytel Inc. ²Salix Pharmaceuticals

1:50 PM Promising Zone Design: Methodology, Strategy, and Implementation
Zoran Antonijevic. Cytel Inc.

2:15 PM Floor Discussion.

Session 2: Chemistry, Manufacturing, and Controls (CMC) in Pharmaceuticals: Current Statistical Challenges I (Invited)
Room: 310, level 3
Organizers: Richard K. Burdick, Amgen Inc.; Jorge Quiroz, AbbVie Inc.
Chair: Richard Burdick, Amgen Inc.

1:00 PM Statistical Methods for Analytical Comparability
Leslie Sidor. Amgen Inc.

1:25 PM How Type I Error Impacts Quality System Effectiveness
Jeff Gardner. DataPharm Statistical & Data Management Services

1:50 PM Alternative Procedures for Shelf Life Estimation Utilizing Mixed Models
*Michelle Quinlan¹, Walt Stroup² and Dave Christopher³. ¹Novartis Pharmaceutical Corporation ²University of Nebraska-Lincoln ³Merck & Co.

2:15 PM Discussant: Laura Pack, Amgen Inc.

2:40 PM Floor Discussion.

Session 7: Scalable Multivariate Statistical Learning with Massive Data (Invited)
Room: 386, level 3
Organizer: Kun Chen, University of Connecticut.
Chair: Kun Chen, University of Connecticut.

1:00 PM False Discovery Control under Unknown Dependence
Jianqing Fan¹ and *Xu Han². ¹Princeton University ²Temple University

1:25 PM Sparse CCA: Adaptive Estimation and Computational Barriers
Chao Gao¹, *Zongming Ma² and Harrison Zhou¹. ¹Yale University ²University of Pennsylvania

1:50 PM A Class of Accelerated MM Algorithms for Scalable Optimization
Yiyuan She. Florida State University

2:15 PM Innovated Interaction Screening for High-Dimensional Nonlinear Classification
Yingying Fan, *Yinfei Kong, Daoji Li and Zemin Zheng. University of Southern California

2:40 PM Floor Discussion.

Session 9: SII Special Invited Session on Modern Bayesian Statistics I (Invited)
Room: 300, level 3
Organizers: Ming-Hui Chen, University of Connecticut; Heping Zhang, Yale University.
Chair: Ming-Hui Chen, University of Connecticut.

1:00 PM Binary State Space Mixed Models with Flexible Link Functions: a Case Study on Deep Brain Stimulation on Attention Reaction Time
Carlos Abanto-Valle¹, Dipak Dey² and *Xun Jiang³. ¹Universidade Federal do Rio de Janeiro ²University of Connecticut ³Amgen Inc.

1:25 PM Bayesian Semi-parametric Joint Modeling of Biomarker Data with a Latent Changepoint: Assessing the Temporal Performance of Enzyme-Linked Immunosorbent Assay (ELISA) Testing for Paratuberculosis
*Michelle Norris¹, Wesley Johnson² and Ian Gardner³. ¹California State University, Sacramento ²University of California, Irvine ³University of Prince Edward Island

1:50 PM Inference Functions in High-Dimensional Bayesian Inference
Juhee Lee¹ and *Steven MacEachern². ¹University of California, Santa Cruz ²The Ohio State University

2:15 PM Quantile Regression for Censored Mixed-Effects Models with Applications to HIV Studies
*Victor Hugo Lachos Davila¹, Ming-Hui Chen², Carlos A. Abanto-Valle³ and Caio L. Azevedo¹. ¹University of Campinas ²University of Connecticut ³Universidade Federal do Rio de Janeiro

2:40 PM Floor Discussion.

Session 16: Statistical Advances for Genetic Data Analysis (Invited)
Room: 312, level 3
Organizer: Yuehua Cui, Michigan State University.
Chair: Ping-Shou Zhong, Michigan State University.

1:00 PM Incorporating External Information to Improve Case-control Genetic Association Analyses
Hong Zhang¹, Nilanjan Chatterjee² and *Jinbo Chen³. ¹Fudan University ²National Institutes of Health ³University of Pennsylvania

1:25 PM Generalized Partial Linear Varying Index Coefficient Model for Gene-Environment Interactions
*Xu Liu, Bin Gao and Yuehua Cui. Michigan State University

1:50 PM Set-valued System Identification Approach to Identifying Genetic Variants in Sequencing Studies
Guolian Kang. St. Jude Children's Research Hospital

2:15 PM A Penalized Robust Semiparametric Approach for Gene-Environment Interactions
*Cen Wu¹, Xingjie Shi², Yuehua Cui³ and Shuangge Ma¹. ¹Yale University ²Nanjing University of Finance and Economics ³Michigan State University

2:40 PM Floor Discussion.

Session 19: Recent Developments in the Theory and Applica-tions of Spatial Statistics (Invited)Room: 304, level 3Organizer: Juan Du, Kansas State University.Chair: Zhengyuan Zhu, Iowa State University.

1:00 PM Estimating a Low Rank Covariance Matrix for Spatial Data�Siddhartha Nandy, Chae-Young Lim and Tapabrata Maiti.Michigan State University

1:25 PM Computational Instability of Spatial Covariance Matrices�Wei-Ying Wu1 and Chae young Lim2. 1National Dong HwaUniversity 2Michigan State University

1:50 PM Statistical Method for Change-set AnalysisJun Zhu. University of Wisconsin-Madison

2:15 PM Floor Discussion.

Session 22: Clinical Trials with Multiple Objectives: Maxi-mizing the Likelihood of Success (Invited)Room: 308, level 3Organizers: Toshimitsu Hamasaki, National Cerebral and Cardio-vascular Center, Japan; Chin-Fu Hsiao, National Health ResearchInsititutes, Tawain.Chair: Chin-Fu Hsiao, National Health Research Insititutes, Tai-wan.

1:00 PM Statistical Challenges in Testing Multiple Endpoints in Com-plex Trial Designs�H.M. James Hung and Sue-Jane Wang. U.S. Food andDrug Administration

1:25 PM Group-Sequential Clinical Trials When Considering Multi-ple Outcomes as Co-Primary Endpoints�Toshimitsu Hamasaki1, Scott Evans2 and Koko Asakura1.1National Cerebral and Cardiovascular Center 2HarvardUniversity

1:50 PM Sample Size Determination for a Specific Region in Multi-regional Clinical Trials with Multiple Co-Primary Endpoints�Chin-Fu Hsiao1, Wong-Shian Huang1 and ToshimitsuHamasaki2. 1National Health Research Institutes 2NationalCerebral and Cardiovascular Center

2:15 PM Fallback Tests for Co-primary Endpoints
*Robin Ristl1, Florian Frommlet1, Armin Koch2 and Martin Posch1. 1Medical University of Vienna 2Hannover Medical School

2:40 PM Floor Discussion.

Session 26: Challenges in Analyzing Complex Data Using Regression Modeling Approaches (Invited)
Room: 324, level 3
Organizers: Fang-Chi Hsu, Wake Forest University School of Medicine; Wei-Ting Hwang, University of Pennsylvania.
Chair: Jun Yan, University of Connecticut.

1:00 PM Goodness-of-Fit Tests of Finite Mixture Regression Models
Junwu Shen1, *Shou-En Lu2, Yong Lin2, Weichung Joe Shih2 and Junfeng (Jim) Zhang3. 1Novartis Pharmaceutical Corporation 2Rutgers University 3Duke University

1:25 PM Comparing Methods of Modeling Individual Infancy Growth Curves
*Rui Xiao1, Sani Roy2, Alessandra Chesi2, Frank Mentch2, Rosetta Chiavacci2, Jonathan Mitchell1 and Andrea Kelly2. 1University of Pennsylvania 2Children’s Hospital of Philadelphia

1:50 PM Alzheimer’s Disease Early Prediction and Imaging Genetics Analyses Based on Large Scale Regularization
*Fang-Chi Hsu, Mark Espeland and Ramon Casanova. Wake Forest University School of Medicine

2:15 PM Discussant: Wei-Ting Hwang, University of Pennsylvania

2:40 PM Floor Discussion.

Session 30: Tensor-Structured Statistical Modelling and Inferences (Invited)
Room: 376/378, level 3
Organizer: Su-Yun Huang, Institute of Statistical Science, Academia Sinica.
Chair: Yanyuan Ma, University of South Carolina.

1:00 PM Dimension Reduction for Tensor Structure Data
I-Ping Tu. Institute of Statistical Science, Academia Sinica

1:25 PM Detection of Gene-Gene Interactions Using Tensor Regression
Hung Hung1, Yu-Ting Lin2, Penweng Chen3, Chen-Chien Wang4, Su-Yun Huang2 and *Jung-Ying Tzeng5. 1National Taiwan University 2Institute of Statistical Science, Academia Sinica 3National Chung Hsing University 4Yahoo Inc. 5North Carolina State University

1:50 PM Rank Selection for Multilinear PCA
Dai-Ni Hsieh, *Su-Yun Huang and I-Ping Tu. Institute of Statistical Science, Academia Sinica

2:15 PM Discussion: Tensor-Structured Statistical Modelling and Inferences
Mong-Na Lo Huang. National Sun Yat-sen University

2:40 PM Floor Discussion.

Session 41: New Frontier of Functional Data Analysis (Invited)
Room: 306, level 3
Organizer: Yehua Li, Iowa State University.
Chair: Yehua Li, Iowa State University.

28 | 2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17


Scientific Program (*Presenting Author) Monday, June 15. 1:00 PM - 2:40 PM

1:00 PM Making Patient-specific Treatment Decisions Based on Functional and Imaging Data
*Todd Ogden1, Adam Ciarleglio2, Eva Petkova2 and Thaddeus Tarpey3. 1Columbia University 2New York University 3Wright State University

1:25 PM Quantifying Connectivity in Resting State fMRI with Functional Data Analysis
Jinjiang He1, Xiaoke Zhang2, Owen Carmichael1, *Jane-Ling Wang1 and Hans-Georg Mueller1. 1University of California, Davis 2University of Delaware

1:50 PM Optimal Estimation for the Functional Cox Model
*Simeng Qu1, Jane-Ling Wang2 and Xiao Wang1. 1Purdue University 2University of California, Davis

2:15 PM Robust and Gaussian Adaptive Mixed Models for Correlated Functional Data, with Application to Event-Related Potential Data
*Hongxiao Zhu1 and Jeffrey Morris2. 1Virginia Tech 2M. D. Anderson Cancer Center

2:40 PM Floor Discussion.

Session 54: Recent Development in Epigenetic Research (Invited)
Room: Grey Rock, level 2
Organizer: Yongseok Park, University of Pittsburgh.
Chair: Yongseok Park, University of Pittsburgh.

1:00 PM A Hidden Markov Random Field Based Bayesian Method for the Detection of Long-range Chromosomal Interactions in Hi-C Data
Zheng Xu1, Guosheng Zhang1, Fulai Jin2, Mengjie Chen1, Terry Furrey1, Patrick Sullivan1, Yun Li1, and *Ming Hu3. 1The University of North Carolina at Chapel Hill 2Case Western Reserve University 3New York University

1:25 PM Base-resolution Methylation Patterns Accurately Predict Transcription Factor Bindings In Vivo
*Tianlei Xu, Ben Li, Meng Zhao, Keith E. Szulwach, R. Craig Street, Li Lin, Bing Yao, Feiran Zhang, Peng Jin, Hao Wu and Zhaohui Qin. Emory University

1:50 PM Statistical Analysis of Illumina HumanMethylation450 BeadArrays
*Jie Liu and Kimberly Siegmund. University of Southern California

2:15 PM Differential Methylation Analysis for BS-seq Data under General Experimental Design
*Yongseok Park1 and Hao Wu2. 1University of Pittsburgh 2Emory University

2:40 PM Floor Discussion.

Session 64: Recent Development in Personalized Medicine and Survival Analysis (Invited)
Room: ASCSU Senate Chambers, level 2
Organizer: Rui Song, North Carolina State University.
Chair: Peng Roger Qu, Pfizer China R & D Center.

1:00 PM Estimating the Optimal Dynamic Treatment Regime from a Classification Perspective: C-learning
Baqun Zhang1 and *Min Zhang2. 1Renmin University of China 2University of Michigan

1:25 PM Parsimonious and Robust Treatment Strategies for Target Populations Using Clinical Trial Data
*Yingqi Zhao1 and Donglin Zeng2. 1University of Wisconsin-Madison 2The University of North Carolina at Chapel Hill

1:50 PM A Sieve Semiparametric Maximum Likelihood Approach for Regression Analysis of Bivariate Interval-censored Failure Time Data
*Qingning Zhou1, Tao Hu2 and Jianguo Sun1. 1University of Missouri-Columbia 2Capital Normal University

2:15 PM Floor Discussion.

Session 67: New Advances in Adaptive Design and Analysis of Clinical Trials (Invited)
Room: 322, level 3
Organizers: Ming Tan, Georgetown University; Peter Zhang, Otsuka Pharmaceutical Development & Commercialization Inc.
Chair: Peter Zhang, Otsuka Pharmaceutical Development & Commercialization Inc.

1:00 PM Sensitivity Analyses for Missing Not at Random (MNAR) in Clinical Trials
Peter Zhang. Otsuka Pharmaceutical Development & Commercialization Inc.

1:25 PM Moment-based Covariate Adjustment Method for Treatment Effect Estimation in Randomized Clinical Trials
*Xiaofei Wang1, Junling Ma2 and Stephen George1. 1Duke University 2Shanghai University of Finance and Economics

1:50 PM On Design and Analysis of a Stratified Biomarker Time-to-Event Clinical Trial in the Presence of Measurement Error
Aiyi Liu. National Institutes of Health

2:15 PM Floor Discussion.

Session 75: Model Selection in Complex Data Settings (Invited)
Room: 372/374, level 3
Organizer: Hai Liu, Indiana University.
Chair: Hai Liu, Indiana University.

1:00 PM Meta-analysis Based Variable Selection for Gene Expression Data
Quefeng Li1, *Sijian Wang2, Menggang Yu2 and Jun Shao2. 1The University of North Carolina at Chapel Hill 2University of Wisconsin-Madison

1:25 PM Structural Discovery for Joint Models of Longitudinal and Survival Outcomes
*Zangdong He, Wanzhu Tu and Zhangsheng Yu. Indiana University

1:50 PM An Empirical Bayes Approach to Integrate Multiple GWAS with Gene Expressions from Multiple Tissues
*Jin Liu1 and Can Yang2. 1Duke-NUS 2Hong Kong Baptist University

2:15 PM Model Selection in Multivariate Semiparametric Regression
*Zhuokai Li1, Hai Liu2 and Wanzhu Tu2. 1Duke University 2Indiana University

2:40 PM Floor Discussion.


Session 86: Cutting-Edge New Tools for Statistical Analysis and Modeling (Invited)
Room: Virginia Dale, level 3
Organizer: Yanwei Zhang, Pfizer Inc.
Chair: Yanwei Zhang, Pfizer Inc.

1:00 PM Web-based Analytics for Business Decision Making
Sam Weerahandi. Pfizer Inc.

1:25 PM A GUI Software for Synchronizing Study Design, Statistical Analyses, and Reporting into Simple Clicks
Yanwei Zhang. Pfizer Inc.

1:50 PM Bayesian Mechanism to Enhance Financial Value of Clinical Development Portfolio
Shu Han. Pfizer Inc.

2:15 PM An R Package Suite for Meta-analysis in Differentially Expressed Gene Analysis
*Jia Li1, George C. Tseng2 and Xingbin Wang2. 1Henry Ford Health System 2University of Pittsburgh

2:40 PM Floor Discussion.

Monday, June 15. 3:00 PM - 4:40 PM

Session 8: New Statistical Advance in Genomics and Health Science Applications (Invited)
Room: 312, level 3
Organizer: Jin Liu, University of Illinois at Chicago.
Chair: Kun Chen, University of Connecticut.

3:00 PM Linking Lung Airway Structure to Pulmonary Function via Hierarchical Feature Selection
*Kun Chen1, Eric Hoffman2, Indu Seetharaman3, Feiran Jiao2, Ching-Long Lin2 and Kung-Sik Chan2. 1University of Connecticut 2University of Iowa 3Kansas State University

3:25 PM Imputing Transcriptome of Inaccessible Tissues In and Beyond the GTEx Project
Jiebiao Wang1, Eric Gamazon2, Barbara Stranger1, Haekyung Im1, Nancy Cox2, Dan L. Nicolae1 and *Lin Chen1. 1University of Chicago 2Vanderbilt University

3:50 PM Efficient Variance Component Estimation with the Haseman-Elston Approximate Regression
Xiang Zhou. University of Michigan

4:15 PM Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation
*Chaolong Wang1, Xiaowei Zhan2, Liming Liang3, Goncalo Abecasis4 and Xihong Lin3. 1Genome Institute of Singapore 2The University of Texas Southwestern Medical Center 3Harvard University 4University of Michigan

4:40 PM Floor Discussion.

Session 23: Issues Related to Subgroup Analysis in Confirmatory Clinical Trials: Challenges and Opportunities (Invited)
Room: 310, level 3
Organizers: Chin-Fu Hsiao, National Health Research Institutes, Taiwan; Toshimitsu Hamasaki, National Cerebral and Cardiovascular Center, Japan.
Chair: Toshimitsu Hamasaki, National Cerebral and Cardiovascular Center, Japan.

3:00 PM A Statistical Decision Framework Applicable to Multipopulation Tailoring Trials
Brian Millen. Eli Lilly and Company

3:25 PM A Multiple Comparison Procedure for Subgroup Analyses with Binary Endpoints
*Dong Xi, Yanqiu Weng, Kapildeb Sen, Ekkehard Glimm, Willi Maurer and Frank Bretz. Novartis Pharmaceutical Corporation

3:50 PM Interaction Trees for Exploring Stratified and Individualized Treatment Effects
Xiaogang Su. The University of Texas at El Paso

4:15 PM Considering Regional Difference in Design and Evaluation of MRCTs for Binary Endpoints
*Chi-Tian Chen and Chin-Fu Hsiao. National Health Research Institutes

4:40 PM Floor Discussion.

Session 25: Spatial and Spatio Temporal Modeling in Environmental and Ecological Studies (Invited)
Room: 300, level 3
Organizers: Ephraim Hanks, Pennsylvania State University; Jun Zhu, University of Wisconsin-Madison.
Chair: Mevin Hooten, Colorado State University.

3:00 PM Multivariate Spatial Modeling on Spheres
*Juan Du1 and Chunsheng Ma2. 1Kansas State University 2Wichita State University

3:25 PM Autoregressive Spatially-varying Coefficient Models for Predicting Daily PM2.5 Using VIIRS Satellite AOD
*Erin Schliep1, Alan Gelfand1 and David Holland2. 1Duke University 2U.S. Environmental Protection Agency

3:50 PM Modeling Animal Abundance with A Semi-Parametric Space-Time Model
Devin Johnson. The National Oceanic and Atmospheric Administration

4:15 PM An Efficient Non-parametric Estimate for Spatially Correlated Functional Data
*Yuan Wang, Kim-Anh Do, Jianhua Hu and Brian Hobbs. M. D. Anderson Cancer Center

4:40 PM Discussant: Yuan Wang, M. D. Anderson Cancer Center.

Session 35: Novel Designs and Applications of Adaptive Randomization in Medical Research (Invited)
Room: 322, level 3
Organizer: Jack Lee, M. D. Anderson Cancer Center.
Chair: Brian Hobbs, M. D. Anderson Cancer Center.

3:00 PM Statistical Inference for Covariate Adaptive Randomized Clinical Trials with Survival Endpoints
Lu Wang1, Jing Ning2 and *Hongjian Zhu1. 1The University of Texas School of Public Health 2M. D. Anderson Cancer Center


3:25 PM Biomarker-Stratified Adaptive Basket Designs for Multiple Cancers
Lorenzo Trippa. Dana-Farber Cancer Institute

3:50 PM Outcome Adaptive Randomization for Comparative Effectiveness Clinical Trials
Mei-Chiung Shih. VA Cooperative Studies Program

4:15 PM Worth Adapting? When and How to Apply Adaptive Randomization to Make More Bang for the Buck
*J. Jack Lee and Yining Du. M. D. Anderson Cancer Center

4:40 PM Floor Discussion.

Session 47: New Development in Nonparametric Methods and Big Data Analytics (Invited)
Room: 386, level 3
Organizer: Ping Ma, University of Georgia.
Chair: Ping Ma, University of Georgia.

3:00 PM A Nonparametric Spectral-Temporal Model for High-energy Astrophysical Sources
Raymond Wong1, Vinay Kashyap2, *Thomas Lee3 and David van Dyk4. 1Iowa State University 2Harvard-Smithsonian Center for Astrophysics 3University of California, Davis 4Imperial College

3:25 PM Multistage Adaptive Testing of Sparse SignalsWenguang Sun. University of Southern California

3:50 PM Variable Selection for Sufficient Dimension Reduction using Weighted Leverage Score
Wenxuan Zhong. University of Georgia

4:15 PM Efficient Computation of Smoothing Splines via Adaptive Basis Sampling
Ping Ma1, *Jianhua Huang2 and Nan Zhang2. 1University of Georgia 2Texas A&M University

4:40 PM Floor Discussion.

Session 50: Biostatistics and Health Sciences (Invited)
Room: 324, level 3
Organizer: Mounir Mesbah, University Pierre et Marie Curie.
Chair: Mounir Mesbah, University Pierre et Marie Curie.

3:00 PM When to Initiate Combined Antiretroviral Therapy in HIV-infected Individuals to Reduce the Risk of AIDS or Severe Non-AIDS Morbidity Using Marginal Structural Model
*Yassin Mazroui, Valerie Potard, Murielle Mary-Krause, Ophelia Godin and Dominique Costagliola. University Pierre et Marie Curie

3:25 PM Trace Elements Uptake and Effect of Two Steppic Medicinal Species of a Mining Area on Their Soil Trace Element Contents versus Bulk Soils
Oumeima Mebirouk1, Fatima-Zohra Afri-Mehennaoui2, Smaïl Mehennaoui3, Lila Sahli2 and *Oualida Rached1. 1Ecole Nationale Superieure de Biotechnologie 2University Constantine 3University Batna

3:50 PM Study of the Effect of Trace Metals from Old Antimony Mine on Biodiversity by Stepwise Regression
*Alima Bentellis and Oualida Rached. Ecole Nationale Superieure de Biotechnologie

4:15 PM Floor Discussion.

Session 52: Advances in Survey Statistics (Invited)
Room: 304, level 3
Organizer: Jean Opsomer, Colorado State University.
Chair: Jay Breidt, Colorado State University.

3:00 PM Quantile Regression Imputation for a Survey Sample
Emily Berg and *Cindy Yu. Iowa State University

3:25 PM Triply Robust Inference in the Presence of Missing Survey Data
*David Haziza1, Valery Dongmo Jiongo2 and Pierre Duchesne1. 1Universite de Montreal 2Statistics Canada

3:50 PM Adaptive Post-stratification Using Monotonicity Constraints
*Jean Opsomer, Jiwen Wu and Mary Meyer. Colorado State University

4:15 PM Floor Discussion.

Session 63: Adaptive Design and Sample Size Re-Estimation (Invited)
Room: 308, level 3
Organizer: Xiaohua Sheng, Sanofi Pasteur U.S.
Chair: Xiaohua Sheng, Sanofi Pasteur U.S.

3:00 PM Methods for Flexible Sample-Size Design in Clinical Trials
*Gang Li1, Weichung Shih2 and Yining Wang1. 1Johnson & Johnson 2Rutgers University

3:25 PM Blinded Sample Size Re-estimation in Trials with Survival Outcomes and Incomplete Information
Thomas Cook. University of Wisconsin-Madison

3:50 PM SMART with Adaptive Randomization
*Ken Cheung1, Bibhas Chakraborty2 and Karina Davidson1. 1Columbia University 2Duke-NUS

4:15 PM Discussant: H.M. James Hung, U.S. Food and Drug Administration

4:40 PM Floor Discussion.

Session 68: Design and Analysis in Drug Combination Studies (Invited)
Room: 372/374, level 3
Organizer: Chenguang Wang, Johns Hopkins University.
Chair: Zhiwei Zhang, U.S. Food and Drug Administration.

3:00 PM Design and Statistical Analysis of Multidrug Combinations in Preclinical Studies and Clinical Trials
Ming Tan. Georgetown University

3:25 PM Bayesian Hierarchical Monotone Regression I-splines for Dose-Response Assessment and Drug-Drug Interaction Analysis
*Gary Rosner1, Violeta Hennessey2 and Veerabhadran Baladandayuthapani3. 1Johns Hopkins University 2Amgen Inc. 3M. D. Anderson Cancer Center

3:50 PM A Bayesian Nonparametric Approach for Synergy Assessment in Drug Combination Studies
Chenguang Wang. Johns Hopkins University

4:15 PM Discussant: Ying Yuan, M. D. Anderson Cancer Center

4:40 PM Floor Discussion.


Session 74: Empirical Likelihoods for Analyzing Incomplete Data (Invited)
Room: 376/378, level 3
Organizer: Mei-Cheng Wang, Johns Hopkins University.
Chair: Mei-Cheng Wang, Johns Hopkins University.

3:00 PM ANOVA for Longitudinal Data with Missing Values
*Songxi Chen1 and Ping-Shou Zhong2. 1Iowa State University 2Michigan State University

3:25 PM Calibration in Missing Data Analysis Through Empirical Likelihood
Peisong Han. University of Waterloo

3:50 PM Asymptotic Behavior of the Sample Average of Partial Likelihood for the Cox Model
Jian-Jian Ren. University of Maryland

4:15 PM Efficient Estimation of the Cox Model with Auxiliary Subgroup Survival Information
*Chiung-Yu Huang1, Jing Qin2 and Huei-Ting Tsai3. 1Johns Hopkins University 2National Institutes of Health 3Georgetown University

4:40 PM Floor Discussion.

Session 77: Recent Innovative Methodologies and Applications in Genetics & Pharmacogenomics (GpGx) (Invited)
Room: Virginia Dale, level 3
Organizer: Ziwen Wei, Merck & Co.
Chair: Lynn Kuo, University of Connecticut.

3:00 PM Tree-based Rare Variants Analyses
Chi Song and *Heping Zhang. Yale University

3:25 PM Composite Kernel Machine Regression Based on Likelihood Ratio Test and its Application on Genomic Studies
*Ni Zhao and Michael Wu. Fred Hutchinson Cancer Research Center

3:50 PM Improving the Robustness of Variable Selection and Predictive Performance of Lasso and Elastic-net Regularized Generalized Linear Models and Cox Proportional Hazard Models
*Feng Hong and Viswanath Devanarayan. AbbVie Inc.

4:15 PM Floor Discussion.

Session 85: Advances in Nonparametric and Semiparametric Statistics (Invited)
Room: 306, level 3
Organizer: Chunming Zhang, University of Wisconsin-Madison.
Chair: Jiayang Sun, Case Western Reserve University.

3:00 PM Quantile Regression for Extraordinarily Large Data
Stanislav Volgushev1 and *Guang Cheng2. 1Cornell University 2Purdue University

3:25 PM A Validated Information Criterion to Determine the Structural Dimension in Dimension Reduction Models
*Yanyuan Ma1 and Xinyu Zhang2. 1University of South Carolina 2Chinese Academy of Sciences

3:50 PM Systematic Clustering and Network Structures: a New Nonparametric Approach that Reveals Unprecedented Structures and Patterns, with Applications to Large CMS Data
Junheng Ma, *Jiayang Sun and Gq Zhang. Case Western Reserve University

4:15 PM Semiparametric Model Building for Regression Models with Time-Varying Parameters
Ting Zhang. Boston University

4:40 PM Floor Discussion.

Session 89: Recent Advances in Biostatistics (Invited)
Room: Grey Rock, level 2
Organizer: Yuping Zhang, University of Connecticut.
Chair: Aiyi Liu, National Institutes of Health.

3:00 PM Promoting Similarity of Sparsity Structures in Integrative Analysis
Shuangge Ma. Yale University

3:25 PM Graphical Models and its Application in Genomics
*Zhandong Liu1, Genevera Allen2 and Ying-Wooi Wan1. 1Baylor College of Medicine 2Rice University

3:50 PM Threshold Regression with Censored Covariates
*Jing Qian1, Folefac Atem2 and Rebecca Betensky2. 1University of Massachusetts 2Harvard University

4:15 PM Jointly Analyzing Spatially Correlated Visual Field Data to Detect Glaucoma Progression
*Joshua Warren1, Jean-Claude Mwanza2, Angelo Tanna3 and Donald Budenz2. 1Yale University 2The University of North Carolina at Chapel Hill 3Northwestern University

4:40 PM Floor Discussion.

Session 93: Negotiation Skills Critical for Statistical Career Development (Invited Panel)
Room: 382, level 3
Organizer: Kelly Zou, Pfizer Inc.
Chair: Kelly Zou, Pfizer Inc.

Panelists: Ivan S. F. Chan, Merck & Co.

Mary W. Gray, American University

Susan Murphy, University of Michigan

Wei Shen, Eli Lilly and Company

4:40 PM Floor Discussion.

Monday, June 15. 4:40 PM - 6:00 PM

Session P01: Poster Session (Poster)
Room: Grand Ballroom C/D, level 3
Organizer: Jun Yan, University of Connecticut.
Chair: Jun Yan, University of Connecticut.

1: Correction for Confounding Effect in Random Forests Analysis
*Yang Zhao and Donghua Lou. Nanjing Medical University

2: Strategies of Genetic Risk Prediction with Lung Cancer GWAS Data
*Donghua Lou, Weiwei Duan, Zhibin Hu and Feng Chen. Nanjing Medical University

3: Hierarchical Model for Genome-wide Association Study
*Honggang Yi, Hongmei Wo, Yang Zhao, Ruyang Zhang, Junchen Dai, Guangfu Jin and Hongxia Ma. Nanjing Medical University


4: A Review of Nonparametric Methods for Testing Isotropy in Spatial Data
*Zachary Weller and Jennifer Hoeting. Colorado State University

5: A Bayes Testing Approach to Metagenomic Profiling in Bacteria
*Camilo Valdes1, Bertrand Clarke2, Adrian Dobra3 and Jennifer Clarke2. 1University of Miami 2University of Nebraska-Lincoln 3University of Washington

6: A Dynamical Model for Networks of Neuron Spike Trains
*Hongyu Tan, Phillip Chapman and Haonan Wang. Colorado State University

7: Bio-insecticidal Effects of Two Plant Extracts (Marrubium Vulgare, Artemisia Herba-alba) on Culex Pipiens (Diptera: Culicidae) under Laboratory Conditions
Amel Aouati1 and *Selima Berchi2. 1Université Constantine 3 2Ecole Nationale Superieure de Biotechnologie

8: Hypothesis Testing for an Extended Cox Model with Time-varying Coefficients
*Takumi Saegusa, Chongzhi Di and Ying Chen. Fred Hutchinson Cancer Research Center

9: The Strategy for Selecting Target Population Using Adaptive Phase II/III Seamless Design Based on Time-to-event Data
*Hao Yu, Dandan Miao and Feng Chen. Nanjing Medical University

Tuesday, June 16. 8:40 AM - 9:40 AM

Graybill Plenary Session (Plenary)
Room: Grand Ballroom A/B, Level 2
Organizers: Executive Committee of the 2015 ICSA/Graybill Joint Conference.
Chair: Duane Boes, Colorado State University.

8:40 AM Graybill Plenary Lecture
Richard Davis. Columbia University

9:40 AM Floor Discussion.

Tuesday, June 16. 10:00 AM - 11:40 AM

Session 3: Chemistry, Manufacturing, and Controls (CMC) in Pharmaceuticals: Current Statistical Challenges II (Invited)
Room: 308, level 3
Organizers: Richard K. Burdick, Amgen Inc.; Jorge Quiroz, AbbVie Inc.
Chair: Jorge Quiroz, AbbVie Inc.

10:00 AM Statistical Methods for Analytical Validation of Accuracy and Precision
Richard Burdick. Amgen Inc.

10:25 AM Statistical Applications for Biosimilar Product Development
Richard Montes. Hospira, Inc.

10:50 AM How to Set Up Biosimilarity Bounds in Biosimilar Product Development
*Lanju Zhang. AbbVie Inc.

11:15 AM Discussant: Jorge Quiroz, AbbVie Inc.

11:40 AM Floor Discussion.

Session 10: SII Special Invited Session on Modern Bayesian Statistics II (Invited)
Room: Grey Rock, level 2
Organizers: Ming-Hui Chen, University of Connecticut; Heping Zhang, Yale University.
Chair: Heping Zhang, Yale University.

10:00 AM A Bayes Testing Approach to Metagenomic Profiling in Bacteria
Bertrand Clarke1, Camilo Valdes2, Adrian Dobra3 and *Jennifer Clarke1. 1University of Nebraska-Lincoln 2University of Miami 3University of Washington

10:25 AM Nonparametric Bayesian Functional Clustering for Time-Course Microarray Data
Ziwen Wei1 and *Lynn Kuo2. 1Merck & Co. 2University of Connecticut

10:50 AM A Bayesian Approach to Identify Genes and Gene-level SNP Aggregates in a Genetic Analysis of Cancer Data
Francesco Stingo1, *Michael Swartz2 and Marina Vannucci3. 1M. D. Anderson Cancer Center 2The University of Texas School of Public Health 3Rice University

11:15 AM Adjusting Nonresponse Bias in Small Area Estimation without Covariates via a Bayesian Spatial Model
*Xiaoming Gao1, Chong He2 and Dongchu Sun2. 1Missouri Department of Conservation 2University of Missouri-Columbia

11:40 AM Floor Discussion.

Session 13: Recent Advance in Longitudinal Data Analyses (Invited)
Room: 310, level 3
Organizer: Yu Cheng, University of Pittsburgh.
Chair: Ruosha Li, The University of Texas School of Public Health.

10:00 AM Integrative and Adaptive Weighted Group Lasso and Generalized Local Quadratic Approximation
Qing Pan1 and *Yunpeng Zhao2. 1The George Washington University 2George Mason University

10:25 AM A Dynamic Risk Prediction Model for Data with Competing Risks
*Chung-Chou Chang1 and Qing Liu2. 1University of Pittsburgh 2Novartis Pharmaceutical Corporation

10:50 AM Simultaneous Inference of a Misclassified Outcome and Competing Risks Failure Time Data
*Sheng Luo1, Xiao Su1, Min Yi2 and Kelly Hunt2. 1The University of Texas at Houston 2M. D. Anderson Cancer Center

11:15 AM Copula-based Quantile Regression for Longitudinal Data
*Huixia Wang1 and Xingdong Feng2. 1The George Washington University 2Shanghai University of Finance and Economics

11:40 AM Floor Discussion.


Session 20: Risk Prediction Modeling in Clinical Trials (Invited)
Room: 312, level 3
Organizer: Fenghai Duan, Brown University.
Chair: Ying Huang, Fred Hutchinson Cancer Research Center.

10:00 AM Evaluating Calibration of Risk Prediction Models
Ruth Pfeiffer. National Institutes of Health

10:25 AM Statistical Considerations for Evaluating Prognostic Imaging Biomarkers
Zheng Zhang. Brown University

10:50 AM Risk Assessment for Patients with Hepatitis C: A Scoring System Approach
*Weining Shen1, Jing Ning1, Ying Yuan1, Ziding Feng1 and Anna Lok2. 1M. D. Anderson Cancer Center 2University of Michigan

11:15 AM Risk Prediction Modeling in the National Lung Screening Trial
Fenghai Duan. Brown University

11:40 AM Floor Discussion.

Session 28: Go/No Go Decision Criteria and Probability of Success in Pharmaceutical Drug Development (Invited)
Room: 322, level 3
Organizer: Bo Huang, Pfizer Inc.
Chair: Bo Huang, Pfizer Inc.

10:00 AM Sample Size Allocation in a Dose-Ranging Trial Combined with PoC
Qiqi Deng and *Naitee Ting. Boehringer-Ingelheim Pharmaceuticals Inc.

10:25 AM Selecting Development Strategy with Biomarkers
*Feng Gao, Yi Liu and Mingxiu Hu. Takeda

10:50 AM Backward Bayesian Go/No-Go in the Early Phases
Yin Yin. Parexel International

11:15 AM Evaluation of Program Success for Programs with Multiple Trials in Binary Outcomes
*Meihua Wang, Guanghan Liu and Jerald Schindler. Merck & Co.

11:40 AM Floor Discussion.

Session 44: Funding Opportunities and Grant Applications (Invited Panel)
Room: 382, level 3
Organizer: Aiyi Liu, National Institutes of Health.
Chair: Aiyi Liu, National Institutes of Health.

Panelists: Debashis Ghosh, University of Colorado at Denver

Hulin Wu, University of Rochester

Heping Zhang, Yale University

Li Zhu, National Institutes of Health

11:40 AM Floor Discussion.

Session 55: New Method Development for Survival Analysis (Invited)
Room: 324, level 3
Organizer: Limin Peng, Emory University.
Chair: Jong Jeong, University of Pittsburgh.

10:00 AM Analysis of the Proportional Hazard Model with Sparse Longitudinal Covariates
*Hongyuan Cao1, Matthew M. Churpek2, Donglin Zeng3 and Jason P. Fine3. 1University of Missouri-Columbia 2University of Chicago 3The University of North Carolina at Chapel Hill

10:25 AM Hypoglycemic Events Analysis via Recurrent Time-to-Event (HEART) Models
Haoda Fu. Eli Lilly and Company

10:50 AM Accelerated Intensity Frailty Model for Recurrent Events Data
Bo Liu1, Wenbin Lu1 and *Jiajia Zhang2. 1North Carolina State University 2University of South Carolina

11:15 AM A New Flexible Association Measure for Semi-Competing Risks Data
*Jing Yang and Limin Peng. Emory University

11:40 AM Floor Discussion.

Session 56: Recent Developments in Statistical Learning Methods (Invited)
Room: 300, level 3
Organizer: Xingye Qiao, Binghamton University.
Chair: Ganggang Xu, Binghamton University.

10:00 AM Multiclass Sparse Discriminant Analysis
*Qing Mai1, Yi Yang2 and Hui Zou2. 1Florida State University 2University of Minnesota

10:25 AM Composite Large Margin Classifiers with Latent Subclasses for Heterogeneous Biomedical Data
*Guanhua Chen1, Yufeng Liu2 and Michael Kosorok2. 1Vanderbilt University 2The University of North Carolina at Chapel Hill

10:50 AM Positive Definite Regularized Estimation of Large Covariance Matrices
*Lingzhou Xue1, Shiqian Ma2 and Hui Zou3. 1Pennsylvania State University 2The Chinese University of Hong Kong 3University of Minnesota

11:15 AM Feature Selection Utilizing the Whole Solution Path
Yang Liu1 and *Peng Wang2. 1Bowling Green State University 2University of Cincinnati

11:40 AM Floor Discussion.

Session 57: Recent Developments on High-Dimensional Inference in Biostatistics (Invited)
Room: 304, level 3
Organizer: Yumou Qiu, University of Nebraska-Lincoln.
Chair: Yumou Qiu, University of Nebraska-Lincoln.

10:00 AM Profiling and Accounting for Heterogeneity in the Analysis of Cancer Sequencing Data
Mengjie Chen. The University of North Carolina at Chapel Hill


10:25 AM Optimal Detection of Weak Positive Dependence between Two Mixture Distributions
*Sihai Zhao1, Tony Cai2 and Hongzhe Li2. 1University of Illinois at Urbana-Champaign 2University of Pennsylvania

10:50 AM Multiple Testing for Conditional Dependence by Quantile-Based Contingency Table
*Jichun Xie1 and Ruosha Li2. 1Duke University 2The University of Texas School of Public Health

11:15 AM Spurious Discoveries for High-dimensional Data
Jianqing Fan1, Qi-Man Shao2 and *Wen-Xin Zhou1. 1Princeton University 2The Chinese University of Hong Kong

11:40 AM Floor Discussion.

Session 61: Design and Analysis Issues in Clinical Trials (Invited)
Room: 376/378, level 3
Organizer: Ye Shen, University of Georgia.
Chair: Yichen Qin, University of Cincinnati.

10:00 AM Nonparametric Response Adaptive Randomization Procedures
Zhongqiang Liu1 and *Feifang Hu2. 1Renmin University of China 2The George Washington University

10:25 AM Ecological Momentary Assessment for Measuring Outcome in Clinical Trials
Stephen Rathbun. University of Georgia

10:50 AM Robust Zero-Inflated Poisson/Negative Binomial Regression for Over-Dispersed Count Data
*Yichen Qin1, Ye Shen2 and Yang Li3. 1University of Cincinnati 2University of Georgia 3Renmin University of China

11:15 AM Joint Modeling Tumor Burden and Time to Event Data in Oncology Trials
*Ye Shen1, Aparna Anderson2, Riwik Sinha3 and Yang Li4. 1University of Georgia 2Bristol-Myers Squibb Company 3Adobe Research India Labs 4Renmin University of China

11:40 AM Floor Discussion.

Session 66: Recent Advances in Empirical Likelihood Method (Invited)
Room: 306, level 3
Organizers: Fei Tan, Indiana University-Purdue University; Hanxiang Peng, Indiana University-Purdue University.
Chair: Fei Tan, Indiana University-Purdue University.

10:00 AM Jackknife Empirical Likelihood for U-Statistics with Estimated Constraints
*Fei Tan and Hanxiang Peng. Indiana University-Purdue University

10:25 AM Jackknife Empirical Likelihood for Order-restricted Statistical Inference with Missing Data
*Heng Wang and Ping-Shou Zhong. Michigan State University

10:50 AM Improving Estimation in Structural Equation Models: An Easy Empirical Likelihood Approach
*Shan Wang and Hanxiang Peng. Indiana University-Purdue University

11:15 AM Composite Empirical Likelihood
*Nicole Lazar and Adam Jaeger. University of Georgia

11:40 AM Floor Discussion.

Session 71: Next Generation Functional Data (Invited)
Room: 386, level 3
Organizer: Jane-Ling Wang, University of California, Davis.
Chair: Jane-Ling Wang, University of California, Davis.

10:00 AM Analysis of Clustered Longitudinal/Functional Data. Naisyin Wang. University of Michigan

10:25 AM Functional Data Analysis for Quantifying Brain Connectivity. *Hans-Georg Mueller1, Alexander Petersen1 and Owen Carmichael2. 1University of California, Davis; 2Louisiana State University

10:50 AM Functional Principal Component Analysis of Spatial-Temporal Point Processes with Applications in Disease Surveillance. *Yehua Li1 and Yongtao Guan2. 1Iowa State University; 2University of Miami

11:15 AM Localized Functional Principal Component Analysis. *Kehui Chen1 and Jing Lei2. 1University of Pittsburgh; 2Carnegie Mellon University

11:40 AM Floor Discussion.

Session 82: The Jiann-Ping Hsu Invited Session on Biostatistical and Regulatory Sciences (Invited)
Room: Virginia Dale, level 3
Organizer: Lili Yu, Georgia Southern University.
Chair: Lili Yu, Georgia Southern University.

10:00 AM A Generalized Birth and Death Process for Modeling the Fates of Gene Duplication. Jing Zhao1, Ashley Teufel2, David Liberles2, Lili Yu3 and *Liang Liu1. 1University of Georgia; 2Temple University; 3Georgia Southern University

10:25 AM A Nonparametric Approach for Partial Areas under the Receiver Operating Characteristic Curve and Ordinal Dominance Curve. Hanfang Yang1, Kun Lu2 and *Yichuan Zhao3. 1Renmin University of China; 2University of Chicago; 3Georgia State University

10:50 AM Analysis of Longitudinal Multivariate Outcome Data from Couples Cohort Studies: Application to HPV Transmission Dynamics. Xiangrong Kong. Johns Hopkins University

11:15 AM Bayesian Nonlinear Model Selection for Gene Regulatory Networks. *Yang Ni1, Francesco Stingo2 and Veera Baladandayuthapani2. 1Rice University; 2M. D. Anderson Cancer Center

11:40 AM Floor Discussion.

2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17 | 35


Scientific Program (*Presenting Author)

Session C01: Disease Models, Observational Studies, and High Dimensional Regression (Invited)
Room: ASCSU Senate Chambers, level 2
Organizer: Peng-Liang Zhao, Sanofi-aventis U.S. LLC.
Chair: Xiaoyu Jia, Boehringer-Ingelheim Pharmaceuticals Inc.

10:00 AM Evaluate the Most Accurate Animal Model With Application to Pediatric Medulloblastoma. *Lan Gao1, Behrouz Shamsaei2 and Stan Pounds3. 1University of Tennessee at Chattanooga; 2The University of Tennessee at Chattanooga; 3St. Jude Children’s Research Hospital

10:15 AM A Generalized Mover-Stayer Model for Disease Progressions with Death in Consideration of Age at the Study Entry. *Yi-Ran Lin1, Wei-Hsiung Chao2 and Chen-Hsin Chen1,3. 1Institute of Statistical Science, Academia Sinica; 2National Dong Hwa University; 3National Taiwan University

10:30 AM Improving Cancer Mortality Rate Estimation Using Population-specific Structure in Direct Age-standardization. *Beverly Fu1 and Wenjiang Fu2. 1Okemos High School; 2University of Houston

10:45 AM An Augmented ADMM Algorithm for Linearly Regularized Statistical Estimation Problems. Yunzhang Zhu. The Ohio State University

11:00 AM Public Health Impacts Following the World Trade Center Attacks of September 11th 2001: Statistical Analyses of Data from Residents of Lower Manhattan, New York. *L. Laszlo Pallos, Vinicius Antao, Jay Sapp and Youn Shim. Agency for Toxic Substances and Disease Registry

10:40 AM Estimation of Discrete Survival Function through the Modeling of Diagnostic Accuracy for Mismeasured Outcome Data. *Hee-Koung Joeng1, Abidemi K. Adeniji2, Naitee Ting2 and Ming-Hui Chen1. 1University of Connecticut; 2Boehringer-Ingelheim Pharmaceuticals Inc.

11:15 AM Floor Discussion.

Tuesday, June 16. 1:00 PM - 2:40 PM

Session 12: Taiwan National Health Database (Invited)
Room: 310, level 3
Organizer: Kuang-Fu Cheng, Taipei Medical University.
Chair: Yen-Kuang Lin, Taipei Medical University.

1:00 PM Effective Analysis of Primary Preventive Anti-HBV Medicine to Prevent Hepatitis Reactivation in Cancer Patients Undergoing Chemotherapy Using National Health Insurance Database and Cancer Registry Data. Ruey-Kuen Hsieh and *Wen-Kuei Chien. Taipei Medical University

1:25 PM A Nationwide Cohort Study of Influenza Vaccine on Stroke Prevention in the Chronic Kidney Disease Population. *Chang-I Chen, Wen-Kuei Chien, Yen-Kuang Lin, Ruey-Kuen Hsieh, Chao-Feng Lin, Ju-Chi Liu and Hao-Weng Deng. Taipei Medical University

1:50 PM Economic Movement and Mental Health: A Population-based Study. *Yen-Kuang Lin1 and Chen-Yin Lee2. 1Taipei Medical University; 2Mingdao University

2:15 PM Floor Discussion.

Session 21: The Application of Latent Variable and Mixture Models to the Biological Sciences (Invited)
Room: 312, level 3
Organizers: Youyi Fong, Fred Hutchinson Cancer Research Center; Nathan Vandergrift, Duke University.
Chair: Nathan Vandergrift, Duke University.

1:00 PM The Role of Item Response Theory in Assessment and Evaluation Studies. *Li Cai and Lauren Harrell. University of California, Los Angeles

1:25 PM The Application of Structural Equation Modeling to Biomarker Data. *Nathan Vandergrift, Sallie Permar and Barton Haynes. Duke University

1:50 PM Latent and Observed Variables in Kernel-Penalized Regression Models. Timothy Randolph. Fred Hutchinson Cancer Research Center

2:15 PM A Mixture Model Approach to Estimating a Nonlinear Errors-in-Variables Model for Serial Dilution Assay. Youyi Fong. Fred Hutchinson Cancer Research Center

2:40 PM Floor Discussion.

Session 29: Machine Learning for Big Data Problems (Invited)
Room: ASCSU Senate Chambers, level 2
Organizer: Peng Huang, Johns Hopkins University.
Chair: Hee-Koung Joeng, University of Connecticut.

1:00 PM A Scalable Integrative Model for Heterogeneous Genomic Data Types under Multiple Conditions. Mai Shi and *Yingying Wei. The Chinese University of Hong Kong

1:25 PM Greedy Tree Learning of Optimal Personalized Treatment Rules. *Ruoqing Zhu1, Yingqi Zhao2, Guanhua Chen3, Shuangge Ma1 and Hongyu Zhao1. 1Yale University; 2University of Wisconsin-Madison; 3Vanderbilt University

1:50 PM ROC Analysis for Multiple Markers with Tree-Based Classification. Mei-Cheng Wang1 and *Shanshan Li2. 1Johns Hopkins University; 2Indiana University

2:15 PM Tissue Classification Through Imaging Texture Analysis. *Peng Huang, Siva Raman, Linda Chu, Jamie Schroeder, Malcolm Brock, Franco Verde and Elliot Fishman. Johns Hopkins University

2:40 PM Floor Discussion.




Session 32: Recent Development on Next Generation Sequencing Based Data Analysis (Invited)
Room: 322, level 3
Organizers: Hongmei Jiang, Northwestern University; Lingling An, University of Arizona.
Chair: Hongmei Jiang, Northwestern University.

1:00 PM Bayesian Nonparametric Models for Tumor Heterogeneity Using Next-Generation Sequencing Data. *Yuan Ji1,2, Yanxun Xu3, Juhee Lee4, Subhajit Sengupta1, Peter Mueller3 and Yitan Zhu1. 1NorthShore University HealthSystem; 2University of Chicago; 3The University of Texas at Austin; 4University of California, Santa Cruz

1:25 PM Leveraging in Big Data Analytics. Ping Ma. University of Georgia

1:50 PM Investigating Microbial Co-occurrence Patterns Based on Metagenomic Compositional Data. *Yuguang Ban1, Lingling An2 and Hongmei Jiang1. 1Northwestern University; 2University of Arizona

2:15 PM Rapid Alignment and Filtration for Accurate Pathogen Identification in Clinical Samples Using Unassembled Sequencing Data. *W. Evan Johnson1, Solaiappan Manimaran1, Changjin Hong1, Keith Crandall2 and Eduardo Castro-Nallar2. 1Boston University; 2The George Washington University

2:40 PM Floor Discussion.

Session 34: Recent Advances in Genomics (Invited)
Room: 324, level 3
Organizer: Lynn Kuo, University of Connecticut.
Chair: Ziwen Wei, Merck & Co.

1:00 PM Accounting For Gene Length in RNA-Seq Data. *Patrick Harrington and Lynn Kuo. University of Connecticut

1:25 PM Integrating Diverse Genomics Data to Infer Regulations. *Yuping Zhang1 and Hongyu Zhao2. 1University of Connecticut; 2Yale University

1:50 PM Phylogenetic Trait Evolution with Drift. *Mandev Gill and Marc Suchard. University of California, Los Angeles

2:15 PM Floor Discussion.

Session 36: Lifetime Data Analysis (Invited)
Room: 372/374, level 3
Organizer: Mei-Ling Lee, University of Maryland.
Chair: Yichuan Zhao, Georgia State University.

1:00 PM Statistical Inference on Quantile Residual Life. Jong Jeong. University of Pittsburgh

1:25 PM Onset Time of Chronic Pseudomonas Aeruginosa Infection in Cystic Fibrosis Patients with Interval Censored Data. Wenjie Wang1, Huichuan Lai2, *Jun Yan1 and Zhumin Zhang2. 1University of Connecticut; 2University of Wisconsin-Madison

1:50 PM A Model for Time to Fracture with a Shock Stream Superimposed on Progressive Degradation: the Study of Osteoporotic Fractures. *Xin He1, G. A. Whitmore2, Geok Yan Loo1, Marc Hochberg3 and Mei-Ling Lee1. 1University of Maryland; 2McGill University; 3University of Maryland Baltimore County

2:15 PM Explained Variation in Correlated Survival Data. Gordon Honerkamp-Smith and *Ronghui Xu. University of California, San Diego

2:40 PM Floor Discussion.

Session 38: New Approaches for Analyzing Time Series Data (Invited)
Room: 304, level 3
Organizer: Thomas Lee, University of California, Davis.
Chair: Raymond Wong, Iowa State University.

1:00 PM Spectral Analysis of Linear Time Series in Moderately High Dimensions. Lili Wang1, Alexander Aue2 and *Debashis Paul2. 1Zhejiang University; 2University of California, Davis

1:25 PM High Order Corrected Estimator of Time-average Variance Constant. *Chun Yip Yau and Kin Wai Chan. The Chinese University of Hong Kong

1:50 PM Floor Discussion.

Session 46: Recent Advances in Integrative Analysis of Omics Data (Invited)
Room: 300, level 3
Organizer: Qi Long, Emory University.
Chair: Qi Long, Emory University.

1:00 PM A Bayesian Model for the Identification of Differentially Expressed Genes in Daphnia Magna Exposed to Munition Pollutants. *Marina Vannucci1, Alberto Cassese1,2 and Michele Guindani2. 1Rice University; 2M. D. Anderson Cancer Center

1:25 PM A Bayesian Approach to Biomarker Selection through miRNA Regulatory Networks. Thierry Chekouo1, *Francesco Stingo1, James Doecke2 and Kim-Anh Do1. 1M. D. Anderson Cancer Center; 2CSIRO

1:50 PM Testing Differential RNA-isoform Expression/Usage. *Wei Sun1, Yufeng Liu1, James Crowley1, Ting-Huei Chen2, Hua Zhou3, Yichao Wu3 and Fei Zou1. 1The University of North Carolina at Chapel Hill; 2National Institutes of Health; 3North Carolina State University

2:15 PM Floor Discussion.

Session 51: Recent Developments in Analyzing Censored Survival Data (Invited)
Room: 376/378, level 3
Organizer: Bin Nan, University of Michigan.
Chair: Shou-En Lu, Rutgers University.

1:00 PM Stacking Survival Models. Debashis Ghosh. University of Colorado at Denver




1:25 PM Estimation of Concordance Probability with Censored Regression Models. *Zhezhen Jin and Xinhua Liu. Columbia University

1:50 PM Improving Efficiency in Biomarker Incremental Value Evaluation under Two-phase Study Designs. *Yingye Zheng1 and Tianxi Cai2. 1Fred Hutchinson Cancer Research Center; 2Harvard University

2:15 PM Nonparametric Tests of Treatment Effect for a Recurrent Event Process that Terminates. Nabihah Tayob1 and *Susan Murray2. 1M. D. Anderson Cancer Center; 2University of Michigan

2:40 PM Floor Discussion.

Session 53: Innovative Statistical Methods in Genomics and Genetics (Invited)
Room: Virginia Dale, level 3
Organizer: Zhengqing Ouyang, The Jackson Laboratory.
Chair: Ming Hu, New York University.

1:00 PM Statistical Analysis of Differential Alternative Splicing using RNA-Seq Data. Mingyao Li. University of Pennsylvania

1:25 PM Integrating Auxiliary Information in Complex Traits Studies. Marc Coram, Sophie Candille and *Hua Tang. Stanford University

1:50 PM Detecting Nonlinear Associations in High-throughput Data with Applications to Clustering and Variable Selection. *Tianwei Yu and Hesen Peng. Emory University

2:15 PM Hypothesis Test of Mediation Effect in Causal Mediation Model with High-dimensional Mediators. *Yen-Tsung Huang and Wen-Chi Pan. Brown University

2:40 PM Floor Discussion.

Session 70: Use of Simulation in Drug Development and Decision Making (Invited)
Room: 382, level 3
Organizer: Fei Wang, Amgen Inc.
Chair: Jack Lee, M. D. Anderson Cancer Center.

1:00 PM Simulations: The Future of Clinical Trial Design. Ben Saville. Berry Consultants

1:25 PM Bayesian Application in Optimizing Probability of Study Success with Multiple Endpoints Setting. *Grace Li, Honghua Jiang, Lei Shen, Karen Price, Haoda Fu and David Manner. Eli Lilly and Company

1:50 PM Evaluation of Strategies for Designing Phase 2 Dose Finding Studies. Cristiana Mayer. Johnson & Johnson

2:15 PM Discussant: Amy Xia, Amgen Inc.

2:40 PM Floor Discussion.

Session 73: Non-Parametrics and Semi-Parametrics: New Advances and Applications (Invited)
Room: 306, level 3
Organizer: Lily Wang, Iowa State University.
Chair: Gang Li, Johnson & Johnson.

1:00 PM Single-index Models for Function-on-Function Regression. *Guanqun Cao1 and Lily Wang2. 1Auburn University; 2Iowa State University

1:25 PM Free-knot Splines for Generalized Linear Models. Ella Revzin1 and *Jing Wang2. 1Coyote Logistics; 2University of Illinois at Chicago

1:50 PM White Noise Testing and Model Diagnostic Checking for Functional Time Series. Xianyang Zhang. University of Missouri-Columbia

2:15 PM Collective Estimation of Multiple Bivariate Density Functions with Application to Angular-sampling-based Protein Loop Modeling. Mehdi Maadooliat1, *Lan Zhou2, Seyed M. Najibi3, Xin Gao4 and Jianhua Huang2. 1Marquette University; 2Texas A&M University; 3Shahid Beheshti University; 4King Abdullah University of Science and Technology

2:40 PM Floor Discussion.

Session 76: Advances in Statistical Methods of Identifying Subgroups in Clinical Studies (Invited)
Room: Grey Rock, level 2
Organizer: Xiaojing Wang, University of Connecticut.
Chair: Yuefeng Lu, Sanofi-aventis U.S. LLC.

1:00 PM The Bias Correction in Comparing the Treatment Effect in Different Subgroups of Patients from a Randomized Clinical Trial. *Lu Tian1, Fei Jiang2 and LJ Wei2. 1Stanford University; 2Harvard University

1:25 PM A Regression Tree Approach to Identifying Subgroups with Differential Treatment Effects. Wei-Yin Loh. University of Wisconsin-Madison

1:50 PM Identifying Subgroups of Enhanced Predictive Accuracy from Longitudinal Biomarker Data Using Tree-based Approaches: Applications to Monitoring Fetal Growth. *Jared Foster, Danping Liu, Paul Albert and Aiyi Liu. National Institutes of Health

2:15 PM A Bayesian Approach For Subgroup Analysis. James O. Berger1, *Xiaojing Wang2 and Lei Shen3. 1Duke University; 2University of Connecticut; 3Eli Lilly and Company

2:40 PM Floor Discussion.

Session 78: Analysis and Classification of High Dimensional Data (Invited)
Room: 308, level 3
Organizer: Yichao Wu, North Carolina State University.
Chair: Fangfang Wang, University of Illinois at Chicago.

1:00 PM Neyman-Pearson Classification under High-dimensional Settings. Anqi Zhao, Yang Feng, Lie Wang and *Xin Tong. University of Southern California

1:25 PM Index Models for Functional Data. *Peter Radchenko, Xinghao Qiao and Gareth James. University of Southern California




1:50 PM Stabilized Nearest Neighbor Classifier and Its Theoretical Properties. Wei Sun1, *Xingye Qiao2 and Guang Cheng1. 1Purdue University; 2Binghamton University

2:15 PM Floor Discussion.

Session 92: Issues in Probabilistic Models for Random Graphs (Invited)
Room: 386, level 3
Organizers: Mark Kaiser, Iowa State University; Zhengyuan Zhu, Iowa State University.
Chair: Zhengyuan Zhu, Iowa State University.

1:00 PM Exponential-family Random Hypergraph Models for Group Relations. *Ryan Haunfelder, Haonan Wang and Bailey Fosdick. Colorado State University

1:25 PM Exponential-family Random Graph Models with Local Dependence. Michael Schweinberger. Rice University

1:50 PM Local Structure Graph Models with Higher-Order Dependence. *Emily Casleton1, Mark Kaiser2 and Daniel Nordman2. 1Los Alamos National Laboratory; 2Iowa State University

2:15 PM Discussant: Mark Kaiser, Iowa State University

2:40 PM Floor Discussion.

Tuesday, June 16. 3:00 PM - 4:00 PM

Leadership Forum (Plenary Panel)
Room: Grand Ballroom A/B, Level 2
Organizers: Executive Committee of the 2015 Joint ICSA/Graybill Conference.
Chair: Wei Shen, Eli Lilly and Company.

Panelists: Gregory Campbell, U.S. Food and Drug Administration

Xiao-Li Meng, Harvard University

Janet Wittes, Statistics Collaborative Inc.

Wednesday, June 17. 8:40 AM - 10:20 AM

Session 33: Challenges of Quantile Regression in High-Dimensional Data Analysis: Theory and Applications (Invited)
Room: 300, level 3
Organizer: Linglong Kong, University of Alberta.
Chair: Shengchun Kong, Purdue University.

8:40 AM Regularized Quantile Regression for Quantitative Genetic Traits. *Chad He1, Linglong Kong2, Yanhua Wang1, Sijian Wang3, Timothy Chan4 and Eric Holland1. 1Fred Hutchinson Cancer Research Center; 2University of Alberta; 3University of Wisconsin-Madison; 4Memorial Sloan-Kettering Cancer Center

9:05 AM Globally Adaptive Quantile Regression with High Dimensional Data. *Qi Zheng1, Limin Peng1 and Xuming He2. 1Emory University; 2University of Michigan

9:30 AM Focused Information Criterion and Model Averaging Based on Weighted Composite Quantile Regression. *Ganggang Xu1, Suojin Wang2 and Jianhua Huang2. 1Binghamton University; 2Texas A&M University

9:55 AM Bayesian Quantile Regression via Dirichlet Process Mixture of Logistic Distributions. *Chao Chang and Nan Lin. Washington University in St. Louis

10:20 AM Floor Discussion.

Session 39: Statistica Sinica Special Invited Session on Spatial and Temporal Data Analysis (Invited)
Room: 382, level 3
Organizer: Bo Li, University of Illinois at Urbana-Champaign.
Chair: Scott Holan, University of Missouri-Columbia.

8:40 AM Likelihood Approximations for Big Nonstationary Spatial Temporal Lattice Data. *Joseph Guinness and Montserrat Fuentes. North Carolina State University

9:05 AM A Multivariate Gaussian Process Factor Model for Hand Shape During Reach-to-Grasp Movements. Lucia Castellanos1, *Vincent Vu2, Sagi Perel1, Andrew Schwartz3 and Robert Kass1. 1Carnegie Mellon University; 2The Ohio State University; 3University of Pittsburgh

9:30 AM A Covariance Parameter Estimation Method for Polar-Orbiting Satellite Data. *Michael Horrell and Michael Stein. University of Chicago

9:55 AM Bayesian Analysis of Spatially-Dependent Functional Responses with Spatially-Dependent Multi-Dimensional Functional Predictors. *Scott Holan1, Wen-Hsi Yang2, Christopher Wikle1, D. Brenton Myers1 and Kenneth Sudduth3. 1University of Missouri-Columbia; 2CSIRO; 3U.S. Department of Agriculture

10:20 AM Floor Discussion.

Session 49: Multi-Regional Clinical Trial Design and Analysis (Invited)
Room: 308, level 3
Organizer: Xuezhou Mao, Sanofi-aventis U.S. LLC.
Chair: Xiaohua Sheng, Sanofi Pasteur U.S.

8:40 AM Design and Analysis of Multiregional Clinical Trials in Evaluation of Medical Devices: A Two-component Bayesian Approach for Targeted Regulatory Decision Making. *Yunling Xu and Nelson Lu. U.S. Food and Drug Administration

9:05 AM Assessing Benefit and Consistency of Treatment Effect under a Discrete Random Effects Model in Multiregional Clinical Trials. *Hsiao-Hui Tsou1, K. K. Gordon Lan2, Jung-Tzu Liu1, Chin-Fu Hsiao1, Chi-Tian Chen1 and Chyng-Shyan Tzeng3. 1National Health Research Institutes; 2Johnson & Johnson; 3National Tsing Hua University




9:30 AM Multi-Regional Clinical Trials – Where We Have Been and Where We Are Going. Bruce Binkowitz. Merck & Co.

9:55 AM Discussant: Chin-Fu Hsiao, National Health Research Institutes, Taiwan

10:20 AM Floor Discussion.

Session 58: Blinded and Unblinded Evaluation of Aggregate Safety Data during Clinical Development (Invited)
Room: 310, level 3
Organizers: Greg Ball, AbbVie Inc.; Richard Entsuah, Merck & Co.
Chair: Richard Entsuah, Merck & Co.

8:40 AM Continuous Safety Signal Monitoring with Blinded Data. *Greg Ball1 and William Wang2. 1AbbVie Inc.; 2Merck & Co.

9:05 AM How Should the Final Rule Affect DMCs? Janet Wittes. Statistics Collaborative

9:30 AM Implementation of the Investigational New Drug Safety Reporting Requirements “Final Rule”. Brenda Crowe. Eli Lilly and Company

9:55 AM Floor Discussion.

Session 65: New Strategies to Identify Disease Associated Genomic Biomarkers (Invited)
Room: 304, level 3
Organizer: Wei Sun, The University of North Carolina at Chapel Hill.
Chair: Wei Sun, The University of North Carolina at Chapel Hill.

8:40 AM Discovering Disease Associated Molecular Interactions Using Discordant Correlation. Charlotte Siska and *Katerina Kechris. University of Colorado at Denver

9:05 AM Joint Analysis of Genomic Data from Different Sources using Kernel Machine Regression with Multiple Kernels. *Michael Wu and Ni Zhao. Fred Hutchinson Cancer Research Center

9:30 AM Transformed Low-rank ANOVA Models for High Dimensional Variable Selection. *Jianhua Hu1 and Yoonsuh Jung2. 1M. D. Anderson Cancer Center; 2University of Waikato

9:55 AM Proper Use of Allele-Specific Expression Improves Statistical Power for cis-eQTL Mapping with RNA-Seq Data. *Yijuan Hu1, Wei Sun2, Jung-Ying Tzeng3 and Charles Perou2. 1Emory University; 2The University of North Carolina at Chapel Hill; 3North Carolina State University

10:20 AM Floor Discussion.

Session 84: Design More Efficient Adaptive Clinical Trials Using Biomarkers (Invited)
Room: 322, level 3
Organizer: Ying Yuan, M. D. Anderson Cancer Center.
Chair: Yong Zang, Florida Atlantic University.

8:40 AM Sequential Designs for Individualized Dosing in Phase I Cancer Clinical Trials. *Xuezhou Mao1 and Ying Kuen Cheung2. 1Sanofi-aventis U.S. LLC.; 2Columbia University

9:05 AM Stratification Free Biomarker Designs for Randomized Trials with Adaptive Enrichment. Noah Simon. University of Washington

9:30 AM Bayesian Predictive Modeling for Personalized Treatment Selection in Oncology. Junsheng Ma, Francesco Stingo and *Brian Hobbs. M. D. Anderson Cancer Center

9:55 AM Optimal Marker-Adaptive Designs for Targeted Therapy Based on Imperfectly Measured Biomarkers. Yong Zang1, Suyu Liu2 and *Ying Yuan2. 1Florida Atlantic University; 2M. D. Anderson Cancer Center

10:20 AM Floor Discussion.

Session 87: Advanced Methods for Graphical Models (Invited)
Room: 386, level 3
Organizer: Yuping Zhang, University of Connecticut.
Chair: Jun Li, University of Notre Dame.

8:40 AM Learning Causal Networks via Additive Faithfulness. Kuang-Yao Lee1, Tianqi Liu1, *Bing Li2 and Hongyu Zhao1. 1Yale University; 2Pennsylvania State University

9:05 AM Statistical Modeling of RNase-seq for Genome-wide Inference of RNA Structure. Zhengqing Ouyang. The Jackson Laboratory

9:30 AM Distance Shrinkage and Euclidean Embedding via Regularized Kernel Estimation. Ming Yuan. University of Wisconsin-Madison

9:55 AM Detecting Overlapping Communities in Networks with Spectral Methods. Yuan Zhang, Elizaveta Levina and *Ji Zhu. University of Michigan

10:20 AM Floor Discussion.

Session C03: Functional Data, Semi-parametric and Non-parametric Methods (Contributed)
Room: 376/378, level 3
Organizer: Peng-Liang Zhao, Sanofi-aventis U.S. LLC.
Chair: Xin Wang, AbbVie Inc.

8:40 AM An Unbiased Measure of Integrated Volatility in the Frequency Domain. Fangfang Wang. University of Illinois at Chicago

8:55 AM Functional Data Analysis for Density Functions by Transformation to a Hilbert Space. *Alexander Petersen and Hans-Georg Muller. University of California, Davis

9:10 AM Cross-covariance Functions for Divergence-free and Curl-free Tangent Vector Fields on the Sphere. *Minjie Fan1 and Tomoko Matsuo2. 1University of California, Davis; 2University of Colorado at Boulder

9:25 AM Empirical Likelihood-based Inference for Linear Components in Partially Linear Models. Haiyan Su. Montclair State University




9:40 AM Consistency of Bayesian Semiparametric Models through Joint Density Estimation. Yuefeng Wu. University of Missouri-St. Louis

9:55 AM Analysis of Water Quality in New Jersey. *Kaitlyn Scrudato and Haiyan Su. Montclair State University

10:10 AM Floor Discussion.

Wednesday, June 17. 10:40 AM - 12:20 PM

Session 6: Recent Advances in Analyzing Genomic Data (Invited)
Room: 300, level 3
Organizer: Hao Chen, University of California, Davis.
Chair: Hua Tang, Stanford University.

10:40 AM Allele-specific Copy Number Profiling by Next-generation DNA Sequencing. *Hao Chen1, John Bell2, Nicolas Zavala2, Hanlee Ji2 and Nancy Zhang3. 1University of California, Davis; 2Stanford University; 3University of Pennsylvania

11:05 AM A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Genomic Annotation Data. *Dongjun Chung1, Can Yang2, Cong Li3, Joel Gelernter3 and Hongyu Zhao3. 1Medical University of South Carolina; 2Hong Kong Baptist University; 3Yale University

11:30 AM GLAD: A Mixed-membership Model for Heterogeneous Tumor Subtype Classification. Hachem Saddiki1, Jon McAuliffe2 and *Patrick Flaherty3. 1Worcester Polytechnic Institute; 2University of California, Berkeley; 3University of Massachusetts

11:55 AM Learning Genetic Regulatory Networks Using RNA-seq Data. *Jie Peng1 and Ru Wang2. 1University of California, Davis; 2Guidewire

12:20 PM Floor Discussion.

Session 11: Emerging Issues in Time-to-Event Data (Invited)
Room: 304, level 3
Organizer: Qingxia Chen, Vanderbilt University.
Chair: Yu Cheng, University of Pittsburgh.

10:40 AM Bayesian Path Specific Frailty Models for Multi-state Survival Data with Applications. Mario De Castro1, *Ming-Hui Chen2 and Yuanye Zhang3. 1Universidade de Sao Paulo; 2University of Connecticut; 3Novartis Pharmaceutical Corporation

11:05 AM Quantile Association for Bivariate Survival Data. *Ruosha Li1, Yu Cheng2, Qingxia Chen3 and Jason Fine4. 1The University of Texas School of Public Health; 2University of Pittsburgh; 3Vanderbilt University; 4The University of North Carolina at Chapel Hill

11:30 AM Robust Estimation for Clustered Failure Time Data: Application to Huntington’s Disease. *Tanya Garcia1, Yanyuan Ma2, Yuanjia Wang3 and Karen Marder3. 1Texas A&M University; 2University of South Carolina; 3Columbia University

11:55 AM Floor Discussion.

Session 18: Statistical Methods for Sequencing Data Analysis (Invited)
Room: 308, level 3
Organizer: Yanming Di, Oregon State University.
Chair: Yanming Di, Oregon State University.

10:40 AM Unit-free and Robust Detection of Differential Expression from RNA-Seq Data. Hui Jiang. University of Michigan

11:05 AM Robust Estimation of Isoform Expression with RNA-Seq Data. *Jun Li1 and Hui Jiang2. 1University of Notre Dame; 2University of Michigan

11:30 AM Genetic Association Testing for Binary Traits in the Presence of Population Structure. *Duo Jiang1, Sheng Zhong2 and Mary Sara McPeek2. 1Oregon State University; 2University of Chicago

11:55 AM Identification of Stably Expressed Genes from Arabidopsis RNA-Seq Data. Bin Zhuo, *Yanming Di, Sarah Emerson and Jeff Chang. Oregon State University

12:20 PM Floor Discussion.

Session 24: Recent Developments in Missing Data Analysis (Invited)
Room: 386, level 3
Organizer: Peisong Han, University of Waterloo.
Chair: Peisong Han, University of Waterloo.

10:40 AM Variable Selection in the Presence of Missing Data: Resampling and Imputation. *Qi Long1 and Brent Johnson2. 1Emory University; 2University of Rochester

11:05 AM Using Link-Preserving Imputation for Logistic Partially Linear Models with Missing Covariates. *Qixuan Chen1, Myunghee Paik2, Minjin Kim2 and Cuiling Wang3. 1Columbia University; 2Seoul National University; 3Albert Einstein College of Medicine

11:30 AM Composite Likelihood Approach in Gaussian Copula Regression Models with Missing Data. *Wei Ding and Peter Song. University of Michigan

11:55 AM Test the Reliability of Doubly Robust Estimation with Missing Response Data. *Baojiang Chen1 and Jing Qin2. 1University of Nebraska Medical Center; 2National Institutes of Health

12:20 PM Floor Discussion.

Session 27: Bayesian Applications in Biomedical Studies (Invited)
Room: 310, level 3
Organizer: Xiaowen Hu, Colorado State University.
Chair: Xiaowen Hu, Colorado State University.

10:40 AM Bayesian Functional Enrichment Analysis. *Jing Cao1 and Song Zhang2. 1Southern Methodist University; 2The University of Texas Southwestern Medical Center




11:05 AM Bayesian Spatial Clustering Method and Its Application in Radiology. *Song Zhang1 and Yin Xi2. 1The University of Texas Southwestern Medical Center; 2Southern Methodist University

11:30 AM Adjusting for Heterogeneity in Infectivity in HIV Prevention Clinical Trials. *Jingyang Zhang1 and Elizabeth Brown1,2. 1Fred Hutchinson Cancer Research Center; 2University of Washington

11:55 AM Canonical Variate Regression. *Chongliang Luo1, Jin Liu2, Dipak Dey1 and Kun Chen1. 1University of Connecticut; 2Duke-NUS

12:20 PM Floor Discussion.

Session 31: Adaptive Designs for Early-Phase Oncology Clinical Trials (Invited)
Room: 312, level 3
Organizer: Yuan Ji, NorthShore University HealthSystem & University of Chicago.
Chair: Yuan Ji, NorthShore University HealthSystem & University of Chicago.

10:40 AM Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials. *Yanxun Xu1, Lorenzo Trippa2, Peter Mueller3 and Yuan Ji4,5. 1Johns Hopkins University; 2Harvard University; 3University of Texas at Austin; 4NorthShore University HealthSystem; 5University of Chicago

11:05 AM A Curve-free Bayesian Decision-theoretic Design for Two-agent Phase I Trials. Bee Leng Lee1, *Shenghua Fan2 and Ying Lu3. 1San Jose State University; 2California State University, East Bay; 3Stanford University

11:30 AM Bayesian Dose-finding Designs for Combination of Molecularly Targeted Agents Assuming Partial Stochastic Ordering. Beibei Guo1 and *Yisheng Li2. 1Louisiana State University; 2M. D. Anderson Cancer Center

11:55 AM Floor Discussion.

Session 45: Advances and Case Studies for Multiplicity Issues in Clinical Trials (Invited)
Room: 322, level 3
Organizer: Guanghan Frank Liu, Merck & Co.
Chair: Meihua Wang, Merck & Co.

10:40 AM Confidence Intervals for Multiple Comparisons Procedures. Brian Wiens. Portola Pharmaceuticals

11:05 AM Composite Endpoints - Some Common Misconceptions. *David Li1 and Jin Xu2. 1Pfizer Inc.; 2Merck & Co.

11:30 AM Multiplicity Adjustment in Vaccine Efficacy Trial with Adaptive Population-Enrichment Design. Shu-Chih Su. Merck & Co.

11:55 AM Discussant: Lei Shen, Eli Lilly and Company

12:20 PM Floor Discussion.

Session 59: Design and Analysis of Non-Inferiority Clinical Trials (Invited)
Room: 324, level 3
Organizers: Yongzhao Shao, New York University; Ming Zhou, Bristol-Myers Squibb Company.
Chair: Zhezhen Jin, Columbia University.

10:40 AM Some Comments on the Three-Arm Non-inferiority Trial Design. *Ming Zhou and Sudeep Kundu. Bristol-Myers Squibb Company

11:05 AM Non-inferiority Tests for Prognostic Models. *Ning Xu and Yongzhao Shao. New York University

11:30 AM Discussant: Ming Hu, New York University
11:55 AM Floor Discussion.

Session 62: Statistical Challenges in Economic Research Involving Medical Costs (Invited)
Room: 372/374, level 3
Organizers: Yu Shen, M. D. Anderson Cancer Center; Ya-Chen Tina Shih, M. D. Anderson Cancer Center.
Chair: Jing Ning, M. D. Anderson Cancer Center.

10:40 AM Projecting Survival and Lifetime Costs from Short-Term Smoking Cessation Trials
Daniel Heitjan1,2. 1Southern Methodist University 2The University of Texas Southwestern Medical Center

11:05 AM A Flexible Model for Correlated Medical Costs, with Application to Medical Expenditure Panel Survey Data
Jinsong Chen1, *Lei Liu2, Tina Shih3, Daowen Zhang4 and Thomas Severini2. 1University of Illinois, Chicago 2Northwestern University 3M. D. Anderson Cancer Center 4North Carolina State University

11:30 AM A Bivariate Copula Random-Effects Model for Length of Stay and Cost
Xiaoqin Tang1, Zhehui Luo2 and *Joseph Gardiner2. 1Allegheny Health Network 2Michigan State University

11:55 AM Nonparametric Inference for the Joint Distribution of Recurrent Marked Variables and Recurrent Survival Time
Laura Yee and *Gary Chan. University of Washington

12:20 PM Floor Discussion.

Session 83: Dose Response/Finding Studies in Drug Development (Invited)
Room: Grey Rock, level 2
Organizer: Guojun Yuan, Cubist Pharmaceuticals Inc.
Chair: Naitee Ting, Boehringer-Ingelheim Pharmaceuticals Inc.

10:40 AM Calibration of Two-stage Continual Reassessment Method
*Xiaoyu Jia1, Shing Lee2 and Ken Cheung2. 1Boehringer-Ingelheim Pharmaceuticals Inc. 2Columbia University

11:05 AM A Practical Application with Interim Analysis in a Dose Ranging Design
Xin Wang. AbbVie Inc.

11:30 AM Design Considerations in Dose Finding Studies
Xin Zhao. Johnson & Johnson

11:55 AM Dose Response Relationship in a Phase 1b Dose Ranging Study in Subjects with Chronic Hepatitis C Virus Infection
Di An. Gilead Sciences, Inc.

42 | 2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17


Scientific Program (*Presenting Author)

12:20 PM Floor Discussion.

Session 88: Advanced Development in Big Data Analytics Tools (Invited)
Room: 382, level 3
Organizer: Yuping Zhang, University of Connecticut.
Chair: Fenghai Duan, Brown University.

10:40 AM Clique-based Method for Social Network Clustering
*Dipak Dey and Guang Ouyang. University of Connecticut

11:05 AM Sparse Partially Linear Additive Models
Yin Lou1, *Jacob Bien2, Rich Caruana3 and Johannes Gehrke2. 1LinkedIn Corporation 2Cornell University 3Microsoft Research

11:30 AM Clustering by Propagating Probabilities Between Data Points
*Guojun Gan, Yuping Zhang and Dipak Dey. University of Connecticut

11:55 AM Clustering Time Series: A PSLEX-Based Approach
*Priya Kohli1, Nalini Ravishanker2 and Jane Harvill3. 1Connecticut College 2University of Connecticut 3Baylor University

12:20 PM Floor Discussion.

Session 91: Recent Developments of High-Dimensional Data Inference and Its Applications (Invited)
Room: 306, level 3
Organizer: Wen Zhou, Colorado State University.
Chair: Wen Zhou, Colorado State University.

10:40 AM Segmenting Multiple Time Series by Contemporaneous Linear Transformation: PCA for Time Series
*Jinyuan Chang1, Bin Guo2 and Qiwei Yao3. 1University of Melbourne 2Peking University 3London School of Economics

11:05 AM Projection Test for High-Dimensional Mean Vectors with Optimal Direction
Runze Li1, *Yuan Huang1, Lan Wang2 and Chen Xu1. 1Pennsylvania State University 2University of Minnesota

11:30 AM Thresholding Tests for Signal Detection on High-Dimensional Count Distributions
*Yumou Qiu1, Songxi Chen2 and Dan Nettleton2. 1University of Nebraska-Lincoln 2Iowa State University

11:55 AM Projected Principal Component Analysis in Factor Models
Jianqing Fan1, Yuan Liao2 and *Weichen Wang1. 1Princeton University 2University of Maryland

12:20 PM Floor Discussion.

Session C04: Multiple Comparisons, Meta-analysis, and Mismeasured Outcome Data (Contributed)
Room: Virginia Dale, level 3
Organizer: Peng-Liang Zhao, Sanofi-aventis U.S. LLC.
Chair: Ye Shen, University of Georgia.

10:55 AM Generalized Holm's Procedure for Multiple Testing Problem
Huajiang Li1, Yi Ma2 and *Hong Zhou3. 1Allergan, Inc. 2Quintiles, Inc. 3Arkansas State University

11:10 AM Generalized Confidence Interval Approach for Combining Multiple Comparisons
*Atiar Rahman and Ram Tiwari. U.S. Food and Drug Administration

11:25 AM Considerations for Two Correlated Cochran-Armitage Trend Tests
*Yihan Li, Su Chen, Ying Zhang and Yijie Zhou. AbbVie Inc.

11:40 AM Goodness-of-fit Test for Meta-analysis
*Zhongxue Chen1, Guoyi Zhang2 and Jing Li1. 1Indiana University 2University of New Mexico

11:55 AM Pitfalls in Assessing Relative Efficacy Across Trials
Xiao Sun. Merck & Co.

12:10 PM Floor Discussion.



Abstracts


Session 1: Best Practices for Delivery of Adaptive Clinical Trials Illustrated with Case Studies

DIA Adaptive Design Scientific Working Group Best Practices Team: Objectives and Case Studies
Eva Miller
inVentiv Health Clinical
[email protected]
Members of the DIA ADSWG BP team are senior statisticians from the FDA and industry who endeavor to identify gaps in the implementation of adaptive trials and promote the broader use of adaptive design by applying best practices. As we learned from the 2012 AD Survey by Morgan et al. and surveys undertaken by CBER and CDRH, adaptive designs have not grown to be as large a proportion of total clinical trials undertaken as advocates would have hoped. Barriers to adoption include concerns about regulatory acceptance; the time involved in planning, simulations, communications, and teamwork; extra considerations for DMCs; and the availability of appropriate software for planning, simulations, and implementation. In this talk, we summarize the committee's objectives, current practices, and evolving issues related to the use of adaptive trials, and consider what went well and what we learned from a number of case studies: the LOTS trial, Raptor's PRODYSBI, a Merck Phase II/III vaccine trial, the INHANCE trial, a population enrichment design trial, a phase II/III seamless adaptive confirmatory trial with interim treatment selection, and the A-HeFT trial. ADVENT and VALOR, while part of this committee's review effort, will be discussed in detail by other speakers. The objectives of the DIA ADSWG BP team are to:
1. Understand AD trials that are categorized as "less well-understood", with regard to the specific features or aspects of those trials that may have led them to be categorized in that way.
2. Review these features in regard to design and/or the end-to-end trial execution processes, and identify areas where suggestions and/or improvements may help in their acceptance.
3. Acknowledging the importance of planning and simulations as best practices for ADs, review available software for AD simulations and implementation. Features of software considered in our review include level of user friendliness, ease of use, required user experience with statistics and software, documentation, and validation.

An Adaptive Phase 3 Trial Resulting in FDA Approval of Crofelemer
*Lingyun Liu1, Zoran Antonijevic1, Cyrus Mehta1, Pravin Chaturvedi2 and Scott Harris2
1Cytel Inc.
2Salix Pharmaceuticals
[email protected]
Chronic diarrhea in HIV positive patients remains a serious unmet clinical need, even and especially in the age of highly active anti-retroviral therapy (HAART). In 2013 crofelemer was approved by the FDA as a first-in-class anti-diarrheal agent indicated for the symptomatic relief of non-infectious diarrhea in adult HIV patients on anti-retroviral therapy (ART). The safety and efficacy of crofelemer were established through ADVENT, an innovative two-stage, seamless adaptive clinical trial with dose selection at the end of stage 1. In this talk we will highlight the clinical, statistical, regulatory and operational challenges of this adequate and well controlled trial. Its successful implementation reflects the high degree of precision with which the trial was planned and executed. To our knowledge this is the first example of an adaptive confirmatory trial with dose selection that has proceeded all the way to NDA submission and approval.

Promising Zone Design: Methodology, Strategy, and Implementation
Zoran Antonijevic
Cytel Inc.
[email protected]

The unblinded sample size re-assessment (uSSR) has been categorized as a "less well-understood" type of adaptive design in the CDER/CBER draft guidance on adaptive design. There has been, however, an increase in the application of this type of design since the publication of this guidance. This design features a so-called promising zone at the interim analysis. If the data fall into this zone, the study is trending towards positive efficacy, but the current sample size is not large enough to assure success with sufficient probability. In this case the study design allows an option for a one-time increase in sample size based on pre-specified criteria. In this session we will present a case study of a trial using the uSSR with a promising zone. The presentation will address statistical methodology, clinical trial design, regulatory interactions and operational challenges.
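The promising-zone logic above can be sketched with conditional power under the current trend for a one-sample z-test with known variance. The zone cutoffs (0.36 and 0.80) and all numbers below are illustrative placeholders, not values from any specific trial or from the presentation itself.

```python
from scipy.stats import norm

def conditional_power(z1, n1, n, z_crit=1.96):
    """Conditional power at the final analysis for a one-sided,
    one-sample z-test with known variance, assuming the interim
    trend continues. z1 is the interim z-statistic based on the
    first n1 of n planned observations."""
    drift = z1 * n / n1 ** 0.5          # projected final numerator
    return norm.cdf((drift - z_crit * n ** 0.5) / (n - n1) ** 0.5)

def zone(z1, n1, n, cp_lo=0.36, cp_hi=0.80):
    """Classify the interim result; a one-time sample size increase
    is allowed only when conditional power lands in the promising zone."""
    cp = conditional_power(z1, n1, n)
    if cp < cp_lo:
        label = "unfavorable"
    elif cp < cp_hi:
        label = "promising: sample size increase allowed"
    else:
        label = "favorable"
    return cp, label

cp, label = zone(z1=1.5, n1=50, n=100)
print(f"conditional power = {cp:.3f} ({label})")
```

With these inputs the trial is trending positive but underpowered, so it falls in the promising zone and a pre-specified sample size increase would be permitted.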

Session 2: Chemistry, Manufacturing, and Controls (CMC) in Pharmaceuticals: Current Statistical Challenges I

Statistical Methods for Analytical Comparability
Leslie Sidor
Amgen Inc.
[email protected]

In all manufacturing settings, there is an inherent drive to improve product through reducing process variation, implementing new technology, increasing efficiency, optimizing resources, and improving the customer experience through innovation. In the pharmaceutical industry, these improvements come with an added responsibility to the patient: product made under the post-improvement or post-change condition must maintain the safety and efficacy of the pre-change product. Regulatory agencies also recognize the importance of providing manufacturers the flexibility to improve their manufacturing processes, and it is acknowledged that the extent and rigor of the evaluation should be adjusted appropriately based on the magnitude of the change. There are a number of approaches that may be used to assess analytical comparability, and these are the focus of this presentation.

How Type I Error Impacts Quality System Effectiveness
Jeff Gardner
DataPharm Statistical & Data Management Consulting
[email protected]

Univariate Shewhart control charts enjoy widespread use among pharmaceutical and biotechnology manufacturers, particularly in monitoring product quality at the end of the production work stream. A seldom-discussed aspect of using univariate Shewhart charts in this way is the propagation of Type I error when multiple charts are used to monitor multiple product quality attributes. Despite the increased risk of false signals, manufacturers integrate univariate Shewhart charts into their quality systems in such a way as to initiate the quality investigation process whenever an out-of-control signal is observed. By doing so, companies inherently link the effectiveness of their quality systems to the overall Type I error rate of their monitoring programs and suffer significant consequences as a result. This presentation discusses the nature of these consequences and their organizational impact. It also presents strategies for reducing overall Type I error and makes a case for these strategies as the best means of achieving maximum quality investigation and CAPA effectiveness.
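The error propagation described above is easy to quantify. Assuming k independently monitored charts and in-control data, the chance of at least one false out-of-control signal per review grows as 1 − (1 − α)^k; a minimal sketch (the per-chart rate 0.0027 is the conventional 3-sigma value, not a figure from the talk):

```python
def overall_type1(alpha, k):
    """P(at least one false out-of-control signal) across k independent
    univariate Shewhart charts, each with per-chart Type I error alpha."""
    return 1 - (1 - alpha) ** k

alpha = 0.0027  # conventional 3-sigma limits with in-control normal data
for k in (1, 5, 10, 25):
    print(f"{k:2d} charts -> overall Type I error {overall_type1(alpha, k):.4f}")
```

Ten charts already push the overall false-signal rate near 2.7%, an order of magnitude above the per-chart rate, which is the quality-system burden the abstract warns about.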

Alternative Procedures for Shelf Life Estimation Utilizing Mixed Models
*Michelle Quinlan1, Walt Stroup2 and Dave Christopher3
1Novartis Pharmaceutical Corporation
2University of Nebraska-Lincoln
3Merck & Co.
[email protected]
Shelf life is the length of time a product is expected to remain within the established acceptance criteria. The traditional metric for assessing shelf life is the regression line. The criterion for determining shelf life involves ensuring that all future batches meet or exceed the established shelf life of the product with acceptably high probability. Given this criterion, which involves prediction, batch effects are most appropriately modeled as random. Two new procedures have been developed which employ a model where batch effects are considered random: (1) utilizing batch-specific Best Linear Unbiased Prediction (BLUP) and focusing on the batch with the shortest shelf life; and (2) utilizing BLUP combined with quantile regression techniques. Both procedures seek to estimate a suitably small quantile of the distribution of batch shelf lives while accounting for the inherent variability among batch shelf lives. The methodology and implementation of these procedures for shelf life estimation are discussed and the results are compared to previously developed shelf life estimation procedures.
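As a rough illustration of the first idea, suppose batch-specific BLUP intercepts and slopes have already been obtained from a random-coefficients mixed model; each batch's shelf life is the time at which its predicted line crosses the acceptance limit, and attention focuses on the shortest one. All numbers below are invented for illustration, not data from the talk:

```python
import numpy as np

# Hypothetical batch-specific BLUP estimates: intercept = potency at
# release (% label claim), slope = degradation per month.
intercepts = np.array([101.2, 100.4, 99.8, 100.9])
slopes     = np.array([-0.35, -0.42, -0.30, -0.55])
lower_limit = 95.0  # acceptance criterion

# Batch shelf life: time at which each batch's line crosses the limit.
shelf_lives = (lower_limit - intercepts) / slopes
print("batch shelf lives (months):", np.round(shelf_lives, 1))
print("shortest-batch shelf life :", round(shelf_lives.min(), 1))
```

The full procedures go further (estimating a small quantile of the batch shelf-life distribution rather than simply taking the observed minimum), but the crossing-time calculation is the common building block.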

Session 3: Chemistry, Manufacturing, and Controls (CMC) in Pharmaceuticals: Current Statistical Challenges II

Statistical Methods for Analytical Validation of Accuracy and Precision
Richard Burdick
Amgen Inc.
[email protected]
USP general information chapter 〈1210〉 was recently proposed as a companion chapter to 〈1225〉 Validation of Compendial Procedures. The purpose of 〈1210〉 is to provide statistical methods that can be used in the validation of analytical methods. In this talk, we review some of these best practices for demonstrating accuracy and precision of analytical methods. Examples are provided for experimental designs that are typically employed in method validation experiments. The importance of pre-validation work is highlighted, as well as the need to perform a formal statistical test of hypotheses in order to demonstrate that an analytical procedure is fit for use. Approaches are presented for individual validation of accuracy and precision, as well as for simultaneous validation of these two properties.

Statistical Applications for Biosimilar Product Development
Richard Montes
Hospira, Inc.
[email protected]

The approval of the first biosimilar by the FDA in March 2015 is a historic event for the U.S. healthcare system. The inevitable advent of biosimilars in the U.S. brings challenges and opportunities in statistical applications. Demonstration of analytical biosimilarity between the biosimilar product and the reference product is the primary new challenge, especially amidst the evolving regulatory landscape. The 2012 FDA draft biosimilars guidance documents present a stepwise approach starting with structural and functional characterization of the biosimilar and reference products. Recent FDA feedback includes recommendations for a tiered statistical approach: equivalence testing for Tier 1, quality range for Tier 2, and graphical comparison for Tier 3. The demonstration of analytical biosimilarity through comparative analytical testing and robust statistical analysis is an important component of the assessment of the totality of evidence to support biosimilar product approval. In addition to the analytical biosimilarity assessment, the standard CMC requirements for new biologic product licensing applications are also needed for biosimilar product applications. Requirements such as stability analysis for shelf-life estimation and specification setting rely heavily on statistical methodologies. As a consequence of the early characterization work requisite for the analytical biosimilarity assessment, the available analytical data are considerably larger than the minimum of 3 stability lots required for a new biologic. The larger body of data for a biosimilar product offers the statistician more options to utilize statistical methodologies that result in improved statistical inference.
This presentation will cover two main areas. First, the application of the FDA tiered statistical approach for evaluation of analytical biosimilarity data will be illustrated. Second, a statistical method for setting release and shelf-life specification limits taking into account the random lot effects will be explored. Specifically, hierarchical (multilevel) regression modeling will be used to estimate fixed and random effects parameters, which are then used in simulation to quantify the limits. The simulation will be compared against other ad hoc industry methods to calculate specification limits.
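A minimal sketch of that simulation step, assuming the hierarchical model has already been fitted and reduced to a random-intercept form. The variance components, tail probabilities, and values below are hypothetical illustrations, not the method or data from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical parameters from a fitted random-intercept model:
# overall mean, lot-to-lot SD, and within-lot (analytical) SD.
mu, sd_lot, sd_within = 100.0, 1.5, 0.8

# Simulate many future lots, then individual release results within
# each lot, propagating both levels of the fitted hierarchy.
n_sim = 100_000
lot_means = rng.normal(mu, sd_lot, n_sim)
results = rng.normal(lot_means, sd_within)

# Candidate specification limits as extreme quantiles of the
# simulated predictive distribution.
lo, hi = np.quantile(results, [0.001, 0.999])
print(f"simulated 99.8% specification interval: ({lo:.1f}, {hi:.1f})")
```

Because the simulation carries the random lot effect explicitly, the resulting limits are wider than naive limits computed from pooled individual results with the lot structure ignored.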

How to Set Up Biosimilarity Bounds in Biosimilar Product Development
*Lanju Zhang
AbbVie Inc.
[email protected]

Abstract is Pending.

Session 4: New Techniques for Functional and Longitudinal Data Analysis

Variable Selection Methods for Functional Regression Models
Nedret Billor
Auburn University
[email protected]

In the last fifteen years, a substantial amount of attention has been drawn to the field of functional data analysis. While the study of probabilistic tools for infinite dimensional variables started at the beginning of the 20th century, statistical models and methods for functional data have only really been developed in the last two decades, as many scientific fields involving applied statistics have started measuring and recording massive continuous data due to rapid technological advancements. Functional linear regression is a widely used technique for exploring the relationship between a scalar response and functional predictors, and settings where a large number of functional predictors is collected on few subjects are becoming increasingly common. In order to select, among many functional predictors, the important ones that are actually useful in predicting the response, variable selection methods are utilized. In this talk we review variable selection methods based on L1 regularization, in which selection of the important functional predictors and estimation of the corresponding regression coefficient functions are done simultaneously. Further, the effect of outliers on variable selection procedures for functional regression models is discussed, and some new developments in variable selection methods that are resistant to outliers are proposed.

Structured Functional Principal Component Analysis in Multilevel Functional Mixed Models for Physical Activity Data
*Haochang Shou1, Vadim Zipunnikov2, Ciprian Crainiceanu2 and Sonja Greven3
1University of Pennsylvania
2Johns Hopkins University
3Ludwig-Maximilians-Universität München
[email protected]
Repeated measures for biomedical research have in the past decade been collected in many modern observational studies to understand normal and abnormal mental status. Physical activity data from wearable computing sensors, for example, link the direct phenotype of human behaviors with disease symptoms and diagnosis. These data, objectively assessed and often recorded at sampling frequencies high enough to capture real-time events, have also brought many challenges for statistical modeling and analysis. More specifically, the challenges include: 1) the fundamental observational unit is often a function of time or space that can be of high dimension; 2) there are underlying correlation structures induced by the sampling design; and 3) the measures can be noisy depending on the technique, while individual levels of variability are of interest. To account for this complexity, we propose a principal component analysis based approach that accounts for the natural inheritance of correlation structures from the sampling design. This is achieved by introducing latent processes that capture explicit levels of variability, using the same concept as standard mixed effects models but replacing random effects with random processes. Our method decomposes multilevel variation in functional models and achieves feature extraction via level-specific spectral decomposition of latent process covariance operators. A computationally fast and scalable estimation procedure through rank-preserved projection is developed for high-dimensional data. We illustrate the approach with accelerometry data from National Institute of Mental Health family studies of affective spectrum disorders. A similar modeling approach can also be used to correct measurement errors and improve reliability of the functional objects.

Exploration of Diurnal Patterns in Maize Leaf with RNA-sequencing Data
Wen Zhou1, *Peng Liu2, Lin Wang3 and Thomas Brutnell4
1Colorado State University
2Iowa State University
3Monsanto Company
4Donald Danforth Plant Science Center
[email protected]

To study the diurnal patterns of gene expression profiles along maize leaf development, a set of RNA-sequencing experiments was conducted over 24 hours, with samples taken every two hours from four different sections of maize leaves. We explore the diurnal patterns using a semi-parametric method through smoothing splines and a flexible mathematical model. Interesting features of the diurnal patterns are described through geometric properties of the mathematical model. In this talk, we will provide some preliminary results of our analysis.

Partial and Tensor Quantile Regressions in Functional Data Analysis
*Dengdeng Yu, Linglong Kong and Ivan Mizera
University of Alberta
[email protected]

In the functional linear quantile regression model, we are interested in how to effectively and efficiently extract the bases for estimating functional coefficients. Therefore, we propose a prediction procedure using partial quantile covariance techniques to extract the functional bases effectively, by sequentially maximizing the partial quantile covariance between the response and projections of the functional covariates. Moreover, we develop an efficient algorithm for the procedure. Under the homoscedasticity assumption, we further extend our method to functional composite quantile regression by using the composite quantile covariance, and obtain the corresponding algorithm. In the functional linear quantile regression model, the functional coefficients may have a multidimensional structure. To make efficient predictions without losing the structure information, we also propose a prediction procedure using tensor linear quantile regression. In addition, simulations and real data are studied to show the superiority of our proposed methods. This is joint work with Linglong Kong and Ivan Mizera at the University of Alberta.

Session 5: Recent Advancements in Statistical Machine Learning

Sparse CCA: Minimax Rates and Adaptive Estimation
*Chao Gao1, Zongming Ma2 and Harrison Zhou1
1Yale University
2University of Pennsylvania
[email protected]

Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of variables. It has applications in many fields, including genomics and imaging, both to extract meaningful features and to use the features for subsequent analysis. This talk presents the minimax rates and adaptive estimation of leading sparse canonical directions when the ambient dimensions are high. We show that the minimax rates do not depend on the marginal covariance matrices. The optimal rates can be achieved by a two-stage convex programming procedure.

Asymptotic Normality in Estimation of Large Ising Graphical Models
*Zhao Ren1, Cun-Hui Zhang2 and Harrison Zhou3
1University of Pittsburgh
2Rutgers University
3Yale University
[email protected]

The high dimensional graphical model, a powerful tool for studying the conditional dependency relationships of random variables, has attracted great attention in recent years. This paper investigates statistical inference for each edge in some families of large graphical models, which include the Ising graphical model and the Gaussian graphical model as two special cases. Unlike in the Gaussian graphical model, in general there is no explicit correspondence between the structure of the graph and the precision matrix of the underlying data. Hence this inference problem is different from inference on the precision matrix and is very challenging. In this paper, we propose a novel estimator of each edge and show that, under a certain sparsity assumption, our estimator is asymptotically normal and has a parametric square-root rate in a large graphical model. Our proof applies a linearization idea and a novel projection procedure which is motivated by statistical inference in high dimensional regression. A careful analysis of this new methodology relaxes the commonly imposed sparsity assumption, uniform signal strength condition, bounded maximum neighborhood weight, and incoherence condition in the literature on large Ising graphical models. This is joint work with Cun-Hui Zhang and Harrison Zhou.

Optimal Tests of Independence with Applications to Testing More Structures
*Fang Han1 and Han Liu2
1Johns Hopkins University
2Princeton University
[email protected]

We consider the problem of testing mutual independence of all entries in a d-dimensional random vector based on n independent observations, with d possibly larger than n. For this, we consider two families of distribution-free test statistics that converge weakly to an extreme value type I distribution. We further study the powers of the corresponding tests against sparse alternatives and justify certain optimality. As important examples, we show that the tests based on Kendall's tau and Spearman's rho are rate optimal tests of independence. For further generalization, we consider accelerating the rate of convergence by approximating the exact distributions of the test statistics. We also study tests of two more structural hypotheses: m-dependence and data homogeneity. For these, we propose two rank-based tests and show their optimality.
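A crude finite-sample analogue of the Kendall's tau approach takes the most extreme pairwise tau and applies a Bonferroni correction over all pairs. This is a conservative stand-in for illustration only, not the authors' extreme-value-calibrated statistic:

```python
import itertools
import numpy as np
from scipy.stats import kendalltau

def max_tau_test(X):
    """Test mutual independence of the columns of X using the smallest
    pairwise Kendall's tau p-value, Bonferroni-corrected over all pairs."""
    n, d = X.shape
    pairs = list(itertools.combinations(range(d), 2))
    pvals = [kendalltau(X[:, j], X[:, k]).pvalue for j, k in pairs]
    return min(1.0, min(pvals) * len(pairs))  # adjusted p-value

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                  # mutually independent columns
Y = X.copy()
Y[:, 1] = Y[:, 0] + 0.1 * rng.normal(size=200)  # one strongly dependent pair
print("independent data, adjusted p =", max_tau_test(X))
print("dependent data,  adjusted p =", max_tau_test(Y))
```

Being rank-based, the statistic is distribution-free under the null, which is the feature the optimality results above exploit; the extreme-value calibration in the paper replaces the Bonferroni step with a sharper asymptotic threshold.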

Bootstrap Tests on High Dimensional Covariance Matrices with Applications to Understanding Gene Clustering
*Wen Zhou1, Jinyuan Chang2 and Wenxin Zhou2
1Colorado State University
2University of Melbourne
[email protected]

Recent advancements in genomic studies and clinical research have drawn growing attention to understanding how relationships among genes, such as dependencies or co-regulations, vary between different biological states. Complex and unknown dependency among genes, along with the large number of measurements, imposes methodological challenges in studying gene relationships between different states. Starting from an interrelated problem, we propose a bootstrap procedure for testing the equality of two unspecified covariance matrices in high dimensions, which turns out to be an important tool for understanding the change of gene relationships between states. The two-sample bootstrap test takes maximum advantage of the dependence structures in the data, and gives rise to powerful tests with desirable size in finite samples. Theoretical and numerical studies show that the bootstrap test is powerful against sparse alternatives and, more importantly, is robust against highly correlated and nonparametric sampling distributions. Encouraged by the wide applicability of the proposed bootstrap test, we design a gene clustering algorithm to understand gene clustering structures. We apply the bootstrap test and gene clustering algorithm to the analysis of a human asthma dataset, for which some interesting biological implications are discussed.
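In the same spirit as the proposed test, a simple permutation analogue (not the authors' bootstrap calibration) compares a max-type statistic on the two sample covariance matrices:

```python
import numpy as np

def cov_diff_stat(X, Y):
    """Max entrywise absolute difference between sample covariance matrices."""
    return np.max(np.abs(np.cov(X, rowvar=False) - np.cov(Y, rowvar=False)))

def permutation_cov_test(X, Y, n_perm=199, seed=0):
    """Permutation p-value for H0: the two samples share one covariance
    matrix (valid under the stronger null of equal distributions)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([X, Y])
    n = len(X)
    obs = cov_diff_stat(X, Y)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        if cov_diff_stat(pooled[idx[:n]], pooled[idx[n:]]) >= obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

rng = np.random.default_rng(42)
A = rng.normal(size=(80, 5))
B = rng.normal(size=(80, 5))        # same covariance as A
C = 3.0 * rng.normal(size=(80, 5))  # inflated covariance
print("equal-covariance p  =", permutation_cov_test(A, B))
print("unequal-covariance p =", permutation_cov_test(A, C))
```

A max-type statistic of this form targets sparse alternatives (a few strongly perturbed entries), which matches the power properties claimed in the abstract; the paper's bootstrap additionally delivers valid size without the equal-distribution assumption the permutation shortcut relies on.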

Session 6: Recent Advances in Analyzing Genomic Data

Allele-specific Copy Number Profiling by Next-generation DNA Sequencing
*Hao Chen1, John Bell2, Nicolas Zavala2, Hanlee Ji2 and Nancy Zhang3
1University of California, Davis
2Stanford University
3University of Pennsylvania
[email protected]

The progression and clonal development of tumors often involve amplifications and deletions of genomic DNA. Estimation of allele-specific copy number, which quantifies the number of copies of each allele at each variant locus rather than the total number of chromosome copies, is an important step in giving a more complete portrait of tumor genomes and the inference of their clonal history. We propose a novel method, falcon, for finding somatic allele-specific copy number changes by next generation sequencing of tumors with matched normal. Falcon is based on a change-point model on a bivariate mixed Binomial process, which explicitly models the copy numbers of the two chromosome haplotypes and corrects for local allele-specific coverage biases. By using the Binomial distribution rather than a normal approximation, falcon more effectively pools evidence from sites with low coverage. We applied this method in the analysis of a pre-malignant colon tumor sample and a late-stage colorectal adenocarcinoma from the same individual. The allele-specific copy number estimates obtained by falcon allow us to draw detailed conclusions regarding the clonal history of the individual's colon cancer.

A Statistical Approach to Prioritizing GWAS Results by Integrating Pleiotropy and Genomic Annotation Data
*Dongjun Chung1, Can Yang2, Cong Li3, Joel Gelernter3 and Hongyu Zhao3
1Medical University of South Carolina
2Hong Kong Baptist University
3Yale University
[email protected]

Results from Genome-Wide Association Studies (GWAS) have shown that complex diseases are often affected by many genetic variants with small or moderate effects. Identification of these risk variants remains a very challenging problem. Hence, there is a need to develop more powerful statistical methods that leverage available information to improve upon traditional approaches, which focus on a single GWAS dataset without incorporating additional data. In this presentation, I will discuss our novel statistical approach, GPA (Genetic analysis incorporating Pleiotropy and Annotation), to increase statistical power to identify risk variants through joint analysis of multiple GWAS datasets and annotation information. Our approach is motivated by the observations that (1) accumulating evidence suggests that different complex diseases share common risk bases, i.e., pleiotropy; and (2) functionally annotated variants have been consistently demonstrated to be enriched among GWAS hits. GPA can integrate multiple GWAS datasets and functional annotations to identify association signals, and it can also perform hypothesis testing for the presence of pleiotropy and the enrichment of functional annotation. I will discuss the power of GPA with its application to real GWAS datasets with various functional annotations, along with simulation studies.

GLAD: A Mixed-membership Model for Heterogeneous Tumor Subtype Classification
Hachem Saddiki1, Jon Mcauliffe2 and *Patrick Flaherty3
1Worcester Polytechnic Institute
2University of California, Berkeley
3University of Massachusetts
[email protected]
Genomic analyses of many solid cancers have demonstrated extensive genetic heterogeneity between as well as within individual tumors. However, statistical methods for classifying tumors by subtype based on genomic biomarkers generally entail an all-or-none decision, which may be misleading for clinical samples containing a mixture of subtypes and/or normal cell contamination. We have developed a mixed-membership classification model, called GLAD, that simultaneously learns a sparse biomarker signature for each subtype as well as a distribution over subtypes for each sample. We demonstrate the accuracy of this model on simulated data, in-vitro mixture experiments, and clinical samples from The Cancer Genome Atlas (TCGA) project. We show that many TCGA samples are likely a mixture of multiple subtypes.

Learning Genetic Regulatory Networks Using RNA-seq Data
*Jie Peng1 and Ru Wang2

1University of California, Davis
[email protected]
In this talk we will discuss constructing genetic regulatory networks by graphical models using RNA-seq data. Since RNA-seq data are counts, many commonly used methods for graphical models are not directly applicable, as they assume multivariate normal distributions. We tackle this problem by modeling the joint distribution of RNA-seq counts through a hierarchical model. We will discuss algorithms for fitting this model as well as an application to a data set consisting of RNA-seq libraries for 76 introgression lines of tomato.

Session 7: Scalable Multivariate Statistical Learning with Massive Data

False Discovery Control under Unknown Dependence
Jianqing Fan1 and *Xu Han2

1Princeton University
2Temple University
[email protected]

Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of hypotheses are tested simultaneously to find if any genes are associated with some traits. In practice, these tests are correlated. False discovery control under general covariance dependence is a very challenging and important open problem in modern research. In this talk, we extend our principal factor approximation (PFA) approach, designed for the known covariance matrix case, to a more general situation where the covariance dependence is unknown. In practice, this unknown dependence has to be estimated first, and the estimation accuracy can greatly affect the convergence of FDP or even violate its consistency. We will give conditions on the dependence structures and estimation procedures such that the estimate of FDP is consistent. Such dependence structures include sparse covariance matrices and strong dependence matrices, which encompass most practical situations. The finite sample performance of our procedure is critically evaluated by various simulation studies. Our approach is further illustrated by some real data in breast cancer research.

Sparse CCA: Adaptive Estimation and Computational Barriers
Chao Gao1, *Zongming Ma2 and Harrison Zhou1

1Yale University
2University of Pennsylvania
[email protected]

Canonical correlation analysis (CCA) is a classical and important multivariate technique for exploring the relationship between two sets of variables. It has applications in many fields, including genomics and imaging, to extract meaningful features as well as to use the features for subsequent analysis. This paper considers adaptive and computationally tractable estimation of leading sparse canonical directions when the ambient dimensions are high. Three intrinsically related problems are studied to fully address the topic. First, we establish the minimax rates of the problem under prediction loss. Separate minimax rates are obtained for canonical directions of each set of random variables under mild conditions. There is no structural assumption needed on the marginal covariance matrices as long as they are well conditioned. Second, we propose a computationally feasible two-stage estimation procedure, which consists of a convex programming based initialization stage and a group-Lasso based refinement stage, to attain the minimax rates under an additional sample size condition. Finally, we provide evidence that the additional sample size condition is essentially necessary for any randomized polynomial-time estimator to be consistent, assuming hardness of the Planted Clique detection problem. The computational lower bound is faithful to the Gaussian models used in the paper, which is achieved by a novel construction of the reduction scheme and an asymptotic equivalence theory for Gaussian discretization that is necessary for computational complexity to be well-defined. As a byproduct, we also obtain a computational lower bound for the sparse PCA problem under the Gaussian spiked covariance model. This bridges a gap in the sparse PCA literature.
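As background for the sparse, high-dimensional setting above, a minimal numpy sketch of classical (non-sparse) CCA: the canonical correlations are the singular values of the whitened cross-covariance. This is not the two-stage estimator of the abstract, and all variable names and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 2000
z = rng.standard_normal(n)                              # shared latent signal
X = np.c_[z, rng.standard_normal((n, 2))] + 0.1 * rng.standard_normal((n, 3))
Y = np.c_[-z, rng.standard_normal((n, 2))] + 0.1 * rng.standard_normal((n, 3))

# Classical CCA: canonical correlations are the singular values of the
# whitened cross-covariance Sxx^{-1/2} Sxy Syy^{-1/2}.
Xc, Yc = X - X.mean(0), Y - Y.mean(0)
Sxx, Syy, Sxy = Xc.T @ Xc / n, Yc.T @ Yc / n, Xc.T @ Yc / n
Wx = np.linalg.inv(np.linalg.cholesky(Sxx))   # one valid whitening choice
Wy = np.linalg.inv(np.linalg.cholesky(Syy))
_, rho, _ = np.linalg.svd(Wx @ Sxy @ Wy.T)
print(np.round(rho, 2))   # leading correlation near 1, the rest near 0
```

Only the first pair of directions carries the shared signal, so a single large canonical correlation emerges; the sparse setting of the paper arises when the two blocks are high-dimensional and these directions have few nonzero loadings.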

A Class of Accelerated MM Algorithms for Scalable Optimization
Yiyuan She
Florida State University
[email protected]

Due to the explosion of large-scale datasets in statistical applications, people often favor first-order optimization methods to obtain an estimator in complex learning tasks. We study a class of estimators following an MM algorithm design by use of Bregman divergences and shrinkage functions. Perhaps interestingly, in this high dimensional nonconvex setting, a series of acceleration techniques are still effective and result in fast convergence rates. Non-asymptotic results on the statistical accuracy and computational accuracy are presented.
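The abstract's specific Bregman-divergence design is not spelled out here, so the following is only a generic illustration of the MM template it builds on: logistic regression fit with the Bohning-Lindsay quadratic majorizer, where every iteration solves the same fixed linear system and the objective decreases monotonically. All names and the simulated data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1000
X = np.c_[np.ones(n), rng.standard_normal((n, 2))]
beta_true = np.array([-0.5, 1.0, 2.0])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ beta_true)))

# MM for logistic regression: the Bohning-Lindsay bound majorizes the
# negative log-likelihood with the fixed curvature X'X/4, so each MM step
# is one linear solve with a precomputable matrix (no step-size tuning).
H = X.T @ X / 4
beta = np.zeros(3)
for _ in range(200):
    p = 1 / (1 + np.exp(-X @ beta))
    beta = beta + np.linalg.solve(H, X.T @ (y - p))
print(np.round(beta, 2))   # close to beta_true up to sampling noise
```

Because the curvature matrix is fixed, it can be factored once, which is the kind of per-iteration saving that makes MM designs attractive at scale.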

Innovated Interaction Screening for High-Dimensional Nonlinear Classification
Yingying Fan, *Yinfei Kong, Daoji Li and Zemin Zheng
University of Southern California
[email protected]

This paper is concerned with the problems of interaction screening and nonlinear classification in the high-dimensional setting. We propose a two-step procedure, IIS-SQDA, where in the first step an


innovated interaction screening (IIS) approach based on transforming the original p-dimensional feature vector is proposed, and in the second step a sparse quadratic discriminant analysis (SQDA) is proposed for further selecting important interactions and main effects and simultaneously conducting classification. Our IIS approach screens important interactions by examining only p features instead of all two-way interactions of order O(p^2). Our theory shows that the proposed method enjoys the sure screening property in interaction selection in the high-dimensional setting of p growing exponentially with the sample size. In the selection and classification step, we establish a sparse inequality on the estimated coefficient vector for QDA and prove that the classification error of our procedure can be upper-bounded by the oracle classification error plus some smaller order term. Extensive simulation studies and real data analysis show that our proposal compares favorably with existing methods in interaction selection and high-dimensional classification.

Session 8: New Statistical Advances in Genomics and Health Science Applications

Linking Lung Airway Structure to Pulmonary Function via Hierarchical Feature Selection
*Kun Chen1, Eric Hoffman2, Indu Seetharaman3, Feiran Jiao2, Ching-Long Lin2 and Kung-Sik Chan2

1University of Connecticut
2University of Iowa
3Kansas State University
[email protected]
The human lung airway is a complex inverted tree-like structure resulting from repeated segmental bifurcations/trifurcations for up to 28 segmental generations starting from the trachea. The airway measurements extracted from CT lung images for each lung comprise a large number of features for each airway segment, e.g., segmental wall thickness, airway diameter, lumen area, parent-child branch angles, etc. The wealth of lung airway data provides a unique opportunity for advancing our understanding of the fundamental structure-function relationships within the lung. An important problem is to construct and identify significant lung airway features in normal subjects and connect these to standardized pulmonary function test data such as the percent predicted forced expiratory volume in one second (FEV1%). Because of many unique features of the high-dimensional lung airway measurements, customized dimension reduction approaches for constructing and selecting interpretable airway features are required. In particular, the problem is complicated by the fact that a particular airway feature may be an important predictor only when it pertains to segments of certain generations. Thus, the key is an efficient, consistent method for simultaneously conducting group selection (lung airway feature types) and within-group variable selection (airway generations), i.e., bi-level selection. Here we streamline a comprehensive procedure for processing the lung airway data via imputation, normalization, transformation and groupwise principal component analysis, and then adopt a new composite penalized regression approach for conducting bi-level feature selection. As a prototype of the composite penalization approaches, the proposed composite bridge regression method is shown to admit an efficient algorithm, enjoy bi-level oracle properties, and outperform several existing methods when group misspecification exists. We analyze the CT lung image data from a cohort of 132 subjects with normal lung function. Our results show that lung function in terms of FEV1% is promoted by having a less dense and more homogeneous lung comprising an airway whose segments enjoy more heterogeneity in wall thicknesses, larger mean diameters, lumen areas and branch angles. These data hold the potential of defining more accurately the normal subject population with borderline atypical lung functions that are clearly influenced by many genetic and environmental factors.

Imputing Transcriptome of Inaccessible Tissues In and Beyond the GTEx Project
Jiebiao Wang1, Eric Gamazon2, Barbara Stranger1, Hae Kyung Im1, Nancy Cox2, Dan L. Nicolae1 and *Lin Chen1

1University of Chicago
2Vanderbilt University
[email protected]

Gene expression and its regulation largely depend on cell context. However, due to tissue accessibility, large-scale gene expression studies often measure expression profiles in the peripheral whole blood or its derivatives. In order to synthesize new knowledge about the organization of gene expression across human tissues, the Genotype-Tissue Expression (GTEx) project collected transcriptome data in a wide variety of human tissues from a large number of post-mortem donors. By analyzing data from nine selected model tissues in the GTEx pilot project, we show that inaccessible/uncollected transcriptome data can be imputed by harnessing rich information in the GTEx, and that it is feasible to use GTEx data as a reference and impute inaccessible/uncollected tissues in future studies. We propose a multi-tissue imputation approach, 'Robust Imputation of Multi-tissue Expression incorporating eQTLs' (RIMEE), evaluate its performance and compare it to existing imputation methods, and suggest cost-effective strategies for future multi-tissue expression studies.

Efficient Variance Component Estimation with the Haseman-Elston Approximate Regression
Xiang Zhou
University of Michigan
[email protected]

Linear mixed models (LMMs) have attracted considerable recent interest, and have been widely applied in genetic studies to dissect the genetic architecture of many common diseases and complex traits. However, maximum likelihood estimation of the variance components in LMMs is computationally expensive and requires a step that scales cubically with the sample size, hence is not applicable to large samples. Here, we present a method-of-moments approach to address this computational bottleneck. Our method not only is computationally fast and scales quadratically with the sample size, but also requires only summary statistics, allowing it to be applied to large consortium studies where raw genotype and phenotype data are often unavailable. With realistic simulations, we show that our method produces unbiased estimates with only a small loss of efficiency compared with maximum likelihood estimates. With two real data applications, we show that our method is useful in partitioning phenotypic variance into different chromosomes or functional genomic annotations. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
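The method-of-moments idea can be illustrated with a classical Haseman-Elston cross-product regression on simulated data: for unrelated off-diagonal pairs, E[y_i y_j] = sigma_g^2 K_ij, so a least-squares slope recovers the genetic variance component without any cubic-cost likelihood step. A hedged sketch; the abstract's actual estimator and its summary-statistic version are not reproduced here, and the simulation settings are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 1000

# Standardized genotypes and the genetic relatedness matrix K = ZZ'/p
Z = rng.standard_normal((n, p))
K = Z @ Z.T / p

# Phenotype with genetic variance 0.6 and residual variance 0.4
L = np.linalg.cholesky(K + 1e-8 * np.eye(n))
y = np.sqrt(0.6) * (L @ rng.standard_normal(n)) + np.sqrt(0.4) * rng.standard_normal(n)
y -= y.mean()

# Haseman-Elston regression: regress off-diagonal phenotype cross-products
# y_i * y_j on relatedness entries K_ij; the no-intercept least-squares
# slope estimates sigma_g^2 with only O(n^2) work.
iu = np.triu_indices(n, k=1)
x, cp = K[iu], np.outer(y, y)[iu]
sigma_g2 = np.sum(x * cp) / np.sum(x * x)
print(round(sigma_g2, 2))   # near the simulated genetic variance 0.6
```

The slope is a simple moment estimator, which is why only pairwise summary quantities, not raw genotypes, are needed.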

Improved Ancestry Estimation for both Genotyping and Sequencing Data using Projection Procrustes Analysis and Genotype Imputation
*Chaolong Wang1, Xiaowei Zhan2, Liming Liang3, Goncalo Abecasis4 and Xihong Lin3

1Genome Institute of Singapore
2The University of Texas Southwestern Medical Center


3Harvard University
4University of Michigan
[email protected]

Accurate estimation of individual ancestry is important in genetic association studies, especially when a large number of samples are collected from multiple sources. However, existing approaches developed for genome-wide SNP data do not work well with modest amounts of genetic data, such as in targeted sequencing or exome chip genotyping experiments. We propose a statistical framework to estimate individual ancestry in a principal component ancestry map generated by a reference set of individuals. This framework extends and improves upon our previous method for estimating ancestry using low-coverage sequence reads (LASER 1.0) to analyze either genotyping or sequencing data. In particular, we introduce a projection Procrustes analysis approach that uses high-dimensional principal components to estimate ancestry in a low-dimensional reference space. Using extensive simulations and empirical data examples, we show that our new method (LASER 2.0), combined with genotype imputation on the reference individuals, can substantially outperform LASER 1.0 in estimating fine-scale genetic ancestry. Specifically, LASER 2.0 can accurately estimate fine-scale ancestry within Europe using either exome chip genotypes or targeted sequencing data with off-target coverage as low as 0.05X. Under the framework of LASER 2.0, we can estimate individual ancestry in a shared reference space for samples assayed at different loci or by different techniques. Therefore, our ancestry estimation procedure will accelerate discovery in disease association studies not only by helping model ancestry within individual studies but also by facilitating combined analysis of genetic data from multiple sources.
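The core alignment step can be illustrated with ordinary Procrustes analysis: find the translation, rotation/reflection and scaling that best map one set of PC coordinates onto a reference set. A minimal sketch under a noiseless toy setup; the paper's projection variant for mismatched dimensions is not implemented, and all names are illustrative.

```python
import numpy as np

def procrustes_map(X, Y):
    """Translation + rotation/reflection + scaling that best maps X onto Y
    in the least-squares sense (ordinary Procrustes analysis)."""
    mx, my = X.mean(0), Y.mean(0)
    Xc, Yc = X - mx, Y - my
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)
    R = U @ Vt                              # optimal orthogonal transform
    scale = s.sum() / (Xc ** 2).sum()       # optimal scaling factor
    return lambda P: scale * (P - mx) @ R + my

rng = np.random.default_rng(1)
ref = rng.standard_normal((50, 2))                   # reference PC coordinates
Q, _ = np.linalg.qr(rng.standard_normal((2, 2)))     # a random orthogonal map
sample = 0.5 * (ref @ Q) + 3.0                       # rotated/scaled/shifted copy
align = procrustes_map(sample, ref)
err = np.abs(align(sample) - ref).max()
print(err < 1e-6)   # noiseless case: reference coordinates are recovered
```

In the ancestry setting, the fitted map lets new samples, assayed at different loci, be placed into the shared reference PC space.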

Session 9: SII Special Invited Session on Modern Bayesian Statistics I

Binary State Space Mixed Models with Flexible Link Functions: a Case Study on Deep Brain Stimulation on Attention Reaction Time
Carlos Abanto-Valle1, Dipak Dey2 and *Xun Jiang3

1Universidade Federal do Rio de Janeiro
2University of Connecticut
3Amgen Inc.
[email protected]

State space models (SSM) for binary time series data using flexible skewed link functions are introduced in this paper. Commonly used logit, cloglog and loglog links are prone to link misspecification because of their fixed skewness. Here we introduce two flexible links as alternatives: the generalized extreme value (GEV) and the symmetric power logit (SPLOGIT) links. Markov chain Monte Carlo (MCMC) methods for Bayesian analysis of SSM with these links are implemented using the JAGS package, a freely available software. Model comparison relies on the deviance information criterion (DIC). The flexibility of the proposed model is illustrated by measuring effects of deep brain stimulation (DBS) on attention of a macaque monkey performing a reaction-time task. Empirical results showed that the flexible links fit better than the usual logit and cloglog links.
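To see why a shape parameter gives a flexible, skewed alternative to the cloglog link, here is one common parameterization of an inverse GEV-type link; conventions differ across papers, so treat the exact formula below as an assumption rather than the paper's definition.

```python
import numpy as np

def gev_link_inv(eta, xi):
    """P(y=1 | eta) = 1 - exp(-(1 + xi*eta)_+^(1/xi)).
    As xi -> 0 this reduces to the complementary log-log link;
    xi shifts the skewness of the response curve."""
    eta = np.asarray(eta, dtype=float)
    if abs(xi) < 1e-10:
        return 1.0 - np.exp(-np.exp(eta))
    t = np.maximum(1.0 + xi * eta, 0.0)
    return 1.0 - np.exp(-t ** (1.0 / xi))

eta = np.linspace(-1.5, 1.5, 7)
for xi in (-0.5, 0.0, 0.5):        # skewed one way, cloglog, skewed the other
    print(xi, np.round(gev_link_inv(eta, xi), 3))
```

Because xi is estimated rather than fixed, the link's skewness adapts to the data, which is the misspecification-robustness argument of the abstract.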

Bayesian Semi-parametric Joint Modeling of Biomarker Data with a Latent Changepoint: Assessing the Temporal Performance of Enzyme-Linked Immunosorbent Assay (ELISA) Testing for Paratuberculosis
*Michelle Norris1, Wesley Johnson2 and Ian Gardner3
1California State University, Sacramento
2University of California, Irvine
3University of Prince Edward Island
[email protected]

We develop a joint model for longitudinal biomarker data with one binary and one continuous variable for the purpose of quantifying the diagnostic capabilities of the data. We treat the no-gold-standard case where the actual timing of infection is unknown and must be estimated from the data. We incorporate random effects to allow for subject-specific, post-infection trajectories and model the random effects distribution using a nonparametric Dirichlet process mixture to allow additional flexibility. The model is applied to the problem of diagnosing Johne's Disease in cattle. Applying the model to these data showed that there are two clusters of cows in the data, one having a slower serology reaction to infection and the other having a more rapid reaction.

Inference Functions in High-Dimensional Bayesian Inference
Juhee Lee1 and *Steven MacEachern2

1University of California, Santa Cruz
2The Ohio State University
[email protected]

Nonparametric Bayesian models, such as those based on the Dirichlet process or its many variants, provide a flexible class of models that allow us to fit widely varying patterns in data. Typical uses of the models include relatively low dimensional driving terms to capture global features of the data along with a nonparametric structure to capture local features. The models are particularly good at handling outliers, a common form of local behavior, and examination of the posterior often shows that a portion of the model is chasing the outliers. This suggests the need for robust inference to discount the impact of the outliers on the overall analysis. We advocate the use of inference functions to define relevant parameters that are robust to the deficiencies in the model and illustrate their use in two examples.

Quantile Regression for Censored Mixed-Effects Models with Applications to HIV Studies
*Victor Hugo Lachos Davila1, Ming-Hui Chen2, Carlos A. Abanto-Valle3 and Caio L. Azevedo1

1University of Campinas
2University of Connecticut
3Universidade Federal do Rio de Janeiro
[email protected]

HIV RNA viral load measures are often subject to upper and lower detection limits depending on the quantification assays; hence, the responses are either left or right censored. Linear/nonlinear mixed-effects models, with slight modifications to accommodate censoring, are routinely used to analyze this type of data. Usually, the inference procedures are based on normality (or elliptical distribution) assumptions for the random terms. However, those analyses might not provide robust inference when the distribution assumptions are questionable. In this paper, we discuss a fully Bayesian quantile regression inference using Markov chain Monte Carlo (MCMC) methods for longitudinal data models with random effects and censored responses. Compared to the conventional mean regression approach, quantile regression can characterize the entire conditional distribution of the outcome variable, and is more robust to outliers and misspecification of the error distribution. Under the assumption that the error term follows an asymmetric Laplace distribution, we develop a hierarchical Bayesian model and


obtain the posterior distribution of unknown parameters at the p-th quantile level, with median regression (p=0.5) as a special case. The proposed procedures are illustrated with two HIV/AIDS studies on viral loads that were initially analyzed using the typical normal (censored) mean regression mixed-effects models, as well as a simulation study.
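The equivalence the model rests on, that minimizing the quantile check loss is the same as maximizing an asymmetric Laplace likelihood, can be checked numerically: the check-loss minimizer of a sample is its p-th quantile. A small sketch with illustrative names and data.

```python
import numpy as np

def check_loss(u, p):
    """Quantile check loss rho_p(u); its sum is (up to constants) the
    negative log-likelihood under asymmetric Laplace errors."""
    return u * (p - (u < 0))

rng = np.random.default_rng(2)
y = rng.exponential(scale=2.0, size=10_000)   # skewed outcomes

p = 0.5
grid = np.linspace(y.min(), y.max(), 2000)
loss = [check_loss(y - m, p).sum() for m in grid]
m_hat = grid[int(np.argmin(loss))]

# The check-loss minimizer is the p-th sample quantile (the median here)
print(abs(m_hat - np.quantile(y, p)) < 0.05)
```

Changing p re-weights positive and negative residuals asymmetrically, which is how a single likelihood family yields the whole family of conditional quantiles.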

Session 10: SII Special Invited Session on Modern Bayesian Statistics II

A Bayes Testing Approach to Metagenomic Profiling in Bacteria
Bertrand Clarke1, Camilo Valdes2, Adrian Dobra3 and *Jennifer Clarke1
1University of Nebraska-Lincoln
2University of Miami
3University of Washington
[email protected]

Using next generation sequencing (NGS) data, we use a multinomial with a Dirichlet prior to detect the presence of bacteria in a metagenomic sample via marginal Bayes testing for each bacterial strain. The NGS reads per strain are counted fractionally, with each read contributing an equal amount to each strain it might represent. The threshold for detection is strain-dependent, and we apply a correction for the dependence amongst the NGS reads by finding the knee in a curve representing a tradeoff between detecting too many strains and not enough strains. As a check, we evaluate the joint posterior probabilities for the presence of two strains of bacteria and find relatively little dependence. We apply our techniques to two data sets and compare our results with the results found by the Human Microbiome Project. We conclude with a discussion of the issues surrounding multiple corrections in a Bayes context.
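The conjugate core of the approach can be sketched directly: with a multinomial likelihood and Dirichlet prior, the posterior over strain proportions is Dirichlet(alpha + counts), and marginal detection probabilities follow from posterior draws. The counts, prior, and fixed threshold below are illustrative; the strain-dependent thresholds and knee-finding correction of the abstract are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
counts = np.array([5000.0, 120.0, 3.0, 0.0, 0.0])  # fractional reads per strain
alpha = np.full(counts.size, 0.5)                  # symmetric Dirichlet prior

# Conjugacy: the posterior over strain proportions is Dirichlet(alpha + counts)
post = rng.dirichlet(alpha + counts, size=20_000)

# Marginal "detection": posterior probability that a strain's abundance
# exceeds a small threshold
threshold = 1e-3
detect_prob = (post > threshold).mean(axis=0)
print(np.round(detect_prob, 2))
```

Well-covered strains get detection probability near 1, never-observed strains near 0, and low-count strains land in between, which is where the thresholding and multiplicity corrections discussed in the abstract matter.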

Nonparametric Bayesian Functional Clustering for Time-Course Microarray Data
Ziwen Wei1 and *Lynn Kuo2

1Merck & Co.
2University of Connecticut
[email protected]

Time-course microarray experiments track gene expression levels across several time points. They provide valuable insights into genome-wide dynamic aspects of gene regulation. We focus on gene clustering analysis in this paper. We explore a nonparametric Bayesian method for constructing clusters in a functional space from the characteristics of gene profiles. In particular, we model each gene profile using a B-spline basis, so each gene is characterized by the basis coefficients of the spline fitting. Then we place a Dirichlet process prior on the basis coefficients to determine clusters of the genes. We essentially construct a hierarchical Dirichlet process mixing model that assigns genes to the same cluster if they share the same latent basis coefficients. A simulation study is conducted to compare the proposed method to the K-means clustering method, a model-based clustering method (MCLUST), and two-stage versions of them in terms of the adjusted Rand index. We show our new method has the best adjusted Rand index among all these methods. We apply this nonparametric Bayesian clustering method to a real data set with 6 time points to gain further insights into how genes with similar profiles are clustered together, and we find their functional annotation in Gene Ontology groups using GOstats.
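The characterize-then-cluster idea can be sketched with stand-ins: a low-order polynomial basis in place of the B-splines and 2-means in place of the Dirichlet-process prior. Each gene profile is reduced to its fitted basis coefficients, which are then clustered; all data and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 1, 6)                 # six time points
B = np.vander(t, 3, increasing=True)     # quadratic basis (stand-in for B-splines)

# Two true profile shapes, 40 noisy genes each
shapes = [np.array([0.0, 3.0, -3.0]), np.array([2.0, -3.0, 3.0])]
genes = np.vstack([B @ c + 0.1 * rng.standard_normal(t.size)
                   for c in shapes for _ in range(40)])

# Characterize each gene by its fitted basis coefficients...
coef, *_ = np.linalg.lstsq(B, genes.T, rcond=None)
coef = coef.T

# ...then cluster the coefficient vectors (2-means here, standing in for
# the Dirichlet-process clustering described in the abstract)
centers = coef[[0, 40]]
for _ in range(20):
    d = ((coef[:, None, :] - centers[None]) ** 2).sum(-1)
    lab = d.argmin(1)
    centers = np.array([coef[lab == k].mean(0) for k in (0, 1)])
print((lab[:40] == lab[0]).all() and (lab[40:] == lab[40]).all())
```

Working in coefficient space keeps the clustering dimension fixed regardless of how many time points are observed, which is the main appeal of the basis representation.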

A Bayesian Approach to Identify Genes and Gene-level SNP Aggregates in a Genetic Analysis of Cancer Data
Francesco Stingo1, *Michael Swartz2 and Marina Vannucci3
1M. D. Anderson Cancer Center
2The University of Texas School of Public Health
3Rice University
[email protected]

Complex diseases, such as cancer, birth defects, and cardiovascular disease, arise from a complex interplay of multiple genetic factors. Univariate analyses to identify genes cannot model this complex interplay, and small contributors to risk could be missed. With complex diseases, multiple genetic markers, known as single nucleotide polymorphisms (SNPs), work in concert rather than in isolation to affect risk. As a result, researchers investigating complex diseases turn to multivariate methods that can analyze groups of SNPs. When considering multiple SNPs simultaneously, we can also capitalize on additional biological information, such as biological grouping and genetic correlation. In this talk we will discuss a novel Bayesian modeling approach to identify SNPs associated with disease. We combine pathway based approaches to analyze multiple SNPs in a region of interest. Our model uses a gene level score based on SNP allele frequencies and uses the linear modeling framework to model association between the SNP level scores and disease risk. Our gene scores use weights based on genotype frequencies so that rarer genotypes have more weight in the score. We also employ Markov random field priors that account for genetic correlation structures. This method was motivated by a lung cancer data set, and a basic introduction to relevant genetic concepts will be included in the talk.

Adjusting Nonresponse Bias in Small Area Estimation without Covariates via a Bayesian Spatial Model
*Xiaoming Gao1, Chong He2 and Dongchu Sun2

1Missouri Department of Conservation
2University of Missouri
[email protected]

Sometimes a survey sample is drawn from a large area even if the estimate of interest is at a smaller subdomain level. This strategy, however necessary, may cause small sample problems. The estimation problem is further complicated by survey nonresponse. We build a Bayesian hierarchical spatial model that takes into account both small sample size and nonresponse. This Bayesian model gives the estimates of marginal satisfaction rates at subdomains, even when there is no covariate available, via modeling the phase-specific response rates and conditional satisfaction rates given response status at subdomains. This method is illustrated using data from the 2001 Missouri Deer Hunter Attitude Survey. Satisfaction, in this survey, refers to whether respondents were satisfied with the Missouri Department of Conservation's deer management program. The estimated satisfaction rates are lower after adjusting for nonresponse bias compared to the satisfaction rates based only on responses.

Session 11: Emerging Issues in Time-to-Event Data

Bayesian Path Specific Frailty Models for Multi-state Survival Data with Applications
Mario De Castro1, *Ming-Hui Chen2 and Yuanye Zhang3

1Universidade de Sao Paulo
2University of Connecticut
3Novartis Pharmaceuticals Corporation
[email protected]


Multi-state models can be viewed as generalizations of both the standard and competing risks models for survival data. Models for multi-state data have been the theme of many recently published works. Motivated by bone marrow transplant data, we propose a Bayesian model using the gap times between two successive events in a path of events experienced by a subject. Path specific frailties are introduced to capture the dependence structure of the gap times in the paths with two or more states. Under improper prior distributions for the parameters, we establish propriety of the posterior distribution. An efficient Gibbs sampling algorithm is developed for drawing samples from the posterior distribution. An extensive simulation study is carried out to examine the empirical performance of the proposed approach. A bone marrow transplant data set is analyzed in detail to further demonstrate the proposed methodology.

Quantile Association for Bivariate Survival Data
*Ruosha Li1, Yu Cheng2, Qingxia Chen3 and Jason Fine4
1The University of Texas School of Public Health
2University of Pittsburgh
3Vanderbilt University
4The University of North Carolina at Chapel Hill
[email protected]
Bivariate survival data arise frequently in familial association studies of chronic disease onset, as well as in clinical trials and observational studies with multiple time-to-event endpoints. The association between two event times is often scientifically important. In this paper, we examine the association via a novel quantile association measure, which describes the dynamic association as a function of the quantile levels. The quantile association measure is free of marginal distributions, allowing direct evaluation of the underlying association pattern at different locations of the event times. We propose a nonparametric estimator for quantile association, as well as a semiparametric estimator that is superior in smoothness and efficiency. The proposed methods possess desirable asymptotic properties, including uniform consistency and root-n convergence. They demonstrate satisfactory numerical performance under a range of dependence structures. An application of our methods suggests interesting association patterns between time to myocardial infarction and time to stroke in an atherosclerosis study.

Robust Estimation for Clustered Failure Time Data: Application to Huntington's Disease
*Tanya Garcia1, Yanyuan Ma2, Yuanjia Wang3 and Karen Marder3
1Texas A&M University
2University of South Carolina
3Columbia University
[email protected]
An important goal in clinical and statistical research is estimating the distribution for clustered failure times, which have a natural intra-class dependency and are subject to censoring. We propose to handle these inherent challenges with a novel approach that does not impose restrictive modeling or distributional assumptions. Rather, using a logit transformation, we relate the distribution for clustered failure times to covariates and a random, subject-specific effect such that the covariates are modeled with unknown functional forms, and the random effect is distribution-free and potentially correlated with the covariates. Over a range of time points, the model is shown to be reminiscent of an additive logistic mixed effect model. Such a structure allows us to handle censoring via pseudo-value regression and to develop semiparametric techniques that completely factor out the unknown random effect. We show both theoretically and empirically that the resulting estimator is consistent for any choice of random effect distribution and for any dependency structure between the random effect and covariates. Lastly, we illustrate the method's utility in an application to the Cooperative Huntington's Observational Research Trial data, where our method provides new insights into differences between motor and cognitive impairment event times in subjects at risk for Huntington's disease.

Session 12: Taiwan National Health Database

Effective Analysis of Primary Preventive Anti-HBV Medicine to Prevent Hepatitis Reactivation in Cancer Patients Undergoing Chemotherapy Using National Health Insurance Database and Cancer Registry Data
Ruey-Kuen Hsieh and *Wen-Kuei Chien
Taipei Medical University
[email protected]

Introduction: Use of anti-HBV medications to prevent hepatitis reactivation for cancer patients with HBV carrier state during chemotherapy has been reimbursed in Taiwan since 2009 for all cancer types. This universal coverage policy was not embraced by the Cancer Society but was suggested by the Hepatology Society. This analysis, using the national insurance database and cancer registry database, will try to delineate the effectiveness of this policy in patients with common solid tumors (breast, lung and colorectal).
Methods: Data from the National Health Insurance database (2006-2012), cancer registry data (2006-2012) and cause of death data (2006-2012) were used. We followed patients, aged 10 and above, who have common solid tumors (breast, lung and colorectal) and underwent chemotherapy, to see whether primary preventive anti-HBV medicine can reduce hepatitis reactivation and hepatitis-related death.
Analysis: Chi-square tests were applied to compare the hepatitis reactivation rate and hepatitis-related death rate between people who started chemotherapy before and after the policy change. We further assess the association between use of anti-HBV medication and hepatitis reactivation/hepatitis-related death to see whether use of anti-HBV medication as prophylaxis is effective.
Results and conclusion: Before the reimbursement, there were n=1,147 cases of hepatitis reactivation among n=49,368 patients with lung, breast and colon cancer who had chemotherapy. After the universal reimbursement, there were n=707 cases of hepatitis reactivation among n=10,503 patients with these three cancers who had chemotherapy. Before the reimbursement, there were n=13,719 cancer-related deaths and n=37 hepatitis-related deaths due to reactivated hepatitis among n=14,499 patients with lung, breast and colon cancer who had chemotherapy. After the universal reimbursement, there were n=5,406 cancer-related deaths and n=11 hepatitis-related deaths due to hepatitis reactivation among n=5,702 patients with these three cancers who had chemotherapy. There was no improvement in hepatitis-related death after reimbursement. 1. Use of anti-HBV medication as prophylaxis is effective in reducing hepatitis reactivation, reducing the hepatitis reactivation rate from 2.32 to 1.74. 2. There is no significant improvement in reducing hepatitis-related death in this group of patients, with the number of deaths decreasing from 37 (0.26) to 11 (0.19). 3. There is no effect on cancer-related survival, which increased from 94.62% to 94.81%. 4. Cost effectiveness should be considered before implementation of a universal coverage policy.

A Nationwide Cohort Study of Influenza Vaccine on Stroke Prevention in the Chronic Kidney Disease Population
*Chang-I Chen, Wen-Kuei Chien, Yen-Kuang Lin, Ruey-Kuen Hsieh, Chao-Feng Lin, Ju-Chi Liu and Hao-Weng Deng
Taipei Medical University
[email protected]
Background: Patients with chronic kidney disease (CKD) are known to be at risk of vascular events. Whether vaccinating CKD patients against influenza reduces the risk of stroke has not been fully revealed.
Method: We conducted a retrospective population-based study within a nationwide longitudinal database in Taiwan from 1999 to 2008. CKD patients were selected according to diagnosis claims and were divided into two subgroups: with and without vaccination. Furthermore, analysis was also conducted for CKD patients with 1, 2, or 3 cumulative numbers of vaccination. The primary outcome was hospitalization for stroke. Cox proportional hazards regression was used to measure the hazard ratios (HR) of stroke.
Results: 2,247 CKD patients with at least one vaccination and 2,159 patients with no vaccination during the study period were followed. The adjusted HR of influenza vaccination was 0.37.
Conclusion: Influenza vaccination was suggested to protect CKD patients more than 55 years old from the risk of stroke. A cumulative effect of vaccination was also found in the present study.

Economic Movement and Mental Health: A Population-based Study
*Yen-Kuang Lin1 and Chen-Yin Lee2
1Taipei Medical University
2Mingdao University
[email protected]
This study examines the potential relationship between various economic indices and the incidence of mental disorders using the National Health Insurance Research Database (NHIRD) during 2000-2008. As of 2007, approximately 98.4% of Taiwanese were enrolled in the NHIRD. 3285 daily observations from the Taiwan Stock Exchange Capitalization Weighted Stock Index, the NHIRD, and the Executive Yuan were retrieved. We examined the association among the stock index, the housing index and the incidence of mental disorders. In general, we found that a higher incidence of mental health disorders was associated with an increasing housing index and a lower stock index. We also stratified the study sample by sex, age and urbanization level. Both genders follow a similar pattern. During 2000 to 2008, although rising economic indices may bring fortune, the sacrifice of mental health could be the cost we have to pay.

Session 13: Recent Advances in Longitudinal Data Analyses

Integrative and Adaptive Weighted Group Lasso and Generalized Local Quadratic Approximation
Qing Pan1 and *Yunpeng Zhao2
1The George Washington University
2George Mason University
[email protected]
Longitudinal clinical outcomes are often collected in genomic studies, where selection methods accounting for dynamic genetic effects are desirable. We model the biomarker effects by smoothing splines, and select the coefficients by group lasso with novel weight functions based on the extremum of the biomarker effects over time. In addition to the common practice of treating weights as adaptive functions whose values depend on some first-stage estimates, we further propose an integrative group lasso method that treats the loss, penalty and weight functions as an integrative whole, where parameters in all three are jointly estimated in one step. While the adaptive group lasso can be solved by standard local quadratic approximation, guidelines for more general local quadratic approximations are developed to optimize the integrative group lasso. Consistency and sparsistency are proved for both adaptive and integrative methods, while the integrative version is more general in requiring fewer assumptions. Both procedures show superior specificity and bias over unweighted group lasso in simulation studies. The methods are illustrated with the GWAS from the Epidemiology and Intervention of Diabetes Complication trial. To accommodate more candidate markers, 23 chromosomes are analyzed separately, and the estimates are pooled in selecting common tuning parameters.

A Dynamic Risk Prediction Model for Data with Competing Risks
*Chung-Chou Chang1 and Qing Liu2
1University of Pittsburgh
2Novartis Pharmaceuticals Corporation
[email protected]

Risk prediction modeling has been widely used for assessing the effects of changes in risk factors on the absolute risk of disease incidence or disease progression, weighing the risks and benefits of an intervention, and designing future prevention trials, yet most such models to date are applicable only to information observed at or before study baseline. Recent dynamic risk prediction models incorporate information after study baseline into modeling, making them more capable of handling the effects of disease progression. To further advance this modeling approach, we propose a risk prediction model which not only incorporates longitudinally updated information but also accounts for the effect of competing risks. With predictors of current patient characteristics, history of the disease profile, and future longitudinally collected information, our model can be applied to the planning of personalized medicine. Our proposed model has several advantages over currently used risk prediction models in addressing competing risks. First, it is robust against violations of the proportional subdistribution hazards assumption. Second, the model enables users to make predictions at a set of landmark points (i.e., prediction baseline time points) in one step. Third, the proposed model can incorporate various types of time-varying information. Finally, our model is not computationally intensive and can be easily implemented with existing statistical software. The performance of our model was assessed via simulations. We also demonstrated the use of our model with a data set from a multicenter clinical trial for breast cancer patients.

Simultaneous Inference of a Misclassified Outcome and Competing Risks Failure Time Data
*Sheng Luo1, Xiao Su1, Min Yi2 and Kelly Hunt2
1The University of Texas at Houston
2M. D. Anderson Cancer Center
[email protected]

Ipsilateral breast tumor relapse (IBTR) often occurs in breast cancer patients after their breast conservation therapy. The IBTR status classification (true local recurrence versus new ipsilateral primary tumor) is subject to error, and there is no widely accepted gold standard. Time to IBTR is likely informative for IBTR classification because a new primary tumor tends to have a longer mean time to IBTR and is associated with improved survival as compared with a true local recurrence tumor. Moreover, some patients may die from breast cancer or other causes in a competing risk scenario during


the follow-up period. Because the time to death can be correlated to the unobserved true IBTR status and time to IBTR (if relapse occurs), this terminal mechanism is non-ignorable. In this paper, we propose a unified framework that addresses these issues simultaneously by modeling the misclassified binary outcome without a gold standard and the correlated time to IBTR, subject to dependent competing terminal events. We evaluate the proposed framework by a simulation study and apply it to a real data set consisting of 4477 breast cancer patients. The adaptive Gaussian quadrature tools in the SAS procedure NLMIXED can be conveniently used to fit the proposed model. We expect to see broad applications of our model in other studies with a similar data structure.

Copula-based Quantile Regression for Longitudinal Data
*Huixia Wang1 and Xingdong Feng2

1The George Washington University
2Shanghai University of Finance and Economics
[email protected]
Estimation and prediction in quantile regression for longitudinal data are challenging without parametric distributional assumptions. We propose a new semiparametric approach that uses a copula to account for intra-subject dependence and approximates the marginal distributions of longitudinal measurements, conditionally on covariates, through quantile regression. The proposed method is flexible, and it can provide not only efficient estimation of quantile regression coefficients but also prediction intervals for a new subject given the prior measurements and covariates. The properties of the proposed estimator and prediction are established theoretically, and assessed numerically through a simulation study and the analysis of Pennsylvania nursing home data.

Session 15: Innovative Statistical Approaches in Nonclinical Research

Identifying Predictive Biomarkers in a Dose-Response Study
*Yuefeng Lu, Xiwen Ma and Wei Zheng
Sanofi-aventis U.S.
[email protected]
The problem of retrospectively identifying subgroups using biomarkers has recently been of great interest. So far most research has focused on the two-group study design. For studies with multiple doses, either partial data is used or multiple dose groups are pooled, leading to sub-optimal results. We will present a framework to deal with multi-dose studies, and compare the new method with ad-hoc methods through simulations.

Bayesian Integration of In Vitro Biomarker Data into In Vivo Safety Assessment
*Ming-Dauh Wang and Alan Chiang
Eli Lilly and Company
[email protected]
Adverse effects of pharmaceuticals on electrocardiograms (ECGs) are one of the most critical drug safety concerns facing drug development companies today. It is recognized that a better understanding of the interrelationship between preclinical measures of cardiovascular safety biomarkers, as well as a more integrated approach to risk assessment, could dramatically speed the development of safe and effective medicines for patients in need. Using the Health and Environmental Sciences Institute of the International Life Sciences Institute (ILSI/HESI) dataset, we constructed a Bayesian repeated analysis of covariance model to directly assess the cardiovascular risk in the QT interval. With prior distributions derived from in vitro hERG ionic current concentrations, posteriors for treatment effects were obtained and the likelihood of risk was calculated. Sensitivity of the proposed Bayesian analysis to prior selection and comparison with the classical method were characterized. The results demonstrate that Bayesian integration of in vitro and in vivo biomarkers provides an effective preclinical risk assessment for the QT interval and can reduce unnecessary animal exposure in toxicology studies.

Functional Structural Equation Model for DTI Derived Responses in Twin Study
*Shikai Luo1, Hongtu Zhu2 and Rui Song1

1North Carolina State University
2The University of North Carolina at Chapel Hill
[email protected]

Motivated by analyzing massive functional data from large biomedical studies, we propose a class of functional structural equation models (FSEM) for modeling fixed effect processes that characterize the association between functional responses and covariates of interest, and for modeling random effect processes that capture spatial correlations of functional responses from twin studies. We develop an efficient estimation procedure to estimate the varying coefficient functions and the spatial covariance operators. We also systematically carry out the theoretical analysis of FSEM. First, we establish the weak convergence of the maximum likelihood estimate of the fixed effect functions. Second, we propose a pointwise testing procedure for the existence of a genetic effect at each fixed tract point via a weighted likelihood ratio test, and a global testing procedure for the existence of a global genetic effect along the entire tract. We also propose a resampling procedure for approximating the p-values for both tests. Third, we establish the asymptotic normality of the estimated spatial covariance kernels. We conduct extensive Monte Carlo simulations to examine the finite-sample performance of the estimation and inference procedures. Finally, two real data sets are analyzed to illustrate the application of our theoretical results.

Estimating Contamination Rates from Matched Tumor-normal Exome Sequencing Data
*Hyonho Chun1 and Xiwen Ma2

1Purdue University
2Eli Lilly and Company
[email protected]

The accuracy of next-generation sequencing (NGS) technology is greatly compromised when tumor-normal samples from individuals are contaminated by each other. This contamination from sample mixture is quite common in cancer studies, since differentiating pure tumor cells from normal cells is difficult in the sample preparation step. Hence, the sample contamination needs to be quantified from an NGS tumor-normal dataset before researchers study somatic mutation signals. In order to estimate contamination rates from tumor-normal paired exome sequencing data, we propose to use a mixture model. Our approach requires only a pair of tumor-normal samples, and it can be easily applied to any deep-sequenced recapture dataset as an initial quality check tool.

Session 16: Statistical Advances for Genetic Data Analysis

Incorporating External Information to Improve Case-control Genetic Association Analyses
Hong Zhang1, Nilanjan Chatterjee2 and *Jinbo Chen3

1Fudan University
2National Institutes of Health


3University of Pennsylvania
[email protected]
In the standard logistic regression analyses of case-control genetic association studies, it has been shown in empirical studies that adjusting for informative covariates could lead to decreased power. This result conflicts with that for clinical trials, where adjusting for baseline covariates always leads to increased power. We offer theoretical explanations for this dilemma, and propose a unified solution to this problem. For rare phenotypes, we conclude that the most practical strategy is the standard analysis without adjusting for covariates. For common phenotypes, our proposed solution can lead to meaningful power improvement compared with the analysis ignoring covariates.

Generalized Partial Linear Varying Index Coefficient Model for Gene-Environment Interactions
*Xu Liu, Bin Gao and Yuehua Cui
Michigan State University
[email protected]
Studies have suggested the joint effect of simultaneous exposures to multiple environments on disease risk. However, how environmental mixtures as a whole jointly modify the genetic effect on disease risk is still largely unknown. Given the importance of gene-environment (G×E) interactions for many complex diseases, rigorously assessing the interaction effect between genes and environmental mixtures as a whole could shed novel insight into the etiology of complex diseases. For this purpose, we propose a generalized partial linear varying-index coefficient model (GPLVICM) to capture the genetic effect on disease risk modulated by multiple environments as a whole. GPLVICM is semiparametric in nature, allowing different index loading parameters in different index functions. We estimate the parametric components by a profile procedure, and the nonparametric index functions by a B-spline backfitted kernel method. Under some regularity conditions, the proposed parametric and nonparametric estimators are shown to be consistent and asymptotically normal. We propose a generalized likelihood ratio (GLR) test to rigorously assess the linearity of the joint environmental effect, and apply a parametric likelihood test to detect a linear G×E interaction effect. The finite sample performance of the proposed method is examined through simulation studies and is further illustrated through a real data analysis.

Set-valued System Identification Approach to Identifying Genetic Variants in Sequencing Studies
Guolian Kang
St. Jude Children’s Research Hospital
[email protected]
In genetic association studies that involve a binary or an ordered categorical phenotype, the standard logistic regression (LG) or ordered LG (oLG) model is commonly used to identify genetic associations. However, these approaches can lose power or fail to control the type I error rate if the phenotype is derived from dichotomizing/categorizing a continuous phenotype following a normal distribution or from complicated unobservable or unobserved continuous variables, or if the genetic mutations are rare. We propose a set-valued (SV) system model, which is a generalized form of LG, oLG, probit (PRB) regression, or ordered probit (oPRB) regression, as a method for discovering genetic variants, especially rare genetic variants in next generation sequencing studies. We propose a new set-valued system identification (SVSI) method to estimate all the underlying key system parameters for the SV model, and compare it with LG in the setting of genetic association studies for a binary phenotype and with LG (a regrouped phenotype), oLG and oPRB for an ordered categorical phenotype. For a binary phenotype, simulations showed that the SV method maintained type I error control and had similar or greater power than the LG method, and that it is robust to different distributions of noise: logistic, normal or t distributions. Additionally, the SV association parameter estimate was 2.7-46.8 fold less variable than the LG log-odds ratio association parameter estimate. Less variability in the association parameter estimate translates to greater power and robustness across the spectrum of minor allele frequencies (MAFs), and these advantages are the most pronounced for rare variants. For instance, in a simulation that generated data from an additive logistic model with an odds ratio of 7.4 for a rare single nucleotide polymorphism with a MAF of 0.005 and a sample size of 2300, the SV method had 60% power whereas the LG method had 25% power at the α = 10^-6 level. Consistent with these simulation results, the set of variants identified by the LG method was a subset of those identified by the SV method in two example analyses. For an ordered categorical phenotype, simulations and two examples showed that SV and LG accurately controlled the type I error rate even at a significance level of 10^-6, but oLG and oPRB did not in some cases. LG had significantly smaller power than the other three methods due to disregarding the ordinal nature of the phenotype, and SV had similar or greater power than oLG and oPRB. Thus, we recommend that the SV model with SVSI be used in SNP-based genetic association studies, especially for detecting rare variants or given a small sample size, such as in some rare pediatric cancer genomics projects.

A Penalized Robust Semiparametric Approach for Gene-Environment Interactions

*Cen Wu1, Xingjie Shi2, Yuehua Cui3 and Shuangge Ma1

1Yale University

2Nanjing University of Finance and Economics

3Michigan State University

[email protected]

In genetic and genomic studies, gene-environment (G×E) interactions have important implications. Some of the existing G×E interaction methods are limited by analyzing a small number of G factors at a time, by assuming linear effects of E factors, by assuming no data contamination, and by adopting ineffective selection techniques. In this study, we propose a new approach for identifying important G×E interactions. It jointly models the effects of all E and G factors and their interactions. A partially linear varying coefficient model (PLVCM) is adopted to accommodate possible nonlinear effects of E factors. A rank-based loss function is used to accommodate possible data contamination. Penalization, which has been extensively used with high-dimensional data, is adopted for selection. The proposed penalized estimating approach can automatically determine whether a G factor has an interaction with an E factor, a main effect but no interaction, or no effect at all. The proposed approach can be effectively realized using a coordinate descent algorithm. Simulation shows that it has satisfactory performance and outperforms the direct competitor that is not robust. The proposed approach is used to analyze a lung cancer study with gene expression measurements and clinical variables. It identifies genes with important implications.


Session 18: Statistical Methods for Sequencing Data Analysis

Unit-free and Robust Detection of Differential Expression from RNA-Seq Data
Hui Jiang
University of Michigan
[email protected]
The high-throughput sequencing of transcriptomes (RNA-Seq) has recently become one of the most widely used methods for quantifying gene expression levels due to its decreasing cost, high accuracy and wide dynamic range for detection. However, the nature of RNA-Seq makes it nearly impossible to provide absolute measurements of transcript concentrations. Several units or data summarization methods for transcript quantification have been proposed to account for differences in transcript lengths and sequencing depths across genes and samples. However, none of these methods can reliably detect differential expression directly without further proper normalization. We propose a statistical model for joint detection of differential expression and data normalization. Our method is independent of the unit in which gene expression levels are summarized. We also introduce an efficient algorithm for model fitting. Due to the L0-penalized likelihood used by our model, it is able to reliably normalize the data and detect differential expression in some cases when more than half of the genes are differentially expressed in an asymmetric manner. The robustness of our proposed approach is demonstrated with simulations.

Robust Estimation of Isoform Expression with RNA-Seq Data
*Jun Li1 and Hui Jiang2

1University of Notre Dame
2University of Michigan
[email protected]
Estimating gene and isoform expression is one of the primary tasks in RNA-Seq experiments. Given a sequence of counts representing numbers of reads mapped to different positions (exons and junctions) of isoforms, methods based on Poisson generalized linear models (GLM) with the identity link function have been proposed to estimate isoform expression levels from these counts. These Poisson-based models have very limited ability to handle the overdispersion in the counts brought by various sources, and some of them are not robust to outliers. We propose a negative binomial based GLM with the identity link, and use a set of robustified quasi-likelihood equations to make it resistant to outliers. An efficient and reliable numeric algorithm has been identified to solve these equations. In simulations, we find that our approach seems to outperform existing approaches. We also find evidence supporting this conclusion in real RNA-Seq data.
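The identity-link negative binomial GLM described above can be sketched in a few lines. The following is my own minimal illustration (a made-up two-isoform design matrix and a fixed dispersion parameter, fit by plain maximum likelihood, not the authors' robustified quasi-likelihood estimator):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(0)

# Hypothetical design: 3 position types (exons/junctions) shared by 2
# isoforms, each observed in 10 replicates.  Entry (i, j) = contribution
# of isoform j to the expected count at position i.
X = np.tile([[1.0, 0.0],
             [1.0, 1.0],   # junction covered by both isoforms
             [0.0, 1.0]], (10, 1))
theta_true = np.array([50.0, 20.0])   # isoform expression levels
r = 5.0                               # fixed NB dispersion (size)

mu = X @ theta_true                   # identity link: mean is linear
y = rng.negative_binomial(r, r / (r + mu))

def nb_negloglik(theta):
    m = np.clip(X @ theta, 1e-8, None)
    return -np.sum(gammaln(y + r) - gammaln(r) - gammaln(y + 1)
                   + r * np.log(r / (r + m)) + y * np.log(m / (r + m)))

fit = minimize(nb_negloglik, x0=np.array([1.0, 1.0]),
               bounds=[(0.0, None), (0.0, None)])
print(fit.x)   # estimates should be near (50, 20)
```

The identity link keeps the isoform expression levels additive on the count scale (the reason it is used in these models), while the NB likelihood absorbs the extra-Poisson variation that the abstract points to.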

Genetic Association Testing for Binary Traits in the Presence of Population Structure
*Duo Jiang1, Sheng Zhong2 and Mary Sara McPeek2
1Oregon State University
2University of Chicago
[email protected]
In genetic association mapping, failure to properly adjust for population structure can lead to severely inflated type I error and loss of power. Meanwhile, adjustment for relevant covariates is often desirable and sometimes necessary to protect against spurious association and to improve power. Many recent methods to account for population structure and covariates are based on linear mixed models (LMM), primarily designed for quantitative traits. For binary traits, however, the LMM is often a misspecified model and can lead to power loss. We develop a new method for binary trait association testing using a quasi-likelihood framework, which exploits the dichotomous nature of the trait by modelling covariate effects on a logit scale and achieves computational efficiency through estimating equations. We show through simulation studies that our method provides power improvement over the linear mixed model approach in a variety of population structure settings and trait models. Applied to an association analysis for Crohn’s disease in the WTCCC data, our method identifies 18 significant associations, one of which has not been previously reported.

Identification of Stably Expressed Genes from Arabidopsis RNA-Seq Data
Bin Zhuo, *Yanming Di, Sarah Emerson and Jeff Chang
Oregon State University
[email protected]

We examine RNA-Seq data from many different experiments carried out by different labs and identify genes that are stably expressed across biological samples, experiment conditions, and labs. We fit a random-effect model to the read counts for each gene and decompose the total variance into between-sample, between-condition and between-lab variance components. Identifying stably expressed genes is useful for count normalization. The variance component analysis is a first step towards understanding the sources and nature of RNA-Seq count variation.

Session 19: Recent Developments in the Theory and Applications of Spatial Statistics

Estimating a Low Rank Covariance Matrix for Spatial Data
*Siddhartha Nandy, Chae-Young Lim and Tapabrata Maiti
Michigan State University
[email protected]

We are interested in estimating a low rank covariance matrix for spatial data. We consider a spatial covariance matrix that is decomposed into two components: a diagonal matrix coming from a measurement error process, and a low rank covariance matrix which has a non-stationary structure. We propose a two-step approach using a group LASSO type shrinkage estimation technique for estimating the rank of the covariance matrix and the matrix itself. A block coordinate descent method for a block multi-convex function under regularizing constraints is utilized to implement the proposed approach.

Computational Instability of Spatial Covariance Matrices
*Wei-Ying Wu1 and Chae-Young Lim2

1National Dong Hwa University
2Michigan State University
[email protected]

Computing a covariance matrix is a very common task in statistics. For example, the Gaussian likelihood function involves the covariance matrix, and spatial prediction, called Kriging, involves the computation of the inverse of a spatial covariance matrix. In the computation of a spatial covariance matrix, numerically unstable results arise when the observation locations become dense. In this talk, we investigate why and when computational instability in calculating the Matérn covariance matrix makes the maximum likelihood estimator (MLE) or Kriging unreasonable in the ill-conditioned sense. Some possible approaches to relax such computational instability are also discussed.
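The instability can be seen numerically in a small sketch of my own (using the Matérn covariance with smoothness ν = 3/2 and an assumed range parameter, not the speakers' setting): as a fixed domain is sampled more densely, the condition number of the covariance matrix blows up, so inverting it for the likelihood or for Kriging becomes unreliable.

```python
import numpy as np

def matern32(locs, ell=0.2, sigma2=1.0):
    """Matérn covariance with smoothness nu = 3/2 on 1-D locations."""
    d = np.abs(locs[:, None] - locs[None, :])
    a = np.sqrt(3.0) * d / ell
    return sigma2 * (1.0 + a) * np.exp(-a)

conds = []
for n in (10, 50, 200):
    locs = np.linspace(0.0, 1.0, n)   # same domain, denser locations
    conds.append(np.linalg.cond(matern32(locs)))
    print(f"n = {n:4d}   condition number = {conds[-1]:.3e}")
```

Under this infill design the rows of the matrix become nearly linearly dependent, which is exactly the ill-conditioning the talk refers to; common remedies include adding a nugget term to the diagonal or tapering the covariance.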


Statistical Method for Change-set Analysis
Jun Zhu
University of Wisconsin-Madison
[email protected]

Here we consider a change-set method for grouping spatial units on a lattice so that characteristics are similar within a group and distinct among groups. This may be viewed as an extension of the existing change-point analysis for one-dimensional data in space or time. In particular, we propose an entropy measure and establish a quasi-likelihood procedure for parameter estimation and statistical inference. The large sample properties of our method are established and the finite sample properties are investigated in a simulation study. For illustration, our method is applied to analyze a county-based socio-economic data set.

Session 20: Risk Prediction Modeling in Clinical Trials

Evaluating Calibration of Risk Prediction Models
Ruth Pfeiffer
National Institutes of Health
[email protected]

Statistical models that predict disease incidence, disease recurrence or mortality following disease onset have broad public health and clinical applications. Before a model can be recommended for practical use, its performance characteristics need to be understood. General criteria to evaluate prediction models for dichotomous outcomes include predictive accuracy, proportion of variation explained, calibration and discrimination. Most recent validation studies have emphasized calibration and discrimination. A model is called “well calibrated” (or unbiased) when the predicted probabilities agree with the observed risk in subsets of the population and overall. I propose and study novel criteria to assess the calibration of models that predict risk of disease incidence, and compare their performance to standard methods for assessing model calibration. I illustrate the methods with models that predict incidence of breast cancer and response to Hepatitis C treatment.

Statistical Considerations for Evaluating Prognostic Imaging Biomarkers
Zheng Zhang
Brown University
[email protected]

Biomarkers derived from quantitative imaging modalities, such as Ktrans from DCE, rCBV from DSC and ADC from DWI, are being evaluated in various cancer clinical trials to determine their ability to predict treatment effect. For a binary outcome such as pathological response, the area under the ROC curve (AUC) has been a preferred measure of a marker’s predictive performance. However, the odds ratio (OR) from logistic regression has also been routinely used in this situation, although the relationship between OR and AUC is not well documented. In addition, since most imaging markers are continuous, a choice has to be made on what form of the data (continuous or binary) should be used in the logistic regression model. If we choose to model the biomarker as binary data, a pre-specified threshold has to be chosen. In this talk, we will present simulation study results on the relationship between AUC and OR, as well as the impact of the threshold on odds ratio estimation. We demonstrate that the OR is not suitable for evaluating a prognostic marker, whereas the AUC is a better choice.
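The gap between OR and AUC can be seen in closed form under the equal-variance binormal model (a sketch of my own, not the speaker's simulation design): with marker X ~ N(0, σ²) in non-responders and N(δ, σ²) in responders, the logistic slope is exactly β = δ/σ², so the per-unit OR is exp(δ/σ²), while AUC = Φ(δ/(σ√2)). Markers with an identical per-unit OR can therefore have very different AUCs:

```python
import math

def auc_and_or(delta, sigma):
    """Equal-variance binormal model: marker ~ N(0, sigma^2) in controls,
    N(delta, sigma^2) in cases.  Returns (per-unit odds ratio, AUC)."""
    beta = delta / sigma**2                        # exact logistic slope
    Phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.exp(beta), Phi(delta / (sigma * math.sqrt(2.0)))

# Same per-unit OR = 2, different marker spreads:
for sigma in (1.0, 3.0):
    delta = math.log(2.0) * sigma**2               # fixes beta = log 2
    or_, auc = auc_and_or(delta, sigma)
    print(f"sigma = {sigma}: OR = {or_:.2f}, AUC = {auc:.3f}")
```

Because the OR depends on the scale of the marker while the AUC does not, the same OR of 2 corresponds here to an AUC of roughly 0.69 when σ = 1 but roughly 0.93 when σ = 3, illustrating why the OR alone is a poor summary of discriminative performance.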

Risk Assessment for Patients with Hepatitis C: A Scoring System Approach
*Weining Shen1, Jing Ning1, Ying Yuan1, Ziding Feng1 and Anna Lok2
1M. D. Anderson Cancer Center
2University of Michigan
[email protected]

Modern genomic technologies have generated a large number of biomarkers for early-phase detection and prognosis of diseases. A major challenge is how to identify informative risk factors to construct a scoring system for predicting the likelihood of developing diseases. In this talk, I will introduce a time-dependent receiver operating characteristic based method to construct a scoring system that combines informative biomarkers and other baseline information. The proposed methods bypass the need to model the outcomes, and can be extended to accommodate data from complex clinical trial designs (e.g., the nested case-control design). Theoretical properties (e.g., selection consistency and asymptotic normality) of the proposed estimators are established. We apply the method to data from the Hepatitis C Antiviral Long-term Treatment against Cirrhosis (HALT-C) Trial.

Risk Prediction Modeling in the National Lung Screening Trial
Fenghai Duan
Brown University
[email protected]

Introduction: In the National Lung Screening Trial (NLST), a 20% relative reduction in lung cancer mortality was observed using reduced-dose helical computed tomography (CT) relative to chest x-ray screening in older smokers. This sub-study aims to determine how the observed nodules and the associated features can influence lung cancer diagnosis. Methods: In 26,455 participants who underwent at least one CT screen, the sensitivity, specificity, positive predictive value, and negative predictive value for lung cancer were determined separately for different types of nodules. The relative risk of lung cancer was determined as the ratio of lung cancer in the nodule-detected group compared to the non-nodule group. A two-stage modeling approach was then applied to determine how the observed nodules and the associated features can influence lung cancer diagnosis. In Stage 1, a Cox proportional hazards model was fitted at the participant level to assess whether the presence of a nodule at baseline increases the hazard of developing lung cancer. The time-varying effects of the nodule type and other clinical variables were also evaluated in this stage. In Stage 2, a generalized linear mixed model was fitted on the observed nodules to determine how the associated nodule features can affect the probability of lung cancer diagnosis in the same lobe. Conclusions: Clinical and nodule features can be used to better stratify the risk of lung cancer diagnosis and improve CT screening performance.

Session 21: The Application of Latent Variable and Mixture Models to the Biological Sciences

The Role of Item Response Theory in Assessment and Evaluation Studies
*Li Cai and Lauren Harrell
University of California, Los Angeles
[email protected]

Item response theory (IRT) modeling, as a latent variable modeling approach, grew out of the factor analysis tradition within psychometrics. Its usefulness in educational, psychological, mental and behavioral health assessment studies has been broadly established. “Proceduralized” IRT as a routine data analytic method has


also seen inclusion in new releases of mainstream statistical software packages such as SAS and Stata. On the other hand, in both the biomedical and social sciences, treatment evaluation studies have relied heavily on models that grew out of the regression modeling framework. In this paper we present new IRT models that integrate measurement and regression modeling.

The Application of Structural Equation Modeling to Biomarker Data
*Nathan Vandergrift, Sallie Permar and Barton Haynes
Duke University
[email protected]
Structural Equation Modeling (SEM) is a covariance structure analysis method that allows for simultaneous dimensionality reduction and regression parameter testing. With the increasing use of multiplex assays, the need for different modeling approaches to the analysis of biomarker data is acute. SEM has not been used with any frequency in the biological sciences. We will lay out the reasons that biomedical researchers may wish to adopt SEM. We will also present a SEM analysis of biomarker data from a Mother-to-Infant Transmission (MTCT) study to show its practical utility. In the analysis we find evidence of an underlying mechanism for autologous neutralization, which led to the isolation via flow cytometry of those antibodies from the sera of a non-transmitting mother (Permar et al., in press).

Latent and Observed Variables in Kernel-Penalized Regression Models
Timothy Randolph
Fred Hutchinson Cancer Research Center
[email protected]

Statistical tests for the (global) association between a high-dimensional vector of measurements, x, and a phenotype, y, may be formulated in the context of a reproducing kernel Hilbert space, H. A relevant H is one spanned by the eigenvectors of an appropriately-chosen kernel, K; i.e., a matrix of similarities between pairs of the observed vectors. In this sense H represents a vector space of latent structure, and a kernel-based test statistic is defined in terms of how well y is predicted with respect to this structure. However, a global test of this type does not reveal insight about individual elements of x. In this talk, we consider a framework for choosing K (equivalently, H) that can provide both a powerful score test and an estimate of individual associations between the elements of x and y. Motivation is provided by examples from microbiome and metabolomic data analyses.
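As a concrete illustration of the kind of global kernel test described above: with a linear kernel K = XX', a quadratic statistic Q = y'Ky can be calibrated by permutation. This is a generic sketch only, not the K-selection framework of the talk; the toy data and all parameter values are hypothetical.

```python
import numpy as np

def kernel_score_test(X, y, n_perm=999, seed=0):
    """Global association test between a high-dimensional X and a phenotype y
    via the quadratic statistic Q = y' K y, where K is a linear kernel (a
    matrix of similarities between pairs of subjects). The P-value is
    calibrated by permuting y; SKAT-type tests use an analytic
    mixture-of-chi-square null instead."""
    rng = np.random.default_rng(seed)
    y = y - y.mean()                 # center the phenotype
    K = X @ X.T                      # linear kernel: subject-by-subject similarities
    q_obs = float(y @ K @ y)
    perm = np.empty(n_perm)
    for b in range(n_perm):
        yp = rng.permutation(y)      # break the X-y link, keep K fixed
        perm[b] = yp @ K @ yp
    pval = (1 + np.sum(perm >= q_obs)) / (n_perm + 1)
    return q_obs, pval

# Toy data: y is driven by the first 3 of 50 features
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 50))
y = X[:, :3].sum(axis=1) + rng.standard_normal(60)
q, p = kernel_score_test(X, y)
```

Because the statistic is quadratic in y, it aggregates association across all of H at once, which is exactly why such a test says nothing about individual elements of x.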

A Mixture Model Approach to Estimating a Nonlinear Errors-in-Variables Model for Serial Dilution Assay
Youyi Fong
Fred Hutchinson Cancer Research Center
[email protected]

Serial dilution assays with continuous experimental outcomes are often used to quantify a substance in a biomedical sample. The design and analysis of serial dilution assay data has so far primarily been based on estimating dilution-response curves, the nonlinear relationship between sample dilutions and experimental outcomes. Motivated by the analysis of binding antibody multiplex assays (BAMA), which have played an important role in the study of immune responses to HIV-1 infection and vaccination, we study the nonlinear functional relationship between experimental outcomes measured at two different dilutions, which we call the paired response curve (prc; Fong et al. 2015). The paired response curve model is a nonlinear errors-in-variables model. In this talk, we take a structural approach and treat the incidental parameters as random effects from a finite-dimensional mixing distribution. We describe an algorithm for finding the MLE, and study its performance through simulation studies and a real data illustration.

Session 22: Clinical Trials with Multiple Objectives: Maximizing the Likelihood of Success

Statistical Challenges in Testing Multiple Endpoints in Complex Trial Designs
*H.M. James Hung and Sue-Jane Wang
U.S. Food and Drug Administration
[email protected]

Methodology for testing multiple endpoints becomes increasingly challenging as a clinical development program grows more advanced. In group sequential designs, interim analyses are mostly based on the primary endpoint with a pre-specified alpha-spending strategy. When the trial stops to declare a win on the primary endpoint, how to test a secondary endpoint is challenging, as noted in Hung et al. (2007) and subsequently stipulated in Glimm et al. (2010) and Tamhane et al. (2010). In another context, stopping the trial should arguably be based on a harder endpoint, such as mortality, which is often at best a secondary endpoint. Statistical testing for the primary endpoint and the secondary endpoint is also quite challenging in this scenario. When the secondary endpoint requires a greater amount of statistical information, more than one trial may need to be integrated in a pre-specified plan to improve statistical power for testing. This presents additional challenges to statistical testing. In this presentation, I shall discuss the methodological challenges and stipulate some viable approaches to the problems.

Group-Sequential Clinical Trials When Considering Multiple Outcomes as Co-Primary Endpoints
*Toshimitsu Hamasaki1, Scott Evans2 and Koko Asakura1

1National Cerebral and Cardiovascular Center
2Harvard University
[email protected]

We discuss decision-making frameworks for clinical trials with multiple outcomes as co-primary endpoints in a group-sequential setting. The decision-making frameworks can account for flexibilities, such as a varying number of analyses, equally or unequally spaced increments of information, and fixed or adaptive Type I error allocation among endpoints. The frameworks can provide efficiency over fixed sample size designs, that is, potentially fewer trial participants. We investigate the operating characteristics of the decision-making frameworks and provide guidance on constructing efficient group-sequential strategies in clinical trials with multiple co-primary endpoints.

Sample Size Determination for a Specific Region in Multi-regional Clinical Trials with Multiple Co-Primary Endpoints
*Chin-Fu Hsiao1, Wong-Shian Huang1 and Toshimitsu Hamasaki2
1National Health Research Institutes
2National Cerebral and Cardiovascular Center
[email protected]

To accelerate the drug development process and shorten approval time, the design of multi-regional clinical trials (MRCTs) incorporates subjects from many countries/regions around the world under the same protocol. After showing the overall efficacy of a drug in all global regions, one can also simultaneously evaluate the possibility of applying the overall trial results to all regions and subsequently support drug registration in each of them. Most of the recent approaches for design and evaluation of MRCTs are only concerned with one primary endpoint. However, in some therapeutic areas (e.g., Alzheimer’s disease), the clinical efficacy of a new treatment may be characterized by a set of possibly correlated endpoints because patients’ responses to the treatment may comprise several different aspects. We focus on a specific region and establish statistical criteria for consistency between the region of interest and overall results in MRCTs with multiple co-primary endpoints. More specifically, four criteria are considered for each endpoint. Two criteria are to assess whether the treatment effect in the region of interest is as large as that of the other regions or of the regions overall, while the other two criteria are to assess the consistency of the treatment effect of the specific region with other regions or the regions overall. Sample size required for the region of interest can also be evaluated based on these four criteria.

Fallback Tests for Co-primary Endpoints
*Robin Ristl1, Florian Frommlet1, Armin Koch2 and Martin Posch1

1Medical University of Vienna
2Hannover Medical School
[email protected]

When the efficacy of a treatment is measured by co-primary endpoints, efficacy is claimed only if for each endpoint an individual statistical test is significant at level alpha. While such a strategy controls the family-wise type I error rate (FWER), it is often strictly conservative and allows for no inference if not all null hypotheses can be rejected. This situation can be unsatisfying, especially in settings such as rare diseases where optimal use of the available information is of high importance. We therefore investigate fallback tests, which are defined as uniform improvements of the classical test for co-primary endpoints. They reject whenever the classical test rejects, but allow for inference also in settings where only a subset of endpoints shows a significant effect. Similar to the fallback tests for hierarchical testing procedures (Wiens et al. 2005), these fallback tests for co-primary endpoints allow one to continue testing even if the primary objective of the trial was not met. Examples of fallback tests for two and three co-primary endpoints are investigated that control the FWER in the strong sense under the assumption of multivariate normal test statistics with arbitrary correlation matrix. The power of the considered fallback tests is investigated in a simulation study. The discussion of the fallback procedures for co-primary endpoints is illustrated with a clinical trial in a rare disease and an example from a diagnostic trial. This work has been funded by the FP7-HEALTH-2013-INNOVATION-1 project Advances in Small Trials Design for Regulatory Innovation and Excellence (ASTERIX), Grant Agreement No. 603160.
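The classical rule above, declaring success only when every endpoint clears level alpha individually, is easy to probe by simulation. The sketch below estimates its power for two correlated normal test statistics; the effect sizes, correlation, and one-sided alpha are hypothetical illustration values, not taken from the talk.

```python
import numpy as np

def coprimary_power(delta, rho, n_sim=20000, seed=0):
    """Monte Carlo power of the classical co-primary rule: declare success
    only if BOTH one-sided z-tests exceed the critical value for one-sided
    alpha = 0.025. `delta` holds the two standardized effect sizes
    (noncentralities of the test statistics) and `rho` their correlation."""
    rng = np.random.default_rng(seed)
    z_crit = 1.959964  # upper 2.5% point of the standard normal
    cov = np.array([[1.0, rho], [rho, 1.0]])
    z = rng.multivariate_normal(np.asarray(delta, dtype=float), cov, size=n_sim)
    return float(np.all(z > z_crit, axis=1).mean())  # win on every endpoint

# Hypothetical design: each endpoint powered at roughly 85% individually
power_indep = coprimary_power(delta=(3.0, 3.0), rho=0.0)
power_corr = coprimary_power(delta=(3.0, 3.0), rho=0.8)
```

Because both tests must succeed, the joint power sits below each marginal power; positive correlation between endpoints raises it, which is one reason procedures that exploit the correlation structure can be made less conservative.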

Session 23: Issues Related to Subgroup Analysis in Confirmatory Clinical Trials: Challenges and Opportunities

A Statistical Decision Framework Applicable to Multipopulation Tailoring Trials
Brian Millen
Eli Lilly and Company
[email protected]

The promise of tailored therapies has resulted in increased attention on the evaluation of treatment effects in focused subpopulations in clinical trials. Such assessments present new opportunities for drug development and, ultimately, patients and prescribers. Statistical considerations enabling such treatment assessments in confirmatory clinical trials will be presented in this talk. Particular attention will be paid to contrasting the properties and operating characteristics of various approaches to support appropriate inference.

A Multiple Comparison Procedure for Subgroup Analyses with Binary Endpoints
*Dong Xi, Yanqiu Weng, Kapildeb Sen, Ekkehard Glimm, Willi Maurer and Frank Bretz
Novartis Pharmaceutical Corporation
[email protected]

It is a common practice in clinical trials to investigate the efficacy and safety of a test treatment in subgroups of patients in addition to the overall study population. However, the findings of subgroup analyses do not provide confirmatory evidence unless they are specified a priori. We consider confirmatory subgroup analysis for Phase II trials in oncology, where the hypotheses of interest are tested in the overall study population and the pre-specified subpopulation. Since these trials are usually conducted as single-arm designs with binary endpoints, e.g., overall response rate (ORR), a multiple comparison procedure is proposed to ensure strong control of the familywise Type I error rate as well as to take into account the discreteness of the data and the correlation between populations.

Interaction Trees for Exploring Stratified and Individualized Treatment Effects
Xiaogang Su
The University of Texas at El Paso
[email protected]

Assessing heterogeneous treatment effects has become a growing interest in many application fields, including personalized medicine. Concerning experimental data collected from randomized trials, we expand Interaction Trees (IT; Su et al., 2009) to explore stratified and individual treatment effects in a variety of ways. As an alternative to greedy search, a smooth sigmoid surrogate (SSS) method is first used to speed up IT. On the basis of IT, causal inference at different levels is then made. More specifically, an aggregated grouping procedure stratifies data into refined groups where the treatment effect remains homogeneous. Ensembles of IT models can provide prediction of individual treatment effects, and this compares favorably to the traditional ‘separate regression’ methods. In addition, a recent infinitesimal jackknife method of Wager, Hastie, and Efron (2014) is adopted to obtain standard errors for the individualized treatment effects. To extract meaningful interpretations, we also make available several other features such as variable importance and partial dependence plots. An empirical illustration of the proposed techniques is given via an analysis of quality of life (QoL) data among breast cancer survivors.

Considering Regional Difference in Design and Evaluation of MRCTs for Binary Endpoints
*Chi-Tian Chen and Chin-Fu Hsiao
National Health Research Institutes
[email protected]

A multiregional clinical trial (MRCT) is a clinical trial of global collaboration for pharmaceutical product development. The planning and implementation of an MRCT are conducted at the same time across countries/regions based on a common protocol. The key issue lies in how to address the possible geographic variations of efficacy and safety for global drug development, since it is known that differences among regions may have an impact upon a medicine’s effect. Therefore, the heterogeneity of treatment effect due to regional differences should be considered in the design and analysis of an MRCT. In this presentation, we focus on binary endpoints and address heterogeneous treatment effects (response rates) across regions via a power-function distribution. The design and evaluation of the MRCT are then based on a Bernoulli-power-function model. After showing the efficacy of a drug in various local regions, the consistency between the specific region and the entire group is confirmed according to the concept of Method 1 proposed by the Japanese MHLW. Finally, the proportion of subjects in the specific region is determined by ensuring that the assurance probability of the consistency criterion reaches a desired level, say 80% or 90%.

Session 24: Recent Developments in Missing Data Analysis

Variable Selection in the Presence of Missing Data: Resampling and Imputation
*Qi Long1 and Brent Johnson2

1Emory University
2University of Rochester
[email protected]

In the presence of missing data, variable selection procedures need to be tailored to missing data mechanisms and statistical approaches used for handling missing data. We focus on the mechanism of missing at random and variable selection procedures that can be combined with imputation methods. We investigate a general resampling approach that combines bootstrap imputation and stability selection, the latter of which was developed for fully observed data. The proposed approach is general and can be applied to a wide range of settings including general missing data patterns. Our simulation studies show that the proposed approach achieves the best or close to the best performance compared to several existing methods for both low-dimensional and high-dimensional problems. In particular, it is not very sensitive to tuning parameter values. The proposed approach is further illustrated using two real data examples, one for a low-dimensional problem and the other for a high-dimensional problem.
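The resampling idea, impute within each bootstrap sample, run a selector, and keep the variables chosen in a large fraction of replicates, can be sketched as below. A simple marginal-correlation screen stands in for the penalized selector of the paper, and the imputation, thresholds, and toy data are all illustrative assumptions.

```python
import numpy as np

def bootstrap_imputation_selection(X, y, n_boot=200, top_k=5, pi_thr=0.7, seed=0):
    """Stability selection with bootstrap imputation (sketch). For each
    bootstrap resample: (1) mean-impute missing entries using that resample
    only, (2) select the top_k variables by absolute marginal correlation
    with y (a stand-in for a lasso-type selector), then (3) report variables
    selected in at least a pi_thr fraction of the resamples."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # bootstrap rows
        Xb, yb = X[idx].copy(), y[idx]
        col_means = np.nanmean(Xb, axis=0)
        nan_r, nan_c = np.where(np.isnan(Xb))
        Xb[nan_r, nan_c] = col_means[nan_c]     # impute within the resample
        Xc = Xb - Xb.mean(axis=0)
        yc = yb - yb.mean()
        denom = np.sqrt((Xc**2).sum(axis=0) * (yc**2).sum())
        corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, np.inf)
        counts[np.argsort(corr)[-top_k:]] += 1  # tally the selected set
    return np.where(counts / n_boot >= pi_thr)[0]

# Toy data: variables 0 and 1 drive y; about 20% of entries missing
rng = np.random.default_rng(2)
X = rng.standard_normal((150, 20))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(150)
X[rng.random(X.shape) < 0.2] = np.nan
stable = bootstrap_imputation_selection(X, y)
```

In practice the screen would be replaced by a proper penalized selector run on each imputed bootstrap sample, with the stability threshold pi_thr playing the role of the (relatively insensitive) tuning parameter.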

Using Link-Preserving Imputation for Logistic Partially Linear Models with Missing Covariates
*Qixuan Chen1, Myunghee Paik2, Minjin Kim2 and Cuiling Wang3

1Columbia University
2Seoul National University
3Albert Einstein College of Medicine
[email protected]

To handle missing data, one needs to specify auxiliary models such as the probability-of-observation model or the imputation model. The doubly robust (DR) method uses both auxiliary models and produces consistent estimation when either of the models is correctly specified. While the DR method in estimating equation approaches can be easy to implement in the case of missing outcomes, it is computationally cumbersome in the case of missing covariates, especially in the context of semiparametric regression models. In this paper, we propose a new kernel-assisted estimating equation method for logistic partially linear models with missing covariates, with applications to two-phase studies. We replace the conditional expectation in the DR estimating function with an unbiased estimating function constructed using the conditional mean of the outcome given the observed data, and impute the missing covariates using so-called link-preserving imputation models to simplify the estimation. The proposed method is valid when the nonresponse model is correctly specified, and is more efficient than the kernel-assisted inverse probability weighting estimator. It is doubly robust under missing completely at random or when the regression coefficients of the missing parametrically modeled covariates are equal to zero. The proposed estimator is consistent and asymptotically normal. We evaluate the finite sample performance in terms of efficiency and robustness, and illustrate the application of the proposed method to health insurance data from the 2011-2012 National Health and Nutrition Examination Survey, in which data were collected in two phases.

Composite Likelihood Approach in Gaussian Copula Regression Models with Missing Data
*Wei Ding and Peter Song
University of Michigan
[email protected]

Misaligned missing data occur in many large-scale studies due to impediments in data collection such as policy restriction, equipment limitation and budgetary constraint. By misaligned missingness we mean a missing data pattern in which two sets of variables are measured from disjoint subgroups of subjects with no overlapping observations. An analytic challenge arising from the analysis of such data is that some of the correlation parameters related to those misaligned variables are not point identifiable but possibly partially identifiable. This parameter identification issue hinders us from utilizing classical multivariate models in the data analysis. To overcome this difficulty, we propose a composite likelihood approach based on marginal distributions of variables with full observations, so that the resulting pseudo likelihood is free of any unidentifiable parameters. After obtaining estimates of the point identifiable parameters, we further estimate the parameter range for partially identifiable parameters. For implementation, we develop an effective peeling optimization procedure to obtain estimates of point identifiable parameters. We investigate the performance of the proposed composite likelihood method through simulation studies, with comparisons to the classical maximum likelihood estimation obtained from both the EM algorithm and a multiple imputation strategy. The proposed method is illustrated by one data example from our collaborative project.

Test the Reliability of Doubly Robust Estimation with Missing Response Data
*Baojiang Chen1 and Jing Qin2

1University of Nebraska Medical Center
2National Institutes of Health
[email protected]

In statistical inference, one has to make sure that the underlying regression model is correctly specified; otherwise the resulting estimation may be biased. Model checking is an important method to detect any departure of the regression model from the true one. Missing data are a ubiquitous problem in social and medical studies. If the underlying regression model is correctly specified, recent research shows great popularity of the doubly robust estimation method for handling missing data because of its robustness to misspecification of either the missing data model or the conditional mean model, i.e., the model for the conditional expectation of the true regression model conditioning on the observed quantities. However, little work has been devoted to goodness-of-fit tests for the doubly robust estimation method. In this paper, we propose a testing method to assess the reliability of the estimator derived from the doubly robust estimating equation with possibly missing response and always-observed auxiliary variables. Numerical studies demonstrate that the proposed test can control Type I errors well. Furthermore, the proposed method can powerfully detect departures from model assumptions in the marginal mean model of interest. A real dementia data set is used to illustrate the method for the diagnosis of model misspecification in the problem of missing response with an always-observed auxiliary variable for cross-sectional data.

Session 25: Spatial and Spatio-Temporal Modeling in Environmental and Ecological Studies

Multivariate Spatial Modeling on Spheres
*Juan Du1 and Chunsheng Ma2

1Kansas State University
2Wichita State University
[email protected]

Multivariate spatial processes on spheres are often used to model large-scale spatial data sets with multiple attributes observed on a globe, such as those in geophysical and atmospheric studies. To directly construct valid and applicable covariance or variogram matrix functions on spheres, we first find characterizations of those structures. For intrinsically stationary vector random processes on spheres, the variogram matrix function on a sphere can be represented in terms of an infinite sum of the products of positive definite matrices and ultraspherical polynomials. The non-stationary but axially symmetric case is also explored. Some parametric variogram or covariance matrix models are derived on spheres via different constructional approaches. A simulation study is conducted to illustrate the implementation of the proposed model in estimation and cokriging, whose performance is compared with that using the linear model of coregionalization.

Autoregressive Spatially-varying Coefficient Models for Predicting Daily PM2.5 Using VIIRS Satellite AOD
*Erin Schliep1, Alan Gelfand1 and David Holland2

1Duke University
2U.S. Environmental Protection Agency
[email protected]

There is high demand for accurate air quality information in human health analyses. The sparsity of ground monitoring stations across the United States motivates the need for advanced statistical models to predict air quality metrics, such as PM2.5, at unobserved locations. Remote sensing technologies have the potential to expand our knowledge of PM2.5 spatial patterns beyond what we can predict from current PM2.5 monitoring networks. Data from satellites have an additional advantage in not requiring the extensive emission inventories necessary for most atmospheric models that have been used in earlier data fusion models of air pollution. Statistical models combining monitoring station data with satellite-obtained aerosol optical depth (AOD) have been proposed in the literature, with varying levels of accuracy in predicting PM2.5. The benefit of using AOD is that satellites provide complete gridded spatial coverage. The challenges of these models, however, are that (1) the correlation between the two data sources varies both in time and in space, (2) the data are temporally and spatially misaligned, and (3) there are extensive missing data in both data sources. We propose a hierarchical autoregressive spatially-varying coefficients model to jointly model the two data sources while addressing the foregoing modeling challenges. Additionally, we apply formal model comparison to competing models in terms of model fit and out-of-sample prediction of PM2.5. The models are applied to daily observations of PM2.5 and AOD in the summer months of 2013 across the conterminous United States. Most notably, we find slight in-sample improvement from incorporating AOD into our autoregressive model but little out-of-sample predictive improvement during this time period.

Modeling Animal Abundance with a Semi-Parametric Space-Time Model
Devin Johnson
The National Oceanic and Atmospheric Administration
[email protected]

We consider a model-based clustering approach to examining abundance trends in a metapopulation. Our proposed trend analysis incorporates a clustering method that is an extension of the classic Dirichlet process prior, which allows for inclusion of distance covariates between sites. This approach has two main benefits: (1) nonparametric spatial association of trends and (2) reduced dimension of the spatio-temporal trend process. We present a transdimensional Gibbs sampler for making Bayesian inference that is efficient in the sense that all of the full conditionals can be directly sampled from save one. To demonstrate the proposed method, we examine long-term trends in northern fur seal pup production at nineteen rookeries in the Pribilof Islands, Alaska. There was strong evidence that clustering of similar year-to-year deviations from linear trends was associated with whether rookeries were located on the same island. Clustering of local linear trends did not seem to be strongly associated with any of the distance covariates. In the fur seal trends analysis, an overwhelming proportion of the MCMC iterations produced a 73-79% reduction in the dimension of the spatio-temporal trend process, depending on the number of cluster groups.

An Efficient Non-parametric Estimate for Spatially Correlated Functional Data
*Yuan Wang, Kim-Anh Do, Jianhua Hu and Brian Hobbs
M. D. Anderson Cancer Center
[email protected]

We consider functional data observed at multiple units on a subject, where data observed on neighboring units are correlated. A weighted kernel smoothing estimate is designed to leverage the space and time correlation. Asymptotic results are derived for the proposed estimate, and there is a unique most efficient estimate that achieves minimum asymptotic variance. Simultaneous prediction of individual curves using discrete samples is discussed. Simulation studies have illustrated the improved performance, in terms of reduced mean square error, of the weighted estimate. We apply the methods to perfusion computed tomography data in which subjects were measured sequentially over time and several regions of interest were acquired for each subject. We show that the proposed method retains good performance when the number of scans is reduced. The general method offers potential to improve estimation and prediction performance in the case of sparse observation. The method is attractive for many biomedical applications that utilize biomarkers to identify features intrinsic to a particular disease at multiple interdependent sites within an organ.

Session 26: Challenges in Analyzing Complex Data Using Regression Modeling Approaches

Goodness-of-Fit Tests of Finite Mixture Regression Models
Junwu Shen1, *Shou-En Lu2, Yong Lin2, Weichung Joe Shih2 and Junfeng (Jim) Zhang3

1Novartis Pharmaceutical Corporation
2Rutgers University
3Duke University
[email protected]

Evaluating the goodness-of-fit (GOF) for a mixture regression model can be challenging due to the complexity in the model structure. In this paper, we propose a GOF test to evaluate the fit of finite mixture regression models, according to the principle of cumulative residuals. Specifically, we define the cumulative pseudo-residuals based on the score functions of a normal mixture regression model for independent observations, and develop GOF tests to evaluate the fit of the regression models for the component-specific means and the mixing proportions, with respect to the link function and the functional form of a covariate. The proposed tests are extended to the two-component normal mixture regression models for dependent observations, where the dependence of observations is modelled by random effects. Extensive simulation studies showed that the proposed GOF tests maintained the Type I error rate and had reasonable power to detect model deviations. The data from the Beijing Health Effects of Air Pollution Reduction Trial (HEART) were used to illustrate the proposed methodologies.

Comparing Methods of Modeling Individual Infancy Growth Curves
*Rui Xiao1, Sani Roy2, Alessandra Chesi2, Frank Mentch2, Rosetta Chiavacci2, Jonathan Mitchell1 and Andrea Kelly2
1University of Pennsylvania
2Children’s Hospital of Philadelphia
[email protected]

Recent studies have reported that rapid growth in infancy is associated with a greater risk of subsequent obesity as well as other health outcomes in later life. It is therefore important to adequately characterize individual infancy growth patterns, which may promote effective early interventions for obesity. We used a longitudinal cohort of healthy, non-preterm infants (n=2114) in the Genetic Causes for Complex Pediatric Disorders study (The Children’s Hospital of Philadelphia) with at least six body mass index (BMI) measurements from birth to 30.25 months of age. In this study, we present several commonly used methods of modeling individual BMI growth curves in this cohort, including 1) individual polynomial regression, 2) a penalized cubic spline mixed-effect model, 3) a parametric mixed-effect model, and 4) the SuperImposition by Translation and Rotation (SITAR) model, a shape invariant model (SIM) with a single fitted curve. We compared their performance through AIC and residual standard deviation (RSD).

Alzheimer’s Disease Early Prediction and Imaging Genetics Analyses Based on Large Scale Regularization
*Fang-Chi Hsu, Mark Espeland and Ramon Casanova
Wake Forest University School of Medicine
[email protected]

Study of Alzheimer’s disease (AD) has entered a new era in which biological measures from multiple platforms, such as neuroimaging and genetics, are being collected to help deepen understanding of the disease and improve prevention and diagnosis. Because it is believed that pathological processes that lead to AD start many years before the clinical symptoms are observed, the problem of early detection and risk assessment is a major focus of AD research. High-dimensional data pose a serious challenge to developing models for AD risk prediction. We discuss here models we used for AD risk assessment based on structural magnetic resonance imaging (sMRI voxel) and genome-wide single-nucleotide polymorphism (SNP) data from the AD Neuroimaging Initiative (ADNI) using large-scale regularization. Sparsity learning based on elastic net regularization was used to develop anatomical scores of AD risk that we called AD Pattern Similarity (AD-PS) scores. Using ADNI data, we validated the scores by relating them to different cognitive states (e.g., cognitively normal, mild cognitive impairment (MCI), and AD) and to times of MCI-to-AD conversion. We also show results of GWAS analyses when the AD-PS scores were used as quantitative traits. In addition, we show that very large classification problems can be solved successfully in SNP space using the same approach. Thus, large-scale regularization approaches can be useful in summarizing MRI data and identifying genetic markers that predict AD risk.

Session 27: Bayesian Applications in Biomedical Studies

Bayesian Functional Enrichment Analysis
*Jing Cao1 and Song Zhang2

1Southern Methodist University
2The University of Texas Southwestern Medical Center
[email protected]

Functional enrichment analysis is used in high-throughput data analysis to provide functional interpretation for a list of genes or proteins that share a common property, for example, genes that are differentially expressed (DE). The hypergeometric P-value is commonly used in the enrichment analysis to investigate whether genes from pre-defined functional groups, e.g., as represented by Gene Ontology (GO) annotations, are enriched in the DE gene list. The hypergeometric P-value has three limitations: 1) it is computed independently for each GO term and thus neglects the inter-relationship among neighboring terms; 2) the P-value has a size constraint, i.e., a lower limit determined by the size of the GO term, which makes it biased towards selecting larger (less-specific) GO terms; and 3) overlapping genes in GO terms are repeatedly used in the calculation of the P-value. In this paper, we propose a Bayesian model based on the non-central hypergeometric distribution to overcome the above limitations. The dependence structure among GO terms is incorporated through a prior on the non-centrality parameter. The resulting measure for enrichment is a posterior probability, which does not have the size constraint. Also, the overlapping information is removed from the likelihood function. We show that this method, with the above improvements, can detect moderate but consistent enrichment signals and identify sets of closely-related and biologically-meaningful GO terms rather than individual isolated GO terms.
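The hypergeometric P-value that the abstract takes as its starting point, the probability of seeing at least k annotated genes in a DE list of size n drawn from N genes of which K carry the GO annotation, can be computed exactly with stdlib combinatorics; the gene counts below are invented for illustration.

```python
from math import comb

def hypergeom_enrichment_pvalue(N, K, n, k):
    """Upper-tail hypergeometric P-value: probability of observing at least
    k genes from a GO term of size K inside a DE list of size n, when the
    n DE genes are drawn without replacement from N genes in total."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total

# Hypothetical example: 12 of 40 DE genes fall in a 300-gene GO term,
# against a background of 6000 annotated genes (expected count is only 2)
p = hypergeom_enrichment_pvalue(N=6000, K=300, n=40, k=12)
```

The size constraint the abstract criticizes is visible here: the smallest attainable P-value depends on K and n, so small (specific) GO terms can never reach the extreme significance levels available to large ones.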

Bayesian Spatial Clustering Method and Its Application in Radiology
*Song Zhang1 and Yin Xi2
1The University of Texas Southwestern Medical Center
2Southern Methodist University
[email protected]

Kidney cancer is among the ten most common cancers in humans. Dynamic contrast-enhanced MRI (DCE-MRI) takes advantage of the interaction between a contrast agent and adjacent water protons, which generates brighter signals in the scan image. In this study, we propose a novel Bayesian spatial clustering method based on a mixture of multivariate normal distributions. A latent conditional autoregressive (CAR) process is employed to account for the spatial correlation among clustering indexes. The proposed method is demonstrated to provide smoother and more accurate clustering of pixels. A simulation study and a real application example are presented.

Adjusting for Heterogeneity in Infectivity in HIV Prevention Clinical Trials
*Jingyang Zhang1 and Elizabeth Brown1,2

1Fred Hutchinson Cancer Research Center
2University of Washington
[email protected]

The focus of this work is to estimate the effectiveness of a candidate intervention at an individual exposure to HIV. Unlike the overall effectiveness for the population estimated by the Cox proportional hazards model, the individual-level effectiveness accounts for the variations in the exposure process among study participants. We propose a Bayesian hierarchical model to estimate the individual-level effectiveness by adjusting for the heterogeneity in risk with multi-state processes. This model allows a time-varying magnitude of the exposure process for each participant. A simulation is conducted to assess the model performance, and a data set from an HIV prevention trial is analyzed for illustration.

Canonical Variate Regression
*Chongliang Luo1, Jin Liu2, Dipak Dey1 and Kun Chen1

1University of Connecticut
[email protected]

In many fields, multi-view datasets, measuring multiple distinct but interrelated sets of characteristics on the same set of subjects, together with data on certain outcomes or phenotypes, are routinely collected. For example, in a cancer study, microscopic organ tissue measurements, genetic profiles, and organ function test results may all be available. The objective in such a problem is often twofold: to explore the association structures of the multiple sets of measurements, and to develop a parsimonious model for predicting future outcomes. We study a unified canonical variate regression framework to tackle the two problems simultaneously, allowing them to flexibly borrow strength from and hence reinforce each other. The proposed criterion integrates multiple canonical correlation analysis with predictive modeling, balancing the association strength of the canonical variates against their joint predictive power on the outcomes. Moreover, the proposed criterion seeks multiple sets of canonical variates simultaneously, to enable the examination of their joint effects on the outcomes, and is able to handle multivariate and non-Gaussian outcomes through a general loss function formulation. The approach thus successfully bridges the gap between an unsupervised canonical correlation analysis and a generalized reduced-rank regression method at the two extremes. An efficient algorithm based on the ideas of variable splitting and Lagrangian multipliers is developed. Simulation studies show superior performance of the proposed approach compared to existing alternative methods. We showcase the effectiveness of the proposed approach in an alcohol dependence study.

Session 28: Go/No Go Decision Criteria and Probability of Success in Pharmaceutical Drug Development

Sample Size Allocation in a Dose-Ranging Trial Combined with PoC
Qiqi Deng and *Naitee Ting
Boehringer-Ingelheim Pharmaceuticals Inc.
[email protected]

In recent years, the pharmaceutical industry has experienced many challenges in discovering and developing new drugs, including long clinical development timelines with significant investment risks. In response, many sponsors are working to speed up the clinical development process. One strategy is to combine the Proof of Concept (PoC) and dose-ranging clinical studies into a single trial in early Phase II development. One important question in designing this trial is how to calculate the sample size for such a study. In most early Phase II development programs, budget concerns and ethical concerns may limit the total sample size for the trial. This manuscript discusses various ways of allocating the sample size to each treatment group under a given total sample size.

Selecting Development Strategy with Biomarkers
*Feng Gao, Yi Liu and Mingxiu Hu
[email protected]

Using biomarkers to define a patient subpopulation with enhanced treatment effect can play an important role in drug development. When a biomarker is fully established in a disease setting, the decision is to choose between the whole population and some biomarker-defined subpopulation, or to incorporate both in the trial. This presentation will discuss how statisticians may influence the decision-making process, and the decision itself, in development strategy. We will describe a simulation tool for assisting the development team and company management in making better decisions.

Backward Bayesian Go/No-Go in the Early Phases
Yin Yin
Parexel International
[email protected]

This work assists sponsors in deciding early-phase Go/No-Go criteria based on the ultimate efficacy or safety target, which is usually clearer for Phase 3. Based on the definition of success for Phase 3, prior information, and the cost of later phases, this work graphically presents the quantitative relationships among the following factors: true effect, early study result, study design (e.g., sample size, duration, or dose), target probability of success (PoS), expected financial loss, and the expected probability of terminating a potentially successful asset. An example demonstrates how to accomplish these objectives for an exponential model describing the trajectory of weight loss. An Excel workbook calculates PoS, the probability of a wrong Go, the probability of a wrong No-Go, and the expected quantitative consequences, along with a variety of relationships and trends, whenever an exponential model is appropriate, e.g., for a dose-response study. The sponsor can optimize the Go/No-Go criteria based on an upper limit on the expected loss. The approach can also be generalized to other nonlinear models.

Evaluation of Program Success for Programs with Multiple Trials in Binary Outcomes
*Meihua Wang, Guanghan Liu and Jerald Schindler
Merck & Co.
[email protected]

A late-stage clinical development program typically contains multiple trials. Conventionally, the program's success or failure may not be known until the completion of all trials. Nowadays, interim analyses are often used to allow evaluation of early success and/or futility for each individual study by calculating conditional power, predictive power, and other indexes. This presents a good opportunity to estimate the probability of program success (POPS) for the entire clinical development program earlier. The sponsor may abandon the program early if the estimated POPS is very low, permitting resource savings and reallocation to other products. We provide a method to calculate the probability of success (POS) at the individual study level, and also the POPS for clinical programs with multiple trials in binary outcomes. Methods for calculating variation and confidence measures of POS and POPS, and the timing of interim analyses, will be discussed and evaluated through simulations. We also illustrate our approaches retrospectively on historical data from a completed clinical program for depression. The features and limitations of the proposed approaches will be discussed, as well as the operational challenges in practical implementation.
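The authors' POPS calculation depends on details not given in the abstract; a generic Monte Carlo sketch of study-level probability of success (assurance) for a binary endpoint, under a hypothetical Beta prior, conveys the basic idea.

```python
import random

def prob_of_success(n, r, a, b, sims=20_000, seed=1):
    """Monte Carlo assurance for a single-arm binary-endpoint trial:
    average, over a Beta(a, b) prior on the response rate p, of the
    probability that at least r of n patients respond."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        p = rng.betavariate(a, b)
        x = sum(rng.random() < p for _ in range(n))  # Binomial(n, p) draw
        hits += x >= r
    return hits / sims

# Illustrative trial: 40 patients, success if 12+ respond, prior mean
# response rate 1/3. For independent trials, program-level success
# multiplies the study-level probabilities.
pos = prob_of_success(n=40, r=12, a=4, b=8)
pops = pos ** 2  # two identical, independent confirmatory trials
```

This is only the simplest version; the paper's POPS additionally propagates interim data and reports uncertainty around the estimate.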

Session 29: Machine Learning for Big Data Problems

A Scalable Integrative Model for Heterogeneous Genomic Data Types under Multiple Conditions
Mai Shi and *Yingying Wei
The Chinese University of Hong Kong
[email protected]

A key problem in biology is how the same copy of a genome within a person can give rise to hundreds of cell types. Plentiful convincing evidence indicates that multiple elements, such as transcription factor binding, histone modification, and DNA methylation, all contribute to the regulation of gene expression levels in different cell types. Therefore, it is crucial to understand how these heterogeneous regulatory elements collaborate, how the cooperation at a given genomic region changes across diverse cell lines, and how such dynamic cooperation patterns across cell lines vary along the whole genome. Here, we propose a scalable hierarchical probabilistic generative model to cluster genomic regions according to the dynamic changes of their open chromatin and DNA methylation status across cell types. The model overcomes the exponential growth of the parameter space as the number of cell types integrated increases. The fitted results of the model provide a genome-wide, region-specific, cell-line-specific open chromatin and DNA methylation landscape map.

Greedy Tree Learning of Optimal Personalized Treatment Rules
*Ruoqing Zhu1, Yingqi Zhao2, Guanhua Chen3, Shuangge Ma1 and Hongyu Zhao1

1Yale University
2University of Wisconsin-Madison
3Vanderbilt University
[email protected]

We propose a subgroup identification approach for detecting optimal personalized treatment rules. The problem is transformed into a weighted classification problem, and the subgroups are identified through a subject-weighted classification tree. The greedy tree construction process adopts a newly proposed method, reinforcement learning trees, to pursue signals in high-dimensional settings. The method is also extended to right-censored survival data by utilizing the accelerated failure time model and introducing double weighting to the classification trees. The performance of the proposed method is demonstrated via simulation studies and analyses of the Cancer Cell Line Encyclopedia (CCLE) data.

ROC Analysis for Multiple Markers with Tree-Based Classification
Mei-Cheng Wang1 and *Shanshan Li2
1Johns Hopkins University
2Indiana University

[email protected]

Multiple biomarkers are frequently observed or collected for detecting or understanding a disease. In this talk, we extend tools of ROC analysis from the univariate-marker setting to the multivariate-marker setting for evaluating the predictive accuracy of biomarkers using a tree-based classification rule. Using an arbitrarily combined and-or classifier, an ROC function, together with a weighted ROC (WROC) function and their conjugate counterparts, is introduced for examining the performance of multivariate markers. Specific features of the ROC and WROC functions and other related statistics are discussed in comparison with the familiar properties for a univariate marker. Nonparametric methods are developed for estimating the ROC and WROC functions, the area under the curve (AUC), and the concordance probability. With emphasis on the population-average performance of markers, the proposed procedures and inferential results are useful for evaluating marker predictability based on multivariate marker measurements with different choices of markers, and for evaluating different and-or combinations in classifiers.

Tissue Classification Through Imaging Texture Analysis
*Peng Huang, Siva Raman, Linda Chu, Jamie Schroeder, Malcolm Brock, Franco Verde and Elliot Fishman
Johns Hopkins University
[email protected]

Cancer development and progression are associated with intratumoral heterogeneity. Tumor diagnosis and staging are typically based on a lesion's anatomic appearance and the extent of tumor spread. A limitation of current tumor diagnosis methods for all imaging modalities is that image interpretation is a visual process, yet many image features cannot be seen by the naked eye. We used image texture analysis to evaluate image intensity and the position of the pixels within an image to derive texture features that quantify intratumoral heterogeneity. Random forests were used to analyze these imaging features along with clinical and demographic characteristics. We illustrate our methodology using data from lung, kidney, liver, and pancreatic cancer studies.

Session 30: Tensor-Structured Statistical Modelling and Inferences

Dimension Reduction for Tensor-Structured Data
I-Ping Tu
Institute of Statistical Science, Academia Sinica
[email protected]

Dimension reduction is one key step in statistical analysis of high-dimensional data. When each observation is a matrix or a higher-order tensor, the traditional approach is to vectorize the data before executing reduction algorithms. This approach often leads to an extremely high-dimensional problem that comes along with intensive computation and inefficient estimation. Higher-order SVD and multilinear principal component analysis (MPCA) have thus been proposed for tensor-structured data. They reduce each mode space of the tensor separately and thus reduce the computation significantly. One criticism of the new approach is that, unlike PCA, the projected data in the reduced space are still correlated. To this end, we propose a two-stage dimension reduction method, called structure PCA (SPCA). SPCA employs MPCA on the tensor data first, and then applies PCA on the vectorized projected core scores from MPCA. A successful application of SPCA to cryo-electron microscopy image data will also be presented.


Detection of Gene-Gene Interactions Using Tensor Regression
Hung Hung1, Yu-Ting Lin2, Penweng Chen3, Chen-Chien Wang4, Su-Yun Huang2 and *Jung-Ying Tzeng5

1National Taiwan University
2Institute of Statistical Science, Academia Sinica
3National Chung Hsing University
4Yahoo Inc.
5North Carolina State University
[email protected]

Finding an efficient and computationally feasible approach to deal with the curse of high dimensionality is a daunting challenge faced by modern biological science. The problem becomes even more severe when the interactions are the research focus. To improve the performance of statistical analyses, we propose a sparse and low-rank (SLR) screening based on tensor interaction regression and the lasso. SLR models the interaction effects using a low-rank matrix to achieve parsimonious parametrization. The low-rank model increases the efficiency of statistical inference and, hence, SLR screening is able to detect gene-gene interactions more accurately than conventional methods. We illustrate the utility of the proposed procedure using a real data application and simulations. The results suggest that the proposed procedure can identify main and interaction effects that would have been omitted by conventional screening methods.

Rank Selection for Multilinear PCA
Dai-Ni Hsieh, *Su-Yun Huang and I-Ping Tu
Institute of Statistical Science, Academia Sinica
[email protected]

We study the intrinsic model complexity of multilinear principal component analysis for model rank selection. This model complexity, called the effective degrees of freedom, is defined through the first derivative of the fitted tensor as a function of the tensor observations. An unbiased estimate of the degrees of freedom is derived using Stein's identity. The degrees of freedom depend on the dispersion of the eigenvalues of each tensor mode. We illustrate how the degrees of freedom can be used for multilinear PCA rank selection.

Discussion: Tensor-Structured Statistical Modelling and Inferences
Mong-Na Lo Huang
National Sun Yat-sen University
[email protected]

I will discuss issues related to tensor-structured statistical models in reducing the dimension of regression models for making inferences with high-dimensional data, especially those related to imaging and genomic studies.

Session 31: Adaptive Designs for Early-Phase Oncology Clinical Trials

Subgroup-Based Adaptive (SUBA) Designs for Multi-Arm Biomarker Trials
*Yanxun Xu1, Lorenzo Trippa2, Peter Mueller3 and Yuan Ji4,5
1Johns Hopkins University
2Harvard University
3University of Texas at Austin
4NorthShore University HealthSystem
5University of Chicago
[email protected]

Targeted therapies based on biomarker profiling are becoming a mainstream direction of cancer research and treatment. Depending on the expression of specific prognostic biomarkers, targeted therapies assign different cancer drugs to subgroups of patients even if they are diagnosed with the same type of cancer by traditional means, such as tumor location. For example, Herceptin is only indicated for the subgroup of patients with HER2+ breast cancer, but not other types of breast cancer. However, subgroups like HER2+ breast cancer with effective targeted therapies are rare, and most cancer drugs are still being applied to large patient populations that include many patients who might not respond or benefit. Also, the response to targeted agents in humans is usually unpredictable. To address these issues, we propose SUBA, subgroup-based adaptive designs that simultaneously search for prognostic subgroups and allocate patients adaptively to the best subgroup-specific treatments throughout the course of the trial. The main features of SUBA include the continuous reclassification of patient subgroups based on a random partition model and the adaptive allocation of patients to the best treatment arm based on posterior predictive probabilities. We compare the SUBA design with three alternative designs: equal randomization, outcome-adaptive randomization, and a design based on a probit regression. In simulation studies we find that SUBA compares favorably against the alternatives.

A Curve-free Bayesian Decision-theoretic Design for Two-agent Phase I Trials
Bee Leng Lee1, *Shenghua Fan2 and Ying Lu3

1San Jose State University
2California State University, East Bay
3Stanford University
[email protected]

Although Bayesian statistical methods are gaining attention in the medical community, as they provide a natural framework for incorporating prior information, the complexity of these methods has limited their adoption in clinical trials. This article proposes a Bayesian design for two-agent phase I trials that is relatively easy for clinicians to understand and implement, yet performs comparably to more complex designs, so that it is more likely to be adopted in actual trials. To reduce model complexity and computational burden, we choose a working model with conjugate priors so that the posterior distributions have analytical expressions. Furthermore, we provide a simple strategy to facilitate the specification of priors based on the toxicity information accrued from single-agent phase I trials. The proposed method should be useful in terms of ease of implementation and savings in sample size without sacrificing performance. Moreover, the conservativeness of the dose-finding algorithm renders it a relatively safe method.

Bayesian Dose-finding Designs for Combinations of Molecularly Targeted Agents Assuming Partial Stochastic Ordering
Beibei Guo1 and *Yisheng Li2
1Louisiana State University
2M. D. Anderson Cancer Center
[email protected]

Molecularly targeted agent (MTA) combination therapy is in the early stages of development. When using a fixed dose of one agent in combinations of MTAs, toxicity and efficacy do not necessarily increase with an increasing dose of the other agent. Thus, in dose-finding trials for combinations of MTAs, interest may lie in identifying the optimal biological dose combinations (OBDCs), defined as the lowest dose combinations (in a certain sense) that are safe and have the highest efficacy level meeting a prespecified target. The limited existing designs for these trials use parametric dose-efficacy and dose-toxicity models. Motivated by a phase I/II clinical trial of a combination of two MTAs in patients with pancreatic, endometrial, or colorectal cancer, we propose Bayesian dose-finding designs to identify the OBDCs without parametric model assumptions. The proposed approach is based only on partial stochastic ordering assumptions for the effects of the combined MTAs and uses isotonic regression to estimate partially stochastically ordered marginal posterior distributions of the efficacy and toxicity probabilities. We demonstrate that our proposed method appropriately accounts for the partial ordering constraints, including potential plateaus on the dose-response surfaces, and is computationally efficient. We develop a dose-combination-finding algorithm to identify the OBDCs. We use simulations to compare the proposed designs with an alternative design based on Bayesian isotonic regression transformation and a design based on parametric change-point dose-toxicity and dose-efficacy models, and demonstrate desirable operating characteristics of the proposed designs.
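The isotonic regression ingredient can be illustrated in the simple totally ordered case with the classical pool-adjacent-violators algorithm (the partial-ordering version the design requires is more involved); the toy toxicity estimates below are hypothetical.

```python
def pava(y, w=None):
    """Pool-adjacent-violators algorithm: the least-squares
    non-decreasing fit to y, with optional positive weights w."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, run length].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    fit = []
    for m, _, n in blocks:
        fit.extend([m] * n)
    return fit

# Raw (non-monotone) toxicity estimates across five ordered doses
# are smoothed into a monotone dose-toxicity curve.
fit = pava([0.30, 0.10, 0.40, 0.35, 0.60])
```

The fit pools the violating neighbors into weighted averages, so the output is non-decreasing while preserving the total of the input values.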

Session 32: Recent Development on Next-Generation Sequencing Based Data Analysis

Bayesian Nonparametric Models for Tumor Heterogeneity Using Next-Generation Sequencing Data
*Yuan Ji1,2, Yanxun Xu3, Juhee Lee4, Subhajit Sengupta1, Peter Mueller3 and Yitan Zhu1

1NorthShore University HealthSystem
2University of Chicago
3The University of Texas at Austin
4University of California, Santa Cruz
[email protected]

I will discuss Bayesian nonparametric models for the inference of tumor heterogeneity using next-generation sequencing data. The task is to reconstruct cell subpopulations possessing unique genetic variants in a tumor sample or multiple samples. Statistical inference is based on feature allocation models such as the Indian buffet process (IBP). I will discuss extensions of the classical IBP and their applications to tumor heterogeneity. Various examples will be presented, aiming to reveal intra-tumor heterogeneity using subclonal single nucleotide variants and copy number variants.

Leveraging in Big Data Analytics
Ping Ma
University of Georgia
[email protected]

Rapid advances in science and technology in the past decade have brought an extraordinary amount of data, offering researchers an unprecedented opportunity to tackle complex research challenges. The opportunity, however, has not yet been fully utilized, because effective and efficient statistical tools for analyzing super-large datasets are still lacking. One major challenge is that the advance of computing resources still lags far behind the exponential growth of data. In this talk, I will introduce a family of statistical leveraging methods to facilitate scientific discoveries using current computing resources. Leveraging methods are designed under a subsampling framework, in which one samples a small proportion of the data (a subsample) from the full sample, and then performs the intended computations for the full sample using the small subsample as a surrogate. The key to the success of leveraging methods is to construct nonuniform sampling probabilities so that influential data points are sampled with high probability. These methods stand as a unique development of their type in big data analytics and allow pervasive access to massive amounts of information without resorting to high-performance computing and cloud computing.
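The leveraging recipe can be sketched for simple linear regression: sample with probability proportional to leverage, then fit weighted least squares on the subsample. The data, subsample size, and seed below are invented for illustration.

```python
import random

def leverage_probs(x):
    """Leverage scores for simple linear regression, normalized to
    sampling probabilities: influential x-values get picked more often."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    h = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    s = sum(h)
    return [hi / s for hi in h]

def weighted_fit(x, y, w):
    """Closed-form weighted least squares for y = b0 + b1 * x."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    b1 = (sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
          / sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x)))
    return yb - b1 * xb, b1

rng = random.Random(0)
x = [i / 100 for i in range(1000)]
y = [2 + 3 * xi + rng.gauss(0, 0.5) for xi in x]  # true line: 2 + 3x
pi = leverage_probs(x)
# Subsample 100 points proportional to leverage, then reweight each
# point by 1 / pi to keep the estimator approximately unbiased.
idx = rng.choices(range(1000), weights=pi, k=100)
b0, b1 = weighted_fit([x[i] for i in idx], [y[i] for i in idx],
                      [1 / pi[i] for i in idx])
```

A tenth of the data recovers the full-sample fit closely, which is the point of the surrogate computation described in the abstract.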

Investigating Microbial Co-occurrence Patterns Based on Metagenomic Compositional Data
*Yuguang Ban1, Lingling An2 and Hongmei Jiang1

1Northwestern University
2University of Arizona
[email protected]

Metagenomics has provided us a powerful tool to study the microbial organisms living in various environments. Characterizing the interactions among microbes can give us insights into how they work and live together as a community. Analyzing microbial relationships in metagenomic compositional data using conventional correlation methods has been shown to be prone to bias that leads to artifactual correlations. We propose a novel method, REBACCA, to identify co-occurrence patterns using log ratios of count data, solving the equivalent system using the l1-norm shrinkage method. Our simulation studies show that REBACCA 1) achieves higher accuracy in general; 2) is more robust to various network structures; and 3) is computationally efficient compared to other existing methods.
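REBACCA itself solves an l1-penalized system, but the log-ratio starting point it shares with other compositional approaches can be sketched with a centered log-ratio (CLR) transform; the toy three-taxon community below is made up for illustration.

```python
from math import log

def clr(counts, pseudo=0.5):
    """Centered log-ratio transform of one sample's counts; a small
    pseudo-count keeps zero counts finite."""
    logs = [log(c + pseudo) for c in counts]
    m = sum(logs) / len(logs)
    return [li - m for li in logs]

def pearson(u, v):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(u)
    ub, vb = sum(u) / n, sum(v) / n
    num = sum((a - ub) * (b - vb) for a, b in zip(u, v))
    den = (sum((a - ub) ** 2 for a in u)
           * sum((b - vb) ** 2 for b in v)) ** 0.5
    return num / den

# Toy community across five samples: taxon B tracks taxon A, taxon C
# stays roughly flat, so A and B should co-occur on the CLR scale.
samples = [[100, 50, 10], [200, 110, 8], [50, 20, 12],
           [300, 160, 9], [80, 35, 11]]
clr_samples = [clr(s) for s in samples]
taxonA = [s[0] for s in clr_samples]
taxonB = [s[1] for s in clr_samples]
r_ab = pearson(taxonA, taxonB)
```

Working on log ratios rather than raw proportions sidesteps the sum-to-one constraint that makes naive correlations on compositions artifactual.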

Rapid Alignment and Filtration for Accurate Pathogen Identification in Clinical Samples Using Unassembled Sequencing Data
*W. Evan Johnson1, Solaiappan Manimaran1, Changjin Hong1, Keith Crandall2 and Eduardo Castro-Nallar2
1Boston University
2The George Washington University
[email protected]

The use of sequencing technologies to investigate the microbiome of a sample can positively impact patient healthcare by providing therapeutic targets for personalized disease treatment. However, these samples contain genomic sequences from various sources that complicate the identification of pathogens. Here we present a pipeline to rapidly and accurately remove host contamination, isolate microbial reads, and identify potential disease-causing pathogens. We developed an optimized framework for pathogen identification using a computational subtraction methodology in concordance with read trimming and ambiguous read reassignment. We have also demonstrated the ability of our approach to identify multiple pathogens in a single clinical sample, accurately identify pathogens at the sub-species level, and determine the nearest phylogenetic neighbor of novel or highly mutated pathogens using real clinical sequencing data. Finally, we have shown that our approach outperforms previously published pathogen identification methods with regard to computational speed, sensitivity, and specificity.

Session 33: Challenges of Quantile Regression in High-Dimensional Data Analysis: Theory and Applications

Regularized Quantile Regression for Quantitative Genetic Traits
*Chad He1, Linglong Kong2, Yanhua Wang1, Sijian Wang3, Timothy Chan4 and Eric Holland1

1Fred Hutchinson Cancer Research Center
2University of Alberta
3University of Wisconsin-Madison
4Memorial Sloan-Kettering Cancer Center
[email protected]

Genetic association studies often involve quantitative traits, such as body mass index, blood pressure, and lipid levels. The quantile regression method considers the conditional quantiles of the response variable, and is able to describe the underlying structure in a more comprehensive manner. Here, we introduce a regularized quantile regression method that is able to characterize the underlying genetic structure in high dimensions and, at the same time, account for potential genetic heterogeneity. We investigate the theoretical properties of our method, and examine its performance through a series of simulation studies. A real dataset is analyzed to demonstrate the usefulness of the proposed method.
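The objective underlying the quantile regression methods in this session is the check loss; a brute-force illustration at a single quantile level (candidate search restricted to observed values, purely for exposition, since real solvers use linear programming) shows how minimizing it recovers a conditional quantile.

```python
def check_loss(u, tau):
    """Koenker-Bassett check function: rho_tau(u) = u * (tau - 1{u < 0}),
    which penalizes under- and over-prediction asymmetrically."""
    return u * (tau - (1 if u < 0 else 0))

def best_location(y, tau):
    """Brute-force the location minimizing total check loss over the
    observed values; this recovers a sample tau-th quantile."""
    return min(y, key=lambda q: sum(check_loss(yi - q, tau) for yi in y))

y = [3, 1, 4, 1, 5, 9, 2, 6]
q75 = best_location(y, 0.75)  # a 75th-percentile candidate of y
```

Replacing the constant location with a linear predictor, and adding an l1 penalty on its coefficients, gives the penalized quantile regression criterion these abstracts build on.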

Globally Adaptive Quantile Regression with High Dimensional Data
*Qi Zheng1, Limin Peng1 and Xuming He2
1Emory University
2University of Michigan
[email protected]

Quantile regression has become a valuable tool to analyze heterogeneous covariate-response associations that are often encountered in practice. The development of quantile regression methodology for high-dimensional covariates primarily focuses on examination of model sparsity at a single or multiple quantile levels, which are typically prespecified ad hoc by the users. The resulting models may be sensitive to the specific choices of the quantile levels, leading to conceptual difficulties in identifying relevant variables of interest. We propose a new penalization framework for quantile regression in the high-dimensional setting. We employ adaptive L1 penalties and, more importantly, propose a uniform selector of the tuning parameter for a set of quantile levels, to avoid potential problems with model selection at individual quantile levels. Our proposed approach achieves consistent shrinkage of regression quantile estimates across a continuous range of quantile levels, enhancing the flexibility and robustness of existing penalized quantile regression methods. Our theoretical results include the oracle rate of uniform convergence and weak convergence of the parameter estimators. We also use numerical studies to confirm our theoretical findings and illustrate the practical utility of our proposal.

Focused Information Criterion and Model Averaging Based on Weighted Composite Quantile Regression
*Ganggang Xu1, Suojin Wang2 and Jianhua Huang2

1Binghamton University
2Texas A&M University
[email protected]

We study the focused information criterion and frequentist model averaging and their application to post-model-selection inference for weighted composite quantile regression (WCQR) in the context of additive partial linear models. With the non-parametric functions approximated by polynomial splines, we show that, under certain conditions, the asymptotic distribution of the frequentist model averaging WCQR estimator of a focused parameter is a non-linear mixture of normal distributions. This asymptotic distribution is used to construct confidence intervals that achieve the nominal coverage probability. With properly chosen weights, the focused information criterion based WCQR estimators are not only robust to outliers and non-normal residuals but also achieve efficiency close to that of the maximum likelihood estimator, without assuming the true error distribution. Simulation studies and a real data analysis are used to illustrate the effectiveness of the proposed procedure.

Bayesian Quantile Regression via Dirichlet Process Mixture of Logistic Distributions
*Chao Chang and Nan Lin
Washington University in St. Louis

[email protected]

We propose a new nonparametric Bayesian approach to quantile regression at a single quantile. One innovation of this paper is that the error distribution is modelled by a Dirichlet process mixture of relatively unexplored densities, logistic distributions, which have the desired features of being smooth and having a closed-form quantile function. Also, unlike other methods based on Dirichlet process mixtures, which require the kernel densities to satisfy the quantile constraint, our kernel function is simply the logistic density. The quantile constraint is satisfied by post-processing the Dirichlet process mixture with a suitable location shift. Although we have a simpler kernel, our mixture model can still provide great flexibility by mixing over both the location parameter and the scale parameter. The posterior consistency of our proposed model is studied carefully, and a Markov chain Monte Carlo algorithm is provided for posterior inference. The performance of our approach is evaluated using simulated data and real data.
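The closed-form logistic quantile function that motivates the choice of kernel, and the location-shift post-processing step, can be written down directly; the parameter values below are illustrative only.

```python
from math import exp, log

def logistic_quantile(p, mu=0.0, s=1.0):
    """Closed-form quantile function of the logistic distribution:
    F^{-1}(p) = mu + s * log(p / (1 - p))."""
    return mu + s * log(p / (1 - p))

def logistic_cdf(x, mu=0.0, s=1.0):
    return 1.0 / (1.0 + exp(-(x - mu) / s))

# The post-processing idea: shifting a logistic component by its own
# tau-th quantile pins the shifted component's tau-th quantile at 0,
# which is the constraint a quantile-regression error term must obey.
tau = 0.9
q = logistic_quantile(tau, mu=1.0, s=2.0)
shifted_quantile = logistic_quantile(tau, mu=1.0 - q, s=2.0)
```

Having the quantile function in closed form is what makes this shift cheap to apply after fitting, in contrast to kernels whose quantiles must be found numerically.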

Session 34: Recent Advances in Genomics

Accounting for Gene Length in RNA-Seq Data
*Patrick Harrington and Lynn Kuo
University of Connecticut
[email protected]

Next-generation sequencing is being used to advance genetics and biological research at a rapid pace. One of the goals is to identify differentially expressed transcripts between two or more conditions based on RNA-seq data. In addition to the complexity of the data, it has been shown empirically that differentially expressed (DE) transcripts of longer length are more likely to be identified than their shorter counterparts. While methods have been suggested to correct for this bias, we explore hierarchical Bayesian models with a negative binomial distributional assumption to address this length bias and identify differentially expressed transcripts. We discuss how gene length is calculated and considered, and introduce a condition-specific gene length. We also use gene length to create a zero-inflated model to account for the abundance of zero counts in RNA-seq data. We use real data as well as a simulation study to show the benefit of our approach, and address questions of over- and under-fitting and the effect on classification of DE genes.
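A zero-inflated negative binomial density of the kind this abstract uses can be sketched as follows; the parameterization and values are illustrative, not the authors' fitted model.

```python
from math import exp, lgamma, log

def nb_pmf(k, r, p):
    """Negative binomial pmf, P(K = k) = C(k + r - 1, k) * p**r * (1 - p)**k,
    written with log-gamma so a non-integer size r is allowed."""
    return exp(lgamma(k + r) - lgamma(r) - lgamma(k + 1)
               + r * log(p) + k * log(1.0 - p))

def zinb_pmf(k, pi0, r, p):
    """Zero-inflated negative binomial: an extra point mass pi0 at zero
    absorbs the excess zero counts common in RNA-seq data."""
    return pi0 * (k == 0) + (1.0 - pi0) * nb_pmf(k, r, p)

# With 30% zero inflation, zero counts are far more probable than the
# plain negative binomial alone would predict.
total = sum(zinb_pmf(k, 0.3, 2.0, 0.5) for k in range(200))
```

The overdispersion parameter r handles biological variability between replicates, while pi0 separately models transcripts that are simply not observed.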

Integrating Diverse Genomics Data to Infer Regulations
*Yuping Zhang1 and Hongyu Zhao2

1University of Connecticut
2Yale University
[email protected]

Recent advances in high-throughput biotechnologies have generated unprecedented types and amounts of data for biomedical research. It is likely that integrating results from diverse experiments may lead to a more unified and global view of complex diseases such as cancer. In this talk, we will address statistical issues in data integration and present a new statistical learning method for integrating diverse genomics data. Our method provides an integrated picture of commonalities and differences across tumor types. The performance of our method will be demonstrated through simulations and applications to real cancer data.

Phylogenetic Trait Evolution with Drift
*Mandev Gill and Marc Suchard
University of California, Los Angeles
[email protected]

2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17 | 67


Abstracts

An important issue in statistical phylogenetics is understanding the processes giving rise to quantitative characters associated with molecular sequence data. Examples of such characters include geographic coordinates and phenotypic traits. A popular approach is to model trait evolution as a Brownian diffusion process acting on a phylogenetic tree. We extend this approach in a flexible Bayesian model that incorporates a nontrivial drift in the Brownian diffusion. We apply our model to viral data sets and demonstrate an improved ability to understand the dynamics of phenotypic drift and the spatiotemporal spread of epidemics.

Session 35: Novel Designs and Applications of Adaptive Randomization in Medical Research

Statistical Inference for Covariate Adaptive Randomized Clinical Trials with Survival Endpoints
Lu Wang1, Jing Ning2 and *Hongjian Zhu1

1The University of Texas School of Public Health
2M. D. Anderson Cancer Center
[email protected]

Covariate adaptive randomization (CAR) procedures such as stratified permuted block randomization have become standard in clinical trials. It is well known that the type I error rates for clinical trials using CAR are problematic if not all the design covariates are included in the data analysis. However, there has been little research on the theoretical investigation of such trials with time-to-event outcomes. In this talk, we study clinical trials that use CAR for randomization and the accelerated failure time model for analysis. We derive the asymptotic distribution of the test statistics and explicitly show the reason for the conservativeness of the type I error rate. We also propose two approaches to protect the type I error rate and improve efficiency. Numerical studies support our theoretical findings and demonstrate the advantages of the proposed methods.

Biomarker-Stratified Adaptive Basket Designs for Multiple Cancers
Lorenzo Trippa
Dana-Farber Cancer Institute
[email protected]

Advances in cancer research have shown that tumors have heterogeneous genetic events, many of which are targetable by anticancer agents. Tests of treatment-biomarker interactions tend to be underpowered when done as secondary objectives in trials of drug efficacy. Clinical trials that make use of adaptive randomization and Bayesian prediction have been proposed as more efficient for investigating multiple agents and predictive biomarkers. We proposed that clinical studies enrolling patients with multiple cancer modalities (CSMCM) would greatly contribute to statistical learning and accelerate the pace at which new drugs are studied. In silico simulations to evaluate the benefit of CSMCMs in a research portfolio require accurate parameters of (1) accession of patients by disease modality, and (2) joint prevalence of target gene mutations. At the Dana-Farber Cancer Institute (DFCI), a research study was initiated in 2011 to parallelize molecular profiling with routine histopathology at diagnosis or disease progression, and to date has assayed >5000 patients across 11 disease centers. I present the design and characteristics of in silico studies parameterized from this cohort.

Outcome Adaptive Randomization for Comparative Effectiveness Clinical Trials
Mei-Chiung Shih
VA Cooperative Studies Program

[email protected]

The goal of comparative effectiveness research (CER) is to support evidence-based choices of treatments. Currently, comparative effectiveness trials are a small fraction of the totality of CER studies. Point-of-care (POC) comparative effectiveness trials, which aim to embed clinical research in routine care, have been proposed to address some of the challenges of randomized CER trials, including cost, complexity, and large sample sizes due to small to moderate effect sizes. In this talk, we describe the use of outcome adaptive randomization to bring the benefits of knowledge generated from the POC trial to improve health care without having to mount a separate implementation strategy. This is particularly useful when the goal of the POC trial is to select treatments whose results are close to the best (yet unknown) available treatment.

Worth Adapting? When and How to Apply Adaptive Randomization to Make More Bang for the Buck
*J. Jack Lee and Yining Du
M. D. Anderson Cancer Center
[email protected]

Outcome adaptive randomization (AR) allocates more patients to the better treatments as information accumulates in the trial. Is it worth it to apply AR in clinical trials? There are still controversies in the medical and statistical communities. Compared to equal randomization (ER), AR produces a higher overall response rate at the cost of a larger sample size and higher variability in the trial operating characteristics. However, improvements can be made to address these weaknesses: for example, adding a burn-in period of ER, applying a power transformation to the randomization probability, or bounding the randomization probability by a clipping method. The tradeoff between the number of patients in the trial and beyond the trial, in light of the total patient horizon, is also examined. By carefully choosing the method and the tuning parameters, AR methods can be tailored to strike a balance among achieving the desired statistical power, limiting the increase in sample size and its variability, and enhancing the overall response rate.
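A minimal sketch of the three tuning devices mentioned above (a burn-in of equal randomization, a power transformation of the randomization probability, and clipping), with illustrative parameter values rather than the speakers' recommended settings:

```python
def ar_probability(p1_hat, p2_hat, n, n_max, burn_in=20, clip=(0.1, 0.9)):
    """Allocation probability for arm 1 under outcome adaptive
    randomization. Combines a burn-in period of equal randomization,
    a Thall-Wathen-style power transformation with exponent
    c = n / (2 * n_max), and clipping of the probability. Names and
    default values are illustrative assumptions."""
    if n < burn_in:                       # burn-in period: use ER
        return 0.5
    c = n / (2.0 * n_max)                 # power-transformation exponent
    a, b = p1_hat ** c, p2_hat ** c
    prob = a / (a + b)                    # tilt toward the better arm
    return min(max(prob, clip[0]), clip[1])
```

Early in the trial the exponent c is small, so the allocation stays close to 0.5 even when the estimated response rates differ; the clip bounds keep either arm from being starved of patients.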

Session 36: Lifetime Data Analysis

Statistical Inference on Quantile Residual Life
Jong Jeong
University of Pittsburgh
[email protected]

The residual lifetime has a straightforward interpretation for the analysis of time-to-event data. The quantile residual life function is desirable for summarizing a skewed time-to-event distribution, which is often encountered in reliability and survival data. In this talk, recent developments in statistical inference on the quantile residual life function with and without competing risks will be reviewed. Specific numerical examples will be presented to demonstrate how the methods work, and the methods will also be illustrated with real examples based on clinical trial datasets.
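For a single quantile, the p-th quantile residual life at time t0 is the smallest s with S(t0 + s) ≤ (1 − p)S(t0). A plug-in evaluation from an estimated step survival curve might look like the following (an illustrative helper, not the inference procedure of the talk):

```python
import numpy as np

def quantile_residual_life(times, surv, t0, p=0.5):
    """p-th quantile residual life at t0 from an estimated (step)
    survival curve: the smallest s with S(t0 + s) <= (1 - p) * S(t0).
    Assumes `times` is sorted with matching survival values `surv`;
    an illustrative plug-in helper only."""
    times, surv = np.asarray(times, float), np.asarray(surv, float)
    s_t0 = surv[times <= t0][-1] if np.any(times <= t0) else 1.0
    target = (1.0 - p) * s_t0
    reached = times[surv <= target]
    if reached.size == 0:
        return np.inf            # quantile not reached within follow-up
    return reached[0] - t0
```

With p = 0.5 this returns the median residual life, the skew-robust summary the abstract refers to; when the curve never drops below the target (heavy censoring), the quantile is not estimable and the helper returns infinity.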

Onset Time of Chronic Pseudomonas Aeruginosa Infection in Cystic Fibrosis Patients with Interval Censored Data
Wenjie Wang1, Huichuan Lai2, *Jun Yan1 and Zhumin Zhang2

1University of Connecticut
2University of Wisconsin
[email protected]

Chronic Pseudomonas aeruginosa (PA) infection indicates lung function deterioration in children with cystic fibrosis (CF). Modeling the onset time of chronic PA infection is important for clinicians to devise better treatment plans and patient management. Due to the scheduled visits for patients in the Cystic Fibrosis Foundation Patient Registry and the definition of chronic PA infection, the onset time is only known up to a certain interval. The analysis is further challenged by the need to allow some risk factors to have time-varying effects on the onset time. This problem fits into the framework of the Bayesian dynamic Cox model for interval censored data recently developed by Wang, Chen, and Yan (2013, Lifetime Data Analysis 19:297–316). Application of the methodology to the onset time of chronic PA infection in children with CF revealed interesting findings. Compared with patients diagnosed via newborn screening, patients diagnosed via meconium ileus or other symptoms had moderately higher risks of acquiring PA infection within a certain range of young ages. Two cohorts five years apart were compared, and patients in the more recent cohort were found to have lower risks of chronic PA infection before age 3.

A Model for Time to Fracture with a Shock Stream Superimposed on Progressive Degradation: the Study of Osteoporotic Fractures
*Xin He1, G. A. Whitmore2, Geok Yan Loo1, Marc Hochberg3 and Mei-Ling Lee1
1University of Maryland
2McGill University
3University of Maryland Baltimore County
[email protected]

Osteoporotic hip fractures in the elderly are associated with a high mortality in the first year following fracture and a high incidence of disability among survivors. We study first and second fractures of elderly women using data from the Study of Osteoporotic Fractures. We present a new conceptual framework, stochastic model, and statistical methodology for time to fracture. Our approach gives additional insights into the patterns for first and second fractures and the concomitant risk factors. Our modeling perspective involves a novel time-to-event methodology called threshold regression, which is based on the plausible idea that many events occur when an underlying process describing the health or condition of a person or system encounters a critical boundary or threshold for the first time. In the parlance of stochastic processes, this time to event is a first hitting time of the threshold. The underlying process in our model is a composite of a chronic degradation process for skeletal health combined with a random stream of shocks from external traumas, which taken together trigger fracture events.
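The composite process can be illustrated with a small simulation: Brownian-type chronic degradation plus a Poisson stream of exponentially sized shocks, with the event recorded at the first threshold crossing. All parameter values below are assumptions for illustration, not estimates from the SOF data:

```python
import random

def first_hitting_time(threshold=10.0, drift=0.05, sigma=0.2,
                       shock_rate=0.01, shock_mean=2.0,
                       dt=1.0, horizon=10_000, rng=None):
    """Simulate one first hitting time for a composite process:
    Brownian-type chronic degradation plus a Poisson stream of
    exponentially sized external shocks. All parameter values are
    illustrative assumptions."""
    rng = rng or random.Random(42)
    x, t = 0.0, 0.0
    while t < horizon:
        t += dt
        # chronic degradation: drift plus Gaussian fluctuation
        x += drift * dt + sigma * rng.gauss(0.0, 1.0) * dt ** 0.5
        # external trauma arrives with probability shock_rate * dt
        if rng.random() < shock_rate * dt:
            x += rng.expovariate(1.0 / shock_mean)
        if x >= threshold:          # threshold crossed: fracture event
            return t
    return float("inf")             # no event within follow-up (censored)
```

Repeating the simulation many times traces out the first-hitting-time distribution; the shock stream shortens the hitting times relative to degradation alone, which is the mechanism the model is designed to capture.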

Explained Variation in Correlated Survival Data
Gordon Honerkamp-Smith and *Ronghui Xu
University of California, San Diego
[email protected]

Explained variation in survival data has attracted much attention in recent years. In this talk we consider explained variation in correlated survival data, specifically under proportional hazards mixed-effects modeling (PHMM) of such data, which provides a natural setting for the decomposition of different sources of variation. A motivation of the concept originated from genetic epidemiology. More generally, explained variation can be formulated in different ways, and we discuss the formulations that are often encountered in the literature and that are interpretable in practice. We study the proposed measures both in theory and through simulation, and discuss some common pitfalls in using and understanding such measures in practice. To conclude, we show some interesting applications to both multi-center clinical trials and recurrent events.

Session 37: Statistical Methods for Large Computer Experiments

Uncertainty Propagation using Dynamic Discrepancy for a Multi-scale Carbon Capture System
*K. Sham Bhat1, Curt Storlie1, David Mebane2 and Priyadarshi Mahapatra3

1Los Alamos National Laboratory
2West Virginia University
3URS Corporation
[email protected]

Uncertainties from model parameters and model discrepancy in small-scale models impact the accuracy and reliability of predictions for large-scale systems. Inadequate representation of these uncertainties may result in inaccurate and overconfident predictions during scale-up to larger systems. Hence, multiscale modeling efforts must accurately quantify the effect of the propagation of uncertainties during upscaling. Using a Bayesian approach, we calibrate a small-scale solid sorbent model to thermogravimetric analysis (TGA) data on a functional profile using chemistry-based priors. Crucial to this effort is the representation of model discrepancy, which uses a Bayesian Smoothing Splines (BSS-ANOVA) framework. Our uncertainty quantification (UQ) approach could be considered intrusive, as it includes the discrepancy function within the chemical rate expressions, resulting in a set of stochastic differential equations. Such an approach allows for easily propagating uncertainty by propagating the joint posterior of the model parameters and discrepancy into the larger-scale system of rate expressions. The broad UQ framework presented here could be applicable to virtually all areas of science where multiscale modeling is used.

Bayesian Calibration of Computer Models with Informative Failures
*Peter Marcy and Curtis Storlie
Los Alamos National Laboratory
[email protected]

Gaussian process emulators are widely used to calibrate deterministic computer codes (simulators) to experimental/field data. The goal of such an analysis is to determine which values of the physical parameters used in the computer model are most consistent with the experimental observations. However, there may be incomplete simulated data, as some simulators can fail to produce output for particular combinations of the inputs. Under the assumption that the failed model runs correspond to regions of the parameter space which are not physically feasible, the missing data can be incorporated into a Bayesian calibration routine. In this talk I detail the procedure and then illustrate it using a computational fluid dynamics model for carbon capture.

A Frequentist Approach to Computer Model Calibration
*Raymond K. W. Wong1, Curtis B. Storlie2 and Thomas C. M. Lee3
1Iowa State University
2Los Alamos National Laboratory
3University of California, Davis
[email protected]

We consider the computer model calibration problem and provide a general frequentist approach with uncertainty quantification. Under the proposed framework, the data model is semiparametric with a nonparametric discrepancy function that accounts for any discrepancy between the physical reality and the simulator. In an attempt to solve the fundamentally important (but often ignored) identifiability issue between the computer model parameters and the discrepancy function, we propose a new and identifiable parameterization of the calibration problem. We also develop a two-step procedure for estimating all the relevant quantities under the new parameterization. This estimation procedure is shown to enjoy excellent rates of convergence and can be straightforwardly implemented with existing software. For uncertainty quantification, bootstrapping is adopted to construct confidence regions for the quantities of interest. The practical performance of the proposed methodology is illustrated through simulation examples and an application to a computational fluid dynamics model.

Session 38: New Approaches for Analyzing Time Series Data

Spectral Analysis of Linear Time Series in Moderately High Dimensions
Lili Wang1, Alexander Aue2 and *Debashis Paul2
1Zhejiang University
2University of California, Davis
[email protected]

We study the spectral behavior of p-dimensional linear processes in the moderately high-dimensional setting when both the dimensionality p and the sample size n tend to infinity so that p/n → 0. Under appropriate regularity conditions, it is shown that the empirical spectral distributions of the renormalized and symmetrized sample autocovariance matrices converge almost surely to a nonrandom limit distribution. The key structural assumption is that the linear process is driven by a sequence of p-dimensional real or complex random vectors with i.i.d. entries possessing zero mean, unit variance, and finite fourth moments, and that the coefficient matrices in the linear process representation are Hermitian and simultaneously diagonalizable. The results facilitate inference on model parameters and model diagnostics.

High Order Corrected Estimator of Time-average Variance Constant
*Chun Yip Yau and Kin Wai Chan
The Chinese University of Hong Kong
[email protected]

Estimation of the time-average variance constant (TAVC), which is the asymptotic variance of the sample mean of a time series, is of fundamental importance in statistical inference. In this paper, by considering high-order corrections to the asymptotic biases, we develop a new class of TAVC estimators that enjoys optimal convergence rates under different strengths of dependence of the time series. Comparisons to existing TAVC estimators are comprehensively investigated. In particular, the high-order corrected estimator has the best performance in terms of mean squared error.
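For readers unfamiliar with the TAVC, the classical (uncorrected) batch-means estimator below fixes ideas; it is shown only to illustrate the quantity being estimated, not the high-order corrected estimator of the talk:

```python
import numpy as np

def tavc_batch_means(x, batch_size):
    """Classical batch-means estimate of the time-average variance
    constant sigma^2 = lim n * Var(x_bar): split the series into
    non-overlapping batches and rescale the variance of the batch
    means. An illustrative baseline, not the corrected estimator."""
    x = np.asarray(x, dtype=float)
    n_batches = len(x) // batch_size
    trimmed = x[: n_batches * batch_size]      # drop the ragged tail
    means = trimmed.reshape(n_batches, batch_size).mean(axis=1)
    return batch_size * means.var(ddof=1)
```

For an i.i.d. series the TAVC equals the marginal variance; for a dependent series the batch size must grow with n, and the bias this introduces is exactly what high-order corrections target.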

Session 39: Statistica Sinica Special Invited Session on Spatial and Temporal Data Analysis

Likelihood Approximations for Big Nonstationary Spatial Temporal Lattice Data
*Joseph Guinness and Montserrat Fuentes
North Carolina State University
[email protected]

We propose a nonstationary Gaussian likelihood approximation for the class of evolutionary spectral models for data on a regular lattice. Lattice data include many important environmental data sources, such as weather model output or gridded data products derived from satellite observations. The likelihood approximation is an extension

of the Whittle likelihood and is computationally efficient to evaluate when the evolutionary transfer function can be expressed in a flexible low-dimensional form. The low-dimensional form for the evolutionary transfer function is an attractive modeling framework since it allows the practitioner to build nonstationary models in a sequential manner and choose the appropriate dimension based on changes in approximate loglikelihood. While the transfer functions are low-dimensional, the resulting covariance matrices are generally full rank, and thus no rank reduction is required for the computational efficiency of the methods. We study the covariance matrix implied by the likelihood approximation and give its asymptotic rate of approximation to the exact covariance matrix. We evaluate the likelihood approximation in a simulation study and show that it can produce asymptotically efficient parameter estimates when an operation similar to tapering is applied. We introduce an algorithm based on the Ising model to partition the domain into stationary subregions and show in a simulation that the methods can reliably recover an unknown partition. We apply our modeling and estimation framework to analyze spatial-temporal output from a regional weather model comprising 151,200 wind speed values, and we demonstrate that the fitted covariances are consistent with local empirical variograms.

A Multivariate Gaussian Process Factor Model for Hand Shape During Reach-to-Grasp Movements
Lucia Castellanos1, *Vincent Vu2, Sagi Perel1, Andrew Schwartz3 and Robert Kass1
1Carnegie Mellon University
2The Ohio State University
3University of Pittsburgh
[email protected]

We propose a Multivariate Gaussian Process Factor Model to estimate low dimensional spatio-temporal patterns of finger motion in repeated reach-to-grasp movements. Our model decomposes and reduces the dimensionality of variation of the multivariate functional data. We first account for time variability through multivariate functional registration, then decompose finger motion into a term that is shared among replications and a term that encodes the variation per replication. We discuss variants of our model and estimation algorithms, and we evaluate its performance in simulations and in data collected from a non-human primate executing a reach-to-grasp task. We show that by taking advantage of the repeated trial structure of the experiments, our model yields an intuitive way to interpret the time and replication variation in our kinematic dataset.

A Covariance Parameter Estimation Method for Polar-Orbiting Satellite Data
*Michael Horrell and Michael Stein
University of Chicago
[email protected]

We consider the problem of estimating an unknown covariance function of a Gaussian random field for data collected by a polar-orbiting satellite. The complex and asynoptic nature of such data requires a parameter estimation method that scales well with the number of observations, can accommodate many covariance functions, and uses information throughout the full range of spatio-temporal lags present in the data. Our solution to this problem is to develop new estimating equations using composite likelihood methods as a base. We modify composite likelihood methods through the inclusion of an approximate likelihood of interpolated points in the estimating equation. The new estimating equation is denoted the I-likelihood. We apply the I-likelihood method to 30 days of ozone




data occurring in a single-degree latitude band collected by a polar-orbiting satellite, and we compare I-likelihood methods to competing composite likelihood methods. The I-likelihood is shown to be capable of producing covariance parameter estimates that are as or more statistically efficient than those from competing composite likelihood methods, and to be more computationally scalable.

Bayesian Analysis of Spatially-Dependent Functional Responses with Spatially-Dependent Multi-Dimensional Functional Predictors
*Scott Holan1, Wen-Hsi Yang2, Christopher Wikle1, D. Brenton Myers1 and Kenneth Sudduth3

1University of Missouri-Columbia
2CSIRO
3U.S. Department of Agriculture
[email protected]

Modeling high-dimensional functional responses utilizing multi-dimensional functional covariates is complicated by spatial and/or temporal dependence in the observations, in addition to high-dimensional predictors. To utilize such rich sources of information, we develop multi-dimensional spatial functional models that employ low-rank basis function expansions to facilitate model implementation. These models are developed within a hierarchical Bayesian framework that accounts for several sources of uncertainty, including the error that arises from truncating the infinite-dimensional basis function expansions, error in the observations, and uncertainty in the parameters. We illustrate the predictive ability of such a model through a simulation study and an application that considers spatial models of soil electrical conductivity depth profiles using spatially dependent near-infrared spectral images of electrical conductivity covariates.

Session 40: Use of Biomarker and Genetic Data in Drug Development

An Overview of Statistical Methods in Biomarker Evaluation
*Dawei Liu, John Zhong, Kimberly Crimin, Lakshmi Amaravadi, Xiaoxi Li, Stacy Lindborg and Donald Johns
Biogen Idec
[email protected]

Biomarkers have been playing an important role in drug discovery and development, biomedical research, and clinical practice. Based on their utilities and applications, biomarkers can be classified into different categories, for example, pharmacodynamic markers, screening markers, diagnostic markers, prognostic markers, predictive markers, safety markers, and surrogate endpoints. The statistical methods used for biomarker evaluation are highly diverse in the sense that different types of markers require different sets of analytic techniques. Moreover, most of these methods are also very specialized, as they are not part of the standard statistical toolbox. In this presentation, a broad overview of various statistical methods for biomarker evaluation will be given, with a focus on challenges and recent methodological developments in the assessment of prognostic, predictive and safety biomarkers, as well as surrogate endpoints. In the field of neurodegenerative diseases, neuroimaging biomarkers have received increasing attention in drug development and clinical studies. In this talk, statistical issues in the analysis of neuroimaging biomarkers will also be reviewed. Toward the end of the presentation, the application of machine learning methods in biomarker evaluation will be discussed.

A Case Study of Integrating Scientific Knowledge with Statistical Biomarker Analysis
Sheng Feng
Biogen Idec
[email protected]

Former ASA president Marie Davidian emphasized the importance of scientific training in statistical education, especially in the era of BIG DATA. In this presentation, I will introduce a case study where science plays a key role in determining statistical analysis strategies. In designing a future clinical trial, scientists and clinicians asked if ApoE, a well-known gene associated with Alzheimer's disease (AD), should be used as a population stratification factor. Scientific knowledge of ApoE and how it was discovered to be associated with AD helped the statistician to better understand the research question, better communicate with collaborators, better plan the statistical analysis, and better organize and present results.

Intratumor Genetic Heterogeneity Analysis and Its Implications in Personalized Medicine
*Ronglai Shen and Venkatraman Seshan
Memorial Sloan-Kettering Cancer Center
[email protected]

Intratumor heterogeneity is characterized by the presence of genetically and phenotypically distinct subclones of tumor cells within an individual tumor. Such genetic diversity within a tumor is increasingly recognized as a driver of rapid disease progression, resistance to targeted therapies, and poor survival outcome. It also has important implications in defining "actionable" driver genes for making treatment decisions. Inferring subclonal genetic alterations from whole-exome or whole-genome sequencing studies is often confounded by tumor sample purity (normal cell contamination) and local copy number states. We present a statistical framework for inferring intratumor heterogeneity using whole-exome and whole-genome sequencing data. We demonstrate its performance in lung and breast cancer sequencing datasets.

Session 41: New Frontier of Functional Data Analysis

Making Patient-specific Treatment Decisions Based on Functional and Imaging Data
*Todd Ogden1, Adam Ciarleglio2, Eva Petkova2 and Thaddeus Tarpey3
1Columbia University
2New York University
3Wright State University
[email protected]

A major goal of precision medicine is to use information gathered at the time a patient presents for treatment to help clinicians determine, separately for each patient, the particular treatment that provides the best expected outcome. In psychiatry, it is thought that various brain imaging techniques may allow for the discovery of information vital to predicting response to treatment. We will present the general problem of using both scalar and functional data to guide patient-specific treatment decisions and describe some approaches that can be used to perform model fitting and variable selection.

Quantifying Connectivity in Resting State fMRI with Functional Data Analysis
Jinjiang He1, Xiaoke Zhang2, Owen Carmichael1, *Jane-Ling Wang1 and Hans-Georg Mueller1
1University of California, Davis
2University of Delaware
[email protected]




Resting state blood oxygen level dependent (BOLD) functional magnetic resonance imaging (fMRI) has begun to provide new scientific insights into the role of functional connectivity in brain networks. Temporal Pearson correlation is frequently used to measure time series similarity, but it ignores statistical dependencies between time points in a time series, and in some situations leads to a biased group-level correlation measure. We consider BOLD fMRI time series at each voxel of a subject as time-discretized observations of a random function, and employ a functional data approach to evaluate functional connectivity. Two corrections for the Pearson correlation will be discussed in this talk and compared with the Pearson correlation in simulations and in a resting state fMRI study that compares functional connectivity between normal participants and Alzheimer's disease (AD) patients.

Optimal Estimation for the Functional Cox Model
*Simeng Qu1, Jane-Ling Wang2 and Xiao Wang1

1Purdue University
2University of California, Davis
[email protected]

Functional covariates are common in many medical, biodemographic, and neuroimaging studies. The aim of this paper is to study functional Cox models with right-censored data in the presence of both functional and scalar covariates. We study the asymptotic properties of the maximum partial likelihood estimator and establish the asymptotic normality and efficiency of the estimator of the finite-dimensional parameter. Under the framework of reproducing kernel Hilbert spaces, the estimator of the coefficient function for a functional covariate achieves the minimax optimal rate of convergence under a weighted L2-risk. This optimal rate is determined jointly by the censoring scheme, the reproducing kernel, and the covariance kernel of the functional covariates. Implementation of the estimation approach and the selection of the smoothing parameter are discussed in detail. The finite sample performance is illustrated by simulated examples and a real application.

Robust and Gaussian Adaptive Mixed Models for Correlated Functional Data, with Application to Event-Related Potential Data
*Hongxiao Zhu1 and Jeffrey Morris2
1Virginia Tech
2M. D. Anderson Cancer Center
[email protected]

Event-related potential (ERP) data is a type of functional data with a complex hierarchical structure and spatial correlation at the lowest level of the hierarchy. Existing analytical methods focus on known or extracted features from prespecified time windows and multivariate analysis, and therefore could miss potentially interesting information. Motivated by ERP data in a cigarette-addiction study, we propose a general data analysis strategy that compares the effects of different stimuli on the ERP curves. This new strategy relies on Bayesian functional mixed models (FMMs), which flexibly capture the complex data structure yet yield intuitive and natural inferential summaries that adjust for multiple testing. In particular, we generalize the Gaussian and the robust FMMs to incorporate channel-specific fixed effects as well as spatial correlations at the lowest level of a hierarchical design. A correlated normal-exponential-gamma (CNEG) prior is assumed for the channel-specific fixed effects, and a Matérn structure with both separable and non-separable configurations is assumed for the spatial correlation. The proposed models are fitted in the dual space of wavelet coefficients using the discrete wavelet transform (DWT), and inference is performed in the data domain

by applying the inverse DWT to the posterior samples. The prediction performance of the proposed models is compared by computing posterior predictive likelihoods on a validation dataset. In the posterior inference, based on the channel-specific fixed effects, we are able to flag significant regions of the contrast effects using either Simultaneous Band Scores (SimBaS) or Bayesian false discovery rate (BFDR) based analysis, both adjusting for the family-wise error rate in the inherent multiple testing problem. The application to the ERP data shows different degrees of similarity between the cigarette and the emotional stimuli during the time period of 248-700 ms. Within this period, the cigarette stimulus shows more similarity with the pleasant than with the unpleasant stimulus during 248-512 ms, and shows similarity with both pleasant and unpleasant stimuli during 516-700 ms. Before and after the period of 248-700 ms, the cigarette and the emotional stimuli show different effects on the ERP in contrast with the neutral stimulus. Compared with the initial results of Versace et al. (2011), our proposed approach provides more refined information about when and where the contrast effects are significant.

Session 42: New Methodology in Spatial and Spatio-Temporal Data Analysis

Estimation of Spatial Variation in Disease Risk from Uncertain Locations Using SIMEX
Dale Zimmerman
University of Iowa
[email protected]

The assignment of spatial locations to addresses of subjects in a spatial epidemiologic study, through a process known as geocoding, typically results in positional errors. Ignoring these errors in a statistical analysis may lead to biased estimators and incorrect conclusions. This talk explores the utility of Simulation-Extrapolation (SIMEX) methods for accounting for positional errors in the estimation of spatial variation in risk from geocoded locations of disease cases and controls. The performance of SIMEX, relative to naively ignoring the positional errors, is investigated by simulation. The methodology is also applied to childhood asthma data from an Iowa county.
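The simulate-then-extrapolate mechanics of SIMEX can be sketched on a 1-D toy problem with a naive sample-variance estimator (geocoding errors are of course 2-D, and the risk-surface estimator of the talk is more involved; everything here is an illustrative assumption):

```python
import numpy as np

def simex_variance(w, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0),
                   n_sim=200, rng=None):
    """SIMEX sketch for a naive estimator (here, a sample variance)
    under additive measurement error with known std sigma_u. Add
    extra error at inflation levels lambda, fit a quadratic in
    lambda, and extrapolate back to lambda = -1 (no error)."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(w, dtype=float)
    lams, ests = [0.0], [w.var(ddof=1)]      # lambda = 0: the naive estimate
    for lam in lambdas:                       # simulation step: inflate error
        sims = [(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size))
                .var(ddof=1) for _ in range(n_sim)]
        lams.append(lam)
        ests.append(np.mean(sims))
    coef = np.polyfit(lams, ests, deg=2)      # extrapolation step
    return np.polyval(coef, -1.0)             # evaluate at lambda = -1
```

For this toy estimator the bias is exactly linear in lambda (observed variance = true variance + (1 + lambda) * sigma_u^2), so the extrapolant recovers the true variance; the same add-noise/refit/extrapolate recipe applies to the risk-surface estimators in the talk.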

Bayesian Estimates of CMB Gravitational Lensing
Ethan Anderes
University of California, Davis
[email protected]

This talk will present a new Bayesian methodology for CMB lensing estimates. The quadratic estimator, developed by Hu and Okamoto (2001, 2002), is the current state-of-the-art estimator of CMB lensing. Possibly the most promising alternative to the quadratic estimator is the use of Bayesian methodology. Indeed, Bayesian techniques applied to the lensed CMB have the potential to drastically change the way lensing is estimated and used for inference. Current frequentist estimators of the lensing potential treat the unlensed cosmic microwave background as a source of shape noise which is marginalized out. Conversely, a Bayesian lensing posterior treats the lensing potential and the unlensed cosmic microwave background as joint unknowns, thereby obtaining scientific constraints jointly rather than marginally. Moreover, the posterior distribution is easier to interpret and to sequentially update with additional data. From a statistical perspective, the lensing of the CMB is a perfect scenario for Bayesian methods in that both the observations and the unknown lensing potential are very nearly

72 | 2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17

Page 81: Published by: International Chinese Statistical

Abstracts

Gaussian random fields. Physicists have known, for some time, thatBayesian methods could potentially provide next-generation lens-ing estimates. However, the main obstacle for naive Gibbs im-plementations is that iterations do not converge nearly fast enoughto produce even approximate simulations. Our new results haveshown that, indeed, there does exist a practical way to obtain Gibbsiterations which converge quickly. The solution is through a re-parameterization of cosmic microwave background lensing prob-lem. Instead of treating the lensing potential as unknown we workwith inverse-lensing or an approximation we call anti-lensing. Sur-prisingly, the slowness of naive Gibbs translates to fast convergenceof the re-parameterized Gibbs chain.

Bayesian Functional Data Models for Coupling High-dimensional LiDAR and Forest Variables over Large Geographic Domains
*Andrew Finley1, Sudipto Banerjee2, Yuzhen Zhou1 and Bruce Cook3
1Michigan State University
2University of California, Los Angeles
3National Aeronautics and Space Administration
[email protected]

Recent advances in remote sensing, specifically Light Detection and Ranging (LiDAR) sensors, provide the data needed to quantify forest variables at a fine spatial resolution over large geographic domains. In this talk I will define several Bayesian functional spatial data models for coupling high-dimensional and spatially indexed LiDAR signals with forest variables. The proposed modeling frameworks explicitly: 1) reduce the dimensionality of signals in an optimal way; 2) propagate uncertainty in parameters through to prediction; and 3) acknowledge and leverage spatial dependence among the derived regressors and model residuals to meet statistical assumptions and improve prediction. Gaussian Processes (GPs) are applied in several model components where spatial dependence can be used to improve inference. Fitting such models requires matrix operations whose complexity increases in cubic order with the number of spatial locations, resulting in a computational bottleneck. In each case, the dimensionality of the problem is tackled by replacing the GP with its low-rank model counterpart. Two different approaches are explored for modeling LiDAR signals. The first considers a non-separable spatial covariance function to capture within- and among-signal dependence, whereas the second accommodates dependence structures via a spatial factor model. The proposed frameworks are illustrated using LiDAR and spatially coinciding forest inventory data collected on the Penobscot Experimental Forest, Maine.

Spatial Bayesian Hierarchical Model for Small Area Estimation of Categorical Data
Xin Wang1, Emily Berg1, *Zhengyuan Zhu1, Dongchu Sun2 and Gabriel Demuth1

1Iowa State University
2University of Missouri
[email protected]

The National Resources Inventory (NRI) survey is a large longitudinal survey to assess the status of and change in soil, water, and other related natural resources in the US. State and local stakeholders are interested in estimation of land cover at the county level to address local resource concerns. Though the NRI survey provides reliable estimates at the state level, the direct survey estimators at the county level are unreliable due to small sample sizes. In this paper, we develop a spatial hierarchical Bayesian model to construct small area predictors of proportions for several mutually exclusive and exhaustive land cover classes. At the first level, the design-based estimators of the proportions are assumed to follow the Generalized Dirichlet (GD) distribution. After a proper transformation, the design-based estimators are then modeled by beta regression. We consider a logit mixed model for the expectation of the beta distribution, which incorporates covariates through fixed effects and spatial structure through a conditionally autoregressive (CAR) process. The covariates are derived from the Cropland Data Layer (CDL), a land cover map based on satellite data. The method is applied to NRI data, and the Bayesian small area estimators are shown to have smaller relative root mean squared error than design-based estimators.
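For readers unfamiliar with the CAR ingredient, the prior it places on the spatial effects is fully determined by a precision matrix built from the county adjacency structure. A minimal sketch under the common Q = tau * (D - rho * W) parameterization (the names rho and tau, and the toy adjacency, are illustrative rather than the authors' exact specification):

```python
def car_precision(adjacency, rho=0.9, tau=1.0):
    """Toy sketch of a proper CAR prior precision matrix Q = tau*(D - rho*W),
    where W is a 0/1 adjacency matrix and D is the diagonal of its row sums.
    Parameter names and defaults are illustrative only."""
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = sum(adjacency[i])  # number of neighbors of area i
        for j in range(n):
            q[i][j] = tau * ((d if i == j else 0.0) - rho * adjacency[i][j])
    return q
```

For two mutually adjacent areas with rho = 0.5 and tau = 1, this yields the 2x2 precision matrix [[1, -0.5], [-0.5, 1]].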

Session 44: Funding Opportunities and Grant Applications

Funding Opportunities and Grant Applications
*Debashis Ghosh1, *Hulin Wu2, *Heping Zhang3 and *Li Zhu4

1University of Colorado at Denver
2University of Rochester
3Yale University
4National Institutes of Health
[email protected]; hulin [email protected]; [email protected]; [email protected]

This is a special session that aims at providing junior statisticians with information on funding opportunities and strategies for successful grant applications. Dr. Li Zhu, Mathematical Statistician and Program Director in the Statistical Methodology and Applications Branch (SMAB) of the National Cancer Institute, will talk about funding opportunities and the application and review procedures for biostatisticians. Professors Debashis Ghosh of the Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado, Hulin Wu of the Department of Biostatistics and Computational Biology, University of Rochester, and Heping Zhang of the Department of Biostatistics, Yale School of Public Health, will exchange their experiences in successfully obtaining research funding, including survival strategies such as exploring different funding sources and funding channels to support statistical methodology research and collaborative practice.

Session 45: Advances and Case Studies for Multiplicity Issues in Clinical Trials

Confidence Intervals for Multiple Comparisons Procedures
Brian Wiens
Portola Pharmaceuticals
[email protected]

We consider confidence intervals that correspond to common multiple comparisons procedures. Test-based confidence intervals (or inverted hypothesis tests) are not a new idea. In spite of some appealing properties, they may be overshadowed by more recently proposed confidence regions. Because testing for some endpoints is dependent on the results of previously tested endpoints, the testing and therefore the confidence intervals are adaptive, and regions may be concave. We discuss advantages and disadvantages, and present an example.

Composite Endpoints - Some Common Misconceptions
*David Li1 and Jin Xu2

1Pfizer Inc.


2Merck & Co.
[email protected]

A composite endpoint may be used in a clinical trial to combine the information from multiple components in a single outcome. The performance of the composite endpoint relative to the individual components of the endpoint may be counterintuitive. Some examples will be discussed.

Multiplicity Adjustment in Vaccine Efficacy Trial with Adaptive Population-Enrichment Design
Shu-Chih Su
Merck & Co.
[email protected]

Adaptive design has the flexibility of allowing pre-specified modifications to an ongoing trial to mitigate the potential risk associated with the assumptions made at the design stage. It allows studies to include a broader target patient population and to evaluate the performance of a vaccine/drug across subpopulations simultaneously. One of the most important statistical considerations in adaptively designed clinical trials is the control of the overall study type I error rate under adaptation. Our work is motivated by a Phase III event-driven vaccine efficacy trial. Two target patient populations are being enrolled with the assumption that vaccine efficacy can be demonstrated based on the two patient subpopulations combined. It is recognized that, due to the heterogeneity of patient characteristics, the two subpopulations might respond to the vaccine differently, i.e., the vaccine efficacy in one population could be lower than that in the other. To maximize the probability of demonstrating vaccine efficacy in at least one patient population while taking advantage of combining two populations in one single trial, an adaptive design strategy with potential population enrichment is developed. As there is no analytic form for the overall study type I error rate in this setting, simulations were conducted to better accommodate the population-enrichment feature of the adaptive design.

Session 46: Recent Advances in Integrative Analysis of Omics Data

A Bayesian Model for the Identification of Differentially Expressed Genes in Daphnia Magna Exposed to Munition Pollutants
*Marina Vannucci1, Alberto Cassese1,2 and Michele Guindani2
1Rice University
2M. D. Anderson Cancer Center
[email protected]

This talk will introduce a Bayesian hierarchical model for the identification of differentially expressed genes in Daphnia Magna organisms exposed to chemical compounds. The proposed model constitutes one of the first attempts at a rigorous modeling of the biological effects of water purification. The model incorporates a variable selection mechanism for the identification of the differential expressions, with a prior distribution on the probability of a change that accounts for the available information on the concentration of chemical compounds present in the water. The model successfully identifies a number of pathways that show differential expression between consecutive purification stages. We also find that changes in the transcriptional response are more strongly associated with the presence of certain compounds, with the remaining compounds contributing to a lesser extent.

A Bayesian Approach to Biomarker Selection through miRNA Regulatory Networks
Thierry Chekouo1, *Francesco Stingo1, James Doecke2 and Kim-Anh Do1

1M. D. Anderson Cancer Center
[email protected]

The availability of cross-platform, large-scale genomic data has enabled the investigation of complex biological relationships for many cancers. Identification of reliable cancer-related biomarkers requires the characterization of multiple interactions across complex genetic networks. MicroRNAs are small non-coding RNAs that regulate gene expression; however, the direct relationship between a microRNA and its target gene is difficult to measure. We propose a novel Bayesian model to identify microRNAs and their target genes that are associated with survival time by incorporating the microRNA regulatory network through prior distributions. We assume that biomarkers involved in regulatory networks are likely associated with survival time. We employ non-local prior distributions and a stochastic search method for the selection of biomarkers associated with the survival outcome. Using simulation studies, we assess the performance of our method, and apply it to experimental data of kidney renal cell carcinoma (KIRC) obtained from The Cancer Genome Atlas. Our novel method validates previously identified cancer biomarkers and identifies biomarkers specific to KIRC progression that were not previously discovered.

Testing Differential RNA-isoform Expression/Usage
*Wei Sun1, Yufeng Liu1, James Crowley1, Ting-Huei Chen2, Hua Zhou3, Yichao Wu3 and Fei Zou1

1The University of North Carolina at Chapel Hill
2National Institutes of Health
3North Carolina State University
[email protected]

We have developed a statistical method named IsoDOT to assess differential isoform expression (DIE) and differential isoform usage (DIU) using RNA-seq data. Here isoform usage refers to relative isoform expression given the total expression of the corresponding gene. IsoDOT performs two tasks that cannot be accomplished by existing methods: to test DIE/DIU with respect to a continuous covariate, and to test DIE/DIU for one case versus one control. The latter task is not an uncommon situation in practice, e.g., comparing the paternal and maternal alleles of one individual or comparing tumor and normal samples of one cancer patient. Simulation studies demonstrate the high sensitivity and specificity of IsoDOT. We apply IsoDOT to study the effects of haloperidol treatment on the mouse transcriptome and identify a group of genes whose isoform usages respond to haloperidol treatment.

Session 47: New Development in Nonparametric Methods and Big Data Analytics

A Nonparametric Spectral-Temporal Model for High-energy Astrophysical Sources
Raymond Wong1, Vinay Kashyap2, *Thomas Lee3 and David van Dyk4
1Iowa State University
2Harvard-Smithsonian Center for Astrophysics
3University of California, Davis
4Imperial College London
[email protected]


Variable-intensity astronomical sources are the result of complex and often extreme physical processes. Abrupt changes in source intensity are typically accompanied by equally sudden spectral shifts; i.e., shifts in the wavelength distribution of the emission. In this work we develop a nonparametric spectral-temporal model for such high-energy astronomical data. It includes the automatic detection of emission lines and change points in the temporal direction. The L1 penalty is applied to regularize the model fitting. The "dimension" of the best-fitting model is chosen by a new form of the minimum description length principle that is designed for the "large p small n" scenario. This is joint work with Vinay Kashyap, David van Dyk and Raymond K. W. Wong.

Multistage Adaptive Testing of Sparse Signals
Wenguang Sun
University of Southern California
[email protected]

A common feature in large-scale scientific studies is that signals are sparse, and it is desirable to narrow down the focus and identify the signals in a sequential manner. In this talk, I discuss how to find a subset that virtually contains all and only signals via multistage adaptive testing (MAT). At each stage, the MAT procedure aims to simultaneously eliminate noise, localize signals and select units for the next-stage analysis. We develop a sequential compound decision-theoretic framework for multistage simultaneous inference and propose an MAT procedure based on the sequential probability ratio test (SPRT). It is shown that the MAT procedure based on the SPRT controls both the false positive rate and the missed discovery rate at the nominal level and minimizes the total sampling effort.
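The SPRT engine referenced above accumulates a log-likelihood ratio and stops at Wald's classic boundaries. A minimal single-unit sketch for testing a Gaussian mean (the names, defaults, and known-variance assumption are illustrative; the MAT procedure applies this idea compound-wise across many units, which this sketch does not attempt):

```python
import math

def gaussian_sprt(xs, mu0, mu1, sigma, alpha=0.05, beta=0.05):
    """Toy Wald SPRT for H0: mean = mu0 vs H1: mean = mu1, known sigma.

    Returns ("H0" | "H1" | "continue", samples used, final log-LR).
    Boundary constants are Wald's standard approximations."""
    upper = math.log((1 - beta) / alpha)   # cross above: accept H1
    lower = math.log(beta / (1 - alpha))   # cross below: accept H0
    llr = 0.0
    for n, x in enumerate(xs, start=1):
        # log-likelihood-ratio increment for one Gaussian observation
        llr += ((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sigma ** 2)
        if llr >= upper:
            return "H1", n, llr
        if llr <= lower:
            return "H0", n, llr
    return "continue", len(xs), llr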

Variable Selection for Sufficient Dimension Reduction using Weighted Leverage Score
Wenxuan Zhong
University of Georgia
[email protected]

Sufficient dimension reduction is a very important data exploratory tool in big data analysis. With the rapid development of information technology, there is a high demand for novel sufficient dimension reduction methods that can help us extract information from data with complicated structure, particularly sparse structure. In this talk, we will discuss a simple variable selection strategy for estimating the sparse sufficient dimension reduction model based on the weighted leverage score. The weighted leverage score is a variant of the leverage score that has been widely used for the diagnostics of linear regression. As demonstrated by our early bio-threat detection example, our method can quickly pin down genomic markers that are reliable indicators of certain infectious diseases.

Efficient Computation of Smoothing Splines via Adaptive Basis Sampling
Ping Ma1, *Jianhua Huang2 and Nan Zhang2

1University of Georgia
2Texas A&M University
[email protected]

Smoothing splines provide flexible nonparametric regression estimators. However, the high computational cost of smoothing splines for large data sets has hindered their wide application. In this article, we develop a new method, named adaptive basis sampling, for efficient computation of smoothing splines in super-large samples. Except for the univariate case where the Reinsch algorithm is applicable, a smoothing spline for a regression problem with sample size n can be expressed as a linear combination of n basis functions, and its computational complexity is generally O(n^3). We achieve a more scalable computation in the multivariate case by evaluating the smoothing spline using a smaller set of basis functions, obtained by an adaptive sampling scheme that uses values of the response variable. Our asymptotic analysis shows that smoothing splines computed via adaptive basis sampling converge to the true function at the same rate as full-basis smoothing splines. Using simulation studies and a large-scale deep earth core-mantle boundary imaging study, we show that the proposed method outperforms a sampling method that does not use the values of the response variable.
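The response-driven sampling idea can be caricatured in a few lines: stratify the data by response value and draw basis centers from every stratum, so that no part of the response range is left without basis support. This is only a toy sketch (the slice counts, function names, and sampling rule are illustrative, not the paper's scheme):

```python
import random

def adaptive_basis_sample(x, y, n_slices=4, per_slice=2, seed=0):
    """Toy sketch: pick spline basis centers by stratifying on the response.

    Points are grouped into slices by their y-value so every part of the
    response range contributes basis functions. All settings are
    illustrative; the actual adaptive scheme in the paper is different."""
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])   # indices by response
    size = max(1, len(order) // n_slices)
    centers = []
    for s in range(0, len(order), size):
        block = order[s:s + size]                        # one response slice
        centers.extend(rng.sample(block, min(per_slice, len(block))))
    return sorted(x[i] for i in centers)
```

The selected centers would then index the reduced basis used to evaluate the spline, replacing the full n-term expansion.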

Session 48: Trends and Innovation in Missing Data Sensitivity Analyses

Missing Data Sensitivity Analyses for Continuous Endpoints Using Controlled Imputations
Craig Mallinckrodt
Eli Lilly and Company
[email protected]

Recent research has fostered new guidance on conducting sensitivity analyses in clinical trials. The controlled imputation family of sensitivity analyses, which includes reference-based and delta-adjustment methods, is rapidly gaining acceptance for continuous endpoints, and work is ongoing to extend these principles and methods to binary and time-to-event endpoints. This session will begin by explaining and illustrating controlled-imputation sensitivity analyses for continuous endpoints, and how these methods can be used to support inferences from the primary analysis.
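One member of the controlled-imputation family, delta adjustment, has a simple core: impute the missing outcomes from the observed-data model and then penalize the imputed values by a shift delta, scanning delta to see where conclusions tip. A toy sketch under a crude single-arm, mean-imputation simplification (the function name and imputation rule are illustrative; real analyses use multiple imputation within a full longitudinal model):

```python
def delta_adjusted_means(observed, n_missing, deltas):
    """Toy delta-adjustment sketch: each missing outcome is imputed as the
    observed mean shifted by delta; scanning deltas gives a tipping-point
    style sensitivity analysis. Illustrative only."""
    obs_mean = sum(observed) / len(observed)
    n_total = len(observed) + n_missing
    out = {}
    for d in deltas:
        imputed_total = sum(observed) + n_missing * (obs_mean + d)
        out[d] = imputed_total / n_total      # delta-adjusted overall mean
    return out
```

Plotting the adjusted estimate (or its p-value) against delta shows how severe a departure from ignorable missingness is needed to overturn the primary result.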

Sensitivity Analysis for Time-to-event Endpoints
*Bohdana Ratitch, Ilya Lipkovich and Michael O'Kelly
Quintiles, Inc.
[email protected]

Analyses of time-to-event data can be challenged with respect to their robustness to censored data when subjects leave the study prior to experiencing an event of interest and prematurely withdraw from treatment and/or follow-up. Several approaches for the analysis of continuous and categorical endpoints with non-ignorable missingness (such as control-based imputation, delta-adjustment, and tipping point analysis) have been gaining acceptance in the clinical trial community because of their clinically meaningful and clearly interpretable ways to stress-test the assumption of ignorable missingness. We will discuss how these strategies can be adapted for stress-testing the assumption of ignorable censoring typically used in the time-to-event analysis of clinical trials. We will present several methods for conducting such analyses based on multiple imputation with time-to-event data using parametric, semi-parametric, and non-parametric imputation models for survival. We will illustrate the results using a real-world dataset and share some insights about the performance of these methods based on a simulation study.

Analysis and Sensitivity Analysis of Incomplete Categorical Data
Geert Molenberghs1,2
1Universiteit Hasselt
2Katholieke Universiteit Leuven
[email protected]

Incomplete data are prominent in many areas of empirical research. While a lot of work has been done, much of it is in the context of continuous data. It is therefore helpful to review methods for incomplete categorical data. Not only are analysis methods presented; sensitivity analysis methodology is also reviewed. Emphasis is placed on directed likelihood, inverse probability weighting, multiple imputation, and pattern-mixture-based sensitivity analysis.

Session 49: Multi-Regional Clinical Trial Design and Analysis

Design and Analysis of Multiregional Clinical Trials in Evaluation of Medical Devices: A Two-component Bayesian Approach for Targeted Regulatory Decision Making
*Yunling Xu and Nelson Lu
U.S. Food and Drug Administration
[email protected]

The statistical design and analysis of multiregional clinical trials generally follows a paradigm where the treatment effect of interest is assumed consistent among US and OUS regions. In this presentation, we discuss situations where the treatment effect might vary among US and OUS regions, and illustrate a two-component Bayesian approach for regulatory decision making. In this approach, the anticipated treatment difference among US and OUS regions is formally taken into account, hopefully leading to increased transparency and predictability of local regulatory decision-making.

Assessing Benefit and Consistency of Treatment Effect under a Discrete Random Effects Model in Multiregional Clinical Trials
*Hsiao-Hui Tsou1, K. K. Gordon Lan2, Jung-Tzu Liu1, Chin-Fu Hsiao1, Chi-Tian Chen1 and Chyng-Shyan Tzeng3

1National Health Research Institutes
2Johnson & Johnson
3National Tsing Hua University
[email protected]

In recent years, developing pharmaceutical products via a multiregional clinical trial (MRCT) has become standard. Traditionally, an MRCT would assume a fixed effects model. However, heterogeneity among regions may have an impact on the evaluation of a medicine's effect. In this study, we consider a random effects model using discrete priors (DREM) to account for heterogeneous treatment effects across regions in the design and evaluation of MRCTs. We derive the power for demonstrating that a treatment is beneficial under DREM and illustrate determination of the overall sample size in an MRCT. We use the concept of consistency based on Method 2 of the Japanese Ministry of Health, Labour and Welfare guidance to evaluate the probability of treatment benefit and consistency under DREM. We further derive an optimal sample size allocation over regions to maximize the power for consistency. In practice, regional treatment effects are unknown. Thus, we provide some guidelines on the design of MRCTs with consistency when the regional treatment effects are assumed to fall into a specified interval. Numerical examples are given to illustrate the applications of the proposed approach.

Multi-Regional Clinical Trials – Where We Have Been and Where We Are Going
Bruce Binkowitz
Merck & Co.
[email protected]

The topic of multi-regional clinical trials has now matured into an active area of research among statisticians and other disciplines. Regulators have long recognized the importance of this issue, but we are now seeing some of the first movement toward formal guidances. This session will review how the topic of MRCTs has grown, describe some of the current activities, and propose areas where MRCT researchers can continue to investigate.

Session 50: Biostatistics and Health Sciences

When to Initiate Combined Antiretroviral Therapy in HIV-infected Individuals to Reduce the Risk of AIDS or Severe Non-AIDS Morbidity Using a Marginal Structural Model
*Yassin Mazroui, Valerie Potard, Murielle Mary-Krause, Ophelia Godin and Dominique Costagliola
University Pierre et Marie Curie
[email protected]

Background: The optimal CD4 cell count at which combined antiretroviral therapy (cART) should be initiated is still a matter of debate. Clinical guidelines from the European AIDS Clinical Society recommend initiating cART when the CD4 cell count has decreased to less than 350 cells/µL, the threshold recommended by the World Health Organization is 500 cells/µL, whereas U.S. guidelines recommend initiating cART regardless of CD4 cell count. These variations in clinical recommendations reflect the uncertainty of the available evidence. The primary objective of the study was to compare two strategies of cART initiation, "start cART when the CD4 cell count first drops below 500 cells/µL" versus "start cART when the CD4 cell count first drops below 350 cells/µL", with respect to reducing the risk of an AIDS-defining event, a severe non-AIDS-defining event, or death.
Methods: From the FHDH-ANRS CO4 cohort, we selected therapy-naive HIV-1-infected individuals included between 1997 and 2012, at 15 years of age or older, with no history of AIDS-defining events and baseline CD4 cell counts at or above 500 cells/µL. A further requirement was inclusion at least one year before the closing date. We used marginal structural models to estimate the relative causal effect of the two strategies in the presence of time-dependent confounders. The primary endpoint was an AIDS-defining event or a severe non-AIDS-defining event leading to hospitalization or death, and the secondary endpoint was an AIDS-defining event or death. The hazard ratios were estimated in an expanded dataset using an inverse probability weighted Cox proportional hazards model. Stabilized weights were truncated at a maximum value of 10 for statistical efficiency. The method assumes that all relevant confounders were measured. The models included the following covariates: age (15-34, 35-49, ≥50 years), sex*geographic origin*transmission group, baseline CD4 cell count (500-599, 600-749, ≥750 cells/µL), baseline HIV-1 RNA level (<10000, 10000-100000, >100000 copies/mL), baseline period (1997-2000, 2001-2003, 2004-2008, ≥2009), months from baseline to first CD4 cell count below 500 cells/µL, and time since the beginning of follow-up (restricted cubic splines with 4 knots).
Results: A total of 9389 patients met the eligibility criteria for the study. Among these patients, 4623 followed the strategy "CD4 cell count below 500 cells/µL", experiencing 668 primary endpoints (579 non-AIDS, 78 AIDS, 11 deaths), and 4607 followed the strategy "CD4 cell count below 350 cells/µL", experiencing 851 primary endpoints (676 non-AIDS, 163 AIDS, 12 deaths). Compared to initiating cART at the CD4 cell count threshold of 500 cells/µL, the primary endpoint hazard ratio was 1.04 (0.92-1.18) for the 350 cells/µL threshold. The corresponding hazard ratio was 1.27 (0.93-1.74) for the secondary endpoint.
Conclusion: We did not observe a significant difference between initiating cART when the CD4 cell count decreases below 500 cells/µL and delaying cART initiation until the CD4 cell count decreases below 350 cells/µL for the primary endpoint of AIDS-defining event, severe non-AIDS-defining event, or death. However, our findings tend to support cART initiation once the CD4 cell count first decreases below 500 cells/µL in order to increase AIDS-free survival.

Trace Element Uptake and Effect of Two Steppic Medicinal Species of a Mining Area on Their Soil Trace Element Contents versus Bulk Soils
Oumeima Mebirouk1, Fatima-Zohra Afri-Mehennaoui2, Smaïl Mehennaoui3, Lila Sahli2 and *Oualida Rached1
1Ecole Nationale Superieure de Biotechnologie
2University Constantine
3University Batna
[email protected]

Two medicinal steppic species of a mining area (Artemisia herba-alba Asso and Thymus algeriensis Boiss. & Reut.) appeared, in previous studies, able to grow on the soils most polluted by several trace metals. This study aims to determine whether the two species have different effects on the intrinsic characteristics of their soils, which could result in differences in metal accumulation in the soils and in plant tissues. For this purpose, seven stations were randomly selected in the mining area. In each one, the aerial parts of two specimens of T. algeriensis and of A. herba-alba were taken. The two species' rhizosphere soils and adjacent bulk soils were also sampled. Cadmium (Cd), copper (Cu), lead (Pb) and zinc (Zn) were measured in soil and plant extracts by flame atomic absorption spectrometry (FAAS). Soils were also subjected to organic matter (OM), pH, electrical conductivity (EC), active limestone (AC), total calcium (TC) and available phosphorus (AP) analyses. The analysis of variance (ANOVA) revealed significant differences between the two species' soils, and between those and the bulk soils, in pH, OM and AC contents. These differences result in different effects on the soils' Cd, Pb and Zn contents and on the two species' metal uptake.

Study of the Effect of Trace Metals from an Old Antimony Mine on Biodiversity by Stepwise Regression
*Alima Bentellis and Oualida Rached
Ecole Nationale Superieure de Biotechnologie
[email protected]

A previous study on the effect of trace-metal pollution from an old antimony mine on the contamination of wadi-bank soils in the mining area showed that these soils contained high concentrations of arsenic and antimony and were contaminated with other potentially toxic elements, including zinc and lead. This study aims to determine, using stepwise regression, whether antimony had a major effect, compared to other trace elements and/or physicochemical soil factors, on the biodiversity of the mining area. Results of forward stepwise linear regression of the diversity index and the floristic richness on all trace metals studied, the distance to the mine and the distance to the road, adjusted on two soil rotation factors, show that the effect of these two factors is not significant for the diversity index, and that only the distance to the mine and the cobalt and chromium soil contents are involved in predicting the diversity index. On the other hand, only the set of intrinsic soil characteristics has a significant effect on the floristic richness, and under this effect only arsenic, distance to the road, copper, zinc and cadmium significantly enter its prediction. Forward-selection analysis of the distance to the mine and the soil antimony concentration as a function of species presence, adjusted on two edaphic factors, the distance to the road and/or the distance to the mine, shows that for the two syntheses the effect of the two rotation factors and the distance to the road is very significant. Under this effect, several groups of plant species emerge by their absence/presence according to their associations with the distance to the road and/or the soil antimony concentrations.

Session 51: Recent Developments in Analyzing Censored Survival Data

Stacking Survival Models
Debashis Ghosh
University of Colorado at Denver
[email protected]

In many clinical prediction settings and other machine learning contexts, there is interest in combining multiple models. Stacking is an approach coined by Wolpert that has roots in statistics dating back to the early 1970s. It involves constructing combinations of predictions from various modelling and algorithmic approaches. This approach also underlies the current "super learning" approach of van der Laan and collaborators. In this talk, we revisit the idea of stacking and discuss the roles of two components: residuals and principal components. Using a toy scenario, we are able to reveal insights into stacking that provide some guidance as to the types of models that should be combined. We then discuss the approach with censored data and illustrate the methodology with some simulated and real data examples.
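The core of stacking, learning how to weight member models' held-out predictions against the observed outcomes, can be caricatured for two models and a single blend weight (purely a toy sketch with made-up names; real stacking typically fits weights for many learners by constrained regression on cross-validated predictions, and the censored-data case needs survival-specific loss functions):

```python
def stack_weights(preds_a, preds_b, y, steps=101):
    """Toy stacking sketch: choose w in [0, 1] minimizing the squared error
    of the blend w*preds_a + (1-w)*preds_b against held-out outcomes y.
    A grid search stands in for the usual constrained least squares."""
    best_w, best_err = 0.0, float("inf")
    for k in range(steps):
        w = k / (steps - 1)
        err = sum((w * a + (1 - w) * b - t) ** 2
                  for a, b, t in zip(preds_a, preds_b, y))
        if err < best_err:
            best_w, best_err = w, err
    return best_w
```

When one member already predicts the held-out outcomes well, the fitted blend concentrates its weight on that member.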

Estimation of Concordance Probability with Censored Regression Models
*Zhezhen Jin and Xinhua Liu
Columbia University
[email protected]

Evaluation and comparison of various methods often arise in medical research. The concordance probability can be used to assess the discriminatory power of censored regression models. In this talk, we present the estimation of the concordance probability with various censored regression models in the analysis of right-censored data.
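For orientation, the concordance probability behind measures like Harrell's C-index can be computed directly from its definition; this generic sketch (made-up names, a naive treatment of censoring, not the talk's estimators) counts usable pairs in which the subject with the higher risk score fails first:

```python
def concordance_index(times, events, scores):
    """Toy Harrell-style C-index sketch for right-censored data.

    times: observed times; events: 1 if event observed, 0 if censored;
    scores: higher score means predicted higher risk (shorter survival).
    Assumes at least one comparable pair exists."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # a pair is usable only if subject i is seen to fail first
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5      # ties get half credit
    return concordant / comparable
```

A perfectly risk-ordered model scores 1.0, a reversed ordering 0.0, and a non-informative score about 0.5.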

Improving Efficiency in Biomarker Incremental Value Evaluation under Two-phase Study Designs
*Yingye Zheng1 and Tianxi Cai2
1Fred Hutchinson Cancer Research Center
2Harvard University
[email protected]

Cost-effective yet efficient designs are critical to the success of biomarker evaluation research. Two-phase sampling designs, under which expensive markers are only measured on a subsample of cases and non-cases, are useful in novel biomarker studies for preserving study samples and minimizing the cost of biomarker assaying. Statistical methods for quantifying the predictiveness of biomarkers under two-phase studies have recently been proposed (Cai and Zheng, 2012; Liu and others, 2012), based on a class of inverse probability weighted (IPW) estimators whose weights are ‘true’ sampling weights that simply reflect the sampling strategy of the study. While simple to implement, one major limitation of these existing IPW estimators is their lack of efficiency. We investigate a variety of two-phase design options and provide statistical approaches aimed at improving the efficiency of simple IPW estimators by incorporating auxiliary information available for the entire cohort. We consider accuracy summary estimators that accommodate auxiliary information in the context of evaluating the incremental values of novel biomarkers over existing prediction tools, and provide formal theoretical justifications in terms of consistency and weak convergence. In addition, we evaluate the relative efficiency of a variety of sampling and estimation options under both types of two-phase studies, to shed light on issues pertaining to both the design and analysis of biomarker validation studies.

2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17 | 77

Abstracts
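The design-weighted IPW idea in the abstract above can be sketched as a Horvitz–Thompson-type mean: outcomes measured at phase two are weighted by the inverse of their design inclusion probabilities. A generic illustration with toy data, not the authors' accuracy estimators:

```python
def ipw_mean(y, sampled, weights):
    """Horvitz-Thompson-type IPW estimate of a cohort mean when the
    outcome is only measured on a phase-two subsample.

    sampled[i] is 1 if subject i was selected at phase two, and
    weights[i] = 1 / (inclusion probability of subject i) is the
    'true' design weight."""
    num = sum(w * yi for yi, s, w in zip(y, sampled, weights) if s)
    den = sum(w for s, w in zip(sampled, weights) if s)
    return num / den

# toy cohort of 4: cases sampled with probability 1, non-cases with 1/2
y = [3.0, 1.0, 2.0, 4.0]
sampled = [1, 1, 0, 1]
weights = [1.0, 2.0, 2.0, 1.0]
print(ipw_mean(y, sampled, weights))  # → 2.25
```

Because non-cases are sampled with probability 1/2, each sampled non-case stands in for two cohort members, which is exactly what the weight of 2 encodes.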

Nonparametric Tests of Treatment Effect for a Recurrent Event Process that Terminates
Nabihah Tayob1 and *Susan Murray2
1M. D. Anderson Cancer Center
2University of Michigan
[email protected]
Recurrent and terminal events are common outcomes for studying treatment effects in clinical studies. Existing approaches follow either a time-to-first-event analysis approach or a recurrent event modeling approach. Recurrent event analyses are often restricted by independence assumptions on gap times between events. Although time-to-first-event analyses are not subject to this restriction, such an analysis discards information that occurs beyond the initial event and is much less powerful for detecting treatment differences. We develop two new approaches for this data structure, motivated by the less restrictive assumptions of time-to-first-event analyses, that combine information from multiple follow-up intervals in determining treatment effects. Each approach follows the behavior of short-term outcomes during pre-specified intervals over time. The first testing procedure pools (correlated) short-term τ-restricted outcomes from pre-specified intervals starting at times tk, k = 1, ..., b, and compares estimated τ-restricted mean survival across treatment groups from this combined dataset. The second procedure calculates conditional τ-restricted means from those at risk at times tk, k = 1, ..., b, and compares the area under a function of these by treatment. Variance calculations, taking into account correlation of short-term outcomes within individuals, linearize random components of the test statistics following Woodruff (1971) and, more recently, Williams (1995). Simulations compare the finite-sample performance of our tests to the robust proportional rates model proposed by Lin et al. (2000) and the Ghosh and Lin (2000) test for recurrent events subject to death. In treatment effect patterns following proportional hazards, delayed treatment effect, short-duration treatment effect and moderate-duration effect, the proposed methods perform favorably when compared to existing methods. These new analysis approaches also produce correct type I error rates when gap times between events are correlated. The analysis approach is illustrated in data from a randomized trial of azithromycin in patients with chronic obstructive pulmonary disease (COPD).
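The τ-restricted mean survival that both test statistics build on is the area under the survival curve on [0, τ]. A minimal Kaplan–Meier version (a single-interval sketch, not the pooled or conditional estimators of the abstract):

```python
def rmst(times, events, tau):
    """Kaplan-Meier estimate of the tau-restricted mean survival time:
    the area under the estimated survival curve on [0, tau]."""
    data = sorted(zip(times, events))
    n = len(data)
    surv, area, last_t, at_risk = 1.0, 0.0, 0.0, n
    i = 0
    while i < n and data[i][0] <= tau:
        t = data[i][0]
        deaths = sum(1 for tt, ee in data if tt == t and ee == 1)
        censored = sum(1 for tt, ee in data if tt == t and ee == 0)
        area += surv * (t - last_t)       # survival is flat between event times
        if deaths:
            surv *= 1.0 - deaths / at_risk
        at_risk -= deaths + censored
        last_t = t
        while i < n and data[i][0] == t:  # skip over tied times
            i += 1
    return area + surv * (tau - last_t)

# with no censoring, the tau-restricted mean equals the mean of the times
print(rmst([1.0, 2.0, 3.0], [1, 1, 1], 3.0))  # → 2.0
```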

Session 52: Advances in Survey Statistics

Quantile Regression Imputation for a Survey Sample
Emily Berg and *Cindy Yu
Iowa State University
[email protected]
We study an imputation method based on quantile regression for a complex survey design. The imputed data are simulated from an estimate of the conditional quantile function, evaluated at observed values of covariates. The quantile regression imputation procedure relies on fewer assumptions than procedures that involve specification of full distributions. In a simulation, linearization and bootstrap variance estimators are evaluated, and the quantile regression imputation method is compared to parametric fractional imputation. Informative and non-informative sample designs are considered.
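Simulating an imputation from an estimated conditional quantile function amounts to inverse-transform sampling: draw u uniformly on (0, 1) and evaluate the fitted quantile curve at (u, x). In this sketch, `qhat` is a hypothetical stand-in for a fitted quantile-regression function:

```python
import random

def impute_from_quantile_fn(x, qhat, rng):
    """Impute one missing outcome by inverse-transform sampling from an
    estimated conditional quantile function: draw u ~ Uniform(0, 1) and
    return qhat(u | x). `qhat` stands in for a fitted quantile-regression
    curve (e.g. interpolated over a grid of quantile fits)."""
    u = rng.random()
    return qhat(u, x)

# illustrative "fitted" model: y | x ~ Uniform(x, x + 2), so Q(u | x) = x + 2u
qhat = lambda u, x: x + 2.0 * u
rng = random.Random(0)
imputations = [impute_from_quantile_fn(5.0, qhat, rng) for _ in range(10000)]
mean_imp = sum(imputations) / len(imputations)
print(round(mean_imp, 2))  # close to E[y | x = 5] = 6
```

Because u is uniform, repeated draws reproduce the whole conditional distribution, not just its mean, which is why the procedure needs no full distributional specification beyond the quantile fits.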

Triply Robust Inference in the Presence of Missing Survey Data
*David Haziza1, Valery Dongmo Jiongo2 and Pierre Duchesne1
1Universite de Montreal

2Statistics Canada
[email protected]

Item nonresponse is typically treated by some form of single imputation in statistical agencies. For example, deterministic regression imputation, which includes ratio and mean imputation within classes as special cases, is widely used in surveys. Recently, there has been interest in doubly robust imputation procedures. An imputation procedure is said to be doubly robust when the resulting imputed estimator is consistent if either the imputation model (also called the outcome regression model) or the nonresponse model is correctly specified. However, in the presence of influential units, the resulting imputed estimator may be very unstable. We propose a robust version of the doubly robust imputed estimator based on the concept of the conditional bias of a unit. Implementation of the proposed method via a calibrated imputation procedure will be discussed. Finally, results from an empirical study will be shown.

Adaptive Post-stratification Using Monotonicity Constraints
*Jean Opsomer, Jiwen Wu and Mary Meyer
Colorado State University
[email protected]

In large-scale government surveys, it is common that estimates are desired for numerous small domains. Unfortunately, even for surveys of very large overall size, in many domains the sample size is often too small to ensure that the direct survey estimates are sufficiently reliable to be released. A solution to this problem is to adaptively collapse some neighboring domains to ensure sufficient sample size in all estimation domains, but this is both time-consuming and heuristic. We propose a method to adaptively pool neighboring domains, which is suitable for situations in which it is reasonable to assume that the estimates in the domains are monotone. This situation is not uncommon in practice, and we give some examples during the talk. The method is based on ideas from isotonic regression, and we describe the case with a single covariate defining the domains, for which we propose a weighted version of the pooled adjacent violators algorithm (PAVA). The resulting estimator is equivalent to an adaptively post-stratified estimator, with the post-strata chosen so that the domain estimates satisfy the monotonicity constraint while providing weights that can be applied to any survey variable and domain of interest. We describe the asymptotic design properties, including design consistency and asymptotic distribution, of the proposed domain estimator, and we illustrate its finite-sample behavior in simulations. Finally, we discuss variance estimation using replication methods.
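A weighted pooled adjacent violators algorithm of the kind mentioned above can be sketched in a few lines: adjacent domains that violate monotonicity are merged into blocks whose fitted value is the weighted mean of their members (a generic isotonic fit, without the survey-design details):

```python
def weighted_pava(y, w):
    """Weighted pooled adjacent violators algorithm: returns the
    nondecreasing fit minimizing sum_i w_i * (y_i - m_i)**2."""
    # each block: [weighted mean, total weight, number of points pooled]
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, n1 + n2])
    fit = []
    for m, _, n in blocks:
        fit.extend([m] * n)
    return fit

print(weighted_pava([1.0, 3.0, 2.0, 4.0], [1.0, 1.0, 1.0, 1.0]))
# → [1.0, 2.5, 2.5, 4.0]
```

The pooled blocks play the role of the adaptively chosen post-strata: domains 2 and 3 are collapsed because their direct estimates violate the assumed monotone ordering.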

Session 53: Innovative Statistical Methods in Genomics and Genetics

Statistical Analysis of Differential Alternative Splicing using RNA-Seq Data
Mingyao Li
University of Pennsylvania
[email protected]

RNA sequencing (RNA-seq) allows an unbiased survey of the entire transcriptome in a high-throughput manner. It has rapidly replaced microarrays as the major platform for transcriptomics studies. A major application of RNA-seq is to detect differential alternative splicing (DAS), or transcript usage, across experimental conditions. Differential analysis at the transcript level is of great biological interest due to its direct relevance to protein function and disease pathogenesis. However, DAS analysis using RNA-seq data is challenging because of the difficulty of quantifying alternative splicing and the various biases present in RNA-seq data. In this talk, I will present several statistical issues related to the analysis of DAS. I will discuss methods for detecting DAS for both paired and unpaired data, and compare the performance of exon-based and gene-based tests of DAS. I will show simulation results as well as some examples from real transcriptomics studies.

Integrating Auxiliary Information in Complex Traits Studies
Marc Coram, Sophie Candille and *Hua Tang
Stanford University
[email protected]
Genome-wide association studies (GWAS) have become a standard approach for identifying loci influencing complex traits. However, GWAS in non-European populations are hampered by limited sample sizes and are thus underpowered. We describe an empirical Bayes approach, which improves the power for mapping complex trait loci in a minority population by adaptively integrating information from another ethnic population. Extending this approach, we discuss methods for model selection and genetic risk prediction.

Detecting Nonlinear Associations in High-throughput Data with Applications to Clustering and Variable Selection
*Tianwei Yu and Hesen Peng
Emory University
[email protected]
High-throughput expression technologies measure thousands of features, i.e., genes or metabolites, on a continuous scale. In such data, both linear and nonlinear relations exist between features. Nonlinear relations can reflect critical regulation patterns in the biological system. However, they are not identified or utilized by traditional methods based on linear associations. We developed a sensitive nonparametric measure of general dependency between (groups of) random variables in high dimensions. Based on this dependency measure, we developed new clustering methods to unravel structures of the data, as well as a stage-forward variable selection scheme to select features that are nonlinearly associated with a continuous outcome variable. We evaluated the methods using simulations and real data analysis.
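The abstract's specific dependency measure is not given here; as a familiar point of reference, one classical nonparametric measure of general (possibly nonlinear) dependence is distance correlation (Székely et al., 2007), which detects relations that Pearson correlation misses:

```python
def distance_correlation(x, y):
    """Sample distance correlation between two univariate samples:
    zero in the population iff the variables are independent."""
    n = len(x)
    def centered(v):
        # pairwise distances, double-centered by row, column and grand means
        d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
        row = [sum(r) / n for r in d]
        grand = sum(row) / n
        return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
                for i in range(n)]
    A, B = centered(x), centered(y)
    dcov2 = sum(A[i][j] * B[i][j] for i in range(n) for j in range(n)) / n**2
    dvarx = sum(A[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvary = sum(B[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    prod = dvarx * dvary
    return (dcov2 / prod ** 0.5) ** 0.5 if prod > 0 else 0.0

# a quadratic relation whose Pearson correlation is exactly zero
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [v * v for v in x]
print(distance_correlation(x, y))  # clearly positive despite zero Pearson r
```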

Hypothesis Test of Mediation Effect in Causal Mediation Model with High-dimensional Mediators
*Yen-Tsung Huang and Wen-Chi Pan
Brown University
Yen-Tsung [email protected]
Causal mediation modeling has become a popular approach for studying the effect of an exposure on an outcome through a mediator. However, current methods are not applicable to settings with a large number of mediators. We propose a testing procedure for the mediation effects of high-dimensional mediators. We characterize the marginal mediation effect, the multivariate component-wise mediation effects, and the L2 norm of the component-wise effects, and develop a Monte Carlo procedure for evaluating their statistical significance. To accommodate the setting with a large number of mediators and a small sample size, we further propose a transformation model using the spectral decomposition. Under the transformation model, mediation effects can be estimated using a series of regression models with a univariate transformed mediator, and examined by our proposed testing procedure. Extensive simulation studies are conducted to assess the performance of our methods for continuous and dichotomous outcomes. We apply our methods to analyze genomic data investigating the effect of microRNA miR-223 on the dichotomous survival status of patients with glioblastoma multiforme (GBM). We identify nine gene ontology sets with expression values that significantly mediate the effect of miR-223 on GBM survival.

Session 54: Recent Development in Epigenetic Research

A Hidden Markov Random Field Based Bayesian Method for the Detection of Long-range Chromosomal Interactions in Hi-C Data
Zheng Xu1, Guosheng Zhang1, Fulai Jin2, Mengjie Chen1, Terry Furey1, Patrick Sullivan1, Yun Li1 and *Ming Hu3

1The University of North Carolina at Chapel Hill
2Case Western Reserve University
3New York University
[email protected]

Motivation: Advances in chromosome conformation capture and next-generation sequencing technologies are enabling genome-wide investigation of dynamic chromatin interactions. For example, Hi-C experiments generate genome-wide contact frequencies between pairs of loci by sequencing DNA segments ligated from loci in close spatial proximity. One essential task in such studies is peak calling, that is, detecting non-random interactions between loci from the two-dimensional contact frequency matrix. Successful fulfillment of this task has many important implications, including identifying long-range interactions that assist in interpreting a sizable fraction of the results from genome-wide association studies. The task of distinguishing biologically meaningful chromatin interactions from massive numbers of random interactions poses great challenges both statistically and computationally. Model-based methods to address this challenge are still lacking. In particular, no statistical model exists that takes the underlying dependency structure into consideration. Results: In this paper we propose a hidden Markov random field (HMRF) based Bayesian method to rigorously model interaction probabilities in the two-dimensional space based on the contact frequency matrix. By borrowing information from neighboring loci pairs, our method demonstrates superior reproducibility and statistical power in both simulation studies and real data analysis.

Base-resolution Methylation Patterns Accurately Predict Transcription Factor Binding In Vivo
*Tianlei Xu, Ben Li, Meng Zhao, Keith E. Szulwach, R. Craig Street, Li Lin, Bing Yao, Feiran Zhang, Peng Jin, Hao Wu and Zhaohui Qin
Emory University
[email protected]

Detecting in vivo transcription factor (TF) binding is important for understanding gene regulatory circuitries. ChIP-seq is a powerful technique to empirically define TF binding in vivo. However, the multitude of distinct TFs makes genome-wide profiling for them all labor-intensive and costly. Algorithms for in silico prediction of TF binding have been developed, based mostly on histone modification or DNase I hypersensitivity data in conjunction with DNA motifs and other genomic features. However, technical limitations of these methods prevent them from being applied broadly, especially in clinical settings. We conducted a comprehensive survey involving multiple cell lines, TFs, and methylation types and found that there are intimate relationships between TF binding and methylation level changes around the binding sites. Exploiting the connection between DNA methylation and TF binding, we propose a novel supervised learning approach to predict TF-DNA interaction using data from base-resolution whole-genome methylation sequencing experiments. We devised beta-binomial models to characterize methylation data around TF binding sites and the background. Along with other static genomic features, we adopted a random forest framework to predict TF-DNA interaction. After conducting comprehensive tests, we saw that the proposed method accurately predicts TF binding and performs favorably versus competing methods.

Statistical Analysis of Illumina HumanMethylation450 BeadArrays
*Jie Liu and Kimberly Siegmund
University of Southern California
[email protected]
DNA methylation is a commonly studied epigenetic mark, its importance well established in human development and disease. Today, Illumina’s HumanMethylation450 arrays provide the most cost-effective means of high-throughput DNA methylation analysis. As with other types of microarray platforms, technical artifacts are a concern. I will introduce the Illumina HumanMethylation450 array, discuss approaches to assess data quality, and present methods for signal processing. I will show a combination of within-array normalization steps that can improve between-array reproducibility as well as, or better than, other published approaches. The methods will be illustrated using replicate controls and biological samples. I will conclude with a discussion of the variety of statistical approaches applied in DNA methylation association studies.

Differential Methylation Analysis for BS-seq Data under General Experimental Design
*Yongseok Park1 and Hao Wu2

1University of Pittsburgh
2Emory University
[email protected]
Bisulfite sequencing (BS-seq) has emerged as the technology of choice to profile DNA methylation because of its accuracy, genome coverage, and resolution. Methods to identify differentially methylated loci (DML) or regions (DMRs) from BS-seq data are available, but are mostly designed for two-group comparisons. We develop a novel statistical model, based on a beta-binomial regression model with an arcsine link function, to detect DML from BS-seq data under general experimental design. Simulation and real data analyses demonstrate that our method is accurate, powerful, robust and computationally efficient. The method is implemented in the Bioconductor package DSS.
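The arcsine link can be illustrated with the classical variance-stabilizing transform of a binomial proportion, arcsin(sqrt(p)), applied to per-site methylation counts (a simplified sketch, not the DSS implementation):

```python
import math

def arcsine_transform(meth_counts, total_counts):
    """Variance-stabilizing arcsine transform of methylation proportions,
    turning count data into an approximately Gaussian response on which
    linear-model machinery can be applied."""
    return [math.asin(math.sqrt(m / n))
            for m, n in zip(meth_counts, total_counts)]

# toy CpG site measured in four samples: methylated reads / total reads
meth = [3, 8, 15, 19]
total = [10, 10, 20, 20]
z = arcsine_transform(meth, total)
print([round(v, 3) for v in z])
```

After the transform, group differences at a locus can be assessed by comparing mean transformed levels across experimental conditions, which is what makes general (multi-factor) designs tractable.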

Session 55: New Method Development for Survival Analysis

Analysis of the Proportional Hazards Model with Sparse Longitudinal Covariates
*Hongyuan Cao1, Matthew M. Churpek2, Donglin Zeng3 and Jason P. Fine3
1University of Missouri-Columbia
2University of Chicago
3The University of North Carolina at Chapel Hill
[email protected]
Regression analysis of censored failure observations via the proportional hazards model permits time-varying covariates, which are observed at death times. In practice, such longitudinal covariates are typically sparse and only measured at infrequent and irregularly spaced follow-up times. Full likelihood analyses of joint models for longitudinal and survival data impose stringent modelling assumptions which are difficult to verify in practice and which are complicated both inferentially and computationally. In this article, a simple kernel weighted score function is proposed with minimal assumptions. Two scenarios are considered: half kernel estimation, in which observation ceases at the time of the event, and full kernel estimation for data where observation may continue after the event, as with recurrent events data. It is established that these estimators are consistent and asymptotically normal. However, they converge at rates which are slower than the parametric rates which may be achieved with fully observed covariates, with the full kernel method achieving an optimal convergence rate which is superior to that of the half kernel method. Simulation results demonstrate that the large sample approximations are adequate for practical use and may yield improved performance relative to the last value carried forward approach and the joint modelling method. The analysis of data from a cardiac arrest study demonstrates the utility of the proposed methods.
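Kernel weighting of sparse covariate measurements can be sketched as a local average around a target time (a generic Nadaraya–Watson-style smoother for intuition, not the authors' weighted score function):

```python
def kernel_smoothed_covariate(obs_times, obs_values, t, bandwidth):
    """Kernel-weighted estimate of a sparsely observed covariate at time t,
    using an Epanechnikov kernel; measurements far from t get zero weight."""
    def epanechnikov(u):
        return 0.75 * (1 - u * u) if abs(u) < 1 else 0.0
    weights = [epanechnikov((s - t) / bandwidth) for s in obs_times]
    total = sum(weights)
    if total == 0:
        return None  # no measurements within one bandwidth of t
    return sum(w * v for w, v in zip(weights, obs_values)) / total

# covariate measured at irregular visit times; estimate its value at t = 2.0
print(kernel_smoothed_covariate([0.5, 1.8, 2.3, 4.0], [1.0, 2.0, 2.5, 4.0],
                                2.0, bandwidth=1.0))
```

Only the visits at 1.8 and 2.3 fall within the bandwidth, so the estimate is a weighted compromise between their values; shrinking the bandwidth trades bias for variance, which is the source of the slower-than-parametric rates noted above.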

Hypoglycemic Events Analysis via Recurrent Time-to-Event (HEART) Models
Haoda Fu
Eli Lilly and Company
[email protected]

Diabetes affects an estimated 25.8 million people in the United States and is one of the leading causes of death. A major safety concern in treating diabetes is the occurrence of hypoglycemic events. Despite this concern, the current methods of analyzing hypoglycemic events, including the Wilcoxon rank sum test and negative binomial regression, are not satisfactory. The aim of this paper is to propose a new model to analyze hypoglycemic events, with the goal of making this model a standard method in industry. Our method is based on a gamma frailty recurrent event model. To make this method broadly accessible to practitioners, this paper provides many details of how the method works and discusses practical issues with supporting theoretical proofs. In particular, we make efforts to translate conditions and theorems from abstract counting process and martingale theories into intuitive and clinically meaningful explanations. For example, we provide a simple proof and illustration of the coarsening at random condition so that the practitioner can easily verify this condition. Connections and differences with traditional methods are discussed, and we demonstrate that under certain scenarios the widely used Wilcoxon rank sum test and negative binomial regression cannot control type I error rates while our proposed method is robust in all these situations. The usefulness of our method is demonstrated through a diabetes dataset which provides new clinical insights on hypoglycemic data.

Accelerated Intensity Frailty Model for Recurrent Events Data
Bo Liu1, Wenbin Lu1 and *Jiajia Zhang2

1North Carolina State University
2University of South Carolina
[email protected]

In this article, we propose an accelerated intensity frailty (AIF) model for recurrent events data and derive a test for the variance of the frailty. In addition, we develop a kernel-smoothing-based EM algorithm for estimating the regression coefficients and the baseline intensity function. The variance of the resulting estimator for the regression parameters is obtained by a numerical differentiation method. Simulation studies are conducted to evaluate the finite sample performance of the proposed estimator under practical settings and demonstrate the efficiency gain over the Gehan rank estimator based on the AFT model for counting processes. Our method is further illustrated with an application to bladder tumor recurrence data.


A New Flexible Association Measure for Semi-Competing Risks Data
*Jing Yang and Limin Peng
Emory University
[email protected]
In the semi-competing risks setting, it is often of interest to assess the impact of the non-terminal event on the terminal event, which may help in understanding the role of an important disease landmark in revealing disease progression. In this work, we propose a new “impact” measure which is well tailored to the data structure of semi-competing risks and can accommodate exploration of the potential changing pattern of such impact in the identifiable region. We develop a nonparametric estimation procedure for the proposed measure by adopting a working quantile residual lifetime model. The estimation method can readily be extended to adjust for confounders. We establish the asymptotic properties of the proposed estimator and develop inferences accordingly. The proposed methods can be implemented based on standard statistical software without involving smoothing or resampling. Our proposals are illustrated via simulation studies and an application to real data.

Session 56: Recent Developments in Statistical Learning Methods

Multiclass Sparse Discriminant Analysis
*Qing Mai1, Yi Yang2 and Hui Zou2

1Florida State University
2University of Minnesota
[email protected]
In recent years, many sparse linear discriminant analysis methods have been proposed for high-dimensional classification and variable selection. However, most of these proposals focus on binary classification and are not directly applicable to multiclass classification problems. There are two sparse discriminant analysis methods that can handle multiclass classification problems, but their theoretical justifications remain unknown. In this paper, we propose a new multiclass sparse discriminant analysis method that estimates all discriminant directions simultaneously. We show that when applied to the binary case our proposal yields a classification direction that is equivalent to those of two successful binary sparse LDA methods in the literature. An efficient algorithm is developed for computing our method with high-dimensional data. Variable selection consistency and rates of convergence are established under the ultrahigh dimensionality setting. We further demonstrate the superior performance of our proposal over existing methods on simulated and real data.

Composite Large Margin Classifiers with Latent Subclasses for Heterogeneous Biomedical Data
*Guanhua Chen1, Yufeng Liu2 and Michael Kosorok2
1Vanderbilt University
2The University of North Carolina at Chapel Hill
[email protected]
High-dimensional classification problems are prevalent in a wide range of modern scientific applications. Despite a large number of candidate classification techniques available to use, practitioners often face a dilemma in the choice between linear and general nonlinear classifiers. Specifically, simple linear classifiers have good interpretability, but may have limitations in handling data with complex structures. In contrast, general nonlinear kernel classifiers are more flexible, but may lose interpretability and have a higher tendency toward overfitting. In this paper, we consider data with potential latent subgroups in the classes of interest. We propose a new method, namely the Composite Large Margin Classifier (CLM), to address the issue of classification with latent subclasses. The CLM aims to find three linear functions simultaneously: one linear function to split the data into two parts, with each part being classified by a different linear classifier. Our method has comparable prediction accuracy to a general nonlinear kernel classifier and it maintains the interpretability of traditional linear classifiers. We demonstrate the competitive performance of the CLM through comparisons with several existing linear and nonlinear classifiers in Monte Carlo experiments. Analyzing an Alzheimer’s disease classification problem using the CLM not only provides lower classification error in discriminating cases and controls, but also identifies subclasses in controls which are more likely to develop into disease in the future.
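The three-linear-function prediction rule can be sketched schematically: a linear gate routes each point to one of two linear classifiers, whose sign gives the label. The gate and classifiers below are hypothetical fitted objects; the paper's joint fitting procedure is not shown:

```python
def clm_predict(x, gate, clf_left, clf_right):
    """Composite large-margin-style prediction: route x with the linear
    gate g(x), then classify with the branch's own linear rule."""
    def lin(w_b, x):
        w, b = w_b
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    f = clf_left if lin(gate, x) < 0 else clf_right
    return 1 if lin(f, x) >= 0 else -1

# toy 1-d example: gate splits at x = 0; each branch has its own boundary
gate = ([1.0], 0.0)        # routes on the sign of x
clf_left = ([-1.0], -1.0)  # left branch: class +1 when x <= -1
clf_right = ([1.0], -1.0)  # right branch: class +1 when x >= 1
print([clm_predict([v], gate, clf_left, clf_right)
       for v in (-2.0, -0.5, 0.5, 2.0)])  # → [1, -1, -1, 1]
```

The composite rule labels the two outer regions +1 and the middle region -1, a pattern no single linear classifier can produce, yet each of the three pieces remains individually interpretable.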

Positive Definite Regularized Estimation of Large Covariance Matrices
*Lingzhou Xue1, Shiqian Ma2 and Hui Zou3

1Pennsylvania State University
2The Chinese University of Hong Kong
3University of Minnesota

[email protected]

The regularized covariance estimator has nice asymptotic properties for estimating large covariance matrices, but it often has negative eigenvalues when used in real data analysis. To simultaneously achieve the desired low-dimensional structure and positive definiteness, we develop a unified positive definite regularized covariance estimator for estimating large covariance matrices. An efficient alternating direction method of multipliers is derived to solve the challenging optimization problem, and its convergence properties are established. Under weak regularity conditions, non-asymptotic statistical theory is also established for the proposed estimator. The competitive finite-sample performance of our proposal is demonstrated by both simulation and real applications.
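The positive-definiteness requirement is typically enforced by an eigenvalue projection: floor the eigenvalues of a candidate matrix at a small ε > 0. For a symmetric 2×2 matrix this projection has a closed form (a sketch of the kind of step used inside splitting schemes such as ADMM, not the paper's full algorithm):

```python
import math

def floor_eigenvalues_2x2(a, b, c, eps):
    """Project the symmetric 2x2 matrix [[a, b], [b, c]] onto the set of
    matrices with all eigenvalues >= eps, by flooring its eigenvalues."""
    mean, diff = (a + c) / 2.0, (a - c) / 2.0
    r = math.hypot(diff, b)
    lam1, lam2 = mean + r, mean - r          # eigenvalues, lam1 >= lam2
    f1, f2 = max(lam1, eps), max(lam2, eps)  # floored eigenvalues
    if r == 0:  # already a multiple of the identity
        return [[f1, 0.0], [0.0, f1]]
    # unit eigenvector v for lam1
    if abs(b) > 1e-12:
        vx, vy = b, lam1 - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # reconstruct f1 * v v' + f2 * (I - v v')
    return [[f2 + (f1 - f2) * vx * vx, (f1 - f2) * vx * vy],
            [(f1 - f2) * vx * vy, f2 + (f1 - f2) * vy * vy]]

# an indefinite "covariance" with eigenvalues 2 and -1 gets repaired
print(floor_eigenvalues_2x2(0.5, 1.5, 0.5, 0.01))
```

The repaired matrix keeps the leading eigenvector intact while lifting the negative eigenvalue to ε, which is exactly the tension the unified estimator resolves jointly with the sparsity-inducing regularization.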

Feature Selection Utilizing the Whole Solution Path

Yang Liu1 and *Peng Wang2

1Bowling Green State University
2University of Cincinnati

[email protected]

The performance of penalized likelihood approaches depends profoundly on the selection of the tuning parameter; however, there is no common agreement on the criterion for choosing it. Moreover, penalized likelihood estimation based on a single value of the tuning parameter suffers from several drawbacks. In this project, we introduce a novel approach for feature selection based on the whole solution path rather than a single value of the tuning parameter, which significantly improves selection accuracy. Moreover, it allows for feature selection using ridge or other strictly convex penalties. The key idea is to classify the variables as relevant or irrelevant at each tuning parameter value and then select all the variables which have been classified as relevant at least once. We establish the theoretical properties of the method, and illustrate the advantages of the proposed approach with simulation studies and a data example.
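The union-over-the-path idea can be sketched directly; the magnitude threshold below is a hypothetical stand-in for the paper's relevant/irrelevant classification rule at each tuning value:

```python
def select_from_path(path_coefs, threshold):
    """Whole-solution-path selection sketch: given fitted coefficient
    vectors at each tuning-parameter value, mark a variable relevant
    whenever its magnitude exceeds the threshold somewhere on the path,
    and return the union of relevant variables."""
    p = len(path_coefs[0])
    selected = set()
    for coefs in path_coefs:  # one coefficient vector per tuning value
        for j in range(p):
            if abs(coefs[j]) > threshold:
                selected.add(j)
    return sorted(selected)

# coefficients along a path of three tuning values, four variables
path = [[0.0, 0.9, 0.0, 0.0],
        [0.2, 0.7, 0.6, 0.0],
        [0.3, 0.5, 0.4, 0.05]]
print(select_from_path(path, threshold=0.25))  # → [0, 1, 2]
```

Variable 0 would be missed by any single tuning value early in the path, but the union over the path recovers it; variable 3 never clears the bar and is excluded.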

2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17 | 81

Page 90: Published by: International Chinese Statistical

Abstracts

Session 57: Recent Developments on High-Dimensional Inference in Biostatistics

Profiling and Accounting for Heterogeneity in the Analysis of Cancer Sequencing Data
Mengjie Chen
The University of North Carolina at Chapel Hill
[email protected]
The cancer genome is characterized by genetic heterogeneity that is seen across tumor types, among samples of a particular type, and within an individual tumor. This heterogeneity has posed great challenges in both cancer treatment and basic cancer biology research. Recent advances in next-generation sequencing technology provide the opportunity to scrutinize the cancer genome at single base-pair resolution. Harnessing the statistical properties of the sequencing data enables us to profile the genetic heterogeneity of each tumor, which potentially generates unprecedented insights into the development of cancers. I will introduce how to infer purity and clonality from different cancer sequencing data, including whole exome/genome sequencing and targeted sequencing.

Optimal Detection of Weak Positive Dependence between Two Mixture Distributions
*Sihai Zhao1, Tony Cai2 and Hongzhe Li2
1University of Illinois at Urbana-Champaign
2University of Pennsylvania
[email protected]
This paper studies the problem of detecting dependence between two mixture distributions, motivated by questions arising from statistical genomics. The fundamental limits of detecting weak positive dependence are derived, and an oracle test statistic is proposed. It is shown that for mixture distributions whose components are stochastically ordered, the oracle test statistic is asymptotically optimal. Connections are drawn between dependency detection and signal detection, where the goal of the latter is to detect the presence of non-null components in a single mixture distribution. It is shown that the oracle test for dependency can also be used as a signal detection procedure in the two-sample setting, and can achieve detection even when detection using each sample separately is provably impossible. A nonparametric data-adaptive test statistic is then proposed, and its closed-form asymptotic distribution under the null hypothesis of independence is established. Simulations show that the adaptive procedure performs as well as the oracle test statistic, and that both can be more powerful than existing methods. In an application to the analysis of the shared genetic basis of psychiatric disorders, the adaptive test is able to detect genetic relationships not detected by other procedures.

Multiple Testing for Conditional Dependence by Quantile-Based Contingency Table
*Jichun Xie1 and Ruosha Li2
1Duke University
2The University of Texas School of Public Health
[email protected]
The chi-square test is a popular tool for testing dependence between two categorical or ordinal variables in the framework of the contingency table, where the categorical boundaries are pre-determined. In this paper, we discuss another type of contingency table, called the quantile-based contingency table, where the categorical boundaries are estimated quantiles depending on other covariates. This type of table can be used to test general dependence between two continuous variables conditioning on other covariates. After organizing the data using the quantile-based contingency table, we prove that the null distribution of the Pearson statistic is still the chi-square distribution, but its degrees of freedom depend only on the dimension of the table. Furthermore, when the number of variables is much greater than the sample size, we propose a multiple testing method to test whether each pair of variables is dependent conditioning on covariates. The multiple testing method asymptotically controls the false discovery rate at the desired level. In addition to the theoretical analysis, we perform numerical studies to compare the performance of the proposed test and other competitive tests. The proposed test is both robust and powerful under various settings. We demonstrate the effectiveness of the test with a large-scale genomic data analysis.
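Ignoring the conditioning covariates, a quantile-based contingency table and its Pearson statistic can be formed as follows (a sketch of the unconditional special case, with marginal sample quantiles as the cell boundaries):

```python
def quantile_table_chi2(x, y, qx, qy):
    """Pearson chi-square statistic for a contingency table whose cell
    boundaries are sample quantiles of x and y (at probabilities qx, qy)
    rather than pre-determined categories."""
    def bin_by_quantiles(v, probs):
        s = sorted(v)
        cuts = [s[int(p * (len(s) - 1))] for p in probs]
        return [sum(c < vi for c in cuts) for vi in v]
    bx, by = bin_by_quantiles(x, qx), bin_by_quantiles(y, qy)
    r, c, n = max(bx) + 1, max(by) + 1, len(x)
    obs = [[0] * c for _ in range(r)]
    for i, j in zip(bx, by):
        obs[i][j] += 1
    rows = [sum(row) for row in obs]
    cols = [sum(obs[i][j] for i in range(r)) for j in range(c)]
    # Pearson statistic: sum over cells of (observed - expected)^2 / expected
    return sum((obs[i][j] - rows[i] * cols[j] / n) ** 2
               / (rows[i] * cols[j] / n)
               for i in range(r) for j in range(c) if rows[i] * cols[j] > 0)

x = [1.0, 2.0, 3.0, 4.0]
y = [1.5, 2.5, 3.5, 4.5]  # perfectly monotone in x
print(quantile_table_chi2(x, y, [0.5], [0.5]))  # → 4.0
```

With median cuts on both margins, the perfectly dependent toy data concentrate on the diagonal of the 2×2 table, giving the maximal statistic of n = 4.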

Spurious Discoveries for High-dimensional Data
Jianqing Fan1, Qi-Man Shao2 and *Wen-Xin Zhou1

1Princeton University
2The Chinese University of Hong Kong
[email protected]

Over the last two decades, many exciting variable selection methods have been developed for finding a small group of covariates that are associated with the response from a large pool. Can the discoveries of such data mining approaches be spurious? Can our fundamental assumptions on the exogeneity of covariates, needed for such variable selection, be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors. When the covariance matrix of the covariates possesses the restricted eigenvalue property, we derive such distributions using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of the covariates. Hence, we propose a multiplier bootstrap method to approximate the unknown distributions and establish the consistency of such a simple bootstrap approach. The results are further extended to the situation where residuals come from regularized fits. Our approach is then applied to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of covariates. The former provides a baseline for guarding against false discoveries due to data mining, and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated by both numerical examples.
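The phenomenon being quantified is easy to reproduce: with many candidate covariates, the maximum sample correlation with a completely unrelated response is far from zero. A small simulation for illustration only, not the multiplier bootstrap itself:

```python
import random

def max_abs_correlation(y, X):
    """Maximum absolute Pearson correlation between a response y and each
    covariate column in X; large values can arise purely by chance when
    the number of candidate covariates is big."""
    n = len(y)
    my = sum(y) / n
    sy = sum((v - my) ** 2 for v in y) ** 0.5
    best = 0.0
    for col in X:
        mx = sum(col) / n
        sx = sum((v - mx) ** 2 for v in col) ** 0.5
        cov = sum((a - mx) * (b - my) for a, b in zip(col, y))
        best = max(best, abs(cov / (sx * sy)))
    return best

# pure noise: y is independent of all 200 candidate covariates, yet the
# maximum spurious correlation is sizable
rng = random.Random(1)
n, p = 50, 200
y = [rng.gauss(0, 1) for _ in range(n)]
X = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
print(round(max_abs_correlation(y, X), 2))  # noticeably above 0
```

An upper confidence limit for this maximum under the null, which is what the bootstrap approximation supplies, is the baseline a claimed discovery must clear.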

Session 58: Blinded and Unblinded Evaluation of Aggregate Safety Data during Clinical Development

Continuous Safety Signal Monitoring with Blinded Data
*Greg Ball1 and William Wang2

1AbbVie Inc.
2Merck & Co.
[email protected]

Concerns over product safety have resulted in late stage program failures and market withdrawals. This has focused more interest and attention on aggregate safety analyses during clinical development. The Code of Federal Regulations (CFR) requires a safety report whenever aggregate analysis indicates that “events occur more frequently in the drug treatment group than in a concurrent control group”. The FDA guidance on Safety Reporting Requirements for INDs and BA/BE Studies amplifies the CFR in asserting that a “systematic approach for safety surveillance... should include a process for reviewing, evaluating and managing accumulating safety data from the entire clinical trial database at appropriate intervals”. An independent group, whether an external DMC or an internal safety

82 | 2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17


Abstracts

team, should use all of the available data, including data from outside of the clinical trials, in assessing the safety of the developing product. Any suspected adverse reactions should be evaluated relative to other events in ongoing studies as well as previous studies. Emerging trends in clinical trial data must be frequently and carefully evaluated to protect the safety and well-being of patients. Data Monitoring Committees (DMCs) evaluate accumulating unblinded data and make recommendations about the continuing safe conduct of trials. Trial sponsors must decide how best to implement these recommendations and whether or not to stop a trial. However, while DMC recommendations are based on differences observed between treatment arms, sponsors are generally blinded to these differences, adding considerable uncertainty to the decision-making process. Our blinded safety signals can provide a full measure of blinded data that trial leadership can use, in combination with open information from the DMC, to evaluate the strength of evidence contained in the data in order to make properly informed decisions that protect patients from unnecessary harm while allowing trials to lead to conclusive results.
Safety monitoring is an essentially dynamic process which requires the flexibility of a likelihood-based method to respond to unforeseen developments. We use a Bayesian approach to provide a unified framework for continuous safety signal monitoring with all of the available blinded information in order to produce probability statements that are easy to interpret. A moderately informative prior is used to regulate influence for a signal. Designing useful safety signals requires an active collaboration between Statistics, Pharmacovigilance and Clinical. We have evaluated this method in a recently completed Phase 2b study and are currently working together to implement it in a clinical trial program.
Our goal is to formalize and improve conversations about safety signal monitoring among all stakeholders in the conduct of randomized clinical trials.
This presentation was sponsored by AbbVie. AbbVie contributed to the design, research, and interpretation of data, and the writing, reviewing, and approving of the presentation. Greg Ball is an employee of AbbVie, Inc.
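As a toy illustration of continuous blinded monitoring with a Bayesian prior (a generic beta-binomial sketch of my own, not the authors' actual model), one can track the posterior probability that the pooled, blinded adverse-event rate exceeds an assumed background rate, flagging a signal when that probability crosses a prespecified threshold.

```python
import math

def log_beta(a, b):
    """log of the Beta function, via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_pdf(p, a, b):
    """Density of a Beta(a, b) distribution at p in (0, 1)."""
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p)
                    - log_beta(a, b))

def prob_rate_exceeds(x, n, p0, a=1.0, b=1.0, steps=10000):
    """Posterior P(pooled AE rate > p0) after observing x events among n
    blinded subjects, under a Beta(a, b) prior: trapezoidal integration
    of the Beta(a + x, b + n - x) posterior density over (p0, 1)."""
    a_post, b_post = a + x, b + n - x
    h = (1.0 - p0) / steps
    total = 0.0
    for i in range(steps + 1):
        p = p0 + i * h
        f = beta_pdf(p, a_post, b_post) if 0.0 < p < 1.0 else 0.0
        total += f / 2 if i in (0, steps) else f
    return total * h
```

A monitoring rule might flag a signal when this probability exceeds, say, 0.8 at any interim look; a moderately informative prior, as in the abstract, would replace the flat Beta(1, 1) here. The threshold and prior are illustrative choices, not the authors'.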

How Should the Final Rule Affect DMCs?
Janet Wittes
Statistics Collaborative
[email protected]

The FDA’s Final Rule on Safety Reporting changes the way sponsors of drugs report serious adverse events to the FDA. Many pharmaceutical companies have already changed their internal processes to comply with the new rule. DMCs have traditionally been very reluctant to report unblinded data to sponsors unless the data show clear enough evidence of specific harms to require a change in the Informed Consent Form. If sponsors are to comply with the Final Rule, DMCs will have to change their behavior and consider reporting more frequently to sponsors when sufficient evidence of causality arises during the course of the trials they are monitoring. This talk addresses those aspects of the Final Rule that affect DMCs and proposes approaches a DMC might use in deciding what serious adverse events it should report unblinded to the sponsor during the course of a trial.

Implementation of the Investigational New Drug Safety Reporting Requirements “Final Rule”
Brenda Crowe
Eli Lilly and Company
[email protected]

In 2010, the Food and Drug Administration (FDA) issued a regulation

(the “Final Rule”*) addressing the safety reporting requirements for investigational new drug applications (INDs). It changed how companies were to assess serious adverse events with respect to which events should be reported in an expedited fashion to the FDA. I will discuss how processes that were already in place helped us implement the Final Rule. I will share some high-level case examples and discuss sound statistical methodologies, as well as the statistical leadership that it took to get them implemented.
* Investigational New Drug Safety Reporting Requirements for Human Drug and Biological Products and Safety Reporting Requirements for Bioavailability and Bioequivalence Studies in Humans

Session 59: Design and Analysis of Non-Inferiority Clinical Trials

Some Comments on the Three-Arm Non-inferiority Trial Design
*Ming Zhou and Sudeep Kundu
Bristol-Myers Squibb Company
[email protected]

In non-inferiority clinical trials, when it is ethically justifiable, a placebo arm is often included in addition to the experimental therapy and the active comparator/reference therapy. This leads to a three-arm non-inferiority trial. Although there is an extensive literature on the design of three-arm non-inferiority trials, interesting questions and practical considerations can still arise when it comes to clinical trial planning. In this paper, we discuss the following two aspects of a gold-standard three-arm non-inferiority trial design. First, we examine the optimal sample size allocation based on overall power instead of single-step power. Second, we discuss the procedure given in Rothman et al. (2011) for “capturing all possibilities” and show that it can be greatly simplified; we then compare the frequentist multiple-testing approaches and the Bayesian approach for the simplified procedure.

Non-inferiority Tests for Prognostic Models
*Ning Xu and Yongzhao Shao
New York University
[email protected]

In this talk, we will discuss issues of testing for non-inferiority in comparing prognostic models and prognostic factors. Both logistic regression based and Cox proportional hazards regression based prognostic models are considered. A mixture model of survival outcome with latent case-control group information is also considered. Utilities of prognostic factors or prognostic models can be compared using common concordance measures.

Session 60: Toward More Effective Identification of Biomarkers and Subgroups for Development of Tailored Therapies

Confidence Intervals for Assessing SNP Effects on Treatment Efficacy
*Jason Hsu1, Ying Ding2, Grace Li3 and Steve Ruberg3

1The Ohio State University
2University of Pittsburgh
3Eli Lilly and Company
[email protected]

An important decision in personalized medicine development is whether the new treatment should target a subgroup of the patients.


Testing Single Nucleotide Polymorphisms (SNPs) for use as potential biomarkers in drug development is more complex than in traditional case-control GWAS, because the drug development process typically involves comparing a new treatment with a control. We show, in fact, that a SNP’s effect on treatment efficacy cannot simply be characterized by the traditional definitions of dominant, recessive, and additive effects, and that popular current techniques for SNP testing offer no protection against targeting wrong subgroups. Proposing a new formulation, we provide simultaneous confidence intervals for the four possible SNP effects relevant to making a sound decision on which patient subgroup to target. Our simultaneous confidence level guarantee is equivalent to strong control of the familywise error rate (FWER) in multiple testing within each SNP. To account for multiplicity of the SNPs, we use a thresholding technique that controls the per-family error rate, the expected number of SNPs falsely inferred to be predictive of efficacy, while taking dependence across the SNPs into account. Our novel combination of error rate controls, familywise error rate control within each SNP and per-family error rate control across the SNPs, is rigorous and has a clear practical interpretation. Across different SNP studies, the expected number of SNPs with an incorrectly inferred target subgroup is controlled. Such control is appropriate in a drug development environment, as it allows flexibility in the exploration of multiple candidate SNPs, while being confident in the patient subgroup to target for any selected SNP.

Identification of Biomarker Signatures Using Adaptive Elastic Net
*Xuemin Gu1, Lei Shen2 and Yaoyao Xu3

1Bristol-Myers Squibb Company
2Eli Lilly and Company
3AbbVie Inc.
[email protected]

A new method for biomarker/subgroup identification has been developed, utilizing penalized regression in a two-step procedure. Specifically, adaptive elastic net is used for effective variable selection, and data splitting enables the calculation of valid p-values and analytical control of error rate. Two of the main advantages of this method are its ability to handle a large number of biomarkers, and its identification of a signature based on multiple biomarkers that can have a much stronger effect than its individual components.

Correcting Ascertainment Bias in Biomarker Identification
Shengchun Kong
Purdue University
[email protected]

In biomarker studies, associations between clinical phenotype and biomarkers are of interest. After detecting a significant association, the estimated effect size is used to make important decisions, such as planning of the replication study. We consider the multiplicity issue in estimation, where upward bias exists in the estimated effect size for an inferred association. Efforts to replicate findings often fail due to the overestimation of the effect. This ascertainment bias is also known as the winner’s curse. We investigated different approaches to correct the ascertainment bias, including resampling methods and an empirical Bayes approach. Simulations are conducted to compare these approaches.
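One simple resampling-style correction for the winner's curse can be sketched as follows. This is my own illustrative parametric-bootstrap scheme, not necessarily any of the approaches the abstract compares: treat the observed estimates as truth, re-run the selection step on simulated data, and subtract the average inflation of the selected "winner".

```python
import random

def winners_curse_correction(betas, ses, reps=2000, seed=7):
    """Parametric-bootstrap estimate of the selection bias in the top
    association (the 'winner', here the largest z score; selection on
    |z| would be the two-sided analogue). The observed estimates are
    treated as the truth when resampling, so this is only a rough
    correction. Returns (raw_winner_estimate, corrected_estimate)."""
    rng = random.Random(seed)
    z = [b / s for b, s in zip(betas, ses)]
    j_star = max(range(len(z)), key=lambda j: z[j])
    bias = 0.0
    for _ in range(reps):
        sim = [rng.gauss(b, s) for b, s in zip(betas, ses)]
        k = max(range(len(sim)), key=lambda j: sim[j] / ses[j])
        # how much selection inflates the winner in this replicate
        bias += sim[k] - betas[k]
    bias /= reps
    return betas[j_star], betas[j_star] - bias
```

With many near-tied markers the correction is large and the corrected estimate sits well below the raw winner, which is the qualitative behavior (overestimation followed by failed replication) the abstract describes.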

Analysis Optimization for Biomarker and Subgroup Identification
Lei Shen
Eli Lilly and Company
shen [email protected]

A number of statistical methods have been proposed to identify subgroups of patients with enhanced treatment effect using data from clinical trials. While post hoc subgroup analyses have long been conducted for clinical trials, the more recent methods have been designed to proactively identify subgroups to enable the development of tailored therapies. With an increasing number of available options, the a priori selection of a subgroup identification method for a particular trial requires consideration of pertinent information, including prior knowledge and project needs, as well as a quantitative process to optimize the analytical method. The situation becomes even more complex when jointly considering two sequential trials, which often are realistically required for the identification and confirmation of subgroup findings. In this presentation, I present general considerations and a case study of analysis optimization to determine statistical methods for biomarker and subgroup identification in this type of application.

Session 61: Design and Analysis Issues in Clinical Trials

Nonparametric Response Adaptive Randomization Procedures
Zhongqiang Liu1 and *Feifang Hu2

1Renmin University of China
2The George Washington University
[email protected]

In the literature, many response-adaptive randomization (RAR) procedures have been proposed and extensively studied in the past decades. However, most of these procedures are based on a parametric structure and do not usually apply to nonparametric situations. In this talk, we propose a new family of response-adaptive randomization procedures based on the p-values of the corresponding hypothesis tests. Thus, the proposed procedures apply to both parametric and non-parametric situations. Under widely satisfied conditions, we derive the asymptotic properties of the procedures and further obtain the power function under non-parametric settings. The proposed procedures are: (i) more powerful; (ii) more robust; and (iii) more ethical than the classical RAR procedures under some situations. The advantages are also illustrated in numerical studies.
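A p-value driven allocation rule of the kind the abstract describes can be illustrated with a toy two-arm simulation. The specific skewing rule (assign to the currently better arm with probability 0.5 + 0.4(1 − p)) and the normal-approximation p-value are my own assumptions for illustration, not the authors' proposal.

```python
import math
import random

def two_sample_pvalue(s1, n1, s2, n2):
    """Two-sided normal-approximation p-value for a difference in
    success probabilities, with pooled variance."""
    if min(n1, n2) == 0:
        return 1.0
    p1, p2 = s1 / n1, s2 / n2
    pool = (s1 + s2) / (n1 + n2)
    var = pool * (1 - pool) * (1 / n1 + 1 / n2)
    if var == 0:
        return 1.0
    z = (p1 - p2) / math.sqrt(var)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

def pvalue_rar_trial(p_true_a, p_true_b, n_patients, burn_in=10, seed=3):
    """Simulate a two-arm binary-outcome trial: after a burn-in of equal
    allocation, each new patient goes to the currently better arm with
    probability 0.5 + 0.4 * (1 - p), where p is the running p-value, so
    allocation skews only as the evidence strengthens."""
    rng = random.Random(seed)
    succ = {"A": 0, "B": 0}
    size = {"A": 0, "B": 0}
    for i in range(n_patients):
        if i < burn_in:
            arm = "A" if i % 2 == 0 else "B"
        else:
            p = two_sample_pvalue(succ["A"], size["A"], succ["B"], size["B"])
            better = "A" if succ["A"] / size["A"] >= succ["B"] / size["B"] else "B"
            other = "B" if better == "A" else "A"
            arm = better if rng.random() < 0.5 + 0.4 * (1 - p) else other
        size[arm] += 1
        succ[arm] += rng.random() < (p_true_a if arm == "A" else p_true_b)
    return size, succ
```

Because the skew depends on the p-value rather than on a parametric model of the responses, the same rule would apply unchanged with a nonparametric test plugged in, which is the flexibility the abstract emphasizes.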

Ecological Momentary Assessment for Measuring Outcome in Clinical Trials
Stephen Rathbun
University of Georgia
[email protected]

Ecological Momentary Assessment (EMA) is a research method in the behavioral sciences focused on the collection of subjects’ current psychological states in their everyday environments. EMA is particularly well suited for patient-reported outcomes related to subjective symptoms, quality of life and emotional states. It has often been used to investigate recurrent addictive behavioral events such as the use of tobacco. In this paper we will first review methods suitable for EMA data and for fitting fixed and mixed effects models predicting risk as a function of partially observed time-varying covariates. These methods are based on covariates sampled over time according to a known probability-based sampling design, as well as a random sample of the recurrent events. The objective of the paper, however, is to construct optimal EMA sampling designs that balance minimization of the standard errors of parameter estimators against the burden placed on study subjects. Time-varying covariates are sampled according to a self-correcting point process with parameters that control the mean number of covariate samples per day as well as the distribution of inter-arrival times of covariate samples. Covariates are also sampled at a random subset of the events using


a thinning function that also controls the number of events sampled per day and the distribution of times between sampled events. Optimal design calls for increased sampling intensity when subjects are either at high or low risk of the event of interest. Our approach will be illustrated using data from an EMA of smoking and an EMA of dietary lapse.

Robust Zero-Inflated Poisson/Negative Binomial Regression for Over-Dispersed Count Data
*Yichen Qin1, Ye Shen2 and Yang Li3
1University of Cincinnati
2University of Georgia
3Renmin University of China
[email protected]
In this article, we introduce a new robust estimation procedure for zero-inflated Poisson/negative binomial regression. We show that, by robustifying the likelihood function, we can successfully alleviate the influence of over-dispersion and model assumption violations such as heavy tails. In addition, we introduce the zero-inflated component in the model to handle the excess zero counts in the data. An EM algorithm is also introduced for estimating such a model. Through simulation and real data, we demonstrate the effectiveness of our method and its advantage over the traditional approach.
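To make the model structure concrete, here is a minimal EM fit for an intercept-only zero-inflated Poisson. This is the standard (non-robust) version; the abstract's contribution, robustifying the likelihood, is not reproduced here, and the sampler and function names are my own.

```python
import math
import random

def rpois(lam, rng):
    """Knuth's Poisson sampler (adequate for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= limit:
            return k
        k += 1

def zip_em(counts, iters=200):
    """EM for an intercept-only zero-inflated Poisson: with probability
    pi a count is a structural zero, otherwise it is Poisson(lam).
    Returns the estimates (pi, lam)."""
    n = len(counts)
    pi, lam = 0.3, max(sum(counts) / n, 0.1)
    for _ in range(iters):
        # E-step: posterior probability that each zero is structural
        w = [pi / (pi + (1 - pi) * math.exp(-lam)) if y == 0 else 0.0
             for y in counts]
        # M-step: closed-form updates given the weights
        pi = sum(w) / n
        lam = (sum((1 - wi) * y for wi, y in zip(w, counts))
               / sum(1 - wi for wi in w))
    return pi, lam
```

Adding covariates turns the two M-steps into weighted logistic and Poisson regressions; the robustified version in the abstract would further downweight outlying counts in the likelihood.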

Joint Modeling Tumor Burden and Time to Event Data in Oncology Trials
*Ye Shen1, Aparna Anderson2, Riwik Sinha3 and Yang Li4
1University of Georgia
2Bristol-Myers Squibb Company
3Adobe Research India Labs
4Renmin University of China
[email protected]
The tumor burden (TB) process is postulated to be the primary mechanism through which most anticancer treatments provide benefit. In phase II oncology trials, the biologic effects of a therapeutic agent are often analyzed using conventional endpoints for best response, such as objective response rate and progression-free survival, both of which cause loss of information. On the other hand, graphical methods including the spider plot and waterfall plot lack any statistical inference when there is more than one treatment arm. Therefore, longitudinal analysis of TB data is well recognized as a better approach for treatment evaluation. However, the longitudinal TB process suffers from informative missingness because of progression or death. We propose to analyze the treatment effect on tumor growth kinetics using a joint modeling framework accounting for the informative missing mechanism. Our approach is illustrated by multi-setting simulation studies and an application to a non-small cell lung cancer data set. The proposed analyses can be performed in early-phase clinical trials to better characterize treatment effect and thereby inform decision-making.

Session 62: Statistical Challenges in Economic Research Involving Medical Costs

Projecting Survival and Lifetime Costs from Short-Term Smoking Cessation Trials
Daniel Heitjan1,2

1Southern Methodist University
2The University of Texas Southwestern Medical Center
[email protected]
Cigarette smoking is a leading potentially preventable cause of morbidity and mortality. Therefore there has been considerable interest

in the development of treatments – both behavioral and pharmacologic – to assist smokers to conquer their nicotine addiction. As the number of smokers is large – roughly 20% of the US population, and more in many other countries – it is critical to evaluate not only the effectiveness of such treatments, but also their costs. The fact that many smokers will not experience the major health effects of their addiction for many decades substantially complicates this evaluation. Moreover, clinical trials of smoking cessation treatments commonly take as their primary endpoint a short-term outcome such as the six-month quit rate. Thus the evaluation of the cost-effectiveness of smoking cessation treatment, in its usual form of cost per life-year saved, involves a considerable extrapolation. In this talk I will describe a micro-simulation model that projects early quit rates and treatment costs into survival and long-term cessation-related costs for a general smoking population.
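The extrapolation step can be pictured with a deliberately crude micro-simulation. The structure below (a yearly death/survive coin flip per subject) is only a cartoon of the kind of model the talk describes, and the annual mortality rates are invented for illustration, not estimates.

```python
import random

def simulate_life_years(quit, n=10000, horizon=40, seed=5):
    """Toy micro-simulation: each year a subject dies with a constant
    annual probability that is higher for continuing smokers than for
    quitters. Returns mean life-years over the horizon. The rates
    (0.03 vs 0.02) are purely illustrative assumptions."""
    rng = random.Random(seed)
    q = 0.02 if quit else 0.03
    total = 0
    for _ in range(n):
        for _year in range(horizon):
            if rng.random() < q:
                break
            total += 1
    return total / n
```

A real model of this kind would use age- and smoking-status-specific mortality, relapse dynamics, and accumulated treatment and care costs per cycle; the life-year difference between the quit and non-quit arms is what feeds the cost-per-life-year-saved ratio.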

A Flexible Model for Correlated Medical Costs, with Application to Medical Expenditure Panel Survey Data
Jinsong Chen1, *Lei Liu2, Tina Shih3, Daowen Zhang4 and Thomas Severini2
1University of Illinois, Chicago
2Northwestern University
3M. D. Anderson Cancer Center
4North Carolina State University
[email protected]

We propose a flexible model for correlated medical cost data with several appealing features. First, the mean function is partially linear. Second, we do not specify the distributional form for the response. Third, the covariance structure of medical costs has a semiparametric form. We use the extended generalized estimating equations to simultaneously estimate all parameters of interest. B-splines are used to estimate unknown functions, and a modification to the Akaike Information Criterion is proposed for selecting the knots in spline bases. A simulation study is conducted to assess the performance of our method. Finally, we apply the model to correlated medical costs in the Medical Expenditure Panel Survey (MEPS) dataset.

A Bivariate Copula Random-Effects Model for Length of Stay and Cost
Xiaoqin Tang1, Zhehui Luo2 and *Joseph Gardiner2
1Allegheny Health Network
2Michigan State University
[email protected]

Copula models and random effect models are becoming increasingly popular for modeling dependencies or correlations between random variables. Recent applications appear in such fields as economics, finance and insurance, and survival analysis. We give a brief overview of the principles of construction of such copula models from the Farlie-Gumbel-Morgenstern, Gaussian, and Archimedean families, including the Frank, Clayton, and Gumbel families. We develop a new flexible joint model in which correlated measurement errors are modeled by copulas, and incorporate a cluster-level random effect to account for individual and within-cluster correlations simultaneously. In an empirical application our proposed approach attempts to capture the various dependence structures of hospital length of stay and cost (symmetric or asymmetric) in the copula function. It takes advantage of the relative ease in specifying the marginal distributions and the introduction of within-cluster correlation based on the cluster-level random effects.
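One appeal of the Archimedean families mentioned above is that they are easy to simulate from. As a sketch (my own helper, not part of the paper), the Clayton copula can be sampled with the gamma-frailty (Marshall-Olkin) construction, which is also the route by which a shared frailty induces within-cluster dependence.

```python
import random

def clayton_sample(theta, n, seed=11):
    """Draw n pairs (u, v) from a Clayton copula with dependence
    parameter theta > 0 via the gamma-frailty (Marshall-Olkin)
    construction: U = (1 + E/G)**(-1/theta) with G ~ Gamma(1/theta, 1)
    and E a standard exponential, independently for each coordinate."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        g = rng.gammavariate(1.0 / theta, 1.0)
        u = (1.0 + rng.expovariate(1.0) / g) ** (-1.0 / theta)
        v = (1.0 + rng.expovariate(1.0) / g) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs
```

Feeding the uniforms through the inverse CDFs of the chosen marginals (say, a lognormal for cost and a gamma for length of stay) then yields dependent bivariate draws, the "ease of specifying the marginals" the abstract highlights.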

Nonparametric Inference for the Joint Distribution of Recurrent Marked Variables and Recurrent Survival Time
Laura Yee and *Gary Chan


University of Washington
[email protected]
Times between recurrent medical events may be correlated with the cost incurred at each event. We discuss a nonparametric estimator for the joint distribution of recurrent events and recurrent medical costs in right-censored data. We also derive the asymptotic variance of our estimator and a test for equality of recurrent marker distributions, and present simulation studies to demonstrate the performance of our point and variance estimators. Our estimator is shown to perform well for a wide range of levels of correlation, demonstrating that our estimators can be employed in a variety of situations when the correlation structure may be unknown in advance. We apply our methods to hospitalization events and their corresponding costs in the second Multicenter Automatic Defibrillator Implantation Trial (MADIT-II), which was a randomized clinical trial studying the effect of implantable cardioverter-defibrillators in preventing ventricular arrhythmia.

Session 63: Adaptive Design and Sample Size Re-Estimation

Methods for Flexible Sample-Size Design in Clinical Trials
*Gang Li1, Weichung Shih2 and Yining Wang1

1Johnson & Johnson
2Rutgers University
[email protected]
Sample size plays a crucial role in clinical trials. Flexible sample-size designs, as a part of the more general category of adaptive designs that utilize interim data from the current trial, have been a popular topic in recent years. In this paper, we give a comparative review of four related methods for such a design. The likelihood method uses the likelihood ratio test with an adjusted critical region. The weighted method adjusts the test statistic with given weights rather than the critical region. The dual test method requires both the likelihood ratio statistic and the weighted statistic to be in the unadjusted critical region. The promising zone approach uses the likelihood ratio statistic with the unadjusted region with other constraints. All four methods preserve the type-I error rate. We explore their properties and compare their relationships and merits. We delineate what is necessary to specify in the study protocol to ensure the validity of the statistical procedure and what can be kept implicit in the protocol so that more flexibility can be attained for confirmatory phase III trials in meeting regulatory requirements.
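The weighted and dual test ideas can be written down in a few lines for the simplest case of two stages with known variance. This sketch follows my reading of the abstract (prespecified weights for the weighted statistic; requiring both statistics to clear the unadjusted boundary for the dual test) and is not the paper's full treatment.

```python
import math

def weighted_z(z1, z2, w1, w2):
    """Weighted combination of stage-wise z statistics with
    prespecified weights; its null distribution stays N(0, 1) even if
    the stage-2 sample size is changed at the interim."""
    return (math.sqrt(w1) * z1 + math.sqrt(w2) * z2) / math.sqrt(w1 + w2)

def dual_test(z1, z2, w1, w2, n1, n2_actual, crit=1.96):
    """Reject only if BOTH the weighted statistic and the ordinary
    pooled z statistic (re-weighted by the realized sample sizes,
    standing in for the likelihood ratio statistic) clear the
    unadjusted critical value."""
    zw = weighted_z(z1, z2, w1, w2)
    zp = ((math.sqrt(n1) * z1 + math.sqrt(n2_actual) * z2)
          / math.sqrt(n1 + n2_actual))
    return zw > crit and zp > crit
```

Because the dual test rejects only when both statistics reject, its type-I error is bounded by that of the valid weighted test, while the pooled-statistic requirement guards against rejections driven by the weighting alone.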

Blinded Sample Size Re-estimation in Trials with Survival Outcomes and Incomplete Information
Thomas Cook
University of Wisconsin-Madison
[email protected]
In many large multicenter clinical trials, especially in cardiology, the primary outcome is a composite of fatal and non-fatal events. Study power is determined by the total number of subjects with at least one primary event, and the sample size and duration of follow-up are selected to achieve the target number of events given an assumed underlying survival distribution. Furthermore, potential primary events typically require adjudication by an independent event classification committee (ECC). As the study progresses, information becomes available to assess these design assumptions, allowing blinded adjustments to be made to both sample size and study duration. This assessment is complicated by two factors, however. First, delays in the reporting of potential primary events result

in incomplete ascertainment of potential events and, unless properly accounted for, can lead to an underestimation of the primary event rate. Second, ECC review may be incomplete for many reported events. In this talk I will describe an estimator that simultaneously accounts for both delayed ascertainment and incomplete adjudication and show how this estimator can be used to make the desired design modifications.

SMART with Adaptive Randomization
*Ken Cheung1, Bibhas Chakraborty2 and Karina Davidson1

1Columbia University
[email protected]

An implementation study is an important tool for deploying state-of-the-art treatments from clinical trials into a treatment program, with the dual goals of learning about the effectiveness of the treatments and improving the quality of care for patients enrolled into the program. In this talk, I will introduce a SMART-based methodology to optimize a treatment program of dynamic treatment regimens (DTRs) for patients with depression post acute coronary syndrome. The proposed method involves a novel application of adaptive randomization aimed to address three main concerns of an implementation study: it allows incorporation of historical data or opinions, it includes randomization for learning purposes, and it aims to improve care via adaptation throughout the program. By simulation, we illustrate that the inputs from historical data are important for the program performance as measured by the expected outcomes of the enrollees, but also show that the adaptive randomization scheme is able to compensate for poorly specified historical inputs by improving patient outcomes within a reasonable horizon. The simulation results also confirm that the proposed design allows efficient learning of the treatments by alleviating the curse of dimensionality.

Session 64: Recent Development in Personalized Medicine and Survival Analysis

Estimating the Optimal Dynamic Treatment Regime from a Classification Perspective: C-learning
Baqun Zhang1 and *Min Zhang2

1Renmin University of China
2University of Michigan
[email protected]

Personalized medicine, which is focused on making treatment decisions for individuals based on their own available information, has received much attention lately. Treatment of chronic disease often involves a series of decisions on treatment at multiple points, and personalized medicine can be formalized using the concept of dynamic treatment regimes. A dynamic treatment regime is a set of sequential decision rules that determine how to treat a patient over time using the patient’s information available at each decision point, and the optimal dynamic treatment regime is the one that leads to the most favorable outcome on average if followed by the patient population. Currently, the two main approaches for identifying the optimal dynamic treatment regime are Q- and A-learning, where Q-learning involves modeling the outcome and A-learning involves modeling part of the outcome (e.g., the treatment contrast) and the treatment assignment. A key concern for Q- and A-learning is model misspecification. Recently, Zhang et al. (2013) proposed a method based on maximizing a doubly robust augmented inverse probability weighted estimator (AIPWE) of the population mean outcome over a restricted class of regimes, and this method has been shown to be


more robust to model misspecification. In practice, how to choose a restricted class of regimes under consideration may not be easy. In this study, we propose to recast the problem of identifying the optimal treatment regime as a classification problem: identifying the optimal treatment regime is equivalent to minimizing a weighted classification error, for which many existing and powerful machine learning methods can be used. This method enjoys the robustness property of the Zhang et al. (2013) method by likewise using a doubly robust AIPWE estimator for the treatment contrast function and, in addition, this framework is much more flexible.
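The classification recasting can be made concrete in a stripped-down setting. The toy below is my own one-covariate illustration: it uses a simple inverse-probability-weighted contrast (not the doubly robust AIPWE of the paper) and searches threshold rules by exhaustive enumeration rather than a machine learning classifier.

```python
def c_learning_threshold(data, pi=0.5):
    """Toy illustration of recasting regime estimation as weighted
    classification: with one covariate x and candidate rules
    'treat if x > c', pick the threshold c minimizing the weighted
    misclassification of the sign of a simple IPW treatment contrast.
    data is a list of (x, a, y) with a in {0, 1}; pi is the known
    randomization probability."""
    # signed IPW weight: positive when treatment looked beneficial
    contrasts = [(x, y / pi if a == 1 else -y / (1 - pi)) for x, a, y in data]
    xs = sorted(set(x for x, _ in contrasts))
    best_c, best_loss = xs[0] - 1.0, float("inf")
    for c in [xs[0] - 1.0] + xs:
        # loss: total |weight| of points whose rule decision (x > c)
        # disagrees with the sign of their contrast
        loss = sum(abs(w) for x, w in contrasts if (x > c) != (w > 0))
        if loss < best_loss:
            best_c, best_loss = c, loss
    return best_c
```

In the full method, the |weight| terms come from the doubly robust contrast estimates and any off-the-shelf weighted classifier (trees, SVMs, etc.) replaces the threshold search, which is the flexibility the abstract points to.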

Parsimonious and Robust Treatment Strategies for Target Populations Using Clinical Trial Data
*Yingqi Zhao1 and Donglin Zeng2

1University of Wisconsin-Madison
2The University of North Carolina at Chapel Hill

[email protected]

Individualized treatment rules, which recommend treatments based on individual patient characteristics, have gained increasing interest in clinical practice. Properly planned and conducted randomized clinical trials are ideal for constructing individualized treatment rules. However, it is often a concern that they are susceptible to lack of representativeness, which limits the applicability of the derived rules to a future large population. Furthermore, in order to inform clinical practice, it is crucial to provide rules that are easy to interpret and disseminate. To tackle these issues, using data from a single clinical trial study, we propose a two-stage procedure to derive the best parsimonious rule to maximize the proportion of future patients receiving their optimal treatments. The procedure is robust over a wide range of possible covariate distributions in the target population, with minimal requirements on the mean and covariance of the patients who benefit from each treatment. The practical utility and the favorable performance of the methodology are demonstrated using extensive simulations and a real data application.

A Sieve Semiparametric Maximum Likelihood Approach for Regression Analysis of Bivariate Interval-censored Failure Time Data
*Qingning Zhou1, Tao Hu2 and Jianguo Sun1

1University of Missouri-Columbia
2Capital Normal University

[email protected]

Interval-censored failure time data arise in a number of fields, and many authors have discussed various issues related to their analysis. However, most of the existing methods are for univariate data, and there exists only limited research on bivariate data, especially on regression analysis of bivariate interval-censored data. We present a class of semiparametric transformation models for the problem and, for inference, a sieve maximum likelihood approach is developed. The model provides great flexibility, in particular including the commonly used proportional hazards model as a special case, and in the approach, Bernstein polynomials are employed. The strong consistency and asymptotic normality of the resulting estimators of regression parameters are established and, furthermore, the estimators are shown to be asymptotically efficient. Extensive simulation studies are conducted and indicate that the proposed method works well for practical situations.

Session 65: New Strategies to Identify Disease Associated Genomic Biomarkers

Discovering Disease Associated Molecular Interactions Using Discordant Correlation
Charlotte Siska and *Katerina Kechris
University of Colorado at Denver
[email protected]

A common approach for identifying molecular features (such as transcripts or proteins) associated with a biological perturbation or disease is testing for differential expression or abundance in -omics data. However, this approach is limited for studying interactions between molecular features, which would give a deeper knowledge of the relevant molecular systems and pathways. As an alternative, differentially correlated pairs of features can be identified that change correlation based on groups of samples (e.g., wildtype or mutant) or subjects (e.g., cases or controls). We have developed a method for this purpose called the Discordant method, which determines the posterior probability that a pair of features has discordant correlation between phenotypic groups using mixture models and the EM algorithm. We compare our method to existing approaches: one that uses Fisher’s transformation and another that uses an empirical Bayes joint probability model. In simulations we demonstrate that while all the methods have similar specificity, the Discordant method has better sensitivity and is more powerful at identifying pairs that have a correlation coefficient close to 0 in one group and a largely positive or negative correlation coefficient in the other group. Using glioblastoma data from The Cancer Genome Atlas (TCGA), which has matched samples between miRNA and mRNA, we find that the Discordant method finds relatively more glioblastoma-related miRNAs compared to other methods. The simulations and TCGA data results indicate that the Discordant method is beneficial for identifying interactions associated with phenotypic groups or disease severity.
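The Fisher's-transformation comparator mentioned above, the classical baseline against which the Discordant method is compared, fits in a few lines. This sketch is the standard textbook test for equality of two correlations, not the authors' mixture-model method.

```python
import math

def fisher_z(r):
    """Fisher's variance-stabilizing transformation of a correlation."""
    return 0.5 * math.log((1 + r) / (1 - r))

def diff_corr_pvalue(r1, n1, r2, n2):
    """Two-sided p-value for H0: rho1 = rho2 across two groups, using
    Fisher's transformation (approximate variance 1 / (n - 3) per
    group) and a normal reference distribution."""
    z = ((fisher_z(r1) - fisher_z(r2))
         / math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3)))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```

Applied to every feature pair across the two phenotypic groups, this produces the per-pair p-values that the Discordant method's posterior probabilities are benchmarked against.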

Joint Analysis of Genomic Data from Different Sources using Kernel Machine Regression with Multiple Kernels
*Michael Wu and Ni Zhao
Fred Hutchinson Cancer Research Center
[email protected]

Comprehensive understanding of complex trait etiology requires examination of multiple sources of genomic variability. Integrative analysis of these data sources promises elucidation of the biological processes underlying particular outcomes. Consequently, many large GWAS consortia are expanding to simultaneously examine the joint role of DNA methylation. However, it is unclear how to leverage both data types to determine if particular genetic regions are related to traits of interest. Therefore, we propose to use the powerful kernel machine framework for first testing the cumulative effect of both epigenetic and genetic variability on a trait, and for subsequent mediation analysis to understand the mechanisms by which the genomic data types influence the trait. Specifically, we use a multi-kernel approach to model the effects of methylation and genotype on a continuous outcome while controlling for potential confounders. We demonstrate through simulations and real data applications that our proposed testing approach often improves power to detect trait associated genes, while protecting type I error, and that our mediation analysis framework can often correctly elucidate the mechanisms by which genetic and epigenetic variability influence traits.
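As a rough sketch of the multi-kernel idea (not the authors' implementation, and omitting the variance-component score test they would use for inference), genotype and methylation similarity matrices can be combined as a weighted kernel sum inside a kernel machine / linear mixed model fit; all data and weights below are toy assumptions.

```python
import numpy as np

def linear_kernel(Z):
    """Similarity matrix from a (samples x features) block, e.g. genotype dosages."""
    Zc = Z - Z.mean(axis=0)
    return Zc @ Zc.T / Z.shape[1]

def multi_kernel_fit(K_list, weights, X, y, lam=1.0):
    """Kernel machine fit with kernel K = sum_j w_j K_j: covariate effects beta by
    generalized least squares, then the fitted kernel effect h (a ridge-type solve)."""
    n = len(y)
    K = sum(w * Kj for w, Kj in zip(weights, K_list))
    Vinv = np.linalg.inv(K + lam * np.eye(n))
    beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
    h = K @ Vinv @ (y - X @ beta)          # BLUP-style estimate of the kernel effect
    return beta, X @ beta + h

rng = np.random.default_rng(7)
n = 80
G = rng.integers(0, 3, size=(n, 30)).astype(float)     # toy genotype dosages
M = rng.normal(size=(n, 20))                            # toy methylation values
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one confounder
y = X @ np.array([1.0, 0.5]) + 0.3 * G[:, 0] + 0.3 * M[:, 0] + rng.normal(scale=0.5, size=n)
beta, fitted = multi_kernel_fit([linear_kernel(G), linear_kernel(M)], [1.0, 1.0], X, y)
```

Testing the cumulative effect of both data types then amounts to testing whether the variance component attached to the combined kernel is zero, which the actual method does with a score statistic.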

2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17 | 87

Transformed Low-rank ANOVA Models for High Dimensional Variable Selection
*Jianhua Hu1 and Yoonsuh Jung2
1M. D. Anderson Cancer Center
2University of [email protected]

For high dimensional genetic data, an important problem is to search for associations between genetic variables and a phenotype, typically a discrete variable (diseased versus normal). A conventional solution is to characterize such relationships through regression models in which a phenotype is treated as the response variable and genetic variables are treated as the covariates. Not surprisingly, this approach incurs the challenging problem of the number of variables being much larger than the number of observations. We propose a statistical framework of expressing the transformed mean of the genetic variables in the exponential distribution family via ANOVA-type models in which a low-rank interaction space captures the association between phenotype and genetic variables. This alternative method transforms the variable selection problem into a well-posed problem with the number of observations larger than the number of genetic variables. We also develop a new model selection criterion based on the Bayesian information criterion for the new model framework with a diverging number of parameters. In the talk, we focus on a specific application to genome-wide association studies.

Proper Use of Allele-Specific Expression Improves Statistical Power for cis-eQTL Mapping with RNA-Seq Data
*Yijuan Hu1, Wei Sun2, Jung-Ying Tzeng3 and Charles Perou2

1Emory University
2The University of North Carolina at Chapel Hill
3North Carolina State University
[email protected]

Studies of expression quantitative trait loci (eQTLs) offer insight into the molecular mechanisms of loci that were found to be associated with complex diseases, and the mechanisms can be classified into cis- and trans-acting regulation. At present, high-throughput RNA sequencing (RNA-seq) is rapidly replacing expression microarrays to assess gene expression abundance. Unlike microarrays, which only measure the total expression of each gene, RNA-seq also provides information on allele-specific expression (ASE), which can be used to distinguish cis-eQTLs from trans-eQTLs and, more importantly, enhance cis-eQTL mapping. However, assessing the cis-effect of a candidate eQTL on a gene requires knowledge of the haplotypes connecting the candidate eQTL and the gene, which cannot be inferred with certainty. The existing two-stage approach that first phases the candidate eQTL against the gene and then treats the inferred phase as observed in the association analysis tends to attenuate the estimated cis-effect and reduce the power for detecting a cis-eQTL. In this article, we provide a maximum-likelihood framework for cis-eQTL mapping with RNA-seq data. Our approach integrates the inference of haplotypes and the association analysis into a single stage, and is thus unbiased and statistically powerful. We also develop a pipeline for performing a comprehensive scan of all local eQTLs for all genes in the genome while controlling the false discovery rate, and implement the methods in a computationally efficient software program. The advantages of the proposed methods over the existing ones are demonstrated through realistic simulation studies and an application to empirical breast cancer data from The Cancer Genome Atlas project.

Session 66: Recent Advances in Empirical Likelihood Method

Jackknife Empirical Likelihood for U-Statistics with Estimated Constraints
*Fei Tan and Hanxiang Peng
Indiana University-Purdue University
[email protected]

In this talk, the jackknife empirical likelihood for U-statistics (Jing et al., 2009; Peng et al., 2015) is generalized to allow for a finite and growing number of estimated constraints. The latter is needed to handle naturally occurring nuisance parameters in semiparametric models. The developed theory is applied to derive jackknife empirical likelihood based tests and confidence sets for the variance in a linear regression model and the variances in a balanced random effects model with estimated constraints; for Theil's test about the slope in a simple linear regression with a growing number of estimated constraints; and for the Wilcoxon signed rank test on symmetry about an unknown center. Simulations are conducted to study their numerical behaviors.
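To make the construction concrete: the jackknife empirical likelihood replaces a U-statistic by its jackknife pseudo-values, which are then treated as approximately independent and fed into Owen's empirical likelihood for a mean. Below is a self-contained illustrative sketch (not the authors' code), using the sample variance as the degree-2 U-statistic.

```python
import numpy as np

def jackknife_pseudovalues(x, stat):
    """Pseudo-values T_i = n*T(x) - (n-1)*T(x without i); for a U-statistic their
    average equals the U-statistic itself."""
    n = len(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    return n * stat(x) - (n - 1) * loo

def el_log_ratio(v, mu):
    """-2 log empirical likelihood ratio for the mean of v at value mu (Owen's EL),
    solving the Lagrange-multiplier equation by Newton's method."""
    d = v - mu
    lam = 0.0
    for _ in range(100):
        w = 1.0 + lam * d
        step = np.sum(d / w) / np.sum(d**2 / w**2)
        lam += step
        if abs(step) < 1e-12:
            break
    return 2.0 * np.sum(np.log1p(lam * d))  # approx. chi-square(1) under H0

# sample variance (ddof=1) is a U-statistic of degree 2
sample_var = lambda x: np.var(x, ddof=1)

rng = np.random.default_rng(3)
x = rng.normal(size=100)
pseudo = jackknife_pseudovalues(x, sample_var)
```

The JEL statistic for a hypothesized variance `v0` is then `el_log_ratio(pseudo, v0)`, calibrated against a chi-square(1) quantile; the talk's contribution concerns what happens when the constraints themselves contain estimated nuisance parameters, which this sketch does not cover.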

Jackknife Empirical Likelihood for Order-restricted Statistical Inference with Missing Data
*Heng Wang and Ping-Shou Zhong
Michigan State University
[email protected]

We consider testing means with an increasing order or a decreasing order for data with missing values. The missing values are imputed nonparametrically under the missing at random assumption. For data with imputation, the classical likelihood ratio test designed for testing the order-restricted means is no longer applicable since the likelihood no longer exists. This paper proposes a novel test based on the jackknife empirical likelihood (JEL). It is shown that the JEL ratio statistic evaluated under the null hypothesis converges to a chi-bar distribution. A simulation study shows that our proposed test maintains the nominal level well under the null and has prominent power under the alternative. The test is also robust for both normally and non-normally distributed data.

Improving Estimation in Structural Equation Models: An Easy Empirical Likelihood Approach
*Shan Wang and Hanxiang Peng
Indiana University-Purdue University
[email protected]

In a structural equation model (SEM), if the covariance of two variables is known, then it can be used to improve efficiency. This is realized by replacing the covariance of the two variables in the structured covariance matrix with the known covariance. This method will not work if side information is not given in covariances. In fact, SEMs only use information up to second moments. For example, random errors are modeled as uncorrelated with covariates. It is common in statistics that random errors are modeled as independent of covariates, which cannot be exploited by current SEMs. In this talk, we propose an easy empirical likelihood approach to incorporate side information in SEMs. We demonstrate the efficiency gain by modeling random errors as (1) independent of covariates and (2) symmetric about zero. We show that the implementation of the method is extremely convenient and can be done with existing software. We report extensive simulation results to exhibit the efficiency gain.

Composite Empirical Likelihood
*Nicole Lazar and Adam Jaeger


University of Georgia
[email protected]

A virtue of a non-parametric approach such as empirical likelihood over parametric counterparts to inference is that one does not have to specify a distributional family. This provides robustness and flexibility, but often comes with a computational cost. Composite likelihood, another alternative to classical parametric likelihoods, builds up a complete likelihood piecewise and hence is computationally efficient. In this talk, I will propose and define the "composite empirical likelihood." The new construct builds a likelihood out of empirical likelihood pieces, thereby inheriting desirable properties from both "parents."

Session 67: New Advances in Adaptive Design and Analysis of Clinical Trials

Sensitivity Analyses for Missing Not at Random (MNAR) in Clinical Trials
Peter Zhang
Otsuka Pharmaceutical Development & Commercialization, Inc.
[email protected]

Missing data has become a focus area for regulatory authorities (FDA, EMA). The FDA requires the sponsor to pre-specify a sensitivity analysis plan for missing not at random (MNAR) data to support the primary analysis under the missing at random (MAR) assumption. Most MNAR methods (Diggle and Kenward, 1994) have treated all observations with dropout as if they fall within the same dropout type. In practice, different dropout reasons may be related to the outcomes in different ways; for example, detailed dropout reasons for this study are: adverse events (AE), lack of efficacy (LOE), lost to follow-up, protocol deviation, sponsor discontinued study, subject met (protocol specified) withdrawal criteria, subject was withdrawn from participation by the investigator, and subject withdrew consent to participate. Dropout due to an adverse event (AE) or lack of efficacy (LOE) may lead to MNAR dropout. Withdrawal of consent may also lead to MNAR dropout; however, it is debatable whether a dropout caused by withdrawal of consent is MAR or MNAR. Apart from AE, LOE, and withdrawal of consent, all the other dropout reasons may be assumed to be either MCAR or MAR dropout. Pattern Mixture Models (PMM) based on Multiple Imputation (MI) with mixed missing data mechanisms will be used to investigate the response profile of dropout patients by last dropout reason under the MNAR mechanism.
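A toy sketch of the reason-specific pattern-mixture idea via delta-adjusted multiple imputation: dropouts for MNAR-type reasons (here AE and LOE) get an assumed penalty added to their MAR imputations. The data, delta values, and function name are all hypothetical; a real analysis would impute from a fitted model and pool with full Rubin's rules.

```python
import numpy as np

def delta_adjusted_mi(y, dropout_reason, deltas, n_imp=50, rng=None):
    """Pattern-mixture multiple imputation: draw MAR imputations from the observed
    outcomes, then shift each by a reason-specific delta (the MNAR adjustment).
    Returns the point estimate of the mean pooled across imputations."""
    rng = rng or np.random.default_rng(0)
    obs = y[~np.isnan(y)]
    estimates = []
    for _ in range(n_imp):
        y_imp = y.copy()
        for i in np.flatnonzero(np.isnan(y)):
            draw = rng.choice(obs) + rng.normal(0.0, obs.std() / 10)  # crude MAR draw
            y_imp[i] = draw + deltas.get(dropout_reason[i], 0.0)      # MNAR shift
        estimates.append(y_imp.mean())
    return float(np.mean(estimates))

y = np.array([5.0, 6.0, np.nan, 4.5, np.nan, 5.5])
reasons = ["completer", "completer", "AE", "completer", "LOE", "completer"]
deltas = {"AE": -1.0, "LOE": -2.0}           # assumed penalties for MNAR reasons
est_mnar = delta_adjusted_mi(y, reasons, deltas)
est_mar = delta_adjusted_mi(y, reasons, {})  # delta = 0 reduces to the MAR analysis
```

Varying the deltas over a grid and reporting when the study conclusion flips is the usual way such a sensitivity analysis is summarized.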

Moment-based Covariate Adjustment Method for Treatment Effect Estimation in Randomized Clinical Trials
*Xiaofei Wang1, Junling Ma2 and Stephen George1
1Duke University
2Shanghai University of Finance and Economics
[email protected]

In the analysis of randomized clinical trials, the covariates that correlate with the primary outcome are often adjusted for in the estimation of the treatment effect in order to improve efficiency and to compensate for any lack of baseline covariate balance between treatment arms. There are many different techniques for adjusting for baseline covariates. One commonly used method, which allows estimation of the conditional treatment effect, is multivariable regression modelling. We will review a class of new covariate adjustment methods that incorporate the covariates in treatment effect estimation and allow estimation of the marginal treatment effect. We will discuss a new moment-based covariate adjustment method that constrains all higher order moments of the covariate distribution. The proposed method follows the same spirit as the new class of covariate adjustment methods, but its efficiency gain does not require correct specification of the parametric form of any regression model. Asymptotic properties of the proposed method are established. Simulation studies show that the proposed method has nice finite sample properties and performs well compared to existing methods. The proposed method is illustrated with a data example.

On Design and Analysis of a Stratified Biomarker Time-to-Event Clinical Trial in the Presence of Measurement Error
Aiyi Liu
National Institutes of Health
[email protected]

Clinical trials utilizing predictive biomarkers have become a topic of increasing research in the era of personalized medicine. We confine our attention to the stratified biomarker design where patients with the same biomarker status are randomly assigned to either an experimental arm or the standard of treatment. The primary interest of a stratified biomarker design is to investigate whether patients respond differently to treatment based on their biomarker status. Despite the advancements in molecular assays, correctly identifying the biomarker status remains a challenging task. We analytically demonstrate the profound adverse effects of misclassified biomarker status on the estimates of treatment effect, biomarker effect, and treatment-biomarker interaction, the corresponding confidence intervals, the power of the tests, and the required sample sizes. We further propose respective remedies that tackle the misclassified biomarkers in the design and analysis phases of clinical trials. We illustrate the serious consequences of ignoring the classification error and demonstrate the performance of the proposed solutions using simulations.

Session 68: Design and Analysis in Drug Combination Studies

Design and Statistical Analysis of Multidrug Combinations in Preclinical Studies and Clinical Trials
Ming Tan
Georgetown University
[email protected]

Combination therapy is the hallmark of therapies for cancer, viral or microbial infections, hypertension, and other diseases involving complex biological networks. Synergistic drug combinations, which are more effective than predicted from summing the effects of the individual drugs, often achieve an increased therapeutic index. Because drug effect is dose-dependent, multiple doses of an individual drug need to be examined, yielding a rapidly increasing number of combinations and a challenging high dimensional statistical modeling problem. The lack of proper design and analysis methods for multi-drug combination studies has resulted in many missed therapeutic opportunities. Although systems biology holds the promise to unveil complex interactions within biological systems, knowledge of networks remains predominantly at the level of topology. We proposed a novel two-stage procedure starting with an initial selection that utilizes an in silico model built upon experimental data of single drugs and current systems biology information to obtain the maximum likelihood estimate. In this talk, I will present an efficient experimental design on selected multi-drug combinations, statistical modeling of the resulting data, and the proof of its statistical properties. Then I will present an adaptive Bayesian trial design for multidrug combinations based on this modeling concept.

Bayesian Hierarchical Monotone Regression I-splines for Dose-Response Assessment and Drug-Drug Interaction Analysis
*Gary Rosner1, Violeta Hennessey2 and Veerabhadran Baladandayuthapani3
1Johns Hopkins University
2Amgen Inc.
3M. D. Anderson Cancer Center
[email protected]

We provide a practical and flexible method for dose-response modeling and drug-drug interaction analysis. This semi-parametric Bayesian hierarchical model allows a meta-analysis of independent repeated dose-response experiments. We use monotone regression I-splines to estimate the mean dose-response function for functional in vitro dose-response data and incorporate the spline-based model in a Bayesian hierarchical framework. Posterior inference on quantities of interest is facilitated by Markov chain Monte Carlo (MCMC). Inference focuses on estimating inhibitory concentrations and the Loewe Interaction Index for drug-drug interaction analysis. We compare our approach to analyses using a parametric Emax model.
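The monotonicity device behind I-spline regression is that nonnegative coefficients on nondecreasing basis functions yield a nondecreasing curve. A rough frequentist sketch of that device (using smooth Gaussian-CDF ramps in place of true I-splines, and nonnegative least squares in place of the authors' Bayesian hierarchy; centers and bandwidth are illustrative):

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import norm

def monotone_basis(x, centers, h=0.1):
    """Smooth nondecreasing ramp functions (Gaussian CDFs) playing the role of
    I-splines: any nonnegative combination plus an intercept is nondecreasing."""
    cols = [np.ones_like(x)] + [norm.cdf((x - c) / h) for c in centers]
    return np.column_stack(cols)

def fit_monotone(x, y, centers):
    coefs, _ = nnls(monotone_basis(x, centers), y)  # nonnegativity => monotone fit
    return coefs

x = np.linspace(0.0, 1.0, 60)
y = 1.0 / (1.0 + np.exp(-8.0 * (x - 0.5)))   # a smooth sigmoidal "dose-response"
centers = np.linspace(0.05, 0.95, 10)
coefs = fit_monotone(x, y, centers)
fit = monotone_basis(x, centers) @ coefs
```

In the Bayesian version described above, the nonnegativity constraint is enforced through the prior on the spline coefficients, and borrowing across repeated experiments happens through the hierarchy.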

A Bayesian Nonparametric Approach for Synergy Assessment in Drug Combination Studies
Chenguang Wang
Johns Hopkins University
[email protected]

For complex diseases, drug combinations are a common and promising strategy for achieving a synergistic treatment effect and reducing dosages and toxicity. Because of the intrinsic complexity of biological systems, however, it is a formidable challenge to specify a parametric probability model that can appropriately describe the mechanisms by which the individual drugs achieve their single and joint effects. In this paper, we propose a Bayesian non-parametric approach for in vitro drug combination studies which models both dose-response and dose-toxicity relationships in a flexible manner. A Bayesian synergism evaluation procedure is also developed using Loewe additivity as the reference.

Session 69: Recent Developments in Empirical Likelihood Methodologies: Diagnostic Studies, Goodness-of-Fit Testing, and Missing Values

Jackknife Empirical Likelihood Confidence Regions for the Evaluation of Continuous-scale Diagnostic Tests with Verification Bias
Binhuan Wang1 and *Gengsheng Qin2

1New York University
2Georgia State University
[email protected]

Wang and Qin proposed various bias-corrected empirical likelihood confidence regions for any two of the three parameters, sensitivity, specificity, and cut-off value, with the remaining parameter fixed at a given value in the evaluation of a continuous-scale diagnostic test with verification bias. In order to apply those methods, quantiles of the limiting weighted chi-squared distributions of the empirical log-likelihood ratio statistics must be estimated. To facilitate application and reduce the computational burden, in this paper, jackknife empirical likelihood-based methods are proposed for any pair of sensitivity, specificity, and cut-off value, and asymptotic results can be derived accordingly. The proposed methods can be easily implemented to construct confidence regions for the evaluation of continuous-scale diagnostic tests with verification bias. Simulation studies are conducted to evaluate the finite sample performance and robustness of the proposed jackknife empirical likelihood-based confidence regions in terms of coverage probabilities. Finally, a real example is provided to illustrate the application of the new methods.

Jackknife Empirical Likelihood Goodness-Of-Fit Tests For Vector U-statistics
Fei Tan1, Qun Lin2, Wei Zheng1 and *Hanxiang Peng1

1Indiana University-Purdue University
2Eli Lilly and Company
[email protected]

Motivated by applications to goodness-of-fit U-statistic testing, the jackknife empirical likelihood (Jing et al.) is justified with two alternative approaches and Wilks theorems for vector U-statistics are proved. This generalizes Owen's empirical likelihood for vectors to vector U-statistics and includes the JEL for U-statistics with side information as a special case. The results are extended to allow the constraints to use estimated criteria functions and the number of constraints to grow with the sample size. The developed theory is applied to derive JEL tests or confidence sets for some useful vector U-statistics associated with the chi-square statistic, Cronbach's coefficient alpha, Pearson's correlation, the concordance correlation coefficient, Cohen's kappa, Goodman and Kruskal's gamma, Kendall's tau-b, and interclass correlation; for a linear mixed effects model; for a balanced random effects model; for models with over-dispersion and zero-inflated Poisson models; for U-quantiles including the Hodges-Lehmann median, Gini's mean difference and Theil's test; and for the simplicial depth function. A small simulation is conducted to evaluate the tests.

Jackknife Empirical Likelihood Interval Estimators for the Gini Index
*Dongliang Wang1, Yichuan Zhao2 and Dirk Gilmore2
1SUNY Upstate Medical University
2Georgia State University
[email protected]

A variety of statistical methods have been developed for the interval estimation of the Gini index, one of the most widely used measures of economic inequality. However, there is still plenty of room for improvement in terms of coverage accuracy and interval length. In this paper, we propose interval estimators for the index and the difference of two Gini indexes via jackknife empirical likelihood. By expressing the estimating equations in the form of U-statistics, our method can be applied as simply as the standard empirical likelihood for a univariate mean, and it avoids maximizing the profile empirical likelihood for the difference of two Gini indexes. Simulation studies show that our method is comparable to existing empirical likelihood methods in terms of coverage accuracy, but yields shorter intervals. The proposed methods are illustrated via analyzing a real data set.
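The U-statistic representation mentioned above is concrete: the Gini index is the Gini mean difference (a degree-2 U-statistic) divided by twice the mean. A hedged sketch follows, pairing that estimator with a simple jackknife pseudo-value interval; the actual paper calibrates with empirical likelihood rather than the normal approximation shown here.

```python
import numpy as np

def gini_index(x):
    """Gini index written through U-statistics: Gini mean difference / (2 * mean)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    gmd = np.abs(x[:, None] - x[None, :]).sum() / (n * (n - 1))  # unbiased mean difference
    return gmd / (2.0 * x.mean())

def jackknife_pseudo_ci(x, stat, z=1.96):
    """Normal-approximation interval from jackknife pseudo-values; the JEL interval
    replaces this normal calibration with an empirical likelihood one."""
    n = len(x)
    loo = np.array([stat(np.delete(x, i)) for i in range(n)])
    pseudo = n * stat(x) - (n - 1) * loo
    se = pseudo.std(ddof=1) / np.sqrt(n)
    return pseudo.mean() - z * se, pseudo.mean() + z * se

rng = np.random.default_rng(11)
x = rng.exponential(size=200)   # population Gini of an exponential law is 1/2
g = gini_index(x)
lo, hi = jackknife_pseudo_ci(x, gini_index)
```

For the difference of two Gini indexes, the same pseudo-value trick applies to each sample separately, which is what lets the paper sidestep profile likelihood maximization.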

Jackknife Empirical Likelihood Inference with Regression Imputation and Survey Data
*Ping-Shou Zhong1 and Sixia Chen2

1Michigan State University
[email protected]

We propose jackknife empirical likelihood (EL) methods for constructing confidence intervals for the mean with regression imputation that allows ignorable or nonignorable missingness. The confidence interval is constructed based on the adjusted jackknife pseudo-values (Rao and Shao, 1992). The proposed EL ratios evaluated at the true value converge to the standard chi-square distribution under both missing mechanisms for simple random sampling. Thus the EL can be applied to construct a Wilks-type confidence interval without any secondary estimation. We then extend the proposed method to accommodate the Poisson sampling design in survey sampling. The proposed methods are compared with some existing methods in simulation studies. We also apply the proposed method to an Italian household income panel survey data set.

Session 70: Use of Simulation in Drug Development and Decision Making

Simulations: The Future of Clinical Trial Design
Ben Saville
Berry Consultants
[email protected]

Simulations are key to understanding complex clinical trial designs. They allow investigators to prospectively “see” a trial in action and iteratively refine the design to align with the scientific goals of the trial. In addition, simulations are used to produce operating characteristics that assess performance and satisfy regulatory concerns. The utility of simulations is illustrated through a cancer study of a single experimental regimen applied to three different subtypes of sarcoma. A Bayesian hierarchical model is used to allow borrowing of information across the sarcoma cohorts. The design includes frequent interim analyses to allow stopping based on futility or success. Software and regulatory issues for simulation-based designs are discussed.

Bayesian Application in Optimizing Probability of Study Success with Multiple Endpoints Setting
*Grace Li, Honghua Jiang, Shen Lei, Karen Price, Haoda Fu and David Manner
Eli Lilly and Company
[email protected]

Bayesian analysis is broadly applied in decision making throughout the drug development process, due to its intuitive framework and ability to provide direct answers to complex problems. Likewise, the graphical testing approach is also being applied more in clinical trials because of its flexibility in testing multiple endpoints while strongly controlling the familywise error rate. In this talk, we present a case study for a Phase 3 clinical trial design. A Bayesian integrated two-component prediction model was fit using virtual patient PK/PD-derived data. A set of Phase 3 studies was simulated from the posterior samples obtained via the Bayesian model. The simulations incorporated specific study characteristics, such as dropout rate and data collection scheme. A set of p-values for all endpoints within each simulated trial was calculated. Various graphical testing schemes were assessed to identify the optimal scheme with the highest probability of success. This case study is intended to demonstrate the value of optimizing the probability of study success (PrSS) by leveraging the strengths of both the Bayesian approach and the graphical testing approach in a correlated multiple endpoint setting. This application has the potential to improve decision making and increase efficiency in drug development.

Evaluation of Strategies for Designing Phase 2 Dose Finding Studies
Cristiana Mayer
Johnson & Johnson
[email protected]

Characterizing the dose response relationship for efficacy and safety remains a challenging component of drug development aiming to bring new therapeutic agents faster to the market. In Phase 2, exploring the dose response relationship and selecting the “optimal” doses can result in a significant acceleration of the overall drug development process, when done in a thorough and efficient way. The talk aims to illustrate a simulation example applied to a model-based technique to design a Phase 2 dose-finding study in the pulmonary disease therapeutic area. The MCP-Mod methodology is an efficient approach to explore the nature of the dose response relationship, estimate target doses of interest such as the minimum effective dose (MED), and adequately identify the safe and effective dose range to move into Phase 3. A specific example will be discussed to highlight the critical role of modeling and simulation, and a comparison with the old-fashioned conventional approach of pairwise multiple comparisons will be made.

Session 71: Next Generation Functional Data

Analysis of Clustered Longitudinal/Functional Data
Naisyin Wang
University of Michigan
[email protected]

Modern medical diagnostic procedures now often involve records consisting of multiple longitudinal/functional data that can be considered as clusters of assessments of certain underlying systems. We explore the use of model-based clustering approaches, with a focus on extending the scope of identification of different latent features embedded in the data, for the purpose of differentiating the observations collected from normal versus abnormal samples. Various criteria, including out-of-sample prediction, were employed to gauge the use of different types of features and the bases on which they were evaluated. The effectiveness of the new methods is demonstrated using both synthetic data and data collected through medical studies.

Functional Data Analysis for Quantifying Brain Connectivity
*Hans-Georg Mueller1, Alexander Petersen1 and Owen Carmichael2
1University of California, Davis
2Louisiana State University
[email protected]

Functional data analysis provides a toolbox for the analysis of data samples that can be viewed as being generated by repeated realizations of an underlying (and often latent) stochastic process. The application of this methodology to paired processes (X, Y) will be illustrated by quantifying resting state fMRI connectivity through measures of functional correlation between X and Y. The resulting correlations between brain hubs, and also of the voxels within hubs, can be used for the construction of subject-specific intra-hub correlation density functions as well as inter-hub and intra-hub networks. We introduce connectivity threshold functions that quantify a selected characteristic of the network as a function of the threshold, yielding one function per subject. Functional principal components can be extracted from these connectivity threshold functions and used to predict cognitive test scores.
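As background for the functional principal component step, FPCA for densely observed curves on a common grid reduces to an eigendecomposition of the sample covariance matrix. The sketch below is generic (toy sinusoidal curves, not the connectivity threshold functions of the talk).

```python
import numpy as np

def fpca(curves, n_components=2):
    """Discretized FPCA: eigendecomposition of the sample covariance of curves
    observed on a common grid. Returns mean, eigenfunctions, scores, eigenvalues."""
    mean = curves.mean(axis=0)
    centered = curves - mean
    cov = centered.T @ centered / (len(curves) - 1)
    evals, evecs = np.linalg.eigh(cov)            # ascending order
    order = np.argsort(evals)[::-1][:n_components]
    phi = evecs[:, order]                          # leading eigenfunctions (columns)
    scores = centered @ phi                        # functional PC scores
    return mean, phi, scores, evals[order]

# toy sample: noisy sinusoids with random amplitudes on a common grid
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
curves = rng.normal(1.0, 0.3, (40, 1)) * np.sin(2 * np.pi * t) + rng.normal(0, 0.05, (40, 50))
mean, phi, scores, evals = fpca(curves)
```

In the application described above, the per-subject scores extracted this way become predictors of cognitive test scores in an ordinary regression.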

Functional Principal Component Analysis of Spatial-Temporal Point Processes with Applications in Disease Surveillance
*Yehua Li1 and Yongtao Guan2

1Iowa State University
2University of Miami
[email protected]

In disease surveillance applications, the disease events are modeled by spatial-temporal point processes. We propose a new class of semiparametric generalized linear mixed Cox models for such data, where the event rate is related to some known risk factors and some unknown latent random effects. We model the latent spatial-temporal process as spatially correlated functional data, and propose composite likelihood methods based on spline approximation to estimate the mean and covariance of the latent process. By performing functional principal component analysis on the latent process, we gain deeper understanding of the correlation structure in the point process, and we propose an empirical Bayes method to predict the latent spatial random effects, which can help highlight the high risk spatial regions for the disease. Under an increasing domain and increasing knots asymptotic framework, we provide the asymptotic distribution for the parametric components in the model and the asymptotic convergence rate for the functional principal component estimators. We illustrate the methodology through a simulation study and an application to the Connecticut Tumor Registry data.

Localized Functional Principal Component Analysis
*Kehui Chen1 and Jing Lei2
1University of Pittsburgh
2Carnegie Mellon University
[email protected]

We propose localized functional principal component analysis (LFPCA), looking for orthogonal basis functions with localized support regions that explain most of the variability of a random process. The LFPCA is formulated as a convex optimization problem through a novel Deflated Fantope Localization method and is implemented through an efficient algorithm to obtain the global optimum. The analyses of a country mortality data set and a growth curve data set reveal interesting features that cannot be found by standard FPCA methods.

Session 73: Non-Parametrics and Semi-Parametrics: New Advances and Applications

Single-index Models for Function-on-Function Regression
*Guanqun Cao1 and Lily Wang2

1Auburn University
2Iowa State University
[email protected]

We propose a general framework for smooth regression of a functional response on multiple functional predictors, in which the mean of the response is related to the linear predictors via an unknown link function. Assuming that the functional predictors are observed at discrete points, we use B-spline basis functions to estimate the slope functions and the link function, and propose an iterative estimating procedure.

Free-knot Splines for Generalized Linear Models
Ella Revzin1 and *Jing Wang2

1Coyote Logistics
2University of Illinois at Chicago
[email protected]

A computational study of bootstrap confidence bands based on free-knot spline regression is explored for generalized linear models in this paper. In free-knot spline regression, the knot locations as additional parameters offer greater flexibility and the potential to better account for rapid shifts in slope and other important structures in the target function. However, the search for optimal solutions becomes very complicated because of “freeing” up the knots. In particular, the “lethargy” property in the objective function results in many local optima with replicate knot solutions. To prevent solutions with identical knots, a penalized quasi-likelihood estimating equation is proposed that relies on both a Jupp transformation of knot locations and an added penalty on solutions with small minimal distances between knots. Focusing on logistic regression for binary outcome data, a parametric bootstrap is used to study the variability of the proposed estimator and to construct confidence bands for the unknown form of the logistic regression link function. A real example is also studied.
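The Jupp transformation referred to above maps ordered interior knots to an unconstrained real vector of log gap-ratios, so a generic optimizer can search knot locations without ordering constraints. A minimal sketch with its inverse, under the usual boundary convention t_0 = a, t_{k+1} = b (the function names are illustrative):

```python
import numpy as np

def jupp(knots, a, b):
    """Jupp transform: ordered interior knots in (a, b) -> unconstrained log gap-ratios."""
    t = np.concatenate(([a], np.asarray(knots, dtype=float), [b]))
    g = np.diff(t)                      # k+1 positive gaps
    return np.log(g[1:] / g[:-1])       # k unconstrained coordinates

def jupp_inverse(theta, a, b):
    """Inverse transform: gaps are proportional to cumulative products of exp(theta)."""
    w = np.concatenate(([1.0], np.exp(np.cumsum(theta))))
    g = (b - a) * w / w.sum()           # rescale gaps to fill (a, b)
    return a + np.cumsum(g)[:-1]        # recovered interior knots

knots = np.array([0.2, 0.5, 0.6])
theta = jupp(knots, 0.0, 1.0)
```

The zero vector maps back to equally spaced knots, and any real vector maps to a strictly ordered knot set, which is what removes the replicate-knot optima from the search space.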

White Noise Testing and Model Diagnostic Checking for Functional Time Series
Xianyang Zhang
University of Missouri
[email protected]

This paper is concerned with white noise testing and model diagnostic checking for stationary functional time series. To test the functional white noise null hypothesis, we propose a Cramér-von Mises type test based on the functional periodogram introduced by Panaretos and Tavakoli (2013a). Using the Hilbert space approach, we derive the asymptotic distribution of the test statistic under suitable assumptions. A new block bootstrap procedure is introduced to obtain the critical values from the non-pivotal limiting distribution. Compared to existing methods, our approach is robust to dependence within the white noise and does not involve the choices of functional principal components and lag truncation number. We employ the proposed method to check the adequacy of functional linear models and functional autoregressive models of order one by testing the uncorrelatedness of the residuals. Monte Carlo simulations are provided to demonstrate the empirical advantages of the proposed method over existing alternatives. Our method is illustrated via an application to cumulative intradaily returns.

Collective Estimation of Multiple Bivariate Density Functions with Application to Angular-sampling-based Protein Loop Modeling
Mehdi Maadooliat1, *Lan Zhou2, Seyed M. Najibi3, Xin Gao4 and Jianhua Huang2

1Marquette University
2Texas A&M University
3Shahid Beheshti University
4King Abdullah University of Science and Technology
[email protected]

This paper develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs, using a shared set of bivariate spline basis functions that are determined by the observed data. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model, and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities separately. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-

92 | 2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17



based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions and the recently proposed distributions obtained by fitting a hierarchical Dirichlet process model; in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well.

Session 74: Empirical Likelihoods for Analyzing Incomplete Data

ANOVA for Longitudinal Data with Missing Values
*Songxi Chen1 and Ping-Shou Zhong2

1Iowa State University
2Michigan State University
[email protected]
We carry out ANOVA comparisons of multiple treatments for longitudinal studies with missing values. The treatment effects are modeled semiparametrically via a partially linear regression, which is flexible in quantifying the time effects of treatments. The empirical likelihood is employed to formulate model-robust nonparametric ANOVA tests for treatment effects with respect to covariates, the nonparametric time-effect functions, and interactions between covariates and time. The proposed tests can be readily modified for a variety of data and model combinations that encompass parametric, semiparametric and nonparametric regression models; cross-sectional and longitudinal data; and with or without missing values.

Calibration in Missing Data Analysis Through Empirical Likelihood
Peisong Han
University of [email protected]
Calibration is a technique developed in the sampling survey literature. Its application in missing data analysis has attracted considerable research interest recently. We will discuss how calibration, combined with the empirical likelihood method, can lead to many desirable properties when analyzing incomplete data. In particular, the robustness against model misspecification can be significantly improved, resulting in the so-called multiply robust estimators. These estimators are consistent if any one of the postulated parallel parametric models is correctly specified.

Asymptotic Behavior of the Sample Average of Partial Likelihood for the Cox Model
Jian-Jian Ren
University of [email protected]
The Cox model (Cox, 1972) has been the most popular model in survival data analysis during the past several decades. In recent years, several authors have proposed the adaptive LASSO for the Cox model to improve computational efficiency in the context of variable selection procedures. It is known that the performance of the penalized partial likelihood depends on the choice of the regularization parameter. In studying the asymptotic behavior of the regularization parameter selector, we surprisingly discover in this article that Cox's partial likelihood does not behave like an ordinary likelihood, in the sense that the 'sample average' of the partial likelihood function diverges to infinity. This is in contrast to the well-known fact that, under mild regularity conditions, the sample average of the ordinary likelihood function converges to its expectation (a finite value) in probability as the sample

size n goes to infinity. This is an interesting and surprising result because it has been shown that, in most usual senses, Cox's partial likelihood behaves asymptotically like an ordinary likelihood. The comparison of our discovery here with the full likelihood (empirical likelihood) based procedures for the Cox model (Ren and Zhou, 2011) is also studied.

Efficient Estimation of the Cox Model with Auxiliary Subgroup Survival Information
*Chiung-Yu Huang1, Jing Qin2 and Huei-Ting Tsai3
1Johns Hopkins University
2National Institutes of Health
3Georgetown University
[email protected]

With the rapidly increasing availability of data in the public domain, combining information from different sources to infer about associations or differences of interest has become an emerging challenge for researchers. We present a novel approach to improve efficiency in estimating the survival time distribution by synthesizing information from individual-level data with t-year survival probabilities from external sources such as disease registries. While disease registries provide accurate and reliable overall survival statistics for the disease population, critical pieces of information that influence both the choice of treatment and clinical outcomes usually are not available in the registry database. To combine with the published information, we propose to summarize the external survival information via a system of nonlinear population moments and to estimate the survival time model using empirical likelihood methods. The proposed approach is more flexible than conventional meta-analysis in the sense that it can automatically combine survival information for different subgroups, and the information may be derived from different studies. Moreover, an extended estimator that allows for a different baseline risk in the aggregate data is also studied. Empirical likelihood ratio tests are proposed to examine whether the auxiliary survival information is consistent with the individual-level data.

Session 75: Model Selection in Complex Data Settings

Meta-analysis Based Variable Selection for Gene Expression Data
Quefeng Li1, *Sijian Wang2, Menggang Yu2 and Jun Shao2

1The University of North Carolina at Chapel Hill
2University of [email protected]

Recent advances in biotechnology and its wide applications have led to the generation of many high-dimensional gene expression data sets that can be used to address similar biological questions. Meta-analysis plays an important role in summarizing and synthesizing scientific evidence from multiple studies. When the dimensions of the datasets are high, it is desirable to incorporate variable selection into the meta-analysis to improve model interpretation and prediction. In this talk, we propose two novel methods for variable selection with high-dimensional meta-data. Our methods not only borrow strength across multiple data sets to boost the power to identify important genes, but also keep the selection flexibility among data sets to take data heterogeneity into account. Our methods can incorporate prior biological knowledge, such as pathway information, into the models. We show that our method possesses gene selection consistency with NP-dimensionality. Simulation studies demonstrate the good performance of our method. We applied our meta-lasso method to a meta-analysis of cardiovascular studies. The analysis results are clinically meaningful.




Structural Discovery for Joint Models of Longitudinal and Survival Outcomes
*Zangdong He, Wanzhu Tu and Zhangsheng Yu
Indiana University
[email protected]
Joint models of longitudinal and survival outcomes have been used with increasing frequency in clinical investigations. Correct specification of the functional forms of independent variables is essential for practical data analysis. Structural discovery in both the longitudinal and survival components functions as a necessary safeguard against model misspecification. However, structural discovery in such models has not been studied, and no existing computational tools, to the best of our knowledge, have been made available to practitioners. In this paper, we describe a penalized likelihood method with adaptive least absolute shrinkage and selection operator (ALASSO) penalty functions for structural discovery in joint models, decomposing each independent variable effect into linear and nonlinear components. The selection of linear and nonlinear components then mimics the selection of fixed and random effects in mixed-effects model selection; accordingly, ALASSO and group ALASSO are used to select the linear and nonlinear components, respectively. To reduce the estimation bias resulting from penalization, we propose a two-stage selection procedure in which the magnitude of the bias is ameliorated in the second stage. The penalized likelihood is approximated by Gaussian quadrature and optimized by an EM algorithm. A simulation study showed excellent selection results in the first stage and small estimation biases in the second stage. To illustrate, we analyzed a longitudinally observed clinical marker and patient survival in a cohort of cancer patients.

An Empirical Bayes Approach to Integrate Multiple GWAS with Gene Expressions from Multiple Tissues
*Jin Liu1 and Can Yang2

1Duke-NUS
2Hong Kong Baptist University
[email protected]
To date, a large number of genome-wide association studies (GWAS) have been conducted. With the advancement of array techniques, a large amount of genomic data is available from multiple sources, e.g., gene expression data from multiple tissues. Unbiased tissue studies can reveal new dimensions of biological effects [Dermitzakis, 2012]. Thus, it becomes essential to integrate gene expression from multiple tissues with GWAS, which can increase the statistical power of the analysis of a single GWAS. We propose an empirical-Bayes-based approach to model the status of each gene (null or non-null), enhanced by gene expression from tissues. We develop an expectation-maximization (EM) algorithm to optimize the corresponding complete log-likelihood function. These methods can jointly analyze two or more GWAS at the same time to test for "pleiotropic" effects. We can also evaluate the significance of integrating a tissue. To integrate multiple tissues, we propose a three-stage strategy using penalized linear discriminant analysis (LDA) to transform expressions from multiple tissues into a new predictor with much lower dimension. Meanwhile, we estimate the corresponding local false discovery rate (FDR) and formulate the hypothesis testing for "pleiotropy" and the identification of associated tissues. Simulation studies are used to evaluate finite sample performance, with comparisons under different levels of "pleiotropy" using a generative model. Rheumatoid arthritis and type-1 diabetes data from the Wellcome Trust Case Control Consortium (WTCCC), together with gene expression from multiple tissues, are analyzed using the proposed approach.

Model Selection in Multivariate Semiparametric Regression
*Zhuokai Li1, Hai Liu2 and Wanzhu Tu2

1Duke University
2Indiana University
[email protected]

We consider model selection in multivariate semiparametric regression for longitudinal data. We select fixed and random effects using a maximum penalized likelihood method with the adaptive least absolute shrinkage and selection operator (LASSO) penalty. The interdependence structure among multiple outcomes is determined through random effects selection. Additionally, interactions of independent variables modeled by bivariate tensor product splines are selected using group LASSO. To implement the model selection method, we propose a two-stage expectation-maximization (EM) procedure. We assess the operating characteristics of the proposed method through a simulation study. The method is illustrated in a clinical study of blood pressure regulation in children.

Session 76: Advances in Statistical Methods of Identifying Subgroups in Clinical Studies

The Bias Correction in Comparing the Treatment Effect in Different Subgroups of Patients from a Randomized Clinical Trial
*Lu Tian1, Fei Jiang2 and LJ Wei2
1Stanford University
2Harvard University
[email protected]

To test the interaction between treatment and a binary covariate, we often directly compare two naïve estimators of the treatment effect in two subgroups with data from a randomized comparative clinical study. This method is criticized for potential bias due to imbalance in important baseline characteristics for patients in the small subgroups. A novel and flexible augmentation procedure has recently been studied, for example by Zhang and others (2008, Biometrics 64, 707-715), to improve the performance of the naïve estimator for estimating the overall treatment effect by utilizing baseline covariate information. In this talk, we will generalize this method to test the interactions and show that the resulting augmentation not only reduces the variance but also corrects the bias of the resulting estimator of the treatment-covariate interaction. We will use numerical studies as well as examples to illustrate the proposal.

A Regression Tree Approach to Identifying Subgroups with Differential Treatment Effects
Wei-Yin Loh
University of [email protected]

For diseases such as cancer, it is often difficult to discover new treatments that benefit all subjects. A more realistic goal is to identify subgroups of subjects for whom the treatment has a large effect. Regression trees are natural solutions because they partition the data space. For the subgroups to be reliable, however, it is important that there be no bias in the way splits are selected. We propose an approach that is unbiased and is applicable to data with censored or multivariate responses, missing predictor values, and treatments with two or more levels. Further, we introduce a bootstrap technique for constructing confidence intervals for selective inference in the nodes of the tree.

Identifying Subgroups of Enhanced Predictive Accuracy from Longitudinal Biomarker Data Using Tree-based Approaches:




Applications to Monitoring Fetal Growth
*Jared Foster, Danping Liu, Paul Albert and Aiyi Liu
National Institutes of Health
[email protected]
Longitudinal monitoring of biomarkers is helpful for predicting disease or a poor clinical outcome. Typically these longitudinal predictors are evaluated across an entire population; however, it is also possible that this prediction is accurate only in one or more subgroups of the population. For example, recent work suggests that accurate prediction of large-for-gestational-age (LGA) birth from ultrasounds taken late in pregnancy is possible, but that this prediction is poor when only early ultrasound measurements are used. Thus, our goal is to identify subgroups of women for whom early prediction is more accurate, should they exist. We propose a tree-based approach, which extends the classification and regression tree (CART) methodology to a longitudinal classification setting and simultaneously controls the risk of false discovery of subgroups. To assess the performance of the proposed methods, extensive simulation studies are undertaken. The proposed methods are motivated by and applied to data from the Scandinavian Fetal Growth Study.

A Bayesian Approach for Subgroup Analysis
James O. Berger1, *Xiaojing Wang2 and Lei Shen3

1Duke University
2University of Connecticut
3Eli Lilly and Company
[email protected]
This talk discusses subgroup analysis, the goal of which is to determine the heterogeneity of treatment effects across subpopulations. Searching for differences among subgroups is challenging because it is inherently a multiple testing problem, with the complication that test statistics for subgroups are typically highly dependent, making simple multiplicity corrections such as the Bonferroni correction too conservative. In this article, a Bayesian approach to identify subgroup effects is proposed, with a scheme for assigning prior probabilities to possible subgroup effects that accounts for multiplicity and yet allows for (preexperimental) preference for specific subgroups. The analysis utilizes a new Bayesian model selection methodology and, as a by-product, produces individual probabilities of treatment effect that could be of use in personalized medicine. The analysis is illustrated on an example involving subgroup analysis of biomarker effects on treatments.

Session 77: Recent Innovative Methodologies and Applications in Genetics & Pharmacogenomics (GpGx)

Tree-based Rare Variants Analyses
Chi Song and *Heping Zhang
Yale University
[email protected]
With the development of next generation sequencing (NGS) technology, researchers have been extending their efforts in genome-wide association studies (GWAS) from common variants to rare variants to find the missing heritability. Although various statistical methods have been proposed to analyze rare variants data, they generally face difficulties with complex disease models involving multiple genes. In this paper, we propose a tree-based method that adopts a non-parametric disease model and is capable of exploring gene-gene interactions. We found that our method outperforms the sequence kernel association test (SKAT) in most of our simulation scenarios, and by notable margins in some cases. By applying the

tree-based method to the Study of Addiction: Genetics and Environment (SAGE) data, we successfully detected the gene CTNNA2 and its 44 specific variants that increase the risk of alcoholism in women. This gene had not previously been detected in the SAGE data. A post hoc literature search also supports the role of CTNNA2 as a likely risk gene for alcohol addiction. This finding suggests that our tree-based method can be effective in dissecting genetic variants for complex diseases using rare variants data.

Composite Kernel Machine Regression Based on Likelihood Ratio Test and its Application to Genomic Studies
*Ni Zhao and Michael Wu

Fred Hutchinson Cancer Research Center

[email protected]

Semiparametric kernel machine regression has emerged as a powerful and flexible tool in genomic studies, in which genetic variants are grouped into biologically meaningful entities for association testing. Recent advances have expanded the method to test for the effect of multiple groups of genomic features via a composite kernel that is constructed as a weighted average of multiple kernels. Variance component testing is used to evaluate significance but requires fixing the weighting parameters or perturbation. In this paper, we focus on the (restricted) likelihood ratio test for kernel machine regression with composite kernels, where instead of fixing the weighting parameter, we estimate it by maximizing the likelihood function through a linear mixed model with multiple variance components. We derive the spectral representation of the (R)LRT in linear mixed models with multiple variance components to obtain its finite sample distribution. We conduct extensive simulations to evaluate power and type I error. Finally, we apply the proposed (R)LRT method to a real study to illustrate our methodology.

Improving the Robustness of Variable Selection and Predictive Performance of Lasso and Elastic-net Regularized Generalized Linear Models and Cox Proportional Hazards Models
*Feng Hong and Viswanath Devanarayan

AbbVie Inc.

[email protected]

In diagnostic and drug development applications, high-dimensional data from genomics, proteomics, imaging, etc., are generated for deriving signatures that predict patient phenotypes such as disease status/progression, drug efficacy and safety. Various statistical algorithms are utilized to identify an optimal subset of biomarkers that, when applied to an appropriate model, predicts the desired phenotype. Both the composition and the predictive performance of such biomarker signatures are critical. Recent algorithms proposed by Friedman et al. (2010) and Simon et al. (2011) for the regularization of generalized linear and Cox regression models via cyclical coordinate descent are extremely useful, as they are very fast and can handle different phenotypes (multinomial, counts, continuous, time-to-event). However, the variable selection results tend to be unstable, which affects the composition of the biomarker signature. In this paper, we propose a Monte-Carlo approach with a cross-validation wrapper to improve the robustness and stability of the variable selection results and the evaluation of predictive performance. We illustrate the improvements via real datasets and simulations.
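The abstract does not spell out the implementation details. As a rough illustration only of the resampling-wrapper idea (not the authors' glmnet-based procedure), the hypothetical numpy sketch below refits a bare-bones coordinate-descent lasso on random half-samples and keeps only the variables selected in a large fraction of refits; all data, names, and tuning values are invented.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=100):
    """Minimal coordinate-descent lasso (illustrative stand-in for glmnet)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j, then soft-thresholding
            r = y - X @ beta + X[:, j] * beta[j]
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]
    return beta

rng = np.random.default_rng(1)
n, p = 100, 20
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 2.5]      # three truly active variables
y = X @ beta_true + rng.normal(size=n)

# Monte-Carlo wrapper: refit on random half-samples and record how often
# each variable enters the model; keep only the consistently selected ones.
B, lam = 50, 30.0
freq = np.zeros(p)
for _ in range(B):
    idx = rng.choice(n, n // 2, replace=False)
    freq += lasso_cd(X[idx], y[idx], lam) != 0
freq /= B
stable = np.flatnonzero(freq >= 0.8)  # selected in >= 80% of refits
```

Selection frequencies of this kind give a direct stability read-out for the signature, at the cost of B extra fits.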




Session 78: Analysis and Classification of High Dimensional Data

Neyman-Pearson Classification under High-dimensional Settings
Anqi Zhao, Yang Feng, Lie Wang and *Xin Tong
University of Southern California
[email protected]
Most existing binary classification methods target the optimization of the overall classification risk and may fail to serve some real-world applications such as cancer diagnosis, where users are more concerned with the risk of misclassifying one specific class than the other. The Neyman-Pearson (NP) paradigm was introduced in this context as a novel statistical framework for asymmetric type I/II error priorities. It seeks classifiers that minimize the type II error while keeping the type I error under a user-specified level. This article is the first attempt to construct classifiers with theoretical performance guarantees under the NP paradigm in high-dimensional settings. Based on the fundamental Neyman-Pearson Lemma, we employ Naive Bayes models and use a plug-in approach to construct NP-type classifiers. The proposed classifiers satisfy NP oracle inequalities, which are natural NP-paradigm counterparts of oracle inequalities in classical binary classification. Moreover, their numerical advantages in prioritized error control are demonstrated by both simulation and real data analysis.
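The paper's Naive Bayes plug-in construction is not reproduced here. The toy numpy sketch below, with invented one-dimensional data, illustrates only the core NP idea: choose the rejection threshold from the class-0 sample so the empirical type I error stays near a user-specified level alpha, and accept whatever type II error results.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-dimensional scores: class 0 is the error-priority class.
x0 = rng.normal(0.0, 1.0, 500)   # class 0 sample
x1 = rng.normal(2.0, 1.0, 500)   # class 1 sample
alpha = 0.05                     # user-specified bound on the type I error

# Plug-in NP rule: threshold at the empirical (1 - alpha) quantile of the
# class-0 scores, so roughly an alpha fraction of class 0 is misclassified.
t = np.quantile(x0, 1 - alpha)

def classify(x):
    return (x > t).astype(int)

type1 = np.mean(classify(x0) == 1)   # controlled near alpha
type2 = np.mean(classify(x1) == 0)   # minimized subject to the constraint
```

The asymmetry is the point: alpha is fixed by the user, and only the type II error floats with the data.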

Index Models for Functional Data
*Peter Radchenko, Xinghao Qiao and Gareth James
University of Southern California
[email protected]
The regression problem involving functional predictors has many important applications, and a number of functional regression methods have been developed. However, a common complication in functional data analysis is that of sparsely observed curves, that is, predictors that are observed, with error, on a small subset of the possible time points. Such sparsely observed data induce an errors-in-variables model, where one must account for measurement error in the functional predictors. Faced with sparsely observed data, most current functional regression methods simply estimate the unobserved predictors and treat them as fully observed, thus failing to account for the extra uncertainty from the measurement error. Since functional predictors are infinite dimensional, performing a functional regression requires some form of dimension reduction. Many standard approaches use an unsupervised method, such as functional principal components analysis, to represent the predictors and then regress Y against the lower dimensional representation of X(t). We propose a new functional errors-in-variables approach, Sparse Index Model Functional Estimation (SIMFE), which uses a functional index model formulation to deal with sparsely observed predictors. SIMFE has several advantages over more traditional methods. First, the index model implements a non-linear regression and uses an accurate supervised method to estimate the lower dimensional space into which the predictors should be projected. Second, SIMFE can be applied to both scalar and functional responses and multiple predictors. Finally, SIMFE uses a mixed effects model to effectively deal with very sparsely observed functional predictors and to correctly model the measurement error.

Stabilized Nearest Neighbor Classifier and Its Theoretical Properties
Wei Sun1, *Xingye Qiao2 and Guang Cheng1

1Purdue University

2Binghamton University
[email protected]

The stability of a statistical analysis is an important indicator of reproducibility, which is a critical property for scientific research. It implies that similar statistical conclusions can be reached based on independent samples from the same population. In this article, we introduce a general measure of classification instability (CIS) to calibrate the sampling variability of the prediction made by a classification method. This allows us to analyze the behavior of the nearest neighbor classifier. Motivated by an asymptotic expansion formula for the CIS of the weighted nearest neighbor classifier, we propose the stabilized nearest neighbor (SNN) classifier to obtain improvement. In theory, we prove that SNN attains the minimax optimal convergence rate in the risk, and a sharp convergence rate in CIS, which is established in this article for general plug-in classifiers under a low-noise condition. We compare the CIS and risk for SNN and some existing methods. Extensive simulation and data experiments demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with mostly equal, and sometimes improved, classification accuracy.
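The formal definition of CIS is in the paper; as an illustrative sketch only (an invented simulation with plain k-nearest neighbors rather than SNN), one can estimate instability by training the same procedure on two independent samples from the same population and measuring how often their predictions disagree.

```python
import numpy as np

rng = np.random.default_rng(2)

def knn_predict(Xtr, ytr, Xte, k=5):
    """Plain k-nearest-neighbor majority vote (squared Euclidean distance)."""
    d = ((Xte[:, None, :] - Xtr[None, :, :]) ** 2).sum(axis=-1)
    nn = np.argsort(d, axis=1)[:, :k]
    return (ytr[nn].mean(axis=1) > 0.5).astype(int)

def sample(n):
    """Two Gaussian classes whose means differ by 1.5 in each coordinate."""
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 2)) + 1.5 * y[:, None]
    return X, y

# CIS idea: fit the same classifier on two independent training samples
# and record the fraction of test points on which the fits disagree.
X1, y1 = sample(200)
X2, y2 = sample(200)
Xte, _ = sample(1000)
cis_hat = np.mean(knn_predict(X1, y1, Xte) != knn_predict(X2, y2, Xte))
```

Disagreement concentrates near the decision boundary, which is exactly where an unstable classifier flips between training samples.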

Session 79: Recent Developments on Combining Inferences and Hierarchical Models

Statistical Issues in Health Related Quality of Life Research
Mounir Mesbah
University Pierre et Marie Curie
[email protected]

HrQoL has become a major issue for longitudinal clinical and epidemiological studies in recent decades. This is particularly the case for chronic diseases such as HIV infection, due to the lack of a definitive cure. Long-term treatment of chronic diseases may involve short- and long-term side effects which can affect the HrQoL of patients. The aim of such studies is thus epidemiological surveillance of health, including HrQoL and survival. Such surveillance is principally based on comparing the longitudinal evolution of HrQoL between different groups of patients. Statistical validation of quality of life instruments (or questionnaires) is mainly done through the validation of specific measurement models relating the observed outcomes to the unobserved theoretical latent construct (the HrQoL variable that scientists aim to assess). Validation of such models, based on goodness of fit (GOF) tests, is not straightforward, mainly because the set of variables involved in the models is partly unobserved. Goodness of fit testing in the latent context remains an open issue. I will show in this talk how and why the Backward Reliability Curve can be used to graphically detect a non-unidimensional instrument, as well as other departures from the underlying theoretical measurement properties. The outcome provided by the questionnaire is most often a categorical response, so the use of a generalized linear mixed model to analyze the evolution of the latent HrQoL is straightforward. Within this framework, the choice of a good measurement model and an a priori distribution for the longitudinal latent variable is the main issue. In the HrQoL field, this issue is complicated by the possible occurrence of a shifted response for part of the population. In this talk, I will give an overview of current research in Health Related Quality of Life (HrQoL), focusing on some important challenging issues for statistical science.

ROC-based Meta Analysis with Individual Level Information
Lu Tian1, *Ying Lu1, Peter Countryman2, Julie Dicarlo2 and Charles Peterfy2




1Stanford University
2Spire Sciences
[email protected]

Special statistical methods are needed to conduct a meta analysis investigating the diagnostic value of a biomarker via reported sensitivity and specificity combinations from individual studies. Current statistical methods require special modeling to restrict the shape of the underlying Receiver Operating Characteristic (ROC) curves, since the central question is to construct the entire ROC curve by combining an often moderate number of pairs of sensitivity and specificity. However, if individual level data are available for all or part of the studies used in the meta analysis, this information can be used to relax the stringent assumptions on the ROC curve and to allow flexible combination of various summary measures related to the ROC curve, such as pairs of sensitivity and specificity and the area under the ROC curve. This is in contrast to meta analysis evaluating a treatment effect, where the individual level information may not provide substantial added value beyond the study-level summary. In this talk, we will propose novel semi-parametric methods for performing meta analysis of diagnostic value measured by the ROC curve, based on individual level information from part of the involved studies. A real data example and numerical studies will be used to illustrate and study the operating characteristics of the proposed methods.

Combining Nonparametric Inferences Using Data Depth and Confidence Distribution
*Dungang Liu1, Regina Liu2 and Minge Xie2
1University of Cincinnati
2Rutgers University
[email protected]

For the purpose of combining inferences from several nonparametric studies for a common hypothesis, we develop a new methodology using the concepts of data depth and confidence distribution. A confidence distribution (CD) is a sample-dependent distribution function that can be used to estimate parameters of interest. It is a purely frequentist concept, yet it can be viewed as a "distribution estimator" of the parameter of interest. Examples of CDs include Efron's bootstrap distribution and Fraser's significance function (also referred to as the p-value function). In recent years, the concept of CD has attracted renewed interest and has shown high potential to be an effective tool in statistical inference. In this project, we use the concept of CD, coupled with data depth, to develop a new approach for combining the test results from several independent studies for a common multivariate nonparametric hypothesis. Specifically, in each study, we apply data depth and bootstraps to obtain a p-value function for the common hypothesis. The p-value functions are then combined under the framework of combining confidence distributions. This approach has several advantages. First, it allows us to resample directly from the empirical distribution, rather than from the estimated population distribution satisfying the null constraints. Second, it enables us to obtain test results directly without having to construct an explicit test statistic and then establish or approximate its sampling distribution. The proposed method provides a valid inference approach for a broad class of testing problems involving multiple studies where the parameters of interest can be either finite or infinite dimensional. The method will be illustrated using simulations and flight data from the Federal Aviation Administration (FAA).
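The abstract combines entire p-value functions through data depth and confidence distributions; for background intuition only, the sketch below shows the classical scalar analogue, Fisher's method for combining independent p-values (not the authors' procedure), using the closed-form chi-square survival function available for even degrees of freedom.

```python
import math

def chi2_sf_even(x, df):
    """Survival function of a chi-square with even df (closed-form series)."""
    k = df // 2
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= (x / 2) / j
        total += term
    return math.exp(-x / 2) * total

def fisher_combine(pvals):
    """Fisher's method: -2*sum(log p) ~ chi-square with 2*len(pvals) df
    when the studies are independent and each null is true."""
    stat = -2.0 * sum(math.log(p) for p in pvals)
    return chi2_sf_even(stat, 2 * len(pvals))

# Three studies, none individually decisive, jointly below 0.05.
p_combined = fisher_combine([0.04, 0.10, 0.30])
```

The CD framework generalizes this single-number combination to whole p-value functions, which is what permits inference on infinite-dimensional parameters.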

Latent Quality Models for Document Networks
*Linda Tan1, Aik Hui Chan2 and Tian Zheng1

1Columbia University
2National University of Singapore
[email protected]

We present the latent quality model (LQM) for joint modeling of topics and citations in document networks. The LQM combines the strengths of the latent Dirichlet allocation (LDA) and the mixed membership stochastic blockmodel (MMB), and associates each document with a latent quality score. This score provides a topic-free measure of the impact of a document, which is different from the raw count of citations. We develop an efficient algorithm for fitting the LQM using variational methods. To scale up to large networks, we develop an online variant using stochastic gradient methods and case-control likelihood approximation. We evaluate the performance of the LQM using the benchmark KDD Cup 2003 dataset with approximately 30,000 high energy physics papers, and demonstrate that the LQM can improve citation prediction significantly.

Session 80: Recent Advances in Development and Evaluation of Predictive Biomarkers

Identifying Optimal Biomarker Combinations for Treatment Selection through Randomized Controlled Trials
Ying Huang
Fred Hutchinson Cancer Research Center
[email protected]

Biomarkers associated with treatment-effect heterogeneity can be used to make treatment recommendations that optimize individual clinical outcomes. To accomplish this, statistical methods are needed to generate marker-based treatment-selection rules that can most effectively reduce the population burden due to disease and treatment. Compared to the standard approach of risk modeling to derive treatment-selection rules, a more robust approach is to directly minimize an unbiased estimate of total disease and treatment burden among a pre-specified class of rules. This problem is one of minimizing a weighted sum of 0-1 loss functions, which is computationally challenging to solve due to the non-smoothness of the 0-1 loss. We develop a method that derives marker combinations to minimize the weighted sum of the Ramp loss function, which approximates the 0-1 loss, based on data from randomized trials. The algorithm estimates treatment-selection rules by repetitively minimizing a smooth and differentiable objective function. Feature selection is further incorporated through the use of an L1 penalty. The advantage of the proposed estimator compared to existing approaches is demonstrated through extensive simulation studies. We illustrate the application of the method in host-genetics data from an HIV vaccine trial.
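
To make the Ramp loss concrete: it is a bounded piecewise-linear surrogate for the 0-1 loss, expressible as a difference of two hinge functions, which is what permits iterative minimization of a smooth objective. The sketch below is purely illustrative; the linear rule, weights, and penalty parameter are placeholders, not the authors' estimator.

```python
import numpy as np

def hinge(u, s=0.0):
    # shifted hinge function: max(0, s - u)
    return np.maximum(0.0, s - u)

def ramp(u):
    # Ramp loss: bounded surrogate for the 0-1 loss,
    #   ramp(u) = min(1, max(0, 1 - u))
    # written as a difference of two convex hinge functions:
    #   ramp(u) = hinge(u, 1) - hinge(u, 0)
    return hinge(u, 1.0) - hinge(u, 0.0)

def weighted_ramp_objective(beta, X, margins_sign, weights, lam=0.0):
    """Weighted sum of Ramp losses for a linear rule sign(X @ beta),
    plus an L1 penalty (lasso-style) for feature selection."""
    u = margins_sign * (X @ beta)   # signed margins
    return np.sum(weights * ramp(u)) + lam * np.sum(np.abs(beta))
```

Writing the Ramp as hinge(u, 1) − hinge(u, 0) is the standard difference-of-convex decomposition exploited by iterative solvers that repeatedly minimize a smooth majorizing objective.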

The Challenge in Making Inference about a Biomarker’s Predictive Capacity
Holly Janes
Fred Hutchinson Cancer Research Center
[email protected]

Biomarkers that predict risk of an adverse outcome are highly sought after in many clinical contexts for guiding the use of interventions to prevent the adverse outcome. A wide variety of statistical measures for characterizing the predictive capacity or performance of a risk model, and for contrasting the performance of different models, have been proposed. Often, the same data are used both to fit the risk model and to estimate its performance. In this setting, traditional approaches to doing inference about model performance, for example using normal theory or the bootstrap, do not

2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17 | 97


Abstracts

perform well. We show that this is because the model performance estimators are non-regular, and document the poor performance in simulations based on published studies. We also contrast measures of the performance of the fitted risk model and of the performance of the true risk model, and show that inference about either is problematic using traditional approaches. We discuss the practical implications of these results and provide recommendations for data analysis.

A Potential Outcomes Framework for Evaluating Predictive Biomarkers
*Zhiwei Zhang1, Lei Nie1, Guoxing Soon1 and Aiyi Liu2

1U.S. Food and Drug Administration
2National Institutes of Health
[email protected]

Predictive or treatment selection biomarkers are usually evaluated in a subgroup or regression analysis with focus on the treatment-by-marker interaction. However, the strength of the interaction is not directly related to the predictive value of the biomarker. The latter concept can be crystallized under a potential outcomes framework in which a predictive biomarker is considered a predictor for a desirable treatment benefit defined by comparing potential outcomes for different treatments. Under this approach, a predictive biomarker can be evaluated using familiar concepts in prediction and classification. A major challenge in this approach is that the desired treatment benefit is unobservable because each patient can receive only one treatment in a typical study. One possible solution to this problem is to assume monotonicity of potential outcomes, with one treatment dominating the other in all patients. Motivated by an HIV example that appears to violate the monotonicity assumption, we propose a different approach based on covariates and random effects for evaluating predictive biomarkers under the potential outcomes framework. Under the proposed approach, the parameters of interest can be identified by assuming conditional independence of potential outcomes given observed covariates, and a sensitivity analysis can be performed by incorporating an unobserved random effect that accounts for any residual dependence. Application of this approach to the motivating example shows that baseline viral load and CD4 cell count are both useful as predictive biomarkers for choosing antiretroviral drugs for treatment-naive patients.

Session 81: What Are the Expected Professional Behaviors After Statistics Degrees

What Are the Expected Professional Behaviors After Statistics Degrees
*Richard Davis1, *Susan Murphy2 and *Jean Opsomer3
1Columbia University
2University of Michigan
3Colorado State University
[email protected]; [email protected]; [email protected]

Many young ICSA members are fresh degree holders. It is vital for their career development and success in industry and academia for them to follow expected professional behaviors in our statistics community and workplace. In fact, knowing the expectations helps ease anxiety and improve quality of life for these members as well, because professional opportunities are results of both emotional intelligence and intellectual intelligence. Our panelists are leaders in our profession and we encourage participants to come to the panel with prepared questions.

Session 82: The Jiann-Ping Hsu Invited Session on Biostatistical and Regulatory Sciences

A Generalized Birth and Death Process for Modeling the Fates of Gene Duplication
Jing Zhao1, Ashley Teufel2, David Liberles2, Lili Yu3 and *Liang Liu1

1University of Georgia
2Temple University
3Georgia Southern University
[email protected]

Several biological models have been proposed to depict the mechanisms that lead to different evolutionary fates of a gene duplicate. In this paper, we develop a probabilistic model for understanding the duplication/loss process under 4 different mechanisms of gene retention (nonfunctionalization, neofunctionalization, subfunctionalization, and dosage balance), which can produce distinct patterns for the loss rate of a duplicate over time. The probabilistic model for duplication times is based on the reconstruction process with a time-dependent death rate that varies across the 4 different mechanisms. We have derived the conditional density function of duplication times, given the first duplication time and the number of gene copies at the present time. The conditional density function can be used to simulate duplication times under different mechanisms for a fixed number of gene copies at the present time. The likelihood function for duplication times can be used to find the maximum likelihood estimates of model parameters. Duplication times simulated from different mechanisms exhibit distinct patterns, indicating that the proposed probabilistic model can be used to reveal the underlying mechanism that drives the process of gene duplication and loss during the history of a gene family.

A Nonparametric Approach for Partial Areas under the Receiver Operating Characteristic Curve and Ordinal Dominance Curve
Hanfang Yang1, Kun Lu2 and *Yichuan Zhao3

1Renmin University of China
2University of Chicago
3Georgia State University
[email protected]

The receiver operating characteristic (ROC) curve is a well-known technique used to measure the performance of a classification. For many reasons, such as economic efficiency and ethical preference, people are concerned with a certain sensitivity range of the area under the ROC curve, called the pAUC (partial area under the ROC curve). After reversing the axes, the partial area under the ordinal dominance curve is of great interest as well. Based on a novel estimator of the pAUC proposed by Wang and Chang (2011), we develop nonparametric approaches to study partial AUCs for the above two curves using the normal approximation method, the jackknife method, and the jackknife empirical likelihood method. The simulation study demonstrates the drawback of the existing method and shows the performance of the three proposed methods. We also compare the jackknife empirical likelihood and the normal approximation method, and verify the consistency of the jackknife variance estimator as well. The Pancreatic Cancer Serum Biomarker data set is used to illustrate the proposed methods, which are useful in medical studies.
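
For orientation, the empirical pAUC itself is easy to compute nonparametrically; the sketch below pairs a step-function ROC estimate with a crude two-sample delete-one jackknife. This is a generic illustration, not the Wang-Chang estimator or the authors' jackknife empirical likelihood procedure.

```python
import numpy as np

def empirical_pauc(x, y, fpr_max):
    """Empirical partial AUC over FPR in [0, fpr_max].
    x: scores of non-diseased subjects, y: scores of diseased subjects
    (higher scores indicate disease)."""
    thr = np.unique(np.concatenate([x, y]))[::-1]             # descending thresholds
    fpr = np.concatenate([[0.0], [np.mean(x >= t) for t in thr]])
    tpr = np.concatenate([[0.0], [np.mean(y >= t) for t in thr]])
    grid = np.linspace(0.0, fpr_max, 201)
    # step-function ROC: best TPR attainable at each allowed FPR level
    tpr_step = np.array([tpr[fpr <= g].max() for g in grid])
    dg = grid[1] - grid[0]
    return float(np.sum((tpr_step[:-1] + tpr_step[1:]) * dg / 2))  # trapezoid rule

def jackknife_se(x, y, fpr_max):
    """Crude two-sample delete-one jackknife standard error of the pAUC."""
    rx = np.array([empirical_pauc(np.delete(x, i), y, fpr_max) for i in range(len(x))])
    ry = np.array([empirical_pauc(x, np.delete(y, j), fpr_max) for j in range(len(y))])
    m, n = len(x), len(y)
    vx = (m - 1) / m * np.sum((rx - rx.mean()) ** 2)
    vy = (n - 1) / n * np.sum((ry - ry.mean()) ** 2)
    return float(np.sqrt(vx + vy))
```

For a perfect classifier the pAUC over [0, u] equals u, which is why pAUC values are often rescaled by the width of the FPR range.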

Analysis of Longitudinal Multivariate Outcome Data from Couples Cohort Studies: Application to HPV Transmission Dynamics
Xiangrong Kong




Johns Hopkins University
[email protected]

HPV is a common STI with 14 known oncogenic genotypes causing anogenital carcinoma. While gender-specific infections have been well studied, one remaining uncertainty in HPV epidemiology is HPV transmission within couples. Understanding transmission in couples, however, is complicated by the multiplicity of genital HPV genotypes and sexual partnership structures that lead to complex multi-faceted correlations in data generated from HPV couple cohorts, including inter-genotype, intra-couple, and temporal correlations. We develop a hybrid modeling approach using a Markov transition model and composite pairwise likelihood for analysis of longitudinal HPV couple cohort data to identify risk factors associated with HPV transmission, estimate the difference in risk between male-to-female and female-to-male HPV transmission, and compare genotype-specific transmission risks within couples. The method is applied to the motivating HPV couple cohort data collected in the male circumcision trial in Rakai, Uganda to identify modifiable risk factors (including male circumcision) associated with HR-HPV transmission within couples. Knowledge from this analysis will contribute to the public health effort in preventing oncogenic HPV and related cancers in sub-Saharan Africa.

Bayesian Nonlinear Model Selection for Gene Regulatory Networks
*Yang Ni1, Francesco Stingo2 and Veera Baladandayuthapani2
1Rice University
2M. D. Anderson Cancer Center
[email protected]

Gene regulatory networks represent the regulatory relationships between genes and their products and are important for exploring and defining the underlying biological processes of cellular systems. We develop a novel framework to recover the structure of nonlinear gene regulatory networks using semiparametric spline-based directed acyclic graphical models. Our use of splines allows the model to have both flexibility in capturing nonlinear dependencies as well as control of overfitting via shrinkage, using mixed model representations of penalized splines. We propose a novel discrete mixture prior on the smoothing parameter of the splines that allows for simultaneous selection of both linear and nonlinear functional relationships as well as inducing sparsity in the edge selection. Using simulation studies, we demonstrate the superior performance of our methods in comparison with several existing approaches in terms of network reconstruction and functional selection. We apply our methods to a gene expression dataset in glioblastoma multiforme, which reveals several interesting and biologically relevant nonlinear relationships.

Session 83: Dose Response/Finding Studies in Drug Development

Calibration of Two-stage Continual Reassessment Method
*Xiaoyu Jia1, Shing Lee2 and Ken Cheung2

1Boehringer-Ingelheim Pharmaceuticals Inc.
2Columbia University
[email protected]

The continual reassessment method (CRM) is an adaptive model-based design used to estimate the maximum tolerated dose (MTD) in phase I clinical trials. The method is generally implemented in a two-stage approach, whereby the model-based dose escalation is activated after an initial sequence of patients is treated. We establish a theoretical framework for building a two-stage CRM based

on the coherence principle, and prove the unique existence of the most conservative and still coherent initial design given a CRM model. To facilitate implementation of such a design, we also propose a systematic approach to calibrate the initial design and the model parameter in the second stage based on the theoretical framework. We demonstrate the application of the proposed design using an oncology dose finding study currently conducted at Columbia University Medical Center. The systematic calibration approach simplifies the model calibration process for the two-stage continual reassessment method and yields competitive design performance compared to the traditional trial-and-error approach.
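
For readers who have not seen the CRM machinery being calibrated here, a minimal model-based update with the common one-parameter "power" working model looks roughly as follows. The skeleton, prior standard deviation (1.34), and target rate are conventional placeholder choices, not the calibrated values from this work.

```python
import numpy as np

def crm_recommend(skeleton, n_tox, n_pat, target=0.25):
    """One-parameter 'power' CRM: p_d(a) = skeleton[d] ** exp(a),
    with prior a ~ N(0, 1.34^2). The posterior mean toxicity rate of
    each dose is computed on a grid; the recommended dose is the one
    whose estimated rate is closest to the target."""
    a = np.linspace(-4.0, 4.0, 801)                         # parameter grid
    da = a[1] - a[0]
    prior = np.exp(-(a ** 2) / (2 * 1.34 ** 2))             # unnormalized normal prior
    p = np.array(skeleton)[:, None] ** np.exp(a)[None, :]   # dose x grid toxicity probs
    tox = np.array(n_tox)[:, None]
    n = np.array(n_pat)[:, None]
    lik = np.prod(p ** tox * (1 - p) ** (n - tox), axis=0)  # binomial likelihood
    post = prior * lik
    post /= post.sum() * da                                 # normalize posterior density
    post_tox = (p * post[None, :]).sum(axis=1) * da         # posterior mean tox per dose
    return int(np.argmin(np.abs(post_tox - target))), post_tox
```

The dose-wise posterior toxicity estimates inherit the skeleton's ordering, and accumulating observed toxicities pushes the recommendation toward lower doses.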

A Practical Application with Interim Analysis in a Dose Ranging Design
Xin Wang
AbbVie Inc.
[email protected]

In a clinical development program with a test drug, an interim analysis is proposed for a dose-ranging study. The proposal is to perform a trend test based on interim data to facilitate a “Go/No Go” decision. If the interim results are promising, then the study continues to completion, and various doses can be studied using the entire data set. This presentation describes the background of this study design. Simulations are performed to evaluate this proposal under various settings. Probabilities of Go and No Go at interim are assessed, along with the probabilities of finding the efficacious dose(s). Simulation results indicate the proposed approach is robust, model independent, and easy to communicate to non-statisticians.

Design Considerations in Dose Finding Studies
Xin Zhao
Johnson & Johnson
[email protected]

Dose finding is of the utmost importance during clinical development of a new drug. Depending on the specific therapeutic area, a dose-finding study is usually conducted at the late Phase I or early Phase II stage of drug development. The main objective of such studies is to elucidate clinical efficacy in the intended patient population and to define the dosage and dosage schedule. An adequate dose-finding study identifies the optimal doses for late-stage Phase II and/or III trials, thereby saving time and effort and reducing the number of patients required. In this talk, we will explain how to design a dose-finding study and the impact of such a design on overall drug development strategy. Specifically, some design considerations utilizing adaptive and/or Bayesian techniques will be presented. Case studies from various therapeutic areas will be shared to illustrate the concepts.

Dose Response Relationship in a Phase 1b Dose Ranging Study in Subjects with Chronic Hepatitis C Virus Infection
Di An
Gilead Sciences, Inc.
[email protected]

Clinical trials for the development of a new drug involve several phases, with different doses of the study drug, participant populations, and numbers of participants. Phase 1b studies are generally designed to evaluate the safety and tolerability of multiple doses of the study drug, and to assess the pharmacokinetics and pharmacodynamics. In a recent phase 1b multiple dose-ranging study, subjects with chronic hepatitis C virus infection from several cohorts received 3 days of dosing of the study drug. Subjects in each cohort were administered multiple dose levels or matching placebo. Blood samples were collected at various time points prior to, during, and after treatment. Antiviral activity of the drug was evaluated




by the HCV RNA level and analyzed as categorical and continuous endpoints. The dose-response relationship across dose levels was explored for different cohorts and efficacy endpoints. An Emax model was applied using the SAS PROC NLMIXED procedure. Simulation was also performed with a similar data setting and a larger sample size.
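
The Emax model mentioned above has the standard hyperbolic form E(D) = E0 + Emax·D/(ED50 + D). Since the original analysis used SAS PROC NLMIXED, the Python sketch below is only a stand-in illustration, fitting the same curve by nonlinear least squares to synthetic (hypothetical) data.

```python
import numpy as np
from scipy.optimize import curve_fit

def emax_model(dose, e0, emax, ed50):
    # Hyperbolic Emax dose-response curve: E(D) = E0 + Emax * D / (ED50 + D)
    return e0 + emax * dose / (ed50 + dose)

# synthetic dose-response data (all values hypothetical)
rng = np.random.default_rng(0)
dose = np.repeat([0.0, 10.0, 30.0, 100.0, 300.0], 6)        # 6 subjects per dose
true = emax_model(dose, e0=0.2, emax=-3.0, ed50=25.0)       # e.g. change in log10 HCV RNA
resp = true + rng.normal(scale=0.15, size=dose.size)        # add measurement noise

# nonlinear least squares fit with starting values p0
params, _ = curve_fit(emax_model, dose, resp, p0=[0.0, -2.0, 50.0])
e0_hat, emax_hat, ed50_hat = params
```

By construction, the response at dose 0 equals E0 and the half-maximal effect occurs at D = ED50, which is why those two quantities are the usual reporting targets of such a fit.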

Session 84: Design More Efficient Adaptive Clinical Trials Using Biomarkers

Sequential Designs for Individualized Dosing in Phase I Cancer Clinical Trials
*Xuezhou Mao1 and Ying Kuen Cheung2

1Sanofi-aventis U.S. LLC
2Columbia University
[email protected]

This research presents novel dose-finding designs that adjust for individual pharmacokinetic variability in phase I cancer clinical trials. Extending from a single compartmental model in pharmacokinetic theory, we postulate a two-effect linear model to describe the relationship between the area under the concentration-time curve, dose, and predicted clearance. We propose a repeated least squares procedure that aims to sequentially determine dose according to a subject’s ability to metabolize the drug. To guarantee consistent estimation of the individualized dosing function at the end of a trial, we apply repeated least squares subject to a constraint based on an eigenvalue theory for stochastic linear regression. We empirically determine the convergence rate of the eigenvalue constraint using a real data set from an irinotecan study in colorectal carcinoma patients, and calibrate the procedure to minimize a loss function that accounts for the dosing costs of study subjects and future patients. When compared to the standard dosing method using a patient’s body surface area, our simulation results demonstrate that our proposed procedures control the overall dosing cost and allow for precise estimation of the individualized dosing function.
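
The inner loop of such a procedure, refitting a linear exposure model and then inverting it at the next subject's predicted clearance, can be sketched as below. The working model (log AUC linear in log dose and log clearance), the coefficients, and the target exposure are hypothetical placeholders, and the eigenvalue constraint the authors impose to keep the sequential design informative is omitted.

```python
import numpy as np

def fit_ols(X, y):
    # least squares fit of the working model via the pseudo-inverse
    return np.linalg.pinv(X) @ y

def next_dose(beta, log_clearance, target_log_auc):
    """Invert the working model
        log AUC = b0 + b1*log(dose) + b2*log(clearance)
    to get the dose predicted to achieve the target exposure."""
    b0, b1, b2 = beta
    return float(np.exp((target_log_auc - b0 - b2 * log_clearance) / b1))

# hypothetical noiseless demo: (dose, log-clearance) for 5 treated subjects
obs = [(25.0, 0.1), (50.0, -0.2), (100.0, 0.3), (200.0, 0.0), (50.0, 0.5)]
X = np.array([[1.0, np.log(d), c] for d, c in obs])
beta_true = np.array([0.5, 1.0, -0.8])   # placeholder "true" PK coefficients
y = X @ beta_true                        # noiseless log-AUC observations
beta_hat = fit_ols(X, y)
d6 = next_dose(beta_hat, log_clearance=0.2, target_log_auc=np.log(100.0))
```

In the noiseless demo, dosing subject 6 at d6 achieves exactly the target log AUC under the assumed model; in practice the fit is simply repeated after each new subject's pharmacokinetic data arrive.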

Stratification Free Biomarker Designs for Randomized Trials with Adaptive Enrichment
Noah Simon
University of Washington
[email protected]

The biomedical field has recently focused on developing targeted therapies, designed to be effective in only some subset of the population with a given disease. However, for many new treatments, characterizing this subset has been a challenge. Often, at the start of large-scale trials the subset is only rudimentarily understood. This leads practitioners to either 1) run an all-comers trial without use of the biomarker or 2) use a poorly characterized biomarker that may miss parts of the true target population and potentially incorrectly indicate a drug from a successful trial. In this talk we will discuss a class of adaptive enrichment designs: clinical trial designs that allow the simultaneous construction and use of a biomarker, during an ongoing trial, to adaptively enrich the enrolled population. For poorly characterized biomarkers, these trials can significantly improve power while still controlling type one error. However, there are additional challenges in this framework: How do we adapt our enrollment criteria in an “optimal” way (what are we trying to optimize for)? How do we run a formal statistical test after updating our enrollment criteria? How do we estimate an unbiased treatment effect size in our “selected population” (combating a potential selection bias)? In this talk we will give an overview of a class of clinical trial designs and tools that address these questions.

Bayesian Predictive Modeling for Personalized Treatment Selection in Oncology
Junsheng Ma, Francesco Stingo and *Brian Hobbs
M. D. Anderson Cancer Center
[email protected]

Cancer is a complex dynamic microevolutionary process. Treatment requires understanding of the alterations within cell signaling pathways that enable cancer cells to evade cell death, proliferate, and migrate. Moreover, the extent of variation in the genomes of cancer patients and among cancer cells within the same tumor makes the disease inherently heterogeneous. Future breakthroughs in personalized medicine will rely on molecular signatures that derive from synthesis of multifarious interdependent molecular quantities. In this presentation, I will introduce a Bayesian predictive approach to personalized treatment selection for new patients based on the treatment histories and molecular measurements of previously treated patients. The method formalizes the process for choosing an optimal therapy in consideration of the extent to which the new untreated patient’s tumor exhibits similarity with previously treated patients.

Optimal Marker-Adaptive Designs for Targeted Therapy Based on Imperfectly Measured Biomarkers
Yong Zang1, Suyu Liu2 and *Ying Yuan2

1Florida Atlantic University
2M. D. Anderson Cancer Center
[email protected]

Targeted therapy revolutionizes the way physicians treat cancer and other diseases, enabling them to adaptively select individualized treatment according to the patient’s biomarker profile. The implementation of targeted therapy requires that the biomarkers are accurately measured, which may not always be feasible in practice. In this article, we propose two optimal marker-adaptive trial designs in which the biomarkers are subject to measurement errors. The first design focuses on a patient’s individual benefit and minimizes the treatment assignment error so that each patient has the highest probability of being assigned to the treatment that matches his/her true biomarker status. The second design focuses on the group benefit, which maximizes the overall response rate for all the patients enrolled in the trial. We develop a Wald test to evaluate the treatment effects for marker subgroups at the end of the trial and derive the corresponding asymptotic power function. Simulation studies and an application to a lymphoma cancer trial show that the proposed optimal designs achieve our design goal and obtain desirable operating characteristics.

Session 85: Advances in Nonparametric and Semiparametric Statistics

Quantile Regression for Extraordinarily Large Data
Stanislav Volgushev1 and *Guang Cheng2

1Cornell University
2Purdue University
[email protected]

One complexity of massive data comes from the accumulating errors that are often unknown and may even have varying shapes as data grows. In this talk, we consider a general quantile-based modelling approach that even allows the unknown error distribution to be arbitrarily different across all sub-populations. A delicate analysis on the




computational-and-statistical tradeoff is further carried out based on nonparametric sieve estimation.

A Validated Information Criterion to Determine the Structural Dimension in Dimension Reduction Models
*Yanyuan Ma1 and Xinyu Zhang2

1University of South Carolina
2Chinese Academy of Sciences
[email protected]

A crucial component in performing sufficient dimension reduction is to determine the structural dimension of the reduction model. We propose a novel information criterion-based method to achieve this purpose, whose special feature is that when examining the goodness-of-fit of the current model, we need to obtain model evaluation by using an enlarged candidate model. Although the procedure does not require estimation under the enlarged model with dimension k + 1, the decision on how well the current model with dimension k fits relies on the validation provided by the enlarged model. This leads to the name validated information criterion, calculated as VIC(k). The method is different from existing information criterion-based model selection methods. It breaks free from the dependence on the connection between dimension reduction models and their corresponding matrix eigen-structures, which heavily relies on a linearity condition that we no longer assume. Its consistency is proved and its finite sample performance is demonstrated numerically.

Systematic Clustering and Network Structures: a New Nonparametric Approach that Reveals Unprecedented Structures and Patterns, with Applications to Large CMS Data
Junheng Ma, *Jiayang Sun and GQ Zhang
Case Western Reserve University
[email protected]

Mining valuable clinical information for new discovery from huge medical data challenges modern analytics in both statistics and computer science. Combining modern statistics and computer science techniques, we developed a Numerical Formal Concept Analysis (nFCA) technique, with an R package that interfaces with Ruby and Graphviz, a beautiful computer science visualization software. nFCA overcomes the limitations of FCA and standard statistical clustering techniques by delivering systematic clustering and network structures based on numerical data. In this talk, we introduce the building methodology of nFCA, showcase the functionality of our nfca() package, and then focus on an innovative translational research project: building disease/risk factor networks using nFCA on thousands of ICD-9 codes from large HCFA data from CMS. We reveal new findings, and discuss limitations and future possibilities.

Semiparametric Model Building for Regression Models with Time-Varying Parameters
Ting Zhang
Boston University
[email protected]

We consider the problem of semiparametric model building for linear regression models with potentially time-varying coefficients. By allowing the response variable and explanatory variables to be jointly a nonstationary process, the proposed methods are widely applicable to nonstationary and dependent observations. We propose a local linear shrinkage method that can simultaneously achieve parameter estimation and variable selection. Its selection consistency and the favorable oracle property are established. Due to the fear of losing efficiency, an information criterion is further proposed for distinguishing between time-varying and time-constant components. Numerical examples are presented to illustrate the proposed methods.

Session 86: Cutting-Edge New Tools for Statistical Analysis and Modeling

Web-based Analytics for Business Decision Making
Sam Weerahandi
Pfizer Inc.
[email protected]

Web-based business analytics are of interest and practical importance in most areas of statistical practice, especially in business decision making. Web-publishing of analytics should also prove to be of interest to professors who provide external and internal consulting to non-statisticians, and will come in handy when they deliver results in a manner that clients can run with their own scenarios and parameters. This approach does not require clients to have any knowledge of statistical techniques or statistical programming languages; they need to know only the business problem. This presentation will provide an overview of the state of the art in corporate America and Business Intelligence software, and of how to web-publish your analytics, followed by a demo of some interesting analytics showing how almost any business could benefit from web-based analytics in a variety of business management applications.

A GUI Software for Synchronizing Study Design, Statistical Analyses, and Reporting into Simple Clicks
Yanwei Zhang
Pfizer Inc.
[email protected]

This talk will introduce a very powerful yet extremely easy to use software package, iSTAT, that allows lab scientists, clinicians, pharmacologists, and statisticians to calculate sample size (from both frequentist and Bayesian perspectives) for planning studies, perform a wide range of statistical analyses, conduct statistical and pharmacological modeling and simulation, monitor trials (using group sequential methods, Bayesian predictive approaches, and prediction interval plots), visualize data interactively, and generate study reports instantaneously through a Graphical User Interface (GUI) with simple clicks.

Bayesian Mechanism to Enhance Financial Value of Clinical Development Portfolio
Shu Han
Pfizer Inc.
[email protected]

Bayesian statistics has been increasingly used in clinical development as it provides distinctive advantages in devising adaptive clinical trials, modeling flexibility, and incorporating the totality of data for decision making. This presentation focuses on employing a Bayesian mechanism for dynamically measuring and optimizing the financial value of a clinical development portfolio, which can include investigational therapeutics and/or medical devices. The Bayesian modeling of, and the link to, key financial measurements of a portfolio, such as expected net present value (eNPV) and expected internal rate of return (eIRR), will be introduced, and the mechanism of using a Bayesian adaptive clinical development platform to maximize portfolio financial value will be described.

An R Package Suite for Meta-analysis in Differentially Expressed Gene Analysis
*Jia Li1, George C. Tseng2 and Xingbin Wang2

1Henry Ford Health System
2University of Pittsburgh




[email protected]

With the rapid advances and prevalence of high-throughput genomic technologies, integrating information from multiple relevant genomic studies has brought new challenges. Meta-analysis has become a frequently used tool in biomedical research. Little effort, however, has been made to develop a systematic pipeline and user-friendly software. Here we present MetaOmics, a suite of three R packages, MetaQC, MetaDE, and MetaPath, with the focus on the MetaDE package. MetaDE was developed for candidate marker detection by integrating data from multiple sources. The system allows flexible input of experimental data and various clinical outcomes (case-control, multi-class, continuous, or survival). It generates informative summary output and visualization plots, operates on different operating systems, and can be expanded to include new algorithms or to combine different types of genomic data. This software suite provides a comprehensive tool to conveniently implement and compare various genomic meta-analysis pipelines.

Session 87: Advanced Methods for Graphical Models

Learning Causal Networks via Additive Faithfulness
Kuang-Yao Lee1, Tianqi Liu1, *Bing Li2 and Hongyu Zhao1

1Yale University
2Pennsylvania State University
[email protected]

In this paper we introduce a statistical model, called the additively faithful directed acyclic graph (AFDAG), for causal learning from observational data. Our approach is based on additive conditional independence (ACI), a recently proposed three-way statistical relation that shares many similarities with conditional independence. However, the nonparametric characterization of ACI does not involve multivariate kernels, so it is distinct from conditional independence. Due to this special feature, AFDAG enjoys the flexibility of a nonparametric estimator but avoids the curse of dimensionality when handling high-dimensional networks. We develop an estimator for AFDAG based on a linear operator that characterizes ACI. We propose a modified PC-algorithm to implement the estimating procedures efficiently, so that their complexity is determined by the density of edges and grows only in a polynomial order of the network size. We also establish the consistency and convergence rates of our estimator. Through simulation studies we show that our method outperforms existing methods when commonly assumed conditions such as Gaussian or Gaussian copula distributions do not hold. Finally, the usefulness of the AFDAG formulation is demonstrated through its application to a proteomics data set.

Statistical Modeling of RNase-seq for Genome-wide Inference of RNA Structure
Zhengqing Ouyang
The Jackson Laboratory
[email protected]

Recent studies have revealed significant roles for RNA structure in almost every step of RNA processing, including transcription, splicing, transport, and translation. RNase footprinting coupled with high-throughput sequencing (RNase-seq) has emerged to dissect RNA structures at the genome scale. Combining structure-specific RNases can provide complementary information on structural features (such as single-strand or double-strand). However, the inference of RNA structural features from RNase-seq remains challenging because of the issues of data sparsity, signal variability, and correlation as well as contradiction among multiple RNases. We present a probabilistic modeling framework that

systematically captures the correlation structure and variability of multiple RNase profiles along the transcripts. We apply our method to simulated datasets and to genome-wide footprinting profiles of the double-strand-specific RNase V1 and the single-strand-specific S1 nuclease in yeast. We demonstrate that our joint modeling approach outputs interpretable RNA structural features, while approaches that analyze the V1 or S1 profile separately do not. Furthermore, compared to simple thresholding, our probabilistic modeling approach probes 53% more nucleotides in the yeast transcriptome without compromising accuracy, and resolves the structural ambiguity of 300,000 nucleotides with overlapping V1 and S1 peaks. We also demonstrate that, using a shared latent variable for modeling RNA accessibility, our model reveals the prevalent influence of the three-dimensional conformation of RNA on RNase footprinting.

Distance Shrinkage and Euclidean Embedding via Regularized Kernel Estimation
Ming Yuan
University of Wisconsin-Madison
[email protected]

Although recovering a Euclidean distance matrix from noisy observations is a common problem in practice, how well this can be done remains largely unknown. To fill this void, we study a simple distance matrix estimate based on the so-called regularized kernel estimate. We show that such an estimate can be characterized as simply applying a constant amount of shrinkage to all observed pairwise distances. This fact allows us to establish risk bounds for the estimate, implying that the true distances can be estimated consistently in an average sense as the number of objects increases. In addition, such a characterization suggests an efficient algorithm to compute the distance matrix estimator, as an alternative to the usual second-order cone programming, which is known not to scale well for large problems. Numerical experiments and an application in visualizing the diversity of Vpu protein sequences from a recent HIV-1 study further demonstrate the practical merits of the proposed method.
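The constant-shrinkage characterization lends itself to a toy illustration. The sketch below is not the paper's regularized kernel estimator; the shrinkage amount `c` is simply a hypothetical tuning constant. It shrinks every observed squared pairwise distance by the same amount and then embeds the result with classical multidimensional scaling:

```python
import numpy as np

def shrink_distances(D2, c):
    """Apply a constant shrinkage c to all observed squared pairwise
    distances (off-diagonal entries), mimicking the characterization
    that the estimator shrinks every distance by the same amount."""
    D2s = np.maximum(D2 - c, 0.0)   # shrink, keeping distances nonnegative
    np.fill_diagonal(D2s, 0.0)      # self-distances stay zero
    return D2s

def classical_mds(D2, dim):
    """Embed a squared-distance matrix into R^dim via classical MDS."""
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ D2 @ J                 # doubly centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]       # leading eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))

# Noisy squared distances among 5 points on a line
rng = np.random.default_rng(0)
X = np.arange(5.0).reshape(-1, 1)
D2 = (X - X.T) ** 2 + 0.2 * rng.random((5, 5))
D2 = (D2 + D2.T) / 2
np.fill_diagonal(D2, 0.0)
Y = classical_mds(shrink_distances(D2, c=0.2), dim=1)
```

The appeal of the characterization is visible even here: shrinkage touches only the distance matrix, so any downstream embedding step is unchanged.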

Detecting Overlapping Communities in Networks with Spectral Methods
Yuan Zhang, Elizaveta Levina and Ji Zhu
University of Michigan
[email protected]

Community detection is a fundamental problem in network analysis. In practice, it often occurs that the communities overlap, which makes the problem more challenging. Here we propose a general, flexible, and interpretable generative model for overlapping communities, which can be thought of as a generalization of the degree-corrected stochastic block model. We develop an efficient spectral algorithm for estimating the community memberships, which deals with the overlaps by employing the K-medians algorithm rather than the usual K-means for clustering in the spectral domain. We show that the algorithm is asymptotically consistent when the networks are not too sparse and the overlaps between communities are not too large. Numerical experiments on both simulated networks and many real social networks demonstrate that our method performs well compared to a number of benchmark methods for overlapping community detection.
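The spectral-embedding-plus-K-medians idea can be sketched on a toy network. This is a simplified, non-overlapping illustration rather than the authors' algorithm; the farthest-point initialization and iteration count are ad hoc choices:

```python
import numpy as np

def spectral_kmedians(A, K, n_iter=20):
    """Embed nodes with the K leading eigenvectors of the adjacency
    matrix, then cluster the rows with K-medians: L1 assignments and
    coordinate-wise median centers, which is more robust than K-means
    to the spread-out rows that overlapping nodes create."""
    w, V = np.linalg.eigh(A)
    U = V[:, np.argsort(np.abs(w))[::-1][:K]]   # spectral embedding
    centers = U[[0]]                            # farthest-point init
    for _ in range(1, K):
        d = np.abs(U[:, None, :] - centers[None]).sum(axis=2).min(axis=1)
        centers = np.vstack([centers, U[d.argmax()]])
    labels = np.zeros(len(U), dtype=int)
    for _ in range(n_iter):
        d = np.abs(U[:, None, :] - centers[None]).sum(axis=2)  # L1 distances
        labels = d.argmin(axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = np.median(U[labels == k], axis=0)
    return labels

# Two planted communities of five nodes each
A = np.zeros((10, 10))
A[:5, :5] = 1.0
A[5:, 5:] = 1.0
np.fill_diagonal(A, 0.0)
labels = spectral_kmedians(A, K=2)
```

Swapping the mean for the median in the center update is the whole point: a median center is insensitive to the few rows pulled between clusters by overlapping nodes.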

Session 88: Advanced Development in Big Data Analytics Tools

Clique-based Method for Social Network Clustering
Dipak Dey and Guang Ouyang

102 | 2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17


Abstracts

University of Connecticut
[email protected]
Networks in real life are found to divide naturally into small communities. Examples include Facebook, LinkedIn, computer networks, and metabolic networks. The problem of detecting clusters or communities is of great importance. An effective and commonly used measure of the quality of a clustering is modularity, and algorithms that maximize this quantity are among the most popular network clustering approaches today. Unfortunately, the modularity method has a resolution limit when the network is large. Here, we propose a novel network clustering algorithm that gives the user control over this limitation. In addition, we provide detailed results from applying our algorithm to various networks. An analysis of the validity of our algorithm is also included.

Sparse Partially Linear Additive Models
Yin Lou1, Jacob Bien2, Rich Caruana3 and Johannes Gehrke2
1LinkedIn Corporation
2Cornell University
3Microsoft Research
[email protected]
The generalized partially linear additive model (GPLAM) is a flexible and interpretable approach to building predictive models. It combines features in an additive manner, allowing each to have either a linear or nonlinear effect on the response. However, the choice of which features to treat as linear or nonlinear is typically assumed known. Thus, to make a GPLAM a viable approach in situations in which little is known a priori about the features, one must overcome two primary model selection challenges: deciding which features to include in the model and determining which of these features to treat nonlinearly. We introduce the sparse partially linear additive model (SPLAM), which combines model fitting and both of these model selection challenges into a single convex optimization problem. SPLAM provides a bridge between the Lasso and sparse additive models. Through a statistical oracle inequality and thorough simulation, we demonstrate that SPLAM can outperform other methods across a broad spectrum of statistical regimes, including the high-dimensional setting. We develop efficient algorithms that are applied to real data sets with half a million samples and over 45,000 features, with excellent predictive performance.

Clustering by Propagating Probabilities Between Data Points
Guojun Gan, Yuping Zhang and Dipak Dey
University of Connecticut
[email protected]
In this paper, we propose a graph-based clustering algorithm called "probability propagation," which is able to identify clusters with spherical shapes as well as clusters with non-spherical shapes. Given a set of objects, the proposed algorithm uses local densities calculated from a kernel function and a bandwidth to initialize the probability of one object choosing another object as its attractor, and then propagates the probabilities until the set of attractors becomes stable. Experiments on both synthetic and real data show that the proposed method performs as expected.

Clustering Time Series: A PSLEX-Based Approach
Priya Kohli1, Nalini Ravishanker2 and Jane Harvill3
1Connecticut College
2University of Connecticut
3Baylor University
[email protected]

Time series clustering is common in areas ranging from science, engineering, business, finance, economics, and health care to geophysical studies. Considerable research has been carried out to address the clustering of stationary and linear time series. However, in most real situations the time series rarely satisfy the assumptions of stationarity and/or linearity. In the time series literature, there is a scarcity of methods for nonstationary, nonlinear time series, and there are several open research questions with respect to their statistical and computational efficiency. In this work, we propose a clustering scheme based on the Polyspectral Smooth Localized Complex Exponential (PSLEX) approach, which can handle the challenges arising from nonstationarity and/or nonlinearity of the time series. The orthogonality property of the SLEX library of complex-valued orthogonal transforms facilitates the analysis of high-dimensional massive time series data due to its mathematical elegance. We illustrate our approach using simulated time series from several nonstationary and/or nonlinear models. We also demonstrate the use of the proposed method in an area of finance with applications in portfolio evaluation and diversification, identifying misclassified stocks, and quantifying the effect of trends, drawing interesting conclusions about specific stocks.

Session 89: Recent Advances in Biostatistics

Promoting Similarity of Sparsity Structures in Integrative Analysis
Shuangge Ma
Yale University
[email protected]

For data with high-dimensional covariates but small to moderate sample sizes, the analysis of single datasets often generates unsatisfactory results. The integrative analysis of multiple independent datasets provides an effective way of pooling information and outperforms single-dataset analysis and some alternative multi-dataset approaches, including meta-analysis. Under certain scenarios, multiple datasets are expected to share common important covariates; that is, their models have similarity in sparsity structures. However, existing methods do not have a mechanism to promote the similarity of sparsity structures in integrative analysis. In this study, we consider penalized variable selection and estimation in integrative analysis. We develop a penalization-based approach, which is the first to explicitly promote the similarity of sparsity structures. Computationally, it is realized using a coordinate descent algorithm. Theoretically, it has the much-desired consistency properties. In simulation, it significantly outperforms the competing alternative when the models in multiple datasets share common important covariates. It has better or similar performance to the alternative when there is no shared important covariate. Thus it provides a "safe" choice for data analysis. Applying the proposed method to three lung cancer datasets with gene expression measurements leads to models with significantly more similar sparsity structures and better prediction performance.

Graphical Models and Its Application in Genomics
Zhandong Liu1, Genevera Allen2 and Ying-Wooi Wan1

1Baylor College of Medicine
2Rice University
[email protected]

Undirected graphical models, also known as Markov networks, enjoy popularity in a variety of applications. The popular instances of these models, such as Gaussian Markov random fields (GMRFs), Ising models, and multinomial discrete models, however, do not capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that node-wise conditional distributions


arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential-family distribution, such as the Poisson, negative binomial, or exponential, by fitting penalized GLMs to select the neighborhood of each node. When applied to genomics data, we demonstrate that our models can capture important dependency structures that are undetectable by traditional network models.

Threshold Regression with Censored Covariates
Jing Qian1, Folefac Atem2 and Rebecca Betensky2
1University of Massachusetts
2Harvard University
[email protected]

The problem of censored covariates arises frequently in family history studies, in which an outcome of interest is regressed on an age of onset, as well as in cohort studies in which it may be necessary to adjust for duration of disease. We develop new threshold regression approaches for linear regression models with covariates subject to random censoring. Compared with existing methods, the proposed methods are simple but effective, as they avoid complicated modeling of the censored covariate values. In addition to estimating the regression coefficient of the censored covariates, the threshold regression methods can also be used to test whether the effect of a censored covariate is significant. We discuss the choice of the optimal threshold, which yields the most powerful test. The finite-sample performance of the proposed methods is assessed through extensive simulation studies. The methods are illustrated by analyzing data from a study of Alzheimer's disease.

Jointly Analyzing Spatially Correlated Visual Field Data to Detect Glaucoma Progression
Joshua Warren1, Jean-Claude Mwanza2, Angelo Tanna3 and Donald Budenz2
1Yale University
2The University of North Carolina at Chapel Hill
3Northwestern University
[email protected]

Glaucoma is a leading cause of irreversible blindness worldwide. Once a diagnosis is made, careful monitoring of the disease is required to prevent vision loss. However, determining if the disease is progressing remains the most difficult task in the clinical setting. A common method for detecting progression includes the analysis of a time series of peripheral visual fields (VF) for a patient by expert clinicians. We introduce new methodology in the Bayesian setting in order to properly model the progression status of a patient (as determined by a group of expert clinicians) as a function of changes in spatially correlated sensitivities at each VF location jointly. Past modeling attempts include the analysis of global VF measures or the separate analyses of sensitivities at individual VF locations over time. The first set of methods ignores important spatial information regarding the location of vision loss on the VF, while the second set is inefficient and fails to account for spatial similarities in vision loss across the VF. Our spatial probit regression model jointly incorporates all highly correlated VF changes in a single framework while accounting for structural similarities between neighboring VF regions. Results indicate that our method provides improved model fit and predictions when compared to previously introduced models. Additionally, the mapping of spatially referenced parameters across the VF provides insight into the clinicians' decision-making process. This model may be clinically useful for detecting the glaucoma progression status of an individual.

Session 90: Adaptive Designs and Personalized Medicine

Interpretable and Parsimonious Treatment Regimes Using Decision Lists
Yichi Zhang, Eric Laber, Anastasios Tsiatis and Marie Davidian
North Carolina State University
[email protected]

A treatment regime formalizes personalized medicine as a function from individual patient characteristics to a recommended treatment. A high-quality treatment regime can improve patient outcomes while reducing cost, resource consumption, and treatment burden. Thus, there is tremendous interest in estimating treatment regimes from observational and randomized studies. However, the development of treatment regimes for application in clinical practice requires the long-term, joint effort of statisticians and clinical scientists. In this collaborative process, the statistician must integrate clinical science into the statistical models underlying a treatment regime, and the clinician must scrutinize the estimated treatment regime for scientific validity. To facilitate meaningful information exchange, it is important that estimated treatment regimes be interpretable in a subject-matter context. We propose a simple yet flexible class of treatment regimes whose members are representable as a short list of if-then statements. Regimes in this class are immediately interpretable and are therefore an appealing choice for broad application in practice. We derive a robust estimator of the optimal regime within this class and demonstrate its finite-sample performance using simulation experiments. The proposed method is illustrated with data from two clinical trials.
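An if-then decision list of the kind the authors advocate might look like the following sketch; the covariates, cutoffs, and treatment names are purely illustrative, not estimated from any trial:

```python
def decision_list_regime(patient):
    """A hypothetical treatment regime written as a short list of
    if-then clauses; each clause maps a patient characteristic to a
    recommended treatment, and the first matching clause wins."""
    if patient["biomarker"] > 2.0:
        return "treatment A"
    if patient["age"] < 50:
        return "treatment B"
    return "standard of care"

recommendation = decision_list_regime({"biomarker": 1.2, "age": 44})
```

Because the regime is just an ordered list of clauses, a clinician can read it top to bottom and check each rule against subject-matter knowledge, which is exactly the interpretability argument the abstract makes.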

Regression Analysis for Cumulative Incidence Function under Two-stage Randomization
Idil Yavuz1, Yu Cheng2 and Abdus Wahed2

1Dokuz Eylul University
2University of Pittsburgh
[email protected]

In this talk we focus on regression analysis under a two-stage randomization setting. Even though extensive research has been carried out on the regression problem for dynamic treatment regimes, little work has been done on modeling the cumulative incidence function (CIF) when a two-stage randomization has been carried out. We extend the multi-state, the Fine and Gray, and the Scheike et al. regression models to model the CIF of dynamic treatment regimes and provide ways to implement the proposed models in R using existing packages. We demonstrate the improvement our methods provide through simulation.

Optimal, Two Stage, Adaptive Enrichment Designs for Randomized Trials, using Sparse Linear Programming
Michael Rosenblum1, Xingyuan (Ethan) Fang2 and Han Liu2

1Johns Hopkins University
2Princeton University
[email protected]

Adaptive enrichment designs involve preplanned rules for modifying enrollment criteria based on accruing data in a randomized trial. These designs can be useful when it is suspected that treatment effects may differ in certain subpopulations, such as those defined by a biomarker or risk factor at baseline. Two critical components of adaptive enrichment designs are the decision rule for modifying enrollment and the multiple testing procedure. We provide a general method for simultaneously optimizing both of these components for two-stage, adaptive enrichment designs. The optimality criteria are defined in terms of expected sample size and power, under the constraint that the familywise Type I error rate is strongly controlled.


It is infeasible to directly solve this optimization problem since it is not convex. The key to our approach is a novel representation of a discretized version of this optimization problem as a sparse linear program. We apply advanced optimization tools to solve this problem to high accuracy, revealing new, optimal designs.

Session 91: Recent Developments of High-Dimensional Data Inference and Its Applications

Segmenting Multiple Time Series by Contemporaneous Linear Transformation: PCA for Time Series
Jinyuan Chang1, Bin Guo2 and Qiwei Yao3

1University of Melbourne
2Peking University
3London School of Economics
[email protected]
We seek a contemporaneous linear transformation for a p-variate time series such that the transformed series is segmented into several lower-dimensional subseries, and those subseries are uncorrelated with each other both contemporaneously and serially. The method may be viewed as an extension of principal component analysis (PCA) to multiple time series. Technically it boils down to an eigenanalysis of a positive definite matrix. When p is large, an additional step is required to perform a permutation in terms of either maximum cross-correlations or FDR based on multiple tests. The asymptotic theory is established for both fixed p and diverging p as the sample size n tends to infinity. Numerical experiments with both simulated and real datasets indicate that the proposed method is an effective initial step in analyzing multiple time series data, leading to substantial dimension reduction in modeling and forecasting high-dimensional linear dynamical structures. The method can also be adapted to segment multiple volatility processes.
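The eigenanalysis step can be sketched in a toy form: prewhiten the series, accumulate "squares" of lagged sample autocovariance matrices into a positive semi-definite matrix, and rotate by its eigenvectors. This is a simplified caricature of the method; the lag cutoff `max_lag` is an arbitrary choice here, and the permutation step is omitted:

```python
import numpy as np

def segmentation_transform(X, max_lag=2):
    """Toy version of the eigenanalysis behind time series segmentation.
    X: (n, p) array with rows as time points. Returns the transformed
    series, whose components are exactly uncorrelated contemporaneously
    by construction (serial decorrelation is only approximate)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    # Prewhiten so the contemporaneous sample covariance is the identity
    w, V = np.linalg.eigh(Xc.T @ Xc / n)
    Y = Xc @ V @ np.diag(w ** -0.5) @ V.T
    # Accumulate a PSD matrix from lagged sample autocovariances
    W = np.zeros((p, p))
    for k in range(1, max_lag + 1):
        S = Y[k:].T @ Y[:n - k] / n   # lag-k sample autocovariance
        W += S @ S.T
    _, B = np.linalg.eigh(W)          # eigenanalysis of a PSD matrix
    return Y @ B                      # contemporaneous linear transform

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
Z = segmentation_transform(X)
```

Because the final rotation is orthogonal and the series is prewhitened, the transformed components are exactly uncorrelated at lag zero; grouping components into serially uncorrelated blocks is the part the paper's theory addresses.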

Projection Test for High-Dimensional Mean Vectors with Optimal Direction
Runze Li1, Yuan Huang1, Lan Wang2 and Chen Xu1

1Pennsylvania State University
2University of Minnesota
[email protected]
Testing the population mean is fundamental in statistical inference. When the dimensionality of a population is high, the traditional Hotelling's T² test becomes practically infeasible. In this paper, we propose a new testing method for high-dimensional mean vectors. The new method projects the original sample onto a lower-dimensional space and carries out a test with the projected sample. We derive the theoretically optimal direction with which the projection test possesses the best power under alternatives. We further propose an estimation procedure for the optimal direction, so that the resulting test is an exact t-test under the normality assumption and an asymptotic χ²-test with one degree of freedom without the normality assumption. Monte Carlo simulation studies show that the new test can be much more powerful than existing methods while retaining the Type I error rate well. The promising performance of the new test is further illustrated with a real data example.
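The projection idea is easy to sketch: reduce the p-dimensional sample to a univariate one along a direction and apply an ordinary t statistic. In the sketch below the direction is fixed in advance; estimating the power-optimal direction, as the paper does, is the hard part and is omitted:

```python
import numpy as np

def projection_t_stat(X, theta):
    """One-sample t statistic for H0: mu = 0 after projecting the
    p-dimensional sample X (n x p) onto the direction theta."""
    y = X @ (theta / np.linalg.norm(theta))   # projected univariate sample
    n = len(y)
    return np.sqrt(n) * y.mean() / y.std(ddof=1)

rng = np.random.default_rng(2)
n, p = 50, 200                     # p >> n: Hotelling's T^2 is infeasible
mu = np.full(p, 0.5)               # a dense mean shift
X = rng.standard_normal((n, p)) + mu
t = projection_t_stat(X, theta=np.ones(p))
```

Even this naive direction shows why projection helps: the univariate t-test is well defined for any p, whereas the sample covariance matrix needed by Hotelling's T² is singular when p > n.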

Thresholding Tests for Signal Detection on High-Dimensional Count Distributions
Yumou Qiu1, Songxi Chen2 and Dan Nettleton2

1University of Nebraska-Lincoln
2Iowa State University
[email protected]

We consider the problem of detecting rare and faint signals in high-dimensional count data. This problem arises, for example, in the analysis of RNA sequencing (RNA-seq) data to detect genes differentially expressed across multiple conditions. In this paper, we consider the signal detection problem under generalized linear models (GLMs) and their extensions, which include the linear model as a special case. Based on maximum likelihood estimators (MLEs), a thresholding statistic with a single threshold level is proposed to test for the existence of rare and faint signals. A Cramér-type moderate deviation result for multi-dimensional MLEs with non-identically distributed data is derived, which is the prerequisite for studying the properties of thresholding test statistics. For the case of linear regression, the detection boundary is determined, and it is shown that the proposed thresholding test can attain the boundary. A multi-threshold test is constructed by maximizing the standardized thresholding statistic over a set of thresholds. Extensions to generalized linear mixed models are made, where Gauss-Hermite quadrature and data cloning are used to approximate the MLEs of such models. Numerical simulations and a case study on maize RNA-seq data are conducted to confirm and demonstrate the proposed testing approaches.
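The single-threshold construction can be caricatured with Gaussian z-scores. The paper's statistic is built from GLM maximum likelihood estimators and is properly standardized; this plain-z version with threshold sqrt(2 log p) is only meant to convey the screening idea:

```python
import numpy as np

def thresholding_stat(z, lam):
    """Single-threshold statistic: sum squared scores that exceed the
    threshold, so that a few rare-and-faint signals can contribute while
    the bulk of null coordinates is screened out."""
    keep = np.abs(z) > lam
    return float(np.sum(z[keep] ** 2))

rng = np.random.default_rng(3)
p = 1000
z_null = rng.standard_normal(p)      # pure noise scores
z_sig = z_null.copy()
z_sig[:20] += 4.0                    # 20 rare, faint signals
lam = np.sqrt(2 * np.log(p))         # a conventional threshold level
T_null = thresholding_stat(z_null, lam)
T_sig = thresholding_stat(z_sig, lam)
```

A multi-threshold version, as in the abstract, would evaluate the (standardized) statistic over a grid of `lam` values and take the maximum, avoiding the need to know the signal strength in advance.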

Projected Principal Component Analysis in Factor Models
Jianqing Fan1, Yuan Liao2 and Weichen Wang1

1Princeton University
2University of Maryland
[email protected]

This paper introduces Projected Principal Component Analysis (Projected-PCA), which is based on projecting the data matrix onto a given linear space before performing principal component analysis. When applied to high-dimensional factor analysis, the projection removes idiosyncratic noisy components. We show that the unobserved latent factors can be estimated more accurately than with conventional PCA if the projection is genuine, or more precisely, when the factor loading matrices are related to the projected linear space, and that they can be estimated accurately when the dimensionality is large, even when the sample size is finite. In an effort to estimate factor loadings more accurately, we propose a flexible semi-parametric factor model, which decomposes the factor loading matrix into a component that can be explained by subject-specific covariates and an orthogonal residual component. The effect of the covariates on the factor loadings is further modeled by an additive model via sieve approximations. Using the newly proposed Projected-PCA, we obtain rates of convergence for the smooth factor loading matrices that are much faster than those of conventional factor analysis. The convergence is achieved even when the sample size is finite, which is particularly appealing in the high-dimension-low-sample-size situation. This leads us to develop nonparametric tests of whether observed covariates have explanatory power for the loadings and whether they fully explain the loadings. Finally, the proposed method is illustrated with both simulated data and the returns of the components of the S&P 500 index.

Session 92: Issues in Probabilistic Models for Random Graphs

Exponential-family Random Hypergraph Models for Group Relations
Ryan Haunfelder, Haonan Wang and Bailey Fosdick
Colorado State University
[email protected]


Social network data are often recorded and studied as pairwise relationships. However, in many scenarios these relationships are originally observed at the group level, involving two or more actors. Representing the relations as dyadic associations is a misrepresentation of the data, which can affect inference and understanding of the social system under study. Hypergraphs are graph structures that allow relations involving more than two actors. In this talk we discuss extensions of common social network features, such as transitivity, shared partners, shortest paths, and centrality, to hypergraphs. We also introduce a probabilistic exponential random hypergraph model, which builds on the exponential-family random graph model (ERGM) for dyadic relations. We discuss properties of our hypergraph model and describe how the Markov dependence structures have nice interpretations related to social theories of group dynamics.

Exponential-family Random Graph Models with Local Dependence
Michael Schweinberger
Rice University
[email protected]

Dependent phenomena, such as relational, spatial, and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. However, in contrast to spatial and temporal phenomena, relational phenomena tend to lack a natural neighborhood structure in the sense that it is unknown which units are close and thus dependent. An additional complication is that the number of observations is one, which implies that the dependence structure cannot be recovered with high probability by using conventional high-dimensional graphical models. Therefore, researchers have assumed that the dependence structure has a known form. The best-known forms of dependence structure are inspired by the Ising model in statistical physics and Markov random fields in spatial statistics and are known as Markov random graphs. However, owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, conventional exponential-family random graph models with Markov dependence induce strong dependence and are not amenable to statistical inference.

We take first steps to characterize local dependence in random graph models and show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy, but which conventional exponential-family random graph models do not. In addition, we discuss concentration-of-measure results which suggest that random graph models with local dependence place much mass in the interior of the sample space, in contrast to conventional exponential-family random graph models. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighborhood structure.
In the absence of observed neighborhood structure, we take a Bayesian view and express the uncertainty about the neighborhood structure by specifying a prior on a set of suitable neighborhood structures. We present simulation results and applications to two real-world networks with ground truth.

Local Structure Graph Models with Higher-Order Dependence
Emily Casleton1, Mark Kaiser2 and Daniel Nordman2

1Los Alamos National Laboratory
2Iowa State University

[email protected]
Local Structure Graph Models (LSGMs) provide a Markov random field (MRF) modeling approach for random graphs, whereby each edge in the graph has a specified conditional distribution, i.e., a probability of edge occurrence, dependent on explicit neighborhoods, or subcollections of other graph edges. As a consequence of the conditional specification, LSGMs have the advantage of allowing direct control and separate interpretation of the parameters influencing large-scale (e.g., marginal means) and small-scale (i.e., dependence) structures in a graph model. This is made possible through the centered parameterization of MRF models, which is applied in LSGMs. However, current technology for centered parameterizations in MRFs assumes pairwise-only dependence, i.e., dependence is modeled between pairs of random variables only. This creates limitations in specifying conditional distributions for graph edges in LSGMs. As a remedy, we extend the centered parameterization for MRFs to account for triples of dependent edges in LSGMs. We also explain and numerically illustrate the importance of centered parameterizations for interpreting model parameters within an MRF framework. Centered parameterizations and their improved interpretability are particularly crucial when attribute/covariate information is included in a graph model. This work advances the modeling of graph data in several important ways related to conditional model specifications, state-of-the-art parameterizations, inclusion of higher-order dependence, and appropriate model incorporation of covariates.

Session 93: Negotiation Skills Critical for Statistical Career Development

Negotiation Skills Critical for Statistical Career Development
Ivan Chan1, Mary Gray2, Susan Murphy3 and Wei Shen4

1Merck & Co
2American University
3University of Michigan
4Eli Lilly and Company
ivan [email protected]; [email protected]; [email protected]; shen wei [email protected]

With the growing diversity of topics in the statistical field, career development has become an important subject. According to J. E. Miller and J. Miller (2011), the three keys to successful negotiation are as follows: (1) be confident, (2) be prepared, and (3) be willing to walk away. In this invited panel discussion session, a group of distinguished statisticians from three different sectors (academia, industry, and government) will address key questions regarding how to negotiate better in one's statistical career. The panel consists of esteemed statisticians and leaders from various sectors of the statistical profession. Specifically, the panelists will provide personal experience and guidance on successful negotiation based on their own career journeys and advancement. For example, they will discuss issues arising in initial positions, mid-career roles, and leadership positions. Drawing on the entire paths of their distinguished careers in different sectors, and the struggles inevitably associated with them, the panelists will give suggestions and recommendations on improving one's negotiation skills for attendees who are post-graduate statisticians, faculty members, research statisticians, practicing consultants, as well as those in leadership positions in their institutions or in professional associations, including the International Chinese Statistical Association. This session will have wide appeal to junior, mid-career, and senior attendees regardless of gender and career track. It will be particularly beneficial to junior statisticians, who are encouraged to ask important questions such as: To negotiate or not to negotiate? And how to negotiate effectively?

Session C01: Disease Models, Observational Studies, andHigh Dimensional Regression

Evaluate the Most Accurate Animal Model With Application to Pediatric Medulloblastoma
Lan Gao1, Behrouz Shamsaei2 and Stan Pounds3
1University of Tennessee at Chattanooga
2The University of Tennessee at Chattanooga
3St. Jude Children's Research Hospital
[email protected]
Animal models of human disease are commonly utilized to gain preclinical insight into the potential efficacy and mode of action of novel drugs. The development and selection of an animal model that accurately mimics the human disease profoundly reduces the research timeline and resources needed to make meaningful advances in the treatment and prevention of the human disease under study. Here, we propose a statistical procedure to select the animal model that most accurately mimics the human disease in terms of genome-wide gene expression. Our procedure is designed for studies that have gene expression profiles for a cohort of human disease tissue specimens from different subjects and gene expression profiles for cohorts of disease tissue specimens for each of several animal models. First, we define and compute a metric of similarity between each human gene expression profile and each animal gene expression profile, which results in multiple groups of similarities. Then a random-block ANOVA model is used to compare the group means of the similarities between different animal models. Finally, post-hoc multiple comparisons are applied to seek the "best" animal model of the human disease. The advantages of the proposed method are observed in simulation studies and a real example of pediatric medulloblastoma.

A Generalized Mover-Stayer Model for Disease Progressions with Death in Consideration of Age at the Study Entry
Yi-Ran Lin1, Wei-Hsiung Chao2 and Chen-Hsin Chen1,3

1Institute of Statistical Science, Academia Sinica
2National Dong Hwa University
3National Taiwan University
[email protected]
Multi-state models have been widely used in assessing dynamic disease progression. In some scenarios, a fraction of the population may be free of the risk of disease progression. A mover-stayer model fits longitudinal data with a mixture combining the sub-population of "movers" (those who undergo a specific disease progression) with the sub-population of "stayers" (those who do not). The conventional statistical literature on mover-stayer models deals with stayers only in the initial state. We generalize this to a mover-stayer model, viewed as a mixture of finite Markov models, by allowing study subjects to have various disease progressions with positive probability of staying in some subsequent state before death. The maximum likelihood estimation procedure is implemented with the Fisher scoring method, and some relevant diagnostic tools are developed for model checking. Using the longitudinal follow-up data from the REVEAL-HBV study, a community-based cohort study carried out in seven townships of Taiwan, we pursue the risk evaluation of viral load elevation and associated liver disease/cancer with hepatitis B virus (HBV). A six-state mover-stayer model taking account of subjects' different ages at study entry is presented to analyze the multi-path progression from chronic hepatitis B to hepatocellular carcinoma, possibly via cirrhosis, and ending with HBV-related or non-HBV-related death. The proposed regression analysis method can also be applied to the analysis and interpretation of other diseases in epidemiological and biobank research.

Improving Cancer Mortality Rate Estimation Using Population-specific Structure in Direct Age-standardization
Beverly Fu1 and Wenjiang Fu2

1Okemos High School
2University of Houston
[email protected]

Age-standardization is a popular statistical procedure for comparing cancer mortality rates among different populations or estimating the rate of a population across periods of time. It summarizes the age-specific mortality rates of each population through a weighted average, using a common population age structure as reference weights, and makes the summary rates comparable across different age structures. Although this practice has been employed in demography and public health studies for more than a century, the convention of selecting a standard population structure as the reference, such as the US year 2000 population, has been shown to lack theoretical justification, leading to a series of problems. In this study, we examine cancer mortality rates of given US populations by sex and racial group. We found that although age-standardization is necessary for comparing mortality rates across periods within a given population, taking the age structure of the given population itself in the year 2000 as the reference largely corrects the bias introduced by using the US year 2000 population as the reference, leading to improved accuracy in estimating the mortality rate and its trend for a given population across periods.
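The direct age-standardization described above is a one-line weighted average; a minimal sketch with made-up rates and an illustrative reference structure:

```python
import numpy as np

# Age-specific mortality rates (deaths per 100,000) for two populations
# (illustrative numbers, not real data).
age_rates_pop1 = np.array([2.0, 5.0, 40.0, 300.0, 1200.0])
age_rates_pop2 = np.array([2.5, 6.0, 38.0, 280.0, 1150.0])

# Reference age structure used as weights (proportions summing to 1),
# e.g. the population's own year-2000 structure, as the abstract suggests.
weights = np.array([0.25, 0.30, 0.25, 0.15, 0.05])

def direct_standardized_rate(age_rates, w):
    """Directly age-standardized rate: weighted average of the
    age-specific rates, with the reference structure as weights."""
    return float(np.dot(age_rates, w) / w.sum())

r1 = direct_standardized_rate(age_rates_pop1, weights)
r2 = direct_standardized_rate(age_rates_pop2, weights)
print(f"standardized rates: {r1:.1f} vs {r2:.1f} per 100,000")
```

The choice of `weights` is exactly the point at issue in the abstract: different reference structures can reorder the two summary rates.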

An Augmented ADMM Algorithm for Linearly Regularized Statistical Estimation Problems
Yunzhang Zhu
The Ohio State University
[email protected]

We present a fast and stable algorithm for solving a class of linearly regularized statistical estimation problems. This type of problem arises in many statistical estimation procedures, such as high-dimensional linear regression with fused lasso regularization, convex clustering, and trend filtering, among others. We propose a so-called augmented alternating direction method of multipliers (ADMM) algorithm to solve this class of problems. Compared to the standard ADMM algorithm, our proposal significantly reduces the amount of computation at each iteration while maintaining the same overall rate of convergence. We demonstrate the superior performance of the augmented ADMM algorithm on a generalized lasso problem. We also consider a new acceleration scheme for the ADMM algorithm, which works quite well in practice, especially when solving a sequence of similar problems. Finally, we discuss a possible extension and some interesting connections to two well-known algorithms in the imaging literature.
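For orientation, the standard (non-augmented) ADMM baseline for the generalized lasso, min_b 0.5||y - Xb||^2 + lam*||Db||_1, can be sketched as follows; the fused-lasso example and all tuning choices are ours, not the authors'.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_generalized_lasso(X, y, D, lam, rho=1.0, n_iter=200):
    """Standard ADMM for min_b 0.5||y - X b||^2 + lam * ||D b||_1,
    with splitting z = D b and scaled dual variable u."""
    z = np.zeros(D.shape[0])
    u = np.zeros(D.shape[0])
    A = X.T @ X + rho * D.T @ D      # system solved at every b-update
    Xty = X.T @ y
    for _ in range(n_iter):
        b = np.linalg.solve(A, Xty + rho * D.T @ (z - u))
        z = soft_threshold(D @ b + u, lam / rho)
        u = u + D @ b - z
    return b

# Fused-lasso signal recovery: X = I, D = first-difference matrix.
rng = np.random.default_rng(1)
truth = np.concatenate([np.zeros(40), 2.0 * np.ones(40)])
y = truth + rng.normal(scale=0.3, size=80)
D = np.eye(80, k=1)[:-1] - np.eye(80)[:-1]   # row i gives b[i+1] - b[i]
X = np.eye(80)
b_hat = admm_generalized_lasso(X, y, D, lam=2.0)
print("recovered jump:", round(b_hat[60] - b_hat[10], 2))
```

The abstract's contribution is precisely to cheapen the `np.linalg.solve` step above while keeping the same convergence rate.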

Public Health Impacts Following the World Trade Center Attacks of September 11th 2001: Statistical Analyses of Data from Residents of Lower Manhattan, New York
L. Laszlo Pallos, Vinicius Antao, Jay Sapp and Youn Shim
Agency for Toxic Substances and Disease Registry
[email protected]

2015 ICSA/Graybill Joint Conference, Fort Collins, Colorado, June 14-17 | 107

Introduction: In the aftermath of the World Trade Center attacks of September 11th 2001, many tens of thousands of people, aside from occupants of the Twin Towers, were exposed to dust and debris when the Towers collapsed. Owing to concerns about potential long-term health impacts, a large-scale longitudinal health registry was established with a planned duration of at least twenty years. In the present endeavor, we analyzed health effect impacts on residents of lower Manhattan.
Data: The World Trade Center Health Registry (WTCHR) was initially established by means of Agency for Toxic Substances and Disease Registry (ATSDR) funds. The Registry consists of a baseline or wave 1 (2003-2004), a first adult follow-up or wave 2 (2007), and a third wave (2011+). At the time these analyses were developed and conducted, wave 3 data were not yet available. The Registry was implemented by the New York City Department of Health and Mental Hygiene, which administers the data. For this study, we obtained waves 1 and 2 of the survey data. Wave 1 consists of 71,437 registrants, of whom 14,665 were residents of lower Manhattan. Wave 2 consists of 46,602 registrants, all of whom had to have participated in wave 1. Of these, N=7,219 were residents of lower Manhattan and constituted the observations used in these analyses. As our focus was persistent and long-term effects, we included those registrants who were residents of lower Manhattan and had completed both waves. The case definitions of the six health outcomes involve several variables with differing time aspects; in a nutshell, however, by persistent we mean being identified in both wave 1 and wave 2. Seven data points that were clearly outliers were initially removed. An additional 24 data points were removed because they were coded as being in Census blocks with zero population and were thus judged to be bad data. We point out that the Registry remains supported by NIOSH.
Methods: Our focus was to study possible associations between indoor exposures (e.g., presence of dust or debris, cleaning practices, and replacement of damaged household items) and new or worsened respiratory symptoms and diseases (persistent shortness of breath, persistent wheezing, persistent chronic cough, persistent upper respiratory symptoms, asthma, and chronic obstructive pulmonary disease or COPD). Logistic regression was used to identify key demographic and exposure (explanatory) variables that potentially impact and explain the self-reported health effects observed in the surveys. The Registry contains hundreds of variables; the pool of several dozen potential exposure and demographic variables was reduced by logical considerations to 21 for statistical model building (stepwise selection). We performed multivariate logistic regression analyses, controlling for demographics, smoking status, and exposure to the outdoor cloud of dust and debris generated by the collapse of the WTC Towers. As the number of potential explanatory variables was rather large, stepwise model building was used to find parsimonious models that captured the relevant information content of the data.
Results: We found significant (at the 0.05 level) odds ratios (ORs), for each of the six outcomes we examined, for age (ranging 1.11-1.33 per decade of age), for avoiding dust cloud exposure (0.32-0.73), and for priority group (0.36-0.58 versus the lowest number, i.e., the group closest to ground zero). Other factors significantly affecting some but not all of the health outcomes (ORs different from 1.0) were as follows: having ever smoked (1.19-1.45); race (1.54-2.35 compared with whites); sex (1.36-1.82, females compared to males); education (0.41-0.61 compared to not finishing high school); income (0.50-0.70 compared to the lowest income); exposures in the home such as debris, damage, fine dust, or heavy dust (1.36-1.82); cleaning behavior (1.31-1.65, elevated ORs for those having dusted, mopped, or vacuumed); and replacement of household items such as carpeting, air-conditioning, drapes, or furniture (1.23-1.70, elevated ORs for having replaced various items).
Conclusions: Although these are all self-reported data, this analysis indicates that lower Manhattan residents who suffered home damage and other exposures in their homes following the 9/11 attacks are more likely to report new or worsened persistent respiratory symptoms and diseases in the WTCHR.
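Odds ratios of the kind quoted in the results come from exponentiated logistic regression coefficients. A self-contained sketch on synthetic data (not the registry's variables), fitting the model by plain iteratively reweighted least squares:

```python
import numpy as np

def logistic_irls(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via iteratively
    reweighted least squares (Newton's method); returns coefficients."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)
        beta += np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (y - p))
    return beta

# Synthetic registrant-style data: intercept, home-dust exposure, ever-smoker.
rng = np.random.default_rng(2)
n = 5000
dust = rng.integers(0, 2, n)
smoke = rng.integers(0, 2, n)
X = np.column_stack([np.ones(n), dust, smoke])
logit = -1.5 + 0.5 * dust + 0.3 * smoke          # true log-odds
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(float)

beta = logistic_irls(X, y)
odds_ratios = np.exp(beta[1:])
print("estimated ORs (dust, smoking):", np.round(odds_ratios, 2))
```

An OR above 1 for `dust` mirrors the abstract's elevated ORs for in-home exposures; the true values here are exp(0.5) and exp(0.3) by construction.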

Estimation of Discrete Survival Function through the Modeling of Diagnostic Accuracy for Mismeasured Outcome Data
Hee-Koung Joeng1, Abidemi K. Adeniji2, Naitee Ting2 and Ming-Hui Chen1

1University of Connecticut
2Boehringer-Ingelheim Pharmaceuticals
[email protected]

Standard survival methods are inappropriate for mismeasured outcomes. Previous research has shown that outcome misclassification can bias estimation of the survival function. We develop methods to accurately estimate the survival function when the diagnostic tool used to measure the disease outcome is not perfectly sensitive and specific. Since the diagnostic tool is not the gold standard, the true or error-free outcomes are latent; they cannot be observed. Our method uses the negative predictive value (NPV) and the positive predictive value (PPV) of the diagnostic tool to construct a bridge between the error-prone outcomes and the true outcomes. We formulate an exact relationship between the true (latent) survival function and the observed (error-prone) survival function in terms of time-varying NPV and PPV. We specify models for the NPV and PPV that depend only on parameters that can be easily estimated from a fraction of the observed data. Furthermore, we conduct an in-depth study to accurately estimate the latent survival function under the assumption that the biology underlying the disease process follows a stochastic process. We further examine the performance of our method by applying it to the VIRAHEP-C data.
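The NPV/PPV bridge can be written down in one line; the identity below is an illustrative rendering of the idea, not necessarily the authors' exact formulation. Writing $S(t)=\Pr(T>t)$ for the latent error-free survival function and $S^{*}(t)=\Pr(T^{*}>t)$ for the observed error-prone one, the law of total probability gives

```latex
S(t) \;=\; \mathrm{NPV}(t)\,S^{*}(t) \;+\; \bigl(1-\mathrm{PPV}(t)\bigr)\bigl(1-S^{*}(t)\bigr),
\qquad
\mathrm{NPV}(t)=\Pr(T>t \mid T^{*}>t),\quad
\mathrm{PPV}(t)=\Pr(T\le t \mid T^{*}\le t),
```

so modeling NPV(t) and PPV(t) and estimating $S^{*}(t)$ from the error-prone data recovers the latent $S(t)$.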

Session C02: Design and Analysis of Clinical Trials

Sample Size Re-Estimation of BE Studies with Adaptive Design
Peng Roger Qu
Pfizer China R&D
[email protected]

Adequate sample size is essential to the success of clinical trials. Within the paradigm of adaptive design, sample size re-estimation (SSRE) is relatively mature and may have seen the most applications of adaptive design. Most SSRE methodology is developed under a hypothesis-testing framework. This presentation focuses on SSRE for bioequivalence trials, built upon the repeated confidence intervals of group sequential design. A closed-form sample size determination based on the conditional power of the final analysis is derived, which ensures the desired power. A hybrid version suited to practical considerations is recommended, with simulation results demonstrating the desired operating characteristics.

Sequential Phase II Clinical Trial Design for Molecularly Targeted Agents
Yong Zang1 and Ying Yuan2

1Florida Atlantic University
2M. D. Anderson Cancer Center
[email protected]
In the early-phase development of molecularly targeted agents (MTAs) for targeted therapy, a commonly encountered situation is that the MTA is expected to be more effective for a certain biomarker subgroup, say marker-positive patients, but there is no adequate evidence to rule out a benefit for the other subgroup, i.e., marker-negative patients. After establishing that marker-positive patients benefit from the treatment, it is often of great clinical interest to determine whether the treatment benefit extends to marker-negative patients. We propose multi-stage optimal sequential trial (MOST) designs to address this practical issue in the context of phase II clinical trials. The MOST designs evaluate the treatment effect first in marker-positive patients and then, if needed, in marker-negative patients. The designs are optimal in the sense that they minimize the expected sample size or the maximum expected sample size when the MTA is futile for both marker-positive and marker-negative patients. We also propose an efficient, accurate optimization algorithm to find the optimal design parameters.

On Sensitivity Analysis for Missing Data Using Control-based Imputation
Frank Liu
Merck & Co.
[email protected]
Control-based imputation (CBI) methods have been proposed as sensitivity analyses for longitudinal clinical trials with missing data. The CBI methods multiply impute the missing data in the treatment group based on an imputation model built from the control-group data. This yields a conservative treatment-effect estimate compared to multiple imputation (MI) under missing at random (MAR). However, CBI analysis based on the regular MI approach can be overly conservative because it not only discounts the treatment-effect estimate but also penalizes the variance estimate. In this talk, we investigate the statistical properties of CBI methods and propose approaches to obtain accurate variance estimates using both frequentist and Bayesian methods. Simulation studies under various missing-data mechanisms are conducted to illustrate the statistical properties and performance of the methods.

Choosing Covariates for Adjustment in Non-Inferiority Trials Based on Influence and Disparity
Katherine Nicholas, Viswanathan Ramakrishnan and Valerie Durkalski
Medical University of South Carolina
[email protected]
It has been shown that the type I error is inflated when important covariates are excluded from a non-inferiority analysis (Nicholas et al., 2014). Traditionally, whether or not to adjust for a covariate in a model is based solely on statistical significance or some other criterion, such as AIC, that relates to the magnitude of the effect. In addition, one may also check for collinearity with other covariates (using VIF, for example) or perform tests of baseline imbalance. However, several authors suggest that these aspects should be considered simultaneously. For example, Canner et al. (1991) developed a statistic to determine the relative importance of including a covariate in a model based on both its effect on the outcome (which he calls influence) and its association with treatment (which he calls disparity). Although the approach of Canner et al. is under the null, Beach et al. (1989) extended this to non-null treatment effects in the context of linear regression. The current research seeks to combine the methods of Canner et al. for binary outcomes with the methods of Beach et al. for non-null treatment effects in order to quantify the relative importance of including covariates in a non-inferiority trial with a binary outcome. Theoretical results are presented and applied via simulation, followed by practical application.

Statistical Assessment for Establishing Biosimilarity in Follow-On Biological Product
Jung-Tzu Liu1,2, Hsiao-Hui Tsou2,3, Chin-Fu Hsiao2, Yi-Hsuan Lai4, Chi-Tian Chen2, Wan-Jung Chang2 and Chyng-Shyan Tzeng1

1National Tsing Hua University
2National Health Research Institutes
3China Medical University
4Foxconn International
[email protected]

Various biological drugs will lose patent protection in the upcoming few years, such as Avastin for metastatic colon cancer, Remicade for rheumatoid arthritis, and Herceptin for breast cancer. These expensive biological agents could be replaced with more affordable follow-on biologics (biosimilar products). Biosimilar development involves comparing a complex biosimilar with the approved biologic agent (the reference product). However, structural and functional differences may exist between the two products, unlike the close resemblance of generic drugs. There is therefore strong interest in establishing statistical approaches for evaluating the biosimilarity between a biosimilar product and a reference product. This presentation considers a complex design including K - 1 reference groups and an experimental biosimilar product. We propose a confidence interval approach to determine whether the observed treatment effects are within an acceptable range for claiming consistency (high similarity) of a primary treatment effect between two biological products. Sample size determination is considered to ensure that the similarity is maintained at a desired power level, say 80% or 90%. Accordingly, simulation results on the power for claiming biosimilarity are also given. The proposed confidence interval approach and the general moment-based criterion are compared numerically. A real example illustrates the application of the proposed approach in the biosimilarity assessment between a recombinant human growth hormone (rhGH) drug and a new biosimilar product.

Session C03: Functional Data, Semi-parametric and Non-parametric Methods

An Unbiased Measure of Integrated Volatility in the Frequency Domain
Fangfang Wang
University of Illinois at Chicago
[email protected]

We propose an unbiased, periodogram-based measure of ex-post price variation in the frequency domain. When intraday prices are contaminated by market microstructure noise, the proposed estimator behaves like a filter: it removes the noise by filtering out high-frequency periodograms. In other words, the proposed estimator converts the high-frequency data into low-frequency periodograms. We show, via a simulation study and an application to Microsoft transaction prices, that the proposed estimator is insensitive to the choice of sampling frequency and is competitive with other existing noise-corrected volatility measures.


Functional data analysis for density functions by transformation to a Hilbert space
Alexander Petersen and Hans-Georg Muller
University of California, Davis
[email protected]
Functional data that are non-negative and have a constrained integral can be considered as samples of one-dimensional density functions or derivatives of distributions. Such data are ubiquitous. Due to the inherent constraints, densities do not live in a vector space, and therefore common Hilbert-space-based methods of functional data analysis are not applicable. To address this problem, we introduce a transformation approach, mapping probability densities to a Hilbert space of functions through a continuous and invertible map. Common methods of functional data analysis, such as the construction of functional modes of variation, functional regression, or classification, are then implemented using representations of the densities in this linear space. Representations of the densities themselves are obtained by applying the inverse map from the linear functional space to the density space. Transformations of interest include the log quantile density and log hazard transformations, among others. Rates of convergence are derived for the representations obtained for a general class of transformations that satisfy certain structural properties. If the subject-specific densities need to be estimated from data, these rates correspond to the optimal rates of convergence for density estimation. The proposed methods are illustrated through simulations and applications in brain imaging.
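The log quantile density transformation mentioned above can be sketched numerically; the grid-based implementation below is our own simplification, mapping a density f to the unconstrained function psi(u) = -log f(Q(u)).

```python
import numpy as np

def log_quantile_density(x, f, u_grid):
    """Map a density f on grid x to its log-quantile-density function
    psi(u) = -log f(Q(u)), an unconstrained element of L^2[0, 1]."""
    # Trapezoidal CDF, renormalized to guard against discretization error.
    F = np.concatenate([[0.0], np.cumsum((f[1:] + f[:-1]) / 2 * np.diff(x))])
    F /= F[-1]
    Q = np.interp(u_grid, F, x)     # quantile function Q = F^{-1}
    fQ = np.interp(Q, x, f)         # density evaluated at the quantiles
    return -np.log(fQ)

# Example: a N(0,1) density; at u = 0.5 the transform equals
# -log f(0) = 0.5 * log(2*pi), roughly 0.919.
x = np.linspace(-5, 5, 2001)
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
u = np.linspace(0.01, 0.99, 99)
psi = log_quantile_density(x, f, u)
print(round(psi[49], 3))            # value at u = 0.5
```

Because the map is invertible, ordinary Hilbert-space FDA can be run on `psi` and the results mapped back to the density space, which is the strategy the abstract describes.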

Cross-covariance Functions for Divergence-free and Curl-free Tangent Vector Fields on the Sphere
Minjie Fan1 and Tomoko Matsuo2

1University of California, Davis
2University of Colorado at Boulder
[email protected]
In this paper, we introduce valid parametric models of cross-covariance functions for divergence-free and curl-free tangent vector fields on the sphere. They are constructed by applying the surface curl or the surface gradient operator to a univariate, sufficiently smooth random field in the quadratic mean sense. Based on the celebrated Helmholtz-Hodge decomposition, we further propose a flexible parametric model for general tangent vector fields on the sphere called Mixed Matern. It has a close connection with the nonstationary covariance models through differential operators (Jun and Stein (2008); Jun (2011)), and thus fast likelihood evaluation is available for large datasets when the observations are on a regular grid. The application of the Mixed Matern model is illustrated with an ocean surface wind dataset called QuikSCAT. The results show that some important characteristics of the data are captured by our proposed model.

Empirical Likelihood-based Inference for Linear Components in Partially Linear Models
Haiyan Su
Montclair State University
[email protected]
We propose an empirical likelihood (EL)-based inference for the linear component coefficient in partially linear models and partially linear mixed-effects models. The proposed method combines the projection method with the EL method. The projection method is used to remove the nuisance parameter in the model, and the EL method is then used to construct confidence intervals for the linear component. The Bartlett correction is used to correct the EL-based confidence intervals. The test statistic is shown to asymptotically follow a standard chi-square distribution. The numerical performance of the method under normal and non-normal error terms is evaluated through simulation studies and a real data example.

Consistency of Bayesian Semiparametric Models through Joint Density Estimation
Yuefeng Wu
University of Missouri-St. Louis
[email protected]

Studies of the consistency of Bayesian semi-parametric models have been limited to models that either place priors on the parametric and nonparametric parts separately or model the parametric part by some smooth or linear functionals on the space of density functions of the observable random variables. A group of Bayesian semi-parametric models falls into neither category, e.g., a Bayesian ordinal regression based on Bayesian nonparametric estimation of the joint probability of latent responses and observed covariates. These models are excellent in both flexibility and interpretability, and they are becoming more and more popular. Their consistency is obtained by showing the L1 consistency of the joint density estimation and the smoothness of the corresponding functionals from the density function space to the parametric space. Due to the latent variables, a new technique is necessary and has been developed to show the L1 consistency.

Analysis of Water Quality in New Jersey
Kaitlyn Scrudato and Haiyan Su
Montclair State University
[email protected]

To model the quality of the water at any given time with available predictors, data from bodies of water across New Jersey from 1999 to 2013 were collected from the STORET database. The water quality parameters studied were Escherichia coli (E. coli) and enterococcus, with dissolved oxygen (DO), pH, salinity, temperature, total dissolved solids (TDS), and total suspended solids (TSS) as predictors. Multiple linear regression was fitted first but did not fit the data well. Logistic regression models indicated that the odds of having unsafe water (more than 35 cfu of enterococcus) for salt water are 0.176705 times the odds for fresh water when all other values are held constant. To improve on the poor fit of the multiple regression models, the lasso regression method was also used to model the data. The lasso method concluded that DO, TDS, and TSS were significant for predicting the amount of E. coli, whereas for enterococcus, DO, temperature, and TSS were significant. For both enterococcus and E. coli, DO had a negative relationship with the amount of bacteria in the water.

Session C04: Multiple Comparisons, Meta-analysis, andMismeasured Outcome Data

Generalized Holm's Procedure for Multiple Testing Problem
Huajiang Li1, Yi Ma2 and Hong Zhou3

1Allergan, Inc.
2Quintiles, Inc.
3Arkansas State University
[email protected]

Holm's procedure is a stepwise multiple testing procedure that can reject only one null hypothesis at each step. A generalized Holm's procedure is proposed in this article. It is proven that the new procedure can reject several null hypotheses at each step sequentially and still strongly controls the familywise error rate regardless of the dependence of the individual test statistics. An example from a clinical trial is illustrated using the newly proposed procedure.
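For reference, the classical Holm step-down that the proposal generalizes (rejecting at most one hypothesis per step) looks like this:

```python
def holm_adjust(p_values, alpha=0.05):
    """Classical Holm step-down: sort the p-values, compare the i-th
    smallest with alpha / (m - i), and reject until the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break      # once one test fails, all remaining are retained
    return reject

pvals = [0.001, 0.04, 0.012, 0.9]
print(holm_adjust(pvals))   # → [True, False, True, False]
```

This control holds for arbitrary dependence among the test statistics, the same guarantee the generalized procedure is shown to retain while rejecting several hypotheses per step.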

Generalized Confidence Interval Approach for Combining Multiple Comparisons
Atiar Rahman and Ram Tiwari
U.S. Food and Drug Administration
[email protected]
Abstract: We introduce a generalized confidence interval approach for combining multiple comparisons, using techniques originally proposed by Weerahandi (1993), with application to long-term animal carcinogenicity studies. We describe estimation methods for 100(1 - alpha)% generalized confidence intervals. This method controls the overall false positive rate.

Considerations for Two Correlated Cochran-Armitage Trend Tests
Yihan Li, Su Chen, Ying Zhang and Yijie Zhou
AbbVie Inc.
[email protected]
In Phase 2 clinical trials, it is often of interest to investigate the relationship between increasing dosage and the effect of the drug. The Cochran-Armitage trend test (Cochran, 1954; Armitage, 1955) is one of the most frequently used methods to study the underlying trends for binary endpoints. In some cases, more than one dose-response relationship is studied within one trial, for example, a phase 2 dose-finding trial that includes both BID and QD regimens (with the same total daily doses) sharing a common placebo control arm. Cochran-Armitage trend tests can be used to test the dose-response relationships for the BID and QD regimens respectively, resulting in two correlated trend tests. We derived the joint distribution of the two trend test statistics under both the null and the alternative hypotheses. We then investigated the impact of the correlation on the type I error as well as the power. Simulation studies were conducted to verify the theoretical results.
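The building block for each regimen is a single two-sided Cochran-Armitage trend test, which can be computed directly (the dose scores and counts below are invented):

```python
import math
import numpy as np

def cochran_armitage(events, totals, scores):
    """Two-sided Cochran-Armitage trend test for binary outcomes
    across ordered dose groups; returns (z statistic, p-value)."""
    events, totals, scores = map(np.asarray, (events, totals, scores))
    N = totals.sum()
    p_bar = events.sum() / N
    num = np.sum(scores * (events - totals * p_bar))
    s_bar = np.sum(totals * scores) / N
    var = p_bar * (1 - p_bar) * np.sum(totals * (scores - s_bar) ** 2)
    z = num / math.sqrt(var)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Responders out of n per arm: placebo, low, mid, high dose.
z, p = cochran_armitage(events=[2, 5, 9, 14], totals=[30, 30, 30, 30],
                        scores=[0, 1, 2, 3])
print(f"z = {z:.2f}, p = {p:.4f}")
```

Running this once for BID arms and once for QD arms, with the placebo counts shared, produces the two correlated z statistics whose joint distribution the abstract derives.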

Goodness-of-fit Test for Meta-analysis
Zhongxue Chen1, Guoyi Zhang2 and Jing Li1
1Indiana University
2University of New Mexico
[email protected]
Meta-analysis is a very useful tool for combining information from different sources. Fixed-effect and random-effects models are widely used in meta-analysis. Despite their popularity, they may give misleading results if the models do not fit the data but are blindly used. Therefore, as in any statistical analysis, checking the model fit is an important step. In practice, however, goodness-of-fit in meta-analysis is rarely discussed. In this paper, we propose tests to check the goodness-of-fit of the fixed-effect and random-effects models in meta-analysis. Through simulation studies, we show that the proposed tests control the type I error rate very well. To demonstrate their usefulness, we also apply the proposed tests to some real data sets. Our study shows that the proposed tests are useful tools for checking the goodness-of-fit of the models used in meta-analysis.
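The classical check in this spirit is Cochran's Q for the fixed-effect model; the proposed tests go beyond it, but Q illustrates the idea (study-level numbers below are invented):

```python
import numpy as np
from scipy.stats import chi2

def cochran_q(effects, variances):
    """Cochran's Q heterogeneity statistic for a fixed-effect
    meta-analysis; a large Q signals poor fit of the common-effect model."""
    effects, variances = np.asarray(effects), np.asarray(variances)
    w = 1.0 / variances                       # inverse-variance weights
    pooled = np.sum(w * effects) / np.sum(w)  # fixed-effect estimate
    Q = np.sum(w * (effects - pooled) ** 2)
    df = len(effects) - 1
    return Q, chi2.sf(Q, df)                  # Q ~ chi-square(df) under fit

# Five study-level log odds ratios and their variances; the fourth
# study is an outlier relative to the common-effect assumption.
Q, p = cochran_q([0.30, 0.25, 0.41, 1.20, 0.28],
                 [0.04, 0.05, 0.06, 0.05, 0.04])
print(f"Q = {Q:.1f}, p = {p:.3f}")
```

A small p-value here rejects the fixed-effect fit, after which one would either move to a random-effects model or, as the abstract proposes, test that model's fit as well.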

Pitfalls in Assessing Relative Efficacy Across Trials
Xiao Sun
Merck & Co.
[email protected]
Although it is well known that the gold standard for assessing the relative efficacy of treatments A vs. B is a head-to-head comparison in a randomized controlled trial (RCT), there is widespread use of cross-trial comparisons in HIV and oncology when the only available data are from two independent trials, A vs. C and B vs. C. A synthesis method is used to assess the relative efficacy of treatment arms A vs. B through the common reference arm C. The synthesized across-trial comparison is observational in nature and subject to the pitfalls of various confounding factors. In this presentation, we will show that even though the two trials seemed very similar in terms of baseline prognostic factors, missing data actually introduced bias and made the comparison invalid. Therefore, caution should be exercised when interpreting results from cross-trial comparisons. Cross-trial comparisons may have some role in hypothesis generation, such as identifying promising treatments for further investigation, but RCTs are still essential for making important clinical decisions.

Session P01: Poster Session

Correction for Confounding Effect in Random Forests Analysis
Yang Zhao and Donghua Lou
Nanjing Medical University
[email protected]

Random Forests (RF) is an ensemble machine learning method and a powerful tool for analyzing high-dimensional data. It can be used to screen for risk factors and build predictive models. We found that RF may produce inaccurate results if the dataset includes variables with confounding effects; failing to remove the confounding effect may produce spurious associations. We propose to correct for the confounding effect using a residual-based method. Simulations demonstrate that the proposed method can improve the probability that the causal factor is identified. We also provide an example on genome-wide association studies to illustrate the application of the proposed method.
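A residual-based correction of the kind described can be sketched as follows. The linear residualization and the synthetic data are our assumptions (the abstract's exact construction may differ); the residuals would then be passed to any random-forest implementation.

```python
import numpy as np

def residualize(v, confounder):
    """Residuals of v after a least-squares fit on the confounder."""
    Z = np.column_stack([np.ones_like(confounder), confounder])
    beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ beta

rng = np.random.default_rng(4)
n = 2000
confounder = rng.normal(size=n)              # e.g. population stratification
causal = rng.normal(size=n)                  # true risk factor
spurious = confounder + rng.normal(scale=0.2, size=n)   # merely confounded
y = causal + confounder + rng.normal(size=n)

# Before correction the spurious variable is strongly associated with y;
# after residualizing both on the confounder, its association collapses.
r_before = np.corrcoef(spurious, y)[0, 1]
r_after = np.corrcoef(residualize(spurious, confounder),
                      residualize(y, confounder))[0, 1]
print(round(r_before, 2), round(r_after, 2))
```

An RF importance ranking fit to the residualized variables would then favor `causal` over `spurious`, which is the improvement the simulations in the abstract measure.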

Strategies of Genetic Risk Prediction with Lung Cancer GWAS Data
Donghua Lou, Weiwei Duan, Zhibin Hu and Feng Chen
Nanjing Medical University
[email protected]

Objective: To investigate the performance of three genetic risk prediction methods, the weighted genetic risk score (wGRS), the support vector machine (SVM), and the random forest (RF), applied to high-dimensional lung cancer data under two strategies. Methods: This study used the Nanjing and Beijing samples of the GWAS data as the training set and testing set, respectively. We employed the two strategies of "full predictive subset" (FS) and "best predictive subset" (BS) and compared the prediction accuracy of the three methods over combinations of linkage disequilibrium (LD) thresholds and hypothesis-testing levels (α). Results: Under a high-LD structure, the prediction accuracy of wGRS rose with increasing -log(α). RF and SVM are not as sensitive to LD structure as wGRS, but the predictive accuracy of each method under a low-LD structure (r2 < 0.2) was mostly better than under a high-LD structure. Moreover, BS was slightly better than FS for wGRS, approximately equal to FS for SVM, and worse than FS for RF. Conclusion: Prediction accuracy can be improved by LD pruning and adopting a proper α level; under that condition, wGRS is better than SVM and RF.
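The wGRS component above is simply a weighted allele count; a minimal sketch with invented SNPs and published-style odds-ratio weights:

```python
import numpy as np

def weighted_grs(genotypes, log_or):
    """Weighted genetic risk score: risk-allele counts (0/1/2) weighted
    by per-SNP log odds ratios, summed across SNPs."""
    return np.asarray(genotypes) @ np.asarray(log_or)

# Three subjects x four SNPs (risk-allele counts), with illustrative
# per-allele odds ratios as weights.
G = np.array([[0, 1, 2, 1],
              [1, 0, 0, 2],
              [2, 2, 1, 0]])
weights = np.log([1.20, 1.15, 1.30, 1.10])
scores = weighted_grs(G, weights)
print(np.round(scores, 3))
```

In the study design above, the weights would come from the training (Nanjing) sample, restricted to the FS or BS subset after LD pruning, and the scores evaluated on the testing (Beijing) sample.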

Hierarchical Model for Genome-wide Association Study
Honggang Yi, Hongmei Wo, Yang Zhao, Ruyang Zhang, Junchen Dai, Guangfu Jin and Hongxia Ma
Nanjing Medical University
[email protected]


With the rapid development of high-throughput genotyping tech-nologies in recently years, genome-wide association study (GWAS)has emerged as one of the most important tools for identifying ge-netic variants involved in complex diseases. Although this has im-proved our understanding of genetic basis of these complex diseasesand trait, there are still many analytic challenges in GWAS. Mostexisting methods for GWAS are single-locus-based approaches, inwhich each variant is tested individually for association with a spe-cific phenotype in the whole genome-wide. However, such a single-locus-based analysis strategy of GWAS has many limitations. Thereare many statistical challenges for GWAS, such as how to incorpo-rate biological information into a GWAS and how to mine fromGWAS data for getting more information, and so on. It is veryapparent that new strategies and methods are urgently needed forGWAS.Here we proposed a hierarchical model GWAS strategy withthe inclusion of prior biological information and applicated it in areal GWAS data. With the help of computer simulations, the sta-tistical properties and the effectiveness for actual GWAS data wereevaluated from application’s point of view, and the research detailswere as follows: In Section 1, two simulated studies were con-ducted based on the prior biological information which simulatedby binomial distributions and the results of gene function classifi-cation from a real GWAS data, respectively. Base on the two sim-ulation studies, the effects of different prior biological informationfor hierarchical model (HM) were evaluated thoroughly. The re-sults showed that both hierarchical model and logistic regression(LR) model are less powerful and perform similarly when OR equalto or less than 1.1 at the GWAS significance level of 1E-5 and 1E-7. However, HM always performs powerful than LR when the ORgreat than 1.1. 
In Section 2, three simulation studies were implemented to explore the effect of applying the HM when incomplete information, additional noisy information, or uninformative information is included. The results were as follows. Truly relevant biological information has a major impact on the performance of the HM: when it is included, the HM is more powerful than LR even if incomplete or uninformative information is included as well; conversely, without truly relevant biological information the HM loses more power than LR. The area under the ROC curve led to similar conclusions for the HM. In Section 3, the HM GWAS strategy was applied to a real GWAS data set on lung cancer in Chinese Han populations.
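The power behavior described above can be illustrated with a toy Monte-Carlo sketch. This is not the authors' hierarchical model: it simulates a single candidate locus under a given per-allele odds ratio and applies a plain 2x2 allelic chi-square test at a genome-wide significance level. The sample sizes, minor allele frequency, and number of replicates are illustrative assumptions, but they reproduce the qualitative point that single-locus power is negligible at OR near 1.1 and recovers as the OR grows.

```python
import numpy as np
from scipy.stats import chi2_contingency

def simulate_power(or_effect, maf=0.3, n_cases=2000, n_controls=2000,
                   alpha=1e-5, n_sim=200, seed=0):
    """Monte-Carlo power of a single-locus allelic chi-square test
    for a case-control design with a given per-allele odds ratio."""
    rng = np.random.default_rng(seed)
    odds_ctrl = maf / (1.0 - maf)                  # control risk-allele odds
    p_case = or_effect * odds_ctrl / (1.0 + or_effect * odds_ctrl)
    hits = 0
    for _ in range(n_sim):
        a_case = rng.binomial(2 * n_cases, p_case)     # risk alleles, cases
        a_ctrl = rng.binomial(2 * n_controls, maf)     # risk alleles, controls
        table = [[a_case, 2 * n_cases - a_case],
                 [a_ctrl, 2 * n_controls - a_ctrl]]
        _, pval, _, _ = chi2_contingency(table)
        hits += pval < alpha
    return hits / n_sim
```

Under these settings, `simulate_power(1.1)` is close to zero while `simulate_power(1.4)` is close to one, mirroring the abstract's observation that standard single-locus tests struggle at small odds ratios.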

A Review of Nonparametric Methods for Testing Isotropy in Spatial Data
*Zachary Weller and Jennifer Hoeting
Colorado State University
[email protected]
One of the most important aspects of modeling spatial data is appropriately specifying the second-order properties of the random field. A practitioner working with spatial data faces a number of choices regarding the structure of the dependence between observations. One of these choices is determining whether or not the covariance function is isotropic. Misspecification of the isotropy properties can lead to misleading inferences, such as inaccurate predictions and parameter estimates. In a fashion similar to checking the assumptions of simple linear regression with residual plots, a researcher may use graphical diagnostics, such as directional sample variograms, to decide whether the assumption of isotropy is reasonable. These graphical techniques can be difficult to assess, open to subjective interpretation, and misleading. An objective hypothesis test of the assumption of isotropy may be more desirable. To this end, a number of tests of isotropy have been developed using both the spatial and spectral representations of random fields. We provide an overview of nonparametric methods for testing the hypotheses of isotropy and symmetry in spatial data. We include a summary of key test properties, give insights on important considerations in choosing and implementing a test, and provide a brief simulation study comparing some of the methods.
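The directional sample variograms mentioned as graphical diagnostics can be computed in a few lines of NumPy. The sketch below uses the classical (Matheron) estimator restricted to pairs whose separation vector falls within an angular tolerance of a chosen direction; the binning scheme and tolerance are illustrative choices, not those of any particular published test.

```python
import numpy as np

def directional_semivariogram(coords, values, angle,
                              tol=np.pi / 8, nbins=10, maxdist=None):
    """Classical (Matheron) empirical semivariogram using only pairs
    whose separation direction lies within +/- tol of `angle` (radians,
    treated axially, so 0 and pi are the same direction)."""
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    n = len(values)
    dx = coords[:, 0][:, None] - coords[:, 0][None, :]
    dy = coords[:, 1][:, None] - coords[:, 1][None, :]
    iu = np.triu_indices(n, k=1)                   # each pair once
    h = np.hypot(dx, dy)[iu]
    ang = (np.arctan2(dy, dx) % np.pi)[iu]
    sq = 0.5 * (values[:, None] - values[None, :])[iu] ** 2
    dang = np.abs(ang - (angle % np.pi))
    keep = np.minimum(dang, np.pi - dang) <= tol   # axial angular distance
    h, sq = h[keep], sq[keep]
    if maxdist is None:
        maxdist = h.max()
    edges = np.linspace(0.0, maxdist, nbins + 1)
    gamma = np.full(nbins, np.nan)
    for b in range(nbins):
        in_bin = (h > edges[b]) & (h <= edges[b + 1])
        if in_bin.any():
            gamma[b] = sq[in_bin].mean()
    return 0.5 * (edges[:-1] + edges[1:]), gamma   # bin centers, semivariances
```

For an isotropic field the estimates should look similar in every direction; a pronounced, systematic gap between, say, `angle=0.0` and `angle=np.pi/2` is the kind of pattern the formal tests reviewed here are designed to detect objectively.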

A Bayes Testing Approach to Metagenomic Profiling in Bacteria
*Camilo Valdes1, Bertrand Clarke2, Adrian Dobra3 and Jennifer Clarke2
1University of Miami
2University of Nebraska-Lincoln
3University of Washington
[email protected]

Using Next-Generation Sequencing (NGS) data, we use a multinomial model with a Dirichlet prior to detect the presence of bacterial genomes in metagenomic samples, via marginal Bayes tests for the bacterial strains in a reference database. The NGS data (sequencing reads) per strain are counted fractionally, with each sequencing read contributing an equal amount to each strain that it might represent. The threshold for detection is strain-dependent, and we apply a correction for the dependence among the sequencing reads by finding the knee in a curve representing a tradeoff between detecting too many strains and not enough strains. As a check, we evaluate the joint posterior probabilities for the presence of two strains, and find relatively little dependence. We apply our techniques to two human metagenomic data sets, and compare our results with the results found by the Human Microbiome Project (HMP).
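The "knee in a curve" step admits many formalizations; one generic choice (not necessarily the authors' exact rule) is the chord-distance criterion: rescale both axes, then take the point on the curve farthest from the straight line joining its endpoints.

```python
import numpy as np

def knee_index(x, y):
    """Index of the 'knee' of a curve: the point with maximum
    perpendicular distance from the chord joining its endpoints,
    after rescaling both axes to [0, 1]."""
    x = np.asarray(x, float)
    y = np.asarray(y, float)
    xs = (x - x[0]) / (x[-1] - x[0])
    yden = y[-1] - y[0]
    ys = (y - y[0]) / (yden if yden != 0 else 1.0)
    # chord runs from (0, 0) to (1, 1); the perpendicular distance is
    # |xs - ys| / sqrt(2), and the constant factor does not change the argmax
    return int(np.argmax(np.abs(xs - ys)))
```

On a concave detection-tradeoff curve this picks the point where adding further strains starts yielding diminishing returns; for example, on `y = sqrt(x)` over [0, 1] the knee sits at x = 0.25.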

A Dynamical Model for Networks of Neuron Spike Trains
*Hongyu Tan, Phillip Chapman and Haonan Wang
Colorado State University
[email protected]

Recurrent event data arise in fields such as medicine, business, and the social sciences. In general, there are two types of recurrent event data: one comes from a relatively large number of processes each exhibiting a relatively small number of recurrent events, and the other from a relatively small number of processes each generating a large number of events. Many statistical models and methods have been developed to analyze the first type of data, but few approaches are available for the second. We focus on situations in which one process generates a large number of events over the observation period. Our motivating application is a collection of neuron spike trains from a rat brain, recorded during the performance of a task. The goal is to model the relationship between a response spike train and a set of predictor spike trains, as well as the spike history of the response itself. We propose a multiplicative intensity model, based on modulated renewal processes, for a single realization of the response spike train. The model includes time-dependent neural spike histories and extrinsic variables. The impact strengths of the functional predictors are modeled by coefficient functions that can be approximated by B-spline basis functions. Sparseness of the estimated coefficient functions is achieved by using the penalized partial likelihood principle. The performance of the proposed method is demonstrated through simulation and a real data analysis.
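The coefficient functions above are expanded in a B-spline basis. As a minimal sketch, independent of the authors' estimation procedure, such a basis can be evaluated directly with the Cox-de Boor recursion:

```python
import numpy as np

def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at the points x by the
    Cox-de Boor recursion. `knots` must include the usual repeated
    boundary knots (degree + 1 copies at each end)."""
    x = np.asarray(x, float)
    t = np.asarray(knots, float)
    n = len(t) - degree - 1                        # number of basis functions
    B = np.zeros((len(x), len(t) - 1))
    for i in range(len(t) - 1):                    # degree 0: span indicators
        B[:, i] = (t[i] <= x) & (x < t[i + 1])
    # close the right endpoint so the basis still sums to one at x == t[-1]
    last = np.searchsorted(t, t[-1], side='left') - 1
    B[x == t[-1], last] = 1.0
    for k in range(1, degree + 1):
        Bn = np.zeros((len(x), len(t) - 1 - k))
        for i in range(len(t) - 1 - k):
            d1 = t[i + k] - t[i]
            d2 = t[i + k + 1] - t[i + 1]
            left = (x - t[i]) / d1 * B[:, i] if d1 > 0 else 0.0
            right = (t[i + k + 1] - x) / d2 * B[:, i + 1] if d2 > 0 else 0.0
            Bn[:, i] = left + right
        B = Bn
    return B[:, :n]
```

A coefficient function is then a linear combination of the basis columns, so penalizing the basis coefficients (as in the penalized partial likelihood mentioned above) shrinks the fitted function. `scipy.interpolate.BSpline` provides an equivalent production-grade evaluator; the hand-rolled version is shown only to make the recursion explicit.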

Bio-insecticidal Effects of Two Plant Extracts (Marrubium vulgare and Artemisia herba-alba) on Culex pipiens (Diptera: Culicidae) under Laboratory Conditions
Amel Aouati1 and *Selima Berchi2
1Université Constantine 3
2Ecole Nationale Supérieure de Biotechnologie


[email protected] extracts of Marrubium vulgare and Artemisia herba alba weretested against 4th instar larvae of the mosquito Culex pipiens L. Theobtained results indicated a sensitivity of Culex pipiens larvae forplant species aroused. This sensitivity is even higher when expo-sure of the larvae to insecticides is extended in time. Among theextracts used Artemisia herba-alba generates the greatest mortalityrate 94Based on the percentage mortality, LC50 value of leaf ex-tract of Marrubium vulgare and Artemisia herba-alba on Culex pip-iens were obtained separately by calculating the regression line em-ploying Probit analysis of (Finney 1971) as described by (Busvine1971). The probit regressions are used to model the effect of dosesto determine the LC50 and their 95Keywords: Plants extracts, Mortality, LC50, 4th instars, Statisticalanalysis, Culex pipiens

Hypothesis Testing for an Extended Cox Model with Time-varying Coefficients
*Takumi Saegusa, Chongzhi Di and Ying Chen
Fred Hutchinson Cancer Research Center
[email protected]
The log-rank test has been widely used to test treatment effects under the Cox model for censored time-to-event outcomes, though it may lose power substantially when the model's proportional hazards assumption does not hold. In this presentation, we consider an extended Cox model that uses B-splines or smoothing splines to model a time-varying treatment effect, and we propose score test statistics for the treatment effect. Our proposed new tests combine statistical evidence from both the magnitude and the shape of the time-varying hazard ratio function, and thus are omnibus and powerful against various types of alternatives. In addition, the new testing framework is applicable to any choice of spline basis functions, including B-splines and smoothing splines. Simulation studies confirm that the proposed tests perform well in finite samples and are frequently more powerful than conventional tests in many settings. The new methods were applied to the HIVNET 012 Study, a randomized clinical trial to assess the efficacy of single-dose nevirapine against mother-to-child HIV transmission, conducted by the HIV Prevention Trials Network.

The Strategy for Selecting the Target Population Using an Adaptive Phase II/III Seamless Design Based on Time-to-event Data
*Hao Yu, Dandan Miao and Feng Chen
Nanjing Medical University
[email protected]

Objective: Although subgroups can be identified on the basis of post-hoc analysis, this requires an additional confirmatory trial and may lead to an inflation in development time and cost. We present an approach that considers treatment comparisons in both a predefined subgroup and the full population during the design period of a seamless trial, and we then evaluate its statistical characteristics. Method: The approach is based on the adaptive phase II/III design. The decision to continue seamlessly in either the subgroup or the full population is made on the basis of an analysis of PFS and OS obtained from the first stage. The final analysis is conducted only for OS, using the Fisher combination method, after the second-stage trial. Results: The type-I error rate is shown to be less than 2.5% and is independent of the correlation between OS and PFS. The simulations demonstrate that correct conclusions are reached sufficiently often in the various scenarios. Conclusion: In oncology trials, if there is an a priori hypothesis of increased efficacy in a defined subgroup and this subgroup can be well characterized, our design can shorten development time and cost.
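The Fisher combination step at the final analysis combines the two independent stage-wise p-values as -2(ln p1 + ln p2), which follows a chi-square distribution with 4 degrees of freedom under the global null. A minimal sketch of just this combination rule, ignoring the design's adaptive selection and multiplicity details:

```python
import numpy as np
from scipy.stats import chi2

def fisher_combined_p(p1, p2):
    """Fisher's combination of two independent stage-wise p-values:
    -2*(ln p1 + ln p2) ~ chi-square with 4 df under the global null."""
    stat = -2.0 * (np.log(p1) + np.log(p2))
    return float(chi2.sf(stat, df=4))
```

For example, `fisher_combined_p(0.05, 0.05)` is about 0.0175, so two moderately significant stages can jointly clear a 2.5% one-sided boundary even though neither stage does on its own.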


Index of Authors

Abanto-Valle, C, 27, 50
Abanto-Valle, CA, 27, 50
Abecasis, G, 30, 49
Adeniji, AK, 36, 108
Afri-Mehennaoui, F, 31, 77
Albert, P, 38, 95
Allen, G, 32, 103
Amaravadi, L, 25, 71
An, D, 42, 99
An, L, 37, 66
Anderes, E, 25, 72
Anderson, A, 35, 85
Antao, V, 36, 107
Antonijevic, Z, 27, 27, 44, 44
Aouati, A, 33, 112
Asakura, K, 28, 58
Atem, F, 32, 104
Aue, A, 37, 70
Azevedo, CL, 27, 50

Baladandayuthapani, V, 31, 35, 90, 99
Ball, G, 40, 82
Ban, Y, 37, 66
Banerjee, S, 25, 73
Bell, J, 41, 47
Bentellis, A, 31, 77
Berchi, S, 33, 112
Berg, E, 25, 31, 73, 78
Berger, JO, 38, 95
Betensky, R, 32, 104
Bhat, KS, 25, 69
Bien, J, 43, 103
Billor, N, 24, 45
Binkowitz, B, 40, 76
Bretz, F, 30, 59
Brock, M, 36, 64
Brown, E, 42, 63
Brutnell, T, 24, 46
Budenz, D, 32, 104
Burdick, R, 33, 45

Cai, L, 36, 57
Cai, T, 35, 38, 77, 82
Candille, S, 38, 79
Cao, G, 38, 92
Cao, H, 34, 80
Cao, J, 41, 62
Carmichael, O, 29, 35, 71, 91

Caruana, R, 43, 103
Casanova, R, 28, 62
Casleton, E, 39, 106
Cassese, A, 37, 74
Castellanos, L, 39, 70
Castro, MD, 41, 51
Castro-Nallar, E, 37, 66
Chakraborty, B, 31, 86
Chan, AH, 26, 97
Chan, G, 42, 85
Chan, I, 106
Chan, K, 30, 49
Chan, KW, 37, 70
Chan, T, 39, 66
Chang, C, 33, 39, 53, 67
Chang, J, 24, 41, 43, 47, 56, 105
Chang, W, 27, 109
Chao, W, 36, 107
Chapman, P, 33, 112
Chatterjee, N, 27, 54
Chaturvedi, P, 27, 44
Chekouo, T, 37, 74
Chen, B, 41, 60
Chen, C, 27, 36, 107, 109
Chen, CI, 36, 53
Chen, CT, 30, 39, 59, 76
Chen, F, 32, 33, 111, 113
Chen, G, 34, 36, 64, 81
Chen, H, 41, 47
Chen, J, 27, 42, 54, 85
Chen, K, 30, 35, 42, 49, 63, 92
Chen, L, 30, 49
Chen, M, 27, 29, 34, 36, 41, 50, 51, 79, 82, 108
Chen, P, 28, 65
Chen, Q, 41, 41, 52, 60
Chen, S, 26, 32, 43, 90, 93, 111
Chen, SX, 43, 105
Chen, T, 37, 74
Chen, Y, 33, 113
Chen, Z, 43, 111
Cheng, G, 32, 39, 96, 100
Cheng, Y, 26, 41, 52, 104
Chesi, A, 28, 62
Cheung, K, 31, 42, 86, 99
Cheung, YK, 40, 100
Chiang, A, 24, 54
Chiavacci, R, 28, 62
Chien, W, 36, 36, 52, 53
Christopher, D, 27, 45
Chu, L, 36, 64
Chun, H, 24, 54
Chung, D, 41, 47
Churpek, MM, 34, 80
Ciarleglio, A, 29, 71
Clarke, B, 33, 51, 112
Clarke, J, 33, 33, 51, 112
Cook, B, 25, 73
Cook, T, 31, 86
Coram, M, 38, 79
Costagliola, D, 31, 76
Countryman, P, 26, 96
Cox, N, 30, 49
Crainiceanu, C, 24, 46
Crandall, K, 37, 66
Crimin, K, 25, 71
Crowe, B, 40, 83
Crowley, J, 37, 74
Cui, Y, 28, 55
Dai, J, 32, 111
Davidian, M, 26, 104
Davidson, K, 31, 86
Davis, R, 33, 98
Demuth, G, 25, 73
Deng, H, 36, 53
Deng, Q, 34, 63
Devanarayan, V, 32, 95
Dey, D, 27, 42, 43, 43, 50, 63, 102, 103
Di, C, 33, 113
Di, Y, 41, 56
Dicarlo, J, 26, 96
Ding, W, 41, 60
Ding, Y, 25, 83
Do, K, 30, 37, 61, 74
Dobra, A, 33, 51, 112
Doecke, J, 37, 74
Dongmo Jiongo, V, 31, 78
Du, J, 30, 61
Du, Y, 31, 68
Duan, F, 34, 57
Duan, W, 32, 111
Duchesne, P, 31, 78
Durkalski, V, 26, 109

Emerson, S, 41, 56
Espeland, M, 28, 62
Evans, S, 28, 58
Fan, J, 27, 35, 43, 48, 82, 105
Fan, M, 40, 110
Fan, S, 42, 65
Fan, Y, 27, 48
Fang, X, 26, 104
Feng, S, 25, 71
Feng, X, 33, 54
Feng, Y, 38, 96
Feng, Z, 34, 57
Fine, J, 41, 52
Fine, JP, 34, 80
Finley, A, 25, 73
Fishman, E, 36, 64
Flaherty, P, 41, 48
Fong, Y, 36, 58
Fosdick, B, 39, 105
Foster, J, 38, 95
Frommlet, F, 28, 59
Fu, B, 36, 107
Fu, H, 34, 38, 80, 91
Fu, W, 36, 107
Fuentes, M, 39, 70
Furrey, T, 29, 79
Gamazon, E, 30, 49
Gan, G, 43, 103
Gao, B, 28, 55
Gao, C, 24, 27, 46, 48
Gao, F, 34, 63
Gao, L, 36, 107
Gao, X, 33, 38, 51, 92
Garcia, T, 41, 52
Gardiner, J, 42, 85
Gardner, I, 27, 50
Gardner, J, 27, 44
Gary, M, 106
Gehrke, J, 43, 103
Gelernter, J, 41, 47
Gelfand, A, 30, 61


Bold-faced are presenting authors.

George, S, 29, 89
Ghosh, D, 37, 73, 77
Gill, M, 37, 67
Gilmore, D, 26, 90
Glimm, E, 30, 59
Godin, O, 31, 76
Greven, S, 24, 46
Gu, X, 25, 84
Guan, Y, 35, 91
Guindani, M, 37, 74
Guinness, J, 39, 70
Guo, B, 42, 43, 65, 105
Hamasaki, T, 28, 28, 58, 58
Han, F, 24, 47
Han, P, 32, 93
Han, S, 30, 101
Han, X, 27, 48
Harrell, L, 36, 57
Harrington, P, 37, 67
Harris, S, 27, 44
Harvill, J, 43, 103
Haunfelder, R, 39, 105
Haynes, B, 36, 58
Haziza, D, 31, 78
He, C, 33, 39, 51, 66
He, J, 29, 71
He, X, 37, 39, 67, 69
He, Z, 29, 94
Heitjan, D, 42, 85
Hennessey, V, 31, 90
Hobbs, B, 30, 40, 61, 100
Hochberg, M, 37, 69
Hoeting, J, 33, 112
Hoffman, E, 30, 49
Holan, S, 39, 71
Holland, D, 30, 61
Holland, E, 39, 66
Honerkamp-Smith, G, 37, 69
Hong, C, 37, 66
Hong, F, 32, 95
Horrell, M, 39, 70
Hsiao, C, 27, 28, 30, 39, 58, 59, 76, 109
Hsieh, D, 28, 65
Hsieh, R, 36, 52, 53
Hsu, F, 28, 62
Hsu, J, 25, 83
Hu, F, 35, 84
Hu, J, 30, 40, 61, 88
Hu, M, 29, 34, 63, 79
Hu, T, 29, 87
Hu, Y, 40, 88
Hu, Z, 32, 111
Huang, C, 32, 93
Huang, J, 31, 38, 39, 67, 75, 92
Huang, ML, 28, 65
Huang, P, 36, 64
Huang, S, 28, 28, 65, 65
Huang, W, 28, 58
Huang, Y, 26, 38, 43, 79, 97, 105
Hung, H, 28, 65
Hung, HMJ, 28, 58
Hunt, K, 33, 53
Im, HK, 30, 49

Jaeger, A, 35, 88
James, G, 38, 96
Janes, H, 26, 97
Jeong, J, 37, 68
Ji, H, 41, 47
Ji, Y, 37, 42, 65, 66
Jia, X, 42, 99
Jiang, D, 41, 56
Jiang, F, 38, 94
Jiang, H, 37, 38, 41, 41, 56, 56, 66, 91
Jiang, X, 27, 50
Jiao, F, 30, 49
Jin, F, 29, 79
Jin, G, 32, 111
Jin, P, 29, 79
Jin, Z, 38, 77
Joeng, H, 36, 108
Johns, D, 25, 71
Johnson, B, 41, 60
Johnson, D, 30, 61
Johnson, W, 27, 50
Johnson, WE, 37, 66
Jung, Y, 40, 88
Kaiser, M, 39, 106
Kang, G, 28, 55
Kashyap, V, 31, 74
Kass, R, 39, 70
Kechris, K, 40, 87
Kelly, A, 28, 62
Kim, M, 41, 60
Koch, A, 28, 59
Kohli, P, 43, 103
Kong, L, 24, 39, 46, 66
Kong, S, 25, 84
Kong, X, 35, 98
Kong, Y, 27, 48
Kosorok, M, 34, 81
Kundu, S, 42, 83
Kuo, L, 33, 37, 51, 67
Laber, E, 26, 104
Lachos Davila, VH, 27, 50
Lai, H, 37, 68
Lai, Y, 27, 109
Lan, KKG, 39, 76
Lazar, N, 35, 88
Lee, BL, 42, 65
Lee, C, 36, 53
Lee, J, 27, 37, 50, 66
Lee, JJ, 31, 68
Lee, K, 40, 102
Lee, M, 37, 69
Lee, S, 42, 99
Lee, T, 31, 74
Lee, TCM, 25, 69
Lei, J, 35, 92
Lei, S, 38, 91
Levina, E, 40, 102
Li, B, 29, 40, 79, 102
Li, C, 41, 47
Li, D, 27, 42, 48, 73
Li, G, 25, 31, 38, 83, 86, 91
Li, H, 35, 43, 82, 110
Li, J, 30, 41, 43, 56, 101, 111
Li, M, 38, 78
Li, Q, 29, 93
Li, R, 35, 41, 43, 52, 82, 105
Li, S, 36, 64
Li, X, 25, 71
Li, Y, 29, 35, 35, 42, 43, 65, 79, 85, 91, 111
Li, Z, 29, 94
Liang, L, 30, 49
Liao, Y, 43, 105
Liberles, D, 35, 98
Lim, C, 28, 56
Lim, CY, 28, 56
Lin, C, 30, 36, 49, 53
Lin, L, 29, 79
Lin, N, 39, 67
Lin, Q, 26, 90
Lin, X, 30, 49
Lin, Y, 28, 36, 61, 65, 107
Lin, YK, 36, 36, 53, 53
Lindborg, S, 25, 71
Lipkovich, I, 25, 75
Liu, A, 26, 29, 38, 89, 95, 98
Liu, B, 34, 80
Liu, D, 25, 26, 38, 71, 95, 97
Liu, F, 26, 109
Liu, G, 34, 63
Liu, H, 24, 26, 29, 47, 94, 104
Liu, J, 27, 29, 36, 39, 42, 53, 63, 76, 80, 94, 109
Liu, L, 27, 35, 42, 44, 85, 98
Liu, P, 24, 46
Liu, Q, 33, 53
Liu, R, 26, 97
Liu, S, 40, 100
Liu, T, 40, 102
Liu, X, 28, 38, 55, 77
Liu, Y, 34, 37, 63, 74, 81
Liu, Z, 32, 35, 84, 103
Loh, W, 38, 94
Lok, A, 34, 57
Long, Q, 41, 60
Loo, GY, 37, 69
Lou, D, 32, 32, 111, 111
Lou, Y, 43, 103
Lu, K, 35, 98
Lu, N, 39, 76
Lu, S, 28, 61
Lu, W, 34, 80
Lu, Y, 24, 26, 42, 54, 65, 96
Luo, C, 42, 63
Luo, S, 24, 33, 53, 54
Luo, Z, 42, 85
Muller, H, 40, 110
Ma, C, 30, 61
Ma, H, 32, 111
Ma, J, 29, 32, 40, 89, 100, 101
Ma, P, 31, 37, 66, 75
Ma, S, 28, 32, 34, 36, 55, 64, 81, 103
Ma, X, 24, 54
Ma, Y, 32, 41, 43, 52, 101, 110
Ma, Z, 24, 27, 46, 48
Maadooliat, M, 38, 92
MacEachern, S, 27, 50
Mahapatra, P, 25, 69
Mai, Q, 34, 81
Maiti, T, 28, 56
Mallinckrodt, C, 25, 75
Manimaran, S, 37, 66
Manner, D, 38, 91
Mao, X, 40, 100
Marcy, P, 25, 69
Marder, K, 41, 52
Mary-Krause, M, 31, 76
Matsuo, T, 40, 110
Maurer, W, 30, 59
Mayer, C, 38, 91
Mazroui, Y, 31, 76
McAuliffe, J, 41, 48
McPeek, MS, 41, 56
Mebane, D, 25, 69
Mebirouk, O, 31, 77
Mehennaoui, S, 31, 77
Mehta, C, 27, 44
Mentch, F, 28, 62
Mesbah, M, 26, 96
Meyer, M, 31, 78
Miao, D, 33, 113
Millen, B, 30, 59
Miller, E, 27, 44
Mitchell, J, 28, 62
Mizera, I, 24, 46
Molenberghs, G, 25, 75
Montes, R, 33, 45
Morris, J, 29, 72
Mueller, H, 29, 35, 71, 91
Mueller, P, 37, 42, 65, 66
Murphy, S, 24, 98, 106
Murray, S, 38, 78


Mwanza, J, 32, 104
Myers, DB, 39, 71
Najibi, SM, 38, 92
Nandy, S, 28, 56
Nettleton, D, 43, 105
Ni, Y, 35, 99
Nicholas, K, 26, 109
Nicolae, DL, 30, 49
Nie, L, 26, 98
Ning, J, 30, 34, 57, 68
Nordman, D, 39, 106
Norris, M, 27, 50
O'Kelly, M, 25, 75
Ogden, T, 29, 71
Opsomer, J, 31, 78, 98
Ouyang, G, 43, 102
Ouyang, Z, 40, 102
Paik, M, 41, 60
Pallos, LL, 36, 107
Pan, Q, 33, 53
Pan, W, 38, 79
Park, Y, 29, 80
Paul, D, 37, 70
Peng, H, 26, 35, 38, 79, 88, 90
Peng, J, 41, 48
Peng, L, 34, 39, 67, 81
Perel, S, 39, 70
Permar, S, 36, 58
Perou, C, 40, 88
Peterfy, C, 26, 96
Petersen, A, 35, 40, 91, 110
Petkova, E, 29, 71
Pfeiffer, R, 34, 57
Posch, M, 28, 59
Potard, V, 31, 76
Pounds, S, 36, 107
Price, K, 38, 91
Qian, J, 32, 104
Qiao, X, 38, 39, 96, 96
Qin, G, 25, 90
Qin, J, 32, 41, 60, 93
Qin, Y, 35, 85
Qin, Z, 29, 79
Qiu, Y, 43, 105
Qu, PR, 26, 108
Qu, S, 29, 72
Quinlan, M, 27, 45
Rached, O, 31, 31, 77, 77
Radchenko, P, 38, 96
Rahman, A, 43, 111
Ramakrishnan, V, 26, 109
Raman, S, 36, 64
Randolph, T, 36, 58
Rathbun, S, 35, 84
Ratitch, B, 25, 75
Ravishanker, N, 43, 103
Ren, J, 32, 93
Ren, Z, 24, 46
Revzin, E, 38, 92
Ristl, R, 28, 59
Rosenblum, M, 26, 104
Rosner, G, 31, 90
Roy, S, 28, 62
Ruberg, S, 25, 83
Saddiki, H, 41, 48
Saegusa, T, 33, 113
Sahli, L, 31, 77
Sapp, J, 36, 107
Saville, B, 38, 91
Schindler, J, 34, 63
Schliep, E, 30, 61
Schroeder, J, 36, 64
Schwartz, A, 39, 70
Schweinberger, M, 39, 106
Scrudato, K, 41, 110
Seetharaman, I, 30, 49
Sen, K, 30, 59
Sengupta, S, 37, 66
Seshan, V, 25, 71
Severini, T, 42, 85
Shamsaei, B, 36, 107
Shao, J, 29, 93
Shao, Q, 35, 82
Shao, Y, 42, 83
She, Y, 27, 48
Shen, J, 28, 61
Shen, L, 25, 25, 38, 84, 84, 95
Shen, R, 25, 71
Shen, W, 34, 57, 106
Shen, Y, 35, 35, 85, 85
Shi, M, 36, 64
Shi, X, 28, 55
Shih, M, 31, 68
Shih, T, 42, 85
Shih, W, 31, 86
Shih, WJ, 28, 61
Shim, Y, 36, 107
Shou, H, 24, 46
Sidor, L, 27, 44
Siegmund, K, 29, 80
Simon, N, 40, 100
Sinha, R, 35, 85
Siska, C, 40, 87
Song, C, 32, 95
Song, P, 41, 60
Song, R, 24, 54
Soon, G, 26, 98
Stein, M, 39, 70
Stingo, F, 33, 35, 37, 40, 51, 74, 99, 100
Storlie, C, 25, 69
Storlie, CB, 25, 69
Stranger, B, 30, 49
Street, RC, 29, 79
Stroup, W, 27, 45
Su, H, 40, 41, 110, 110
Su, S, 42, 74
Su, X, 30, 33, 53, 59
Suchard, M, 37, 67
Sudduth, K, 39, 71
Sullivan, P, 29, 79
Sun, D, 25, 33, 51, 73
Sun, J, 29, 32, 87, 101
Sun, W, 31, 37, 39, 40, 74, 75, 88, 96
Sun, X, 43, 111
Swartz, M, 33, 51
Szulwach, KE, 29, 79

Tan, F, 26, 35, 88, 90
Tan, H, 33, 112
Tan, L, 26, 97
Tan, M, 31, 89
Tang, H, 38, 79
Tang, X, 42, 85
Tanna, A, 32, 104
Tarpey, T, 29, 71
Tayob, N, 38, 78
Teufel, A, 35, 98
Tian, L, 26, 38, 94, 96
Ting, N, 34, 36, 63, 108
Tiwari, R, 43, 111
Tong, X, 38, 96
Trippa, L, 31, 42, 65, 68
Tsai, H, 32, 93
Tseng, GC, 30, 101
Tsiatis, A, 26, 104
Tsou, H, 27, 39, 76, 109
Tu, I, 28, 28, 64, 65
Tu, W, 29, 94
Tzeng, C, 27, 39, 76, 109
Tzeng, J, 28, 40, 65, 88
Valdes, C, 33, 33, 51, 112
van Dyk, DV, 31, 74
Vandergrift, N, 36, 58
Vannucci, M, 33, 37, 51, 74
Verde, F, 36, 64
Volgushev, S, 32, 100
Vu, V, 39, 70
Wahed, A, 26, 104
Wan, Y, 32, 103
Wang, B, 25, 90
Wang, C, 28, 30, 31, 41, 49, 60, 65, 90
Wang, D, 26, 90
Wang, F, 40, 109
Wang, H, 33, 33, 35, 39, 54, 88, 105, 112
Wang, J, 29, 29, 30, 38, 49, 71, 72, 92
Wang, L, 24, 30, 37, 38, 43, 46, 68, 70, 92, 96, 105
Wang, M, 24, 34, 36, 54, 63, 64
Wang, N, 35, 91
Wang, P, 34, 81
Wang, R, 41, 48
Wang, S, 28, 29, 35, 39, 58, 66, 67, 88, 93
Wang, W, 37, 40, 43, 68, 82, 105
Wang, X, 25, 29, 29, 30, 38, 42, 72, 73, 89, 95, 99, 101
Wang, Y, 30, 31, 39, 41, 52, 61, 66, 86
Warren, J, 32, 104
Weerahandi, S, 30, 101
Wei, LJ, 38, 94
Wei, Y, 36, 64
Wei, Z, 33, 51
Weller, Z, 33, 112
Weng, Y, 30, 59
Whitmore, GA, 37, 69
Wiens, B, 42, 73
Wikle, C, 39, 71
Wittes, J, 40, 83
Wo, H, 32, 111
Wong, R, 31, 74
Wong, RKW, 25, 69
Wu, C, 28, 55
Wu, H, 29, 73, 79, 80
Wu, J, 31, 78
Wu, M, 32, 40, 87, 95
Wu, W, 28, 56
Wu, Y, 37, 41, 74, 110
Xi, D, 30, 59
Xi, Y, 42, 62
Xiao, R, 28, 62
Xie, J, 35, 82
Xie, M, 26, 97
Xu, C, 43, 105
Xu, G, 39, 67
Xu, J, 42, 73
Xu, N, 42, 83
Xu, R, 37, 69
Xu, T, 29, 79
Xu, Y, 25, 37, 39, 42, 65, 66, 76, 84
Xu, Z, 29, 79
Xue, L, 34, 81
Yan, J, 37, 68
Yang, C, 29, 41, 47, 94
Yang, H, 35, 98
Yang, J, 34, 81
Yang, W, 39, 71
Yang, Y, 34, 81
Yao, B, 29, 79
Yao, Q, 43, 105
Yau, CY, 37, 70


Yavuz, I, 26, 104
Yee, L, 42, 85
Yi, H, 32, 111
Yi, M, 33, 53
Yin, Y, 34, 63
Yu, C, 31, 78
Yu, D, 24, 46
Yu, H, 33, 113
Yu, L, 35, 98
Yu, M, 29, 93
Yu, T, 38, 79
Yu, Z, 29, 94
Yuan, M, 40, 102
Yuan, Y, 26, 34, 40, 57, 100, 109
Zang, Y, 26, 40, 100, 109
Zavala, N, 41, 47
Zeng, D, 29, 34, 80, 87
Zhan, X, 30, 49
Zhang, B, 29, 86
Zhang, C, 24, 46
Zhang, D, 42, 85
Zhang, F, 29, 79
Zhang, G, 29, 32, 43, 79, 101, 111
Zhang, H, 27, 32, 54, 73, 95
Zhang, J, 28, 34, 42, 61, 63, 80
Zhang, L, 33, 45
Zhang, M, 29, 86
Zhang, N, 31, 41, 47, 75
Zhang, P, 29, 89
Zhang, R, 32, 111
Zhang, S, 41, 42, 62, 62
Zhang, T, 32, 101
Zhang, X, 29, 32, 38, 71, 92, 101
Zhang, Y, 26, 30, 37, 40, 41, 43, 51, 67, 101, 102–104, 111
Zhang, Z, 26, 34, 37, 57, 68, 98
Zhao, A, 38, 96
Zhao, H, 36, 37, 40, 41, 47, 64, 67, 102
Zhao, J, 35, 98
Zhao, M, 29, 79
Zhao, N, 32, 40, 87, 95
Zhao, S, 35, 82
Zhao, X, 42, 99
Zhao, Y, 32, 32, 33, 53, 111, 111
Zhao, YC, 26, 35, 90, 98
Zhao, YQ, 29, 36, 64, 87
Zheng, Q, 39, 67
Zheng, T, 26, 97
Zheng, W, 24, 26, 54, 90
Zheng, Y, 38, 77
Zheng, Z, 27, 48
Zhong, J, 25, 71
Zhong, P, 26, 32, 35, 88, 90, 93
Zhong, S, 41, 56
Zhong, W, 31, 75
Zhou, H, 24, 27, 37, 43, 46, 48, 74, 110
Zhou, L, 38, 92
Zhou, M, 42, 83
Zhou, Q, 29, 87
Zhou, W, 24, 24, 35, 46, 47, 47, 82
Zhou, X, 30, 49
Zhou, Y, 25, 43, 73, 111
Zhu, L, 73
Zhu, H, 24, 29, 30, 54, 68, 72
Zhu, J, 28, 40, 57, 102
Zhu, R, 36, 64
Zhu, Y, 36, 37, 66, 107
Zhu, Z, 25, 73
Zhuo, B, 41, 56
Zimmerman, D, 25, 72
Zipunnikov, V, 24, 46
Zou, F, 37, 74
Zou, H, 34, 81
