Structuring Data to Facilitate Analysis
Jerry J. Vaske
Jay Beaman
Colorado State University
Warner College of Natural Resources
Human Dimensions of Natural Resources
Fort Collins, CO
Workshop at the 2008 Pathways to Success Conference: Integrating Human Dimensions into Fish & Wildlife Mgmt.
Workshop Foundation
Workshop Objectives
• Illustrate strategy for:
– Facilitating analysis of 2006 National Survey of Fishing, Hunting, and Wildlife-Associated Recreation (FHWAR)
– Increasing the usability of FHWAR data for management, planning & policy
• Compare two types of data structures:
– Flat files
– Relational Entities
Traditional Flat File
Rows = Respondents
Columns = Variables
Flat Files – Journal Article Example
Every journal article has:
• One or more authors
• Title
• Journal name
• Specifics about date of publication:
– Year
– Volume number
– Issue number
– Page numbers
• Potentially keywords
Flat File Data Structure for Journal Articles
Potential Issues with Flat Files
• Problem
– Diefenbach et al. (2005) article had 7 co-authors
– 7 columns (variables) necessary to accommodate all authors’ last names
– 19 of 26 articles in flat file had only 1 or 2 authors
– 67% of author fields empty
– If first names included – more empty fields
• Solution – Relational database
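The empty-field problem can be sketched in a few lines of Python. The article IDs and author names below are hypothetical toy data, not the 26 articles from the workshop file; the point is only that the number of author columns is set by the longest author list, so every shorter article leaves cells empty.

```python
# Hypothetical toy data: author last names per article (not the workshop's file).
articles = {
    "A1": ["Smith"],
    "A2": ["Jones", "Lee"],
    "A3": ["Garcia", "Kim", "Patel", "Nguyen", "Brown", "Davis", "Miller"],
    "A4": ["Wilson", "Clark"],
}

# A flat file needs one author column per slot, sized for the longest author list.
n_columns = max(len(authors) for authors in articles.values())

# Pad every article's author list with empty strings up to n_columns.
flat_rows = {
    article_id: authors + [""] * (n_columns - len(authors))
    for article_id, authors in articles.items()
}

total_cells = n_columns * len(flat_rows)
empty_cells = sum(row.count("") for row in flat_rows.values())
empty_share = empty_cells / total_cells
print(f"{n_columns} author columns, {empty_cells} of {total_cells} cells empty ({empty_share:.0%})")
```

Adding a first-name column per author would double the column count and the number of empty cells along with it.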
Relational Databases
• Definition
– Set of tables containing data for predefined categories
– Data stored in separate files (tables) that are linked
• Terminology
– Table = Entity (E)
– Rows (tuples) in table = information about an object (e.g., journal article or respondent)
– Columns (attributes) = variables
– Two types of relations (R)
1. Set of tuples – a table with attributes (these R’s store data)
2. Algebraic (Person ID in Table A = Person ID in Table B) (these relations use data stored in entities)
Relational Data Structure for Journal Articles
• Article Entity: Article ID, Journal ID, Article title, Year, Issue, Pages
• Journal Entity: Journal ID, Journal name, Publisher info
• Author Entity: Author ID, Last, First name, Contact info
• Keyword Entity: Keyword ID (attitudes, norms)
• (R1) Relational Table: Article ID, Author ID
• (R2) Relation: Journal ID
• (R3) Relational Table: Article ID, Keyword ID
Comparison Flat File vs. Relational Database
Flat file
Relational Database
R1 = table of multiple authors (AuthorID) linked to a given article (ArticleID)

| ArticleID | AuthorID |
|---|---|
| 2059 | 314 |
| 2059 | 59 |
| 2059 | 233 |

Author entity and R1 (author–article relation) can have any number of rows, so all authors of an article can be identified
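As a minimal sketch of how R1 recovers all authors of one article, the example below builds the three tables in an in-memory SQLite database using Python's standard `sqlite3` module. The ArticleID and AuthorID values come from the slide; the table names, column names, article title, and author names are placeholders assumed for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author  (author_id  INTEGER PRIMARY KEY, last_name TEXT);
    CREATE TABLE article (article_id INTEGER PRIMARY KEY, title TEXT);
    -- R1: one row per (article, author) pair, so an article can have any number of authors
    CREATE TABLE article_author (
        article_id INTEGER REFERENCES article(article_id),
        author_id  INTEGER REFERENCES author(author_id),
        PRIMARY KEY (article_id, author_id)
    );
""")

# IDs are from the slide; the title and names are placeholders.
conn.execute("INSERT INTO article VALUES (2059, 'Placeholder title')")
conn.executemany("INSERT INTO author VALUES (?, ?)",
                 [(314, "Author A"), (59, "Author B"), (233, "Author C")])
conn.executemany("INSERT INTO article_author VALUES (?, ?)",
                 [(2059, 314), (2059, 59), (2059, 233)])

# Algebraic relation: author_id in R1 = author_id in the author entity.
rows = conn.execute("""
    SELECT a.last_name
    FROM article_author AS r1
    JOIN author AS a ON a.author_id = r1.author_id
    WHERE r1.article_id = ?
    ORDER BY a.author_id
""", (2059,)).fetchall()
authors = [name for (name,) in rows]
print(authors)
```

An article with seven co-authors simply contributes seven rows to `article_author`; no other article gains empty columns as a result.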
FHWAR Flat File Example
• Fishing, Hunting & Wildlife-Associated Recreation (FHWAR)
• National Survey
– Conducted about every 5 years
– 1955 – first survey
– 2006 – most recent survey
• Data on hunters, anglers, wildlife watchers:
– Sportsperson expenditures
– Species sought in different states
• Data collection costs (1991–2006): $55 million (in 2008 $)
• 1991–2006 data comparable within limits but not integrated
2006 – FHWAR Flat File Data
Data distributed on CD containing 3 ASCII text files:
1. Screening data
2. Sportsperson (hunting & fishing) data
3. Wildlife Watcher data

| Data file | # of Records | # of Variables |
|---|---|---|
| Screening | 144,509 | 56 |
| Sportsperson | 21,942 | 3,765 |
| Wildlife Watcher | 11,285 | 772 |
FHWAR Flat File – Analytical Issues
• Important issues
– 4,500+ vars with obscure variable names (e.g., NCU_STD1)
– 200 pages of documentation
– Census conversion programs do not create variable labels or value labels
• Major issues
– Data compression
– Conceptual complexity
Analytical Issues Affecting Use
• Data compression
– No hunters hunt in all 50 states (at most 8 in 2006 data)
– To avoid numerous empty cells, data are compressed (e.g., the values for 3 vars are combined into a single var)
– For example: “days” of participation is combined with an “activity” (e.g., big game or small game hunting) in a given “state” (in the order states are mentioned)
– Compressed vars cannot be directly analyzed by SAS or SPSS
• Conceptual complexity
– When uncompressed to blocks of 50 states ≈ 20,000 variables
– Difficult to visualize analysis strategy
– Flat FHWAR files hide data structure
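The idea of decompressing slot-based variables into one row per person and state can be sketched as follows. This is an illustration under stated assumptions: the field names (`state_1`, `days_1`, ...) and the record are hypothetical and do not match actual FHWAR variable names; only the limit of 8 state slots comes from the slide.

```python
# Up to 8 "slots" per respondent, filled in the order states were mentioned.
MAX_SLOTS = 8

def decompress(person_id, record):
    """Return one long-format row per non-empty slot with a positive response."""
    rows = []
    for k in range(1, MAX_SLOTS + 1):
        state = record.get(f"state_{k}")
        if not state:
            continue  # empty slot: nothing to output for this position
        days = record.get(f"days_{k}", 0)
        if days > 0:
            rows.append({
                "person_id": person_id,
                "activity_location": state,
                "response_unit": "days",
                "response": days,
            })
    return rows

# Hypothetical respondent who hunted in two states.
record = {"state_1": "CO", "days_1": 5, "state_2": "WY", "days_2": 2}
long_rows = decompress(101, record)
for row in long_rows:
    print(row)
```

In the long format the state is an ordinary value in a column, so an analysis can filter or group on it directly instead of decoding slot positions first.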
Relational File Structure Illustration

| Entity | Based on flat file |
|---|---|
| 1. PERSON | Screening |
| 2. SPORTSPERSON | Sportsperson |
| 3. HUNTING_ACTIVITY | Sportsperson |
| 4. TRIP_EXPENDITURES | Sportsperson |

Four entities ≈ half of the 2006 FHWAR flat file data
PERSON Entity
• 6 control variables (e.g., Person_Weight)
• 10 demographic variables (e.g., Age, Sex)
• 8 hunting variables (e.g., Hunted_2005)
• 8 fishing variables (e.g., Fished_2005)
• 6 residential wildlife watching variables (e.g., Home_Observe_2005)
• 5 non-residential wildlife watching variables (e.g., Trip_Watch_2005)
SPORTSPERSON Entity
• 6 control variables (e.g., Person_ID, Sportsperson_Weight)
• 11 demographic variables (e.g., Age, Sex)
• 15 national summary variables (e.g., Hunted_2006)
PERSON variables in SPORTSPERSON could be “obtained” from PERSON but are also included in SPORTSPERSON to simplify analyses
TRIP_EXPENDITURES Entity
TRIP_EXPENDITURES entity reduces 844 compressed vars to 10 vars

| Variable | Description |
|---|---|
| Person_ID | Unique person ID |
| Sportsperson_Weight | Sportsperson weight |
| Spender | Person in TRIP_EXPENDITURES entity |
| State_of_Residence | State of residence |
| Location_Trip_Spending | State of spending |
| Spend_State_of_Residence | Expenditure in state of residence |
| Fish_Hunt | Fishing or hunting expenditure |
| Fish_Hunt_Type | Fishing or hunting type |
| Trip_Expend_Category | Sportsperson expenditure categories |
| Dollars | Amount spent in dollars |
HUNTING_ACTIVITY Entity

| Variable | Description |
|---|---|
| Person_ID | Unique person ID |
| Sportsperson_Weight | Sportsperson weight |
| Hunter | Person in HUNTING_ACTIVITY entity |
| Table_Cell_Description | Description of tables (e.g., relation to FH3 variables) |
| Sub_Table_ID | Sportsperson sub-table identifier |
| State_of_Residence | State of residence (start Wave 3) |
| In_State_of_Residence | Participation in state of residence |
| Activity_Location | Geographic location for activity (USA or a State) |
| Private_Public | Activity on private or any public land |
| Fish_Hunt_Type | Fishing or hunting type |
| Response_Unit | Participation = 1, Days = 2, Trips = 3 |
| Response | Participation (1 = Yes), # of days, or # of trips |

HUNTING_ACTIVITY entity reduces 840 compressed vars to 12 vars
Variable: Sub_Table_ID
HUNTING_ACTIVITY entity
A collection of state-level sub-tables to facilitate analysis
Visualizing the 4 Entities
• Person (Screening data)
– Control variables: Person ID, Person Weight
– Demographics (11)
– Hunting (8): Hunted Ever, Hunt Intentions
– Fishing (8)
– Wildlife Watching: Residential (6), Trips (5)
• Sportsperson (Sportsperson data)
– Control variables: Person ID, Sportsperson Weight
– Demographics (11)
– National summary “species” variables (15): Hunted 2006, Big game hunted, Days big game hunted, Trips hunting big game
• Hunting Activity (Sportsperson data)
– Person ID, Sportsperson Weight, Sub Table ID, Response Unit, Response
• Trip Expenditure (Sportsperson data)
– Person ID, Sportsperson Weight, Spending categories, Dollars
Summary
• About 1,750 flat file variables reduced to < 60
• Obscure variable names replaced with intuitive names
• Compressed flat file variables cannot be directly used in SPSS or SAS
– Variables in relational entities can be used directly in analysis
• Details in Beaman & Vaske (2008)
Entity Data Files

| Entity | SAS filename | SPSS filename |
|---|---|---|
| PERSON | Person.sas7bdat | Person.sav |
| SPORTSPERSON | Sportsperson.sas7bdat | Sportsperson.sav |
| HUNTING_ACTIVITY | Hunting_Activity.sas7bdat | Hunting_Activity.sav |
| TRIP_EXPENDITURES | Trip_Expenditures.sas7bdat | Trip_Expenditures.sav |

(http://welcome.warnercnr.colostate.edu/~jerryv/)
To simplify analyses, 2 additional entities:
– Hunting_Activity_and_Demographics
– Trip_Expenditures_and_Demographics
SAS Code & SPSS Syntax

| Figure number | SAS code | SPSS syntax |
|---|---|---|
| Figure 3 | | Figure_3_Syntax.sps |
| Figure 4 | Figure_4.sas | Figure_4_Syntax.sps |
| Figure 5 | | Figure_5_Syntax.sps |
| Figure 6 | Figure_6.sas | |
| Figure 7 | | Figure_7_Syntax.sps |

Figure numbers based on Beaman & Vaske (2008)
http://welcome.warnercnr.colostate.edu/~jerryv/
Example – Hypothesis
Average days of elk hunting varies between Colorado and Wyoming and by hunter’s sex
Flat File to Entity for Hypothesis

    Data FHWAR6.Hunt_BGspecies_States ;
      Length Person_ID 5 Sportsperson_Weight 4 Sex 3
             State_of_Residence Activity_Location Fish_Hunt_Type
             Response_Unit Response 4 ;
      Set FHWAR6.fh3 (rename = (sex = xsex)) ;
      Keep Person_ID Sportsperson_Weight Sex State_of_Residence In_State
           Response Activity_Location Fish_Hunt_Type Response_Unit ;
      Person_ID = PersonID ;
      Sportsperson_Weight = spwgt ;
      Sex = Xsex ;
      State_of_Residence = put (resstate, $st2num2.) ;

      * Array stores info to identify state when decompressing ;
      Array a1( 2, 8 ) HUNTSTD1-HUNTSTD8 STDAYSHD1-STDAYSHD8 ;

      * Array stores info to associate species with variables ;
      Array gam1( 9 ) g1-g9 ;
      Retain g1 1 g2 2 g3 3 g4 4 g5 5 g6 6 g7 7 g8 40 g9 41 ;
      Array a7( 2, 9, 8 ) bgame1d1--bgdifday9d8 ;

      Do m = 1 To 2 ;
        Do j = 1 To 9 ;
          Do k = 1 To 8 ;
            If a1( 1, k ) = ' ' Then Goto End7 ;
            Fish_Hunt_Type = gam1( j ) ;
            If m = 1 Then Do ; Response_Unit = 1 ; End ;
            Else Do ; Response_Unit = 2 ; End ;
            Response = a7( m, j, k ) ;
            Activity_Location = put( a1( 1, k ), $st2num2. ) ;
            If Activity_Location = State_of_Residence Then In_State = 1 ;
            Else In_State = 0 ;

            * Outputs data for hypothesis ;
            If Response > 0 Then Output ;
          End7: End ;
        End ;
      End ;
    run ;
SAS Entity to SPSS Entity

    Get SAS Data = 'C:\Hunt_BGspecies_States.sas7bdat'.
    * Add value labels.
    Save Outfile = 'C:\Hunt_BGspecies_States.sav'.
Testing Hypothesis with Relational Entity

    * Opens data.
    GET File = 'C:\Hunt_BGspecies_States.sav'.
    * Weights data.
    WEIGHT BY Sportsperson_Weight.
    * CO hunters (8) and WY hunters (56).
    Select if (Activity_Location = 8 or Activity_Location = 56).
    * Elk hunters.
    Select if (Fish_Hunt_Type = 2).
    * Days of participation.
    Select if (Response_Unit = 2).
    * ANOVA.
    UNIANOVA Response BY Sex Activity_Location.

    GET File = 'C:\FHWAR\Hunting_Activity.sav'.
    Select if (Sub_Table_ID = 10).
    WEIGHT BY Sportsperson_Weight.
    Select if (Activity_Location = 8 or Activity_Location = 56).
    Select if (Fish_Hunt_Type = 2).
    Select if (Response_Unit = 2).
    UNIANOVA Response BY Sex Activity_Location.
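The filtering steps in the syntax above can be mirrored in plain Python to show what the long-format entity makes possible. This sketch computes only the weighted cell means of days by sex and state; it is not the ANOVA itself and does not reproduce UNIANOVA output. All rows are made-up illustrative values, and the sex codes (1, 2) are an assumption; the state codes 8 and 56, hunt type 2, and response unit 2 follow the slide.

```python
from collections import defaultdict

# Hypothetical long-format rows:
# (sportsperson_weight, sex, activity_location, fish_hunt_type, response_unit, response)
rows = [
    (100.0, 1, 8, 2, 2, 6),
    (300.0, 1, 8, 2, 2, 2),
    (200.0, 2, 8, 2, 2, 4),
    (100.0, 1, 56, 2, 2, 5),
    (100.0, 2, 56, 2, 2, 3),
    (150.0, 1, 8, 1, 2, 9),   # different hunt type: filtered out
    (150.0, 1, 8, 2, 3, 4),   # trips rather than days: filtered out
    (150.0, 1, 30, 2, 2, 7),  # another state: filtered out
]

def weighted_cell_means(rows, states=(8, 56), hunt_type=2, unit=2):
    """Weighted mean response for each (sex, state) cell after filtering."""
    sums = defaultdict(lambda: [0.0, 0.0])  # (sex, state) -> [sum(w * x), sum(w)]
    for weight, sex, state, ftype, runit, response in rows:
        if state in states and ftype == hunt_type and runit == unit:
            cell = sums[(sex, state)]
            cell[0] += weight * response
            cell[1] += weight
    return {key: wx / w for key, (wx, w) in sums.items()}

means = weighted_cell_means(rows)
for (sex, state), mean_days in sorted(means.items()):
    print(f"sex={sex} state={state} weighted mean days={mean_days:.1f}")
```

Each filter is a comparison on one column, which is the practical benefit of the entity structure: no slot decoding is needed before selecting states, species, or response units.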
Results
Conclusions
• Analyses that are difficult to perform with flat file data are possible with relational structure
• Restructuring all of 2006 FHWAR data as well as data from 1991, 1996, & 2001 would:
– Yield similar analysis capabilities
– Allow for trend analysis
– Create new practical opportunities for state agencies
Practical Opportunity
• State agencies have accurate records of license sales (e.g., hunting only, fishing only, combos)
• With potentially 100s of licenses, permits, & stamps sold, it is not practical to ask about specific licenses in a flat file
• Moving to relational structure for obtaining license data has advantages …
Advantages of Relational License Data
1. Can ask about actual state license sales
– All state license info can be “pre-stored” in one entity
– Size of entity would not impact other data entities
2. Questions about specific license cost not necessary; correct information pre-stored
3. Establishing relationship between state-specific license sales & FHWAR data provides foundation for benchmarking / calibrating meaningful estimates based on FHWAR
From Analysis to Data Collection
• Entity-based models:
– facilitate analyses
– can also enhance data collection
• Currently working with software company Techneos (www.techneos.com) to implement pilot models that yield:
– more consistent data collection
– more accurate data collection
Questions?