
Common problems with data

What to look out for

Esther Hughes & Becky Seeley

MBA data team and Marine Environmental Data Information Network

Location, Location, Location…

• Plot station and sample coordinates. Do they make sense?

• Are they projected in the correct CRS?
• Are there any missing? Why? Typos?
• Points outside station: check the number of decimal places
• East-West confusion
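The decimal-place check can be sketched in a few lines of Python. This is only an illustration: the function names and the threshold of four decimal places are assumptions, not part of the original slides. It works on the coordinate as typed (a string), because converting to a number first would hide trailing zeros.

```python
def decimal_places(coordinate_text: str) -> int:
    """Count the digits after the decimal point in a coordinate as it was typed."""
    text = coordinate_text.strip()
    if "." not in text:
        return 0
    return len(text.split(".", 1)[1])


def flag_low_precision(coords: dict, minimum: int = 4) -> list:
    """Return the names of coordinates with fewer decimal places than `minimum`.

    `minimum` is a hypothetical threshold; choose one that suits the survey.
    """
    return [name for name, text in coords.items() if decimal_places(text) < minimum]


# Hypothetical transect start and end, typed with different precision.
print(flag_low_precision({"transect start": "52.85", "transect end": "52.855864"}))
```

A start point recorded to two decimal places and an end point recorded to six would be flagged here for the start point only.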

Geographical checks

• GIS
• www.StreetMap.co.uk (OS map)
• www.Gridreferencefinder.com

[Figure: map with legend entries Transect, Station, Stills]

Decimal places

Number of decimal places corrected for transect start and transect end

Transect within station

Bad conversions

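• Degrees and decimal minutes versus degrees, minutes and seconds
• Degrees and decimal minutes to decimal degrees: D + (M/60)
• Degrees, minutes and seconds to decimal degrees: D + (M/60) + (S/3600)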
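The two formulas above can be written as a short Python sketch. The function names and the hemisphere argument are assumptions added for illustration. Degrees, minutes and seconds are passed as unsigned values, and the hemisphere letter sets the sign.

```python
def _sign(hemisphere: str) -> float:
    """South and West are negative in decimal degrees; North and East are positive."""
    return -1.0 if hemisphere.strip().upper() in ("S", "W") else 1.0


def dm_to_decimal(degrees: float, minutes: float, hemisphere: str) -> float:
    """Degrees and decimal minutes to decimal degrees: D + M/60."""
    return _sign(hemisphere) * (degrees + minutes / 60)


def dms_to_decimal(degrees: float, minutes: float, seconds: float, hemisphere: str) -> float:
    """Degrees, minutes and seconds to decimal degrees: D + M/60 + S/3600."""
    return _sign(hemisphere) * (degrees + minutes / 60 + seconds / 3600)


# The same digits read two ways give different positions.
print(dm_to_decimal(52, 30.5, "N"))     # 52 degrees 30.5 minutes
print(dms_to_decimal(52, 30, 5, "N"))   # 52 degrees 30 minutes 5 seconds
```

In this example the two readings differ by about 0.007 degrees of latitude, which is why the format should be confirmed before converting.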

East West example

Positive longitudes (east of the prime meridian):

Sample   Latitude   Longitude
Grab 1   52.855864  4.4192505
Grab 2   52.879076  4.3478394
Grab 3   52.829321  4.4000244
Grab 4   52.811063  4.3890381
Grab 5   52.794458  4.3890381
Grab 6   52.782831  4.3807983
Grab 7   52.857523  4.3203735
Grab 8   52.83098   4.3148804
Grab 9   52.821023  4.2929077
Grab 10  52.900619  4.2681885
Grab 11  52.884049  4.2572021
Grab 12  52.865814  4.2352295
Grab 13  52.845912  4.2242432

Negative longitudes (west of the prime meridian):

Sample   Latitude   Longitude
Grab 1   52.855864  -4.4192505
Grab 2   52.879076  -4.3478394
Grab 3   52.829321  -4.4000244
Grab 4   52.811063  -4.3890381
Grab 5   52.794458  -4.3890381
Grab 6   52.782831  -4.3807983
Grab 7   52.857523  -4.3203735
Grab 8   52.83098   -4.3148804
Grab 9   52.821023  -4.2929077
Grab 10  52.900619  -4.2681885
Grab 11  52.884049  -4.2572021
Grab 12  52.865814  -4.2352295
Grab 13  52.845912  -4.2242432
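One way to catch a missing minus sign is to test each point against the expected survey area and see whether reversing the longitude sign would bring it inside. The sketch below is illustrative only: the bounding box values and function names are assumptions, not taken from the slides.

```python
# Hypothetical survey area as (south, west, north, east) in decimal degrees.
SURVEY_BOX = (52.5, -5.0, 53.0, -4.0)


def inside(lat: float, lon: float, box: tuple) -> bool:
    """True if the point lies within the bounding box."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east


def possible_sign_flips(records: list, box: tuple) -> list:
    """Return names of points outside the box that fall inside once longitude is negated."""
    flagged = []
    for name, lat, lon in records:
        if not inside(lat, lon, box) and inside(lat, -lon, box):
            flagged.append(name)
    return flagged


grabs = [
    ("Grab 1", 52.855864, 4.4192505),    # sign missing
    ("Grab 2", 52.879076, -4.3478394),   # sign present
]
print(possible_sign_flips(grabs, SURVEY_BOX))
```

A flagged point is only a candidate for correction; the original field record should still be checked.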

Unhelpful computing mistakes

• Series/copying errors
• Transposition errors
• Mixed abundance types, e.g. presence/absence and counts

ESPECIALLY in matrices!

Copying/series errors

Survey Name    Date        Sample Ref  Replicate Ref  Method         Start time  End time  Duration
Exciting Rock  08/03/2012  WS_C_01     1              Camera sledge  13:29:00    13:38:19  00:09:19
Exciting Rock  09/03/2012  WS_C_03     2              Camera sledge  09:52:16    10:03:54  00:11:38
Exciting Rock  10/03/2012  WS_C_08     3              Camera sledge  11:33:57    11:42:16  00:08:19
Exciting Rock  11/03/2012  WS_C_08     4              Camera sledge  11:42:16    11:44:27  00:02:11
Exciting Rock  12/03/2012  WS_C_08     5              Camera sledge  11:44:27    11:47:17  00:02:50
Exciting Rock  13/03/2012  WS_C_10     6              Camera sledge  22:41:05    22:50:37  00:09:32

Habitat, in the same row order:

1. Rippled sand with infaunal polychaetes and burrows of macrofauna
2. Gravel with pebbles, cobbles and sand with hydroids, bryozoans and mobile fauna
3. Gravel with pebbles, cobbles and sand with hydroids
4. Sparse fauna on deep circalittoral sand with very few burrows and tubes
5. Gravel with pebbles, cobbles and sand with hydroids
6. Gravelly sand with occasional cobbles and pebbles with burrowing and encrusting fauna
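Two simple checks can hint at a spreadsheet fill-series or copy error in a table like the one above. Both are heuristics sketched under assumptions (the function names are hypothetical), and a survey that really did run on consecutive days would also trigger the first one.

```python
from datetime import date, timedelta


def is_fill_series(dates: list) -> bool:
    """True if there are at least three dates and each is exactly one day after the previous."""
    if len(dates) < 3:
        return False
    return all(later - earlier == timedelta(days=1)
               for earlier, later in zip(dates, dates[1:]))


def refs_with_multiple_dates(rows: list) -> list:
    """Return sample refs that appear with more than one date."""
    dates_by_ref = {}
    for sample_ref, sample_date in rows:
        dates_by_ref.setdefault(sample_ref, set()).add(sample_date)
    return sorted(ref for ref, dates in dates_by_ref.items() if len(dates) > 1)


# Sample Ref and Date values from the example table.
rows = [
    ("WS_C_01", date(2012, 3, 8)),
    ("WS_C_03", date(2012, 3, 9)),
    ("WS_C_08", date(2012, 3, 10)),
    ("WS_C_08", date(2012, 3, 11)),
    ("WS_C_08", date(2012, 3, 12)),
    ("WS_C_10", date(2012, 3, 13)),
]
print(is_fill_series([d for _, d in rows]))
print(refs_with_multiple_dates(rows))
```

Here the dates step forward by exactly one day per row, and WS_C_08 appears on three different dates even though its start and end times run back to back, so both checks point at the Date column.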

Abundance Type Mix up!

Sample Ref  Species                Abundance Type  Abundance
WS_C_01     Porifera               SACFOR          R
WS_C_01     Ciocalypta penicillus  SACFOR          R
WS_C_01     Hydrozoa               SACFOR          O
WS_C_01     Actiniaria             SACFOR          R
WS_C_01     Nephtyidae             SACFOR          R
WS_C_01     Sabellidae             SACFOR          P          ← Presence/Absence
WS_C_01     Serpulidae             SACFOR          R
WS_C_01     Paguridae              SACFOR          O
WS_C_01     Flustra foliacea       SACFOR          P          ← Presence/Absence
WS_C_01     Asterias rubens        SACFOR          R
WS_C_01     Ophiura                SACFOR          C
WS_C_01     Callionymus            SACFOR          R
WS_C_02     Hydrozoa               SACFOR          R
WS_C_02     Diphasia               SACFOR          F
WS_C_02     Caryophylliidae        SACFOR          1          ← Count
WS_C_02     Serpulidae             SACFOR          C
WS_C_02     Paguridae              SACFOR          O
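A mixed abundance column like the one above can be checked by comparing each value with the codes allowed for its stated type. The sketch below assumes that only the six SACFOR letters (Superabundant, Abundant, Common, Frequent, Occasional, Rare) are valid; the function name is hypothetical, and a dataset that uses extra codes would need a longer list.

```python
# Assumption: only these six letters are valid for the SACFOR abundance type.
SACFOR_CODES = {"S", "A", "C", "F", "O", "R"}


def invalid_sacfor(rows: list) -> list:
    """Return rows labelled SACFOR whose abundance value is not a SACFOR letter.

    Each row is (sample_ref, species, abundance_type, abundance).
    """
    return [
        row for row in rows
        if row[2] == "SACFOR" and row[3].strip().upper() not in SACFOR_CODES
    ]


# A few rows from the example table.
rows = [
    ("WS_C_01", "Porifera", "SACFOR", "R"),
    ("WS_C_01", "Sabellidae", "SACFOR", "P"),
    ("WS_C_02", "Caryophylliidae", "SACFOR", "1"),
    ("WS_C_02", "Serpulidae", "SACFOR", "C"),
]
for row in invalid_sacfor(rows):
    print(row)
```

The flagged rows still need a decision from someone who knows the survey: either the abundance type or the value has to change.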

Missing zeros and ‘not founds’

Sample Ref  Species                Abundance Type  Abundance
WS_C_01     Porifera               Count           2
WS_C_01     Ciocalypta penicillus  Count           -          ← 0
WS_C_01     Hydrozoa               Count           16
WS_C_01     Actiniaria             Count           1
WS_C_01     Nephtyidae             Count           -          ← 0
WS_C_01     Sabellidae             Count           3
WS_C_01     Serpulidae             Count           2
WS_C_01     Paguridae              Count           1
WS_C_01     Flustra foliacea       Count           5
WS_C_01     Asterias rubens        Count           1
WS_C_01     Ophiura                Count           2
WS_C_01     Callionymus            Count           8
WS_C_02     Polymastia             Count           -          ← 0
WS_C_02     Diphasia               Count           15
WS_C_02     Caryophylliidae        Count           11
WS_C_02     Serpulidae             Count           3
WS_C_02     Paguridae              Count           1

Not seen? OR searched for and not found?
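When loading counts like these, it helps to keep "no value recorded" separate from a true zero until the data provider has confirmed which was meant. A minimal sketch, assuming a dash or blank cell is the only placeholder in use (the function names are hypothetical):

```python
from typing import Optional


def parse_count(cell: str) -> Optional[int]:
    """Return the count as an integer, or None when the cell is blank or a dash.

    None means "unresolved": it is deliberately not converted to 0.
    """
    text = cell.strip()
    if text in ("", "-"):
        return None
    return int(text)


def unresolved(rows: list) -> list:
    """Return (sample_ref, species) pairs whose count still needs confirming."""
    return [(ref, species) for ref, species, cell in rows if parse_count(cell) is None]


# A few rows from the example table.
rows = [
    ("WS_C_01", "Porifera", "2"),
    ("WS_C_01", "Ciocalypta penicillus", "-"),
    ("WS_C_02", "Polymastia", "-"),
    ("WS_C_02", "Diphasia", "15"),
]
print(unresolved(rows))
```

Only once the provider confirms "searched for and not found" should the unresolved entries be stored as 0.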

Column headers and code definitions

Transposition errors

Taxonomic errors

• Typos!

• Misidentification

• Common name associated with a number of different scientific names e.g. Sea Oak: Phycodrys rubens or Halidrys siliquosa

• Names change all the time! The older the data the more work there is!

• More work on controlled vocabs and taxonomic errors tomorrow.
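The common-name problem can be illustrated with a small lookup that refuses to guess when a name maps to more than one species. The lookup table holds only the Sea Oak example from the list above; the function names are hypothetical, and a real check would use a maintained taxonomic register rather than a hand-written dictionary.

```python
# Lookup containing only the example given above.
COMMON_NAMES = {
    "sea oak": ["Phycodrys rubens", "Halidrys siliquosa"],
}


def candidates_for(common_name: str, lookup: dict = COMMON_NAMES) -> list:
    """Return every scientific name recorded for a common name (may be empty)."""
    return list(lookup.get(common_name.strip().lower(), []))


def is_ambiguous(common_name: str, lookup: dict = COMMON_NAMES) -> bool:
    """True when a common name maps to more than one scientific name."""
    return len(candidates_for(common_name, lookup)) > 1


print(is_ambiguous("Sea Oak"))
print(candidates_for("Sea Oak"))
```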

Sponge Fish

Other common missing data

• Depths
• Biotope info, e.g. determination date
• Times (SOL, EOL)
• Unique references, e.g. sample refs
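A completeness check for fields like these can be very small. In the sketch below the field names are hypothetical and would need to match the actual column headers. A depth of 0 is treated as a recorded value, not as missing.

```python
# Hypothetical field names; adjust to the dataset's own column headers.
REQUIRED_FIELDS = ("sample_ref", "depth", "start_time", "end_time")


def missing_fields(record: dict, required: tuple = REQUIRED_FIELDS) -> list:
    """Return the required fields that are absent, blank, or a dash placeholder."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or str(value).strip() in ("", "-"):
            missing.append(field)
    return missing


record = {"sample_ref": "WS_C_01", "depth": 0, "start_time": "13:29:00", "end_time": ""}
print(missing_fields(record))
```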

Missing report information

• Surveyor names (full list needed)
• Parts of methodology
• CRS used
• Sampling equipment
• Sieve mesh size
• Grab/core size, type, name
• Camera model etc.
• Media formats (e.g. .jpeg, .avi, .tif)