DATA AND DATA PROCESSING
Dr. Pudji Lestari, dr Mkes
Public Health Dept
Faculty of Medicine Airlangga University
Data
Data are the pure and simple facts without any
particular structure or organization: the basic atoms
of information.
Information
Information is structured data, which adds meaning to
the data and gives it context and significance.
Knowledge
Knowledge is what we know.
Example: a map. Like a physical map, it helps us know
where things are.
Knowledge is the ability to use information
strategically to achieve one's objectives.
Wisdom
Wisdom is the capacity to choose objectives consistent with
one's values within a larger social context.
Data to Information
The process in statistics
Collect: data come from research, surveys, observation, etc. (raw data)
Process: the purpose of the data processing and the amount and type of data determine the tool used
  < 10 observations: simple, manual tools
  > 30 or > 100 observations: software
Present
Results
Interpretation
Data processing
Quality also depends on the earlier steps: data collection, coding, transfer/entry,
and data cleaning.
It depends on the purpose: what information is wanted.
It is adapted to the type of data, the number of variables, and the
relationships between variables.
The processing tool follows the amount of data.
Processing data
The simplest approach:
Sort the data into an array, from smallest to largest.
With more than 30 or more than 100 observations, use a frequency distribution table.
Processing data
Includes:
Calculating measures of central tendency (mean, median, mode)
Calculating measures of dispersion (standard deviation, variance, quartiles,
percentiles, skewness, kurtosis)
Performing statistical tests
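As a minimal sketch of the measures listed above, the following uses Python's standard `statistics` module. The sample values are invented for illustration and do not come from the slides; the variance and standard deviation shown are the sample versions (dividing by n - 1), which is an assumption about what the lecture intends.

```python
import statistics

# Hypothetical sample values, invented for illustration only.
values = [2, 3, 3, 4, 5, 5, 5, 6, 7, 9]

ordered = sorted(values)  # the "array": smallest to largest

# Measures of central tendency
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Measures of dispersion (sample versions, dividing by n - 1)
variance = statistics.variance(values)
std_dev = statistics.stdev(values)
q1, q2, q3 = statistics.quantiles(values, n=4)  # quartile cut points

print(f"sorted: {ordered}")
print(f"mean={mean} median={median} mode={mode}")
print(f"variance={variance:.2f} sd={std_dev:.2f} quartiles={q1}, {q2}, {q3}")
```

Skewness and kurtosis are not in the standard library; they would need a hand-written formula or a third-party package.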
Presenting data
A combination of text and graphs/tables,
made as clear and informative as possible.
If the data are simple enough, present them in text; if not,
supplement with graphs/tables.
Presenting data
How data are presented depends on:
Purpose
description only, or some further analytic purpose
Audience
academic settings, newspapers, the general public
Guidelines for tables
Besides absolute numbers, use rates/ratios to
give a clearer picture.
The numerator, denominator, and constant must be clearly stated.
A table must be able to explain itself (self-explanatory).
Meaningful, unambiguous, and efficient.
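The three ingredients of a rate named above can be made explicit in a small helper. This is only a sketch: the function name `rate`, the default constant of 1,000, and the case and population figures are assumptions for illustration, not values from the slides.

```python
def rate(numerator: float, denominator: float, constant: int = 1000) -> float:
    """Return numerator / denominator scaled by a constant (e.g. per 1,000)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator * constant


# Hypothetical figures: 25 cases in a population of 5,000.
cases, population = 25, 5000
print(f"{cases} cases = {rate(cases, population):.1f} per 1,000 population")
```

Stating the constant alongside the result ("per 1,000 population") is what keeps the figure unambiguous in a table.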
Types of data
Classification by
measurement scale:
Qualitative
  Binary (dichotomous)
  Ordinal
  Nominal
Quantitative
  Discrete
  Continuous
Level of Measurement / Data Scale
Nominal Level of Measurement
Numbers or other symbols are assigned to a set of categories for the purpose of naming, labeling, or classifying the observations. They do not imply anything about the magnitude of, or quantitative difference between, the categories.
Example: gender.
Ordinal Level of Measurement
Categories are rank-ordered from low to high.
Example: social class status as "upper class", "middle class", or "working class".
A person in the "upper class" category has a higher class
position than a person in the "middle class" category,
but we do not know the magnitude of the
differences between categories.
Interval/Ratio Level of Measurement
If the categories (or values) of a variable can be rank-ordered, and the measurements for all the cases are expressed in the same units, then the variable is at the interval-ratio level of measurement.
Examples: age, income, and SAT scores. We can say how much larger or smaller one value is compared with another.
Scale            Distinguishes  Order  Distance
Nominal          +
Ordinal          +              +
Ratio/Interval   +              +      +
Summary of categorical data
We can obtain frequencies of categorical data
and summarize them in a table or graph.
Example: we have 21 agents of parasitic
diseases isolated from children.
Giardia lamblia
Entamoeba histolytica
Ascaris lumbricoides
Enterobius vermicularis
Ascaris lumbricoides
Enterobius vermicularis
Giardia lamblia
Giardia lamblia
Entamoeba histolytica
Ascaris lumbricoides
Enterobius vermicularis
Ascaris lumbricoides
Enterobius vermicularis
Giardia lamblia
Giardia lamblia
Entamoeba histolytica
Ascaris lumbricoides
Enterobius vermicularis
Ascaris lumbricoides
Enterobius vermicularis
Giardia lamblia
Summary of categorical data
The list of parasites detected gives us an idea of the
frequency of each parasite, but it is not clear.
If we order the list, the picture becomes clearer.
Giardia lamblia
Giardia lamblia
Giardia lamblia
Giardia lamblia
Giardia lamblia
Giardia lamblia
Ascaris lumbricoides
Ascaris lumbricoides
Ascaris lumbricoides
Ascaris lumbricoides
Ascaris lumbricoides
Ascaris lumbricoides
Enterobius vermicularis
Enterobius vermicularis
Enterobius vermicularis
Enterobius vermicularis
Enterobius vermicularis
Enterobius vermicularis
Entamoeba histolytica
Entamoeba histolytica
Entamoeba histolytica
Summary of categorical data
We can show the results in a frequency distribution.
Parasite n
Giardia lamblia 6
Ascaris lumbricoides 6
Enterobius vermicularis 6
Entamoeba histolytica 3
Total 21
Frequency distribution of intestinal parasites detected in children from CAISES Celaya, n=21
Source: Laboratory report
Summary of categorical data
It is useful to show the frequency of each category expressed as a percentage of the total frequency.
This is called the distribution of relative frequencies.
Parasite                  n       %
Giardia lamblia           6   28.57
Ascaris lumbricoides      6   28.57
Enterobius vermicularis   6   28.57
Entamoeba histolytica     3   14.29
Total                    21  100.00
Frequency distribution of intestinal parasites detected in children from CAISES Celaya, n=21
Source: Laboratory report
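The frequency and relative frequency columns can be derived from the raw list of isolates. The sketch below uses `collections.Counter` from the Python standard library and the 21 isolates listed in these slides; the column layout is only one possible choice.

```python
from collections import Counter

# The 21 isolates from the slides, expanded to one entry per isolate.
isolates = (
    ["Giardia lamblia"] * 6
    + ["Ascaris lumbricoides"] * 6
    + ["Enterobius vermicularis"] * 6
    + ["Entamoeba histolytica"] * 3
)

counts = Counter(isolates)      # absolute frequency of each category
total = sum(counts.values())

print(f"{'Parasite':<26}{'n':>3}{'%':>8}")
for parasite, n in counts.most_common():
    # Relative frequency: share of the total, expressed as a percentage.
    print(f"{parasite:<26}{n:>3}{100 * n / total:>8.2f}")
print(f"{'Total':<26}{total:>3}{100:>8.2f}")
```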
Summary of categorical data
Sometimes the number of categories is large and should
be reduced, for example by grouping the less frequent categories as "Other".
Death cause                        n       %
Cardiovascular disease        12,525   21.96
Cancer                        10,321   18.10
Lower respiratory infections   8,745   15.34
Other                         25,435   44.60
Total                         57,026  100.00
Distribution by death cause in Celaya, Gto, during 2007
Source: Certification of deaths
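One way to reduce the number of categories is to keep the most frequent ones and pool the rest. The sketch below assumes that rule; the helper name `collapse_categories` and the detailed counts are hypothetical, since the slides give only the already-collapsed table.

```python
def collapse_categories(counts: dict[str, int], keep: int) -> dict[str, int]:
    """Keep the `keep` most frequent categories and pool the rest as 'Other'."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    collapsed = dict(ranked[:keep])
    remainder = sum(n for _, n in ranked[keep:])
    if remainder:
        collapsed["Other"] = remainder
    return collapsed


# Hypothetical detailed counts, invented for illustration only.
detailed = {"A": 120, "B": 95, "C": 80, "D": 12, "E": 9, "F": 4}
summary = collapse_categories(detailed, keep=3)
total = sum(summary.values())
for cause, n in summary.items():
    print(f"{cause:<8}{n:>5}{100 * n / total:>8.2f}")
```

The total is unchanged by the pooling, so the percentages still add up to 100.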
Frequency distributions for quantitative data
With quantitative data, we need to group the data before
showing them in a frequency or relative frequency table.
Age (years)    n       %
19            52   14.70
20            32    9.00
21            46   12.99
22            67   18.94
23            26    7.35
24            77   21.76
25            54   15.26
Total        354  100.00
Distribution of frequencies in students of FEOC that have smoked at least once. n=354
Source: Health survey
Frequency distributions for quantitative data
With quantitative data, it is useful to calculate the cumulative
frequency.
Age (years)    n       %  % cumulative
19            52   14.70         14.70
20            32    9.00         23.70
21            46   12.99         36.69
22            67   18.94         55.63
23            26    7.35         62.98
24            77   21.76         84.74
25            54   15.26        100.00
Total        354  100.00
Distribution of frequencies in students of FEOC that have smoked at least once. n=354
Source: Health survey
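The cumulative column is a running total of the percentage column. As a small sketch, `itertools.accumulate` reproduces it from the percentages printed in the table; recomputing from the raw counts instead may differ in the second decimal because the printed percentages are already rounded.

```python
from itertools import accumulate

ages = [19, 20, 21, 22, 23, 24, 25]
percent = [14.70, 9.00, 12.99, 18.94, 7.35, 21.76, 15.26]  # % column from the table

# Running total of the percentage column, rounded to two decimals.
cumulative = [round(value, 2) for value in accumulate(percent)]

for age, pct, cum in zip(ages, percent, cumulative):
    print(f"{age:>3}{pct:>8.2f}{cum:>9.2f}")
```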
Bar chart
The frequency or relative frequency of a
categorical variable can be shown easily in a bar
chart.
It is used with categorical or discrete numerical data.
Each bar represents one category, and its height is the
frequency or relative frequency.
Bars should be separated.
It is very important that the Y axis begins at 0.
[Figure: bar chart "Gastrointestinal infections". X axis: Agents (Cryptos., E.histolyt., E.coli, Giardia, Rotavirus, Shigella); Y axis: Frequency, 0 to 7.]
Grouped bar chart
If a nominal categorical variable is divided
into two groups (for example, males and females), we can show the data with a grouped
bar chart.
It allows easy comparison between groups.
[Figure: grouped bar chart "Gastrointestinal infections", with bars for Males and Females. X axis: Agents (Crypt., E.histolyt., E.coli, Giardia, Rotavirus, Shigella); Y axis: Frequency, 0 to 5.]
Pie chart
It is an alternative way to show a categorical variable.
Each slice of the pie corresponds to the frequency or relative
frequency of one category of the variable.
A pie chart shows only one variable.
If we want to make comparisons, we need to build two pie
charts.
Pie chart
Civil status of women in a community
Single      28%
Married     44%
Divorced    11%
Widowed      8%
Free union   9%
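Each slice takes the same share of the 360-degree circle as its category takes of the total. The short sketch below converts the percentages from the civil status chart into slice angles; it only does the arithmetic and does not draw anything.

```python
# Percentages read from the pie chart in the slides.
civil_status = {
    "Single": 28,
    "Married": 44,
    "Divorced": 11,
    "Widowed": 8,
    "Free union": 9,
}

# Slice angle = relative frequency (%) x 360 / 100.
angles = {status: pct * 360 / 100 for status, pct in civil_status.items()}

for status, angle in angles.items():
    print(f"{status:<12}{angle:>7.1f} degrees")
```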
Frequency distribution charts: histograms
Histograms are useful for quantitative variables.
There are no spaces between bars.
The area of a bar, not its height, represents its frequency.
The X axis should be continuous.
The Y axis should begin at 0.
The width of a bar represents the interval of its group.
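Because area, not height, carries the frequency, classes of unequal width need their bar heights rescaled to frequency divided by width (often called frequency density). The sketch below shows that arithmetic on hypothetical classes invented for illustration.

```python
# Hypothetical classes with unequal widths: (lower limit, upper limit, frequency).
classes = [(0, 5, 20), (5, 15, 40), (15, 20, 30), (20, 40, 40)]

# Bar height = frequency / class width, so that height x width = frequency.
densities = [frequency / (upper - lower) for lower, upper, frequency in classes]

# Check: the area of every bar recovers the original frequency.
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]

for (lower, upper, frequency), height in zip(classes, densities):
    print(f"{lower:>2}-{upper:<3} frequency={frequency:<3} height={height:.1f}")
```

In this example the classes 5-15 and 20-40 hold the same number of observations, yet the wider class gets a bar half as tall.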
[Figure: histogram "Number of sons in women from Celaya". X axis: Number of sons (1 to 8+); Y axis: Number of women, 0 to 700.]
Frequency distribution charts: frequency polygon
It is another way to show the frequency distribution
of a numerical variable.
It is built by joining the midpoints of the tops of the
histogram bars.
We should take into account the width of each
bar.
We can plot more than one polygon in a chart
to make comparisons.
[Figure: frequency polygon "Number of sons of women from Celaya". X axis: Number of sons (1 to 8+); Y axis: Number of women, 0 to 700.]
Frequency distributions: cumulative histogram
We can plot it directly from a cumulative frequency
table.
It is not necessary to adjust the height
of the bars, because each cumulative frequency
represents the total frequency up to and including the
upper limit of the interval.
[Figure: cumulative histogram "Cumulative frequency of birthweight" (newborns). X axis: Weight, classes 501- to 5000+; Y axis: Cumulative frequency (%), 0 to 120.]
Frequency distributions: cumulative frequency polygon
We use it to see the proportions below or above
a point on the curve.
We can read the median and percentiles directly.
If the distribution is symmetrical, the curve has a symmetrical
S shape.
If it is skewed to the right or to the left, the curve is
flattened on that side.
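Reading a percentile off a cumulative frequency polygon amounts to linear interpolation along the segment where the curve crosses the target percentage. The sketch below assumes that reading rule; the function name and the birthweight classes are hypothetical, because the values behind the chart in the slides are not available.

```python
def percentile_from_cumulative(upper_limits, cumulative_pct, lower_limit, target):
    """Linearly interpolate the X value where the cumulative curve reaches `target` %."""
    prev_x, prev_pct = lower_limit, 0.0
    for x, pct in zip(upper_limits, cumulative_pct):
        if target <= pct:
            if pct == prev_pct:  # flat segment: no interpolation possible
                return prev_x
            # Straight-line segment between (prev_x, prev_pct) and (x, pct).
            return prev_x + (target - prev_pct) / (pct - prev_pct) * (x - prev_x)
        prev_x, prev_pct = x, pct
    raise ValueError("target must be between 0 and 100")


# Hypothetical birthweight classes (grams), invented for illustration only.
upper_limits = [2500, 3000, 3500, 4000, 4500]
cumulative_pct = [10.0, 35.0, 75.0, 95.0, 100.0]

median = percentile_from_cumulative(upper_limits, cumulative_pct, 2000, 50)
p90 = percentile_from_cumulative(upper_limits, cumulative_pct, 2000, 90)
print(f"median = {median} g, 90th percentile = {p90} g")
```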
[Figure: cumulative frequency polygon "Cumulative frequencies of birthweight" (newborns). X axis: Weight, classes 501- to 5000+; Y axis: Cumulative frequency (%), 0 to 120.]
Other charts: stem-and-leaf plot
We use it to show quantitative data directly, or as a
preliminary step in building a frequency
distribution.
We organize the data by choosing the number of divisions
(5 to 15).
We draw a vertical line and put the first digit of
each value to the left of the line (stem) and the second
digit to the right of the line (leaf).
Other charts: stem-and-leaf plot
Patient  Age
1        54
2        35
3        49
4        61
5        58
6        64
7        32
8        57
9        43
10       42
3 | 5 2
4 | 9 3 2
5 | 4 8 7
6 | 1 4
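The stem-and-leaf display can be built mechanically by splitting each value into its tens digit (stem) and units digit (leaf). This is a minimal sketch using the ten patient ages from the table; leaves are kept in the order the patients appear, as in the slide.

```python
from collections import defaultdict

ages = [54, 35, 49, 61, 58, 64, 32, 57, 43, 42]  # patient ages from the table

stems = defaultdict(list)
for age in ages:
    stem, leaf = divmod(age, 10)  # tens digit is the stem, units digit the leaf
    stems[stem].append(leaf)

for stem in sorted(stems):
    leaves = " ".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves}")
```

Sorting each list of leaves before printing would give an ordered display, from which the median can be counted off directly.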
Other charts: box plot
We draw a vertical line that represents the range of
the distribution.
We draw one horizontal line at the third
quartile and another at the first
quartile; together they form the box.
The midpoint of the distribution (the median) is shown as a
horizontal line inside the box.
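The five values a box plot needs (minimum, first quartile, median, third quartile, maximum) can be computed before drawing anything. The sketch below uses the ten patient ages from the stem-and-leaf slide purely as sample data, and `statistics.quantiles` with its default "exclusive" method; other quartile conventions give slightly different cut points.

```python
import statistics

ages = [54, 35, 49, 61, 58, 64, 32, 57, 43, 42]  # patient ages from these slides

# Quartile cut points (default "exclusive" method of statistics.quantiles).
q1, median, q3 = statistics.quantiles(ages, n=4)
lowest, highest = min(ages), max(ages)

print(f"range line: {lowest} to {highest}")
print(f"box: Q1={q1}, median={median}, Q3={q3}")
```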
[Figure: box plot. Y axis from 500 to 5500.]
Table 8