Storage Types
Display Format
String ↔ Numeric
(Dis)connect Characters

Jeehoon Han [email protected]

Fall 2017

Jeehoon Han [email protected] Storage Types Display Format String ↔ Numeric (Dis)connect Characters

Storage Types

- Storage types
    - Numbers (digits of accuracy in parentheses)
        - Integers: byte (2), int (4), long (9)
        - Floating point: float (7), double (16)
    - Strings: str1, str2, ..., str#, where str# holds strings of up to # characters
- The default storage type is float
- Storing a variable that contains numbers with more than 7 digits
    - 8-9 digit integers: gen long varname = exp
    - Otherwise: gen double varname = exp
- Changing the storage type of an existing variable: recast type varname
- Use compress to save memory by storing each variable in the smallest type that loses no precision
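
The digit limits above can be tried out with a short sketch. The variable names id_float, id_long, and id_double and the 9-digit value are made up for illustration and do not come from the slides; the sketch also assumes the default type has not been changed with set type.

```stata
clear
set obs 1

* A 9-digit integer exceeds the roughly 7 digits a float can hold
gen id_float = 123456789
gen long id_long = 123456789
gen double id_double = 123456789

format id_float id_long id_double %12.0f
list

* recast changes the storage type but cannot bring back lost digits
recast double id_float
list id_float

* compress moves each variable to the smallest type that holds it exactly
compress
describe
```

With the default float type, id_float should list as 123456792 rather than 123456789, because floats of this magnitude are spaced 8 apart, while the long and double copies keep the exact value.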

Storage Types: Example

- set obs 1
  gen var = 0.2
  tab var if var == 0.2
  ⇒ no observations
- Problem
    - Numbers are stored in binary form, and most decimals have no exact binary representation (0.2 → 0.00110011...)
    - 0.2 is stored as 0.20000000298023224 in float and as 0.20000000000000001 in double
    - When you create the variable var, 0.2 is stored as a float, but Stata does all calculations in double precision, so var == 0.2 compares the float value with the double value and finds no match
- Two ways to deal with this issue
    - Store the data as double: gen double var = 0.2
    - Round the constant to float precision: tab var if var == float(0.2)
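
Both fixes can be checked side by side. This is a minimal sketch that uses count in place of tab so the number of matches is easy to read; var_d is a hypothetical name for the double copy, and the default type is assumed to be float.

```stata
clear
set obs 1

* var is a float (the default type); var_d is a double copy
gen var = 0.2
gen double var_d = 0.2

* Float variable against a double constant: no match
count if var == 0.2

* Fix 1: compare against the constant rounded to float precision
count if var == float(0.2)

* Fix 2: store the data as double from the start
count if var_d == 0.2

* Show both stored values with 17 decimal places
display %20.17f float(0.2)
display %20.17f 0.2
```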

Display Format

- Specify the display format: format varlist %fmt
- Numeric formats
    - Fixed format: %w.df
    - General format: %w.dg
    - w is the total width of the display; d is the number of decimals (fixed format)
    - For the general format, Stata decides the number of decimals to display (if d > 0, d indicates the maximum number of decimal places)
- String format: %ws, where w is the width in characters

Display Format: example

- Default formats
    - byte, int: %8.0g
    - long: %12.0g
    - float: %9.0g
    - double: %10.0g
- Examples
    clear
    set obs 1
    gen double pi = 3.1415926535
    list pi ⇒ 3.1415927
    format pi %8.0g ⇒ 3.14159
    format pi %8.5f ⇒ 3.14159
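
A point worth checking when experimenting with formats is that format changes only how a value is shown, not what is stored. The sketch below reuses the pi variable from the example; the string variable name and the %-10s format (a negative width left-aligns the text) are additions for illustration.

```stata
clear
set obs 1
gen double pi = 3.1415926535

* Display with two decimals
format pi %8.2f
list pi

* The stored value is unchanged; only its display is rounded
assert pi == 3.1415926535
display %12.10f pi[1]

* String format: a negative width left-aligns the text
gen str5 name = "pi"
format name %-10s
list name pi
```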

Inspecting Data

- sysuse uslifeexp, clear
  browse
- list
- describe
- codebook region

Strings (pure text) ↔ Numerics

- String variable → numeric variable
    encode country, gen(country_code)
- Numeric variable → string variable
    decode country_code, gen(country_str)
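
A round trip through encode and decode can be sketched on a tiny made-up dataset. The country values are hypothetical; the variable names follow the slide.

```stata
clear
input str10 country
"Korea"
"Japan"
"Korea"
"France"
end

* encode creates a numeric variable with value labels;
* codes are assigned in alphabetical order of the strings
encode country, gen(country_code)
list country country_code, nolabel

* decode turns the labeled numeric variable back into a string
decode country_code, gen(country_str)
list
```

Here France, Japan, and Korea receive the codes 1, 2, and 3, and country_str reproduces the original strings.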

Strings (numeric text) ↔ Numerics

- String variable → numeric variable
    - destring varlist, {gen(varname)|replace} [option]
    - [option]
        - ignore("chars"): remove the specified nonnumeric characters
        - force: treat any value containing nonnumeric characters as missing
- Example: use http://www.stata-press.com/data/r13/destring2
    destring price, gen(priceA) ignore("$ ,")
    destring price, gen(priceB) force
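
The difference between ignore() and force shows up clearly on a few hand-built values. This sketch does not need the web dataset; the three price strings are hypothetical.

```stata
clear
set obs 3

* char(36) is the dollar sign, written this way so that
* Stata cannot mistake "$1" for a macro reference
gen str12 price = char(36) + "1,234.50" in 1
replace price = char(36) + "99.00" in 2
replace price = "450" in 3

* ignore() strips the listed characters, then converts what is left
destring price, gen(priceA) ignore("$ ,")

* force converts what it can and sets the rest to missing
destring price, gen(priceB) force

list
```

priceA keeps all three values, whereas priceB is missing for the two prices that contain a dollar sign.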

Strings (numeric text) ↔ Numerics

- Numeric variable → string variable
    - tostring varlist, {gen(varname)|replace} [option]
    - [option]
        - format(%fmt): convert using the specified format
        - force: convert to string even if it entails information loss
- Example
    tostring priceA, gen(price_strA)
    tostring priceA, gen(price_strB) format(%8.1f) force
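
The effect of format() and force can be seen on two made-up prices. The values are hypothetical; the variable names follow the slide.

```stata
clear
set obs 2
gen double priceA = 1234.56 in 1
replace priceA = 99 in 2

* Default conversion keeps every digit
tostring priceA, gen(price_strA)

* One decimal place drops information, so force is required
tostring priceA, gen(price_strB) format(%8.1f) force

list
```

The first observation becomes "1234.56" in price_strA but "1234.6" in price_strB, which is the information loss that force permits.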

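Disconnect the Characters of Variables

- substr(str,n,m): extract a substring of str starting at position n for a length of m; a negative n counts the starting position from the end of the string, and m = . extracts through the end of the string
- Splitting a date string
    gen year = substr(date,1,4)
    gen month = substr(date,6,2)
    gen day = substr(date,-2,.)
- Splitting a price string into its integer and decimal parts
    gen length = strlen(price_strA)
    gen decimal = substr(price_strA,-2,.)
    gen integer = substr(price_strA,1,length-3)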
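
The substr calls can be traced on a single made-up observation. The date and price strings are hypothetical, and price_str and len are illustrative names.

```stata
clear
set obs 1
gen str10 date = "1999 12 10"
gen str10 price_str = "2343.68"

* Positive start: count from the left
gen year  = substr(date, 1, 4)
gen month = substr(date, 6, 2)

* Negative start: count from the right; . means "through the end"
gen day = substr(date, -2, .)

* Split the price at the decimal point, assuming two decimal digits
gen len = strlen(price_str)
gen decimal = substr(price_str, -2, .)
gen integer = substr(price_str, 1, len - 3)

list
```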

Connect the Characters of Variables

- gen date1 = year+" "+month+" "+day
  egen date2 = concat(year month day), punct(" ")
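
Both approaches should give the same string when the pieces are already strings. A minimal sketch with hypothetical values:

```stata
clear
set obs 1
gen str4 year  = "1999"
gen str2 month = "12"
gen str2 day   = "10"

* String addition joins the pieces with explicit separators
gen date1 = year + " " + month + " " + day

* egen concat() inserts the punct() string between variables
egen date2 = concat(year month day), punct(" ")

list date1 date2
```

One practical difference is that concat() also accepts numeric variables and converts them to text, while the + operator requires every operand to be a string.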

Sort

- Arrange the data in ascending order
    - sysuse uslifeexp, clear
    - sort le
      sort year le
    - gsort -year (a minus sign before a variable sorts it in descending order)
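
The three sort commands can be compared by listing the first few rows after each one. This sketch uses the same shipped dataset as the slide; the choice of five rows is arbitrary.

```stata
sysuse uslifeexp, clear

* Ascending by life expectancy
sort le
list year le in 1/5

* Ascending by year, with ties broken by le
sort year le
list year le in 1/5

* Descending by year with gsort
gsort -year
list year le in 1/5
```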

Sort: Applications

- Create a lagged variable
    sort year
    gen le_lag = le[_n-1]
- Finding duplicates
    sort year
    list if year == year[_n-1]
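
Both applications rely on _n, the current observation number, so le[_n-1] refers to the previous row after sorting. A small sketch on made-up data with one repeated year:

```stata
clear
input year le
2000 76.8
2001 77.0
2001 77.0
2002 77.0
end

sort year

* The first observation has no predecessor, so its lag is missing
gen le_lag = le[_n-1]
list

* A row is a duplicate when it shares its year with the row before it
list if year == year[_n-1]
```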