riva quiroga · mauricio vargas riva quiroga. polls! ... 2 326 0.21 premium e si1 59.8 61 3.89 3.84...

Post on 23-Jul-2020

0 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

@rivaquirogaRiva Quiroga

Translating datasets using {datalang}: the development of {datos} package for the R4DS Spanish translation

#useR2019: the French keyboard experience

#useR2019: the French keyboard experience

> when technologyis not built around yourown language

language gap

language gap

500 dollarsenglish course

450 dollarsminimum wage

language gap

happy R users

2017

2017

2017

2017

2017

2017

🤷

2017

2018

2018

2018

2018

2018

2018

2018

2018

2018

sticker-drivenproject

2018

2018

the translators

translators + reviewers

≠ varieties of Spanish

≠ backgrounds

≠ R experience

≠ github experience

≠ github experience��

2017 2018

2017 2018

translation: text

2017 2018

translation: text + datasets

2017 2018

infrastructure

github.com/cienciadedatos

github.com/cienciadedatos

github.com/cienciadedatos

github.com/cienciadedatos

github.com/cienciadedatos

Edgar Ruiz

github.com/cienciadedatos

Edgar Ruiz Mauricio Vargas

github.com/cienciadedatos

Edgar Ruiz Riva QuirogaMauricio Vargas

polls!

mtcars🚗

mtcars🚗

mtcars🚗 -> mtautos

%>%

%>%

🤯

%>%

2017 2018

translation: text + datasets

2017 2018

translation: text + datasets

programminghistorian.org / @ProgHist

2017 2018

translation: text + datasets

> head(diamonds)

# A tibble: 6 x 10

price carat cut color clarity depth table x y z

<int> <dbl> <ord> <ord> <ord> <dbl> <dbl> <dbl> <dbl> <dbl>

1 326 0.23 Ideal E SI2 61.5 55 3.95 3.98 2.43

2 326 0.21 Premium E SI1 59.8 61 3.89 3.84 2.31

3 327 0.23 Good E VS1 56.9 65 4.05 4.07 2.31

4 334 0.29 Premium I VS2 62.4 58 4.2 4.23 2.63

5 335 0.31 Good J SI2 63.3 58 4.34 4.35 2.75

6 336 0.24 Very good J VVS2 62.8 57 3.94 3.96 2.48

> head(diamantes)

# A tibble: 6 x 10

precio quilate corte color claridad profundidad tabla x y z

<int> <dbl> <ord> <ord> <ord> <dbl> <dbl> <dbl> <dbl> <dbl>

1 326 0.23 Ideal E SI2 61.5 55 3.95 3.98 2.43

2 326 0.21 Premium E SI1 59.8 61 3.89 3.84 2.31

3 327 0.23 Bueno E VS1 56.9 65 4.05 4.07 2.31

4 334 0.29 Premium I VS2 62.4 58 4.2 4.23 2.63

5 335 0.31 Bueno J SI2 63.3 58 4.34 4.35 2.75

6 336 0.24 Muy bueno J VVS2 62.8 57 3.94 3.96 2.48

> diamantes <- rename(diamonds, quilate = carat, corte = cut, …

one possible approach

> diamantes <- rename(diamonds, quilate = carat, corte = cut, … %>%

one possible approach

> diamantes <- rename(diamonds, quilate = carat, corte = cut, … %>%

mutate(corte = case_when(cut == “Good” ~ “Bueno” …

one possible approach

{datalang}

another approach

github.com/edgararuiz/datalang

> library(datalang)> my_spec <- system.file("specs/diamonds.yml", package = "datos")> translate_data(my_spec)

> library(datalang)> my_spec <- system.file("specs/diamonds.yml", package = "datos")> translate_data(my_spec)

precio quilate corte color claridad profundidad tabla x y z

1 326 0.23 Ideal E SI2 61.5 55 3.95 3.98 2.43

2 326 0.21 Premium E SI1 59.8 61 3.89 3.84 2.31

3 327 0.23 Bueno E VS1 56.9 65 4.05 4.07 2.31

4 334 0.29 Premium I VS2 62.4 58 4.2 4.23 2.63

5 335 0.31 Bueno J SI2 63.3 58 4.34 4.35 2.75

6 336 0.24 Muy bueno J VVS2 62.8 57 3.94 3.96 2.48

df:

source: ggplot2::diamonds

name: diamantes

variables:

price:

trans: precio

desc: Precio en dólares americanos

values:

Good: Bueno

Very Good: Muy bueno

Fair: Regular

help:

> head(diamantes)

# A tibble: 6 x 10

precio quilate corte color claridad profundidad tabla x y z

<int> <dbl> <ord> <ord> <ord> <dbl> <dbl> <dbl> <dbl> <dbl>

1 326 0.23 Ideal E SI2 61.5 55 3.95 3.98 2.43

2 326 0.21 Premium E SI1 59.8 61 3.89 3.84 2.31

3 327 0.23 Bueno E VS1 56.9 65 4.05 4.07 2.31

4 334 0.29 Premium I VS2 62.4 58 4.2 4.23 2.63

5 335 0.31 Bueno J SI2 63.3 58 4.34 4.35 2.75

6 336 0.24 Muy bueno J VVS2 62.8 57 3.94 3.96 2.48

> ?diamantes

{datos}

translating book + datasets

{datos}

it’s easy to contribute!

2017 2018 2019

2017 2018 2019

conference-driven package

{datos}

too bulky!

{datos}

create_promise <- function(path, package = "datos") {

spec_loc <- system.file(path, package = package)

spec <- read_yaml(spec_loc)

new_name <- spec$df$name

origin <- spec$df$source

envir <- as.environment(paste0("package:", package))

rlang::env_bind_lazy(

envir,

!! new_name := translate_data(spec_loc, eval(parse_expr(origin))

))

}

{datos}

> datos::diamantesError: diamantes is not an exported object from 'namespace:datos'

{datos}

delayedAssign('diamantes',

eval(parse(file.path(system.file('scripts','diamantes.txt', package = 'datos')))))

cienciadedatos.github.io/datos

cienciadedatos.github.io/datos

Edgar Ruiz

cienciadedatos.github.io/datos

Edgar Ruiz Riva Quiroga Mauricio Vargas

Mauro Lepore

cienciadedatos.github.io/datos

Edgar Ruiz Riva Quiroga Mauricio Vargas

Daniela VázquezMauro Lepore Rayna Harris

datasets

code

translations

reviews

datasets

code

translations

reviews

🎉

datasets

code

translations

reviews

🎉🎉

datasets

code

translations

reviews

🎉🎉

datasets

code

translations

reviews

🎉🎉

but are we near the end of this project?

2018

2018 ∞

what we have created is not just a book translation

and a package

human and technical infrastructure

our biggest contribution developing {datos}

> brand new errors

2018 ∞

Riva Quiroga / @rivaquiroga

top related