Foundations of Data Analytics: Tools for Data Analysis

University of Warwick

Page 2: Tools for Data Analysis - University of Warwick · Unix History in a nutshell Developed at AT&T ell Labs in late 1960s for PDP11 –Made available in mid-1970s –Developed and sold

Objectives

Introduce a broad collection of tools used in data analytics

Outline the capabilities and uses of each tool

– Provide examples of tool usage

Allow you to select the appropriate tools to work with

– Based on your preferences, e.g. GUI or command line

Very quick introduction to each tool

– More information on the web, in the library

CS910 Foundations of Data Analytics

The philosophy of tool choice

A huge array of tools is available, with overlapping functionality

A good data analyst knows a good selection of tools

– Can pick the right one for the job

Many people know only one or two (often unsuitable) tools

– Hence, much of the world’s analytics is performed in spreadsheets

Knowing that a tool exists and what it can do is often enough

– Can decide if learning to use it is time-effective

– Will introduce some options here

– Will not give formal training

No single answer to tool selection

– Often a matter of personal choice


Unix Tools

“Unix tools” covers many simple tools developed as part of the Unix operating system

– They manipulate data files represented as lines of text: flat files, comma separated value (CSV) files

– Allow simple analysis and data preparation

– Widely available in Linux, MacOS, Windows

“I use all these nearly every day. The best part is, once you know they exist, these tools are available on every unix machine you will ever use. Nothing else (except maybe perl) is as universal – you don’t have to worry about versions or anything. Being comfortable with these tools means you can get work done anywhere – any EC2 instance you boot up will have them, as will any unix server you ssh into.”


Unix History in a nutshell

Developed at AT&T Bell Labs from 1969, initially on a PDP-7 and soon ported to the PDP-11

– Made available outside Bell Labs in the mid-1970s

– Developed and sold by AT&T in the 1980s

– Commercial variants emerged: Solaris, SCO…

Standardized via POSIX, first published in 1988

– POSIX: Portable Operating System Interface based on Unix

The GNU Project (launched 1983) produced free implementations of the Unix tools in the 1980s

– Linux started in 1991 as a free POSIX-compliant OS kernel

– Many Linux distributions available: Ubuntu, Fedora, Debian…


Tool availability

Available on any Unix machine

Available on any Linux machine

– Such as those in DCS, e.g. joshua

Available on any modern Mac

– Based on BSD Unix

– Open the Terminal application and type away

On Windows:

– Various ports of individual tools or collections of tools

– Cygwin, open source port of many Linux tools to Windows: http://cygwin.com/install.html


Command line tools

These are command line tools – no fancy GUI

Each tool performs a single simple function

– Additional functionality has crept in over time

– Now some are more like a swiss army knife

Can be combined via scripts, piping

Information available on each tool:

– Via ‘man’ command: e.g. man cat

– Via program itself: sort --help

– Via the web: many instructions/examples online

Short course on unix tools from Cambridge:

– http://www.cl.cam.ac.uk/teaching/1213/UnixTools/materials.html


nmap.org/movies/


Example Data Set

Show examples using the “adult census data”

– http://archive.ics.uci.edu/ml/machine-learning-databases/adult/

– File: adult.data

~32K individuals, one per line

– Age, Gender, Employment Type, Years of Education…

Widely studied in Machine Learning community

– Prediction task: is income > 50K?


Standard input, output and Piping

Unix commands can read and write files

– Special case: standard input (stdin) and standard output (stdout)

– By default, a command reads from stdin, writes to stdout

Some commonly used tools are ‘wc’ and ‘cat’

– wc does a simple wordcount

– cat reads a file, writes it to stdout

– Pipe ‘|’ connects the stdout of one command to stdin of next

Examples:

– cat adult.data | wc

◼ Output: 32562 488415 3974305 [lines words characters]

– cat adult.data | wc | wc

◼ Output: 1 3 24
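The piping idea above can be tried without downloading adult.data. The following is a minimal sketch that assumes a made-up three-line sample file (the rows and variable names are illustrative, not taken from the real data set):

```shell
# Hypothetical three-line sample standing in for adult.data
sample=$(mktemp)
printf '39, State-gov, Bachelors\n50, Self-emp, Bachelors\n38, Private, HS-grad\n' > "$sample"

# cat writes the file to stdout; the pipe feeds that into wc -l, which counts lines.
# tr -d ' ' strips the padding that some versions of wc add around the number.
lines=$(cat "$sample" | wc -l | tr -d ' ')
echo "lines: $lines"

# A second pipe stage: wc prints a single line of counts, so counting
# the lines of that output gives 1, whatever the size of the input.
summary_lines=$(cat "$sample" | wc | wc -l | tr -d ' ')
echo "summary lines: $summary_lines"

rm -f "$sample"
```

Each stage only sees the text produced by the stage before it, which is why the second wc reports on the summary line and not on the original file.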


Redirection

Can use < to redirect a file to stdin, and > to redirect stdout

– >> appends to an existing file

Examples:

– wc < adult.data

◼ 32562 488415 3974305

– wc < adult.data > wordcount
cat wordcount

◼ 32562 488415 3974305

– cat adult.data | wc >> wordcount

wc options:

– -l / -w / -c : print number of lines / words / bytes (use -m for characters)
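The three redirection operators can be seen together in a short sketch. It assumes a made-up two-line text file in a temporary directory; the file names are illustrative only:

```shell
dir=$(mktemp -d)
printf 'one two\nthree four five\n' > "$dir/sample.txt"   # > creates or overwrites the file

wc -l < "$dir/sample.txt" > "$dir/counts"    # stdin comes from a file, stdout goes to a file
wc -w < "$dir/sample.txt" >> "$dir/counts"   # >> appends a second line to the same file

# Strip the space padding that some versions of wc add, leaving one number per line
counts=$(tr -d ' ' < "$dir/counts")
echo "$counts"

rm -r "$dir"
```

Because wc reads from stdin here, it prints only the numbers and no file name, which keeps the output easy to reuse.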


Basic Commands

ls: list files in a directory

– ls adult

◼ adult.data adult.names adult.test

Options to commands are often single letters preceded by -

– ls -l adult
◼ total 5852
-rwx------ 1 grahamc dcsstaff 3974305 Oct 8 18:03 adult.data
-rwx------ 1 grahamc dcsstaff 5229 Oct 8 18:03 adult.names
-rwx------ 1 grahamc dcsstaff 2003153 Oct 8 18:04 adult.test

– ls -la public_html
◼ total 5860
drwx------ 2 grahamc dcsstaff 4096 Oct 8 18:04 .
drwx------ 39 grahamc dcsstaff 4096 Oct 8 18:04 ..
-rwx------ 1 grahamc dcsstaff 3974305 Oct 8 18:03 adult.data
-rwx------ 1 grahamc dcsstaff 5229 Oct 8 18:03 adult.names
-rwx------ 1 grahamc dcsstaff 2003153 Oct 8 18:04 adult.test


Viewing files: cat, head, tail

cat file shows contents of file

head shows first few lines of a file

– head adult.data
◼ 39, State-gov, 77516, Bachelors, 13, Never-married,
50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse,
38, Private, 215646, HS-grad, 9, Divorced,
53, Private, 234721, 11th, 7, Married-civ-spouse, Handlers-cleaners,
28, Private, 338409, Bachelors, 13, Married-civ-spouse, Prof-specialty,
37, Private, 284582, Masters, 14, Married-civ-spouse, Exec-managerial,
49, Private, 160187, 9th, 5, Married-spouse-absent, Other-service,
52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse,
31, Private, 45781, Masters, 14, Never-married, Prof-specialty,
42, Private, 159449, Bachelors, 13, Married-civ-spouse, Exec-managerial,

tail shows last few lines of a file

– tail -n 5 adult.data
◼ 40, Private, 154374, HS-grad, 9, Married-civ-spouse, Machine-op-inspct,
58, Private, 151910, HS-grad, 9, Widowed, Adm-clerical, Unmarried, White,
22, Private, 201490, HS-grad, 9, Never-married, Adm-clerical, Own-child,
52, Self-emp-inc, 287927, HS-grad, 9, Married-civ-spouse, Exec-managerial,
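As a small illustration of head and tail, the sketch below uses the numbers 1 to 12 as stand-in input (an assumption made so the result is easy to check by eye). Combining the two tools is one way to pull out a range of lines from the middle of a file:

```shell
numbers=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 11 12)

# With no option, head shows the first 10 lines
first=$(printf '%s\n' "$numbers" | head | wc -l | tr -d ' ')

# tail -n 3 keeps the last three lines; paste -s joins them onto one line
last3=$(printf '%s\n' "$numbers" | tail -n 3 | paste -sd' ' -)

# head then tail: take the first 5 lines, then the last 2 of those (lines 4 and 5)
middle=$(printf '%s\n' "$numbers" | head -n 5 | tail -n 2 | paste -sd' ' -)

echo "$first"; echo "$last3"; echo "$middle"
```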


Viewing files: more or less

more lets you page through a file

– Page down/space to advance

less is a more flexible replacement

– Can page up to go back

– Q to quit


The sort command

sort: sorts the input

Default: sort lines by alphabetic order

– sort adult.data

Configurable

– -r: reverse sort

– -n: numeric sort

– -k: column on which to sort (by default blanks separate fields; -t sets a different separator)

◼ sort adult.data -k5 | less

◼ sort adult.data -n -k5 | less

– -f: ignore (upper/lower) case

– -m: merge multiple sorted files together
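For comma-separated data it is often convenient to tell sort about the separator explicitly. A minimal sketch, assuming a made-up three-row sample with four comma-separated fields (not the real adult.data layout):

```shell
sample=$(mktemp)
printf '%s\n' \
  '39, State-gov, Bachelors, 13' \
  '38, Private, HS-grad, 9' \
  '37, Private, Masters, 14' > "$sample"

# -t, makes the comma the field separator; -k4,4n sorts numerically on field 4 only.
# cut then keeps the first field (age) so the resulting order is easy to read.
by_education=$(sort -t, -k4,4n "$sample" | cut -d, -f1 | paste -sd' ' -)

# -r reverses the order; -n compares the number at the start of each line (the age)
oldest_first=$(sort -rn "$sample" | cut -d, -f1 | paste -sd' ' -)

echo "$by_education"; echo "$oldest_first"
rm -f "$sample"
```

Without -n the values would be compared as text, so "9" would sort after "13" and "14".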


Cut

cut: select certain columns from the file

– Default: assume tab separates columns

– -f: specify which fields to select

– -c: specify which character positions in each line to select

◼ cut -c1-9 adult.data | head

◼ cut -c1,3,5,7,9 adult.data | head

– -d: specify the field delimiter

◼ cut -f1 adult.data | head

◼ cut -f1,2 -d, adult.data | head

◼ cut -f1,2 -d' ' adult.data | head

◼ cut -f1,3,5,7,9 -d, adult.data | head
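The effect of the delimiter choice is easiest to see on a single row. This sketch assumes one illustrative row written in the style of adult.data:

```shell
line='39, State-gov, 77516, Bachelors, 13'   # one made-up row in the style of adult.data

# Fields 1 and 2, with the comma as delimiter
first_two=$(printf '%s\n' "$line" | cut -d, -f1,2)

# Character positions 1 to 9, regardless of any delimiter
chars=$(printf '%s\n' "$line" | cut -c1-9)

# Default delimiter is tab: the row has no tab, so cut passes the whole line through
no_tabs=$(printf '%s\n' "$line" | cut -f1)

echo "$first_two"; echo "$chars"; echo "$no_tabs"
```

Note that the space after each comma stays attached to the following field, since cut splits on exactly one delimiter character.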


Uniq

uniq: omit (or report) repeated lines

◼ cut -f1 -d, adult.data | uniq | head

◼ cut -f1 -d, adult.data | sort -n | uniq | head

– Count the number of occurrences with -c

◼ cut -f1 -d, adult.data | sort -n | uniq -c | head

◼ cut -f1 -d, adult.data | sort -n | uniq -c | sort -rn | head
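The examples above sort before calling uniq because uniq only compares neighbouring lines. A small sketch with six made-up ages (standing in for the first column of the data) shows the difference:

```shell
ages=$(printf '%s\n' 39 50 39 28 39 50)   # hypothetical values for the age column

# uniq only collapses adjacent duplicates, so this unsorted input is left unchanged
unsorted=$(printf '%s\n' "$ages" | uniq | wc -l | tr -d ' ')

# Sorting first brings equal values together, so uniq now leaves one line per value
distinct=$(printf '%s\n' "$ages" | sort -n | uniq | wc -l | tr -d ' ')

# uniq -c prefixes each value with its count; sort -rn puts the largest count first.
# The sed expression removes the leading spaces and the count, leaving the value.
most_common=$(printf '%s\n' "$ages" | sort -n | uniq -c | sort -rn | head -n 1 | sed 's/^ *[0-9]* *//')

echo "$unsorted $distinct $most_common"
```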


grep

grep: search for lines that match some text

◼ grep Masters adult.data | head

◼ grep Masters adult.data | wc -l

– -i: ignore case

– -v: invert behaviour, select non-matching lines

– -An, -Bn: print n lines of context appearing After / Before the match

◼ grep -A1 -B2 Hungary adult.data | less

Can handle regular expressions for flexible matching (quote the pattern so the shell does not expand it)

◼ grep 'Married.*England' adult.data | less

◼ grep '^90' adult.data | less
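A sketch of these options on a four-row made-up sample (the rows are illustrative and much shorter than real adult.data rows). It also uses -c, which counts matching lines directly instead of piping into wc -l:

```shell
sample=$(mktemp)
printf '%s\n' \
  '39, State-gov, Bachelors, Never-married, United-States' \
  '31, Private, Masters, Never-married, England' \
  '37, Private, Masters, Married-civ-spouse, England' \
  '90, Private, HS-grad, Widowed, United-States' > "$sample"

masters=$(grep -c Masters "$sample")                           # count lines containing Masters
not_private=$(grep -v Private "$sample" | wc -l | tr -d ' ')   # -v keeps the non-matching lines

# Regular expression: "Married", then any characters, then "England".
# Matching is case-sensitive, so "Never-married" does not count.
married_eng=$(grep 'Married.*England' "$sample" | cut -d, -f1)

starts_90=$(grep -c '^90' "$sample")                           # ^ anchors the match to the start of the line

echo "$masters $not_private $married_eng $starts_90"
rm -f "$sample"
```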


Grep + regular expressions

Grep’s regular expression syntax:

– ^ : start of line

– $ : end of line

– \ : “escape” next character: \$ to match a $ sign

– [abc] : match any character of abc

– [a-z] : match any character in range a to z

– . (dot) : match any character

– * : match 0 or more occurrences of preceding expression

– \{n\} : match n instances of preceding expression

Example: grep "\(21\)\{2\}" adult.data

egrep for “extended” regular expressions:

◼ egrep "England|Mexico" adult.data | head
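The two syntaxes can be compared on a few made-up rows (the "id" values are invented so that the repetition pattern has something to match). The sketch uses grep -E, which accepts the same extended syntax as egrep:

```shell
rows=$(printf '%s\n' 'id 2121, Mexico' 'id 2100, England' 'id 0021, Cuba')

# Basic syntax: \(21\)\{2\} is the group "21" repeated twice, i.e. the text "2121"
twice=$(printf '%s\n' "$rows" | grep -c '\(21\)\{2\}')

# Extended syntax: | means "either side", with no backslashes needed
either=$(printf '%s\n' "$rows" | grep -E -c 'England|Mexico')

# $ anchors the match to the end of the line: only rows ending in "a"
ends_a=$(printf '%s\n' "$rows" | grep -c 'a$')

echo "$twice $either $ends_a"
```

Single quotes around each pattern stop the shell from interpreting characters such as |, $ and \ before grep sees them.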


sed

sed: stream editor

– Most commonly used to substitute some text for others

– sed 's/expression/replacement/g'

◼ sed 's/Private/Secret/g' adult.data | head

◼ sed 's/, /\t/g' adult.data | head

◼ sed 's/, /\n/g' adult.data | head
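A short sketch of what the trailing g does, using one made-up row. It also shows one portable way to insert a tab: the \t and \n escapes in the replacement text are GNU sed extensions, so on other sed versions (an assumption worth checking on your own machine) a literal tab produced by printf is a safer choice:

```shell
row='39, Private, Bachelors, Private'   # made-up row with the same word twice

first_only=$(printf '%s\n' "$row" | sed 's/Private/Secret/')   # no g: first match on each line only
every=$(printf '%s\n' "$row" | sed 's/Private/Secret/g')       # g: every match on the line

# Let printf create the tab character, then splice it into the sed expression
tab=$(printf '\t')
tabbed=$(printf '%s\n' "$row" | sed "s/, /$tab/g")

echo "$first_only"; echo "$every"; echo "$tabbed"
```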


join

join: do a database-style join on two sorted text files

– -1 n -2 m: try to match n’th field of first file with m’th field of second

– Output all combinations of matches

– e.g. join list of people + postcodes with average income in postcode

Example:

◼ grep United-States -v adult.data | head -n 20 | cut -f 4,14 -d, | sort -k 2 > adult.join1
grep United-States -v adult.data | head -n 20 | cut -f 1,14 -d, | sort -k 2 > adult.join2
join -1 2 -2 2 adult.join1 adult.join2
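To see what join produces, here is a minimal sketch with two tiny made-up files, each holding a value and a country and each already sorted on the country field. The file names and rows are hypothetical:

```shell
dir=$(mktemp -d)
# Both files are "value country", sorted on field 2 as join requires
printf '%s\n' 'Bachelors Cuba' 'Masters England' 'HS-grad India' > "$dir/education"
printf '%s\n' '28 Cuba' '31 England' '45 England'              > "$dir/age"

# Match field 2 of the first file against field 2 of the second.
# Each output line is the join field, then the other fields of file 1, then those of file 2.
joined=$(join -1 2 -2 2 "$dir/education" "$dir/age")
echo "$joined"

rm -r "$dir"
```

England appears twice in the second file, so it yields two output lines (all combinations of matches), while India has no partner and is dropped.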


Editors: nano, pico, emacs

Unix editors were once notoriously unfriendly

– vi, vim, and ed all required memorizing complex commands

Modern editors are now much more usable

– pico and nano are easy to pick up and use

– emacs is very powerful and configurable

If working on a GUI based system, many options

– Local text editors in Windows, Macs, Linux

http://xkcd.com/378/

scripting

Don’t have to write ever longer command lines

Can put sequences of commands into scripts

– With loop controls: automate processing, reduce errors

◼ #!/bin/bash
for i in 1 2
do
  wc adult.join$i
  for ((j=1; j<=2; j++))
  do
    echo $((i+j))
  done
done
date
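A sketch of how a loop can automate a repeated step, here totalling the line counts of several files. It assumes two made-up files named part1 and part2, and sticks to plain sh syntax (the for ((...)) form is specific to bash):

```shell
#!/bin/sh
dir=$(mktemp -d)
printf 'a\nb\n'    > "$dir/part1"   # two lines
printf 'c\nd\ne\n' > "$dir/part2"   # three lines

total=0
for i in 1 2
do
  n=$(wc -l < "$dir/part$i" | tr -d ' ')   # line count of this file
  echo "part$i: $n lines"
  total=$((total + n))                     # shell arithmetic, as in the echo $((i+j)) example
done
echo "total: $total lines"

rm -r "$dir"
```

Saved to a file and made executable with chmod +x, a script like this can be rerun unchanged whenever the input files are updated.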


Programming

Can write programs in your language of choice

– Java: powerful, general purpose language

– Python: popular general-purpose language, widely used for numerical and data work

– Perl: popular for processing text

Teaching a language is definitely out of scope of this module

– Foundations (CS917) module gives crash course in Java

– You can use any language you know for homeworks, project

◼ Data Analytics is about getting an answer, less about how

Will give a brief introduction to R, a statistical tool/language


Tools for working with statistical data: R

R: flexible language with a lot of support for statistical operations

– Successor to ‘S’ language

– Open-source, available in Windows, Mac, Linux, Cygwin

Inbuilt support for many data manipulation operations

– Read in data from CSV (comma-separated values) format

– Compute sample mean, variance, quantiles

– Find line of best fit (linear regression)

– Flexible plotting tools, output to screen or file

– Lots more statistical tools available as libraries

Steep learning curve, but GUIs and help are available

– Will use the RStudio GUI: https://www.rstudio.com/products/rstudio/download/


Quick example in R

data <- read.csv("adult.test", header=F)   # read in data in comma-separated value format
summary(data)                              # show a summary of all attributes
summary(data[5])                           # show a summary of years of education
d <- table(data[5])                        # tabulate the data
plot(d)                                    # plot the frequency distribution
plot(ecdf(data[5]$V5))                     # plot the (empirical) CDF

data2 <- read.csv("adult.data", header=F)
qqplot(data[5]$V5, data2[5]$V5, type="l")  # make a quantile-quantile plot of two (empirical) distributions

pdf(file="qq.pdf")                         # send output to a PDF file
qqplot(data[5]$V5, data2[5]$V5, type="l")
dev.off()                                  # close the file!
quit()                                     # quit!


Spreadsheets

Many options: Excel, OpenOffice, Google Spreadsheets

Great for quick viewing, exploration and plotting of small data

– Excel 2003: 65536 rows

– Excel 2007, 2010, 2013, 2016: 1M rows

– Google sheets: up to 256 columns, or up to 200,000 cells

Quick plotting tools:

– Select data to plot, hit ‘plot’ button, fiddle with options

– Sometimes takes a long time to make plots how you want

– Tricky to get multiple plots with the same formatting

[Figure: plot of adult.test years of education (y-axis) against adult.data years of education (x-axis), both axes from 0 to 20]

Data Processing in Spreadsheets

Decent data manipulation functionality

– Sort, selection, reformatting

– Some tasks more difficult within the spreadsheet metaphor

Limitations of data processing in spreadsheets

– Capacity limits (row limits, cell limits)

– Can’t always keep a record of what was done (repeatability)

◼ Can put sequence of unix tool commands in a script

– Prone to errors: may select wrong range of cells etc.

theconversation.com/economists-an-excel-error-and-the-misguided-push-for-austerity-13584

◼ An economics paper argued in favour of austerity measures

◼ Missed out Australia, Austria, Belgium, Canada, and Denmark from calculations, skewing the conclusion


Data Processing in Spreadsheets

Sort: select data and click on ‘sort’

Aggregation:

– =sum(range), =count(range), =average(range), =median(range)

=if(test, [value if true], [value if false])

– “Smart filling” lets you drag to extend

=countif(range, condition)

Pivot tables let you explore the data cube

Exercise: compute the number of people from each country in adult.data

– Compare to the effort to do this with unix tools (cut, sort, uniq)
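For comparison, the Unix-tools side of this exercise is a single pipeline. The sketch below assumes a made-up five-row sample in which country is the third field; in adult.data itself the country is field 14, so the cut option would be -f14:

```shell
sample=$(mktemp)
printf '%s\n' \
  '39, Male, United-States' \
  '28, Female, Cuba' \
  '37, Female, United-States' \
  '31, Female, England' \
  '50, Male, United-States' > "$sample"

# Select the country column, group equal values together, count each group,
# then list the most frequent country first
counts=$(cut -d, -f3 "$sample" | sort | uniq -c | sort -rn)
echo "$counts"

# Strip the count and padding from the top line to get the most common country
top=$(printf '%s\n' "$counts" | head -n 1 | sed 's/^ *[0-9]* *//')
echo "most common: $top"

rm -f "$sample"
```

The same pipeline saved in a script also keeps a record of exactly what was done, which addresses the repeatability concern raised for spreadsheets.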


Plotting in Excel

Scatter plot of age vs years of education

– Select columns

– Insert - ‘scatter plot’

Bar chart of gender breakdown

– Derive necessary counts

– Insert - ‘Column’

[Figures: scatter plot of age (x-axis, 0 to 100) against years of education (y-axis, 0 to 18); bar chart of counts for Male and Female (y-axis, 0 to 25000)]

Gnuplot

Powerful plotting tool, driven by a script

– Easier to generate multiple, consistent plots

– Write script as a text file

– Call gnuplot scriptname

Pros and cons:

– Flexible output: create PDF, JPG, PNG, EPS, EMF…

– Plot data and functions

– Configure almost every aspect of the output

– Sometimes arcane commands, cryptic abbreviations


Gnuplot function plotting

– set term emf enhanced font "Calibri,18" size 600,400
set output "pareto.emf"
set log y
set log x
set xrange [1: 1e6]
set yrange [1e-6: 1]
set format y "10^{%L}"
set format x "10^{%L}"
unset key
plot x**(-1.0)

– set output "exp.emf"
plot x**(-1.0)*exp(-0.0001*x)

– cdf_lognormal(x)=0.5+0.5*erf((x)/sqrt(2.0))
set output "lognorm.emf"
plot 1.0-cdf_lognormal(0.5*log(0.01*x))

[Figures: three log-log plots of the functions above; x-axis from 10^0 to 10^6, y-axis from 10^-6 to 10^0]

Gnuplot data plotting

Scatter plot of age versus years of education:

– set term emf enhanced font "Calibri,18"
set output "ageeducation.emf"
set title "Age versus Education"
set xlabel "Age"
set ylabel "Years of Education"
set key under
plot "adult/adult.data" using 1:5 \
with points title 'Adult data'

Add a line of best fit:

– y(x)=a*x+b
fit y(x) "adult/adult.data" using 1:5 via a,b
plot "adult/adult.data" u 1:5 w p t 'Adult', y(x) w l t 'Fit'

[Figure: scatter plot "Age versus Education"; x-axis Age (10 to 90), y-axis Years of Education (0 to 16); key: Adult data]

Gnuplot data plotting

Bar chart of gender breakdown:

– Process data to generate sums:

◼ cut -f 10 -d, adult/adult.data | sort | uniq -c > gendercount.txt

– Gnuplot script:
◼ set term emf enhanced font "Calibri,18"
set output "gender.emf"
set style data histograms
set style histogram cluster gap 1
set style fill solid border -1
set yrange [0:]
plot "gendercount.txt" using 1:xticlabel(2) title " "

[Figure: bar chart of counts for Female and Male; y-axis from 0 to 25000]

Report writing: Word processors

Many options: MS Word, OpenOffice Writer, Google Docs

Adequate for report writing (e.g. project report)

– Nice GUI interface, configurable

– Can be difficult if you have many figures

– 3rd party support for bibliographic data (Endnote)


Report writing: LaTeX

LaTeX: a scientific document preparation system

Describe how you want your document to be, and compile it

More of a learning curve, but very powerful

– Stops you getting too involved in fine details

– Support for producing beautiful mathematical formulae

– Produce PDF output easily from LaTeX (text) source file:

◼ pdflatex myfile.tex

– Support automatic bibliography creation via bibtex

– Automatic updating cross-references via \label and \ref

Covered in more detail in CS908 Research Methods


Is this on the test?

From 2014 exam:

Many acceptable answers for each question (and also poor/wrong answers…)

Background reading

Warwick past papers http://www2.warwick.ac.uk/services/exampapers?q=cs910&department=Any&year=Any

http://www.cl.cam.ac.uk/teaching/1213/UnixTools/materials.html


LaTeX example

\documentclass{article}
\usepackage[margin=2cm]{geometry}
\usepackage{graphicx}
\title{This is my report}
\author{Your name}
\begin{document}
\maketitle
\begin{abstract}
This is an abstract for the document
\end{abstract}

\section{Introduction}
This is the introduction to my document

\begin{figure}
\includegraphics{figure.pdf}
\caption{This is a figure}
\label{fig:first}
\end{figure}
Please see figure~\ref{fig:first}.
\end{document}
