other formats for data linked lists, hash tables, json, big data, hadoop & mapreduce. rest....
Post on 12-Jan-2016
Other formats for data
Linked lists, Hash tables, JSON, Big Data, Hadoop & MapReduce. REST. Parallel processing exercise
Homework: Plans for group sorting. Prepare for RSA talk. Postings
Linked list
• Big array for data
• Array of arrays: think of rows
• Each row has information + one or more pointers to other rows. Various ways:
– Forward pointing list: next item
– Forward and back: next and previous item
– Tree: first child item and next sibling
• or first child, next sibling, parent
• or first child, next sibling, parent1, parent2
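Family example: name, a parent, 1st child, next sibling
(rows numbered from 0; -1 means none; rows 6 and 7 are referenced but not shown)
row  name     parent  1st child  next sibling
0    Esther   -1      1          7
1    Anne     0       6          2
2    Jeanine  0       3          -1
3    Daniel   2       5          4
4    Aviva    2       -1         -1
5    Annika   3       -1         -1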
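The family table can be written as an array of arrays and walked by following the pointers. This is a minimal sketch, assuming rows are numbered from 0 and that a pointer to a row not in the table (6 and 7 here) is simply skipped. The helper name childrenOf is hypothetical.

```javascript
// Each row: [name, parent, firstChild, nextSibling]; -1 means "none".
// Rows 6 and 7 are referenced by the slide but not shown, so they are absent here.
const family = [
  ["Esther", -1, 1, 7],
  ["Anne", 0, 6, 2],
  ["Jeanine", 0, 3, -1],
  ["Daniel", 2, 5, 4],
  ["Aviva", 2, -1, -1],
  ["Annika", 3, -1, -1],
];

// Hypothetical helper: names of a person's children that are present in the table.
function childrenOf(rowIndex) {
  const names = [];
  let child = family[rowIndex][2]; // start at the first child
  while (child !== -1 && child < family.length) {
    names.push(family[child][0]);
    child = family[child][3]; // follow the next-sibling pointer
  }
  return names;
}

console.log(childrenOf(0)); // [ 'Anne', 'Jeanine' ]
console.log(childrenOf(2)); // [ 'Daniel', 'Aviva' ]
```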
Exercise
• Make your family tree
• each row has a name, parent1, (optionally include second parent), first child, next sibling
• you need to start somewhere
• Put down Not defined for things not in the table.
• Put down -1 for cases of no children, no next sibling
Hash tables
• Problem: how to find elements in a table?
– no intrinsic order. If there were, you could use binary search.
– Binary search: compare the value (or the key) to the middle value; if less than, search the lower half; if greater than, search the upper half; keep going…
– Aside: Meyer family geography game
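Binary search as described above can be sketched as follows. It assumes the array is already sorted in increasing order. The function name and sample values are illustrative only.

```javascript
// Returns the index of target in the sorted array, or -1 if it is not present.
function binarySearch(sorted, target) {
  let low = 0;
  let high = sorted.length - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (sorted[mid] === target) return mid;
    if (target < sorted[mid]) {
      high = mid - 1; // search the lower half
    } else {
      low = mid + 1; // search the upper half
    }
  }
  return -1;
}

const sortedValues = [20, 50, 100, 150];
console.log(binarySearch(sortedValues, 100)); // 2
console.log(binarySearch(sortedValues, 75)); // -1
```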
Hash table approach
• Have key-value pairs.
• Have task of finding if current key is in the table.
– Assume there is a hash function that inputs the key and outputs the hash, which corresponds to a slot in the table.
• fixed time to compute the function
• go to that spot. If empty, then store key-value there. If not empty, compare the keys, if it matches, then …. If not, check the next position, continue.
– http://en.wikibooks.org/wiki/Data_Structures/Hash_Tables
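The "check the next position" approach can be sketched as below. The table size, the hash function and all names are assumptions made for illustration. The slide leaves open what happens when the keys match; this sketch assumes the stored value is overwritten.

```javascript
// Hypothetical fixed-size table; each slot is null or a {key, value} object.
const SIZE = 8;
const slots = new Array(SIZE).fill(null);

// Illustrative hash function: fixed time per character, result is a slot number.
function hash(key) {
  let h = 0;
  for (const ch of key) {
    h = (h * 31 + ch.charCodeAt(0)) % SIZE;
  }
  return h;
}

function put(key, value) {
  let i = hash(key);
  for (let probes = 0; probes < SIZE; probes++) {
    if (slots[i] === null) {
      slots[i] = { key, value }; // empty spot: store the pair here
      return;
    }
    if (slots[i].key === key) {
      slots[i].value = value; // assumption: a matching key is overwritten
      return;
    }
    i = (i + 1) % SIZE; // not empty, different key: check the next position
  }
  throw new Error("table is full");
}

function get(key) {
  let i = hash(key);
  for (let probes = 0; probes < SIZE; probes++) {
    if (slots[i] === null) return undefined; // reached an empty spot: key is absent
    if (slots[i].key === key) return slots[i].value;
    i = (i + 1) % SIZE;
  }
  return undefined;
}

put("table", 100);
put("desk", 150);
console.log(get("desk")); // 150
console.log(get("sofa")); // undefined
```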
Associative array
• Normal arrays use indices, typically starting with 0.
• An associative array uses arbitrary values (typically strings) as indices. Consider a set of 4 products: table, desk, chair, lamp. An associative array could be used to store the prices:
table=>100, desk=>150, chair=>50, lamp=>20
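In JavaScript, one way to write this associative array is a plain object. A small sketch, with the variable name priceOf chosen only for illustration:

```javascript
// Product names act as the indices; prices are the stored values.
const priceOf = { table: 100, desk: 150, chair: 50, lamp: 20 };

console.log(priceOf["chair"]); // 50
console.log(priceOf.desk); // 150
console.log("sofa" in priceOf); // false
```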
key-value pairs
• So-called key-value pairs are a generalization of the associative array and are used in other systems.
• At its most general, there can be more than one value for a given key, and either the underlying software OR your program needs to take care of this situation.
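If the program itself has to handle several values for one key, one possible approach is to keep a list of values per key. This is a sketch under that assumption; the names pairStore and addPair are hypothetical.

```javascript
// Each key maps to an array, so a key can carry more than one value.
const pairStore = new Map();

function addPair(key, value) {
  if (!pairStore.has(key)) {
    pairStore.set(key, []);
  }
  pairStore.get(key).push(value);
}

addPair("Marx", "Groucho");
addPair("Marx", "Harpo");
addPair("Stooge", "Curly");

console.log(pairStore.get("Marx")); // [ 'Groucho', 'Harpo' ]
console.log(pairStore.get("Stooge")); // [ 'Curly' ]
```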
JSON
• http://www.json.org/
• Format (syntax) for information
– smaller than XML
– available in many languages
• name / value pairs
– create using curly brackets. Use dot notation to access and modify
• arrays
– create using square brackets. Square brackets with indices to access and modify.
Example
var course = {"name":"Topics", "teacher": "Jeanine Meyer", "days": "MR"};
course.name => "Topics"
course.teacher => "Jeanine Meyer"
course.days => "MR"
Example
var list = { "class_list": [
  {"firstname":"Groucho", "lastname": "Marx"},
  {"firstname":"Harpo", "lastname": "Marx"},
  {"firstname":"Zeppo", "lastname": "Marx"},
  {"firstname":"Curly", "lastname": "Stooge"}
]};
list.class_list[2].firstname => "Zeppo"
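Because JSON is a text format, an object is usually converted to a string before it is sent or stored, and converted back on arrival. A short sketch using the built-in JSON.stringify and JSON.parse; the changed value "TF" is made up for illustration.

```javascript
const courseInfo = { name: "Topics", teacher: "Jeanine Meyer", days: "MR" };

// Object to text.
const text = JSON.stringify(courseInfo);
console.log(text); // {"name":"Topics","teacher":"Jeanine Meyer","days":"MR"}

// Text back to a new, independent object.
const copy = JSON.parse(text);
copy.days = "TF"; // dot notation to modify; the original object is unchanged
console.log(copy.days); // TF
console.log(courseInfo.days); // MR
```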
Big Data
• buzz word more than specific product
• Data that is
– large: Volume
– changing rapidly [or application requires up-to-date values]: Velocity
– in different formats: Variety
• PLUS not necessarily all owned by the organization attempting to use it.
– in this case, can only query, no changes/updates, deletions or additions
Note
• A company / organization can store data in its own CLOUD (on servers) or cloud service offered by a vendor and still have total control.
– Could even be relational database
– Very large databases may be just key-value pairs
Cloud
… can refer to one, some or all of the following
• where the programs are
• where the data is
• where the processors (aka computers) are for doing the calculations
REST
• Representational State Transfer
– a "standard" / framework / style of communicating with Web services
– typically, get information in the form of XML or JSON or something else
• Posting opportunity: find a specific service that provides REST connections….
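A REST request that asks for JSON might look like the sketch below. The URL is a placeholder, not a real service, and the function name getJson is hypothetical. It assumes an environment with a global fetch (current browsers, Node 18 or later); a different fetch-like function can be passed in instead.

```javascript
// Placeholder address: replace with a real service's URL.
const COURSE_URL = "https://api.example.com/courses/1";

// Sends a GET request and returns the response body parsed as JSON.
async function getJson(url, fetchFn = fetch) {
  const response = await fetchFn(url, { headers: { Accept: "application/json" } });
  if (!response.ok) {
    throw new Error("Request failed with status " + response.status);
  }
  return response.json();
}
```

A call would look like getJson(COURSE_URL).then(data => console.log(data)); nothing is sent until the function is called.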
Parallel processing / distributed processing
• Large amounts (volumes) of data
• Multiple processors
• How to speed up accomplishment of tasks?
– Embarrassingly parallel refers to tasks that are easy to parallelize
• Take a list of numbers (say, prices) and increase each by 10%
• ?
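The price example is embarrassingly parallel because no price depends on any other. The sketch below only shows how the work could be divided: the chunks are processed one after another here, but each could be handed to a separate processor. Function names and the sample prices are illustrative.

```javascript
// Split the list into roughly equal chunks, one per (hypothetical) processor.
function splitIntoChunks(items, chunkCount) {
  const size = Math.ceil(items.length / chunkCount);
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// The work done on one chunk; it needs nothing from the other chunks.
function raiseByTenPercent(chunk) {
  return chunk.map((price) => (price * 110) / 100);
}

const itemPrices = [100, 150, 50, 20];
const raised = splitIntoChunks(itemPrices, 2).map(raiseByTenPercent).flat();
console.log(raised); // [ 110, 165, 55, 22 ]
```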
What about
• Tasks in which some parts can be done in parallel, but some cannot
• How to devise ways to take advantage of multiple processors
Parallel exercise
• Divide into groups of 5
• Each take a deck of cards
• Shuffle
• Devise plan to sort into order
– suits hearts, spades, diamonds, clubs
– each suit A, 2, …. J, Q, K
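One possible plan, offered only as a starting point and not as the intended answer: deal the deck into one pile per suit, let a different person sort each pile, then stack the piles in suit order. The sketch below models that plan; the card representation and function name are assumptions.

```javascript
const SUITS = ["hearts", "spades", "diamonds", "clubs"];
const RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"];

// A card is assumed to be an object such as { suit: "hearts", rank: "A" }.
function sortDeck(deck) {
  // Step 1: deal into one pile per suit.
  const piles = SUITS.map((suit) => deck.filter((card) => card.suit === suit));
  // Step 2: sort each pile by rank; the piles are independent, so each
  // could be handled by a different person at the same time.
  const sortedPiles = piles.map((pile) =>
    [...pile].sort((a, b) => RANKS.indexOf(a.rank) - RANKS.indexOf(b.rank))
  );
  // Step 3: stack the piles in suit order.
  return sortedPiles.flat();
}
```

Steps 1 and 3 are still done by one person in this plan, which is where an improved plan might look for savings.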
Hadoop
• open source utilities for distributed computing
• http://hadoop.apache.org/
• Includes MapReduce
MapReduce
A MapReduce job
• map sets up tasks to be done in parallel
• reduce combines the results
– may be local combine step and then a reduce across all output steps
• Requires a file system
• Data is in key/value pairs
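The map and reduce steps can be imitated in a single program to show the flow of key/value pairs. This is not Hadoop's API, and it runs on one processor; it is a sketch of the idea using word counting, with all function names and input lines made up for illustration.

```javascript
// Map: turn one line of input into [word, 1] pairs. Each line is independent,
// so in a real job the lines could be mapped in parallel.
function mapLine(line) {
  return line
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0)
    .map((word) => [word, 1]);
}

// Group: collect all values that share a key.
function groupByKey(pairs) {
  const groups = new Map();
  for (const [key, value] of pairs) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  return groups;
}

// Reduce: combine the values for one key into a single result.
function reduceCounts(key, values) {
  return [key, values.reduce((sum, v) => sum + v, 0)];
}

const lines = ["big data big ideas", "data in key value pairs"];
const mapped = lines.flatMap(mapLine);
const counts = new Map(
  [...groupByKey(mapped)].map(([key, values]) => reduceCounts(key, values))
);
console.log(counts.get("big")); // 2
console.log(counts.get("data")); // 2
```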
Applications
• What are applications that use multiple processors for a [big] gain in speed?
Homework
• Come up with improved parallel sorting
• Postings: more on Hadoop, MapReduce, Big Data, etc.