empirejs: hacking art with node js and image analysis

Analyzing Japanese Art with Node.js and Computer Vision John Resig

Upload: jeresig

Post on 08-Sep-2014


DESCRIPTION

Talk at EmpireJS, May 6th, 2014.

TRANSCRIPT

Page 1: EmpireJS: Hacking Art with Node js and Image Analysis

Analyzing Japanese Art with Node.js and Computer Vision John Resig

Page 17: EmpireJS: Hacking Art with Node js and Image Analysis

Lot 55: 20 Japanese Woodblock Prints

Each depicting a female/Geisha figure with calligraphy throughout each print. Prints measure 13.75" H x 9.375" W. Toning to each print, some losses around edges.

Estimated Price: $400 - $600

Page 18: EmpireJS: Hacking Art with Node js and Image Analysis

Step 1: Acquire and read tons of expensive books.

Page 19: EmpireJS: Hacking Art with Node js and Image Analysis

Step 2: Learn to read Japanese. *

* Japanese from the 17th to 19th century.

You’re not going to learn this from Rosetta Stone.

Page 20: EmpireJS: Hacking Art with Node js and Image Analysis

Step 3: Learn to read Japanese calligraphy.

Page 21: EmpireJS: Hacking Art with Node js and Image Analysis

Solution: A fast-loading, responsive, i18ned web site: Ukiyo-e.org

Page 26: EmpireJS: Hacking Art with Node js and Image Analysis

https://github.com/jeresig/i18n-node-2

var greeting = i18n.__('Hello %s, how are you today?', 'Marcus');

i18n.__n('%s cat', '%s cats', 3);

Node i18n 2 (npm install i18n-2)

setLocaleFromSubdomain([request])
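The `setLocaleFromSubdomain([request])` call named on this slide suggests choosing a locale from the first label of the request's host name. The sketch below shows that idea only; `localeFromHost`, the host names, and the fallback rule are hypothetical and are not taken from i18n-2 itself.

```javascript
// Hypothetical helper, not part of i18n-2: pick a locale from the first
// label of a Host header, falling back to a default when it is unknown.
function localeFromHost(host, supported, fallback) {
    // Drop an optional ":port" suffix and normalize case.
    const hostname = String(host || "").split(":")[0].toLowerCase();
    const firstLabel = hostname.split(".")[0];
    return supported.includes(firstLabel) ? firstLabel : fallback;
}

// Assumed example hosts; the real site's host names are not shown in the slides.
const supported = ["en", "ja"];
console.log(localeFromHost("ja.example.org", supported, "en"));
console.log(localeFromHost("www.example.org:8080", supported, "en"));
```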

Page 27: EmpireJS: Hacking Art with Node js and Image Analysis

https://github.com/jeresig/i18n-node-2

{
  "Hello": "Hello",
  "Hello %s, how are you today?": "Hello %s, how are you today?",
  "weekend": "weekend",
  "Hello %s, how are you today? How was your %s.": "Hello %s, how are you today? How was your %s.",
  "Hi": "Hi",
  "Howdy": "Howdy",
  "%s cat": {
    "one": "%s cat",
    "other": "%s cats"
  },
  "There is one monkey in the %%s": {
    "one": "There is one monkey in the %%s",
    "other": "There are %d monkeys in the %%s"
  },
  "tree": "tree"
}

Node i18n 2 (npm install i18n-2)
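As a rough illustration of how a locale file shaped like this one can drive singular and plural lookups, here is a minimal, self-contained sketch. It does not use or reproduce i18n-2; `translate`, `translatePlural`, and `format` are hypothetical names, and the plural rule (exactly 1 means "one") is an English-only assumption.

```javascript
// Toy catalog in the same shape as the locale file on this slide.
const catalog = {
    "Hello %s, how are you today?": "Hello %s, how are you today?",
    "%s cat": { "one": "%s cat", "other": "%s cats" }
};

// Replace each %s or %d with the next argument (a tiny sprintf stand-in).
function format(template, args) {
    let i = 0;
    return template.replace(/%[sd]/g, () => String(args[i++]));
}

// Singular lookup: fall back to the key itself when no string entry exists.
function translate(key, ...args) {
    const entry = catalog[key];
    return format(typeof entry === "string" ? entry : key, args);
}

// Plural lookup: choose the "one" or "other" form by count.
function translatePlural(singular, plural, count) {
    const entry = catalog[singular];
    const forms = entry && typeof entry === "object"
        ? entry
        : { one: singular, other: plural };
    return format(count === 1 ? forms.one : forms.other, [count]);
}

console.log(translate("Hello %s, how are you today?", "Marcus"));
console.log(translatePlural("%s cat", "%s cats", 3));
```

Falling back to the key when an entry is missing keeps untranslated strings readable, which is why the English catalog can map each key to itself.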

Page 28: EmpireJS: Hacking Art with Node js and Image Analysis

[Architecture diagram. Labels: Digital Ocean (shown twice); Amazon S3; Amazon Cloudfront; Images; Data (HTML, XML, JSON); JS, CSS; nginx (w/ cache); node.js express (shown twice); naught; mongodb; ElasticSearch; Scraper]

Page 30: EmpireJS: Hacking Art with Node js and Image Analysis

https://github.com/jeresig/jquery-imgscrubber

Page 31: EmpireJS: Hacking Art with Node js and Image Analysis

Collecting Tons of Woodblock Print Data

Page 32: EmpireJS: Hacking Art with Node js and Image Analysis

Queue-based Crawling using PhantomJS

[Diagram labels: two rows of Search, Page (x3), and HTML / Image (x3); Processing Queue]
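The queue-based crawl named on this slide can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the talk's scraper: the in-memory `site` object and `fetchPage` are hypothetical stand-ins for real pages and for the PhantomJS fetch, and URLs are processed one at a time from a first-in, first-out queue.

```javascript
// Hypothetical in-memory "site": each URL maps to the links found on it.
const site = {
    "/search": { type: "search", links: ["/print/1", "/print/2"] },
    "/print/1": { type: "page", links: ["/img/1.jpg"] },
    "/print/2": { type: "page", links: ["/img/2.jpg"] },
    "/img/1.jpg": { type: "image", links: [] },
    "/img/2.jpg": { type: "image", links: [] }
};

// Stand-in for a real fetch; resolves asynchronously with the page record.
function fetchPage(url) {
    return Promise.resolve(site[url]);
}

// Take URLs from the front of the queue, record them, and enqueue any
// links that have not been seen before.
async function crawl(start) {
    const queue = [start];
    const seen = new Set(queue);
    const saved = [];
    while (queue.length > 0) {
        const url = queue.shift();
        const page = await fetchPage(url);
        if (!page) continue; // unknown URL: nothing to save
        saved.push({ url, type: page.type });
        for (const link of page.links) {
            if (!seen.has(link)) {
                seen.add(link);
                queue.push(link);
            }
        }
    }
    return saved;
}

crawl("/search").then((saved) => console.log(saved.map((s) => s.url)));
```

The `seen` set is what keeps a page linked from several search results from being fetched more than once.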

Page 33: EmpireJS: Hacking Art with Node js and Image Analysis

[Pipeline diagram. Labels: Some Website; WebKit; PhantomJS; CasperJS; SpookyJS; Save Data; XML Files; Mongo Log; Extract Data; libxml (+ xpath); MongoDB; Process Data; Artists; Images; Correct Artist and Date; Add to Site!]

Page 34: EmpireJS: Hacking Art with Node js and Image Analysis

module.exports = function() {
    return {
        scrape: [
            {
                start: "http://ukiyo-e.org/search",
                visit: "//a[@class='img']",
                next: "//a[contains(@rel,'next')]"
            },
            {
                extract: {
                    "title": "//p[contains(@class, 'title')]//span",
                    "dateCreated": "//p[contains(@class, 'date')]//span",
                    "artists[]": "//p[contains(@class, 'artist')]//a",
                    "images[]": "//div[contains(@class,'imageholder')]//a/@href"
                }
            }
        ]
    };
};

Page 35: EmpireJS: Hacking Art with Node js and Image Analysis

        "surname" : "Hashimoto",
        "surname_kana" : "はしもと",
        "name" : "Hashimoto Okiie",
        "ascii" : "Hashimoto Okiie",
        "plain" : "Hashimoto Okiie",
        "kana" : "はしもとおきいえ",
        "_id" : ObjectId("530c0825d9a80976b2000437")
    }
],
"names" : [
    {
        "original" : "Hashimoto Okiie (橋本興家)",
        "locale" : "ja",
        "kanji" : "橋本興家",
        "given" : "Okiie",
        "given_kana" : "おきいえ",
        "surname" : "Hashimoto",
        "surname_kana" : "はしもと",
        "given_kanji" : "興家",
        "surname_kanji" : "橋本",
        "name" : "Hashimoto Okiie",
        "ascii" : "Hashimoto Okiie",
        "plain" : "Hashimoto Okiie",
        "kana" : "はしもとおきいえ",
        "_id" : ObjectId("530c0825d9a80976b2000439")
    }
],
"extract" : [ "53dfc997cbf9fa7501d78e4820b24a9c" ],
"created" : ISODate("2014-02-25T03:04:05Z"),
"__v" : 0
}

Page 36: EmpireJS: Hacking Art with Node js and Image Analysis

“Stack Scraper”

https://github.com/jeresig/stack-scraper

https://github.com/jeresig/ukiyoe-scrapers

Page 37: EmpireJS: Hacking Art with Node js and Image Analysis

Image Similarity

Page 38: EmpireJS: Hacking Art with Node js and Image Analysis

https://github.com/jeresig/node-matchengine

Page 47: EmpireJS: Hacking Art with Node js and Image Analysis

Image Similarity Search

Page 54: EmpireJS: Hacking Art with Node js and Image Analysis

Idyll: Offline Image Cropping

• https://github.com/jeresig/idyll

• Crop images offline and on a mobile device.

• Saves the selections back to a server.

• Data is synced and saved using HTML 5 appcache.

• https://github.com/jeresig/node-appcache-glob

Page 55: EmpireJS: Hacking Art with Node js and Image Analysis

by David Chester at Shutterstock

https://github.com/dchester/perl-image-crop-calibration-target

Page 56: EmpireJS: Hacking Art with Node js and Image Analysis

http://www.ersatzlabs.com/

Page 57: EmpireJS: Hacking Art with Node js and Image Analysis

Aiding Woodblock Print Studies with Image Analysis

Page 68: EmpireJS: Hacking Art with Node js and Image Analysis

Correcting Print Data

Page 69: EmpireJS: Hacking Art with Node js and Image Analysis

Japanese Names

• Utagawa Hiroshige

• Ando Hiroshige

• Andō Hiroshige

• Hiroshige

• 歌川広重

• 広重

Page 70: EmpireJS: Hacking Art with Node js and Image Analysis

安土 安堂 安島 安東 安籐 安藤 安道 安達 阿藤

Andō

Page 71: EmpireJS: Hacking Art with Node js and Image Analysis

安藤

andō antō anzō

yasuzuka

A many-to-many mapping!
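One way to hold such a many-to-many mapping is a pair of indexes, one per direction. The sketch below is illustrative only: it uses just the kanji and readings shown on these two slides, and the `addTo` helper and variable names are hypothetical.

```javascript
// [kanji, reading] pairs taken from the two slides above.
const pairs = [
    ["安土", "andō"], ["安堂", "andō"], ["安島", "andō"],
    ["安東", "andō"], ["安籐", "andō"], ["安藤", "andō"],
    ["安道", "andō"], ["安達", "andō"], ["阿藤", "andō"],
    ["安藤", "antō"], ["安藤", "anzō"], ["安藤", "yasuzuka"]
];

// Add a value to the set stored under a key, creating the set if needed.
function addTo(index, key, value) {
    if (!index.has(key)) index.set(key, new Set());
    index.get(key).add(value);
}

// Build one index per direction so either side can be looked up.
const readingsByKanji = new Map();
const kanjiByReading = new Map();
for (const [kanji, reading] of pairs) {
    addTo(readingsByKanji, kanji, reading);
    addTo(kanjiByReading, reading, kanji);
}

console.log([...readingsByKanji.get("安藤")]); // every reading listed for 安藤
console.log(kanjiByReading.get("andō").size); // number of spellings listed for andō
```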

Page 72: EmpireJS: Hacking Art with Node js and Image Analysis

Sharaku Toshusai

東洲斎写楽

Page 73: EmpireJS: Hacking Art with Node js and Image Analysis

Sharaku Toshusai

東洲斎写楽

Is this the family name?

Where are the stress marks?

How do you “split” this name?

Which name parts correlate?

Page 74: EmpireJS: Hacking Art with Node js and Image Analysis

Tools (all are Node modules!)

• https://github.com/lovell/hepburn

• https://github.com/jeresig/node-enamdict

• https://github.com/jeresig/node-ndlna

• https://github.com/jeresig/node-romaji-name

[Diagram labels: ndlna, hepburn, enamdict, romaji-name]

Page 75: EmpireJS: Hacking Art with Node js and Image Analysis

Hepburn

• https://github.com/lovell/hepburn

• Takes in the English form of a Japanese word.

• Returns it written in Hiragana or Katakana (phonetic Japanese alphabets).


Utagawa Hiroshige → うたがわひろしげ
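To make the conversion on this slide concrete, here is a toy greedy longest-match converter from romanized text to Hiragana. It is not the hepburn module and does not use its API; the syllable table covers only the syllables in "Utagawa Hiroshige", and `toyToHiragana` is a hypothetical name.

```javascript
// Tiny syllable table: only what "Utagawa Hiroshige" needs.
const syllables = new Map([
    ["u", "う"], ["ta", "た"], ["ga", "が"], ["wa", "わ"],
    ["hi", "ひ"], ["ro", "ろ"], ["shi", "し"], ["ge", "げ"]
]);

function toyToHiragana(romaji) {
    // Lowercase and drop spaces or anything else outside a-z.
    const text = romaji.toLowerCase().replace(/[^a-z]/g, "");
    let out = "";
    let i = 0;
    while (i < text.length) {
        let matched = false;
        // Try the longest candidate first (3 letters, then 2, then 1).
        for (let len = 3; len >= 1; len--) {
            const chunk = text.slice(i, i + len);
            if (syllables.has(chunk)) {
                out += syllables.get(chunk);
                i += chunk.length;
                matched = true;
                break;
            }
        }
        if (!matched) throw new Error("No kana for: " + text.slice(i));
    }
    return out;
}

console.log(toyToHiragana("Utagawa Hiroshige"));
```

Trying three-letter chunks before shorter ones is what lets "shi" match as one syllable instead of failing on "s".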

Page 76: EmpireJS: Hacking Art with Node js and Image Analysis

Enamdict

• https://github.com/jeresig/node-enamdict

• Downloads and queries the ENAMDICT database

• (A mapping of Japanese proper names to Hiragana and English.)

• Used to correct typos and figure out surname/given name.


Page 77: EmpireJS: Hacking Art with Node js and Image Analysis

NDLNA

• https://github.com/jeresig/node-ndlna

• Queries the NDLNA database

• Finds the correct Kanji for an English name.

• Or the correct English for a Kanji name.



Page 79: EmpireJS: Hacking Art with Node js and Image Analysis

{
    "original" : "Sharaku Toshusai (東洲斎写楽 )",
    "locale" : "ja",
    "kanji" : "東洲斎写楽",
    "given" : "Sharaku",
    "given_kana" : "しゃらく",
    "surname" : "Tōshūsai",
    "surname_kana" : "とおしゅうさい",
    "surname_kanji" : "東洲斎",
    "given_kanji" : "写楽",
    "name" : "Tōshūsai Sharaku",
    "ascii" : "Tooshuusai Sharaku",
    "plain" : "Toshusai Sharaku",
    "kana" : "とおしゅうさいしゃらく"
}
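The "name", "ascii", and "plain" fields in this record differ only in how long vowels are written. A minimal sketch of deriving the last two from the macron form follows; it is not romaji-name's implementation, `toPlain` and `toAscii` are hypothetical names, and doubling the vowel for "ascii" is an assumption read off this one record.

```javascript
// "plain": drop the macron and keep the base vowel (ō becomes o).
function toPlain(name) {
    return name
        .normalize("NFD") // split ō into "o" + combining macron (U+0304)
        .replace(/\u0304/g, "")
        .normalize("NFC");
}

// "ascii": write the long vowel as a doubled vowel (ō becomes oo),
// matching the "ascii" field in the record above.
function toAscii(name) {
    return name
        .normalize("NFD")
        .replace(/([aeiouAEIOU])\u0304/g, (match, vowel) => vowel + vowel.toLowerCase())
        .normalize("NFC");
}

console.log(toPlain("Tōshūsai Sharaku"));
console.log(toAscii("Tōshūsai Sharaku"));
```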

Page 80: EmpireJS: Hacking Art with Node js and Image Analysis

Dates

• https://github.com/jeresig/node-yearrange

var yr = require("yearrange");

yr.parse("1877")
// {"start": 1877, "end": 1877}

yr.parse("1847-48")
// {"start": 1847, "end": 1848}

yr.parse("ca. 1810-20s")
// {"start": 1810, "end": 1829, "circa": true}

yr.parse("18th–19th century")
// {"start": 1700, "end": 1899}

yr.parse("Meiji era")
// {"start": 1868, "end": 1912}
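To show the kind of parsing involved, here is a toy parser for the two simplest inputs above: a single year and a range with an abbreviated end year. It is not the yearrange module; `parseYearRange` is a hypothetical name, and the sketch deliberately returns null for "ca.", decade, century, and era inputs.

```javascript
// Toy parser: handles "1877" and "1847-48" style inputs only.
function parseYearRange(text) {
    const m = /^(\d{4})(?:\s*[-–]\s*(\d{2,4}))?$/.exec(text.trim());
    if (!m) return null; // anything else is out of scope for this sketch
    const start = Number(m[1]);
    if (!m[2]) return { start: start, end: start };
    // A short end such as "48" borrows the leading digits of the start year.
    const end = Number(m[1].slice(0, 4 - m[2].length) + m[2]);
    return { start: start, end: end };
}

console.log(parseYearRange("1877"));
console.log(parseYearRange("1847-48"));
console.log(parseYearRange("Meiji era"));
```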

Page 81: EmpireJS: Hacking Art with Node js and Image Analysis

Artist Rectification

Page 84: EmpireJS: Hacking Art with Node js and Image Analysis

Miyagawa Shuntei

Printed in 1897

Sold for: $550

Prints sell for $100-$400 individually

True Estimate: $2100 - $8400 *

* You just have to find someone willing to buy them!

Page 86: EmpireJS: Hacking Art with Node js and Image Analysis

• http://ejohn.org/research/

• http://ukiyo-e.org/

• https://github.com/jeresig