open library at make books apparent

http://flic.kr/p/4Pg28f

Hello.

My name is George Oates, and I’m leading the Open Library project.

http://www.lib.cam.ac.uk/exhibitions/Fantasy_to_Federation/Bellin1753.jpg


http://flic.kr/p/6iCLgP

Joined about 6 months ago. Redesigning everything, and I thought I’d tell you a little bit about that.



First Steps

• Listen to people: Answer help emails

• Meet team in person: Met in San Francisco in June

• Streamline deploys: 1 button!

• Redraw sitemap: Refocus on core

• Dream a little: Ask silly questions, assess competition

Get acclimatized

http://flic.kr/p/6xCJQS

Understand relationships

So, what have we got, and how does it all inter-relate?

Any relationship can be made into a hyperlink.



twitter.com/openlibrary

Reach into the network

- We’ve also arranged a little Flickr integration, so if people take photos of books, they can link them to Open Library records. We’re not using them yet.

- As you may have noticed last night, we also added a link from Internet Archive book pages into Open Library. We reckon that’s almost doubled our modest traffic. (About 250k unique IPs per day)

Challenges

• Dense library metadata

• Designed for classic institutional search/retrieve practice

• Data is “dry”, sometimes poor quality

• No insight into the community

• Distributed team: US, India, UK

- So one thing I began doing was reading and answering enquiries that come to info@openlibrary (this is a good thing to have new people do for a while).

- Found that some questions repeated themselves and there was a key mismatch in understanding what Open Library was about, e.g. people would write in asking us to correct errors, not knowing they were able to do it themselves.

There are 4 Agatha Christies in this list, 2 of which appear to the eye to be identical. Computers have trouble recognising that these Authors are the same woman. It’s easy for humans to do. How could we build a UI to help people help us to merge these duplicates?
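One way to feed such a UI is to let the computer do the cheap part (propose candidates) and leave the decision to a person. The sketch below is only an illustration, not how Open Library does it: `name_key` and `candidate_duplicates` are hypothetical helpers, the normalization rule is an assumption, and the sample names are made up.

```python
import re
import unicodedata
from collections import defaultdict


def name_key(name):
    """Reduce an author name to a rough comparison key (assumed rule)."""
    # Strip accents and lowercase the name.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch)).lower()
    # Drop digits and punctuation (commas, life dates and so on).
    text = re.sub(r"[^a-z\s]", " ", text)
    # Sort the words so "Christie, Agatha" and "Agatha Christie" agree.
    return " ".join(sorted(text.split()))


def candidate_duplicates(names):
    """Group names that share a key; a human still confirms each merge."""
    groups = defaultdict(list)
    for name in names:
        groups[name_key(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample list for illustration only.
authors = [
    "Agatha Christie",
    "Christie, Agatha",
    "Agatha Christie, 1890-1976",
    "Karen Coyle",
]
for group in candidate_duplicates(authors):
    print("Possibly the same author:", group)
```

The output is only a list of suggestions. The merge itself would still be confirmed by a person, which is the part humans find easy.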

What have we got?

• Loads of data: 23 million records

• Small user base: < 20,000

• Small team: 6 people

• Small architecture: 12 servers

• Good framework: infogami, web.py

Certainly there are challenges to trying to make use of a large but shallow dataset, but Open Library has lots of advantages in terms of a small team & system being able to change rapidly. This flexibility will hopefully help us.

Began experimenting with the data we have to try to see the catalog “landscape”. What do we already have that we’re not showing to people yet? Look at all these subjects! These timeframes! How can we make use of them?

Look at all these new links! ISBN -> Publisher names -> Show me all the books this publisher has published... Show me all the subjects related to cheese... Add links and hey presto! You’re bouncing around the catalog.

What if?

• Adjacent books

• Not efficiency, but effectiveness (conversation broker, records improve over time) - Shirky

• Not a purchasing engine, but a library

As an exercise, it’s fun to ask what might happen if there were no search box on Open Library? Could you still use it?

Changing the look of the logo will hopefully encourage people to come inside and look around. Break the conventional “library look” and try to warm it up a little... We are literally open: the software is open, and all of Open Library’s records are editable by anyone.

Add a Book?

So, let’s take a look at one of the key UIs on Open Library - how to add a new record. This is the current form. Basically just a web UI to a pretty dense, librarian-centric form. A lot of the fields are difficult for non-librarians to complete - a definite barrier to entry for both adding new records and editing existing things.

The idea is to break it into two steps. This is step 1.

The most important thing to do is to make it feel easy to add a record. This first step also gathers enough info to allow us to do a decent search for any existing records. If we find a match, we can direct people towards the Edit view of that record. If there’s no match, we move on...
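The step 1 flow described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the catalog is a plain list of dictionaries, matching is an exact case-insensitive comparison, and the function names and record keys are hypothetical. A real search would be much fuzzier.

```python
def find_existing(title, author, catalog):
    """Return the key of the first record matching title and author, else None."""
    wanted = (title.strip().lower(), author.strip().lower())
    for record in catalog:
        if (record["title"].lower(), record["author"].lower()) == wanted:
            return record["key"]
    return None


def next_step(title, author, catalog):
    """Decide where to send someone after step 1 of the add-a-book form."""
    key = find_existing(title, author, catalog)
    if key is not None:
        # A match: point the person at the Edit view of that record.
        return ("edit", key)
    # No match: move on to the detailed form (step 2).
    return ("add_details", None)


# Made-up catalog entries; the keys are placeholders, not real records.
catalog = [
    {"key": "record-1", "title": "Marie Antoinette", "author": "Example Author"},
]
print(next_step("marie antoinette ", "Example Author", catalog))
print(next_step("A Brand New Book", "Someone Else", catalog))
```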

Step 2 is a massive form. There’s no way to hide that basically. All the fields are potentially useful. What we can do is organize the info a little, so related things (the physical object, pagination) are grouped together. We’re also going to try adding a tabbed view to try to soften the blow a little. Also, hopefully, adopting a conversational tone with the form labels might help direct people a little more about the sort of data we want.

It would be awesome if we could start to collect excerpts from books. A personal touch from people about particular bits they’ve enjoyed and why. Also, these excerpts could be indexed to help boost books in our search.

Links, links, links.... This “networked catalog” is all about how many things we can connect books to. This is the principle of metadata giving records a sort of “surface tension” to keep them from sinking into the depths.

Those first 3 tabs (About, Excerpts, Links) are about the Work level of our records. We’re going to try this first version not worrying about exposing this slightly weird metadata-y thing called Work to visitors, but still attempt to collect data at the Work-y level. There’s a specific tab just for Editions too, which contains fields mainly about publishing info and the physical (or virtual!) object itself.

Another experiment we’re looking forward to trying is about identifiers. We’re not particularly concerned about canonical identifiers. Perhaps it’s a waste of time to wait for one, so instead, we’re going to try and attach as many ID types to our records as we can. (This list is just a braindump - not active yet.) The idea is that people could add a URL or actual identifier and Open Library would just do the right thing. A suggestion (after this presentation was delivered) was that people could ping Open Library with an identifier, not even knowing what TYPE of ID it is. Perhaps Open Library could help “triangulate” this query towards a book record. “Record laundering.”
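As a rough sketch of what "not even knowing what TYPE of ID it is" could mean in practice, the snippet below guesses a type from the shape of the input. It is an illustration only: `guess_id_type` and its return labels are hypothetical, and it covers just URLs and ISBNs (using the standard ISBN-10 and ISBN-13 check-digit rules), not the full braindump list.

```python
import re


def is_isbn10(text):
    """ISBN-10 check: weighted sum (weights 10 down to 1) divisible by 11."""
    if not re.fullmatch(r"\d{9}[\dX]", text):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch))
                for i, ch in enumerate(text))
    return total % 11 == 0


def is_isbn13(text):
    """ISBN-13 check: alternating weights 1 and 3, sum divisible by 10."""
    if not re.fullmatch(r"\d{13}", text):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                for i, ch in enumerate(text))
    return total % 10 == 0


def guess_id_type(raw):
    """Guess what kind of identifier someone has pasted in (assumed labels)."""
    text = raw.strip()
    if re.match(r"https?://", text, re.IGNORECASE):
        return "url"
    # Ignore spaces and hyphens, which people often type inside ISBNs.
    compact = re.sub(r"[\s-]", "", text).upper()
    if is_isbn13(compact):
        return "isbn_13"
    if is_isbn10(compact):
        return "isbn_10"
    return "unknown"


print(guess_id_type("0-306-40615-2"))
print(guess_id_type("978-0-306-40615-7"))
print(guess_id_type("http://flic.kr/p/4Pg28f"))
```

Anything that comes back as "unknown" could still be tried against every ID type the catalog holds, which is closer to the "triangulate" idea.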

Key Features

• History

• Activity, life, cause, effect

• Notifications thereof

• List(s)

• More small, ad hoc collections

• Public / private

• Exportable (ad hoc catalogs)

- Planning two features that play off the strengths of the underlying Wiki: History & Lists

- AD HOC (so, BookServer feeds should be expected to be ad hoc. No point in trying to agree on a hierarchy etc for feeds. Waste of time.)

We’re excited about how we might improve the display and linkage from history of our records. They are another source of connections into and around the catalog, so we should “activate” them where we can to connect to people, subjects, publishers, even dates. “See everything that happened on Open Library on May the 4th, 2009.” Version 1 probably won’t be quite this robust :)

Tension? http://flic.kr/p/6zyU3U



http://loc.gov

- I’m not sure how much we’re going to be able to assist the Library of Congress

http://www.flickr.com/photos/baboon/405064021/


Small Collections

http://flic.kr/p/34WGhL

• Catalogues to & from book lovers who may or may not be professional librarians

• Effective & Personal; Inefficient & Charming, Detailed

• Looking to integrate cool cataloging services like Koha, Delicious Monster - Anyone??

• It was only last night I met a woman who is cataloguing a business’s library of some 1,100 books. She had said she was looking on Open Library for a way to upload a CSV file to us. We should do that, and note it on each edition’s history. (*Note: Design that CSV and get it online!)



History http://flic.kr/p/6NHecm

- there was some talk about timestamps yesterday. Being able to slice things by time will only increase in importance as the web gets older, so, I’d suggest putting timestamps on anything you can think of.



http://flic.kr/p/4itJcB

Substrate: any surface on which a plant or animal lives or on which a material sticks



http://flic.kr/p/4itJcB

What if we position library records like that?



“Build it so anyone can contribute any amount.”

Clay Shirky

http://flic.kr/p/v5uNz

The act of adding a book to a library catalog is a bit like playing Tetris.



http://flic.kr/p/6pmtQL

But, librarians are (very clever) humans too. And everyone who’s responsible for putting books into a traditional catalogue must work within patterns. Patterns that have grown semantically remarkable and deeply complex.



http://flic.kr/p/6pmtQL

"But here’s a question for you, let’s say you have an 856 URL to full text for a serial. And you know what date ranges it covers. What sub-field would you put that in? $3 or $z? I see it in both."

Jonathan Rochkind, Bibliographic Wilderness

I’m glad I don’t have to either ask or answer this question.



“Library metadata is diabolically rational.”

Karen Coyle, kcoyle.net

http://www.lib.cam.ac.uk/exhibitions/Fantasy_to_Federation/Blaeu.jpg

Hic sunt dracones.

A detail from a map of the East Indies showing, outlined in pink, the first European discoveries along the Cape York Peninsula. Early in 1606, towards the northern tip of the peninsula, Willem Jansz made here what was almost certainly the first landing by Europeans in Australia. This map first appeared in 1635 and was reprinted unchanged until 1664.



http://www.lib.cam.ac.uk/exhibitions/Fantasy_to_Federation/Blaeu.jpg

Here be dragons.





This is one of the few maps in the eighteenth century devoted entirely to Australia. Jacques Bellin was hydrographer to the French King Louis XV. He has added a hypothetical coast line joining Australia, New Guinea and Tasmania - a note says that this is included without proof. It is further suggested that New Zealand might be part of the great southern continent.



I wonder if librarians are trying to make catalogs look like this... Highly “accurate”; deeply organized; the perfect information system...

http://flic.kr/p/38TZ

What if a catalog looks like this? Is crystalline?

From the artist of this image, Jared Tarbell: “Lines like crystals form at perpendicular angles to existing lines. A complex form emerges. 1000 classic computational substrate, color palette stolen from Jackson Pollock: A simple perpendicular growth rule creates intricate city-like structures. The simple rule, the complex results, the enormous potential for modification; this has got to be one of my all time favorite self-discovered algorithms. Lines likes crystals grow on a computational substrate.”



http://flickr.com/photos/tupwanders/3356077817/

Deconstruction

I’ve learned a wee bit about the history of library metadata... And museum metadata for that matter.... It seems like the 1960s are a bit of a blight for human understanding, since that’s the time when we got all excited about computers and their processing power, and seemingly overwrote a lot of the crafty, poetic description and allusion that was done to describe cultural works, in favour of the Tetris approach.

What happens if you blow it up?



600 13 $a Marie Antoinette $c Queen, Consort of Louis XVI, King of France $d 1755-1793

650  2 $a Queens $z France $v Biography
     1 $a Queens $z France $x Biography

651  2 $a France $x History $y Louis XVI, 1774-1793
     1 $a France $x History $y Revolution, 1789-1799
     1 $a France $x Queens $x Biography

- I don’t want Open Library to jettison librarianship, or neglect to acknowledge the brilliant hard work of librarians over the years...

- You could argue that this sort of computer-y librarianship (or any type of “educated classification”) was (perhaps unintentionally) designed to obscure the personal... the practical... the human

- How might we adapt or extend (or revert?) this librarians’ work to appeal to a broader audience?

- Let’s see what happens when you explode Library of Congress Subject Headings. This data isn’t even in Open Library - we borrowed it from loc.gov then pulled out the dynamite...

600 (people)
     13 $a Marie Antoinette $c Queen, Consort of Louis XVI, King of France $d 1755-1793

650 (subjects)
     2 $a Queens $z France $v Biography
     1 $a Queens $z France $x Biography

651 (places)
     2 $a France $x History $y Louis XVI, 1774-1793
     1 $a France $x History $y Revolution, 1789-1799
     1 $a France $x Queens $x Biography

These numbers are subsections of a thing called a MARC record - MAchine-Readable Cataloging.

Since librarianship is “diabolically rational” of course, everything is in its place, whether it’s a reference to a person, a place, a thing, an author or, whatever...

(people) Marie Antoinette, Louis XVI

(subjects) Queens, France, Biography

(places) France, History, Louis XVI, 1774-1793, Revolution, 1789-1799, Queens, Biography

So, if we get rid of all that machine readable gumpf, we start to have things that humans can parse as well...

Marie Antoinette, Louis XVI, Queens, France, Biography, History, 1774-1793, Revolution, 1789-1799


Then, make them into links, but retain their interconnection.
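A minimal sketch of the "explosion", assuming the headings arrive as plain strings in the "$a value $z value" shape shown on the earlier slides. The function names are hypothetical, and this is not a MARC parser. It keeps each subfield value whole, so "Revolution, 1789-1799" stays one term rather than being split at the comma as on the slide.

```python
import re


def explode_heading(heading):
    """Split a '$a Queens $z France' style string into (code, value) pairs."""
    pairs = re.findall(r"\$(\w)\s+([^$]+)", heading)
    return [(code, value.strip()) for code, value in pairs]


def unique_terms(headings):
    """Collect each distinct value once, in order of first appearance."""
    seen = []
    for heading in headings:
        for _code, value in explode_heading(heading):
            if value not in seen:
                seen.append(value)
    return seen


# Headings copied from the 650 and 651 rows on the slide.
headings = [
    "$a Queens $z France $v Biography",
    "$a France $x History $y Revolution, 1789-1799",
]
print(explode_heading(headings[0]))
print(unique_terms(headings))
```

Keeping the subfield code next to each value is one way to "retain their interconnection": the code still records whether a term was a topic, a place or a time period in the original heading.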

Subject

Related subjects

Books about...

“Collections”

Related authors

Information from the network

Publishing over time

If it’s a place, show a map!


openlibrary.org/subjects/places/bordeaux

Give it a URL
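A tiny sketch of how a subject term might be turned into a path like the one above. The slug rule (lowercase, runs of other characters collapsed to underscores) and the `subject_path` name are assumptions for illustration. Only the /subjects/places/bordeaux shape comes from the slide.

```python
import re


def subject_path(kind, name):
    """Build a path such as /subjects/places/bordeaux (assumed slug rule)."""
    slug = re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
    return f"/subjects/{kind}/{slug}"


print(subject_path("places", "Bordeaux"))
```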

I used to use this image to represent contact networks on Flickr, but I think it’s equally applicable as a visual for what a networked library catalog might look like. How many things can we connect book records to? Not only identifiers, but blog posts, reviews, subjects, publishers, booksellers etc etc

http://flickr.com/photos/swamibu/3191787234/

Release

- launch with what we’ve got - the records are still the same... just easier to skip around

- allow people to collect books around them, and then share or export that collection



Connect

- exploring partnerships, connections

- reach into existing networks

- LibraryThing, Goodreads, open source systems, etc

- open data, improve API

http://flickr.com/photos/odreiuqzide/3195647925/

Observe

- see what people do

- provide tools to let people see what everyone else is doing

- monitor activity, like popular records, top editors, sign ups per day etc

- and ABOVE ALL, participate!!!



Library Catalogs

Web Services

Small Collections

Tidbits

Enhance

Streamline

Inhale

Gather

• Navigation

• Key Processes

• Branding

• Recognition

• Workflow

• Updates

• Expansion

• Respect

• Content, content, content

• BookServer

• APIs in & out

• Contribution

• Curated content

• Original clusters

To summarise, here are the 4 levels of stuff we’re trying to focus on in the coming months...

Next Steps

• We’re hiring! SOLR, Sys Admin, Web Dev

• Find money! Want to join forces?

• Release the redesign: And watch what happens...

Short term... Want to come and work on an awesome project playing with the very nature of a library catalog? Let me know!

Thank you. [email protected]

http://flickr.com/photos/roadsidepictures/244926428/



