News data at the British Library

News data at the British Library. Luke McKernan, Lead Curator, News and Moving Image. Working with news data across different media, 7 September 2015.

Upload: lukemckernan

Post on 15-Apr-2017


TRANSCRIPT

Page 1: News data at the British Library

News data at the British Library

Luke McKernan

Lead Curator, News and Moving Image

Working with news data across different media

7 September 2015

Page 2: News data at the British Library

www.bl.uk 2

Map of news stories in the UK as read via Twitter (created using bit.ly links), Guardian Datablog, 16 May 2012

Changing news

Page 3: News data at the British Library

Moving from a world-class newspaper service to a world-class news service

Newspapers, television, radio and Web news

This reflects the significant changes in news production and consumption taking place today, but it also reflects how news has always been consumed

News does not exist in any one form. It is sought out and selected by its users, from the multiple forms of information on offer

A change in how we manage news data is an essential part of how to deliver such change

“News is information of current interest for a specific audience”

News content strategy

The Newcastle Courant, The Huffington Post, Today, Al Jazeera English

Page 4: News data at the British Library

Newspapers

The UK national collection

34,000 newspaper titles: approximately 60M issues or 450M individual pages, from 17thC to present day

Current acquisition: 1,500 daily or weekly titles

Print copies acquired under legal deposit but will move increasingly towards digital acquisition

Physical access at Newsroom and Boston Spa

Online access to 11M pages via British Newspaper Archive (http://www.britishnewspaperarchive.com)

Approximately a third of the collection has microfilm access copies; around 2.5% has been digitised so far

British Newspaper Archive
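
The collection figures above can be cross-checked with some simple arithmetic. The sketch below uses only the approximate numbers quoted on the slide; the variable names are illustrative, and it assumes, for the purpose of the check, that the 11M pages online are a fair proxy for what has been digitised.

```python
# Approximate figures quoted on the slide.
TITLES = 34_000
ISSUES = 60_000_000
PAGES = 450_000_000
PAGES_ONLINE = 11_000_000  # pages available via the British Newspaper Archive


def share(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    return 100 * part / whole


pages_per_issue = PAGES / ISSUES           # average pages in one issue
issues_per_title = ISSUES / TITLES         # average issues held per title
online_share = share(PAGES_ONLINE, PAGES)  # percentage of pages online

print(f"Average pages per issue: {pages_per_issue:.1f}")
print(f"Average issues per title: {issues_per_title:.0f}")
print(f"Share of pages online: {online_share:.1f}%")
```

On these rounded figures the online share comes out a little under 2.5%, which is consistent with the "around 2.5%" digitised quoted above.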

Page 5: News data at the British Library

Television and radio news

Began recording television and radio news programmes receivable in the UK in May 2010

Collection of over 60,000 programmes, recorded off-air from 20 channels including BBC, Al Jazeera, Russia Today, CNN, CCTV (China), NHK, Bloomberg, France 24, World Service, LBC

30 hours of TV and 22 hours of radio captured per day

Born digital archive, including Electronic Programme Guide data and subtitles where available

Access onsite only, owing to copyright restrictions, via Broadcast News service

Broadcast News

Page 6: News data at the British Library

Web news

Non-print legal deposit legislation introduced in April 2013 means the British Library can start harvesting UK websites

First annual crawl collected 4.5M .uk websites and web pages – collection now amounts to around 3Bn digital assets

Harvesting c.1,000 UK news websites (newspapers and web-only sites, e.g. hyperlocals) on a daily or weekly basis, from the end of 2013, with another 500 to be added soon

Access onsite only at British Library and other Legal Deposit libraries

Also the Open UK Web Archive, a smaller collection of selected websites, openly available at http://www.webarchive.org.uk

UK Web Archive

Page 7: News data at the British Library

Our news research services

Explore.bl.uk The Newsroom Boston Spa reading room

British Newspaper Archive UK Web Archive Broadcast News

Page 8: News data at the British Library

News data

2M 19thC British newspaper pages – XML, images

UK television news data 2010 onwards – EPG data for 45,000 programmes, subtitles (XML) for c.25,000 programmes, some speech-to-text files for 2011 broadcasts (XML)

UK radio news data 2010 onwards – EPG data for 15,000 programmes, some speech-to-text files for 2011 broadcasts (XML)

Financial Times – four years of content (1888, 1939, 1966, 1991) – XML, images

Web news selection – possibly

Financial Times, 1893 and 2008
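
The subtitle and speech-to-text files listed above are described only as XML; the slides do not give a schema. As a rough illustration of how such a file might be read, the sketch below parses an invented subtitle document with Python's standard library. The element names (`programme`, `subtitle`), the attributes, and the sample text are all assumptions made for this example, not the Library's actual format.

```python
import xml.etree.ElementTree as ET

# Invented sample: element names, attributes and text are hypothetical.
SAMPLE = """<programme channel="Example Channel" title="Example News" start="2015-09-07T22:00:00">
  <subtitle begin="00:00:05">Good evening.</subtitle>
  <subtitle begin="00:00:08">Here are tonight's headlines.</subtitle>
</programme>"""


def read_subtitles(xml_text: str) -> dict:
    """Return programme metadata plus a list of (timestamp, text) subtitle lines."""
    root = ET.fromstring(xml_text)
    lines = [
        (sub.get("begin"), (sub.text or "").strip())
        for sub in root.iter("subtitle")
    ]
    return {
        "channel": root.get("channel"),
        "title": root.get("title"),
        "start": root.get("start"),
        "lines": lines,
    }


record = read_subtitles(SAMPLE)
for begin, text in record["lines"]:
    print(begin, text)
```

Keeping the programme metadata together with the timed lines is one way to let subtitle text be matched later against Electronic Programme Guide data for the same broadcast.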

Page 9: News data at the British Library

Plans

All out-of-copyright UK newspapers on the British Newspaper Archive, with issue-level data for research re-use, covered by a single agreement and available through an API. Possibly…

Title-level data for all newspapers we hold (34,000 titles) released as open data

More partner initiatives

Hackathon on 16 November 2015, to be followed by other news data events in 2016

User-led development

BBC radio news script, 14/7/1969

Page 10: News data at the British Library

Dreams

An open news dataset

An archive news data model

All British Library news records available at issue level

Hyperlocal news sites: On the Wight, The City Talking, A Little Bit of Stone

Page 11: News data at the British Library

Questions

Copyright constraints limit use of much material to BL premises – how can tools such as named entity extraction work as a means to get round this?

How can print, web, television, radio news, and other news media, be linked up together, and to other resources, and how would this benefit research?

What research questions will we be able to support through a greater focus on news data?

Is news data only for the specialist, or can more general user-friendly applications be produced?

What can news archives learn from the management tools for current news?

How can we help each other?

TV news idents
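
The first question above asks how named entity extraction might help where copyright keeps the full text onsite. One possible reading is that derived data, such as counts of the names mentioned in a text, could be shared where the text itself cannot; the slides do not settle whether that would be acceptable. The sketch below shows the general idea with a deliberately crude heuristic (runs of capitalised words) standing in for a real entity recogniser. The function name and sample sentence are invented for illustration.

```python
import re
from collections import Counter

# Runs of capitalised words, e.g. "British Library". This is a crude stand-in
# for a trained named entity recogniser: it also catches sentence-initial
# words and misses all-caps names such as "BBC".
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def entity_counts(text: str) -> Counter:
    """Count candidate entity strings; only the counts leave the function."""
    return Counter(match.group(0) for match in CANDIDATE.finditer(text))


sample = (
    "Staff at the British Library opened the Newsroom in London. "
    "Visitors to London can use the Newsroom."
)
counts = entity_counts(sample)
for name, n in counts.most_common():
    print(name, n)
```

The output is a table of names and frequencies rather than the text itself. Note that "Staff" and "Visitors" appear among the candidates, which shows why a proper recogniser would be needed in practice.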

Page 12: News data at the British Library

Email: [email protected]

Twitter: @BL_newsroom

Web: http://bl.uk/subjects/news-media

Blog: http://britishlibrary.typepad.co.uk/thenewsroom

Contact