
Informatica course 2

Anca Dinu

LMA, anul I, semestrul I, Universitatea din Bucuresti, 2019

Primary source of information on the slides:

The Digital Humanities: A Primer for Students and Scholars, by Eileen Gardiner and Ronald G. Musto, Cambridge University Press, 2015

DH Tools

• Tool classification: by the object they process (text, image, sound, etc.) and by the task they are supposed to perform (their output or result).

• Even the simplest projects can require text and image/sound processing, storage, analysis with multiple tools either simultaneously or in succession.

• Free tutorials for learning these tools are often available on product websites, on YouTube and elsewhere.

DH Tools

Tool classification by the object they process

• Text-based tools:

  • Text Analysis:
    • The simplest and most familiar example of text analysis is the document comparison feature in Microsoft Word (taking two different versions of the same document and highlighting the differences).
    • More sophisticated text analysis tools create concordances, compute keyword density/prominence, visualize patterns, etc. (for instance, AntConc).

  • Text Annotation:
    • On a basic level, digital text annotation is simply adding notes or glosses to a document, for instance, putting comments on a PDF file for personal use.
    • More professional annotation follows strict standards (XML, HTML, TEI).
    • For more complex projects, there are interfaces specifically designed for annotation.
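A concordance, as mentioned under Text Analysis, lists every occurrence of a keyword together with its surrounding context. The following is only a minimal sketch in Python of the keyword-in-context idea; the function name `concordance`, the `width` parameter and the sample sentence are invented for illustration, and this is not how AntConc or any other tool is implemented.

```python
import re

def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    # Very naive tokenization: lowercase runs of word characters
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

# Invented sample text, for illustration only
sample = ("The tools process text. Some tools create concordances, "
          "and other tools visualize patterns.")

for left, kw, right in concordance(sample, "tools", width=2):
    # Right-align the left context so the keyword lines up in one column
    print(f"{left:>20} [{kw}] {right}")
```

Aligning the keyword in one column is what makes recurring patterns around a word easy to spot by eye.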

DH Tools

• Text Conversion and Encoding tools:

  • Every text in computer format is encoded with tags, whether this is apparent to the user or not. Everything from font size, bold, italics and underline, line and paragraph spacing, justification and superscripts, to metadata such as title and author is the result of such coding tags. Common encoding formats include RTF, plain text and robustly coded text.

  • Text converters transform all these tags from one format to another so that the text can be used in different applications. Originally many of these converters were stand-alone applications. Now they are add-ons, or they are embedded within a program so that a user can, for example, create a PDF, an HTML or an ASCII file from a Microsoft Word document.

  • Commercial and many free text converters are available for formats not supported by the original text-processing application (https://www.files-conversion.com/document-converter.php).
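To make the idea of converting between encodings concrete, here is a small sketch that turns a tagged HTML fragment into plain text using Python's standard `html.parser` module. The class `TagStripper`, the function `html_to_plain_text` and the sample fragment are assumptions made for this example; real converters handle far more (layout, metadata, character sets).

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collects the text content of an HTML fragment and discards the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for the text found between tags
        self.chunks.append(data)

def html_to_plain_text(html):
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)

# Invented sample: formatting tags around three words
sample_html = "<p>The <b>Digital</b> <i>Humanities</i></p>"
plain = html_to_plain_text(sample_html)
print(plain)  # prints: The Digital Humanities
```

The bold and italic tags are dropped on the way to plain text, which illustrates that conversion between formats can lose information.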

DH Tools

• Text Editing tools:

  • Allow users to perform operations in text documents, such as:
    • write,
    • search,
    • cut, paste,
    • format,
    • do and undo,
    • check spelling and grammar,
    • outline,
    • generate tables of contents.

  • Examples: Microsoft Word, Notepad++, Atom, Brackets, TextMate, Sublime Text, etc.

DH Tools

• Text Recognition tools:

  • Optical Character Recognition (OCR) tools automatically recognize characters (including non-Western alphabets) and create documents from scanned digital images of text.

  • Examples:
    • ABBYY FineReader (http://finereader.abbyy.com) is a commercial OCR engine that creates electronic files from scanned documents, PDFs and digital photographs.
    • DocScanner (http://www.docscannerapp.com) is an inexpensive app that uses the built-in cameras of hand-held devices to scan documents, optimize images, perform OCR and create and send PDFs, text files and JPEGs.
    • OmniPage (http://www.nuance.co.uk/for-individuals/by-product/omnipage/index.htm) is proprietary OCR software that enables text on physical objects to be scanned, processed and exported to a document file format.
    • Tesseract (https://code.google.com/p/tesseract-ocr) is a Google-sponsored open-source OCR engine that can read a wide variety of image formats and convert them to text in more than sixty languages.

DH Tools

• Handwriting Recognition (HWR) tools:
  • Allow users to transcribe handwriting and produce text documents automatically. They still require much direct human intervention and correction.

• Text Visualization tools:
  • These tools take text and create various visual representations of texts and words, such as semantic maps and word clouds.
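A word cloud is, at bottom, a picture of word frequencies: the more often a word occurs, the larger it is drawn. The sketch below only computes those frequencies and prints them as text bars; the function name `word_frequencies` and the sample sentence are invented here, and a real visualization tool would add layout, fonts and stop-word filtering.

```python
import re
from collections import Counter

def word_frequencies(text, top_n=5):
    """Return the top_n most frequent words as (word, count) pairs."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top_n)

# Invented sample text, for illustration only
sample_text = "the cat and the hat and the bat"

for word, count in word_frequencies(sample_text, top_n=3):
    # One '#' per occurrence stands in for font size in a word cloud
    print(f"{word:<5} {'#' * count}")
```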

DH Tools

• Text Mining:

  • When text material is incorporated into scholarly research, it often first needs to be converted into information that can be analyzed for patterns.

  • These programs extract data from text according to certain parameters and deliver the data in useful file formats.

  • Example: Weka (http://www.cs.waikato.ac.nz/ml/weka) is a free, open-source collection of machine learning algorithms from the University of Waikato for data mining tasks. It includes tools for data preprocessing, classification, regression, clustering, association rules and visualization.
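As a rough illustration of "extracting data from text according to certain parameters and delivering it in a useful file format", the sketch below pulls four-digit years out of a short passage and writes them out as CSV. The functions `extract_years` and `to_csv`, the regular expressions and the sample passage are assumptions for this example only; it is unrelated to how Weka works.

```python
import csv
import io
import re

def extract_years(text):
    """Return (year, sentence) pairs for every four-digit year in the text."""
    rows = []
    # Naive sentence split: break after ., ! or ? followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for year in re.findall(r"\b(1[0-9]{3}|20[0-9]{2})\b", sentence):
            rows.append((int(year), sentence))
    return rows

def to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["year", "sentence"])
    writer.writerows(rows)
    return buffer.getvalue()

# Short sample passage, for illustration only
passage = ("Dante was born in 1265. He finished the Commedia around 1320. "
           "He died in Ravenna.")

rows = extract_years(passage)
print(to_csv(rows))
```

The resulting CSV can be opened in a spreadsheet or loaded into a database for further analysis.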

DH Tools

• Text processing (NLP, natural language processing):
  • lemmatizers
  • POS taggers
  • parsers
  • semantic analysers
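To give a feel for what the first tool in this list does, here is a deliberately naive lemmatizer: it reduces an inflected English word to a base form using a tiny lookup table plus a few suffix rules. The table `IRREGULAR`, the list `SUFFIX_RULES` and the function `lemmatize` are all invented for this sketch; real lemmatizers rely on large lexicons and on part-of-speech information, and these rules fail on many words.

```python
# Hypothetical mini-lexicon of irregular forms (illustration only)
IRREGULAR = {"went": "go", "mice": "mouse", "was": "be", "children": "child"}

# (suffix, replacement) pairs, tried in order
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

def lemmatize(word):
    """Return a guessed base form of an English word."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix, replacement in SUFFIX_RULES:
        # Require a stem of at least three letters before stripping
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word

for form in ["studies", "walked", "mice", "books"]:
    print(form, "->", lemmatize(form))
```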

DH Tools

• Image and Sound Based Tools:

  • Image Creation:
    • Images can be created in a digital environment or converted from analog images to a digital format.

  • Image Processing, including Editing, Annotation and Markup:
    • taking a two-dimensional image that has been converted into digital format and making enhancements such as sharpening, changing color balances, saturation and exposure, cropping or straightening;
    • annotating by adding metadata for location, date, content and so forth;
    • setting parameters such as color mode, compression format and size.

DH Tools

• 3D Modeling:
  • Creates a mathematical representation of a three-dimensional object that can then be processed to be displayed in two-dimensional space.
  • This processing can include modeling, alteration and animation.

• 3D Printing:
  • Creates a three-dimensional solid object based on computer-generated models. It is an additive process, that is, layers of material are added successively to achieve the exact computer-designed pattern in real space.
  • Humanists can now recreate in minute detail everything from an ancient Greek kylix to a scale model of a baroque-style ball gown, offering a representation far closer than anything in print or on screen and emphasizing the unique materiality of the evidence.
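The 3D modeling step described above (a mathematical representation that is processed for display in two-dimensional space) can be sketched with a simple perspective projection. In this example a cube is just a list of (x, y, z) vertices; the function `project_point` and the values chosen for `focal_length` and `camera_distance` are assumptions for illustration, and real modeling software uses full transformation matrices and much richer models.

```python
def project_point(point, focal_length=2.0, camera_distance=5.0):
    """Project a 3D point onto a 2D plane with a simple perspective rule.

    The camera sits camera_distance units in front of the origin, looking
    along the z axis; points farther away are scaled down more.
    """
    x, y, z = point
    depth = z + camera_distance
    scale = focal_length / depth
    return (x * scale, y * scale)

# The "mathematical representation": the eight corners of a cube
CUBE_VERTICES = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]

projected = [project_point(vertex) for vertex in CUBE_VERTICES]
for vertex, flat in zip(CUBE_VERTICES, projected):
    print(vertex, "->", flat)
```

Corners nearer to the camera (z = -1) land farther from the center of the picture than the distant ones (z = 1), which is what gives the flat drawing its sense of depth.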

Pia Hinze’s gold Neobaroque dress made with a 3D printer. http://hinzepia.wix.com/muted. Photo by Olivier Ramonteau.

DH Tools

• Video and Audio Processing Tools:

  • These control the alteration of digital acoustic and video files and can include enhancement, cleaning, mixing and cutting, annotation and compression.

  • For instance, a sound file of a speech can be enhanced to remove background noise that interferes with its clarity.

  • Background noises might also be added for dramatic effect, like the sound of aircraft behind a World War II speech.
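Mixing, as in the aircraft example above, amounts to adding two streams of audio samples together. The sketch below treats each track as a plain list of sample values between -1.0 and 1.0; the function `mix`, the `gain_b` parameter and the sample values are invented for illustration, and real audio editors additionally deal with sample rates, channels and file formats.

```python
def mix(track_a, track_b, gain_b=0.3):
    """Mix two lists of samples in [-1.0, 1.0]; track_b is scaled by gain_b."""
    length = max(len(track_a), len(track_b))
    mixed = []
    for i in range(length):
        # A track that has already ended contributes silence (0.0)
        a = track_a[i] if i < len(track_a) else 0.0
        b = track_b[i] if i < len(track_b) else 0.0
        # Clip the sum so it stays within the valid sample range
        mixed.append(max(-1.0, min(1.0, a + gain_b * b)))
    return mixed

# Invented sample values: a short "speech" and a longer "background"
speech = [0.5, -0.5]
background = [1.0, 1.0, 1.0]

print(mix(speech, background, gain_b=0.5))  # prints: [1.0, 0.0, 0.5]
```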

DH Tools

• Sound recognition tools:

  • Music Recognition: these tools can process a printed score and create editable music files, or identify a recording from a short audio sample (for instance, Shazam).

  • Speech Recognition and Transcription: speech recognition software enables a user to automatically convert audio files, such as MP3s, to text.

  • It is useful for personal notes, interviews, etc.

DH Tools

• Media converters:
  • http://www.mediaconverter.org

DH Tools

• Data-Based Tools:

  • Database Management Systems (DBMS): software systems designed for defining, creating, querying, updating and administering databases.

  • Data Collection tools: Data collection can be a large part of any scholar’s work. Much of this is done manually, using Database Management Systems to store and manipulate the collected data. However, in some disciplines data can be collected through surveys and polls administered electronically, or by sampling only part of a given population. Whatever the method, there are tools to make the collection of data efficient, thorough and systematic.
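The four DBMS operations named above (defining, creating, querying, updating) can be tried out with SQLite, a small database engine that ships with Python as the `sqlite3` module. This is only a sketch: the table `manuscripts`, its columns and the rows inserted are made-up sample data.

```python
import sqlite3

# An in-memory database: nothing is written to disk
conn = sqlite3.connect(":memory:")

# Define: describe the structure of the data
conn.execute(
    "CREATE TABLE manuscripts ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, century INTEGER)"
)

# Create: store some (invented) records
conn.executemany(
    "INSERT INTO manuscripts (title, century) VALUES (?, ?)",
    [("Codex A", 12), ("Codex B", 14), ("Codex C", 14)],
)

# Update: correct the dating of one record
conn.execute(
    "UPDATE manuscripts SET century = ? WHERE title = ?", (13, "Codex A")
)

# Query: ask for all manuscripts from the 14th century onwards
rows = conn.execute(
    "SELECT title, century FROM manuscripts WHERE century >= ? ORDER BY title",
    (14,),
).fetchall()

conn.commit()
conn.close()

print(rows)  # prints: [('Codex B', 14), ('Codex C', 14)]
```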

DH Tools

• Data Analysis:

  • Once data is gathered, it must be inspected, cleaned, transformed and modeled in order to discover useful information, arrive at conclusions and support decision making.

  • There are tools to assist with qualitative and quantitative data analysis, processing complex phenomena in text and multimedia, grammatical structure and natural language, sequential events and geographical names.

  • Many of these maintain the traditional philological role of humanistic work: identifying, collating and contextualizing text to properly understand its full meaning.
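The inspect, clean, transform and model sequence can be illustrated on a tiny data set. In the sketch below the records, the field names (`place`, `year`, `letters`) and the two functions are all invented; the point is only to show inconsistent raw values being cleaned before a simple summary is computed.

```python
from statistics import mean

# Invented raw records with typical problems: stray spaces,
# inconsistent capitalization, missing or non-numeric values
raw_records = [
    {"place": " Bucharest ", "year": "1890", "letters": "12"},
    {"place": "bucharest", "year": "1891", "letters": ""},
    {"place": "Iasi", "year": "1890", "letters": "7"},
    {"place": "IASI", "year": "n/a", "letters": "5"},
]

def clean(records):
    """Drop records with unusable numbers and normalize the rest."""
    cleaned = []
    for record in records:
        if not record["year"].isdigit() or not record["letters"].isdigit():
            continue  # inspect: skip records that cannot be interpreted
        cleaned.append({
            "place": record["place"].strip().title(),  # clean the spelling
            "year": int(record["year"]),               # transform to numbers
            "letters": int(record["letters"]),
        })
    return cleaned

def mean_letters_by_place(records):
    """A very simple 'model': the average number of letters per place."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["place"], []).append(record["letters"])
    return {place: mean(values) for place, values in grouped.items()}

cleaned_records = clean(raw_records)
print(mean_letters_by_place(cleaned_records))
```

Two of the four records are discarded during cleaning, which shows how strongly the cleaning rules can shape the final result.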

DH Tools

• Data Visualization:
  • Create visual representations of structured data based on lexical, linguistic, geographical, tonal, temporal and other types of data.

• Mapping Tools:
  • Deal specifically with geographic data and map-making (cartography). These tools may use GIS, GPS or other geospatial data to create base maps, overlays, historic maps and interactive maps.
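A typical calculation behind mapping tools is the distance between two points given by latitude and longitude. The sketch below uses the haversine formula, which treats the Earth as a sphere; the function name `haversine_km`, the mean radius constant and the approximate city coordinates are assumptions for this example, and professional GIS software uses more precise Earth models.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Approximate coordinates, for illustration only
bucharest = (44.43, 26.10)
rome = (41.90, 12.50)

distance = haversine_km(*bucharest, *rome)
print(f"Bucharest to Rome: about {distance:.0f} km")
```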

DH Tools

• Classification of tools by the task they are supposed to do:

• Blogging:

  • Blogging is a way of informally discussing or sharing information on the web by uploading posts (discrete, usually brief notices). These are often displayed with the most recent item at the top.

  • Some blogs are maintained by groups of scholars involved in similar, related or the same projects. Others can comment on the posts, although sometimes this feature is disabled or restricted to individuals approved by an editor or moderator.

  • PEA Soup is a good example of a multicontributor blog in the humanities. It covers topics like philosophy, ethics and academia.

DH Tools

• Brainstorming:

  • Idea gathering is at the core of much scholarly research, and brainstorming is a technique for generating ideas in which the effort is focused on listing as many spontaneous ideas as possible without evaluating them.

  • It is often used in engineering and business and is quite amenable to digital culture, if not to the traditional model of the solitary reflective humanist.

DH Tools

• Collaborative tools:
  • Support joint work on anything from text annotation to reviewing and coding to simple document sharing (Google Drive, the Track Changes feature in Microsoft Word).
  • Help conference organizers pull together their meeting by topic, date, time and other criteria (Doodle, etc.).

• Searching:
  • Search engines like Google and Yahoo are the best known, but others are also available that have special capabilities or features and might sometimes better fit a researcher’s needs.

DH Tools

• Communication:
  • Provide the means for more efficient communication, particularly on projects.
  • While many scholars still use e-mail as a basic communication method, many other specialized applications have emerged to set up meetings, virtual and video conferencing, social networking, desktop sharing and web-based discussions.

• Organization:
  • Borrowed from the business and publishing worlds, in which schedules and coordination of forces are critical, such tools are available to help researchers manage their projects and organize their materials for more efficient workflow.
  • Examples include Microsoft OneNote, Pliny and Zotero.

DH Tools

• Publication and Sharing, including Website Development:
  • Make it easier for scholars to create volumes, edit content, manage workflow, track manuscripts, manage journal and dissertation submissions, create page layouts, share metadata and create e-books.

• Peer Reviewing:
  • The opportunity for pre- and post-publication review is one advantage of online publishing.
  • There are a few specialized tools to help with organizing everything from comments to peer review.
  • Examples: ScholarOne (http://scholarone.com) provides workflow management for journals, books and conferences; Editorial Manager (http://www.editorialmanager.com/homepage/home.htm) is an online manuscript submission and tracking system.