identifying data set for dissertation

21
IDENTIFYING DATA SET FOR DISSERTATION An Academic presentation by Dr. Nancy Agnes, Head, Technical Operations, Tutors India Group www.tutorsindia.com Email: [email protected]

Upload: TutorsIndia

Post on 02-Apr-2021

4 views

Category:

Education


0 download

DESCRIPTION

The process of collecting and validating data on variables of interest is data set collection. It enables us to answer research questions, analyze hypotheses, and assess the outcome. Data collection is one of the significant processes in conducting research. It is good to have the best research design in the world but if it is not possible to collect the required data then completing the dissertation becomes a difficult task. Data collection is a very challenging task that requires systematic planning, patience, hard work, perseverance, and more to complete the project successfully. 2 Website: www.tutorsindia.com Email: [email protected] Whatsapp: +91-8754446690

TRANSCRIPT

IDENTIFYING DATA SET FOR DISSERTATION

An Academic presentation by

Dr. Nancy Agnes, Head, Technical Operations, Tutors India Groupwww.tutorsindia.com

Email: [email protected]

Abstract:

Introduction Categories of Data Set

Decentralized Data Set Search Some recent topics in dataset Conclusion

OUTLINE

Today's Discussion

The essential step in authorizing a proposed system is to evaluate over an appropriate dataset.

The value that has to be generated from data needs the ability to find out, access, and give sense to datasets.

Many efforts are carried out to encourage data sharing and reuse of data, from publishers asking writers to submit data along with dissertation to open data portals, data marketplaces, and data communities.

Contd...

Abstract:

Contd...

Google recently released a service for identifying datasets, which lets users find out data that has been stored in different online repositories through keyword queries.

These developments predict a research field that has been emerging in identifying dataset or data retrieval that broadly contains frameworks, tools, and methods that helps in comparing a user data need over a collection of datasets. 1

Contd...

Theprocessofcollectingandvalidatingdataonvariablesof interest is data set collection.

It enables us to answer research questions, analyze hypotheses, and assess the outcome.

Data collection is one of the significant processes in conducting research.

Introduction

It is good to have the best research design in the world but if it is not possible to collect the required data then completing the dissertation becomes a difficult task.

Datacollectionisaverychallengingtaskthatrequiressystematicplanning,patience, hard work, perseverance, and more to complete the project successfully. 2

Categories of Data Set

Dataareclassifiedintotwomajorcategories:qualitativeand quantitative.

I. QUALITATIVE DATA

Qualitative data are usually descriptive and non-numerical in nature.

The qualitative data collection method plays a significant role in impact evaluation by providing data that can be helpful to understand the processes behind observed outcomes.

Contd...

Moreover, these methods can also improve the quality of survey-based quantitative evaluations which helps to generate evaluation hypotheses and strengthens the design of survey questionnaires.

It also helps in expanding or clarifying findings of quantitative evaluation.

II. QUANTITATIVE DATA

Quantitative data is numerical and can be computed mathematically.

This method uses various scales, such as ordinal scale, nominal scale, ratio scale, and interval scale.

Contd...

Theseapproachesareverycheaptoimplement,andcanbeeasilycomparedas they are standardized.

However,theseapproachesarelimitedintheirabilityfortheresearchand clarification of similarities and differences.

Theoutcomeobtainedfromthesemethodscanbesummarized,compared,and generalized very easily. 3

Contd...

Types of Data Set Searches

Contd...

Source: https://link.springer.com/article/10.1007/s00778-019-00564-x

1. HIDDEN SEARCH:

The hidden or deep search refers to the content that can be found behind web forms usually written in HTML.

There are two main approaches for finding data on the deep web.

The first is the traditional method to develop vertical search engines, where semantic mappings are created between each website and a centralized third party customized to a specific domain.

Contd...

Structured queries are created on the third party and redirected through the web forms using mappings.

A second approach tries to produce the resulting web pages in HTML that emerge from web form searches.

Google has proposed an approach for finding data in deep web content by approximating input automatically to quite a few million HTML forms that are written in multiple languages and span over hundreds of domains, and the resulting HTML pages are added to its search engine index.

The user will be directed to the result of the newly submitted form when they click on a search result.

Contd...

In this type of search, information is ordered and accessed through entities of interest, and their relationships and attributes.

3. TABULAR SEARCH

In tabular search, users access the data stored in one or more tables. The main aim is to identify specific data, such as attribute names or extending tables with fresh attributes. 4

Contd...

2. ENTITY-CENTRIC SEARCH

Google proposed a vertical search engine that has been created to identify datasets on the web.

This system utilizes schema.org and DCAT.

The web for all datasets are crawled based on Google web crawl with the use of the schema.org, as well as the datasets that are described using DCAT, and gathers the associated metadata.

Contd...

Decentralize d Data Set Search

4. GOOGLE DATASET SEARCH

Theyadditionallylinkthemetadatatosomeothersources,findoutareplicaand generate an index of enhanced metadata for each dataset.

ThemetadataissubmissivetotheknowledgegraphofGoogleanditssearch capabilities that are built on the top of this metadata.

ThedatasetsthatareindexedcanbeidentifiedthroughkeywordsandCQL expressions.

5. DOMAIN-SPECIFIC SEARCH

In this type of search, services focus on datasets from specific domains.

They propose modified metadata schemas to explain the datasets and crawlers are implemented to determine them automatically. 5

Contd...

SOME RECENT TOPICS IN DATASET:

1.Dimensional Reduction approaches for large scale data. 2.Training / Inference in noisy environments and incomplete data. 3.Handling uncertainty in big data processing.

4.Anomaly Detection in Very Large Scale Systems. 5.Scalable privacy preservation on big data.

Lightweight Big Data analytics as a Service.

Approaches to make the models learn with less number of data samples.

Contd...

The character of a database row is articulated as the primary key.

Thus each tuple in database table characterizes a unique record. At the object level it is not constantly the case.

Itispossiblethattwodifferentobjectshavingthesamefeaturesvaluescanbe acknowledged as being equal with the equals () function.

Likewise the objects can be related using the (==) operator.

Contd...

PROBLEM OF IDENTITY

Conclusion

There are several ways to widen the system’s functionality.

Academic search engines extract metadata, in the form of a list of authors, publication year, venue, and so on.

The first step will be to provide this data for each paper in the system.

But apart from this, it will be useful to apply this information to datasets, to find out the authors and venues that have used a dataset, and also to determine how its usage has changed over time.

Contd...

Users of the system also help in enhancing the quality of search, by giving feedback on the extracted links, signifying errors, and finding out datasets within papers that cannot be identified by the classifier.

With such participation, the system’s accuracy, recall, and coverage can be enhanced further. Several future improvements must be carried out to get better results.

One of the ways is to study the global data collection patterns after this system is fully organized and employed by many users in real-life situations.

This can be achieved mainly by applying machine learning algorithms such as clustering algorithms like k-Means Clustering to split the users into several clusters. 6

Contd...

CONTACT US

UNITED KINGDOM

+44-1143520021

INDIA

+91-4448137070

EMAIL

[email protected]