No application is an island: Using topes to transform strings during data transfer

Atipol Asavametha, Prashanth Ayyavu, Christopher Scaffidi
School of Electrical Engineering and Computer Science
Oregon State University



Problem: Data heterogeneity among software components

• Software components
  – Created by autonomous stakeholders
  – Differing data formats
  – May switch to new formats without prior notice

• Programmers
  – Need to move data between elements automatically

• End users
  – Need to move data between elements manually

problem approach evaluation


Example: Exchanging person names

• John Smith (today)

• Smith, John (tomorrow) – unexpected format! Unanticipated need for “glue code” to reformat

• Lincolnshire MCC (tomorrow) – questionable! Need to validate data, maybe trigger fail-over

Similar issues arise for data from users, external datasets, or the web.


Other examples of data format heterogeneity

• Room Numbers
  – NSH 3103 vs Newell Simon Hall 3103

• Stocks
  – GOOG vs Google vs Google Corporation

• Address Lines
  – 101 Main St. vs 101 MAIN STREET vs 101 Main Str.

• Phone Numbers
  – 888-800-2030 vs +1 888 800 2030 vs (888) 800-2030

• State Names
  – California vs CA vs Calif.
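To make the phone-number case concrete, here is a minimal Python sketch (not from the slides) showing that the three phone strings above carry the same value once punctuation and the country code are removed. The helper name `canonical_digits` is hypothetical, and the sketch assumes North American numbers: exactly 10 digits with an optional leading country code "1".

```python
import re

def canonical_digits(phone: str) -> str:
    """Reduce a North American phone string to its 10 significant digits.

    Hypothetical helper for illustration only; assumes an optional leading
    country code "1" and otherwise exactly 10 digits.
    """
    digits = re.sub(r"\D", "", phone)        # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                  # strip the country code
    if len(digits) != 10:
        raise ValueError(f"not a 10-digit number: {phone!r}")
    return digits

variants = ["888-800-2030", "+1 888 800 2030", "(888) 800-2030"]
canonical = {canonical_digits(v) for v in variants}
print(canonical)  # one element: all three formats denote the same number
```

Ad hoc helpers like this are the “glue code” the talk is concerned with: each pair of components tends to need its own.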


Insight: Exchange kinds of data (rather than particular formats)

Example records (name | phone | address line | city and state):

John Smith | 303-202-3030 | 101 Main St. | Pittsburgh, PA

Doe, Jane | +1 717 292 3030 | 88 Brooke Lane | PITTSBURGH Pennsylvania

RAY TILL | (404) 555-1203 | 2 PITT ST | PGH, Penna.

MR. ART COR | 282.303.4040 | 15 RED RUN RD. | pittsburgh PA

JOHN SMITH | (303) 202-3030 | 101 MAIN ST | Pittsburgh, PA


Insight: Exchange kinds of data (rather than particular formats)

• Three loci for reformatting…
  – Before transmitting (from source component)
  – After receiving (at receiving component)
  – Or along the way (in the connector itself)

Source or receiver: could be a database, web site, XML web service, desktop application, …


Use topes to reformat!

• A tope = a platform-independent abstraction describing how to recognize and transform strings in one category of data

• Greek word for “place,” because each corresponds to a data category with a natural place in the problem domain

• Examples:
  – Tope for person names
  – Tope for university names (and abbreviations)
  – Tope for North American phone numbers
  – Tope for Oregon State University phone numbers


A tope is a graph. Node = format, edge = transformation

Notional representation for an OSU room number tope…

• Formal building name & room number: Kelley Engineering Center 1148

• Colloquial building name & room number: Kelley 1148

• Building abbreviation & room number: KEC 1148
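The graph view can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation. It assumes the three formats are linked in a chain (formal ↔ colloquial ↔ abbreviation), and the format labels, lookup tables, and function names are all hypothetical. A breadth-first search finds a route between two formats, and the transformations along that route are applied in order.

```python
from collections import deque

# Hypothetical lookup tables covering a single building.
FORMAL_TO_COLLOQUIAL = {"Kelley Engineering Center": "Kelley"}
COLLOQUIAL_TO_ABBREV = {"Kelley": "KEC"}

def _invert(table):
    return {v: k for k, v in table.items()}

def _swap_building(value, table):
    """Replace the building part of '<building> <room>' via a lookup table."""
    building, room = value.rsplit(" ", 1)
    return f"{table[building]} {room}"

# Edges of the graph: (source format, target format) -> transformation.
# Which edges exist is an assumption; the slide only shows the three nodes.
EDGES = {
    ("formal", "colloquial"): lambda s: _swap_building(s, FORMAL_TO_COLLOQUIAL),
    ("colloquial", "formal"): lambda s: _swap_building(s, _invert(FORMAL_TO_COLLOQUIAL)),
    ("colloquial", "abbrev"): lambda s: _swap_building(s, COLLOQUIAL_TO_ABBREV),
    ("abbrev", "colloquial"): lambda s: _swap_building(s, _invert(COLLOQUIAL_TO_ABBREV)),
}

def find_path(src, dst):
    """Breadth-first search for the shortest chain of formats from src to dst."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for a, b in EDGES:
            if a == path[-1] and b not in seen:
                seen.add(b)
                queue.append(path + [b])
    return None

def transform(value, src, dst):
    """Apply each edge's transformation along the route from src to dst."""
    path = find_path(src, dst)
    if path is None:
        raise ValueError(f"no route from {src} to {dst}")
    for a, b in zip(path, path[1:]):
        value = EDGES[(a, b)](value)
    return value

print(transform("Kelley Engineering Center 1148", "formal", "abbrev"))  # KEC 1148
```

Because routes are found by search, adding a new format only requires one edge to any existing format; it does not need a direct transformation to every other format.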


A tope is a conceptual abstraction. A tope implementation is code.

• Each tope implementation has executable functions:
  – 1 isa: string → [0,1] function per format, for recognizing instances of the format (a fuzzy set)
  – 0 or more trf: string → string functions linking formats, for transforming values from one format to another

• Validation function: φ(str) = max_f isa_f(str), where f ranges over the tope’s formats
  – Valid when φ(str) = 1
  – Invalid when φ(str) = 0
  – Questionable when 0 < φ(str) < 1
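The isa, trf, and validation functions can be sketched as follows for a toy person-name tope. This is a minimal Python illustration under stated assumptions, not the authors' library. The two formats, the regular expressions, and the 0.5 score for oddly capitalized names are arbitrary choices made for the example, and all function names are hypothetical.

```python
import re

def isa_first_last(s: str) -> float:
    """Score for the 'John Smith' format; returns a value in [0, 1]."""
    if re.fullmatch(r"[A-Z][a-z]+ [A-Z][a-z]+", s):
        return 1.0
    if re.fullmatch(r"[A-Za-z]+ [A-Za-z]+", s):
        return 0.5  # right shape, unusual capitalization (arbitrary score)
    return 0.0

def isa_last_first(s: str) -> float:
    """Score for the 'Smith, John' format."""
    return 1.0 if re.fullmatch(r"[A-Z][a-z]+, [A-Z][a-z]+", s) else 0.0

# One isa function per format.
ISA = {"first last": isa_first_last, "last, first": isa_last_first}

def trf_last_first_to_first_last(s: str) -> str:
    """A trf function linking the two formats: 'Smith, John' -> 'John Smith'."""
    last, first = s.split(", ")
    return f"{first} {last}"

def validate(s: str) -> float:
    """Validation score: the maximum isa score over all of the tope's formats."""
    return max(isa(s) for isa in ISA.values())

def classify(s: str) -> str:
    score = validate(s)
    if score == 1:
        return "valid"
    if score == 0:
        return "invalid"
    return "questionable"

for name in ["John Smith", "Smith, John", "Lincolnshire MCC", "12345"]:
    print(name, "->", classify(name))
```

With these assumed patterns, “Lincolnshire MCC” comes out as questionable rather than invalid. A caller could respond to that middle category by asking a user to confirm the value or by triggering fail-over.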


But will it really work?

• For a range of different kinds of components, e.g.…
  – Web service → application
  – Application → web service
  – Web site → web site
  – Desktop application → web site
  – … and other combinations?

• How to specify which tope functions to invoke?

• How much work will it be, in practice?


Case study propositions

• Most of the difficulties encountered will result from technologies other than topes.

• Topes will be able to perform the string transformations needed in a variety of situations.

• Topes will be useful at all three loci (before/during/after data transfer), though not necessarily in every combination of locus and architectural style.

• Using topes will simplify the code required to perform string transformations.


Case #1: Enhanced Windows clipboard


Case #2: Enhanced web macro tool

• go to “http://people.oregonstate.edu/~ayyavup/form.html”
• enter “Prashanth Ayyavu” into the “Full name” textbox
• copy the “Full name” textbox
• go to “http://some.other.website.com/myform.html”
• paste in “DAVID JAMES” format from “person name” into the “your name” textbox

(The CoScripter web macro tool already had copy/paste functionality; we just added the clauses for reformatting.)


Case #3: Web service library

XML

    <!-- topesheet = http://eecs.oregonstate.edu/mytopes.txt -->
    <mydoc>
      <whatever>
        <tel>233-222-3040</tel>
        <date>11-Jan-96</date>
        <tel>(203)484-2030</tel>
        <date>12/30/2007</date>
      </whatever>
    </mydoc>

TopeSheet

    xpath:/mydoc/whatever/date{tope:url(http://www.w3c.org/topes/date_EN.xml);}
    xpath:/mydoc/whatever/tel{tope:url(http://myserver.com/custom_tel.xml);}

Client Code

    ItemLoader loader = ItemLoader.FromXml(xml);
    ItemSet items = loader.Load("xpath:/*/tel");
    List<String> values = items.FormatAs("+1 404 505 6060");
    // overloaded methods let you override the topes and/or validate the data
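The flow on this slide (a topesheet binds XPaths to topes, then the client asks for values in a target format) can be approximated in Python. This is a rough sketch and not the authors' library: `parse_topesheet`, `select`, and `format_tel` are hypothetical names, the rule grammar is inferred only from the two rules shown, the tope URLs are never fetched, and `format_tel` is a hard-coded stand-in for a tope's transformation into the “+1 404 505 6060” style.

```python
import re
import xml.etree.ElementTree as ET

XML = """<mydoc><whatever>
<tel>233-222-3040</tel><date>11-Jan-96</date>
<tel>(203)484-2030</tel><date>12/30/2007</date>
</whatever></mydoc>"""

TOPESHEET = """
xpath:/mydoc/whatever/date{tope:url(http://www.w3c.org/topes/date_EN.xml);}
xpath:/mydoc/whatever/tel{tope:url(http://myserver.com/custom_tel.xml);}
"""

# Assumed rule shape: xpath:<path>{tope:url(<url>);}
RULE = re.compile(r"xpath:(?P<path>[^{\s]+)\s*\{\s*tope:url\((?P<url>[^)]+)\);\s*\}")

def parse_topesheet(text):
    """Map each XPath in the sheet to the URL of the tope bound to it."""
    return {m["path"]: m["url"] for m in RULE.finditer(text)}

def select(root, path):
    """Evaluate a simple absolute path such as /mydoc/whatever/tel.

    ElementTree elements only accept relative paths, so the leading
    root tag is checked and stripped by hand.
    """
    parts = path.strip("/").split("/")
    if parts[0] != root.tag:
        return []
    return root.findall("/".join(parts[1:])) if len(parts) > 1 else [root]

def format_tel(raw):
    """Stand-in for a tope transformation: render as '+1 AAA BBB CCCC'."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits: {raw!r}")
    return f"+1 {digits[:3]} {digits[3:6]} {digits[6:]}"

rules = parse_topesheet(TOPESHEET)
root = ET.fromstring(XML)
tels = [format_tel(e.text) for e in select(root, "/mydoc/whatever/tel")]

print(rules["/mydoc/whatever/tel"])  # URL of the tope bound to <tel>
print(tels)                          # both numbers in the target format
```

Keeping the XPath-to-tope binding in a separate sheet means the client code names only an example of the desired output format, not the formats the data arrived in.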


Summary of findings

Proposition | 1. Clipboard | 2. Web macros | 3. Web services
Main sources of difficulty | Windows API | Reading the CoScripter code; interfacing to our topes library | Web services becoming unavailable
Topes can handle the kinds of strings | Yes | Yes | Yes
Topes useful at all three loci | Connector | CoScripter component (acts as connector between websites) | Sender or receiver of data
Topes simplify reformatting code | Yes | No… needed interface code | Yes


Conclusion

• Software elements can use varying formats
  – No explicit references to format identifiers
  – No need for ontology consensus

• Topes are reusable for data in…
  – XML nodes
  – Database tuples
  – HTML tags
  – Webform fields
  – Spreadsheet cells
  – …and more

• Main challenge is interfacing to library across languages


Thank You…

To ICISA for this opportunity to participate