EUFORBIA Open Meeting, Leatherhead 13/09/02 Slide 1

Centre National de la Recherche Scientifique

The EUFORBIA project (IAP 26505)
GENERAL ASPECTS

Gian Piero ZARRI

CNRS
44, rue de l’Amiral Mouchez
75014 PARIS
France
E-mail: [email protected]

Slide 2

OUTLINE

• General principles;

• The EUFORBIA labels;

• The EUFORBIA ontology;

• The NKRL filtering techniques;

• The Milan Model;

• Conclusion.

Slide 3

General Principles (1)

The Partners:

• Centre National de la Recherche Scientifique (CNRS), co-ordinator, France (subcontractor: Maison des Sciences de l’Homme, France);

• AXON, Instituto de Informação Normativa Avançada, Portugal;

• Department of Computer Science of the University of Milan, Italy;

• PIRA International, New Media Department, UK.

Slide 4

General Principles (2)

THE EUFORBIA PROJECT:

characterised by a multi-strategy approach (use of two conceptual models, NKRL and the Milan Model) pursuing a single objective (creation of advanced filtering techniques) and making use of co-ordinated tools.

Slide 5

General Principles (3)

Objectives of the project (from the original proposal):

…contribute to the production and use of new generations of Internet filtering systems, more powerful and flexible than the existing ones, and easier to adapt to the cultural, political or religious differences… (these systems)… :

…should support a computer-effective description of the semantic contents of Web sites that could be simultaneously i) very precise and complete in the description of the issues at stake in a given site; ii) neutral as much as possible with respect to any specific doctrine, ideology or value system;

Slide 6

General Principles (4)

…should provide the users — both the individual consumers or institutional users — with software tools able to make use directly of the neutral descriptions above to set up filtering policies and filtering schemata according to the most different cultural, political, religious etc. options.

A prototype, running software system … will be realised by the consortium under the control of an “EUFORBIA user group”, active during the whole life span of the project.

Slide 7

General Principles (5)

In short, the core idea:

We assume that an in-depth description of the ‘semantic content’ of Internet sites, which is impossible to obtain with the traditional approaches, should allow the implementation of more sophisticated filtering strategies.

A two-year project: 1/1/2001 – 31/12/2002

Slide 8

The EUFORBIA labels (1)

The ‘very precise and complete’ and ‘neutral’ descriptions of the contents of the sites are obtained by adding EUFORBIA labels to these sites, either during their construction or at the moment of a major restructuring. The labels make use of a high-level knowledge representation language, NKRL (Narrative Knowledge Representation Language).

Slide 9

The EUFORBIA labels (2)

The ‘protocol of description’ adopted consists of identifying three standard Sections within an EUFORBIA label:

– a description of the aims of the examined Web site (this Section can be considered the only truly mandatory one);

– a description of some characteristics of the site that could be interesting to record;

– a list of the sub-sections, with a short NKRL description of the main characteristics of each of them.
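
The three-Section protocol can be pictured as a small data structure. The sketch below is only an illustration: the class name `EuforbiaLabel`, its field names and the example URL are assumptions, and NKRL expressions are stored as plain strings rather than parsed structures.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EuforbiaLabel:
    """Hypothetical container for the three Sections of a label."""
    site_url: str
    aims: List[str]                                           # mandatory Section
    characteristics: List[str] = field(default_factory=list)  # optional Section
    sub_sections: List[str] = field(default_factory=list)     # optional Section

    def __post_init__(self) -> None:
        # Only the 'aims' Section is treated as mandatory.
        if not self.aims:
            raise ValueError("an EUFORBIA label needs a non-empty 'aims' Section")


# A label with only the mandatory Section filled in (illustrative content).
label = EuforbiaLabel(
    site_url="http://www.example.org",
    aims=["OWN SUBJ example_internet_site OBJ property_ TOPIC example_topic"],
)
```

Leaving the two optional Sections empty by default mirrors the remark that only the ‘aims’ Section is really mandatory.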

Slide 10

The EUFORBIA labels (3)

Fiesta Online (http://www.fiesta.com.uk)

c10) (ENUM c11 c12 c13)
(the description consists of three parts, i.e., Sections)

c11) OWN SUBJ  fiesta_on_line_internet_site: (http://www.fiesta.com.uk)
         OBJ   property_
         TOPIC (SPECIF dedicated_to (SPECIF internet_posting (SPECIF porno_image heterosexual_)))
         MODAL (SPECIF photo_gallery clickable_colour_photo)
(the ‘aims’ Section: site is devoted to the posting of porno, heterosexual colour photos)

c12) (COORD c14 c15)
(the ‘characteristics’ Section includes two items)

c14) EXIST SUBJ over_18/21_warning: (fiesta_on_line_internet_site)
(the site is labelled with the usual ‘warning’ for adult use)

c15) OWN SUBJ  fiesta_on_line_internet_site
         OBJ   property_
         TOPIC (COORD1 free_site (SPECIF internet_edition (SPECIF fiesta_adult_magazine)))
(the site is a free one and is the on-line version of the Fiesta adult magazine)

Slide 11

The EUFORBIA labels (4)

c13) (COORD c16 c17 c18 c19 …)
(the ‘sub-sections’ Section includes several items)

c16) OWN SUBJ  (SPECIF internet_site_section fiesta_on_line_internet_site)
         OBJ   property_
         TOPIC (SPECIF labelled_as picture_gallery_1 readers_wives_1 one_for_the_ladies_1 i_confess_1 shop_1 link_1)
(the sub-sections have different labels)

c17) OWN SUBJ  (SPECIF readers_wives_1 (SPECIF internet_site_section fiesta_on_line_internet_site))
         OBJ   property_
         TOPIC (SPECIF dedicated_to (SPECIF internet_posting (SPECIF exhibitionist_porno_image (SPECIF woman_1 (SPECIF cardinality_ (SPECIF more_than 50))))))
         MODAL (SPECIF photo_gallery (SPECIF clickable_colour_photo (SPECIF sent_by (SPECIF individual_person_1 (SPECIF cardinality_ several_)))))
(the sub-section ‘wives of the readers’ embodies the exhibitionist porno images of more than 50 women)

c18) BEHAVE SUBJ  woman_1
            MODAL housewife_
(the women are housewives)

c19) BEHAVE SUBJ  individual_person_1
            MODAL (SPECIF reader_ fiesta_magazine)
(the senders are readers of the Fiesta magazine)

Slide 12

The EUFORBIA labels (5)

The software module for the creation of the EUFORBIA labels has been implemented through a collaboration between the CNRS (France) and AXON (Portugal) EUFORBIA teams, taking inspiration from an analogous module realised in another NKRL-based project, CONCERTO (Esprit 29159).

Like all the EUFORBIA software, this module is implemented in Java (JDK 1.3.1).

Slide 13

The EUFORBIA ontology (1)

An important point related to the construction of the EUFORBIA labels concerns the set-up of an EUFORBIA ontology, common to the NKRL and Milan Model components of the project. This ontology has been obtained by adapting the existing NKRL ontology (H_CLASS) to include terms pertaining to the ‘pornography’, ‘violence’ and ‘racism’ domains.

This is a collaborative endeavour involving PIRA International and CNRS.

Slide 14

The EUFORBIA ontology (2)

Two steps:

• In the first (‘lexical level’), PIRA International has collected, e.g., about 600 terms in the pornography domain, organised according to Vickery’s ‘facets’ methodology.

• In the second (‘conceptual level’), CNRS has grouped under a single conceptual label terms that, even if lexically different, actually refer to the same ‘concept’, e.g., ‘prostitute’ and ‘whore’.
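
The move from the lexical to the conceptual level amounts to a many-to-one mapping from surface terms to concept labels. The following sketch illustrates the idea with the two terms quoted above; the concept label `prostitute_` and the function name `concept_for` are assumptions, not names taken from the actual ontology.

```python
from typing import Dict, Optional

# Lexical level: surface terms. Conceptual level: one shared concept label.
# 'prostitute_' is a hypothetical label chosen for this illustration.
TERM_TO_CONCEPT: Dict[str, str] = {
    "prostitute": "prostitute_",
    "whore": "prostitute_",
}


def concept_for(term: str) -> Optional[str]:
    """Return the concept label for a lexical term, or None if it is unknown."""
    return TERM_TO_CONCEPT.get(term.strip().lower())
```

With such a table, filtering rules can be written once against the concept rather than once per synonym.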

Slide 15

The EUFORBIA ontology (3)

Sexual acts Sexual deviation perversion

coitus intercourse copulation fucking bonking humping shagging screwing other, the having it off poke lay get laid fooling around

missionary dog fashion knee trembler

anal sex anal intercourse coitus in ano buggery sodomy bumming browning roger old dirt road back scuttle

Slide 16

The EUFORBIA ontology (4)

Slide 17

The NKRL filtering techniques (1)

NKRL filtering techniques: based on the ‘search patterns’ approach.

Search patterns: formal NKRL structures corresponding, in a sense, to natural language queries.

Search patterns supply the general framework of information to be searched for, by filtering or unification, within a knowledge base of NKRL data structures (here, of EUFORBIA labels).

Slide 18

The NKRL filtering techniques (2)

FUM (Filtering/Unification Module)

Developed in the CONCERTO project, FUM allows the direct unification of an NKRL search pattern with the knowledge base of NKRL structures (of EUFORBIA labels).

This module already includes a first, simple level of inferencing. The unification takes into account the fact that a ‘generic concept’ in the search pattern can unify with one of its ‘specific concepts’ (or with an instance) in the target NKRL structure. ‘Generic’ and ‘specific’ refer to the organisation of the NKRL (EUFORBIA) ontology.
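
The generic/specific rule can be sketched as a walk up a concept hierarchy. This is a minimal illustration and not the FUM code: the `PARENT` table, its hierarchy links and the function name `subsumes` are assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical fragment of a concept hierarchy: child (or instance) -> parent.
PARENT: Dict[str, str] = {
    "graphical_image": "visual_content",
    "visual_content": "information_content",
}


def subsumes(generic: str, specific: str) -> bool:
    """True if `generic` equals `specific` or is one of its ancestors."""
    current: Optional[str] = specific
    while current is not None:
        if current == generic:
            return True
        current = PARENT.get(current)  # climb one level; None at the root
    return False
```

A pattern term unifies with a label term whenever `subsumes(pattern_term, label_term)` holds, so a generic term in the pattern also catches its more specific descendants.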

Slide 19

The NKRL filtering techniques (3)

Using the FUM module in EUFORBIA: a rule stating that the viewing of sites including racist symbolism will not be accepted will prevent the loading of the Stormfront White Pride site, on the basis of the unification of its left-hand side:

(?w IS-NKRL-OCCURRENCE
    :predicate EXIST
    :SUBJ (SPECIF visual_content racist_symbol)
    :location of the SUBJ: internet_site)

with information included in the ‘characteristics’ Section of the EUFORBIA label associated with the site:

c24) EXIST SUBJ (SPECIF graphical_image racist_symbol (SPECIF cardinality_ few_)): (stormfront_internet_site)
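
The unification of this search pattern with occurrence c24 can be approximated as follows. This is a simplified sketch under stated assumptions: occurrences are flattened into dictionaries, the SPECIF list is treated as a flat list of terms, the cardinality information is dropped, and the two hierarchy links in `PARENT` (`graphical_image` under `visual_content`, `stormfront_internet_site` as an instance of `internet_site`) are assumed rather than taken from the actual ontology.

```python
from typing import Dict, List, Optional, Union

Occurrence = Dict[str, Union[str, List[str]]]

# Assumed hierarchy links (child or instance -> parent).
PARENT: Dict[str, str] = {
    "graphical_image": "visual_content",
    "stormfront_internet_site": "internet_site",
}


def subsumes(generic: str, specific: str) -> bool:
    """True if `generic` equals `specific` or is one of its ancestors."""
    current: Optional[str] = specific
    while current is not None:
        if current == generic:
            return True
        current = PARENT.get(current)
    return False


def unify(pattern: Occurrence, occurrence: Occurrence) -> bool:
    """Check predicate, location and SUBJ terms of a flattened occurrence."""
    if pattern["predicate"] != occurrence["predicate"]:
        return False
    if not subsumes(pattern["location"], occurrence["location"]):
        return False
    # Every SUBJ term of the pattern must subsume some SUBJ term of the label.
    return all(
        any(subsumes(p, t) for t in occurrence["subj"])
        for p in pattern["subj"]
    )


pattern: Occurrence = {
    "predicate": "EXIST",
    "subj": ["visual_content", "racist_symbol"],
    "location": "internet_site",
}
c24: Occurrence = {
    "predicate": "EXIST",
    "subj": ["graphical_image", "racist_symbol"],
    "location": "stormfront_internet_site",
}
blocked = unify(pattern, c24)
```

Here the pattern succeeds even though it mentions `visual_content` while the label mentions `graphical_image`, which is the generic/specific behaviour of the unification.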

Slide 20

The Milan Model (1)

• The Milan Model is a content-based access control mechanism, initially defined for Digital Libraries (DLs) and well suited to the Web environment.

• It generalises the approaches based on the PICS standard and makes them more flexible by organising the important notions of the domain into a hierarchy of concepts instead of a set of simple keywords.

• It manages filtering policies with respect to user characteristics.

Slide 21

The Milan Model (2)

Main features:

• content-based filtering of Web documents;

• flexible specification of filtering policies, based on the qualification of users rather than on user identities;

• positive and negative privileges at different granularity levels;

• exception management and policy propagation;

• support for PICS content labels (in particular, ICRA/RSACi content labels).
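
The interplay of these features can be illustrated with a small sketch. It is not the Milan Model implementation: the `Policy` class, the `decide` function, the concept names `racism_` and `harmful_content`, and the conflict-resolution rule (the most specific applicable policy wins, the first listed wins on ties, and access is granted by default) are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical concept hierarchy: child -> parent.
PARENT: Dict[str, str] = {
    "racist_symbol": "racism_",
    "racism_": "harmful_content",
}


@dataclass
class Policy:
    applies_to: Callable[[Dict[str, object]], bool]  # test on user qualifications
    concept: str                                     # concept the policy targets
    allow: bool                                      # True = positive privilege


def distance(ancestor: str, concept: str) -> Optional[int]:
    """Number of steps from `concept` up to `ancestor`, or None if unrelated."""
    steps = 0
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return steps
        current = PARENT.get(current)
        steps += 1
    return None


def decide(user: Dict[str, object], concepts: List[str],
           policies: List[Policy], default: bool = True) -> bool:
    """Grant access only if no concept of the document is denied to the user."""
    for concept in concepts:
        best = None  # (distance, policy) of the most specific applicable policy
        for policy in policies:
            if not policy.applies_to(user):
                continue
            d = distance(policy.concept, concept)  # propagation down the hierarchy
            if d is not None and (best is None or d < best[0]):
                best = (d, policy)
        if not (best[1].allow if best else default):
            return False
    return True


policies = [
    # Negative privilege, propagated to every concept below 'harmful_content'.
    Policy(lambda u: True, "harmful_content", allow=False),
    # Exception: a more specific positive privilege for qualified users.
    Policy(lambda u: u.get("role") == "researcher", "racism_", allow=True),
]
```

Policies are keyed on user qualifications (here a `role` attribute) rather than on user identities, and the exception overrides the propagated denial because it targets a concept closer to the one found in the document.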

Slide 22

The Milan Model (3)

• An extended version of the Milan Model has been specified according to the requirements of the EUFORBIA framework.

• The Extended Milan Model makes use of the EUFORBIA Conceptual Hierarchy and applies filtering rules to the set of concepts contained in the NKRL EUFORBIA labels.

• The Extended Milan Model has been implemented in a prototype system, using Oracle 8.1.7 DBMS and Java.

Slide 23

Conclusion

Main results obtained so far (September 2002):

• A completely new implementation of the Annotation Manager (for the EUFORBIA labels);

• A full version of the EUFORBIA ontology (including ‘pornography’, ‘violence’ and ‘racism’);

• A new Java version of the Milan Model;

• A full version of the EUFORBIA/NKRL filtering environment.