The Business of Personal Knowledge

Mark GREGORY

Department of Finance and Operations, ESC Rennes School of Business

35065 Rennes Cedex, France

and

Dr. Mario NORBIS

Department of Management, Quinnipiac University

Hamden, Connecticut 06518, USA

Conference on Knowledge, Culture and Change in Organisations

Cambridge University, UK

5-8 August 2008

Abstract

Knowledge and information workers work as individuals within virtual team structures. As individuals and as team members, they acquire information, which they store in a number of complex ways: some

paper-based, but increasingly computer-based. There are a number of computer-based tools, sometimes

referred to as Personal Information Managers or PIMs (Kelly 2006 and Teevan et al. 2006), which can assist

in the storage and management of such information. However, little is understood about how people use

these tools, how they learn new ones, the ways in which the tools constrain how people work and think,

and how best to educate people to make the right choice of the right tools. The underlying hypothesis of

the research-in-progress presented in this paper is that individuals working in groups should be

encouraged and educated to make better use of the available tools, and that the tools themselves should

evolve into better ways of representing information and knowledge.

The object of this paper is to present a limited view of current trends in the academic and practitioners’

literature in the areas of knowledge representation and communication by individuals and small groups

(Boardman et al. 2004) in search of a better understanding of the way people use these tools and

learn new ones, in order subsequently to find strategies on how best to educate people to make the right choice of the right tools. The paper suggests a classification scheme for these tools based primarily on

their data representation: e.g. spreadsheet, relational database and semantic web represented at the

desktop level (Sauermann et al. 2005). Specific difficulties associated with certain of these data

representations are identified. The paper also suggests that a judicious mix of existing and emerging

techniques and tools will permit evolution or revolution in the management of individual and shared

information and knowledge.

Keywords: PIM, GIM, Classification, Knowledge Representation, Semantic Web

1. Introduction

Knowledge and information workers work as individuals within virtual team structures. As individuals

and as team members, they acquire information, which they store in a number of complex ways: some

paper-based, but increasingly computer-based. There are a number of computer-based tools, sometimes

referred to as Personal Information Managers or PIMs (Kelly 2006 and Teevan et al. 2006), which can

assist in the storage and management of such information. However, little is understood about how

people use these tools, how they learn new ones, the ways in which the tools constrain how people work

and think, and how best to educate people to make the right choice of the right tools.

Our earlier paper (Gregory M.R. & Norbis M. 2008) presented the hypotheses that individuals working

in groups should be encouraged and educated to make better use of the available tools for information

management and that the tools themselves should evolve into (or be replaced by) better ways of

representing information and knowledge. In that paper, we started to classify and evaluate the

effectiveness of existing tools and techniques by firstly summarising current trends in the academic and

practitioners’ literature in the areas of knowledge representation and communication by individuals and

small groups; and then proposing a methodology for evaluating them in the tradition of the systems

approach originally formulated by Churchman 1968. Our earlier paper suggested a classification

scheme based primarily on their data representation.

The object of this paper is to present a limited view of current trends in the academic and practitioners’

literature in the areas of knowledge representation and communication by individuals and small groups

(Boardman et al. 2004) in search of a better understanding of the way people use these tools and

learn new ones, in order subsequently to find strategies on how best to educate people to make the right

choice of the right tools.

This paper begins to develop a multidimensional classification scheme for these tools, based not only on

Their data representation, e.g. spreadsheet, relational database, semantic web represented at the desktop level (Sauermann et al. 2005). This dimension was suggested in our earlier paper and is developed here. It is the “how to” of personal information management.

but also on

The various functionalities (useful features) these tools offer. This is the “what” of personal information management.

Issues of usability and of user acceptability. This is the “why” (and why not!) of personal information management.

At this stage, we are setting out a research agenda; we do not yet have full answers to the questions that

we will present today and which will form the basis of our further work over the next couple of years.

2. Representation of personal data

The ways in which data is stored on a computer influence how it can subsequently be used. We

therefore identify several possible, or candidate, data representation approaches and analyse the

consequences of choosing them.

2.1. Personal information management: a brief recap

Many of us keep a wide range of personal data, which we classify or sub-divide into areas such

as:

Agenda: list of appointments

Address book: our contacts

To Do list

Most attendees at this conference keep more specialised (but still widely-used) data such as

Bibliography: reference list

Reading notes

Project logbook

Some of us do this primarily on paper, in spiral notebooks or perhaps in more-specialised

diaries and the like.

Many of us also or alternatively use personal computers (desktop or notebook), digital PDA

(Personal Digital Assistant) devices or smartphones.

Some of us work in contexts where this kind of information is no longer exclusively ours, and

we choose to (or are obliged to!) share and merge this kind of information.

All of us are of course very careful to copy this personal data from one device to another, in

order to safeguard it from corruption or loss. Some of us take additional care to synchronise

this data; that is, when we store a new contact detail on our smartphone, we subsequently

synchronise it into our desktop environment. An obligation to share this kind of data arises if

we have a secretary or administrative assistant who also collects this kind of data on our

behalf.

2.2. Personal Information Management: Make or Buy?

2.2.1. Basic data management tools used for personal information management include spreadsheets and databases

Spreadsheets are a very powerful combination of two things: the nearest approach to widely

available end-user computer programming ever invented; and a way of storing (more

or less) structured data in which the relationship between items of data is imposed by

the use of formulae.

Later in this document, we will point out some of the shortcomings of spreadsheets,

which are the flip-side, the obverse, of their expressive power.

Databases generally have a more limited remit which they fulfil with greater precision

than do spreadsheets. The most widely accepted, implemented and used type of

database is the so-called “relational” database (Date 2003). Date suggests as an informal

initial definition that

“

A relational system is one in which the data is perceived by the user as tables

(and nothing but tables); and the operators at the user’s disposal (e.g. for data

retrieval) are operators that generate new tables from old. For example, there

will be one operator to extract a subset of the rows of a table, and another to

extract a subset of the columns – and of course a row subset and a column

subset of a table can both be regarded as tables themselves. The reason such

systems are called ‘relational’ is that the term ‘relation’ is essentially just a

mathematical term for a table.

”
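The two operators mentioned in this definition (a row subset and a column subset, each yielding a new table) can be sketched in a few lines. This is only an illustration under our own assumptions: the function names restrict and project, the representation of a table as a list of dictionaries, and the sample rows are all invented for the purpose and are not taken from Date.

```python
from typing import Callable

Row = dict[str, object]
Table = list[Row]


def restrict(table: Table, predicate: Callable[[Row], bool]) -> Table:
    """Return a new table holding only the rows that satisfy the predicate."""
    return [dict(row) for row in table if predicate(row)]


def project(table: Table, columns: list[str]) -> Table:
    """Return a new table holding only the named columns, without duplicate rows."""
    seen: set[tuple] = set()
    result: Table = []
    for row in table:
        key = tuple(row[c] for c in columns)
        if key not in seen:  # a relation is a set, so duplicates are dropped
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Invented sample data (hypothetical contacts).
contacts: Table = [
    {"name": "Andrea", "city": "Rennes", "year": 2007},
    {"name": "Chantal", "city": "Hamden", "year": 2007},
    {"name": "Gabrielle", "city": "Rennes", "year": 2008},
]

in_rennes = restrict(contacts, lambda r: r["city"] == "Rennes")
cities = project(contacts, ["city"])
names_in_rennes = project(in_rennes, ["name"])  # operators compose: tables in, tables out
```

Because every operator returns a table, the result of one can be fed directly into another, which is the property the definition emphasises.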

It is possible to use spreadsheets and database management systems as the means by

which personal data is stored, in other words, as the means by which a given

individual carries out personal information management. In effect, the computer user

who chooses this approach is making their own specialised lists of the data that is

important to them.

The choice between spreadsheet and database is actually not straightforward, and

many computer users make an inappropriate choice based on imagined or real self-

imposed constraints. These constraints include the competence that the user has with

such tools.

What spreadsheets are good at

Spreadsheets combine conceptual simplicity, very powerful data manipulation and analysis facilities, and good information presentation facilities

Spreadsheets seem to most end-users to be easier to design and to use than do databases

It is comparatively easy to evolve the form of a spreadsheet as the context of its use changes

Functions make it easy to use previously programmed data analytical techniques

It is possible to program new functions, or to have them written for you so that you can use a specific data analytical technique

Recent spreadsheet packages have excellent information presentation facilities and they also connect very well to other office programs such as word processors and presentation graphics programs

Some problems with spreadsheets, and some indications of why databases may be “better”

Spreadsheets are by their very nature highly insecure – anyone who can access a spreadsheet can see all the data in that spreadsheet; industrial-strength databases can make it impossible for users who lack the necessary privileges to change, or even to see, data

Spreadsheets can rapidly become very complex, and it is very difficult to understand what the overall structure of the spreadsheet is; as a result, they can become a nightmare to maintain

It is difficult for more than a very small number of people to use a single spreadsheet at one time, and almost impossible to stop them from interacting with each other, often in a conflicting way

In practice, spreadsheets become unwieldy beyond a few thousand records; databases can handle millions

Databases can support tens, or even thousands, of simultaneous users

Personal (small-scale) database management programs exist, but are often badly used.

The best-known examples are Microsoft Office Access and OpenOffice.org Base.

2.3. Candidate data management approaches: Spreadsheets

Spreadsheets consist of an array of cells, each of which can store a value or a formula. A

formula relates the value of the current cell to other cells which can be considered as exporting

their value to be used in the formula.
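The cell model just described can be illustrated with a toy sketch. It is a deliberate simplification under our own assumptions (the class name Sheet, cell references as plain strings, formulas as Python functions, no detection of circular references) and does not describe how any real spreadsheet package is implemented.

```python
from typing import Callable, Union

Value = float
Formula = Callable[["Sheet"], Value]


class Sheet:
    """A toy spreadsheet: each cell holds either a value or a formula."""

    def __init__(self) -> None:
        self.cells: dict[str, Union[Value, Formula]] = {}

    def set(self, ref: str, content: Union[Value, Formula]) -> None:
        self.cells[ref] = content

    def get(self, ref: str) -> Value:
        content = self.cells[ref]
        # A formula is recomputed from the cells it reads each time it is asked for.
        return content(self) if callable(content) else content


sheet = Sheet()
sheet.set("A1", 120.0)
sheet.set("A2", 80.0)
sheet.set("A3", lambda s: s.get("A1") + s.get("A2"))  # plays the role of =A1+A2

total_before = sheet.get("A3")
sheet.set("A1", 150.0)  # changing an input changes every cell that depends on it
total_after = sheet.get("A3")
```

The relationship between A3 and its inputs exists only inside the formula; nothing else in the sheet records it, which is one reason the overall structure of a large spreadsheet is hard to see.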

2.3.1. Spreadsheets in general

Dan Bricklin (Bricklin 1981) originated VisiCalc, the first application that turned the

personal computer from a hobby for computer enthusiasts into a business tool.

VisiCalc went on to become the first "killer app", an application that was so

compelling, people would buy a particular computer just to own it. In this case the

computer was the Apple II.

The acceptance of the IBM PC following its introduction in August, 1981, began

slowly, because most of the programs available for it were ports from earlier 8-bit platforms. Things changed dramatically with the introduction of the Lotus 1-2-3

spreadsheet package in January, 1983. It became the PC platform's so-called killer

app, and drove sales of the PC due to the improvements in speed and graphics

compared to VisiCalc. See Lotus Symphony 2008.

Investigations in various organisations suggest anecdotally that typical knowledge

workers possess tens or hundreds of spreadsheets. (There are 3615 spreadsheet files

on the hard disk of one of the authors.)

Figure 1 shows a spreadsheet being used to store bibliographic data, in fact, the list of

references on which this document is based:

Figure 1: Spreadsheet being used to store personal research data

2.3.2. Problems associated with spreadsheets

There are many problems associated with spreadsheets.

Panko, Raymond R. 1998 suggests that

“

Many spreadsheets are large and complex, and development often involves

interactions among multiple people. In fact, we would guess that the largest

portion of large-scale end user applications today involve spreadsheet

development.

In recent years, we have learned a good deal about the errors that people

make when they develop spreadsheets. In general, errors seem to occur in a

few percent of all cells, meaning that for large spreadsheets, the issue is how

many errors there are, not whether an error exists. These error rates, although

troubling, are in line with those in programming and other human cognitive

domains. In programming, we have learned to follow strict development

disciplines to eliminate most errors. Surveys of spreadsheet developers

indicate that spreadsheet creation, in contrast, is informal, and few

organizations have comprehensive policies for spreadsheet development.

Although prescriptive articles have focused on such disciplines as

modularization and having assumptions sections, these may be far less

important than other innovations, especially cell-by-cell code inspection

after the development phase.

”

Ventana Research 2007 reports on a survey they undertook of actual user experience

in the use of spreadsheets. In addition to the observations already generally accepted

concerning spreadsheets (that they are error-prone and difficult to use in a team

context) they add the observations that they are often used for tasks to which they are

badly-adapted because they are perceived as free (ignoring the hidden costs which

then follow); and that they are difficult to combine, especially between enterprises.

In our opinion, these problems often stem from the fact that there is no accepted

methodology to define and document the requirements of a spreadsheet.

Ventana Research 2007 suggest also that the need to audit spreadsheets may push

organisations in a direction they consider advisable, that of identifying or creating

formal applications which supplant spreadsheets for some of their common uses –

notably budgeting, calendar management and the like.

Spreadsheets are frequently misapplied to relatively large business problems to which they are badly-adapted. Indeed, Figure 1 shows a spreadsheet being used to store

bibliographic data. There are some clear advantages in the approach. In the example

above, a link is made between a reference and a copy of the referenced document

stored on the same computer as the spreadsheet. This is done using a formula whose

use is well understood by many spreadsheet users. But in fact the example only works

because the user of the spreadsheet knows and respects the rules for “well-formed”

references. It is difficult to carry out the complex data validation which should be

imposed on bibliographic detail. In addition, the same formula uses a user-written

function (exists_file) which is, in Excel, expressed in Visual Basic for Applications

(VBA). VBA is a programming language and as such is inaccessible to a large

proportion of spreadsheet users.
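To make the point concrete, the kind of check involved can be sketched outside the spreadsheet. The sketch below is hypothetical: the paper does not give the body of exists_file, so we assume it simply tests whether a file is present; the pattern for a “well-formed” reference key and the naming rule for the stored copy are likewise our own assumptions.

```python
import os
import re
import tempfile

# Assumed convention for a "well-formed" reference key, e.g. "Panko 1998".
REFERENCE_KEY = re.compile(r"^[A-Z][A-Za-z\-]+ (\d{4}|n\.d\.)$")


def exists_file(path: str) -> bool:
    """Assumed behaviour of the user-written function: is there a file at this path?"""
    return os.path.isfile(path)


def check_reference(key: str, folder: str) -> list[str]:
    """Return the problems found for one bibliography row (empty if none)."""
    problems = []
    if not REFERENCE_KEY.match(key):
        problems.append(f"malformed reference key: {key!r}")
    # Assumed naming rule: the local copy is stored as "<key>.pdf".
    if not exists_file(os.path.join(folder, key + ".pdf")):
        problems.append(f"no local copy for: {key!r}")
    return problems


with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "Panko 1998.pdf"), "w").close()
    ok = check_reference("Panko 1998", folder)
    missing = check_reference("Codd 1970", folder)
    malformed = check_reference("panko98", folder)
```

Even this small amount of validation requires a programming language, which supports the observation that such checks are out of reach for many spreadsheet users.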

See Burnett, M. & Atwood, J. & Walpole Djang, R. & Reichwein, J. & Gottfried, H.

& Yang, S. 2001 for a further discussion of spreadsheet shortcomings and other

suggestions of ways forward. See also Burnett, Margaret & Curtis Cook & Omkar

Pendse & Gregg Rothermel & Jay Summet & Chris Wallace 2003 for specific

suggestions on encouraging end users to profit from formal software engineering

methodologies, specifically making assertions about their spreadsheets in order to

achieve greater correctness and greater efficiency.

2.3.3. The role of spreadsheets in personal information management

Spreadsheets are very widely used (and as we have seen, misused) for storing personal

information. The use of informal techniques of sharing spreadsheets and of more

formal techniques such as that suggested by Expresso 2008 mean that spreadsheets

can be used in small-scale group information management systems. Ventana Research

2007 document the pervasiveness of spreadsheets, and confirm their value for

processes such as one-off ad hoc reporting and the prototyping of requirements for

what should subsequently be re-engineered into, or acquired ready made as, formal

applications to support specific management processes.

2.4. Candidate data management approaches: Relational databases

The currently dominant approach, the relational database paradigm originally suggested by

Codd, E. 1970 and expanded upon by Date, Chris J. 2003, enables arbitrary manipulation: that

is to say that queries can be defined which will always have an answer. However, the data is

constrained to appear in normalized relations, sets or entities (terms treated here as equivalent),

which are implemented as database tables.

2.4.1. Advantages of the relational approach

The following brief analysis is taken from Indiana University (n.d.):

“

A database is a collection of data, which is organized into files called tables.

These tables provide a systematic way of accessing, managing, and updating

data. A relational database is one that contains multiple tables of data that relate to each other through special key fields. Relational databases are far

more flexible (though harder to design and maintain) than what are known as

flat file databases, which contain a single table of data.

To understand the advantages of a relational database, imagine the needs of

two small companies that take customer orders for their products. Company

A uses a flat file database with a single table named orders to record orders they receive, while Company B uses a relational database with two tables:

orders and customers.

When a customer places an order with Company A, a new record (or row) in

the table orders is created. Because Company A has only one table of data,

all the information pertaining to that order must be put into a single record.

This means that the customer's general information, such as name and address, is stored in the same record as the order information, such as

product description, quantity, and price. If customers place more than one

order, their general information will need to be re-entered and thus

duplicated for each order they place.

Whenever there is duplicate data, as in the case above, many inconsistencies may arise when users try to query the database. Additionally, a customer's

change of address would require the database manager to find all records in

orders that the customer placed, and change the address data for each one.

Company B is much better off with its relational database. Each of its

customers has one and only one record of general information stored in the

table customers. Each customer's record is identified by a unique customer code which will serve as the relational key. When a customer orders from

Company B, the record in orders need contain only a reference to the

customer's code, because all of the customer's general information is already

stored in customers.

”
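Company B's two-table design can be sketched using the SQLite engine bundled with Python. The table names orders and customers come from the quotation; the column names, the customer code and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_code TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        address TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_code TEXT NOT NULL REFERENCES customers(customer_code),
        product TEXT NOT NULL,
        quantity INTEGER NOT NULL,
        price REAL NOT NULL
    );
    INSERT INTO customers VALUES ('C001', 'A. Customer', '1 Old Street');
    INSERT INTO orders VALUES (1, 'C001', 'Widget', 2, 9.50);
    INSERT INTO orders VALUES (2, 'C001', 'Gadget', 1, 24.00);
    """
)

# A change of address touches exactly one row, however many orders exist.
conn.execute(
    "UPDATE customers SET address = ? WHERE customer_code = ?",
    ("2 New Road", "C001"),
)

# The join re-attaches the customer's details to each order on demand.
rows = conn.execute(
    """
    SELECT o.order_id, c.name, c.address
    FROM orders AS o JOIN customers AS c USING (customer_code)
    ORDER BY o.order_id
    """
).fetchall()
conn.close()
```

In Company A's single-table design, the same change of address would have required one update per order, with the risk of missing some.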

Chen 1976 introduces the analysis and design issues which surround the effective use

of relational databases.

2.4.2. Disadvantages of the relational approach applied to personal and group information management

Freyberg, C.A. 1996 reports that the teaching of relational database design and

construction is a major challenge for teachers of introductory Information Systems

courses. One of the present authors has long experience of the difficulty of teaching end users

and non-technical students to design and use relational databases. Nevertheless, it is

possible to do this and for business students and professionals to design

straightforward relational databases and to implement them using products such as

Microsoft Access and OpenOffice.org Base.

Unfortunately, the ER model for even a simple PIM application is complex and runs

to many entity types. It is unlikely that there is great value in building such a model when these

simpler requirements are well met by existing packaged solutions.

If any attempt is made to extend a model into more specific domains, the model can

become very complex indeed. IFLANET 1998 documents an entity-relationship

model for bibliographic records. The major entities are Work, Expression,

Manifestation, Item, Person, Corporate Body, Concept, Object, Event and Place. The

description of the functional requirements for a system to store this kind of data runs

to 136 pages – for what would be a small part of the information storage requirements

of a librarian or an academic or a student.

Faced with this complexity, a typical response is to eschew the advantages of a user-

specific database and instead to acquire a general PIM application (such as Microsoft

Outlook) and a specialist package for each major type of data to be managed (such as

EndNote 2008 or RefWorks 2008 for bibliographic data).

2.5. Bought-in solution: “PIM” (Personal Information Manager)

Various so-called “PIM” (Personal Information Manager) tools have been developed and

marketed with varying degrees of success. We present a list of over 150 such programs and

services in section 3.4.

Effectively using spreadsheets (or even more so, databases) involves a level of planning and

organisation which not every business professional or knowledge worker can do well. As a

consequence, over the years, a plethora of more-or-less business-focussed application programs

have been devised to ease the task of storing and retrieving personal information such as

contacts (addresses), appointments, tasks and the like. These tools are frequently based on an

underlying relational database, whose existence may be visible to the user or hidden from her.

Currently, the most widely used such tool is Microsoft Outlook, which additionally provides

access to the facilities of an email system by means notably of the user’s email inbox. Outlook

(and similar programs) are widely used to manage a user’s contacts (individuals and

organisations) and the emails received from them and sent to them. The dominance of

Microsoft Outlook in the marketplace can be explained in part by the fact that a free email client with a similar name, Outlook

Express (a separate and more limited product), is shipped with the huge majority of PCs when they are manufactured.

Outlook is typically configured in the enterprise to act as the client or front-end to a server,

frequently but not exclusively Microsoft Exchange Server (see next section). Outlook

integrates particularly effectively with Microsoft’s Office software suite (Microsoft Office

2007), which is currently by far the most widely used office suite. Office incorporates many

programs (depending upon its version), including the word processor program Word, the spreadsheet program Excel, the presentation graphics program PowerPoint, and sometimes the

relational database Access.

According to a Gartner Group commercial report (Gartner Group 2007) in the large-enterprise

market for corporate email clients and messaging, Microsoft (Outlook integrated with

Exchange Server) still maintains its lead with a 47.8 percent market share, compared to IBM's

42.3 percent (Lotus Domino with Lotus Notes desktop client). In practice, Microsoft has a much larger lead when the huge numbers of standalone PCs which are not integrated into

corporate systems are taken into account. Here, Outlook has a crushing dominance over Notes.

Many more-focussed commercial PIM packages have been proposed over the years, but none

has been able to impose itself in the market in the face of the simple reality that Office and

Outlook appear on most corporate desktop and laptop computers and an increasing number of

smartphones. However, Outlook is not as such a PIM, and provides limited PIM functionality somewhat grudgingly (authors’ evaluation). Outlook offers good email management facilities,

adequate contact (address) management, and facilitates an arguably-lazy but very widespread

approach to time management, that of using the email in-tray as a way of tracking unfinished

tasks (as a ‘to do’ list). See Whittaker, Steve & Victoria Bellotti & Jacek Gwizdka 2006.

In the open-source world, the Lightning and Thunderbird developments have provided an

effective email capability (but little more at this stage, although they have aspirations towards

becoming a more complete PIM). Similarly KDE 2008 has plans to spawn a PIM. Perhaps the

most cogent immediate threat to Microsoft is, however, a combination of the various Google

utilities: Google Docs and, in particular, the very powerful Gmail service.

2.6. Bought-in solution: “GIM” (Group Information Manager)

Various “GIM” (Group Information Manager) tools have been developed and marketed with

varying degrees of success.

The most-established such tool is IBM’s Lotus Domino 2008 family of applications. This

incorporates Lotus Notes, which has been widely used to provide email client and document

storage and retrieval facilities which arguably constitute the basis for group information

management. More recently, Microsoft has introduced a raft of related tools which address the

same basic market need. Based on Microsoft Exchange Server (Microsoft Exchange Server

2007) and Microsoft SharePoint, these tools (just as those proposed by IBM) share as

characteristics:

An emphasis on structuring data and information so as to encourage its sharing and reuse

Dependence on computing professionals to set up and maintain the shared document store and/or database

2.7. Candidate data management approaches: Outlining and Outliners

An outline is a hierarchical way to display related items of text to graphically depict their

relationships. Outlining is a technique which may be implemented in general office programs

or in specific computer programs known as “outliners”. An outliner is a special text editor that

allows text to be structured as an outline. Outliners are typically used for computer

programming, collecting or organizing ideas, Getting Things Done (a time management approach espoused by Allen 2001), or project management. Outlining is the technique widely

used in programs such as Microsoft Office PowerPoint, in which the main headings of a

presentation appear as separate slides and on each slide appear points and sub-points. The same

technique is available in a more powerful but perhaps less widely-used form in word

processing packages such as Microsoft Office Word, which supports a very useful Outline

mode.

An outliner is a program which stores and depicts outlines.

Outliners have a long history as tools on PCs. The best example known to the authors is

NetManage ECCO Pro, which has not been updated by its publisher for over a decade but is

still extensively used and even updated by means of object-code patches (the source code still

being jealously guarded by its publisher). Another well-used program is Micro Logic’s Info Select 2007, described a little below. The internal data structure of these programs is similar. A

data item is given meaning by being shown in its owning hierarchy. Thus a person’s surname is

a component of a composite Contact object.

Realised in Word and formatted in a particular way, an outline has an appearance similar to:

Figure 2: Outline formatted as a hierarchy of points, sub-points, sub-sub-points.

Here, the owner in the hierarchy as shown is 11. Semantic Web. It is the eleventh point in a

document – it is implicitly owned by the document of which it forms a part.

It owns items 11.1, 11.2, 11.3, …

11.3 owns 11.3.1, 11.3.2, …

The owning item for 11.2, 11.3 … is 11.

The relative positioning of an item conveys meaning in that the label of the owner classifies or

otherwise gives contextual information concerning the owned item; and the depth in the

hierarchy gives some idea of the relative importance or significance of the item.
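The ownership relation described here can be sketched as a simple tree in which an item's number is derived entirely from its position. The class name Item, its methods and all labels other than “Semantic Web” are placeholders of our own; the sketch makes no claim about the internal format used by Word or by any outliner.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # compare items by identity, not by content
class Item:
    """One point in an outline; its owner gives it context."""
    label: str
    owner: Optional["Item"] = field(default=None, repr=False)
    owned: list["Item"] = field(default_factory=list)

    def add(self, label: str) -> "Item":
        child = Item(label, owner=self)
        self.owned.append(child)
        return child

    def number(self) -> str:
        """Derive the hierarchical number (e.g. '11.3.1') from position alone."""
        if self.owner is None:
            return ""  # the document itself carries no number
        position = self.owner.owned.index(self) + 1
        prefix = self.owner.number()
        return f"{prefix}.{position}" if prefix else str(position)

    def depth(self) -> int:
        return 0 if self.owner is None else self.owner.depth() + 1


document = Item("Document")
for n in range(1, 11):
    document.add(f"Point {n}")  # placeholders for the ten earlier points
semantic_web = document.add("Semantic Web")
first = semantic_web.add("Sub-point A")
second = semantic_web.add("Sub-point B")
third = semantic_web.add("Sub-point C")
deepest = third.add("Sub-sub-point")
```

Moving an item to a different owner would change both its number and its depth, which is how relative positioning conveys meaning in an outline.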

Part of the genius and the weakness of these programs is that the user has considerable control

over the structuring of data. Both Ecco 1997 and Info Select 2007 permit the definition of

forms to impose some order on anarchy. A second aspect of their genius is that a data item can

participate in more than one hierarchy. Thus for example an appointment for a meeting can

appear in an overall agenda or calendar, but also be linked to the name of each participant in

the meeting. Effectively, the same datum is classified in more than one way. To the extent that

knowledge is a product of the recognition by intelligent agents of connections between

information otherwise not explicitly linked, this kind of tool can be used as a mechanism for

storing relatively unsophisticated knowledge.
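The idea that one datum can be classified in more than one way can be sketched by storing each item once and letting several named sets refer to it. The class name Store, the folder names and the sample items are invented for illustration; this is not a description of the actual file format of Ecco or Info Select.

```python
from collections import defaultdict


class Store:
    """Items are stored once; folders (named sets) refer to them by id."""

    def __init__(self) -> None:
        self.items: dict[int, str] = {}
        self.folders: dict[str, set[int]] = defaultdict(set)

    def add(self, text: str, *folders: str) -> int:
        item_id = len(self.items) + 1
        self.items[item_id] = text
        for name in folders:
            self.folders[name].add(item_id)
        return item_id

    def in_folder(self, name: str) -> list[str]:
        return [self.items[i] for i in sorted(self.folders[name])]

    def folders_of(self, item_id: int) -> list[str]:
        return sorted(n for n, ids in self.folders.items() if item_id in ids)


store = Store()
# One appointment, filed under the calendar and under each participant.
meeting = store.add("Project review meeting", "Calendar", "Mark", "Mario")
store.add("Send draft paper", "To Do", "Mario")

marios_items = store.in_folder("Mario")
meeting_folders = store.folders_of(meeting)
```

Because the appointment exists only once, a change to its text is seen from the calendar and from every participant's list at the same time.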

To give a flavour of this kind of tool, consider this screen capture from Ecco. In Ecco, a grid

can be superimposed on the outline. The column headers of the grid are the names of folders,

that is, named sets of data values.

Figure 3: Ecco screenshot

This screenshot shows a user’s diary or calendar, and the associated phone-book item for the

current appointment. At the left-hand side of the screen capture is the folder hierarchy.

This program, and others like it, combine very powerful data structuring with relatively easy to

use (and understand) basic PIM “functionality” in terms of diary, contact management and the

like.

2.8. Candidate data management approaches: Mindmaps

Buzan 1996 has highlighted mind maps as a means of diagrammatically representing ideas and

the connections between ideas.

Wikipedia - Mind map (2008) reports that:

“

A mind map is a diagram used to represent words, ideas, tasks, or other items linked

to and arranged radially around a central key word or idea. It is used to generate,

visualize, structure, and classify ideas, and as an aid in study, organization, problem

solving, decision making, and writing.

It is an image-centred diagram that represents semantic or other connections between

portions of information. By presenting these connections in a radial, non-linear

graphical manner, it encourages a brainstorming approach to any given organizational

task, eliminating the hurdle of initially establishing an intrinsically appropriate or

relevant conceptual framework to work within.

A mind map is similar to a semantic network or cognitive map but there are no formal

restrictions on the kinds of links used.

”

Mind maps can be created using software. See for example VisiMap 2008, produced by CoCo

Systems, which the company describes in these terms:

“

The innovative VisiMap Professional 4.1 is a unique creativity- and productivity-

enhancing application for Microsoft Windows® that saves you valuable time in your

day-to-day work and offers you new flexibility in exploring and organising your

thoughts.

It graphically records, structures and clarifies the results of your creativity so that they

can be used, reused and communicated effectively.

Based on the usefulness and simplicity of graphical 'visual maps' (similar to what are

variously called idea maps or brain maps), VisiMap Professional adds efficient data

entry, automatic layout, striking presentation, powerful map structuring, manipulation,

and printing features, and sophisticated document import and export facilities to create

an invaluable asset that produces visual solutions to all kinds of business and personal

applications.

”

In VisiMap, an outline can be presented both diagrammatically as a mind map and also as a

text outline in Microsoft Word format.

Mind map software such as VisiMap is frequently used in personal information management applications.

The screenshot below, taken from early material on a planned enhancement to the SQLNotes

2008 PIM, gives the flavour of how such information looks when presented as a mindmap

(Buzan 1996):

Figure 4: Outline formatted as a mind map

2.9. Summary: making and buying personal and group information management

A personal information management solution can be made using spreadsheets. This should normally be reserved for one-off ad hoc reporting and for prototyping requirements for what will subsequently be re-engineered into, or acquired ready-made as, formal applications to support specific management processes. It is also possible for business students and professionals to design straightforward relational databases and to implement them using products such as Microsoft Access and OpenOffice.org Base; this can be done for small subsets of personal information which are specialised or very important to the user.

Buying (or otherwise acquiring) a specialist PIM or GIM tool is arguably a much more sensible way of managing personal data than devising complex spreadsheets or comprehensive databases.

But only a small proportion of knowledge workers buy PIMs, and even fewer persist in using them. Why?

3. PIM Functionality: What PIMs do

This section first summarises the meaning of data, then lists the PIMs identified by the authors and begins to identify and classify their associated functionality, that is, what users can do with them.

3.1. The meaning of data: semantics

Making lists and storing them is not rocket science. In fact, it isn't even science. A list is only as useful as the meaning it conveys. Consider this list of (what most of us will read as) girls' names:

Andrea 2007

Chantal 2007

Gabrielle 2007

What is this? Three members of a hockey team?

The addition of a column heading changes the story a little:

Hurricane name    Year used
Andrea            2007
Chantal           2007
Gabrielle         2007

What we have done is to classify the data, by naming the sets. The process of labelling or naming data gives so-called semantic significance to the data. To be meaningful, data needs

syntax (rules for content and formatting) and semantics (rules for meaning). An alternative and

equivalent formulation is that data needs metadata to give it significance. Classification is

fundamental to science and to knowledge.

3.2. Structure and meaning

3.2.1. To make use of any computer-based personal information management tools, we have to “structure” our data

Computer users voluntarily sacrifice freedom in favour of structure in order to facilitate storage, retrieval, and especially more precise querying (answering ad hoc questions about the data) and communication; but they still do not achieve the level of communication that they strive for.

In order to use computers we have traditionally needed to sacrifice, that is to limit, the expressiveness of the information stored, where expressiveness is defined as the ability to communicate meaning.

Well-structured data can be queried with greater precision; that is, more accurate and

complete answers can be obtained to questions about the data.

To illustrate this point, we extend the example above:

Hurricane name    Year used    Meteorologist
Andrea            2007         John Smith
Chantal           2007         Methuselah Gabrielle
Gabrielle         2007         Chantal Legros

With data structured in this way, we can achieve precise answers to different queries:

Which hurricanes have been named “Chantal”?

Answer: one - Chantal

Which hurricanes have been named by a meteorologist called “Chantal”?

Answer: one - Gabrielle

Which hurricanes have been named after the meteorologist?

Answer: none

Note that free-text searching of the content alone, without taking into account the

structure of the data, would give imprecise (inaccurate) answers.
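The contrast between structured querying and free-text searching can be sketched in Python using the extended hurricane table. This is a minimal illustration under stated assumptions: the field names (`name`, `year`, `meteorologist`) and the function names are introduced here for the example, and a meteorologist's first name is assumed to be the first word of the stored full name.

```python
# The extended hurricane table from the text, one dict per row.
hurricanes = [
    {"name": "Andrea", "year": 2007, "meteorologist": "John Smith"},
    {"name": "Chantal", "year": 2007, "meteorologist": "Methuselah Gabrielle"},
    {"name": "Gabrielle", "year": 2007, "meteorologist": "Chantal Legros"},
]

def named(rows, name):
    """Structured query: hurricanes whose name field equals `name`."""
    return [r["name"] for r in rows if r["name"] == name]

def named_by(rows, first_name):
    """Structured query: hurricanes named by a meteorologist with this first name."""
    return [r["name"] for r in rows
            if r["meteorologist"].split()[0] == first_name]

def named_after_meteorologist(rows):
    """Structured query: hurricanes whose name is part of their own meteorologist's name."""
    return [r["name"] for r in rows if r["name"] in r["meteorologist"].split()]

def free_text(rows, word):
    """Free-text search: any row mentioning `word` in any field."""
    return [r["name"] for r in rows
            if any(word in str(value) for value in r.values())]

print(named(hurricanes, "Chantal"))           # ['Chantal']
print(named_by(hurricanes, "Chantal"))        # ['Gabrielle']
print(named_after_meteorologist(hurricanes))  # []
print(free_text(hurricanes, "Chantal"))       # ['Chantal', 'Gabrielle']
```

The free-text search returns two rows for "Chantal" because it cannot tell a hurricane name from a meteorologist's name; only the field labels make the three precise answers possible.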

But how do we express meaning? After 50 years of “advances” in Computer Information Science we still do not know how to do this particularly well – and the situation is arguably worst at a most crucial point for productivity – the work of the individual knowledge worker, who is provided only with basic tools in which integration remains unintuitive. Indeed, each tool tends to highlight one or two information storage and presentation techniques to the exclusion of others. The knowledge worker then resorts to approaches such as managing tasks by leaving emails in the inbox and keeping lists in linked spreadsheets. This creates isolated islands of under-managed and difficult-to-integrate data.

3.2.2. Structure imposed centrally is essential in some contexts and inimical in others

We sacrifice freedom in favour of structure in order to facilitate storage, retrieval, and especially querying (answering ad hoc questions about the data) and some aspects of communication; but we still do not necessarily achieve the level or effectiveness of communication that we strive for.

Some data is very clearly the property of a worker's employing enterprise, and some needs to a greater or lesser degree to be held and managed centrally. Standards vary widely according to the objectives and style of the organisation. A worker in a client call centre may not be permitted to store any data locally on a company-owned computer. Conversely, universities may actively encourage information sharing. More common perhaps is controlled shared information – as in medical practice or business consulting. Many organisations seek to impose a standard way of capturing and storing data, which meets some purposes but defeats others.

3.3. Data storage techniques and their associated metadata – first list

If we revisit some of the techniques used for storing personal information, we see that somewhat different linguistic rules and resulting expressiveness are associated with each. The table gives some examples:

Technique: Spreadsheets
Metadata: Pragmatic – the meaning of the data is not explicit, but is partially expressed in column and/or row headings, and partially in relationships between cells.
Expressiveness and precision: Potentially very expressive and frequently imprecise or even contradictory. Charting permits visually arresting representations of some of the underlying data.

Technique: Relational databases
Metadata: If the data is normalised (Codd 1970; Date 2003), then the column headings name sets of atomic (non-divisible) data items. This is deliberately constricting, because human-readable metadata, in the form of a natural language description (name) for each attribute, can be exploited by users as they enquire from the data, enabling precise answers to questions they have. These names can be extended by a data dictionary (which, however, is often not accessible to the end-user of the data in the database).
Expressiveness and precision: Deliberately very restricted expressiveness. All data is constrained to appear as tables to permit generality and precision of subsequent querying. The results of queries are themselves virtual tables constructed from the original input data.

Technique: Outlining and Outliners
Metadata: The relative positioning of the items in a hierarchy groups and classifies data, and associates meaning with each group and sub-group. The addition of a grid permits further structuring.
Expressiveness and precision: Hierarchies themselves are cognitively powerful or not depending on the prior training of the user. The addition of a grid adds expressiveness.

Technique: Mindmaps
Metadata: The relative positioning of the items in a diagram groups and classifies data, and associates meaning with each branch and sub-branch. An image is (potentially) associated with each branch or sub-branch.
Expressiveness and precision: Visually very powerful: the user perceives both structure and meaning. Querying is very imprecise or non-existent.

Table 1: Data storage techniques and their associated metadata – first list

We suggest that the relationship between Generality and Meaning/Focus is a trade-off. We can express it mathematically as G * M = constant, in the same way that Boyle's law gives P * V = constant for a gas at constant temperature. If we graph this speculation, we get:

[Chart: Trade-off, Focus versus Generality. Horizontal axis: Generality: expressiveness (1 to 7). Vertical axis: Focus: precision of querying (0 to 1,2).]

Figure 5: Posited relationship between generality and meaning (or focus)
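The posited relationship G * M = constant can be made concrete with a small calculation. The sketch below assumes that the constant equals 1 and that generality takes the integer values 1 to 7 shown on the horizontal axis of Figure 5. Both choices, and the names `K` and `focus`, are assumptions made for illustration, since the paper does not state the values used to draw the figure.

```python
K = 1.0  # assumed value of the constant in G * M = constant

def focus(generality, k=K):
    """Focus M (precision of querying) implied by G * M = k."""
    return k / generality

# Tabulate the curve for generality values 1 to 7.
for g in range(1, 8):
    print(g, round(focus(g), 2))
```

Under these assumptions focus halves each time generality doubles, which gives the hyperbolic shape of a trade-off curve.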

There may also be user indifference curves of a similar shape relating features and usability.

These speculations will be developed as hypotheses in subsequent research.

3.4. Some PIM packages

The table which follows lists the PIM (and GIM) packages which we have so far been able to identify. The sources include Wikipedia - Personal information management 2008, Keeping Found Things Found 2008 and our own developing research.

Table 2: An initial list of packages which provide (or can be used to provide) PIM functionality

Product | Publisher | URL | Licence type | Platform(s) | First appeared | Most recent version | Internal data storage and external presentation approach | Notes

24SevenOffice

24SevenOffice http://www.24sevenoffice.com/webpage/en/

proprietary software

Web application

2008 ERP/CRM - contains collaboration module

Above & Beyond 2000 PRO

1soft http://www.1soft.com


Windows 2008 PIM/GIM

ACT! Sage http://www.act.com

Windows, PalmOS, Windows Mobile

2008 Contact management

ActionOutline Green Parrots Software

http://www.actionoutline.com/

Windows 2.1 Contact management

Aethera TheKompany.com

http://www.thekompany.com/projects/aethera/

FOSS-GPL Linux, Mac OS-X, Windows

2001 2005 PIM, PDR, Messaging and Groupware

Agenda, Lotus

S. Jerrold Kaplan, Mitchell D. Kapor, Edward J. Belove, Richard A. Landsman, and Todd R. Drake. Lotus.

http://en.wikipedia.org/wiki/Lotus_Agenda


DOS 1992 PIM. Still in use - very powerful data handling. See also Chandler, which in some ways is a successor to Agenda.

AIM 96 Accu Knowledge, Inc.

http://www.akinet.com/aimff.htm

Windows 1996 PIM

Ajour Calendar / PIM

Micro-Sys ApS, DK

http://www.micro-sys.dk/products/ajour/

FOSS 2004 Personal information manager - manage appointments and events. Ajour is an easy-to-use personal information manager (PIM). Use it as a combined calendar, diary, organizer, and reminder. Keep track of dates, appointments, annual events like birthdays, to-do items, and notes. You can also dial phone numbers stored in your data. Calendar and organizer - dates and to-do items reminder.

All-in-1 Personal Organizer

Bruno Cancellieri

http://www.cancellieri.org/pmo_index.htm

Shareware Windows 2005 Data stored in Microsoft Access relational database

All-in-1 Personal Organizer (APO) is a personal information manager (PIM) with three main uses. First, it's a tool for managing any kind of personal information such as tasks, events, contacts, notes, file links, Web links and executable key scripts. Second, it's an image viewer. Finally, it can be used as a mind stimulator useful for reflection, self-analysis and self-improvement.






AZZ Cardfile

AZZ Cardfile team: Rytis Zumbakis, Antanas Zdramys

http://www.azzcardfile.com/

Shareware Windows 2007 Data stored in XML.

AZZ Cardfile is a Windows program that helps manage any personal information like addresses, phone numbers, references, notes, recipes. It can serve as personal organizer, contact manager, address book, rolodex, personal information manager (PIM) or small database software. Replaces Microsoft Cardfile. Modern customizable user interface, ease of use and extensive features makes this information management software equally suitable for business office or home use.

Backflip Backflip, Inc http://www.backflip.com/login.ihtml

Web application

Backflip gets you back to the good stuff. It's the easiest way to save and share important things you see on the Web. With Backflip's organization and powerful search, you'll never lose anything interesting again. You can use it from any computer. And it's totally free. How does it work? As you discover interesting Web pages, use the Backflip it! button to save them and Backflip will organize them for you. Then, simply go to your Backflip account and you'll find all of your favourite pages filed in your personal directory -- which you can access from any computer.




Backpack 37signals http://www.backpackit.com/?source=37s+home

Web application

Intranet, group calendar, organizer Share info, schedules, documents, and to-dos across your company, group, or organization.

Basecamp 37signals http://www.basecamphq.com/?source=37s+home

Web application

Project management and collaboration Collaborate with your team and clients. Schedules, tasks, files, messages, and more.

Bifrost Inbox Organizer

Olle Bälter, Candace L Sidner

http://delivery.acm.org/10.1145/580000/572034/p111-balter.pdf?key1=572034&key2=9273163801&coll=Portal&dl=GUIDE&CFID=21022367&CFTOKEN=33191323

Research prototype

Inbox organiser

BitPim http://www.bitpim.org/

FOSS for CDMA phones: Linux, Mac OS-X, Windows

BitPim is a program that allows you to view and manipulate data on many CDMA phones from LG, Samsung, Sanyo and other manufacturers. This includes the PhoneBook, Calendar, WallPapers, RingTones (functionality varies by phone) and the Filesystem for most Qualcomm CDMA chipset based phones.














Blackberry Research In Motion Limited

http://www.blackberry.net/index.shtml

Smartphone

Campfire 37signals http://www.campfirenow.com/?source=37s+home

Web application

Real-time group chat for business It's like instant messaging, but optimized for groups. Especially great for remote teams.

Chandler OSAF http://chandlerproject.org/

FOSS Linux, Mac OS-X, Windows XP clients and web application

2008 Collaborative information management; an open source Note-to-Self Organizer. It features calendaring, task and note management and consists of a desktop application, web application and a free sharing and back-up service called Chandler Hub.

Citadel FOSS-GPL groupware/BBS for all POSIX-based operating systems

Bulletin board system

Contact Plus Personal 2.7 c

Contact Plus Corporation

http://www.contactplus.com/products/personal/permain.htm









Contactizer proprietary software

Mac OS Calendar and contact management within groups. Previously known as "OD4Contact"

ContactMap

Bonnie A. Nardi, Steve Whittaker, Ellen Isaacs, Mike Creech, Jeff Johnson, and John Hainsworth.

http://www.izix.com/pro/lightweight/contactmap.php

C-Organizer Pro 2.4

CSoftLab

http://www.csoftlab.com/C-OrganizerPro.html

CyberDesk Andrew Wood, Anind Dey, and Gregory D. Abowd.

http://www.cc.gatech.edu/fce/cyberdesk/

Daily vX Professional

DEVONtechnologies

http://www.devon-technologies.com/products/devonthink/index.html


Mac OS journal and note taking software

Data Mountain

George Robertson, Mary Czerwinski, Kevin Larson, Daniel C. Robbins, David Thiel, and Maarten van Dantzich

http://research.microsoft.com/~ggr/
















DayPoint Professional

Front Office Communications, Inc

http://www.daypoint.com/Products/DayPointProf.asp

DevonThink Professional

DEVONtechnologies

http://www.devon-technologies.com/products/devonthink/index.html


Mac OS

do-Organizer GemX proprietary software

Microsoft Windows

Contacts, appointments, to-dos, mind mapping, bookmarks

Duck Software: Organizer Software

Technological Solutions, Inc. (TSI): Duck Software

http://www.ducksoftware.com/

Dynomite Lynn D. Wilcox, Bill N. Schilit, and Nitin “Nick” Sawhney.

http://seattleweb.intel-research.net/people/schilit/ldw.pdf

Ecco Netmanage

http://users.rcn.com/wussery/

proprietary software now free to download

Microsoft Windows

Hierarchic outline with assignment to multiple folders, one parent per folder. Information is presented in a single pane with a folder grid.

Intranet, group calendar, organizer. Share info, schedules, documents, and to-dos across your company, group, or organization.

Email Reminders Pro for Outlook

Sperry Software

http://www.sperrysoftware.com/Outlook-EmailReminders-Pro.asp





















EndNote ISI ResearchSoft

http://www.endnote.com/enhome.asp

Enfish Enfish/Louise Wannier

http://www.enfish.com/

Entourage, Microsoft


Mac OS

Essential PIM Pro

available as proprietary software or free software

Microsoft Windows

Eudora QUALCOMM Incorporated

http://www.eudora.com/

EverNote proprietary software

Microsoft Windows

Evolution Ximian Inc

http://www.ximian.com/products/evolution/features.html#pim

Evolution, Novell

Novell FOSS-GPL Linux/Unix/GNOME

FeedDemon RSS Reader for Windows

Nick Bradbury, Bradbury Software, LLC

http://www.bradsoft.com/feeddemon/index.asp

FileMaker proprietary software

Microsoft Windows

Formation RadicalBreeze Software


Mac OS Idea and personal information organizer

Fusionpoint Stick-e-NotePad R.I.M.

Fusionpoint Technologies Corp.

http://www.fusionpointtech.com/

GetOrganized99

Web application




















GNOME PIM

GNOME Foundation

http://www.gnome.org/gnome-office/gnome-pim.shtml

Gnowsis Knowledge Management Lab of the DFKI

http://www.gnowsis.org/

Research prototype

2006 RDF; Semantic web

GoalPro Success Studios Corporation

http://www.goalpro.com/

GoBinder proprietary software

Microsoft Windows

Golden Retriever

N-Liter Enterprise

http://www.n-liter.com/

GoldenSectionNotes

The Golden Section Labs, Pacific Business Centre

http://www.tgslabs.com/eng/gsnotes/

GoldMine proprietary software

Microsoft Windows

Google Calendar

Web application

Google Notebook

Web application

GrandView Symantec (John Friend)


Microsoft Windows

Historically-important outliner program

Haystack Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology

http://groups.csail.mit.edu/haystack/

FOSS-MIT Licence

all operating systems with POSIX and Java

RDF; Semantic web













Highrise 37signals http://www.highrisehq.com/?source=37s+home

Web application

Online contact manager and simple CRM Keep track of who your business talks to, what was said, and what to do next.

HTP To-do List


Microsoft Windows

Hula FOSS

iCal proprietary software

Mac OS

Idea Graph Danny Ayers

http://www.ideagraph.net/

Ideaspace proprietary software

Microsoft Windows; Mac OS

ikeepbookmarks.com

Software Designs Development Corp

http://www.ikeepbookmarks.com/

Info Select Micro Logic http://www.miclog.com/is/isdesc.htm


Microsoft Windows

InfoRecall http://www.phantech.com/


Microsoft Windows; Mac OS

Outline type tree structure so you can easily categorize your information.






Inspiration Micro Logic http://www.inspiration.com/index.cfm


Microsoft Windows

Inspiration is a powerful, easy-to-use tool that allows professionals to visually organize and communicate complex topics. Visual diagrams clarify patterns, interrelationships and interdependencies. They also stimulate creative thinking.

Internet Organizer Deluxe

PrimaSoft PC, Inc.

http://www.primasoft.com/deluxeprg/inodx.htm

Ishmail Jonathan Helfman, Charles Isbell, Brian Amento and Gavin Bell.

http://ishmail.sourceforge.net/

JetTask proprietary software

Microsoft Windows

Jot+ Notes King Stairs Software

http://kingstairs.com/

Shareware Microsoft Windows

1993 3.4.3 (31 May 2007); 3.6.0 beta 25/06/2008

Hierarchical note manager, outliner and cardfile

KDE (K Desktop Environment) Office

Richard Moore, Ben Hummon

http://www.kde.org/

Linux/KDE

Keep It Together


Mac OS

Keynote Marek Jedlinski

http://www.tranglos.com/free/keynote_main.html

Kontact FOSS Linux/KDE










LeaderCode Personal Information Manager


Microsoft Windows

Lifestreams / Scopeware Vision

Eric Freeman, David Gelernter and Scott Fertig

http://www.scopeware.com

LinkaGoGo linkaGoGo, DBA

http://www.linkagogo.com/

Livelink OpenLink Software Inc.

http://www.opentext.com/livelink

Lookout Lookout Software (Eric Hahn and Mike Belshe)

http://www.lookoutsoft.com/

Maple Crystal Office Systems

http://crystaloffice.com/maple/


Microsoft Windows

Two-pane hierarchical (tree) organiser

MDE Info Handler

MDE Software (Dr. Manfred Derenbach)

http://www.mdesoft.com/eng.htm

Meeting Maker


Microsoft Windows, Mac OS, Solaris, and Linux

MindManager Mindjet http://www.mindjet.com/Default.aspx


Mind map MindJet Connect introduces group collaboration facilities.

More Symantec proprietary software

Mac June 1986 Outliner A historically-important outliner program for the Mac

Mozilla Calendar Project

FOSS-MPL Linux, Windows

My Personal Diary

CAM Development.

http://www.camdevelopment.com/pim/my_personal_diary/default.htm

MyInfo proprietary software

Microsoft Windows

MyLibrary RENCorp

http://www.cribbagepegs.com/myuniqueprograms.html

MyLifeBits Microsoft Bay Area Research Center, Media Presence Group

http://research.microsoft.com/research/barc/MediaPresence/MyLifeBits.aspx

MyLifeOrganized


Microsoft Windows

myNotes proprietary software

Mac OS

MyYahoo Yahoo! Inc http://my.yahoo.com/

Net Snippets

Net Snippets Ltd.

http://www.netsnippets.com/

Newdocs Manuel Arriaga

http://m-arriaga.net/software/newdocms/


















Notes, Lotus http://www-142.ibm.com/software/sw-lotus/lotus/general.nsf/wdocs/lotusprods


Microsoft Windows

Allows all the major information organization techniques to be used in one information space: outlines, graphics, hypertext links, relational databases, free (rich) text, expanding/collapsing reports, collapsing rich text sections, tabbed notebooks (like wizards) and tables. No specific PIM functionality but has been used as the basis for effective GIM in many business organisations.

Now Up-to-Date & Contact


Mac OS, Windows

Office Accelerator

Baseline Data Systems (BDS)

http://baselineconnect.com/product.html

Omea proprietary software

Microsoft Windows

OneNote, Microsoft Office

Microsoft http://www.microsoft.com/office/onenote/prodinfo/overview.mspx


Microsoft Windows

Notable for the multiple ways in which information can be presented: an excellent note-taking environment. Poorer in terms of inter-item linking – integration is left to the mind of the user.

Online FileCabinet

Mike Giles http://www.furl.net

Organizer Deluxe Series

PrimaSoft PC, Inc.

http://www.primasoft.com/shware.htm

Organizer, Lotus


Microsoft Windows

Outlook, Microsoft Office

Microsoft Corp.

http://www.microsoft.com/

Outlook, SharePoint, InfoPath, Groove : Microsoft Office

Microsoft http://office.microsoft.com/fr-fr/products/FX100487411036.aspx?pid=CL100571081036


Microsoft Windows

Palm Palm Inc. http://www.palm.com/home.html


Mac OS, Windows

Paper Tiger Harold Tyler

http://www.taylorontime.com/ptigersw.html

PDO (Personal Document Organizer)

Insoft Technologies Inc

http://www.insoft-tech.com/personal%20document%20organizer.htm

Pegasus Mail

David Harris http://www.pmail.com/index.htm

Pepys and Video Diary

Michael G. Lamming, M. A. Eldridge, M. Flynn, and William M. Newman

http://www.xrce.xerox.com/programs/mds/past-projects/video-diary.html



PersonalBrain


Microsoft Windows, Mac OS, Linux

PlanPlus FranklinCovey

http://www.franklincovey.com

Plaxo Web application

Powerbookmarks

Li, W-S., Q. Vu, E. Chang, D. Agrawal, Y. Hara, and H. Takano

http://www.teamxweb.com/doc/relatedWork.shtml#PowerBookmarks

Powermarks

Kaylon Technologies

http://www.kaylon.com/power.html

Presto Paul Dourish, W. Keith Edwards, Anthony Lamarca, and Michael Salisbury

http://www2.parc.com/csl/projects/placeless/papers/tochi-presto.pdf

Prophet 2004: Contact manager

Avidian Technologies

http://www.avidian.com/avidian_product.aspx?n=2

Proteus Thomas Erickson

http://www.outliners.com/discuss/msgReader$648?mode=day

Queries-R-Links (QRL)

Nipon Charoenkitkarn, Jim Tam, Mark H. Chignell, and Gene Golovchinsky.

http://www.cs.brown.edu/memex/projects.html

Quick2Do 1.0.2

CodeGrid Software.

http://codegrid.tripod.com/

Reader's Helper,The

Jamey Graham, RICOH Research Center, Palo Alto

http://rii.ricoh.com/~jamey

Remember The Milk

Web application

Scopeware Vision

Scopeware, Inc.

http://www.scopeware.com

Secure Notes Organizer

SecureAction Research, LLC.

http://www.secureaction.com/notes/

Semantic Blogging for Bibliography Management

HP Lab http://www.hpl.hp.com/semweb/biblio.htm

Simple Diary

Aaron Whiffin http://www.webbedfeetuk.com/diary/

SixDegree Creo Inc.

http://www2.creo.com/sixdegrees/














Softwrights Reminder ™

Softwrights, Inc.

http://www.softwrights.com/rmenu.htm

SQLNotes proprietary software

Microsoft Windows

Hierarchic outline with assignment to multiple folders, multiple parents per folder. Information is presented in a dual pane with a folder grid

Not yet released. Author has this tool in beta test version.

Stickies Tom Revell http://www.btinternet.com/~tom.revell

Student Online

Student Online Inc.

http://www.studentonline.com/

Stuff I've Seen

Susan Dumais http://research.microsoft.com/~sdumais/

SurfSaver askSam Systems

http://www.surfsaver.com/

Sync4jMozilla FOSS

Taskmaster Bellotti, Ducheneaut, Howard, Smith

http://peach.mie.utoronto.ca/people/jacek/emailresearch/CSCW2002/submissions/PARC-Taskmaster%20position%20paper.pdf




TaskView Gwizdka

http://www.cas.ibm.com/archives/2002/papers/cascon02/htm/francais/abs/gwizdka.htm

Tempus Fugit

Daniel A. Ford, Joann Ruvolo, Stefan Edlund, Jussi Myllymaki, James Kaufman, Jared Jackson, and Martin Gerlach.

http://www.research.ibm.com/people/j/jussi/papers/TF/TF-CIKM2001.pdf

THE (The Human Environment)

Jef Raskin

http://humane.sourceforge.net/the/index.html

The Brain TheBrain Technologies Corporation

http://www.thebrain.com/Default.htm

Time & Chaos

Chaos Software (formerly iSBiSTER International, Inc.)

http://www.isbister.com/chaos32.html

TimeStore Byron Long, Kelvin S. Yiu, Ronald Baecker, and Nancy Silver

http://www.dgp.toronto.edu/people/byron/papers/timestore.html






























Tinderbox Eastgate Systems

http://www.treepad.com/

Proprietary software

Mac XML Tinderbox is a personal content assistant that helps you visualize, analyze, and share your notes.

TreePad! Freebyte http://www.treepad.com/

Proprietary software

Tree Structured data-management

Twine Radar Networks http://www.twine.com/

Web application

Automatically organizes information by learning about user interests and making connections and recommendations

A commercial service, currently (2008) in beta test, which is claimed to be the first commercially-oriented implementation of semantic web techniques.

Ultra Recall Kinook proprietary software

Microsoft Windows

Link items in multiple locations, create internal links between items, and link to other web pages and files

Umea (User-Monitoring Environment for Activities)

Victor Kaptelinin

http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/k/Kaptelinin:Victor.html

VIP Organizer


Microsoft Windows

Visimap CoCo proprietary software

Microsoft Windows

Mind map

Vombato Organizer


Microsoft Windows










VKB (Visual Knowledge Builder)

Dr. Frank Shipman, Dr. Haowei Hsieh

http://www.csdl.tamu.edu/VKB/

Wayback Machine

The Internet Archive

http://www.archive.org/web/web.php

WebWatcher

T. Joachims, D. Freitag, T. Mitchell

http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-6/web-agent/www/project-home.html

Windows Calendar


Microsoft Windows

WinOrganizer 2.4

The Golden Section Labs, Pacific Business Centre

http://www.tgslabs.com/eng/winorganizer/

Wintermute David Jeske and Scott Hassan

http://neomason.com/wm.cst

WorkplaceMirror

Richard Boardman, Robert Spence, M. Angela Sasse

http://www.iis.ee.ic.ac.uk/~rick/research/pubs/struggle-hcii2003.pdf

Wrike Web application

XLibris Bill N. Schilit, Gene Golovchinsky, and Morgan N. Price

http://www.fxpal.com/?id=xlibris






























Yahoo! Calendar

Web application

Yawas Laurent Denoue and Laurence Vignollet.

http://www.fxpal.com/people/denoue/yawas/

YellowPen YellowPen, Inc. - Steve Brown, John Leibovitz, and Steve Robinsion

http://www.yellowpen.com/ypsite3/product/ypoverview.htm

Yojimbo proprietary software

Mac OS

Zimbra FOSS Web application

Zoot Zoot Software http://www.zootsoftware.com/index.html


Microsoft Windows

2008 (version 5.1)

Zoot offers a highly efficient process for collecting, classifying and prioritizing information so that it can be viewed in meaningful timeframes and contexts.

3.5. The functionality associated with PIMs (extract)

The table which follows is an extract from a table compiled by the University of Washington PIM research group (see for example Keeping Found Things Found 2008). Unfortunately the table has not been updated recently. Our subsequent research will extend and consolidate this classification as necessary.










Table 3: The functionality associated with PIMs – extract from our research spreadsheet

Product | Annotations and note-taking | Contact management | Document manager | Email-centred | General PIM | Hypertext authoring tool | Life planner | Meeting planner | Mobile/PDA devices | Organization of handwritten notes | Paper organizer | Project-centred | Reading and Summarization | Record everything | Search across email, e-docs and other information forms | Semi-structured organization of small pieces of information (phone numbers, errands to run, books to read..) | Task management | Virtual 3-d | Visualization of information relationships | Web organizer

Above & Beyond 2000 PRO y

ACT! y

Agenda, Lotus y

AIM 96 y

All-in-1 Personal Organizer y

AmikaFreedom y

Aquanet y

AZZ Cardfile y

Backflip

Chandler y

Haystack y

Info Select y

(many programs omitted here)

Zoot y

This list is based on that maintained by the University of Washington and found at http://pim.ischool.washington.edu/tools.htm

3.6. An initial classification of personal information and functionalities

Before any further classification can be attempted, it is necessary to revisit and extend the list of functionalities now offered and proposed.

3.6.1. Data: Some useful personal information

After studying several extant PIMs, we suggest the following basic classification of

the information they store:

General information

Lists

(a) Ad hoc

Such as shopping lists.

(b) Repeating

E.g. Inventories, Christmas card lists, etc.

Semi-structured organization of small pieces of information (phone numbers, errands to run, books to read...)

Own information

Examples of such information include:

(a) Passwords

(b) Passport

(c) Health

(d) Social security

Contact management, address books, etc.: details such as

Names

Affiliations, e.g. companies, households

Contact mechanisms

Addresses

Personal details

Qualifications

Competences

Calendar

Appointments and meetings

Significant calendar dates:

(a) Birthdays

(b) Anniversaries

Events, alerts, reminders

Meeting planner / scheduler

(a) Within hierarchies (permanent teams within organisations) and “projects”, that is, people who come together in ad hoc groups in order to complete tasks small and large

Diary / journal

Document management

Organization of handwritten notes

Paper organizer

Reading and Summarization

Message management

Email and instant message archives

Fax communications and voicemail

RSS/Atom feeds

Web organizer

Resource management

Paper filing and archives

Computer files

Photos

Books, CDs, etc.

Key documents

Copy management

To Dos: task management for self and others

Day planning

Reminders

Alerts

Project management: project management features

3.6.2. Processes associated with personal information

Among the processes associated with personal information are these:

Diarising: record “everything”

See Gemmell, Jim & Gordon Bell & Roger Lueder 2006. This uses functions

such as:

Personal notes/journal, annotations and note-taking in multiple media

Transcription between media, e.g. handwriting recognition, voice recognition

Search across email, e-docs and other information forms; across multiple media types

Hypertext authoring: making lists which can refer to other items in the same or linked lists; and making references to external (web-based) items

Synchronisation between computers: Mobile/PDA devices and inter-device synchronisation

Coordination between people in hierarchies and in projects

Visualisation of information resources

Graphing, charting, mind maps etc.

Services and service level management

3.6.3. Towards Personal Knowledge Management and knowledge creation

There is emerging support in some PIMs for:

Classification and contents

User-specified keyword classification of information structured in accordance with user design

Rule-based auto-classification

Tagging

Semantic web approaches, such as semantic desktop

We also observe the desirability of learning from library and information science, and of encouraging the use of:

Thesauri

A lexicon of terms
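As a minimal sketch of how user-specified keywords and rule-based auto-classification might coexist in a PIM, consider the following. The Item class, the RULES table and the tag names are hypothetical illustrations, not taken from any of the tools surveyed.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Item:
    """A stored piece of personal information with a set of tags."""
    title: str
    body: str
    tags: set = field(default_factory=set)


# Hypothetical rules: each tag is applied when its pattern occurs in the item.
RULES = {
    "travel": re.compile(r"\b(flight|hotel|passport)\b", re.IGNORECASE),
    "finance": re.compile(r"\b(invoice|budget|tax)\b", re.IGNORECASE),
}


def auto_classify(item, rules=RULES):
    """Add every rule-derived tag whose pattern matches the title or body."""
    text = f"{item.title} {item.body}"
    for tag, pattern in rules.items():
        if pattern.search(text):
            item.tags.add(tag)
    return item


note = Item("Rennes trip", "Book hotel and renew passport; keep the invoice.")
note.tags.add("conference")  # a keyword chosen by the user
auto_classify(note)          # keywords derived from the rules
print(sorted(note.tags))
```

The user-chosen tag and the rule-derived tags end up in the same set, which is one simple way of letting a personal scheme and an automatic scheme live side by side.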

3.7. Further candidate data management approaches: XML documents

Before we can better understand these emerging functionalities, it is necessary to extend the list of candidate data management approaches. We start with XML.

3.7.1. What is XML?

The Extensible Markup Language (XML) is a general-purpose specification for

creating custom markup languages. It is itself a simplified subset of the Standard

Generalized Markup Language (SGML), and is designed to be relatively human-

legible. In some ways it is a successor to, and it certainly follows on from, HTML

(HyperText Markup Language), the language in which web pages have been

expressed since the early 1990s.

XML (eXtensible Markup Language) is a specification developed by the W3C (World

Wide Web Consortium). XML became a formal specification in February 1998, and is

a subset of SGML designed for use on the Internet. Like SGML, XML is a

metalanguage that lets users define their own descriptive markup languages. With

XML, it is possible to create customized tags to surpass the functionality of HTML.

XML is described at XML 2008. One of the motivations for the design of XML is

clearly distinguishing between the content of data, its structure and its presentation.

This point is further developed below in section 3.9.4.
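The separation of content, structure and presentation can be illustrated with a short sketch. The tag names (contact, name, affiliation, phone) are invented for this example and do not belong to any standard vocabulary; the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical contact record using custom tags defined only for this sketch.
CONTACT_XML = """
<contact>
  <name>Ada Example</name>
  <affiliation>Example Ltd</affiliation>
  <phone type="mobile">+44 20 7946 0000</phone>
</contact>
""".strip()


def contact_to_dict(xml_text):
    """Extract the content of a <contact> element, independent of any layout."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("name"),
        "affiliation": root.findtext("affiliation"),
        "phones": {p.get("type"): p.text for p in root.findall("phone")},
    }


def render_plain(contact):
    """One possible presentation, kept separate from the stored data."""
    phones = ", ".join(f"{kind}: {number}" for kind, number in contact["phones"].items())
    return f"{contact['name']} ({contact['affiliation']}) {phones}"


record = contact_to_dict(CONTACT_XML)
print(render_plain(record))
```

The same record could be rendered quite differently (as a web page, or as a card on a phone) without touching the XML itself.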

3.7.2. What is SGML?

SGML (Standard Generalized Markup Language) is a language for defining markup

languages such as HTML and for specifying the rules for tagging elements in a

document. SGML itself is not a markup language; rather, it is a language to create

markup languages. SGML supports the definition of markup languages that are

hardware- and software-independent. SGML was developed and standardized by the

International Organization for Standardization (ISO), which published it in 1986 (ISO

8879). Because of SGML's complexity, simpler alternatives were developed for use on the Internet: HTML was defined as an application of SGML, and XML as a simplified subset of it.

For more information, see: SGML 2008.

3.7.3. Why is XML important?

Extensible Markup Language (XML) is a simple, very flexible text format derived

from SGML. Originally designed to meet the challenges of large-scale electronic

publishing, XML is also playing an increasingly important role in the exchange of a

wide variety of data on the Web and elsewhere.

3.7.4. How does XML compare with other data management approaches?

XML is an excellent data interchange mechanism, and is very widely implemented. It

is verbose and less efficient than SQL for database-to-database exchanges. But it is

unique in forming the basis for web services and service oriented computing; and as

the basis for the Semantic Web.

3.7.5. Applicability of XML to personal information management

Attempts have been made to build open source PIMs which store information as

XML. A PIM widely used on the Apple Mac platform is Tinderbox, which uses XML

as its internal data storage format.

XML is a metalanguage, used in the specification of target languages. As such, it has been used to create OPML (Outline Processor Markup Language), described at Wikipedia OPML 2008 in these terms:

“OPML (Outline Processor Markup Language) is an XML format for outlines. Originally developed by Radio UserLand as a native file format for an outliner application, it has since been adopted for other uses, the most common being to exchange lists of web feeds between web feed aggregators. The OPML specification defines an outline as a hierarchical, ordered list of arbitrary elements. The specification is fairly open which makes it suitable for many types of list data.”
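To give a flavour of what "a hierarchical, ordered list" means in practice, the following sketch reads a small, made-up OPML document and prints it as an indented outline. The outline content is invented for illustration; only the opml, head, body and outline elements and the text attribute come from the OPML format itself.

```python
import xml.etree.ElementTree as ET

# A small invented outline in OPML form.
OPML_TEXT = """<opml version="2.0">
  <head><title>Reading list</title></head>
  <body>
    <outline text="Books to read">
      <outline text="Personal Knowledge"/>
      <outline text="Weaving the Web"/>
    </outline>
    <outline text="Errands"/>
  </body>
</opml>"""


def walk(element, depth=0):
    """Yield (depth, text) for each <outline>, preserving document order."""
    for child in element.findall("outline"):
        yield depth, child.get("text")
        yield from walk(child, depth + 1)


body = ET.fromstring(OPML_TEXT).find("body")
items = list(walk(body))
for depth, text in items:
    print("  " * depth + text)
```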

Perhaps the main significance of XML is the fact that it has been used as the basis for

web services, service oriented computing and the Semantic Web.

3.8. Further candidate data management approaches: RDF, the basis of a semantic web

The Resource Description Framework (RDF) integrates a variety of applications from library catalogues and world-wide directories to syndication and aggregation of news, software, and content to personal collections of music, photos, and events using XML as an interchange syntax. The RDF specifications provide a lightweight ontology system to support the exchange of knowledge on the Web.

See section 3.9 for a fuller description.

XML has been characterised as a meta-language, and RDF as both meta-language and meta-data.

3.9. Ontology and the Semantic Web: Towards Web 3.0?

What is the applicability of semantic web approaches to personal information management?

3.9.1. Search or classify?

As we work, we find useful “stuff” and store it – hopefully in a way in which we can find it again afterwards! (See Fichter, D. 2004).

In a Googled world, there is an increasing trend (and temptation) to invest less in

classification and to trust to the raw power of technology to find things when we want

them. And this approach is not without its merits. Certainly the ability to search the

contents of a PC hard drive using Google Desktop is a frequent lifesaver. But it would

be a rash person indeed who no longer imposed any folder structure on the tens or

hundreds of thousands of files which populate a typical knowledge worker’s PC. An earlier generation recognised the limitations of search in the trade-off which exists between relevance and retrieval. Google appears to solve that problem by ordering the results of searches in a “sensible” way – but how sensible is the outcome, and how sensitive is it to the preferences and fads of earlier searchers?

Ontologies or classification schemes are therefore still necessary in many contexts (see Chandrasekaran et al 2008), but one person’s ontology is not necessarily the best choice for another.

There are areas in which some level of standardisation is beginning to emerge. They

include contact data (e.g. the VCF data exchange format) and calendars. Few current

PIMs use these standards, however.
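As an indication of how lightweight such a standard can be, the sketch below writes a contact in the vCard (VCF) text format, assuming version 3.0 of that format. The function name and the sample contact are hypothetical, and the sketch deliberately omits the escaping of commas and semicolons that a full implementation would need.

```python
def to_vcard(given, family, email=None, phone=None):
    """Return a minimal vCard 3.0 record as text (no escaping of special characters)."""
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        f"N:{family};{given};;;",   # structured name: family;given;additional;prefix;suffix
        f"FN:{given} {family}",     # formatted (display) name
    ]
    if email:
        lines.append(f"EMAIL:{email}")
    if phone:
        lines.append(f"TEL:{phone}")
    lines.append("END:VCARD")
    # vCard lines are terminated by CRLF.
    return "\r\n".join(lines) + "\r\n"


card = to_vcard("Ada", "Example", email="ada@example.org", phone="+44 20 7946 0000")
print(card)
```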

3.9.2. What is the Semantic Web?

The semantic web is an evolving extension of the World Wide Web in which web

content can be expressed not only in natural language, but also in a form that can be

understood, interpreted and used by software agents, thus permitting them to find,

share and integrate information more easily. It derives from W3C director Tim Berners-Lee's vision of the Web as a universal medium for data, information, and knowledge exchange.

At its core, the semantic web comprises a philosophy, a set of design principles,

collaborative working groups, and a variety of enabling technologies. Some elements

of the semantic web are expressed as prospective future possibilities that have yet to

be implemented or realized. Other elements of the semantic web are expressed in

formal specifications. Some of these include Resource Description Framework (RDF),

a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and

notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL). All of these are intended to formally describe concepts, terms, and relationships within a

given knowledge domain.

3.9.3. Purpose of the semantic web

Humans are capable of using the Web to carry out tasks such as finding the Finnish word for "car", reserving a library book, or searching for the cheapest DVD and buying it.

However, a computer cannot accomplish the same tasks without human direction

because web pages are designed to be read by people, not machines. The semantic

web is a vision of information that is understandable by computers, so that they can

perform more of the tedium involved in finding, sharing and combining information

on the web.

For example, a computer might be instructed to list the prices of flat screen HDTVs

larger than 40 inches with 1080p resolution at shops in the nearest town that are open

until 8pm on Tuesday evenings. To do this today requires search engines that are

individually tailored to every website being searched. The semantic web provides a

common standard (RDF) for websites to publish the relevant information in a more

readily machine-processable and integratable form.

Tim Berners-Lee (Berners-Lee 1998) originally expressed the vision of the semantic

web as follows:

“I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web – the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally materialize.”

3.9.4. Architectural principles

The principle of separation of concerns (a term coined by Dijkstra (1982) and applied by him to the design of computer programs) has been extended in many directions, notably in suggesting that the structure, content and presentation of data should wherever possible be kept separate. This design motivation is highlighted, for example, by the World Wide Web Consortium in its discussion of the Separation of Content, Presentation, and Interaction in the architecture of the World Wide Web, for instance in XML and XML-derived languages (W3C 2004).

3.9.5. Realisation: going beyond the hypertext web

Markup

The files on a typical computer can be loosely divided into documents and

data. Documents, like mail messages, reports and brochures, are read by

humans. Data, like calendars, address books, playlists and spreadsheets, are

presented using an application program which lets them be viewed, searched

and combined in many ways.

Currently, the World Wide Web is based mainly on documents written in

Hypertext Markup Language (HTML), a markup convention that is used for

coding a body of text interspersed with multimedia objects such as images

and interactive forms. The semantic web involves publishing the data in a

language, Resource Description Framework (RDF), specifically for data, so

that it can be manipulated and combined just as can data files on a local

computer.

The HTML language describes documents and the links between them. RDF,

by contrast, describes arbitrary things such as people, meetings, and aircraft

parts.

For example, with HTML and a tool to render it (perhaps Web browser

software, perhaps another user agent), one can create and present a page that

lists items for sale. The HTML of this catalogue page can make simple,

document-level assertions such as "this document's title is 'Widget

Superstore'". But there is no capability within the HTML itself to

unambiguously assert that, say, item number X586172 is an Acme Gizmo

with a retail price of €199, or that it is a consumer product. Rather, HTML can only say that the span of text "X586172" is something that should be

positioned near "Acme Gizmo" and "€199", etc. There is no way to say "this

is a catalogue" or even to establish that "Acme Gizmo" is a kind of title or

that "€199" is a price. There is also no way to express that these pieces of

information are bound together in describing a discrete item, distinct from

other items perhaps listed on the page.

Descriptive, and extensible

The semantic web addresses this shortcoming, using the descriptive

technologies Resource Description Framework (RDF) and Web Ontology

Language (OWL), and the data-centric, customizable Extensible Markup

Language (XML). These technologies are combined in order to provide

descriptions that supplement or replace the content of Web documents. Thus,

content may manifest as descriptive data stored in Web-accessible databases,

or as markup within documents (particularly, in Extensible HTML

(XHTML) interspersed with XML, or, more often, purely in XML, with

layout/rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e. to describe the

structure of the knowledge we have about that content. This way the

machine can process knowledge itself, instead of text, using processes

similar to human deductive reasoning and inference, thereby obtaining more

meaningful results and facilitating automated information gathering and

research by computers.
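The catalogue item discussed under Markup can be sketched as a handful of subject-predicate-object statements, which is the shape of data that RDF describes. The sketch uses plain Python tuples rather than an RDF library, and the example.org namespace and property names are hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace, for illustration only

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "item/X586172", EX + "name", "Acme Gizmo"),
    (EX + "item/X586172", EX + "type", EX + "ConsumerProduct"),
    (EX + "item/X586172", EX + "priceEUR", 199),
}


def objects(subject, predicate, graph=triples):
    """Return every object asserted for the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


item = EX + "item/X586172"
print(objects(item, EX + "name"))
print(objects(item, EX + "priceEUR"))
```

Unlike the HTML page, these statements say explicitly that the price and the name belong to the same discrete item, and a program can retrieve either without guessing from layout.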

XML, XML Schema, RDF, OWL, SPARQL: the W3C Semantic Web Layer Cake

The semantic web comprises the standards and tools of XML, XML Schema,

RDF, RDF Schema and OWL. The OWL Web Ontology Language

Overview describes the function and relationship of each of these

components of the semantic web:

XML provides a surface syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.

XML Schema is a language for restricting the structure and content elements of XML documents.

RDF is a simple data model for referring to objects ("resources") and how they are related. An RDF-based model can be represented in XML syntax.

RDF Schema is a vocabulary for describing properties and classes of RDF resources, with a semantics for generalization-hierarchies of such properties and classes.

OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry) and enumerated classes.

SPARQL is a protocol and query language for semantic web data sources.
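How these layers build on one another can be suggested with a toy example. The sketch below is plain Python, not RDF Schema or SPARQL themselves: match() imitates a single query pattern with wildcards, and superclasses() imitates the generalization hierarchy that RDF Schema defines with rdfs:subClassOf. The ex: names are hypothetical.

```python
# Statements as (subject, predicate, object); the ex: names are invented.
TRIPLES = [
    ("ex:Gizmo", "rdfs:subClassOf", "ex:ConsumerProduct"),
    ("ex:ConsumerProduct", "rdfs:subClassOf", "ex:Product"),
    ("ex:X586172", "rdf:type", "ex:Gizmo"),
]


def match(pattern, graph=TRIPLES):
    """Return triples matching a (s, p, o) pattern; None acts as a wildcard."""
    return [t for t in graph if all(q is None or q == v for q, v in zip(pattern, t))]


def superclasses(cls, graph=TRIPLES):
    """Follow rdfs:subClassOf links transitively from cls."""
    found, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for _, _, parent in match((current, "rdfs:subClassOf", None), graph):
            if parent not in found:
                found.add(parent)
                todo.append(parent)
    return found


def types_of(resource, graph=TRIPLES):
    """Direct types of a resource plus the types implied by the class hierarchy."""
    direct = {o for _, _, o in match((resource, "rdf:type", None), graph)}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return direct | inferred


print(sorted(types_of("ex:X586172")))
```

The item is only stated to be a Gizmo, yet the hierarchy lets a program conclude that it is also a consumer product and a product.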

Enhancing the usability and usefulness of the Web and its interconnected resources might be achieved by:

Servers which expose existing data systems using the RDF and SPARQL standards. Many converters to RDF exist from different applications. Relational databases are an important source. The semantic web server attaches to the existing system without affecting its operation.

Documents "marked up" with semantic information (an extension of the HTML <meta> tags used in today's Web pages to supply information for Web search engines using web crawlers). This could be machine-understandable information about the human-understandable content of the document (such as the creator, title, description, etc., of the document) or it could be purely metadata representing a set of facts (such as resources and services elsewhere in the site). (Note that anything that can be identified with a Uniform Resource Identifier (URI) can be described, so the semantic web can reason about animals, people, places, ideas, etc.) Semantic markup is often generated automatically, rather than manually.

Common metadata vocabularies (ontologies) and maps between vocabularies that allow document creators to know how to mark up their documents so that agents can use the information in the supplied metadata (so that Author in the sense of 'the Author of the page' won't be confused with Author in the sense of a book that is the subject of a book review).

Automated agents to perform tasks for users of the semantic web using this data

Web-based services (often with agents of their own) to supply information specifically to agents (for example, a Trust service that an agent could ask if some online store has a history of poor service or spamming).

An Issue: whose ontology?

If we accept the necessity for imposing some sort of classification mechanism to achieve accuracy and precision in searching for information, the next question which inevitably arises is “whose ontology shall we adopt?”. We can identify three broad and overlapping alternatives:

Standardisation by committee (or by employer): top-down imposition

This is frequently done within communities of experts, such as

pharmacists or medical practitioners.

Emergent ontology - ontologies shared between workers in small, often virtual, groups: bottom-up conceptualisation

This situation is common in areas of fast-changing technology or practice. A common vocabulary and classification system “emerges” and almost imposes itself. Evolution, when it occurs, is ad hoc.

Specialist programs which recognise or implement user-defined ontology

E.g. Ideaspace 2008.

3.9.6. Semantic Web: current state of the art

Large-scale research prototypes aimed at the corporate level are beginning to emerge. Their implementation and use are fraught with practical and conceptual difficulties. The best-known example to date is MIT’s Simile project (Simile 2008). See also Gnowsis 2008.

3.9.7. Semantic desktop: the semantic Web represented at the small-group level

Introduction

In computer science, the Semantic Desktop is a collective term for ideas related to changing a computer’s user interface so that data is more easily shared between different applications or tasks, and so that data which could not previously be processed automatically by a computer can be. It also encompasses some ideas about automatically sharing information between different people. This concept is closely related to the semantic web but is distinct from it.

General description

The vision of the semantic desktop can be considered as a response to the perceived problems of existing user interfaces. Firstly, computers cannot obtain a great deal of information about the content of files. For example, suppose one downloads a document by a particular author on a particular subject. Though the document will probably indicate clearly its subject, author, source and possibly copyright information, there is no way for the computer to obtain this information or process it. This means the computer cannot search, filter or otherwise act upon the information as effectively as it otherwise could. This is very much the problem with which the semantic web is concerned.

Secondly, information stored on a computer can only be accessed or sorted in a way related to its format. For example, e-mails are stored separately from data files, and neither has anything to do with the tasks, notes and planned activities that may be stored in a calendar program, whilst contacts might be stored in yet another program. All these forms of information might nevertheless be simultaneously relevant and necessary for a particular task. Further, even if all the data is stored as part of the file system, it is often accessed with different applications; even very similar formats may need different programs. For example, PDF, PostScript, Microsoft Word and ASCII files are all opened in different programs despite being essentially the same in content.

Related to this, a user will often access a lot of data from the Internet which is segregated from the data stored locally on the computer, being accessed through a browser or other programs. As well as accessing data, a user has to share data, often through e-mail or separate file transfer programs.

The semantic desktop is an attempt to solve some or all of these problems by

changing the user interface.

Different interpretations of the semantic desktop

There are various interpretations of the semantic desktop. At its most limited it might be interpreted as adding mechanisms for relating machine-readable metadata to files. At its most ambitious it could be viewed as a complete replacement for existing user interfaces, one which unifies all forms of data and provides a single consistent interface. There are many degrees between these two, depending on which of the above problems are being addressed.

Relationship with the Semantic Web

The semantic web is mainly concerned with making machine-readable metadata to enable computers to process shared information, and with the creation of formats and standards related to this. As such, the aims of allowing more of a user's data to be processed by a computer and allowing data to be shared more easily could be considered a subset of those of the semantic web, but extended to a user's local computer, rather than just files stored on the internet.

However the aims of creating a unified interface and allowing data to be

accessed in a format independent way are not really the concerns of the

semantic web.

In practice most projects related to the semantic desktop make use of

semantic web protocols for storing their data. In particular RDF's concepts

are used, and the format itself is frequently used.

3.9.8. “Web 2.0”, Social Networking and personal and group information management

Fichter (Fichter 2004) presents “solutions to electronic problems in information management: tools to solve problems such as collecting, organizing, searching and sharing information online, simplify storing, keeping, searching and sharing Web resources”. Such tools are sometimes called social bookmark tools and have many features in common with the older generation of Web-based bookmark sites and personal information managers. Many users want to accomplish tasks quickly and easily when they come across a useful online resource. Such tasks include bookmarking the site with one click of the mouse, having the Internet domain name and page title automatically populated into the appropriate fields ready to edit, jotting down a comment or description of the site, clipping out important excerpts, and filing it in categories created to suit the user. The new breed of social bookmarking applications offers more than just a universally accessible search and store facility for links. Such applications assume that once a site is found, users want not only to share it with others but also to discover other related sites and people who are interested in the same topics.

3.9.9. Semantic wikis

An emerging approach which arguably combines the power of social networking and formal knowledge representation is that of the so-called “semantic wiki”. Wikipedia - Personal information management 2008 suggests that a semantic wiki is “a wiki that has an underlying model of the knowledge described in its pages. Regular wikis have structured text and untyped hyperlinks (such as the links in this article). Semantic wikis allow the ability to capture or identify further information about the pages (metadata) and their relations… semantic wikis try to … allow users to make their internal knowledge more explicit and more formal, so that the information in a wiki can be searched in better ways than just with keywords, offering queries similar to structural databases.”

3.9.10. Emerging commercial products

The first commercial products to exploit semantic web approaches are beginning to appear. Radar Networks has recently introduced its Twine service. Radar Networks claims to “pioneer the mainstream adoption of the Semantic Web, or what is sometimes called ‘Web 3.0’” (see Twine 2008).

3.9.11. Implications

We can identify three basic approaches to creating more effective personal information management tools, whose effectiveness can then be evaluated.

One is to create a unifying “super-app”: one program which does everything, bundling the world into a super PIM/GIM. Two major research prototypes have emerged which take this approach and use semantic web techniques (notably RDF and OWL). They are MIT’s Haystack (Haystack 2006) and the Gnowsis project (Gnowsis 2008).

A much more conventional “super PIM” approach is being taken by a small Californian start-up company called NeoTech systems with their SQLNotes product (still in beta test at the time of writing). SQLNotes, should it ever work properly, is close to a dream or ultimate “power user” PIM, being based as it is on the approach of the decade-old NetManage Ecco application but very much better integrated with Windows and Office. Information can be stored in an outline, in a spreadsheet-like grid, or in rich text documents within a grid. SQLNotes even permits access to the relational tables which store its data. However, it may well fall victim to its own flexibility, because the flexibility is accompanied by a conceptual complexity which makes its usefulness difficult to grasp and its power difficult to manage.

The second is to take a federating approach in which minimal assembly or composing

of emerging building blocks is undertaken: just sufficient to provide to a very small

community of users, tools of sufficient usefulness to permit the hypotheses of this

study to be investigated and evolved. Sauermann 2005 suggests a possible

architecture, from which this diagram is extracted to give a flavour for what may be

achievable and usable at some point in the next two to three years. The approach is

very interesting but is likely in practice to suffer from serious performance problems.

http://www.gnowsis.org/

Figure 6: Semantic desktop architecture

SOURCE: Sauermann, Leo & Ansgar Bernardi & Andreas Dengel 2005

The work of two research centres is crucial in this context. One is the German Research Centre for Artificial Intelligence (DFKI GmbH), based in Kaiserslautern. Sauermann et al 2005 is a seminal paper in respect of its identification of the components of the semantic desktop and necessary research directions. The second is DERI, notably at Galway in Ireland. DERI states that its mission is “to exploit semantics for people, organisations, and systems to collaborate and interoperate on a global scale” (DERI 2008). Both institutions are involved in the NEPOMUK initiative; the DFKI Knowledge Management Department is in fact the coordinating organisation. NEPOMUK (Networked Environment for Personalized, Ontology-based Management of Unified Knowledge) aims to “bring together researchers, industrial software developers, and representative industrial users, to develop a comprehensive solution for extending the personal desktop into a collaboration environment which supports both the personal information management and the sharing and exchange across social and organizational relations”. The approach is technically very interesting but also very challenging. Its huge potential merit is that it might unify existing structured information already present on the desktop.

The third approach is consistent with the new phenomenon characterised in a recent conference as the “disappearing desktop” (see PIM 2008). Increasingly capable client computers (typically smartphones rather than PCs, at least for a numerical majority of the world’s web users) will access semantic networks based on server computers. A server-based approach is typified in the Radar Networks Twine product mentioned above in section 3.9.10. The biggest single architectural advantage of this approach is that it makes the mutual recognition of ontological tags very much easier. The corollary is that the approach might in practice favour the emergence of overbearing “common” tagging schemes which are not, in fact, so much emergent as imposed.

3.10. Data storage techniques and their associated metadata – second list

We are now in a position to extend our list of personal information storage approaches:

Technique: XML

Metadata: The meaning of an XML document is described in an associated Document Type Definition (DTD) or Schema.

Expressiveness and precision: Potentially combines the strengths of outlining and of relational database. Generalised query languages are emerging.

Technique: RDF and OWL

Metadata: RDF Schema

Expressiveness and precision: Makes possible the expression of simple forms of knowledge (as opposed simply to information), and supports processes like user-specified keyword classification of information structured in accordance with user design, and rule-based auto-classification.

Table 4: Data storage techniques and their associated metadata – second list

We make the observation that these techniques are very powerful but largely or wholly

unapproachable by end-users in their current raw form.

4. Issues of usability and user acceptance

4.1. User frustrations and their origins

Personal information management has not worked well so far for many computer users, no doubt for many reasons. Among them may be factors such as the following possible “mini-hypotheses” (more will follow as our research and that of others progress):

Data has no meaning except in context. Context gives meaning and removing context removes meaning. People need to be encouraged to use tools which preserve context and thence meaning.

Effective personal information management needs portable, accessible computer resources, and current ones are not portable enough. Notebooks are heavy, expensive, dependent and lacking in autonomy. Smartphones aren’t smart!

Computers can assist knowledge management. However, structuring knowledge is often alien to the way people want to work.

4.2. Why people use computer-based PIM (and why they don’t): Some observations

We can summarise the significance of what we have said so far as follows:

Different kinds of data are best represented in different, and often incompatible, forms

But this makes it difficult to search comprehensively.

No one single PIM approach will work for all groups of computer users. Some will prefer highly expressive, but difficult to query and to manage, general solutions. Others will prefer very packaged, very restrictive approaches which dictate what kinds of information are stored.

Perhaps many computer users might benefit from some of the more sophisticated approaches to

personal information which we have already outlined. Many, perhaps most, will not be able to

realise those benefits without knowledgeable “hand-holding”. Achieving a better understanding

of user pain when faced with the complexity of effective personal information management is a

major focus of our ongoing research. Perhaps then too it will be possible to design better

solutions.

4.3. Other issues with current personal information management

4.3.1. Integration

Data quickly risks becoming trapped in isolated islands once we start to use more than one platform on which to store it. A majority of mobile phone users do not exploit the

available synchronisation techniques to consolidate the contact information they have

on their phone with that which they store in a PC-based PIM. The result is

incoherence, lost or obsolete information, and associated user frustration.
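The consolidation step that many users never perform can be sketched as follows. This is an illustration only: the `Contact` record, the matching rule (names compared case-insensitively) and the "most recently updated record wins" policy are all assumptions made for the example, not a description of how any real synchronisation tool works. The contact details are fictional.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Contact:
    name: str
    phone: str
    updated: date  # date the record was last edited on its source platform


def consolidate(phone_contacts, pc_contacts):
    """Merge two contact lists, keeping the most recently updated record per name."""
    merged = {}
    for contact in list(phone_contacts) + list(pc_contacts):
        key = contact.name.strip().lower()  # naive matching rule (assumption)
        current = merged.get(key)
        if current is None or contact.updated > current.updated:
            merged[key] = contact
    return sorted(merged.values(), key=lambda c: c.name.lower())


# Fictional records held on a phone and in a PC-based PIM.
phone = [Contact("Ann Lee", "+33 2 00 00 00 01", date(2008, 5, 1)),
         Contact("Bob Roy", "+1 203 555 0100", date(2007, 1, 15))]
pc = [Contact("ann lee", "+33 2 00 00 00 02", date(2008, 6, 20)),
      Contact("Cy Dunn", "+44 20 7946 0000", date(2008, 2, 2))]

for c in consolidate(phone, pc):
    print(c.name, c.phone)
```

Here the obsolete phone-side number for Ann Lee is replaced by the newer PC-side one. A real tool would also have to handle conflicting edits, deletions and less trivial name matching.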

Application suites such as Microsoft Office 2007 offer facilities for tying together (integrating) data stored in different tools on different platforms, but their use implies a level of skill (in so-called end user programming) which is uncommon. Even people who have the necessary skills would find it very difficult to justify the investment of time and effort needed to build their own tailored environments.

These issues become more and more difficult with the passage of time. Some people

become locked to a combination of a specific mobile phone platform and a specific

PIM because they depend on the synchronisation facilities which exist between them.

4.3.2. User training and exhaustion

A situation in which people have to search for specific applications to undertake

specific aspects of their information management creates a need for awareness-raising,

for training and for self-learning. There is also a limit to the number of computer applications which people can cope with; eventually exhaustion sets in and people resign themselves to a limited range of approaches.

4.3.3. User ontologies

Most researchers, and many business users and similar professional knowledge

workers, would benefit from being able to classify their information sources and

resources. Some PIMs impose a standard classification or ontology. A few allow

users to devise their own. Almost none allow the sharing of these classification

schemes or ontologies.
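What sharing a user-devised classification might involve can be sketched very simply. The example below assumes a scheme is just a mapping from each category to its parent, and uses JSON as the exchange format; both choices, and all category and function names, are hypothetical and chosen only for illustration.

```python
import json

# A user-defined classification: each category maps to its parent
# (None marks a top-level category). The categories are invented examples.
my_scheme = {
    "Research": None,
    "Literature": "Research",
    "PIM papers": "Literature",
    "Teaching": None,
}


def path_to(category, scheme):
    """Return the chain of categories from the top level down to `category`."""
    chain = []
    while category is not None:
        chain.append(category)
        category = scheme[category]
    return list(reversed(chain))


def export_scheme(scheme):
    """Serialise a scheme so that a colleague's tool could read it."""
    return json.dumps(scheme, indent=2, sort_keys=True)


def import_scheme(text):
    """Load a scheme that somebody else exported."""
    return json.loads(text)


shared = import_scheme(export_scheme(my_scheme))
print(path_to("PIM papers", shared))
```

The round trip preserves the hierarchy, so a colleague receiving the exported text could file items under the same categories. Reconciling two different schemes is a much harder problem that this sketch does not attempt.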

4.4. Evaluating user experience

We recognise that we are guilty of generalisations and insufficiently justified assertions in this area. Further research is indicated. It will be hampered by the need to:

Ask current users what their experiences have been and are

Ask why former users drop tools

Ask non-users why they are reluctant to use tools and techniques

5. Conclusions and suggestions for further research

5.1. Dimensions for classifying personal information management approaches: an initial summary

5.1.1. Data storage techniques

These are listed in Table 1 and Table 4 above, and can be summarised as:

Technique

Spreadsheets

Relational databases

Outlining and Outliners

Mindmaps

XML

RDF and OWL

5.1.2. An initial classification of personal information and functionalities

See section 3.6.1 for a list of personal information which might be stored in a PIM.

See section 3.6.2 for a list of the processes, associated with the maintenance of

personal information, which might be enabled by a PIM.

See section 3.6.3 for a list of knowledge management techniques which might be offered by a PIM.

5.2. Hypotheses still to be tested

We don’t know how many people actually use programs which can be specifically identified as PIMs (Personal Information Managers). But although “all” knowledge workers need to store and manage data which is personal to them, by no means all use a PIM to do it.

The (initial, working) hypotheses follow:

5.2.1. Hypothesis 1

The data-centred approach adopted by most PIMs is not necessarily well

adapted to the working methods adopted by knowledge workers.

What styles and functionalities appeal to (or repel) different types of users is not yet well understood.

This hypothesis hides a plethora of mini-hypotheses which will emerge and which we

will refine as we continue our work. We observe, for example, that it is necessary but

not sufficient for a PIM to provide a repository of personal information; specific

functionality is also required. Making PIMs more useful suggests the desirability of “natural language” interfaces, and we will posit the value of a user-interface where the user can interact with her data: perhaps The Poet and her Muse?

5.2.2. Hypothesis 2

Current PIMs tend to emphasise one particular information management

technique, to the exclusion of others. The absence of complementary

information management techniques is one of the factors which cause

knowledge workers to reject current PIMs.

It is suggested that PIMs are not much used because the many information management techniques implemented in various PIMs (and less formally by computer users themselves) are normally presented as opposing, when instead they should be seen as complementary.

It should be possible for the individual to store and communicate data in a way that is less closed (less rigid, less structured) and more communicable (able to be understood by recipients) than current approaches allow. For example, databases are very structured; the real world is not. We will argue that there’s a place for ALL of:

Structured data (databases and database-supported products such as contact

managers and email clients).

Semi-structured data (e.g. spreadsheet contents, hierarchical outlines, and PIMs which support the semi-structured organisation of small pieces of information such as phone numbers, errands to run, books to read...).

Rule-based classification (automatic assignment of data items to categories which the software deems appropriate; the rules may be based simply on data values, or may take the form of ‘regular expressions’: a regular expression, regex or regexp for short, is a special text string for describing a search pattern).

Inter-item hyperlinking, folksonomies (Gruber, Tom 2007), semantic tagging

and the like.

The use of a thesaurus and a lexicon of terms to assist in the maintenance and

communication of precise information and more accurate inferencing from

stored data.
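The rule-based classification item in the list above can be made concrete with a small sketch. It uses Python's standard `re` module; the categories, the patterns and the sample items are all hypothetical and deliberately simple, and real PIMs may implement such rules quite differently.

```python
import re

# Each rule pairs a category name with a regular expression.
# Rules are tried in order; the first pattern that matches wins.
RULES = [
    ("phone number", re.compile(r"^\+?\d[\d\s().-]{6,}$")),
    ("email address", re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")),
    ("web link", re.compile(r"^https?://\S+$", re.IGNORECASE)),
    ("errand", re.compile(r"^(buy|call|collect|post)\b", re.IGNORECASE)),
]


def classify(item):
    """Assign a short piece of text to the first category whose rule matches."""
    text = item.strip()
    for category, pattern in RULES:
        if pattern.search(text):
            return category
    return "unclassified"


# Invented examples of the small pieces of information a PIM might hold.
for note in ["+33 2 00 00 00 01", "Buy milk", "http://www.w3.org/XML/",
             "someone@example.org", "The Mind Map Book"]:
    print(f"{note!r} -> {classify(note)}")
```

The last item falls through every rule and stays unclassified, which illustrates why such rules complement, rather than replace, the other techniques in the list.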

5.2.3. Hypothesis 3

PIMs are not much used because PIMs either impose an ontology which

does not correspond to the user’s ontology, or do not permit that ontology

to be made explicit and/or shared. The incorporation of explicit

knowledge representation mechanisms which are tailored to their users’

(plural) needs will make a PIM more useful: by beginning to turn it into a

small-group knowledge manager.

This is a primary example of the functionality we believe to be missing from many

contemporary PIMs.

5.3. Constraints

We have described a potentially very rich area of research in which at the moment there is not a great deal of published material.

We recognise that there are large areas that deserve attention but that we will not have time to

investigate.

In particular, we note with approval the work of Penrose, Roger (1990) – which points to

fundamental limitations on the usefulness of computer-based approaches in this and in other

areas; and the large existing literature on human computer interface (HCI) issues, much of

which has relevance to this enquiry but which we have deliberately excluded from

consideration in this paper.

5.4. Design for Further Research

5.4.1. Further analyse existing PIM / GIM approaches (literature and software)

Literature review and review of current practice

This will include work on cognate (e.g. linguistic and epistemological)

issues.

Establish a comprehensive list of existing PIM tools and techniques

Choose a small number to concentrate on for subsequent research

Establish Classification and Evaluation criteria

The evaluation of information systems is discussed in Beynon-Davies 2004.

Qualitative research (largely but not exclusively secondary) will be carried

out to identify and classify existing approaches and to characterise them in

accordance with significant dimensions as we identify them.

5.4.2. Identify and/or compose better approaches in order to evaluate user reaction

The intention is to identify existing tools and approaches, and either to integrate them minimally or to evaluate them in isolation.

Choose appropriate approaches

Tools

The two tools which we have initially identified are SQLNotes

2008 and Twine 2008.

Techniques

Make available and/or compose improved (or at least, better integrated) basic tools

We hope to identify a small number of tools and to see how users in at least two

communities of practice (Weir & Hutchings 2005) respond to them in a standalone

and in an assisted (accompanied) environment.

5.4.3. Observe and evaluate the behaviour of at least two small groups of knowledge workers as they confront, learn and exploit two different GIM approaches

We recognise the need to help people as they learn.

5.5. Research context

5.5.1. Methodological approaches

Our methods will be based on a combination of:

Experimentation

Action Research (participation in, and concurrent evaluation of, experimental approaches)

Ethnographic approach

These will be applied in different contexts of use. Two example contexts follow: Research, and Projects and Project Management.

5.5.2. Research as a context of use

Research is archetypal knowledge creation (and therefore management) carried out by

individuals and small groups.

5.5.3. Projects and Project Management (PM) as a context of use

The management of projects is typically represented by mechanisms such as Gantt

charts. A Gantt chart is a specific model which abstracts and simplifies the project. As such, it is only a static representation of the project.

In projects, functional activities represented as vertical relations interact with project

activities represented as horizontal relations.

Our project is a meta-project for itself, in a recursive relationship with itself.

5.6. Summary

We hope that we have succeeded in demonstrating that individuals working in groups should

be encouraged and educated to make better use of the available computer-based tools, and that

the tools themselves should evolve into better ways of representing information and

knowledge.

We recognise that we need further to search for a better understanding of the way people use

these tools and learn new ones, in order subsequently to find strategies on how best to educate

people to make the right choice of the right tools. This paper has suggested a classification

scheme for these tools based primarily on their data representation: e.g. spreadsheet, relational

database, semantic web represented at the desktop level. At least one other dimension of

classification has also been suggested as significant, that of functionality. Usability issues have

been highlighted. The paper has also suggested that a judicious mix of existing and emerging

techniques and tools will permit evolution or revolution in the management of individual and

shared information and knowledge. Establishing the truth of that suggestion is forging our

future research agenda.

6. References

Allen, David 2001 'Getting Things Done: The Art of Stress-Free Productivity.' Penguin Books.

Beynon-Davies, P. & Owens, I. (2004) 'Information Systems Evaluation and the Information Systems Development Process.', The Journal of Enterprise Information Management, Vol. 17, No. 4. 2004, pp. 276-282

Boardman, R. & Sasse, M. 2004 'Stuff goes into the computer and doesn't come out: a cross-tool study of personal information management.' Proceedings of the SIGCHI conference on Human Factors, 2004.

Bricklin, Dan & B Frankston (1978) - tech. rep., Lotus Corp., 'Visicalc'

Burnett, Margaret & Curtis Cook & Omkar Pendse & Gregg Rothermel & Jay Summet & Chris Wallace 2003 'End-user software engineering with assertions in the spreadsheet paradigm.' Proceedings of the 25th

International Conference on Software Engineering, Portland, Oregon, 2003. Pages: 93 - 103.

Burnett, M. & Atwood, J. & Walpole Djang, R. & Reichwein, J. & Gottfried, H. & Yang, S. 2001 'Forms/3: A first-order visual language to explore the boundaries of the spreadsheet paradigm.' Journal of

Functional Programming, Vol. 11, Issue 2, pp. 155-206, March 2001.

Buzan, Tony 1996 'The Mind Map Book: How to Use Radiant Thinking to Maximize Your Brain's

Untapped Potential.' Plume, 1996.

Chandrasekaran B & JR Josephson & VR Benjamins 2008 'What Are Ontologies, and Why Do We Need

Them?' Available at http://doi.ieeecs.org accessed 20-06-2008.

Churchman, C.W. (1968) 'The Systems Approach.' New York: Dell. 1968

Codd, E. 1970 'A Relational Model of Data for Large Shared Data Banks'. Communications of the ACM,

Vol. 13, Issue 6, pp. 377-387, 1970.

Date, C.J. (1968) 'SQL Structured Query Language: A guide to the SQL standard.' Addison-Wesley

Longman Publishing Co., Inc. Boston, MA, USA. 1968.

Date, Chris J. 2003 An introduction to database systems 8ed. Addison-Wesley

De Vorsey K., Elson C., Gregorev N., Hansen J. 2006 'The Development of a Local Thesaurus to Improve Access to the Anthropological Collections of the American Museum of Natural History.' D-Lib Magazine Volume 12 Number 4 April 2006. Found online at http://www.dlib.org/dlib/april06/devorsey/04devorsey.html viewed most recently 25-04-2007

DERI 2008 is described at http://www.deri.ie/ accessed 20-06-2008.

Dijkstra, Edsger W. (1982) 'On the role of scientific thought.' In: Selected writings on computing: a

personal perspective. Springer Verlag, 1982.

Ecco 1997 can be found at

http://supportweb.netmanage.com/ts_viewnow/downloads/patchesUnsupported/ecco.asp

EndNote 2008 is described at http://www.endnote.com/ accessed 03-07-2008

Expresso 2008 How to Share Excel Spreadsheets - Free Webinar http://www.expressocorp.com/ accessed

11-07-2008

Fichter, D. 2004 Tools for Finding Things Again. Online; Sep/Oct2004, Vol. 28 Issue 5, p52-56, 5p, 2

charts, 3bw

Freyberg, C.A. 1996 FIND!!!

Gartner Group 2007 'Market Share: Enterprise E-Mail and Calendaring Software, Worldwide, 2004-2006.'

27 July 2007

Gemmell, Jim & Gordon Bell & Roger Lueder 2006 MyLifeBits: a personal database for everything.

Communications of the ACM Volume 49, Number 1 2006, Pages 88-95

Gnowsis 2008 is described at http://www.gnowsis.org/ accessed 20-06-2008.

Gregory M.R. & Norbis M. 2008 'Towards a Systematic Evaluation of Personal and Small Group Information and Knowledge Management'. Paper presented to 5th International Conference on Cybernetics and Information Technologies, Systems and Applications: CITSA 2008, in July 2008.

Gruber, Tom 2007 'Ontology of Folksonomy: A Mash-Up of Apples and Oranges.' International Journal on

Semantic Web and Information, 2007.

Haystack 2006 is to be found at http://simile.mit.edu/hayloft/index.html checked 28/07/2008

Ideaspace 2008 is to be found at http://www.ideaspace.com/

IFLANET 1998 'International Federation of Library Associations and Institutions: Functional Requirements for Bibliographic Records. Final Report - 1998.' Since frequently updated. Found at http://www.ifla.org/VII/s13/frbr/frbr1.htm accessed 11/07/2008.

Info Select 2007 Micro Logic Corp., South Hackensack, NJ; 201-342-6518; www.miclog.com.

KDE 2008 is described at http://pim.kde.org/ accessed 20-06-2008.

Kelly, D. 2006 Evaluating Personal Information Management Behaviours and Tools. Communications of

the ACM; Jan2006, Vol. 49 Issue 1, p84-86, 3p

Lotus Domino 2008 is described at http://www-306.ibm.com/software/lotus/products/domino/ accessed

02-07-2008.

Lotus Symphony 2008 is described at http://www-142.ibm.com/software/sw-lotus/lotus/general.nsf/wdocs/lotusprods accessed 20-06-2008.

Panko, Raymond R. 1998 'What We Know About Spreadsheet Errors.' Journal of End User Computing's Special issue on Scaling Up End User Development Volume 10, No 2. Spring 1998, pp. 15-21. Available on the web in an expanded form at http://www.opssys.com/instantkb/attachments/What_We_Know_About_Spreadsheet_Errors_Whitepaper-GUID9b35763e2d504ddab36b9e26a4eee631.pdf accessed 11-07-2008.

PIM 2008 The disappearing desktop: PIM 2008 Conference on Human Factors in Computing Systems CHI

'08 extended abstracts on Human factors in computing systems

RefWorks 2008 is described at http://www.refworks.com/ accessed 03-07-2008

Sauermann, Leo & Ansgar Bernardi & Andreas Dengel 2005 Overview and Outlook on the Semantic

Desktop. Proc. of Semantic Desktop Workshop at the ISWC, 2005

SGML 2008 is described at http://www.w3.org/MarkUp/SGML/ accessed 20-06-2008.

Simile 2008 is described at http://simile.mit.edu/hayloft/index.html accessed 20-06-2008.

SQLNotes 2008 is described at http://sqlnotes.wikispaces.com/ accessed 20-06-2008.

Teevan, Jaime & William Jones & Benjamin B. Bederson 2006 Personal information management:

Introduction. Communications of the ACM Volume 49, Number 1 2006, Pages 40-43

Teevan, Jaime & William Jones 2008 'The disappearing desktop: pim 2008.' Conference on Human Factors in Computing Systems CHI '08, extended abstracts on Human factors in computing systems. Available at http://portal.acm.org.libezproxy.open.ac.uk/citation.cfm?id=1358628.1358956&coll=Portal&dl=GUIDE&CFID=76138045&CFTOKEN=51276375 accessed 01-07-2008

Twine 2008 is described at http://www.radarnetworks.com/ accessed 28-04-2008

Ventana Research 2007 'Requirements for 21st Century Spreadsheets: Uses and misuses of a critical business technology: Executive Summary.' San Mateo, CA: Ventana Research, 2007. Found at http://www.ventanaresearch.com/uploadedFiles/Ventana_Research_Requirements_for_21st_Century_Spreadsheets_Executive_Summary_FINAL.pdf accessed 11-07-2008.

Visimap 2008 is developed by CoCo Systems and is described at http://www.coco.co.uk/ accessed 23-07-2008.

W3C 2004 'Architecture of the World Wide Web, Volume One. W3C Recommendation 15 December

2004.' Found at: http://www.w3.org/TR/2004/REC-webarch-20041215/

Weir, D. & Hutchings, K. 2005 'Cultural embeddedness and contextual constraints: knowledge sharing in Chinese and Arab cultures.' doi.wiley.com

Whittaker, Steve & Victoria Bellotti & Jacek Gwizdka 2006 Email in personal information management

Communications of the ACM Volume 49 , Issue 1 (January 2006)

Wikipedia - Knowledge representation (2006) Knowledge representation. Permanent link: http://en.wikipedia.org/w/index.php?title=Knowledge_representation&oldid=35105964 Page Version ID:

35105964

Wikipedia - Mind map (2008) Mind map. Permanent link

http://en.wikipedia.org/w/index.php?title=Mind_map&oldid=224639553 accessed 11-06-2008.

Wikipedia - OPML (2008) OPML. Found at http://en.wikipedia.org/wiki/OPML accessed 19/07/2008

Wikipedia - Personal information management (2008) 'Personal information management' found at

http://en.wikipedia.org/wiki/Personal_information_management accessed 18/08/2008.

Wikipedia - Semantic Wiki (2008) 'Semantic Wiki' found at FIND THIS

Wikipedia - Web Ontology Language (2006) Web Ontology Language. Permanent link: http://en.wikipedia.org/w/index.php?title=Web_Ontology_Language&oldid=34218020 Page Version ID: 34218020

XML (2008) is described at http://www.w3.org/XML/ accessed 20-06-2008.