The Business of Personal Knowledge
Mark GREGORY
Department of Finance and Operations, ESC Rennes School of Business
35065 Rennes Cedex, France
and
Dr. Mario NORBIS
Department of Management, Quinnipiac University
Hamden, Connecticut 06518, USA
Conference on Knowledge, Culture and Change in Organisations
Cambridge University, UK
5-8 August 2008
Abstract
Knowledge and information workers work as individuals within virtual team structures. As individuals and as team members, they acquire information, which they store in a number of complex ways: some
paper-based, but increasingly computer-based. There are a number of computer-based tools, sometimes
referred to as Personal Information Managers or PIMs (Kelly 2006 and Teevan et al. 2006), which can assist
in the storage and management of such information. However, little is understood about how people use
these tools, how they learn new ones, the ways in which the tools constrain how people work and think,
and how best to educate people to make the right choice of the right tools. The underlying hypothesis of
the research-in-progress presented in this paper is that individuals working in groups should be
encouraged and educated to make better use of the available tools, and that the tools themselves should
evolve into better ways of representing information and knowledge.
The objective of this paper is to present a limited view of current trends in the academic and practitioners'
literature in the areas of knowledge representation and communication by individuals and small groups
(Boardman et al. 2004) in search of a better understanding of the way people use these tools and
learn new ones, in order subsequently to find strategies on how best to educate people to make the right choice of the right tools. The paper suggests a classification scheme for these tools based primarily on
their data representation: e.g. spreadsheet, relational database and semantic web represented at the
desktop level (Sauermann et al. 2005). Specific difficulties associated with certain of these data
representations are identified. The paper also suggests that a judicious mix of existing and emerging
techniques and tools will permit evolution or revolution in the management of individual and shared
information and knowledge.
Keywords: PIM, GIM, Classification, Knowledge Representation, Semantic Web
1. Introduction
Knowledge and information workers work as individuals within virtual team structures. As individuals
and as team members, they acquire information, which they store in a number of complex ways: some
paper-based, but increasingly computer-based. There are a number of computer-based tools, sometimes
referred to as Personal Information Managers or PIMs (Kelly 2006 and Teevan et al. 2006), which can
assist in the storage and management of such information. However, little is understood about how
people use these tools, how they learn new ones, the ways in which the tools constrain how people work
and think, and how best to educate people to make the right choice of the right tools.
Our earlier paper (Gregory M.R. & Norbis M. 2008) presented the hypotheses that individuals working
in groups should be encouraged and educated to make better use of the available tools for information
management and that the tools themselves should evolve into (or be replaced by) better ways of
representing information and knowledge. In that paper, we started to classify and evaluate the
effectiveness of existing tools and techniques by first summarising current trends in the academic and
practitioners' literature in the areas of knowledge representation and communication by individuals and
small groups, and then proposing a methodology for evaluating them in the tradition of the systems
approach originally formulated by Churchman (1968). Our earlier paper suggested a classification
scheme based primarily on the tools' data representation.
The objective of this paper is to present a limited view of current trends in the academic and practitioners'
literature in the areas of knowledge representation and communication by individuals and small groups
(Boardman et al. 2004) in search of a better understanding of the way people use these tools and
learn new ones, in order subsequently to find strategies on how best to educate people to make the right
choice of the right tools.
This paper begins to develop a multidimensional classification scheme for these tools, based not only on:
their data representation, e.g. spreadsheet, relational database or semantic web represented at the desktop level (Sauermann et al. 2005). This dimension was suggested in our earlier paper and is developed here. It is the “how to” of personal information management;
but also on:
the various functionalities (useful features) these tools offer. This is the “what” of personal information management;
issues of usability and of user acceptability. This is the “why” (and why not!) of personal information management.
At this stage, we are setting out a research agenda; we do not yet have full answers to the questions that
we will present today and which will form the basis of our further work over the next couple of years.
2. Representation of personal data
The ways in which data is stored on a computer influence how it can subsequently be used. We
therefore identify several possible, or candidate, data representation approaches and analyse the
consequences of choosing them.
2.1. Personal information management: a brief recap
Many of us keep a wide range of personal data, which we classify or sub-divide into areas such as:
Agenda: list of appointments
Address book: our contacts
To Do list
Most attendees at this conference keep more specialised (but still widely used) data, such as:
Bibliography: reference list
Reading notes
Project logbook
Some of us do this primarily on paper, in spiral notebooks or perhaps in more specialised
diaries and the like.
Many of us also or alternatively use personal computers (desktop or notebook), digital PDA
(Personal Digital Assistant) devices or smartphones.
Some of us work in contexts where this kind of information is no longer exclusively ours, and
we choose to (or are obliged to!) share and merge this kind of information.
All of us are of course very careful to copy this personal data from one device to another, in
order to safeguard it from corruption or loss. Some of us take additional care to synchronise
this data; that is, when we store a new contact detail on our smartphone, we subsequently
synchronise it into our desktop environment. An obligation to share this kind of data arises if
we have a secretary or administrative assistant who also collects this kind of data on our
behalf.
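The synchronisation just described can be pictured as a two-way merge. The sketch below is a minimal illustration only, assuming that each contact carries an identifier shared by both devices and a last-modified timestamp, and resolving conflicts by letting the most recent edit win. The `Contact` record and the `synchronise` function are hypothetical and do not describe how any particular smartphone or desktop product synchronises.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Contact:
    contact_id: str  # hypothetical identifier shared by every device
    name: str
    phone: str
    modified: int    # hypothetical last-modified timestamp (seconds since the epoch)

def synchronise(phone: dict, desktop: dict) -> dict:
    """Two-way merge of contact stores keyed by contact_id.

    Contacts held on only one device are copied across; where both devices
    hold a contact, the more recently modified copy wins (ties favour the desktop).
    """
    merged = dict(desktop)
    for cid, contact in phone.items():
        current = merged.get(cid)
        if current is None or contact.modified > current.modified:
            merged[cid] = contact
    return merged

phone = {"c1": Contact("c1", "Ann", "555-0101", modified=200),
         "c2": Contact("c2", "Bob", "555-0102", modified=100)}
desktop = {"c1": Contact("c1", "Ann", "555-0000", modified=150),
           "c3": Contact("c3", "Cy", "555-0103", modified=120)}
merged = synchronise(phone, desktop)  # both devices would then store this result
```

A real synchroniser would also need to record deletions, which this sketch cannot distinguish from contacts that were simply never copied.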
2.2. Personal Information Management: Make or Buy?
2.2.1. Basic data management tools used for personal information management include spreadsheets and databases
Spreadsheets are a very powerful combination of two things: the nearest approach to widely
available end-user computer programming ever invented, and a way of storing (more or less)
structured data in which the relationships between items of data are imposed by the use of
formulae.
Later in this document, we point out some of the shortcomings of spreadsheets,
which are the obverse of their expressive power.
Databases generally have a more limited remit, which they fulfil with greater precision
than spreadsheets do. The most widely accepted, implemented and used type of
database is the so-called “relational” database (Date 2003). Date suggests as an informal
initial definition that:
“A relational system is one in which the data is perceived by the user as tables
(and nothing but tables); and the operators at the user's disposal (e.g. for data
retrieval) are operators that generate new tables from old. For example, there
will be one operator to extract a subset of the rows of a table, and another to
extract a subset of the columns – and of course a row subset and a column
subset of a table can both be regarded as tables themselves. The reason such
systems are called ‘relational’ is that the term ‘relation’ is essentially just a
mathematical term for a table.”
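Date's two example operators can be sketched in a few lines. In this illustration, which is ours and not Date's, a table is modelled as a Python list of dictionaries (an assumption made for brevity): `restrict` extracts a subset of the rows and `project` a subset of the columns, and because each returns a table, the two compose. In a relational DBMS the same operations would normally be written in SQL, as a `WHERE` clause and a `SELECT DISTINCT` column list.

```python
def restrict(table: list, predicate) -> list:
    """Row subset: keep the rows satisfying predicate; the result is itself a table."""
    return [row for row in table if predicate(row)]

def project(table: list, columns: list) -> list:
    """Column subset: keep only the named columns, dropping duplicate rows
    (a relation, being a set, contains no duplicates)."""
    result = []
    for row in table:
        reduced = {c: row[c] for c in columns}
        if reduced not in result:
            result.append(reduced)
    return result

references = [
    {"author": "Codd", "year": 1970, "topic": "relational model"},
    {"author": "Chen", "year": 1976, "topic": "entity-relationship model"},
    {"author": "Date", "year": 2003, "topic": "relational model"},
]

# Each operator returns a table, so they compose: a column subset of a row subset.
relational_authors = project(
    restrict(references, lambda r: r["topic"] == "relational model"), ["author"]
)
```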
It is possible to use spreadsheets and database management systems as the means by
which personal data is stored, in other words, as the means by which a given
individual carries out personal information management. In effect, the computer user
who chooses this approach is making her own specialised lists of data which are
important to her.
The choice between spreadsheet and database is actually not straightforward, and
many computer users make an inappropriate choice based on real or imagined
self-imposed constraints. These constraints include the user's level of competence with
such tools.
What spreadsheets are good at
Spreadsheets combine conceptual simplicity, very powerful data manipulation and analysis facilities, and good information presentation facilities
Spreadsheets seem to most end-users to be easier to design and to use than do databases
It is comparatively easy to evolve the form of a spreadsheet as the context of its use changes
Functions make it easy to use previously programmed data analytical techniques
It is possible to program new functions, or to have them written for you so that you can use a specific data analytical technique
Recent spreadsheet packages have excellent information presentation facilities and they also connect very well to other office programs such as word processors and presentation graphics programs
Some problems with spreadsheets, and some indications of why databases may be “better”
Spreadsheets are by their very nature highly insecure – anyone who can access a spreadsheet can see all the data in that spreadsheet; industrial-strength databases make it impossible for users who are not privileged to do so to change, or even to see, data
Spreadsheets can rapidly become very complex, and it is very difficult to understand what the overall structure of the spreadsheet is; as a result, they can become a nightmare to maintain
It is difficult for more than a very small number of people to use a single spreadsheet at one time, and almost impossible to stop their simultaneous changes from conflicting with each other
In practice, spreadsheets comfortably handle at most a few thousand records; databases can handle millions
Databases can support tens, or even thousands, of simultaneous users
Personal (small-scale) database management programs exist but are often badly used.
The best-known examples are Microsoft Office Access and OpenOffice.org Base.
2.3. Candidate data management approaches: Spreadsheets
Spreadsheets consist of an array of cells, each of which can store a value or a formula. A
formula relates the value of the current cell to other cells, which can be considered as exporting
their values to be used in the formula.
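The cell-and-formula model just described can be sketched as follows. This is a deliberately minimal evaluator, not how any commercial spreadsheet is implemented: each cell (named in the familiar A1 style) holds either a constant or a formula, here a hypothetical Python function which receives a `get` callback for reading, or importing, the values of other cells. Evaluation recurses through the referenced cells and reports a circular reference if a formula ends up depending on itself.

```python
from typing import Callable, Union

# A formula receives a lookup function `get` with which it reads other cells.
Formula = Callable[[Callable[[str], float]], float]
Cell = Union[float, Formula]

def evaluate(sheet: dict, name: str, visiting: frozenset = frozenset()) -> float:
    """Return the value of cell `name`, evaluating referenced cells recursively."""
    if name in visiting:
        raise ValueError(f"circular reference involving {name}")
    cell = sheet[name]
    if callable(cell):
        # The formula "imports" the values of the cells it refers to.
        return cell(lambda ref: evaluate(sheet, ref, visiting | {name}))
    return cell

sheet: dict = {
    "A1": 120.0,                              # quantity
    "A2": 2.5,                                # unit price
    "A3": lambda get: get("A1") * get("A2"),  # like =A1*A2
    "A4": lambda get: get("A3") * 1.2,        # like =A3*1.2 (add 20% tax)
}
```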
2.3.1. Spreadsheets in general
Dan Bricklin (Bricklin 1981) originated VisiCalc, the first application that turned the
personal computer from a hobby for computer enthusiasts into a business tool.
VisiCalc went on to become the first "killer app", an application so compelling that
people would buy a particular computer just to run it. In this case the computer was
the Apple II.
The acceptance of the IBM PC following its introduction in August 1981 was slow at
first, because most of the programs available for it were ports from 8-bit platforms.
Things changed dramatically with the introduction of the Lotus 1-2-3 spreadsheet
package in January 1983. It became the PC platform's killer app, and drove sales of
the PC thanks to its improvements in speed and graphics over VisiCalc. See Lotus
Symphony 2008.
Investigations in various organisations suggest anecdotally that typical knowledge
workers possess tens or hundreds of spreadsheets. (There are 3615 spreadsheet files
on the hard disk of one of the authors.)
Figure 1 shows a spreadsheet being used to store bibliographic data, in fact, the list of
references on which this document is based:
Figure 1: Spreadsheet being used to store personal research data
2.3.2. Problems associated with spreadsheets
There are many problems associated with spreadsheets.
Panko 1998 suggests that:
“Many spreadsheets are large and complex, and development often involves
interactions among multiple people. In fact, we would guess that the largest
portion of large-scale end user applications today involve spreadsheet
development.

In recent years, we have learned a good deal about the errors that people
make when they develop spreadsheets. In general, errors seem to occur in a
few percent of all cells, meaning that for large spreadsheets, the issue is how
many errors there are, not whether an error exists. These error rates, although
troubling, are in line with those in programming and other human cognitive
domains. In programming, we have learned to follow strict development
disciplines to eliminate most errors. Surveys of spreadsheet developers
indicate that spreadsheet creation, in contrast, is informal, and few
organizations have comprehensive policies for spreadsheet development.
Although prescriptive articles have focused on such disciplines as
modularization and having assumptions sections, these may be far less
important than other innovations, especially cell-by-cell code inspection
after the development phase.”
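Panko's point that, for large spreadsheets, the issue is how many errors there are, not whether one exists, can be made concrete with a little arithmetic. Assuming, as a simplification, that each of n cells is independently wrong with probability p, the expected number of errors is n × p and the probability that the sheet contains at least one error is 1 − (1 − p)^n. The rate p = 0.02 used below is a hypothetical value within Panko's "few percent", not a measured figure; with n = 1000 cells it gives about 20 expected errors and a probability of an error-free sheet of roughly 1.7 × 10^-9.

```python
def expected_errors(cells: int, error_rate: float) -> float:
    """Expected number of erroneous cells if each is wrong independently with probability error_rate."""
    return cells * error_rate

def prob_at_least_one_error(cells: int, error_rate: float) -> float:
    """1 - (1 - p)^n: the chance that a sheet of n cells contains at least one error."""
    return 1.0 - (1.0 - error_rate) ** cells

# Hypothetical per-cell error rate of 2%, for sheets of increasing size.
for n in (10, 100, 1000):
    print(n, expected_errors(n, 0.02), round(prob_at_least_one_error(n, 0.02), 4))
```

Real spreadsheet errors are unlikely to be independent (a copied formula propagates its mistake), so this is an illustration of scale rather than a model.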
Ventana Research 2007 reports on a survey they undertook of actual user experience
in the use of spreadsheets. In addition to the observations already generally accepted
concerning spreadsheets (that they are error-prone and difficult to use in a team
context) they add the observations that they are often used for tasks to which they are
badly adapted because they are perceived as free (ignoring the hidden costs which
then follow); and that they are difficult to combine, especially between enterprises.
In our opinion, these problems often stem from the fact that there is no accepted
methodology to define and document the requirements of a spreadsheet.
Ventana Research 2007 also suggest that the need to audit spreadsheets may push
organisations in a direction which Ventana considers advisable: that of identifying or
creating formal applications which supplant spreadsheets for some of their common uses –
notably budgeting, calendar management and the like.
Spreadsheets are frequently misapplied to relatively large business problems to which they
are badly adapted, and the same is true on a smaller scale: Figure 1 shows a spreadsheet
being used to store bibliographic data. There are some clear advantages in this approach. In
the example, a link is made between a reference and a copy of the referenced document
stored on the same computer as the spreadsheet. This is done using a formula whose
use is well understood by many spreadsheet users. But the example only works
because the user of the spreadsheet knows and respects the rules for “well-formed”
references: it is difficult to impose in a spreadsheet the complex data validation which
bibliographic detail requires. In addition, the same formula uses a user-written
function (exists_file) which, in Excel, is written in Visual Basic for Applications
(VBA). VBA is a programming language and as such is inaccessible to a large
proportion of spreadsheet users.
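To show the kind of validation meant here, the sketch below checks a single reference record against a few simple rules and tests whether its linked local copy exists. The field names, the rules (non-empty authors and title, a plausible four-digit year with an optional letter suffix such as 2008a) and this Python `exists_file` are illustrative assumptions, not a reconstruction of the spreadsheet in Figure 1 or of its VBA function. That such checks have to be written as program code at all is precisely the barrier described above.

```python
import re
from dataclasses import dataclass
from pathlib import Path

@dataclass
class Reference:
    # Hypothetical columns of a bibliography spreadsheet.
    authors: str
    year: str
    title: str
    local_copy: str = ""  # path to a stored copy of the document, if any

def validation_errors(ref: Reference) -> list:
    """Return the rule violations for one reference; an empty list means well-formed under these rules."""
    errors = []
    if not ref.authors.strip():
        errors.append("authors missing")
    if not re.fullmatch(r"(1[5-9]|20)\d\d[a-z]?", ref.year):
        errors.append(f"implausible year: {ref.year!r}")
    if not ref.title.strip():
        errors.append("title missing")
    return errors

def exists_file(path: str) -> bool:
    """Rough analogue of a user-written exists_file: is the linked copy present on disk?"""
    return bool(path) and Path(path).is_file()

good = Reference("Codd, E.", "1970", "A relational model of data for large shared data banks")
bad = Reference("", "197O", "")  # letter O typed in place of a zero
print(validation_errors(good))
print(validation_errors(bad))
```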
See Burnett et al. 2001 for a further discussion of spreadsheet shortcomings and other
suggestions of ways forward. See also Burnett et al. 2003 for specific suggestions on
encouraging end users to profit from formal software engineering methodologies,
specifically by making assertions about their spreadsheets in order to achieve greater
correctness and greater efficiency.
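The idea of assertions can be illustrated with a very small example. In the sketch below, a user records range assertions (beliefs such as “this cell holds a percentage between 0 and 100”) and a checker reports every assertion that the current cell values contradict. It is a simplified illustration of the general idea under our own assumptions about how assertions are expressed, not the mechanism described by Burnett et al.; the cell names and values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RangeAssertion:
    """A user's stated belief about a cell: its value lies in [low, high]."""
    cell: str
    low: float
    high: float

def violated(values: dict, assertions: list) -> list:
    """Return a message for every assertion that the current cell values contradict."""
    problems = []
    for a in assertions:
        v = values[a.cell]
        if not (a.low <= v <= a.high):
            problems.append(f"{a.cell} = {v} is outside [{a.low}, {a.high}]")
    return problems

# Hypothetical grade sheet: C2 should be a percentage, but a formula error produced 850.
values = {"B2": 17.0, "C2": 850.0}
assertions = [RangeAssertion("B2", 0, 20), RangeAssertion("C2", 0, 100)]
print(violated(values, assertions))
```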
2.3.3. The role of spreadsheets in personal information management
Spreadsheets are very widely used (and as we have seen, misused) for storing personal
information. The use of informal techniques of sharing spreadsheets and of more
formal techniques such as that suggested by Expresso 2008 mean that spreadsheets
can be used in small-scale group information management systems. Ventana Research
2007 document the pervasiveness of spreadsheets, and confirm their value for
processes such as one-off ad hoc reporting and the prototyping of requirements for
what should subsequently be re-engineered into, or acquired ready made as, formal
applications to support specific management processes.
2.4. Candidate data management approaches: Relational databases
The currently dominant approach, the relational database paradigm originally suggested by
Codd 1970 and expanded upon by Date 2003, enables arbitrary manipulation: that is to say,
queries can be defined which will always have an answer, and that answer is itself a relation.
However, the data is constrained to appear in normalised relations (the terms sets and entities
are used loosely as equivalents here), which are implemented as database tables.
2.4.1. Advantages of the relational approach
The following brief analysis is taken from Indiana University (n.d.):
“A database is a collection of data, which is organized into files called tables.
These tables provide a systematic way of accessing, managing, and updating
data. A relational database is one that contains multiple tables of data that relate
to each other through special key fields. Relational databases are far
more flexible (though harder to design and maintain) than what are known as
flat file databases, which contain a single table of data.

To understand the advantages of a relational database, imagine the needs of
two small companies that take customer orders for their products. Company
A uses a flat file database with a single table named orders to record orders
they receive, while Company B uses a relational database with two tables:
orders and customers.

When a customer places an order with Company A, a new record (or row) in
the table orders is created. Because Company A has only one table of data,
all the information pertaining to that order must be put into a single record.
This means that the customer's general information, such as name and
address, is stored in the same record as the order information, such as
product description, quantity, and price. If customers place more than one
order, their general information will need to be re-entered and thus
duplicated for each order they place.

Whenever there is duplicate data, as in the case above, many inconsistencies
may arise when users try to query the database. Additionally, a customer's
change of address would require the database manager to find all records in
orders that the customer placed, and change the address data for each one.

Company B is much better off with its relational database. Each of its
customers has one and only one record of general information stored in the
table customers. Each customer's record is identified by a unique customer
code which will serve as the relational key. When a customer orders from
Company B, the record in orders need contain only a reference to the
customer's code, because all of the customer's general information is already
stored in customers.”
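Company B's design can be sketched with Python's built-in `sqlite3` module. The table names follow the quoted example; the column names and sample data are hypothetical. The sketch shows the two points made above: a change of address is made once, in `customers`, and a join reassembles Company A's flat view of the data on demand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when this is on
conn.executescript("""
    CREATE TABLE customers (
        customer_code TEXT PRIMARY KEY,   -- the relational key
        name          TEXT NOT NULL,
        address       TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id      INTEGER PRIMARY KEY,
        customer_code TEXT NOT NULL REFERENCES customers(customer_code),
        product       TEXT NOT NULL,
        quantity      INTEGER NOT NULL,
        price         REAL NOT NULL
    );
""")
conn.execute("INSERT INTO customers VALUES ('C001', 'A. Smith', '1 High Street')")
conn.executemany(
    "INSERT INTO orders (customer_code, product, quantity, price) VALUES (?, ?, ?, ?)",
    [("C001", "Widget", 10, 2.50), ("C001", "Gadget", 1, 99.00)],
)

# A change of address is made once, in customers, not in every order row.
conn.execute("UPDATE customers SET address = '2 Low Road' WHERE customer_code = 'C001'")

# A join reassembles the flat view without storing the customer details twice.
rows = conn.execute("""
    SELECT o.product, c.name, c.address
    FROM orders AS o JOIN customers AS c ON o.customer_code = c.customer_code
    ORDER BY o.order_id
""").fetchall()
```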
Chen 1976 introduces the entity-relationship (ER) model, and with it the analysis and
design issues which surround the effective use of relational databases.
2.4.2. Disadvantages of the relational approach applied to personal and group information management
Freyberg 1996 reports that the teaching of relational database design and
construction is a major challenge for teachers of introductory Information Systems
courses. One of the authors has long experience of the difficulty of teaching end users
and non-technical students to design and use relational databases. Nevertheless, this
can be taught, and business students and professionals can learn to design
straightforward relational databases and to implement them using products such as
Microsoft Access and OpenOffice.org Base.
Unfortunately, an ER model for even a simple PIM application is complex and runs
to many entity types. There is unlikely to be great value in building such a model when
these simpler requirements are already well met by existing packaged solutions.
If any attempt is made to extend a model into more specific domains, the model can
become very complex indeed. IFLANET 1998 documents an entity-relationship
model for bibliographic records. The major entities are Work, Expression,
Manifestation, Item, Person, Corporate Body, Concept, Object, Event and Place. The
description of the functional requirements for a system to store this kind of data runs
to 136 pages – for what would be a small part of the information storage requirements
of a librarian or an academic or a student.
Faced with this complexity, a typical response is to forgo the advantages of a
user-specific database and instead to acquire a general PIM application (such as Microsoft
Outlook) and a specialist package for each major type of data to be managed (such as
EndNote 2008 or RefWorks 2008 for bibliographic data).
2.5. Bought-in solution: “PIM” (Personal Information Manager)
Various so-called “PIM” (Personal Information Manager) tools have been developed and
marketed with varying degrees of success. We present a list of over 150 such programs and
services in section 3.4.
Effectively using spreadsheets (or even more so, databases) involves a level of planning and
organisation which not every business professional or knowledge worker can do well. As a
consequence, over the years, a plethora of more-or-less business-focussed application programs
have been devised to ease the task of storing and retrieving personal information such as
contacts (addresses), appointments, tasks and the like. These tools are frequently based on an
underlying relational database, whose existence may be visible to the user or hidden from her.
Currently, the most widely used such tool is Microsoft Outlook, which additionally provides
access to the facilities of an email system, notably by means of the user's email inbox. Outlook
(and similar programs) are widely used to manage a user's contacts (individuals and
organisations) and the emails received from them and sent to them. The dominance of
Microsoft Outlook in the marketplace can be explained in part by the fact that Outlook
Express, a free and much simpler email client bearing the Outlook name, has shipped with
Windows on the huge majority of PCs when they are manufactured.
Outlook is typically configured in the enterprise to act as the client or front-end to a server,
frequently but not exclusively Microsoft Exchange Server (see next section). Outlook
integrates particularly effectively with Microsoft's Office software suite (Microsoft Office
2007), which is currently by far the most widely used office suite. Office incorporates many
programs (depending upon its version), including the word processor program Word, the spreadsheet program Excel, the presentation graphics program PowerPoint, and sometimes the
relational database Access.
According to a Gartner Group commercial report (Gartner Group 2007) in the large-enterprise
market for corporate email clients and messaging, Microsoft (Outlook integrated with
Exchange Server) still maintains its lead with a 47.8 percent market share, compared to IBM's
42.3 percent (Lotus Domino with Lotus Notes desktop client). In practice, Microsoft has a much larger lead when the huge numbers of standalone PCs which are not integrated into
corporate systems are taken into account. Here, Outlook has a crushing dominance over Notes.
Many more-focussed commercial PIM packages have been proposed over the years, but none
has been able to impose itself in the market in the face of the simple reality that Office and
Outlook appear on most corporate desktop and laptop computers and an increasing number of
smartphones. However, Outlook is not as such a PIM, and provides limited PIM functionality somewhat grudgingly (the authors' evaluation). Outlook offers good email management facilities,
adequate contact (address) management, and facilitates an arguably lazy but very widespread
approach to time management, that of using the email in-tray as a way of tracking unfinished
tasks (as a ‘to do’ list). See Whittaker et al. 2006.
In the open-source world, the Thunderbird email client and its Lightning calendar extension
provide an effective email capability (but little more at this stage, although they aspire to
become a more complete PIM). Similarly, KDE 2008 has plans to spawn a PIM. Perhaps the
most cogent immediate threat to Microsoft is, however, the combination of Google's various
online applications, such as Google Docs, and in particular its very powerful Gmail service.
2.6. Bought-in solution: “GIM” (Group Information Manager)
Various “GIM” (Group Information Manager) tools have been developed and marketed with
varying degrees of success.
The most established such tool is IBM's Lotus Domino 2008 family of applications. This
incorporates Lotus Notes, which has been widely used to provide email client and document
storage and retrieval facilities which arguably constitute the basis for group information
management. More recently, Microsoft has introduced a raft of related tools which address the
same basic market need. Based on Microsoft Exchange Server (Microsoft Exchange Server
2007) and Microsoft SharePoint, these tools, like IBM's, share the following
characteristics:
An emphasis on structuring data and information so as to encourage its sharing and reuse
Dependence on computing professionals to set up and maintain the shared document store and/or database
2.7. Candidate data management approaches: Outlining and Outliners
An outline is a hierarchical way of displaying related items of text so as to depict their
relationships graphically. Outlining is a technique which may be implemented in general office
programs or in specific computer programs known as “outliners”. An outliner is a special text
editor that allows text to be structured as an outline. Outliners are typically used for computer
programming, collecting or organising ideas, Getting Things Done (a time management approach
espoused by Allen 2001) or project management. Outlining is the technique widely used in
programs such as Microsoft Office PowerPoint, in which the main headings of a presentation
appear as separate slides and on each slide appear points and sub-points. The same technique
is available in a more powerful but perhaps less widely used form in word processing packages
such as Microsoft Office Word, which supports a very useful Outline mode.
An outliner is a program which stores and depicts outlines.
Outliners have a long history as tools on PCs. The best example known to the authors is
NetManage ECCO Pro, which has not been updated by its publisher for over a decade but is
still extensively used and even updated by means of object-code patches (the source code still
being jealously guarded by its publisher). Another well-used program is Micro Logic's Info Select 2007, described a little below. The internal data structure of these programs is similar. A
data item is given meaning by being shown in its owning hierarchy. Thus a person's surname is
a component of a composite Contact object.
Realised in Word and formatted in a particular way, an outline has an appearance similar to:
Figure 2: Outline formatted as a hierarchy of points, sub-points, sub-sub-points.
Here, the top-level owner in the hierarchy shown is item 11, “Semantic Web”. It is the eleventh
point in a document, and is implicitly owned by the document of which it forms a part.
It owns items 11.1, 11.2, 11.3, …
11.3 owns 11.3.1, 11.3.2, …
The owning item for 11.2, 11.3 … is 11.
The relative positioning of an item conveys meaning in that the label of the owner classifies or
otherwise gives contextual information concerning the owned item; and the depth in the
hierarchy gives some idea of the relative importance or significance of the item.
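Because each item's label encodes its position, the owner and depth described above can be read straight off the numbering. The helper functions below are a hypothetical sketch, assuming dotted labels of the kind shown in Figure 2 (a top-level point may carry a trailing full stop, as in “11.”); they are not the internal representation used by Word or by any outliner.

```python
from typing import Optional

def _parts(label: str) -> list:
    # Tolerate the trailing full stop used on top-level points, as in "11."
    return label.rstrip(".").split(".")

def owner(label: str) -> Optional[str]:
    """Owning item of an outline label: '11.3.1' -> '11.3'; a top-level point is owned by the document (None)."""
    parts = _parts(label)
    return ".".join(parts[:-1]) if len(parts) > 1 else None

def depth(label: str) -> int:
    """Depth in the hierarchy: 1 for a point, 2 for a sub-point, 3 for a sub-sub-point."""
    return len(_parts(label))

def children(labels: list, parent: Optional[str]) -> list:
    """Items directly owned by parent (None selects the document's top-level points)."""
    return [label for label in labels if owner(label) == parent]

labels = ["11.", "11.1", "11.2", "11.3", "11.3.1", "11.3.2"]
print(children(labels, "11"), depth("11.3.1"))
```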
Part of the genius and the weakness of these programs is that the user has considerable control
over the structuring of data. Both Ecco 1997 and Info Select 2007 permit the definition of
forms to impose some order on anarchy. A second aspect of their genius is that a data item can
participate in more than one hierarchy. Thus for example an appointment for a meeting can
appear in an overall agenda or calendar, but also be linked to the name of each participant in
the meeting. Effectively, the same datum is classified in more than one way. To the extent that
knowledge is a product of the recognition by intelligent agents of connections between
information otherwise not explicitly linked, this kind of tool can be used as a mechanism for
storing relatively unsophisticated knowledge.
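The ability of one datum to take part in several hierarchies can be sketched by storing each item once and letting folders hold only references to it. The `FolderStore` class and the folder names below are hypothetical; they illustrate the idea rather than reproduce the data model of Ecco or Info Select. Editing the appointment once changes it wherever it is classified.

```python
from collections import defaultdict

class FolderStore:
    """Items are stored once; folders hold item ids, so one item can be classified in several ways."""

    def __init__(self) -> None:
        self.items = {}                  # item id -> text
        self.folders = defaultdict(set)  # folder name -> set of item ids
        self._next_id = 1

    def add(self, text: str, *folder_names: str) -> int:
        item_id = self._next_id
        self._next_id += 1
        self.items[item_id] = text
        for name in folder_names:
            self.folders[name].add(item_id)
        return item_id

    def in_folder(self, folder: str) -> list:
        # .get does not create an empty folder as a side effect.
        return [self.items[i] for i in sorted(self.folders.get(folder, ()))]

store = FolderStore()
meeting = store.add("Project review, 10:00", "Calendar", "Contacts/Alice", "Contacts/Bob")
store.add("Call plumber", "To Do")
store.items[meeting] = "Project review, 10:30"  # one edit, visible in every folder
```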
To give a flavour of this kind of tool, consider this screen capture from Ecco. In Ecco, a grid
can be superimposed on the outline. The column headers of the grid are the names of folders,
that is, named sets of data values.
Figure 3: Ecco screenshot
This screenshot shows a user‘s diary or calendar, and the associated phone-book item for the
current appointment. At the left-hand side of the screen capture is the folder hierarchy.
This program, and others like it, combine very powerful data structuring with relatively
easy-to-use (and easy-to-understand) basic PIM “functionality” in terms of diary, contact
management and the like.
2.8. Candidate data management approaches: Mind maps
Buzan 1996 has highlighted mind maps as a means of diagrammatically representing ideas and
the connections between ideas.
Wikipedia - Mind map (2008) reports that:
“A mind map is a diagram used to represent words, ideas, tasks, or other items linked
to and arranged radially around a central key word or idea. It is used to generate,
visualize, structure, and classify ideas, and as an aid in study, organization, problem
solving, decision making, and writing.

It is an image-centred diagram that represents semantic or other connections between
portions of information. By presenting these connections in a radial, non-linear
graphical manner, it encourages a brainstorming approach to any given organizational
task, eliminating the hurdle of initially establishing an intrinsically appropriate or
relevant conceptual framework to work within.

A mind map is similar to a semantic network or cognitive map but there are no formal
restrictions on the kinds of links used.”
Mind maps can be created using software. See for example Visimap 2008, produced by CoCo
systems, which the company describes in these terms:
“The innovative VisiMap Professional 4.1 is a unique creativity- and
productivity-enhancing application for Microsoft Windows® that saves you valuable
time in your day-to-day work and offers you new flexibility in exploring and
organising your thoughts.

It graphically records, structures and clarifies the results of your creativity so that they
can be used, reused and communicated effectively.

Based on the usefulness and simplicity of graphical 'visual maps' (similar to what are
variously called idea maps or brain maps), VisiMap Professional adds efficient data
entry, automatic layout, striking presentation, powerful map structuring, manipulation,
and printing features, and sophisticated document import and export facilities to create
an invaluable asset that produces visual solutions to all kinds of business and personal
applications.”
In VisiMap, an outline can be presented both diagrammatically as a mind map and also as a
text outline in Microsoft Word format.
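That one structure can be presented both as a mind map and as a text outline can be sketched as two renderings of a single tree. The illustration below assumes nothing about VisiMap's actual file format: the hypothetical `as_outline` produces the indented lines of a word-processor outline, while `as_mind_map_edges` produces the parent-child links that a drawing tool would lay out radially around the central idea.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)

def as_outline(node: Node, depth: int = 0) -> list:
    """Linear rendering: one indented line per node, as in a word-processor outline."""
    lines = ["    " * depth + node.text]
    for child in node.children:
        lines.extend(as_outline(child, depth + 1))
    return lines

def as_mind_map_edges(node: Node) -> list:
    """Radial rendering: the (parent, child) links to be drawn around the central idea."""
    edges = []
    for child in node.children:
        edges.append((node.text, child.text))
        edges.extend(as_mind_map_edges(child))
    return edges

root = Node("PIM", [Node("Agenda"), Node("Contacts", [Node("Family"), Node("Work")]), Node("To Do")])
print("\n".join(as_outline(root)))
```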
Mind map software such as VisiMap is frequently used in personal information management
applications.
The screenshot below, taken from early material on a planned enhancement to the SQLNotes
2008 PIM, gives the flavour of how such information looks when presented as a mind map
(Buzan 1996):
Figure 4: Outline formatted as a mind map
2.9. Summary: making and buying personal and group information management
“Making” one's own personal information management system can be achieved using
spreadsheets. This should normally be reserved for one-off ad hoc reporting and the prototyping
of requirements for what will subsequently be re-engineered into, or acquired ready made as,
formal applications to support specific management processes. It is also possible for business
students and professionals to design straightforward relational databases and to implement them
using products such as Microsoft Access and OpenOffice.org Base; this can be done for small
subsets of personal information which are specialised or very important to the user.
Buying (or otherwise acquiring) a specialist PIM or GIM tool is arguably a much more
sensible way of managing personal data than devising complex spreadsheets or devising
comprehensive databases.
But only a small proportion of knowledge workers buy PIMs, and even less of them persist in
using them. Why?
3. PIM Functionality: What PIMs do
This section firstly summarises the meaning of data, before proceeding to list PIMs identified by the authors and beginning to identify and classify their associated functionality, that is, what users can do with them.
3.1. The meaning of data: semantics
Making lists and storing them is not rocket science. In fact, it isn’t even science. A list is only as useful as the meaning it conveys. Consider this list of (what most of us will read as) girls’ names:
Andrea 2007
Chantal 2007
Gabrielle 2007
What is this? Three members of a hockey team?
The addition of a column heading changes the story a little:
Hurricane name Year used
Andrea 2007
Chantal 2007
Gabrielle 2007
What we have done is to classify the data, by naming the sets. The process of labelling or naming data gives so-called semantic significance to the data. To be meaningful, data needs
syntax (rules for content and formatting) and semantics (rules for meaning). An alternative and
equivalent formulation is that data needs metadata to give it significance. Classification is
fundamental to science and to knowledge.
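A minimal Python sketch of the idea that metadata gives data its meaning: the bare rows of the hurricane list acquire semantics from column names and syntax from a validation rule attached to each column. The schema, the column names and the rules are illustrative assumptions, not features of any particular PIM.

```python
# The bare rows: without metadata, nothing says what "2007" means.
raw_rows = [("Andrea", "2007"), ("Chantal", "2007"), ("Gabrielle", "2007")]

# Hypothetical schema: each column name carries the semantics,
# and each rule expresses the syntax a valid value must follow.
SCHEMA = {
    "hurricane_name": lambda v: v.isalpha() and v[0].isupper(),
    "year_used": lambda v: len(v) == 4 and v.isdigit(),
}

def apply_metadata(rows, schema):
    """Label each value with its column name and check its syntax rule."""
    names = list(schema)
    records = []
    for row in rows:
        if len(row) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(row)}")
        record = dict(zip(names, row))
        for name, value in record.items():
            if not schema[name](value):
                raise ValueError(f"{name}={value!r} breaks its syntax rule")
        records.append(record)
    return records

records = apply_metadata(raw_rows, SCHEMA)
print(records[1])  # {'hurricane_name': 'Chantal', 'year_used': '2007'}
```

A row such as `("Andrea", "07")` would be rejected, because it breaks the syntax rule for `year_used`.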
3.2. Structure and meaning
3.2.1. To make use of any computer-based personal information management tools, we have to “structure” our data
Computer users voluntarily sacrifice freedom in favour of structure in order to facilitate storage, retrieval, and especially more precise querying (answering ad hoc questions about the data) and communication; but they still do not achieve the level of communication that they strive for.
In order to use computers we have traditionally needed to sacrifice, or at least limit, the expressiveness of the information stored, where expressiveness is defined as the ability to communicate meaning.
Well-structured data can be queried with greater precision; that is, more accurate and
complete answers can be obtained to questions about the data.
To illustrate this point, we extend the example above:
Hurricane name Year used Meteorologist
Andrea 2007 John Smith
Chantal 2007 Methuselah Gabrielle
Gabrielle 2007 Chantal Legros
With data structured in this way, we can achieve precise answers to different queries:
Which hurricanes have been named “Chantal”?
Answer: one - Chantal
Which hurricanes have been named by a meteorologist called “Chantal”?
Answer: one - Gabrielle
Which hurricanes have been named after the meteorologist who named them?
Answer: none
Note that free-text searching of the content alone, without taking into account the
structure of the data, would give imprecise (inaccurate) answers.
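The three queries above can be sketched in Python to show why structure matters. The rows reproduce the extended hurricane table; the function names are hypothetical, and we assume that “a meteorologist called Chantal” means one whose first name is Chantal. The last function is a structure-blind free-text search, included for comparison.

```python
# The extended hurricane table: one dict per row, keyed by column heading.
rows = [
    {"hurricane": "Andrea", "year": 2007, "meteorologist": "John Smith"},
    {"hurricane": "Chantal", "year": 2007, "meteorologist": "Methuselah Gabrielle"},
    {"hurricane": "Gabrielle", "year": 2007, "meteorologist": "Chantal Legros"},
]

def named(name):
    """Structured query: hurricanes whose own name is `name`."""
    return [r["hurricane"] for r in rows if r["hurricane"] == name]

def named_by(first_name):
    """Structured query: hurricanes named by a meteorologist with this first name."""
    return [r["hurricane"] for r in rows
            if r["meteorologist"].split()[0] == first_name]

def named_after_own_meteorologist():
    """Structured query: hurricanes whose name is part of their meteorologist's name."""
    return [r["hurricane"] for r in rows
            if r["hurricane"] in r["meteorologist"].split()]

def free_text(term):
    """Structure-blind search: any row mentioning `term` in any column."""
    return [r["hurricane"] for r in rows
            if any(term in str(v) for v in r.values())]

print(named("Chantal"))                 # ['Chantal']
print(named_by("Chantal"))              # ['Gabrielle']
print(named_after_own_meteorologist())  # []
print(free_text("Chantal"))             # ['Chantal', 'Gabrielle']
```

The free-text search for “Chantal” returns both Chantal and Gabrielle, because it cannot tell the hurricane-name column from the meteorologist column; the structured queries give the precise answers listed above.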
But how do we express meaning? After 50 years of “advances” in Computer Information Science we still do not know how to do this particularly well, and the situation is arguably worst at a most crucial point for productivity: the work of the individual knowledge worker, who is provided only with basic tools whose integration remains unintuitive. Indeed, each tool tends to highlight one or two information storage and presentation techniques to the exclusion of others. She then resorts to approaches such as managing tasks by leaving emails in the inbox and keeping lists in linked spreadsheets. This creates isolated islands of under-managed and difficult-to-integrate data.
3.2.2. Structure imposed centrally is essential in some contexts and inimical in others
We sacrifice freedom in favour of structure in order to facilitate storage, retrieval, and especially querying (answering ad hoc questions about the data) and some aspects of communication; but we still do not necessarily achieve the level or effectiveness of communication that we strive for.
Some data is very clearly the property of a worker’s employing enterprise, and some needs, to a greater or lesser degree, to be held and managed centrally. Standards vary widely according to the objectives and style of the organisation. A worker in a client call centre may not be permitted to store any data locally on a company-owned computer. Conversely, universities may actively encourage information sharing. More common, perhaps, is controlled sharing of information, as in medical practice or business consulting. Many organisations seek to impose a standard way of capturing and storing data, which meets some purposes but defeats others.
3.3. Data storage techniques and their associated metadata – first list
If we revisit some of the techniques used for storing personal information, we see that somewhat different linguistic rules, and resulting levels of expressiveness, are associated with each. Table 1 gives some examples:
Technique | Metadata | Expressiveness and precision
Spreadsheets | Pragmatic: the meaning of the data is not explicit, but is partially expressed in column and/or row headings, and partially in relationships between cells. | Potentially very expressive, and frequently imprecise or even contradictory. Charting permits visually arresting representations of some of the underlying data.
Relational databases | If the data is normalised (Codd 1970; Date 2003), then the column headings name sets of atomic (non-divisible) data items. This is deliberately constricting; in return, human-readable metadata, in the form of a natural-language description (name) for each attribute, can be exploited by users as they query the data, enabling precise answers to their questions. These names can be extended by a data dictionary (which, however, is often not accessible to the end-user of the data in the database). | Deliberately very restricted expressiveness. All data is constrained to appear as tables, to permit generality and precision of subsequent querying. The results of queries are themselves virtual tables constructed from the original input data.
Outlining and outliners | The relative positioning of the items in a hierarchy groups and classifies data, and associates meaning with each group and sub-group. The addition of a grid permits further structuring. | Hierarchies themselves are cognitively powerful or not, depending on the prior training of the user. The addition of a grid adds expressiveness.
Mind maps | The relative positioning of the items in a diagram groups and classifies data, and associates meaning with each branch and sub-branch. An image is (potentially) associated with each branch or sub-branch. | Visually very powerful: the user perceives both structure and meaning. Querying is very imprecise or non-existent.
Table 1: Data storage techniques and their associated metadata – first list
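The relational database row of Table 1 can be illustrated with Python’s built-in `sqlite3` module. This is a sketch under our own assumptions (the table and view names are hypothetical): the column names act as human-readable metadata, `PRAGMA table_info` plays the role of a rudimentary data dictionary, and a view shows that a query result is itself a virtual table derived from the stored data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Column names are the human-readable metadata; NOT NULL is a simple syntax rule.
conn.execute(
    "CREATE TABLE hurricane ("
    " name TEXT PRIMARY KEY,"
    " year_used INTEGER NOT NULL,"
    " meteorologist TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO hurricane VALUES (?, ?, ?)",
    [("Andrea", 2007, "John Smith"),
     ("Chantal", 2007, "Methuselah Gabrielle"),
     ("Gabrielle", 2007, "Chantal Legros")],
)

# A rudimentary data dictionary: (column name, declared type) for each attribute.
dictionary = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(hurricane)")]

# A view is a stored query; selecting from it yields a virtual table.
conn.execute(
    "CREATE VIEW named_by_chantal AS "
    "SELECT name, year_used FROM hurricane "
    "WHERE meteorologist LIKE 'Chantal %'"
)
cur = conn.execute("SELECT * FROM named_by_chantal")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()

print(dictionary)  # [('name', 'TEXT'), ('year_used', 'INTEGER'), ('meteorologist', 'TEXT')]
print(columns)     # ['name', 'year_used']
print(rows)        # [('Gabrielle', 2007)]
```

The view stores no data of its own: its rows are recomputed from the `hurricane` table each time it is queried, yet it can be queried exactly like a table.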
We suggest that the relationship between generality and meaning (or focus) is a trade-off. We can express it mathematically as G × M = constant, in the same way as Boyle’s law states that, for a fixed amount of gas at constant temperature, P × V = constant. If we graph this speculation, we get:
[Chart: Trade-off, Focus versus Generality; horizontal axis: Generality (expressiveness); vertical axis: Focus (precision of querying)]
Figure 5: Posited relationship between generality and meaning (or focus)
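To make the posited trade-off concrete, the following Python sketch tabulates the focus M implied by G × M = k for generality values 1 to 7. The constant k = 1 is an assumption, chosen only so that the values fall within the ranges plotted in Figure 5; the relationship itself remains a speculation to be tested.

```python
def focus(generality: float, k: float = 1.0) -> float:
    """Focus M implied by the posited trade-off G * M = k.

    k = 1.0 is an assumed constant, not an empirical value.
    """
    if generality <= 0:
        raise ValueError("generality must be positive")
    return k / generality

for g in range(1, 8):
    print(g, round(focus(g), 3))
```

Under this assumption, doubling generality halves focus: G = 2 gives M = 0.5, and G = 4 gives M = 0.25.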
There may also be user indifference curves of a similar shape relating features and usability.
These speculations will be developed as hypotheses in subsequent research.
3.4. Some PIM packages
The table which follows lists the PIM (and GIM) packages which we have so far been able to identify. The sources include Wikipedia - Personal information management 2008, Keeping Found Things Found 2008 and our own developing research.
Table 2: An initial list of packages which provide (or can be used to provide) PIM functionality
Product Publisher URL Licence type Platform(s) First appeared Most recent version
Internal data storage and external presentation approach
Notes
24SevenOffice
24SevenOffice http://www.24sevenoffice.com/webpage/en/
proprietary software
Web application
2008 ERP/CRM - contains collaboration module
Above & Beyond 2000 PRO
1soft http://www.1soft.com
proprietary software
Windows 2008 PIM/GIM
ACT! Sage http://www.act.com
Windows, PalmOS, Windows Mobile
2008 Contact management
ActionOutline Green Parrots Software
http://www.actionoutline.com/
Windows 2.1 Contact management
Aethera TheKompany.com
http://www.thekompany.com/projects/aethera/
FOSS-GPL Linux, Mac OS-X, Windows
2001 2005 PIM, PDR, Messaging and Groupware
Agenda, Lotus
S. Jerrold Kaplan, Mitchell D. Kapor, Edward J. Belove, Richard A. Landsman, and Todd R. Drake. Lotus.
http://en.wikipedia.org/wiki/Lotus_Agenda
proprietary software
DOS 1992 PIM. Still in use - very powerful data handling. See also Chandler, which in some ways is a successor to Agenda.
AIM 96 Accu Knowledge,
http://www.akinet.com/aimf
proprietary software
Windows 1996 PIM
Inc. f.htm
Ajour Calendar / PIM
Micro-Sys ApS, DK
http://www.micro-sys.dk/products/ajour/
FOSS 2004 Personal information manager - manage appointments and events. Ajour is an easy-to-use personal information manager (PIM). Use it as a combined calendar, diary, organizer, and reminder. Keep track of dates, appointments, annual events like birthdays, to-do items, and notes. You can also dial phone numbers stored in your data. Calendar and organizer - dates and to-do items reminder .
All-in-1 Personal Organizer
Bruno Cancellieri
http://www.cancellieri.org/pmo_index.htm
Shareware Windows 2005 Data stored in Microsoft Access relational database
All-in-1 Personal Organizer (APO) is a personal information manager (PIM) with three main uses. First, it's a tool for managing any kind of personal information such as tasks, events, contacts, notes, file links, Web links and executable key scripts. Second, it's an image viewer. Finally, it can be used as a mind stimulator useful for reflection, self-analysis and self-improvement.
AZZ Cardfile
AZZ Cardfile team: Rytis Zumbakis, Antanas Zdramys
http://www.azzcardfile.com/
Shareware Windows 2007 Data stored in XML.
AZZ Cardfile is a Windows program that helps manage any personal information like addresses, phone numbers, references, notes, recipes. It can serve as personal organizer, contact manager, address book, rolodex, personal information manager (PIM) or small database software. Replaces Microsoft Cardfile. Modern customizable user interface, ease of use and extensive features makes this information management software equally suitable for business office or home use.
Backflip Backflip, Inc http://www.backflip.com/login.ihtml
Web application
Backflip gets you back to the good stuff. It's the easiest way to save and share important things you see on the Web. With Backflip's organization and powerful search, you'll never lose anything interesting again. You can use it from any computer. And it's totally free. How does it work? As you discover interesting Web pages, use the Backflip it! button to save them and Backflip will organize them for you. Then, simply go to your Backflip account and you'll find all of your favourite pages filed in your personal directory -- which you can access from any computer.
Backpack 37signals http://www.backpackit.com/?source=37s+home
Web application
Intranet, group calendar, organizer Share info, schedules, documents, and to-dos across your company, group, or organization.
Basecamp 37signals http://www.basecamphq.com/?source=37s+home
Web application
Project management and collaboration Collaborate with your team and clients. Schedules, tasks, files, messages, and more.
Bifrost Inbox Organizer
Olle Bälter, Candace L Sidner
http://delivery.acm.org/10.1145/580000/572034/p111-balter.pdf?key1=572034&key2=9273163801&coll=Portal&dl=GUIDE&CFID=21022367&CFTOKEN=33191323
Research prototype
Inbox organiser
BitPim http://www.bitpim.org/
FOSS for CDMA phones: Linux, Mac OS-X, Windows
BitPim is a program that allows you to view and manipulate data on many CDMA phones from LG, Samsung, Sanyo and other manufacturers. This includes the PhoneBook, Calendar, WallPapers, RingTones (functionality varies by phone) and the Filesystem for most Qualcomm CDMA chipset based phones.
Blackberry Research In Motion Limited
http://www.blackberry.net/index.shtml
Smartphone
Campfire 37signals http://www.campfirenow.com/?source=37s+home
Web application
Real-time group chat for business It's like instant messaging, but optimized for groups. Especially great for remote teams.
Chandler OSAF http://chandlerproject.org/
FOSS Linux, Mac OS-X, Windows XP clients and web application
2008 Collaborative information management; an open source Note-to-Self Organizer. It features calendaring, task and note management and consists of a desktop application, web application and a free sharing and back-up service called Chandler Hub.
Citadel FOSS-GPL groupware/BBS for all POSIX-based operating systems
Bulletin board system
Contact Plus Personal 2.7 c
Contact Plus Corporation
http://www.contactplus.com/products/personal/permain.htm
Contactizer proprietary software
Mac OS Calendar and contact management within groups. Previously known as "OD4Contact"
ContactMap
Bonnie A. Nard, Steve Whittaker, Ellen Isaacs, Mike Creech, Jeff Johnson, and John Hainsworth.
http://www.izix.com/pro/lightweight/contactmap.php
C-Organizer Pro 2.4
CSoftLab
http://www.csoftlab.com/C-OrganizerPro.html
CyberDesk Andrew Wood, Anind Dey, and Gregory D. Abowd.
http://www.cc.gatech.edu/fce/cyberdesk/
Daily vX Professional
DEVONtechnologies
http://www.devon-technologies.com/products/devonthink/index.html
proprietary software
Mac OS journal and note taking software
Data Mountain
George Robertson, Mary Czerwinski, Kevin Larson, Daniel C. Robbins, David Thiel, and Maarten van Dantzich
http://research.microsoft.com/~ggr/
DayPoint Professional
Front Office Communications, Inc
http://www.daypoint.com/Products/DayPointProf.asp
DevonThink Professional
DEVONtechnologies
http://www.devon-technologies.com/products/devonthink/index.html
proprietary software
Mac OS
do-Organizer GemX proprietary software
Microsoft Windows
Contacts, appointments, to-dos, mind mapping, bookmarks
Duck Software: Organizer Software
Technological Solutions, Inc. (TSI): Duck Software
http://www.ducksoftware.com/
Dynomite Lynn D. Wilcox, Bill N. Schilit, and Nitin “Nick” Sawhney.
http://seattleweb.intel-research.net/people/schilit/ldw.pdf
Ecco Netmanage
http://users.rcn.com/wussery/
proprietary software now free to download
Microsoft Windows
Hierarchic outline with assignment to multiple folders, one parent per folder. Information is presented in a single pane with a folder grid.
Intranet, group calendar, organizer. Share info, schedules, documents, and to-dos across your company, group, or organization.
Email Reminders Pro for Outlook
Sperry Software
http://www.sperrysoftware.com/Outlook-EmailReminders-Pro.asp
EndNote ISI ResearchSoft
http://www.endnote.com/enhome.asp
Enfish Enfish/Louise Wannier
http://www.enfish.com/
Entourage, Microsoft
proprietary software
Mac OS
Essential PIM Pro
available as proprietary software or free software
Microsoft Windows
Eudora QUALCOMM Incorporated
http://www.eudora.com/
EverNote proprietary software
Microsoft Windows
Evolution Ximian Inc
http://www.ximian.com/products/evolution/features.html#pim
Evolution, Novell
Novell FOSS-GPL Linux/Unix/GNOME
FeedDemon RSS Reader for Windows
Nick Bradbury, Bradbury Software, LLC
http://www.bradsoft.com/feeddemon/index.asp
FileMaker proprietary software
Microsoft Windows
Formation RadicalBreeze Software
proprietary software
Mac OS Idea and personal information organizer
Fusionpoint Stick-e-NotePad R.I.M.
Fusionpoint Technologies Corp.
http://www.fusionpointtech.com/
GetOrganized99
Web application
GNOME PIM
GNOME Foundation
http://www.gnome.org/gnome-office/gnome-pim.shtml
Gnowsis Knowledge Management Lab of the DFKI
http://www.gnowsis.org/
Research prototype
2006 RDF; Semantic web
GoalPro Success Studios Corporation
http://www.goalpro.com/
GoBinder proprietary software
Microsoft Windows
Golden Retriever
N-Liter Enterprise
http://www.n-liter.com/
GoldenSectionNotes
The Golden Section Labs, Pacific Business Centre
http://www.tgslabs.com/eng/gsnotes/
GoldMine proprietary software
Microsoft Windows
Google Calendar
Web application
Google Notebook
Web application
GrandView Symantec (John Friend)
proprietary software
Microsoft Windows
Historically-important outliner program
Haystack Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology
http://groups.csail.mit.edu/haystack/
FOSS-MIT Licence
all operating systems with POSIX and Java
RDF; Semantic web
Highrise 37signals http://www.highrisehq.com/?source=37s+home
Web application
Online contact manager and simple CRM Keep track of who your business talks to, what was said, and what to do next.
HTP To-do List
proprietary software
Microsoft Windows
Hula FOSS
iCal proprietary software
Mac OS
Idea Graph Danny Ayers
http://www.ideagraph.net/
Ideaspace proprietary software
Microsoft Windows; Mac OS
ikeepbookmarks.com
Software Designs Development Corp
http://www.ikeepbookmarks.com/
Info Select Micro Logic http://www.miclog.com/is/isdesc.htm
proprietary software
Microsoft Windows
InfoRecall http://www.phantech.com/
proprietary software
Microsoft Windows; Mac OS
Outline type tree structure so you can easily categorize your information.
Inspiration Micro Logic http://www.inspiration.com/index.cfm
proprietary software
Microsoft Windows
Inspiration is a powerful, easy-to-use tool that allows professionals to visually organize and communicate complex topics. Visual diagrams clarify patterns, interrelationships and interdependencies. They also stimulate creative thinking.
Internet Organizer Deluxe
PrimaSoft PC, Inc.
http://www.primasoft.com/deluxeprg/inodx.htm
Ishmail Jonathan Helfman, Charles Isbell, Brian Amento and Gavin Bell.
http://ishmail.sourceforge.net/
JetTask proprietary software
Microsoft Windows
Jot+ Notes King Stairs Software
http://kingstairs.com/
Shareware Microsoft Windows
1993 3.4.3 (31 May 2007); 3.6.0 beta 25/06/2008
Hierarchical note manager, outliner and cardfile
KDE (K Desktop Environment) Office
Richard Moore, Ben Hummon
http://www.kde.org/
Linux/KDE
Keep It Together
proprietary software
Mac OS
Keynote Marek Jedlinski
http://www.tranglos.com/free/keynote_main.html
Kontact FOSS Linux/KDE
LeaderCode Personal Information Manager
proprietary software
Microsoft Windows
Lifestreams / Scopeware Vision
Eric Freeman, David Gelernter and Scott Fertig
http://www.scopeware.com
LinkaGoGo linkaGoGo, DBA
http://www.linkagogo.com/
Livelink OpenLink Software Inc.
http://www.opentext.com/livelink
Lookout Lookout Software (Eric Hahn and Mike Belshe)
http://www.lookoutsoft.com/
Maple Crystal Office Systems
http://crystaloffice.com/maple/
proprietary software
Microsoft Windows
Two-pane hierarchical (tree) organiser
MDE Info Handler
MDE Software (Dr. Manfred Derenbach)
http://www.mdesoft.com/eng.htm
Meeting Maker
proprietary software
Microsoft Windows, Mac OS, Solaris, and Linux
MindManager Mindjet http://www.mindjet.com/Default.aspx
proprietary software
Mind map MindJet Connect introduces group collaboration facilities.
More Symantec proprietary software
Mac June 1986 Outliner A historically-important outliner program for the Mac
Mozilla Calendar Project
FOSS-MPL Linux, Windows
My Personal Diary
CAM Development.
http://www.camdevelopment.com/pim/my_personal_diary/default.htm
MyInfo proprietary software
Microsoft Windows
MyLibrary RENCorp
http://www.cribbagepegs.com/myuniqueprograms.html
MyLifeBits Microsoft Bay Area Research Center, Media Presence Group
http://research.microsoft.com/research/barc/MediaPresence/MyLifeBits.aspx
MyLifeOrganized
proprietary software
Microsoft Windows
myNotes proprietary software
Mac OS
MyYahoo Yahoo! Inc http://my.yahoo.com/
Net Snippets
Net Snippets Ltd.
http://www.netsnippets.com/
Newdocs Manuel Arriaga
http://m-arriaga.net/software/newdocms/
Notes, Lotus http://www-142.ibm.com/software/sw-lotus/lotus/general.nsf/wdocs/lotusprods
proprietary software
Microsoft Windows
Allows all the major information organization techniques to be used in one information space: outlines, graphics, hypertext links, relational databases, free (rich) text, expanding/collapsing reports, collapsing rich text sections, tabbed notebooks (like wizards) and tables. No specific PIM functionality but has been used as the basis for effective GIM in many business organisations.
Now Up-to-Date & Contact
proprietary software
Mac OS, Windows
Office Accelerator
Baseline Data Systems (BDS)
http://baselineconnect.com/product.html
Omea proprietary software
Microsoft Windows
OneNote, Microsoft Office
Microsoft http://www.microsoft.com/office/onenote/prodinfo/overview.mspx
proprietary software
Microsoft Windows
Notable for the multiple ways in which information can be presented: an excellent note-taking environment. Poorer in terms of inter-item linking – integration is left to the mind of the user.
Online FileCabinet
Mike Giles http://www.furl.net
Organizer Deluxe Series
PrimaSoft PC, Inc.
http://www.primasoft.com/shware.htm
Organizer, Lotus
proprietary software
Microsoft Windows
Outlook, Microsoft Office
Microsoft Corp.
http://www.microsoft.com/
Outlook, SharePoint, InfoPath, Groove : Microsoft Office
Microsoft http://office.microsoft.com/fr-fr/products/FX100487411036.aspx?pid=CL100571081036
proprietary software
Microsoft Windows
Palm Palm Inc. http://www.palm.com/home.html
proprietary software
Mac OS, Windows
Paper Tiger Harold Tyler
http://www.taylorontime.com/ptigersw.html
PDO (Personal Document Organizer)
Insoft Technologies Inc
http://www.insoft-tech.com/personal%20document%20organizer.htm
Pegasus Mail
David Harris http://www.pmail.com/index.htm
Pepys and Video Diary
Michael G. Lamming, M. A. Eldridge, M. Flynn, and William M. Newman
http://www.xrce.xerox.com/programs/mds/past-projects/video-diary.html
PersonalBrain
proprietary software
Microsoft Windows, Mac OS, Linux
PlanPlus FranklinCovey
http://www.franklincovey.com
Plaxo Web application
Powerbookmarks
Li, W-S., Q. Vu, E. Chang, D. Agrawal, Y. Hara, and H. Takano
http://www.teamxweb.com/doc/relatedWork.shtml#PowerBookmarks
Powermarks
Kaylon Technologies
http://www.kaylon.com/power.html
Presto Paul Dourish, W. Keith Edwards, Anthony Lamarca, and Michael Salisbury
http://www2.parc.com/csl/projects/placeless/papers/tochi-presto.pdf
Prophet 2004: Contact manager
Avidian Technologies
http://www.avidian.com/avidian_product.aspx?n=2
Proteus Thomas Erickson
http://www.outliners.com/discuss/msgReader$648?mode=day
Queries-R-Links (QRL)
Nipon Charoenkitkarn, Jim Tam, Mark H. Chignell, and Gene Golovchinsky.
http://www.cs.brown.edu/memex/projects.html
Quick2Do 1.0.2
CodeGrid Software.
http://codegrid.tripod.com/
Reader's Helper,The
Jamey Graham, RICOH Research Center, Palo Alto
http://rii.ricoh.com/~jamey
Remember The Milk
Web application
Scopeware Vision
Scopeware, Inc.
http://www.scopeware.com
Secure Notes Organizer
SecureAction Research, LLC.
http://www.secureaction.com/notes/
Semantic Blogging for Bibliography Management
HP Lab http://www.hpl.hp.com/semweb/biblio.htm
Simple Diary
Aaron Whiffin http://www.webbedfeetuk.com/diary/
SixDegree Creo Inc.
http://www2.creo.com/sixdegrees/
Softwrights Reminder ™
Softwrights, Inc.
http://www.softwrights.com/rmenu.htm
SQLNotes proprietary software
Microsoft Windows
Hierarchic outline with assignment to multiple folders, multiple parents per folder. Information is presented in a dual pane with a folder grid
Not yet released. The author has a beta-test version of this tool.
Stickies Tom Revell http://www.btinternet.com/~tom.revell
Student Online
Student Online Inc.
http://www.studentonline.com/
Stuff I've Seen
Susan Dumais http://research.microsoft.com/~sdumais/
SurfSaver askSam Systems
http://www.surfsaver.com/
Sync4jMozilla FOSS
Taskmaster Bellotti, Ducheneaut, Howard, Smith
http://peach.mie.utoronto.ca/people/jacek/emailresearch/CSCW2002/submissions/PARC-Taskmaster%20position%20paper.pdf
TaskView Gwizdka
http://www.cas.ibm.com/archives/2002/papers/cascon02/htm/francais/abs/gwizdka.htm
Tempus Fugit
Daniel A. Ford, Joann Ruvolo, Stefan Edlund, Jussi Myllymaki, James Kaufman, Jared Jackson, and Martin Gerlach.
http://www.research.ibm.com/people/j/jussi/papers/TF/TF-CIKM2001.pdf
THE (The Human Environment)
Jef Raskin
http://humane.sourceforge.net/the/index.html
The Brain TheBrain Technologies Corporation
http://www.thebrain.com/Default.htm
Time & Chaos
Chaos Software (formerly iSBiSTER International, Inc.)
http://www.isbister.com/chaos32.html
TimeStore Byron Long, Kelvin S. Yiu, Ronald Baecker, and Nancy Silver
http://www.dgp.toronto.edu/people/byron/papers/timestore.html
Tinderbox Eastgate Systems
http://www.treepad.com/
Proprietary software
Mac XML Tinderbox is a personal content assistant that helps you visualize, analyze, and share your notes.
TreePad! Freebyte http://www.treepad.com/
Proprietary software
Tree Structured data-management
Twine Radar Networks http://www.twine.com/
Web application
Automatically organizes information by learning about user interests and making connections and recommendations
A commercial service, in beta test as of 2008, which is claimed to be the first commercially oriented implementation of semantic web techniques.
Ultra Recall Kinook proprietary software
Microsoft Windows
Link items in multiple locations, create internal links between items, and link to other web pages and files
Umea (User-Monitoring Environment for Activities)
Victor Kaptelinin
http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/k/Kaptelinin:Victor.html
VIP Organizer
proprietary software
Microsoft Windows
Visimap CoCo proprietary software
Microsoft Windows
Mind map
Vombato Organizer
proprietary software
Microsoft Windows
VKB (Visual Knowledge Builder)
Dr. Frank Shipman, Dr. Haowei Hsieh
http://www.csdl.tamu.edu/VKB/
Wayback Machine
The Internet Archive
http://www.archive.org/web/web.php
WebWatcher
T. Joachims, D. Freitag, T. Mitchell
http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-6/web-agent/www/project-home.html
Windows Calendar
proprietary software
Microsoft Windows
WinOrganizer 2.4
The Golden Section Labs, Pacific Business Centre
http://www.tgslabs.com/eng/winorganizer/
Wintermute David Jeske and Scott Hassan
http://neomason.com/wm.cst
WorkplaceMirror
Richard Boardman, Robert Spence, M. Angela Sasse
http://www.iis.ee.ic.ac.uk/~rick/research/pubs/struggle-hcii2003.pdf
Wrike Web application
XLibris Bill N. Schilit, Gene Golovchinsky, and Morgan N. Price
http://www.fxpal.com/?id=xlibris
Yahoo! Calendar
Web application
Yawas Laurent Denoue and Laurence Vignollet.
http://www.fxpal.com/people/denoue/yawas/
YellowPen YellowPen, Inc. - Steve Brown, John Leibovitz, and Steve Robinsion
http://www.yellowpen.com/ypsite3/product/ypoverview.htm
Yojimbo proprietary software
Mac OS
Zimbra FOSS Web application
Zoot Zoot Software http://www.zootsoftware.com/index.html
proprietary software
Microsoft Windows
2008 (version 5.1)
Zoot offers a highly efficient process for collecting, classifying and prioritizing information so that it can be viewed in meaningful timeframes and contexts.
3.5. The functionality associated with PIMs (extract)
The table which follows is an extract from a table compiled by the University of Washington PIM research group (see for example Keeping Found Things Found 2008). Unfortunately the table has not been updated recently. Our subsequent research will extend and consolidate this classification as necessary.
Table 3: The functionality associated with PIMs – extract from our research spreadsheet
Product | Annotations and note-taking | Contact management | Document manager | Email-centred | General PIM | Hypertext authoring tool | Life planner | Meeting planner | Mobile/PDA devices | Organization of handwritten notes | Paper organizer | Project-centred | Reading and Summarization | Record everything | Search across email, e-docs and other information forms | Semi-structured organization of small pieces of information (phone numbers, errands to run, books to read...) | Task management | Virtual 3-d Visualization of information relationships | Web organizer
Above & Beyond 2000 PRO y
ACT! y
Agenda, Lotus y
AIM 96 y
All-in-1 Personal Organizer y
AmikaFreedom y
Aquanet y
AZZ Cardfile y
Backflip
Chandler y
Haystack y
Info Select y
(many programs omitted here)
Zoot y
This list is based on that maintained by the University of Washington and found at http://pim.ischool.washington.edu/tools.htm
3.6. An initial classification of personal information and functionalities
Before any further classification can be attempted, it is necessary to revisit and extend the list of functionalities now offered and proposed.
3.6.1. Data: Some useful personal information
After studying several extant PIMs, we suggest the following basic classification of
the information they store:
General information
Lists
(a) Ad hoc
Such as shopping lists.
(b) Repeating
E.g. Inventories, Christmas card lists, etc.
Semi-structured organization of small pieces of information (phone numbers, errands to run, books to read...)
Own information
Examples of such information include:
(a) Passwords
(b) Passport
(c) Health
(d) Social security
Contact management, address books, etc.: details such as
Names
Affiliations, e.g. companies, households
Contact mechanisms
Addresses
Personal details
Qualifications
Competences
Calendar
Appointments and meetings
Significant calendar dates:
(a) Birthdays
(b) Anniversaries
Events, alerts, reminders
Meeting planner / scheduler
(a) Within hierarchies (permanent teams within organisations) and “projects”, that is, people who come together in ad hoc groups in order to complete tasks small and large
Diary / journal
Document management
Organization of handwritten notes
Paper organizer
Reading and Summarization
Message management
Email and instant message archives
Fax communications and voicemail
RSS/Atom feeds
Web organizer
Resource management
Paper filing and archives
Computer files
Photos
Books, CDs, etc.
Key documents
Copy management
To Dos: task management for self and others
Day planning
Reminders
Alerts
Project management: project management features
3.6.2. Processes associated with personal information
Among the processes associated with personal information are these:
Diarising: record “everything”
See Gemmell, Jim & Gordon Bell & Roger Lueder 2006. This uses functions
such as:
Personal notes/journal, annotations and note-taking in multiple media
Transcription between media, e.g. handwriting recognition, voice recognition
Search across email, e-docs and other information forms; across multiple media types
Hypertext authoring: making lists which can refer to other items in the same or linked lists; and making references to external (web-based) items
Synchronisation between computers: Mobile/PDA devices and inter-device synchronisation
Coordination between people in hierarchies and in projects
Visualisation of information resources
Graphing, charting, mind maps etc.
Services and service level management
3.6.3. Towards Personal Knowledge Management and knowledge creation
There is emerging support in some PIMs for:
Classification and contents
User-specified keyword classification of information structured in accordance with user design
Rule-based auto-classification
Tagging
Semantic web approaches, such as semantic desktop
We also observe the desirability of learning from library information science, and of encouraging the use of:
Thesauri
A lexicon of terms
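The difference between user-specified classification and rule-based auto-classification can be illustrated with a minimal Python sketch. It is not drawn from any particular PIM: the rule table, the categories and the `classify` helper are hypothetical. Each rule maps a category to trigger keywords; an item keeps any tags the user assigned by hand and also receives every category whose keywords occur in its text.

```python
import re

# Hypothetical user-defined rules: category -> keywords that trigger it.
RULES = {
    "finance": {"invoice", "budget", "expenses"},
    "teaching": {"syllabus", "lecture", "exam"},
    "research": {"hypothesis", "survey", "ontology"},
}

def classify(text, manual_tags=()):
    """Return the user's manual tags plus every rule-triggered category."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    auto = {category for category, keywords in RULES.items() if words & keywords}
    return set(manual_tags) | auto

note = "Draft survey questions and lecture plan for the PIM ontology study"
print(sorted(classify(note, manual_tags={"pim-project"})))
# ['pim-project', 'research', 'teaching']
```

Manual tags and automatic categories are simply combined here; a real PIM would also need a way for the user to override a rule that misfires.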
3.7. Further candidate data management approaches: XML documents
Before we can better understand these emerging functionalities, it is necessary to
extend the list of candidate data management approaches. We start with XML.
3.7.1. What is XML?
The Extensible Markup Language (XML) is a general-purpose specification for
creating custom markup languages. It is itself a simplified subset of the Standard
Generalized Markup Language (SGML), and is designed to be relatively human-
legible. In some ways it is a successor to, and it certainly follows on from, HTML
(HyperText Markup Language), the language in which web pages have been
expressed since the early 1990s.
XML is a specification developed by the W3C (World Wide Web Consortium). It became
a formal W3C Recommendation in February 1998 and was designed for use on the
Internet. Like SGML, XML is a metalanguage that lets users define their own
descriptive markup languages: with XML, it is possible to create customized tags
that go beyond the fixed tag set of HTML.
XML is described at XML 2008. One of the motivations for the design of XML is a
clear distinction between the content of data, its structure and its presentation.
This point is further developed below in section 3.9.4.
3.7.2. What is SGML?
SGML (Standard Generalized Markup Language) is a language for defining markup
languages such as HTML and for specifying the rules for tagging elements in a
document. SGML itself is not a markup language; rather, it is a language to create
markup languages. SGML supports the definition of markup languages that are
hardware- and software-independent. SGML was developed and standardized by the
International Organization for Standardization (ISO), which published it in 1986 (ISO
8879). Because of SGML's complexity, simpler languages were derived from it for use on
the Internet: HTML was defined as an application of SGML, and XML as a simplified subset of it.
For more information, see: SGML 2008.
3.7.3. Why is XML important?
Extensible Markup Language (XML) is a simple, very flexible text format derived
from SGML. Originally designed to meet the challenges of large-scale electronic
publishing, XML is also playing an increasingly important role in the exchange of a
wide variety of data on the Web and elsewhere.
3.7.4. How does XML compare with other data management approaches?
XML is an excellent data interchange mechanism, and is very widely implemented. It
is verbose and less efficient than SQL for database-to-database exchanges. But it is
unique in forming the basis for web services and service oriented computing; and as
the basis for the Semantic Web.
3.7.5. Applicability of XML to personal information management
Attempts have been made to build open source PIMs which store information as
XML. A PIM widely used on the Apple Mac platform is Tinderbox, which uses XML
as its internal data storage format.
XML is a metalanguage, used in the specification of target languages. As such, it has
been used to create OPML (Outline Processor Markup Language), described at
Wikipedia OPML 2008 in these terms:
“OPML (Outline Processor Markup Language) is an XML format for
outlines. Originally developed by Radio UserLand as a native file format for
an outliner application, it has since been adopted for other uses, the most
common being to exchange lists of web feeds between web feed aggregators.
The OPML specification defines an outline as a hierarchical, ordered list of
arbitrary elements. The specification is fairly open which makes it suitable
for many types of list data.”
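Because OPML is plain XML, a generic XML parser is enough to recover its hierarchy. The following minimal Python sketch uses the standard library's `xml.etree.ElementTree` on a small invented outline; the `text`, `type` and `xmlUrl` attributes follow common OPML usage, but the sample content, the placeholder feed URL and the `walk` helper are illustrative only.

```python
import xml.etree.ElementTree as ET

# Invented sample outline; the feed URL is a placeholder, not a real feed.
OPML = """<opml version="2.0">
  <head><title>Reading list</title></head>
  <body>
    <outline text="Personal information management">
      <outline text="Semantic desktop"/>
      <outline text="Semantic wikis"/>
    </outline>
    <outline text="Feeds">
      <outline text="W3C news" type="rss" xmlUrl="https://example.org/w3c.rss"/>
    </outline>
  </body>
</opml>"""

def walk(parent, depth=0):
    """Yield (depth, text) for each nested <outline>, in document order."""
    for child in parent.findall("outline"):
        yield depth, child.get("text")
        yield from walk(child, depth + 1)

root = ET.fromstring(OPML)
for depth, text in walk(root.find("body")):
    print("  " * depth + text)

# The same format also carries lists of web feeds, OPML's most common use.
feeds = [o.get("xmlUrl") for o in root.iter("outline") if o.get("type") == "rss"]
print(feeds)  # ['https://example.org/w3c.rss']
```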
Perhaps the main significance of XML is the fact that it has been used as the basis for
web services, service oriented computing and the Semantic Web.
3.8. Further candidate data management approaches: RDF, the basis of a semantic web
The Resource Description Framework (RDF) integrates a variety of applications from library
catalogues and world-wide directories to syndication and aggregation of news, software, and
content to personal collections of music, photos, and events using XML as an interchange
syntax. The RDF specifications provide a lightweight ontology system to support the exchange
of knowledge on the Web.
See section 3.9 for a fuller description.
XML has been characterised as a meta-language, and RDF as both a meta-language and
meta-data.
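RDF's underlying data model is a set of subject-predicate-object statements (triples). The sketch below represents triples as Python tuples and answers simple pattern queries, with `None` acting as a wildcard. It deliberately uses no RDF library; the `ex:` identifiers are invented, while `dc:` and `foaf:` abbreviate the real Dublin Core and FOAF vocabularies.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# "ex:" names are invented; "dc:" and "foaf:" abbreviate real vocabularies.
triples = {
    ("ex:paper42", "dc:creator", "ex:gregory"),
    ("ex:paper42", "dc:title", "The Business of Personal Knowledge"),
    ("ex:gregory", "foaf:name", "Mark Gregory"),
    ("ex:gregory", "foaf:knows", "ex:norbis"),
    ("ex:norbis", "foaf:name", "Mario Norbis"),
}

def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None matches anything."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Who created paper42, and what is that person's name?
for _, _, author in match("ex:paper42", "dc:creator"):
    print(match(author, "foaf:name")[0][2])  # prints "Mark Gregory"
```

Following one statement into another (from a paper to its creator to a name) is the kind of linking that a query language such as SPARQL generalises.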
3.9. Ontology and the Semantic Web: Towards Web 3.0?
What is the applicability of semantic web approaches to personal information management?
3.9.1. Search or classify?
As we work, we find useful “stuff” and store it – hopefully in a way in which we can
find it again afterwards! (See Fichter, D. 2004).
In a Googled world, there is an increasing trend (and temptation) to invest less in
classification and to trust to the raw power of technology to find things when we want
them. And this approach is not without its merits. Certainly the ability to search the
contents of a PC hard drive using Google Desktop is a frequent lifesaver. But it would
be a rash person indeed who no longer imposed any folder structure on the tens or
hundreds of thousands of files which populate a typical knowledge-worker's PC. An
earlier generation recognised the limitations of search, notably the trade-off
which exists between relevance and retrieval (in information-retrieval terms, between
precision and recall). Google appears to solve that problem by
ordering the results of searches in a “sensible” way – but how sensible is the outcome,
and how sensitive is it to the preferences and fads of earlier searchers?
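The trade-off is usually measured as precision (the share of retrieved items that are relevant) and recall (the share of relevant items that are retrieved). A minimal Python sketch with invented document identifiers shows how a narrow query favours the first measure and a broad query the second:

```python
def precision_recall(retrieved, relevant):
    """precision = hits / retrieved; recall = hits / relevant."""
    hits = len(set(retrieved) & set(relevant))
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {"d1", "d2", "d3", "d4"}                 # what the user actually needs
narrow_query = ["d1", "d2"]                         # few results, all relevant
broad_query = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]  # all relevant, plus noise

print(precision_recall(narrow_query, relevant))  # (1.0, 0.5)
print(precision_recall(broad_query, relevant))   # (0.5, 1.0)
```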
Ontologies or classification schemes are therefore still necessary in many contexts
(see Chandrasekaran et al 2008) – BUT one person's ontology is not necessarily the
best choice for another.
There are areas in which some level of standardisation is beginning to emerge. They
include contact data (e.g. the vCard format, exchanged as .vcf files) and calendars
(e.g. the iCalendar format). Few current PIMs use these standards, however.
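vCard is a line-based text format in which each line carries a property name, optional parameters and a value. The following Python sketch parses a single, simplified vCard 3.0 record; it ignores line folding, escaped characters and multi-card files, so it illustrates the format rather than serving as a usable importer. The contact details and the `parse_vcard` helper are invented.

```python
SAMPLE = """BEGIN:VCARD
VERSION:3.0
FN:Jane Example
N:Example;Jane;;;
EMAIL;TYPE=INTERNET:jane@example.org
TEL;TYPE=CELL:+33 1 23 45 67 89
END:VCARD"""

def parse_vcard(text):
    """Map each property name to a list of (parameters, value) pairs.

    Simplified: no line unfolding, no escape handling, one card per input.
    """
    card = {}
    for line in text.splitlines():
        name_part, _, value = line.partition(":")  # value may itself contain ':'
        name, *params = name_part.split(";")
        if name.upper() in ("BEGIN", "END"):
            continue
        card.setdefault(name.upper(), []).append((params, value))
    return card

card = parse_vcard(SAMPLE)
print(card["FN"][0][1])  # Jane Example
print(card["TEL"][0])    # (['TYPE=CELL'], '+33 1 23 45 67 89')
```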
3.9.2. What is the Semantic Web?
The semantic web is an evolving extension of the World Wide Web in which web
content can be expressed not only in natural language, but also in a form that can be
understood, interpreted and used by software agents, thus permitting them to find,
share and integrate information more easily. It derives from W3C director Tim
Berners-Lee's vision of the Web as a universal medium for data, information, and
knowledge exchange.
At its core, the semantic web comprises a philosophy, a set of design principles,
collaborative working groups, and a variety of enabling technologies. Some elements
of the semantic web are expressed as prospective future possibilities that have yet to
be implemented or realized. Other elements of the semantic web are expressed in
formal specifications. Some of these include Resource Description Framework (RDF),
a variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples), and
notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL). All of these are intended to formally describe concepts, terms, and relationships within a
given knowledge domain.
3.9.3. Purpose of the semantic web
Humans are capable of using the Web to carry out tasks such as finding the Finnish
word for "car", reserving a library book, or searching for the cheapest DVD and buying it.
However, a computer cannot accomplish the same tasks without human direction
because web pages are designed to be read by people, not machines. The semantic
web is a vision of information that is understandable by computers, so that they can
perform more of the tedium involved in finding, sharing and combining information
on the web.
For example, a computer might be instructed to list the prices of flat screen HDTVs
larger than 40 inches with 1080p resolution at shops in the nearest town that are open
until 8pm on Tuesday evenings. To do this today requires search engines that are
individually tailored to every website being searched. The semantic web provides a
common standard (RDF) for websites to publish the relevant information in a more
readily machine-processable and integrable form.
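Once shops publish the relevant facts in a common, structured form, the television example reduces to a join followed by a filter. In this minimal Python sketch, plain dictionaries stand in for published RDF data and a function stands in for a query language such as SPARQL; the shops, models and prices are invented, and “open until 8pm” is interpreted as closing at or after 20:00.

```python
from datetime import time

# Invented data standing in for what shops might publish in a common format.
shops = {
    "shop:a": {"town": "Rennes", "closes_tuesday": time(20, 0)},
    "shop:b": {"town": "Rennes", "closes_tuesday": time(19, 0)},
    "shop:c": {"town": "Nantes", "closes_tuesday": time(21, 0)},
}
offers = [
    {"shop": "shop:a", "model": "TV-1", "inches": 42, "resolution": "1080p", "price": 899},
    {"shop": "shop:a", "model": "TV-2", "inches": 37, "resolution": "1080p", "price": 599},
    {"shop": "shop:b", "model": "TV-3", "inches": 46, "resolution": "1080p", "price": 949},
    {"shop": "shop:c", "model": "TV-4", "inches": 50, "resolution": "1080p", "price": 999},
]

def matching_offers(town, min_inches, resolution, open_until):
    """Join each offer to its shop, then filter on size, resolution, town and hours."""
    results = []
    for offer in offers:
        shop = shops[offer["shop"]]
        if (shop["town"] == town
                and offer["inches"] > min_inches
                and offer["resolution"] == resolution
                and shop["closes_tuesday"] >= open_until):
            results.append((offer["price"], offer["model"], offer["shop"]))
    return sorted(results)  # cheapest first

print(matching_offers("Rennes", 40, "1080p", time(20, 0)))  # [(899, 'TV-1', 'shop:a')]
```

The query itself is trivial; the hard part, which the semantic web addresses, is getting every shop to describe size, resolution and opening hours in the same machine-readable terms.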
Tim Berners-Lee (Berners-Lee 1998) originally expressed the vision of the semantic
web as follows:
“I have a dream for the Web [in which computers] become capable of
analyzing all the data on the Web – the content, links, and transactions
between people and computers. A ‘Semantic Web’, which should make this
possible, has yet to emerge, but when it does, the day-to-day mechanisms of
trade, bureaucracy and our daily lives will be handled by machines talking to machines. The ‘intelligent agents’ people have touted for ages will finally
materialize.”
3.9.4. Architectural principles
The principle of separation of concerns (originally coined by Edsger W. Dijkstra
(1982) and applied by him to the design of computer programs) has been extended in
many directions, notably in suggesting that the structure, content and presentation of data should wherever possible be kept separate. This design motivation is
highlighted, for example, by the World Wide Web Consortium in its discussion of the
Separation of Content, Presentation, and Interaction in the architecture of the World
Wide Web, notably in XML and XML-derived languages (W3C 2004).
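The separation of content from presentation can be sketched in a few lines of Python: one list holds the content, and two independent rendering functions present it as HTML and as plain text. The contact records and function names are hypothetical; the point is that changing either presentation leaves the content untouched.

```python
from html import escape

# Content: the data alone, with no decisions about how it will look.
contacts = [
    {"name": "Jane Example", "email": "jane@example.org"},
    {"name": "John Sample", "email": "john@example.org"},
]

# Presentation: two independent renderings of the same content.
def as_html(records):
    """Render the records as an HTML list, escaping any markup characters."""
    items = "".join(
        f"<li>{escape(r['name'])} &lt;{escape(r['email'])}&gt;</li>" for r in records
    )
    return f"<ul>{items}</ul>"

def as_text(records):
    """Render the same records as plain text, one per line."""
    return "\n".join(f"{r['name']} <{r['email']}>" for r in records)

print(as_html(contacts))
print(as_text(contacts))
```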
3.9.5. Realisation: going beyond the hypertext web
Markup
The files on a typical computer can be loosely divided into documents and
data. Documents, like mail messages, reports and brochures, are read by
humans. Data, like calendars, address books, playlists and spreadsheets, are
presented using an application program which lets them be viewed, searched
and combined in many ways.
Currently, the World Wide Web is based mainly on documents written in
Hypertext Markup Language (HTML), a markup convention that is used for
coding a body of text interspersed with multimedia objects such as images
and interactive forms. The semantic web involves publishing the data in a
language, Resource Description Framework (RDF), specifically for data, so
that it can be manipulated and combined just as can data files on a local
computer.
The HTML language describes documents and the links between them. RDF,
by contrast, describes arbitrary things such as people, meetings, and aircraft
parts.
For example, with HTML and a tool to render it (perhaps Web browser
software, perhaps another user agent), one can create and present a page that
lists items for sale. The HTML of this catalogue page can make simple,
document-level assertions such as "this document's title is 'Widget
Superstore'". But there is no capability within the HTML itself to
unambiguously assert that, say, item number X586172 is an Acme Gizmo
with a retail price of €199, or that it is a consumer product. Rather, HTML can only say that the span of text "X586172" is something that should be
positioned near "Acme Gizmo" and "€199", etc. There is no way to say "this
is a catalogue" or even to establish that "Acme Gizmo" is a kind of title or
that "€199" is a price. There is also no way to express that these pieces of
information are bound together in describing a discrete item, distinct from
other items perhaps listed on the page.
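The catalogue example can be made concrete. The minimal Python sketch below writes the assertions that HTML cannot make as RDF statements in the N-Triples syntax. The `http://example.org/catalogue#` vocabulary and its property names are hypothetical; the `rdf:type` and `xsd:decimal` URIs are the standard W3C ones. The literal escaping handles only quotes and backslashes.

```python
EX = "http://example.org/catalogue#"  # hypothetical vocabulary namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

item = EX + "X586172"
statements = [
    (item, RDF_TYPE, ("uri", EX + "ConsumerProduct")),  # "it is a consumer product"
    (item, EX + "name", ("literal", "Acme Gizmo")),     # "its name is Acme Gizmo"
    (item, EX + "priceEUR", ("decimal", "199")),        # "its price is 199 euros"
]

def to_ntriples(triples):
    """Serialise (subject, predicate, (kind, value)) tuples as N-Triples lines."""
    lines = []
    for s, p, (kind, value) in triples:
        if kind == "uri":
            obj = f"<{value}>"
        elif kind == "decimal":
            obj = f'"{value}"^^<{XSD_DECIMAL}>'
        else:  # plain string literal: escape backslashes, then double quotes
            obj = '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(statements))
```

Each line is a self-contained statement about the same subject URI, which is exactly what binds the name and price to one discrete item.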
Descriptive, and extensible
The semantic web addresses this shortcoming, using the descriptive
technologies Resource Description Framework (RDF) and Web Ontology
Language (OWL), and the data-centric, customizable Extensible Markup
Language (XML). These technologies are combined in order to provide
descriptions that supplement or replace the content of Web documents. Thus,
content may manifest as descriptive data stored in Web-accessible databases,
or as markup within documents (particularly, in Extensible HTML
(XHTML) interspersed with XML, or, more often, purely in XML, with
layout/rendering cues stored separately). The machine-readable descriptions enable content managers to add meaning to the content, i.e. to describe the
structure of the knowledge we have about that content. This way the
machine can process knowledge itself, instead of text, using processes
similar to human deductive reasoning and inference, thereby obtaining more
meaningful results and facilitating automated information gathering and
research by computers.
XML, XML Schema, RDF, OWL, SPARQL: the W3C Semantic Web Layer Cake
The semantic web comprises the standards and tools of XML, XML Schema,
RDF, RDF Schema and OWL. The OWL Web Ontology Language
Overview describes the function and relationship of each of these
components of the semantic web:
XML provides a surface syntax for structured documents, but imposes no semantic constraints on the meaning of these documents.
XML Schema is a language for restricting the structure and content elements of XML documents.
RDF is a simple data model for referring to objects ("resources") and how they are related. An RDF-based model can be represented in XML syntax.
RDF Schema is a vocabulary for describing properties and classes of RDF resources, with a semantics for generalization-hierarchies of such properties and classes.
OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry) and enumerated classes.
SPARQL is a protocol and query language for semantic web data sources.
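The “semantics for generalization-hierarchies” that RDF Schema provides can be illustrated with two of its inference rules: `rdfs:subClassOf` is transitive, and an instance of a class is also an instance of each of its superclasses. This Python sketch applies just those two rules until no new triple is derived; the class names form a hypothetical personal ontology, and a real reasoner implements many more rules.

```python
SUBCLASS, TYPE = "rdfs:subClassOf", "rdf:type"

# Hypothetical personal ontology plus one instance statement.
graph = {
    ("ex:JournalArticle", SUBCLASS, "ex:Publication"),
    ("ex:Publication", SUBCLASS, "ex:Document"),
    ("ex:paper42", TYPE, "ex:JournalArticle"),
}

def rdfs_closure(triples):
    """Apply two RDFS rules until a fixpoint: subClassOf is transitive,
    and instances inherit the types of superclasses."""
    closed = set(triples)
    while True:
        sub = {(s, o) for s, p, o in closed if p == SUBCLASS}
        new = {(a, SUBCLASS, c) for a, b in sub for b2, c in sub if b == b2}
        new |= {(x, TYPE, sup) for x, p, cls in closed if p == TYPE
                for c, sup in sub if c == cls}
        if new <= closed:
            return closed
        closed |= new

inferred = rdfs_closure(graph)
print(sorted(o for s, p, o in inferred if s == "ex:paper42" and p == TYPE))
# ['ex:Document', 'ex:JournalArticle', 'ex:Publication']
```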
Enhancing the usability and usefulness of the Web and its interconnected resources might be achieved by:
Servers which expose existing data systems using the RDF and SPARQL standards. Many converters to RDF exist from different applications. Relational databases are an important source. The semantic web server attaches to the existing system without affecting its operation.
Documents "marked up" with semantic information (an extension of the HTML <meta> tags used in today's Web pages to supply information for Web search engines using web crawlers). This could be machine-understandable information about the human-understandable content of the document (such as the creator, title, description, etc., of the document) or it could be purely metadata representing a set of facts (such as resources and services elsewhere in the site). (Note that anything that can be identified with a Uniform Resource Identifier (URI) can be described, so the semantic web can reason about animals, people, places, ideas, etc.) Semantic markup is often generated automatically, rather than manually.
Common metadata vocabularies (ontologies) and maps between vocabularies that allow document creators to know how to mark up their documents so that agents can use the information in the supplied metadata (so that Author in the sense of 'the Author of the page' won't be confused with Author in the sense of a book that is the subject of a book review).
Automated agents to perform tasks for users of the semantic web using this data.
Web-based services (often with agents of their own) to supply information specifically to agents (for example, a Trust service that an agent could ask if some online store has a history of poor service or spamming).
An Issue: whose ontology?
If we accept the necessity for imposing some sort of classification
mechanism to achieve accuracy and precision in searching for information,
the next question which inevitably arises is: “whose ontology shall we
adopt?” We can identify three broad and overlapping alternatives:
Standardisation by committee (or by employer): top-down imposition
This is frequently done within communities of experts, such as
pharmacists or medical practitioners.
Emergent ontology - ontologies shared between workers in small, often virtual, groups: bottom-up conceptualisation
This situation is common in areas of fast-changing technology or
practice. A common vocabulary and classification system
“emerges” and almost imposes itself. Evolution, when it occurs, is
ad hoc.
Specialist programs which recognise or implement user-defined ontology
E.g. Ideaspace 2008.
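Bottom-up emergence can be observed by counting how many members of a group converge on the same tag. The minimal Python sketch below normalises case and a hypothetical list of variant spellings, then keeps the tags used by at least half of the group. The tag data, the threshold and the helper names are all invented for illustration.

```python
from collections import Counter

# Invented tag assignments by members of a small virtual team.
tags_by_user = {
    "ana": ["Ontology", "semantic-web", "PIM"],
    "ben": ["ontologies", "pim", "rdf"],
    "chen": ["ontology", "PIM", "wiki"],
    "dev": ["semantic-web", "pim", "ontology"],
}

# Hypothetical map from variant spellings to a preferred term.
VARIANTS = {"ontologies": "ontology"}

def emergent_vocabulary(tags_by_user, min_share=0.5):
    """Return (tag, number_of_users) for tags adopted by at least min_share
    of the group, most widely shared first, ties broken alphabetically."""
    users_per_tag = Counter()
    for tags in tags_by_user.values():
        # A set, so each user counts at most once per normalised tag.
        users_per_tag.update({VARIANTS.get(t.lower(), t.lower()) for t in tags})
    threshold = min_share * len(tags_by_user)
    ranked = sorted(users_per_tag.items(), key=lambda item: (-item[1], item[0]))
    return [(tag, n) for tag, n in ranked if n >= threshold]

print(emergent_vocabulary(tags_by_user))
# [('ontology', 4), ('pim', 4), ('semantic-web', 2)]
```

Note that the variant map is itself a small act of top-down standardisation, which illustrates how the three alternatives overlap.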
3.9.6. Semantic Web: current state of the art
Large-scale research prototypes aimed at the corporate level are beginning to emerge.
Their implementation and use are fraught with practical and conceptual difficulties. The
best-known example to date is MIT's Simile project (Simile 2008). See also Gnowsis
2008.
3.9.7. Semantic desktop: the semantic Web represented at the small-group level
Introduction
In computer science, the Semantic Desktop is a collective term for ideas
related to changing a computer‘s user interface so that data is more easily
shared between different applications or tasks and so that data that once
could not be automatically processed by a computer could be. It also
encompasses some ideas about being able to automatically share information
between different people. This concept is very much related to the semantic
web but is distinct.
General description
The vision of the semantic desktop can be considered as a response to the
perceived problems of existing user interfaces. Firstly, computers cannot obtain
much information about the content of files. For example, suppose
one downloads a document by a particular author on a particular subject:
although the document will probably state its subject, author, source
and possibly copyright information clearly, there is no way for the computer to
obtain this information or process it. This means the computer cannot search, filter or otherwise act upon the information as effectively as it otherwise
could. This is very much the problem that the semantic web is concerned
with.
Secondly, information stored on a computer can
only be accessed or sorted in a way related to its format. For example,
e-mails are stored separately from data files, and neither has anything to do with the tasks, notes and planned activities that may be stored in a calendar program,
while contacts might be stored in yet another program; yet all these forms
of information might simultaneously be relevant and necessary for a particular
task. Further, even when all the data is stored in the file system, it is often
accessed with different applications: even very similar formats may need to
be accessed with different programs. For example, PDF, PostScript,
Microsoft Word and ASCII files are all opened in different programs despite
being essentially the same in content.
Related to this, a user will often access a lot of data from the Internet which is
segregated from the data stored locally on the computer, being accessed
through a browser or other programs. As well as accessing data, a user has to
share it, often through e-mail or separate file transfer programs.
The semantic desktop is an attempt to solve some or all of these problems by
changing the user interface.
Different interpretations of the semantic desktop
There are various interpretations of the semantic desktop. At its most limited,
it might be interpreted as adding mechanisms for relating machine-readable metadata to files. At the other extreme, it could be viewed as a complete
replacement for existing user interfaces, one which unifies all forms of data and
provides a consistent single interface. There are many degrees between these
two, depending on which of the above problems are being addressed.
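The most limited interpretation, relating machine-readable metadata to files, can be sketched as a small index that is independent of file format. The `MetadataIndex` class, the file paths and the property names below are hypothetical; semantic desktop projects typically store such metadata as RDF rather than in Python dictionaries.

```python
from collections import defaultdict

class MetadataIndex:
    """Hypothetical minimal store relating machine-readable metadata to files,
    independently of each file's format or of the program that opens it."""

    def __init__(self):
        self._by_file = defaultdict(dict)

    def annotate(self, path, **properties):
        """Attach (or update) property values for a file."""
        self._by_file[path].update(properties)

    def find(self, **criteria):
        """Return the paths whose metadata matches every given property value."""
        return sorted(
            path for path, props in self._by_file.items()
            if all(props.get(key) == value for key, value in criteria.items())
        )

index = MetadataIndex()
index.annotate("downloads/report.pdf", author="J. Example", subject="semantic desktop")
index.annotate("notes/meeting.docx", author="J. Example", subject="NEPOMUK")
index.annotate("mail/2008-06-02.eml", author="A. Sample", subject="semantic desktop")

# One query spans a PDF, a word-processor file and an e-mail.
print(index.find(subject="semantic desktop"))
# ['downloads/report.pdf', 'mail/2008-06-02.eml']
```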
Relationship with the Semantic Web
The semantic web is mainly concerned with creating machine-readable
metadata to enable computers to process shared information, and with the creation
of formats and standards related to this. As such, the aims of allowing more
of a user's data to be processed by a computer and allowing data to be shared more
easily could be considered a subset of those of the semantic
web, but extended to a user's local computer, rather than just files stored on
the internet.
However, the aims of creating a unified interface and allowing data to be
accessed in a format-independent way are not really the concerns of the
semantic web.
In practice, most projects related to the semantic desktop make use of
semantic web protocols for storing their data. In particular, RDF's concepts
are used, and RDF itself is frequently used as the storage format.
3.9.8. “Web 2.0”, Social Networking and personal and group information management
Fichter (Fichter 2004) presents “solutions to electronic problems in information
management: tools to solve problems such as collecting, organizing, searching and
sharing information online, simplify storing, keeping, searching and sharing Web
resources”. Such application tools are sometimes called social bookmarking tools and
have many features in common with the older generation of Web-based bookmark
sites and personal information managers. Many users want to accomplish tasks quickly and
easily when they find a useful online resource. Such tasks include bookmarking the
site with one click of the mouse and having the Internet domain name and page title
automatically populated into the appropriate fields, ready to edit; jotting down a
comment or description of the site; clipping out important excerpts; and filing it in
suitable categories of the user's own creation, among others. The new breed of social
bookmarking applications offers more than just a universally accessible search and
store facility for links. Such applications assume that once a site is found, users
want not only to share it with others but also to discover other related sites and
people who are interested in the same topics.
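Discovering “people who are interested in the same topics” can be approximated by comparing users' tag sets. The Python sketch below uses Jaccard similarity (shared tags divided by all tags used by either user); this is one plausible measure, not necessarily the one any particular bookmarking service uses, and the users and tags are invented.

```python
def jaccard(a, b):
    """Similarity of two tag sets: size of intersection over size of union."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Invented bookmark tags per user.
user_tags = {
    "me": {"pim", "ontology", "rdf", "wiki"},
    "u1": {"pim", "ontology", "rdf"},
    "u2": {"photography", "travel"},
    "u3": {"wiki", "pim", "gtd", "email"},
}

def related_users(user, user_tags, top=2):
    """Rank the other users by tag similarity to the given user."""
    mine = user_tags[user]
    scores = [(round(jaccard(mine, tags), 2), other)
              for other, tags in user_tags.items() if other != user]
    return sorted(scores, reverse=True)[:top]

print(related_users("me", user_tags))  # [(0.75, 'u1'), (0.33, 'u3')]
```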
3.9.9. Semantic wikis
An emerging approach which arguably combines the power of social networking and
formal knowledge representation is that of the so-called “semantic wiki”. Wikipedia -
Personal information management 2008 suggests that a semantic wiki is “a wiki that
has an underlying model of the knowledge described in its pages. Regular wikis have
structured text and untyped hyperlinks (such as the links in this article). Semantic
wikis allow the ability to capture or identify further information about the pages (metadata) and their relations… semantic wikis try to … allow users to make their
internal knowledge more explicit and more formal, so that the information in a wiki
can be searched in better ways than just with keywords, offering queries similar to
structural databases.”
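Typed links are what let a semantic wiki answer structured queries. Semantic MediaWiki, for example, annotates links with the syntax `[[Property::Value]]`. The Python sketch below extracts such annotations from invented page texts into triples and answers a simple query; the `facts` and `ask` helpers are hypothetical and far simpler than a real semantic wiki's query engine.

```python
import re

# Page texts using Semantic MediaWiki-style typed links: [[Property::Value]].
pages = {
    "Haystack": "A research PIM from [[Developed by::MIT]] using [[Uses::RDF]].",
    "Gnowsis": "A semantic desktop prototype using [[Uses::RDF]] and [[Uses::OWL]].",
    "Ecco": "A classic outliner PIM from [[Developed by::NetManage]].",
}

TYPED_LINK = re.compile(r"\[\[([^:\]]+)::([^\]]+)\]\]")

def facts(pages):
    """Extract (page, property, value) triples from the typed links."""
    return {(page, prop.strip(), value.strip())
            for page, text in pages.items()
            for prop, value in TYPED_LINK.findall(text)}

def ask(pages, prop, value):
    """Answer a structured query: which pages have property = value?"""
    return sorted(page for page, p, v in facts(pages) if p == prop and v == value)

print(ask(pages, "Uses", "RDF"))  # ['Gnowsis', 'Haystack']
```

An untyped link from “Gnowsis” to “RDF” would say only that the pages are related; the property name is what turns it into a queryable fact.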
3.9.10. Emerging commercial products
The first commercial products to exploit semantic web approaches are beginning to
appear. Radar Networks has recently introduced its Twine service. Radar Networks is
claiming to “pioneer the mainstream adoption of the Semantic Web, or what is
sometimes called ‘Web 3.0’” (see Twine 2008).
3.9.11. Implications
We can identify three basic approaches to creating more effective personal
information management, so that its effectiveness can then be evaluated.
One is to create a unifying “super-app”: one program which does everything, bundling
the world into a super PIM/GIM. Two major research prototypes have emerged which
take this approach and use semantic web techniques (notably RDF and OWL). They
are MIT's Haystack (Haystack 2006) and the Gnowsis project (Gnowsis 2008).
A much more conventional “super PIM” approach is being taken by a small
Californian start-up company called NeoTech systems with their SQLNotes product
(still in beta testing at the time of writing). SQLNotes, should it ever work properly, is
close to a dream or ultimate “power user” PIM, being based as it is on the decade-old
NetManage Ecco application's approach but very much better integrated with
Windows and Office. Information can be stored in an outline, in a spreadsheet-like
grid, or in rich text documents within a grid. SQLNotes even permits access to the
relational tables which store its data. However, it may well fall victim to its own
flexibility, because the flexibility is accompanied by a conceptual complexity which
makes its usefulness difficult to grasp and its power difficult to manage.
The second is to take a federating approach in which emerging building blocks are
assembled or composed only minimally: just enough to provide a very small
community of users with tools useful enough to permit the hypotheses of this
study to be investigated and evolved. Sauermann 2005 suggests a possible
architecture, from which this diagram is extracted to give a flavour of what may be
achievable and usable at some point in the next two to three years. The approach is
very interesting but is likely in practice to suffer from serious performance problems.
Figure 6: Semantic desktop architecture
SOURCE: Sauermann, Leo & Ansgar Bernardi & Andreas Dengel 2005
The work of two research centres is crucial in this context. One is the German
Research Centre for Artificial Intelligence (DFKI GmbH), based in Kaiserslautern.
Sauermann et al 2005 is a seminal paper in respect of its identification of the
components of the semantic desktop and necessary research directions. The second is
DERI (the Digital Enterprise Research Institute), notably at Galway in Ireland. DERI states that its mission is “to exploit
semantics for people, organisations, and systems to collaborate and interoperate on a
global scale” (DERI 2008). Both institutions are involved in the NEPOMUK initiative: the DFKI Knowledge Management Department is in fact the coordinating
organisation. NEPOMUK (Networked Environment for Personalized, Ontology-based
Management of Unified Knowledge) aims to “bring together researchers, industrial
software developers, and representative industrial users, to develop a comprehensive
solution for extending the personal desktop into a collaboration environment which
supports both the personal information management and the sharing and exchange
across social and organizational relations”. The approach is technically very
interesting but also very challenging. Its huge potential merit is that it might unify
existing structured information already present on the desktop.
The third approach is consistent with the new phenomenon characterised at a recent
conference as the “disappearing desktop” (see PIM 2008). Increasingly capable client
computers (typically smartphones rather than PCs, at least for a numerical majority of
the world's web users) will access semantic networks based on server computers. A
server-based approach is typified by the Radar Networks Twine product mentioned
above in section 3.9.10. The biggest single architectural advantage of this approach is
that it makes the mutual recognition of ontological tags very much easier. The
corollary is that the approach might in practice favour the emergence of overbearing
“common” tagging schemes which are not, in fact, so much emergent as imposed.
3.10. Data storage techniques and their associated metadata – second list
We are now in a position to extend our list of personal information storage approaches:
Technique | Metadata | Expressiveness and precision
XML | The structure of an XML document is described in an associated Document Type Definition (DTD) or XML Schema. | Potentially combines the strengths of outlining and of relational database. Generalised query languages are emerging.
RDF and OWL | RDF Schema | Makes possible the expression of simple forms of knowledge (as opposed simply to information), and supports processes like user-specified keyword classification of information structured in accordance with user design, and rule-based auto-classification.
Table 4: Data storage techniques and their associated metadata – second list
We make the observation that these techniques are very powerful but largely or wholly
unapproachable by end-users in their current raw form.
4. Issues of usability and user acceptance
4.1. User frustrations and their origins
Personal information management has not worked well so far for many computer users, no
doubt for many reasons, among which may be factors such as these possible
“mini-hypotheses” (more will follow as our research and that of others progress):
Data has no meaning except in context. Context gives meaning and removing context removes meaning. People need to be encouraged to use tools which preserve context and thence meaning.
Effective personal information management needs portable, accessible computer resources, and current devices are not portable enough. Notebooks are heavy, expensive, dependent on mains power and lacking in battery autonomy. Smartphones aren’t smart!
Computers can assist knowledge management. However, structuring knowledge is often alien to the way people want to work.
4.2. Why people use computer-based PIMs (and why they don’t): Some observations
We can summarise the significance of what we have said so far as follows:
Different kinds of data are best represented in different, and often incompatible, forms
But this makes it difficult to search comprehensively.
No one single PIM approach will work for all groups of computer users. Some will prefer highly expressive, but difficult to query and to manage, general solutions. Others will prefer very packaged, very restrictive approaches which dictate what kinds of information are stored.
Perhaps many computer users might benefit from some of the more sophisticated approaches to
personal information which we have already outlined. Many, perhaps most, will not be able to
realise those benefits without knowledgeable “hand-holding”. Achieving a better understanding
of user pain when faced with the complexity of effective personal information management is a
major focus of our ongoing research. Perhaps then too it will be possible to design better
solutions.
4.3. Other issues with current personal information management
4.3.1. Integration
Data quickly risks becoming trapped in isolated islands once we start to use more than one platform on which to store it. A majority of mobile phone users do not exploit the
available synchronisation techniques to consolidate the contact information they have
on their phone with that which they store in a PC-based PIM. The result is
incoherence, lost or obsolete information, and associated user frustration.
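The synchronisation step that many users skip can be sketched as a two-way merge of contact stores. This Python sketch assumes that each contact carries a stable identifier and a last-modified timestamp, and it resolves conflicts by keeping the most recent version (“last writer wins”). Real synchronisation tools must also handle deletions and field-level conflicts, which are ignored here; the `merge_contacts` helper and the contacts are invented.

```python
from datetime import datetime

def merge_contacts(phone, pc):
    """Last-writer-wins merge of two contact stores keyed by contact id.

    Each store maps id -> (modified timestamp, record). Deletions and
    field-level conflicts are deliberately ignored in this sketch.
    """
    merged = dict(phone)
    for cid, (stamp, record) in pc.items():
        if cid not in merged or stamp > merged[cid][0]:
            merged[cid] = (stamp, record)
    return merged

phone = {
    "c1": (datetime(2008, 5, 1), {"name": "Jane Example", "tel": "+33 6 00 00 00 01"}),
    "c2": (datetime(2008, 6, 9), {"name": "John Sample", "tel": "+1 203 555 0100"}),
}
pc = {
    "c1": (datetime(2008, 6, 2), {"name": "Jane Example", "tel": "+33 6 00 00 00 02"}),
    "c3": (datetime(2008, 4, 20), {"name": "Alex Demo", "tel": "+44 1223 000000"}),
}

merged = merge_contacts(phone, pc)
for cid in sorted(merged):
    print(cid, merged[cid][1]["tel"])  # c1 takes the newer PC number
```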
Application suites such as Microsoft Office 2007 offer facilities for tying together,
integrating, data stored in different tools on different platforms – but their use implies a level of skill (in so-called end user programming) which is uncommon. Even people
who have the necessary skills would find it very difficult to justify the investment of
time and effort needed to build their own tailored environments.
These issues become more and more difficult with the passage of time. Some people
become locked into a combination of a specific mobile phone platform and a specific
PIM because they depend on the synchronisation facilities which exist between them.
4.3.2. User training and exhaustion
A situation in which people have to search for specific applications to undertake
specific aspects of their information management creates a need for awareness-raising,
for training and for self-learning. And there is a limit to the number of computer
applications which people can cope with; eventually an exhaustion sets in and people
resign themselves to a limited range of approaches.
4.3.3. User ontologies
Most researchers, and many business users and similar professional knowledge
workers, would benefit from being able to classify their information sources and
resources. Some PIMs impose a standard classification or ontology. A few allow
users to devise their own. Almost none allow the sharing of these classification
schemes or ontologies.
4.4. Evaluating user experience
We recognise that we are guilty of generalisations and insufficiently-justified assertions in this
area. Further research is indicated. It will be hampered by the need to:
Ask CURRENT users what their experiences have been and are
Ask why former users drop tools
Ask non-users why they are reluctant to use tools and techniques
5. Conclusions and suggestions for further research
5.1. Dimensions for classifying personal information management approaches: an initial summary
5.1.1. Data storage techniques
These are listed in Table 1 and Table 4 above, and can be summarised as:
Technique
Spreadsheets
Relational databases
Outlining and Outliners
Mindmaps
XML
RDF and OWL
5.1.2. An initial classification of personal information and functionalities
See section 3.6.1 for a list of personal information which might be stored in a PIM.
See section 3.6.2 for a list of the processes, associated with the maintenance of
personal information, which might be enabled by a PIM.
See section 3.6.3 for a list of knowledge management techniques which might be offered by a
PIM.
5.2. Hypotheses still to be tested
We don't know how many people actually use programs which can be specifically identified as
PIMs (Personal Information Managers). But although "all" knowledge workers need to store
and manage data which is personal to them, by no means all use a PIM to do it.
The (initial, working) hypotheses follow:
5.2.1. Hypothesis 1
The data-centred approach adopted by most PIMs is not necessarily well
adapted to the working methods adopted by knowledge workers.
Which styles and functionalities appeal to (or repel) different
types of users is not yet well understood.
This hypothesis conceals a number of subsidiary hypotheses which will emerge, and which we
will refine, as we continue our work. We observe, for example, that it is necessary but
not sufficient for a PIM to provide a repository of personal information; specific
functionality is also required. Making PIMs more useful suggests the desirability of
"natural language" interfaces, and we will posit the value of a user interface through which the
user can interact with her data: perhaps The Poet and her Muse?
5.2.2. Hypothesis 2
Current PIMs tend to emphasise one particular information management
technique, to the exclusion of others. The absence of complementary
information management techniques is one of the factors which cause
knowledge workers to reject current PIMs.
It is suggested that PIMs are not much used because a significant number of information
management techniques, implemented in various PIMs (and, less formally, by computer users
themselves), are normally presented as opposing alternatives when they should instead be
seen as complementary.
It should be possible for the individual to store and communicate data in a way that is
less closed (less rigid, less structured) and more communicable (able to be understood
by recipients) than current approaches allow. For example, databases are highly structured;
the real world is not. We will argue that there is a place for ALL of:
Structured data (databases and database-supported products such as contact
managers and email clients).
Semi-structured data (e.g. spreadsheet contents, hierarchical outlines, and PIMs
which support the semi-structured organisation of small pieces of
information such as phone numbers, errands to run and books to read).
Rule-based classification (automatic assignment of data items to categories
which the software deems appropriate; the rules may be based simply on data
values, or may take the form of regular expressions: a regular expression
(regex or regexp for short) is a special text string describing a search
pattern).
Inter-item hyperlinking, folksonomies (Gruber 2007), semantic tagging
and the like.
The use of a thesaurus and a lexicon of terms to assist in the maintenance and
communication of precise information, and in more accurate inferencing from
stored data.
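Rule-based classification with regular expressions can be sketched in a few lines of Python. This is a minimal illustration, assuming that each category is defined by a single pattern and that an item may fall into several categories or none. The categories echo the examples given above for semi-structured data; the patterns, the `RULES` table and the `classify` function are hypothetical and would need tuning against real data.

```python
import re

# Hypothetical rules: each category is paired with one regular expression.
RULES = [
    ("phone numbers", re.compile(r"\+?\d[\d\s().-]{6,}\d")),
    ("errands", re.compile(r"^(buy|collect|pick up|post)\b", re.IGNORECASE)),
    ("books to read", re.compile(r"\bread\b|\bISBN\b", re.IGNORECASE)),
]


def classify(note: str) -> list[str]:
    """Return every category whose pattern matches somewhere in the note (possibly none)."""
    return [category for category, pattern in RULES if pattern.search(note)]


for note in ["Call the bank on +33 1 23 45 67 89",
             "Buy printer paper",
             "Read Codd 1970 on relational data",
             "Ideas for the next paper"]:
    print(f"{note!r} -> {classify(note)}")
```

Returning a list rather than a single category reflects the point made above: rule-based assignment complements, rather than replaces, the user's own tagging and linking.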
5.2.3. Hypothesis 3
PIMs are not much used because they either impose an ontology which
does not correspond to the user's ontology, or do not permit that ontology
to be made explicit and/or shared. The incorporation of explicit
knowledge representation mechanisms tailored to their users'
(plural) needs will make a PIM more useful, by beginning to turn it into a
small-group knowledge manager.
This is a primary example of the functionality we believe to be missing from many
contemporary PIMs.
5.3. Constraints
We have described a potentially very rich area of research in which, at the moment, there is not
a great deal of published material.
We recognise that there are large areas that deserve attention but that we will not have time to
investigate.
In particular, we note with approval the work of Penrose (1990), which points to
fundamental limitations on the usefulness of computer-based approaches in this and in other
areas. We also note the large existing literature on human-computer interface (HCI) issues, much of
which is relevant to this enquiry but which we have deliberately excluded from
consideration in this paper.
5.4. Design for Further Research
5.4.1. Further analyse existing PIM / GIM approaches (literature and software)
Literature review and review of current practice
This will include work on cognate (e.g. linguistic and epistemological)
issues.
Establish a comprehensive list of existing PIM tools and techniques
Choose a small number to concentrate on for subsequent research
Establish Classification and Evaluation criteria
The evaluation of information systems is discussed in Beynon-Davies & Owens 2004.
Qualitative research (largely but not exclusively secondary) will be carried
out to identify and classify existing approaches and to characterise them in
accordance with significant dimensions as we identify them.
5.4.2. Identify and/or compose better approaches in order to evaluate user reaction
The intention is to identify existing tools and approaches, and either to integrate them
minimally or to evaluate them in isolation.
Choose appropriate approaches
Tools
The two tools which we have initially identified are SQLNotes 2008 and Twine 2008.
Techniques
Make available and/or compose improved (or at least better integrated) basic tools
We hope to identify a small number of tools and to see how users in at least two
communities of practice (Weir & Hutchings 2005) respond to them in a standalone
and in an assisted (accompanied) environment.
5.4.3. Observe and evaluate the behaviour of at least two small groups of knowledge workers as they confront, learn and exploit two different GIM approaches
We recognise the need to help people as they learn.
5.5. Research context
5.5.1. Methodological approaches will be based on a combination of:
Experimentation
Action Research (participation in, and concurrent evaluation of, experimental approaches)
Ethnographic approach
in different contexts of use. Two example contexts of use follow: Research, and Projects and Project Management (PM).
5.5.2. Research as a context of use
Research is archetypal knowledge creation (and therefore management) carried out by
individuals and small groups.
5.5.3. Projects and Project Management (PM) as a context of use
The management of projects is typically represented by mechanisms such as Gantt
charts. A Gantt chart is a specific model which abstracts and simplifies the project. As
such, it is only a static representation of the project.
In projects, functional activities represented as vertical relations interact with project
activities represented as horizontal relations.
Our own research project is also a project in this sense: it is a meta-project, in a recursive relationship with itself.
5.6. Summary
We hope that we have succeeded in demonstrating that individuals working in groups should
be encouraged and educated to make better use of the available computer-based tools, and that
the tools themselves should evolve into better ways of representing information and
knowledge.
We recognise that we need to search further for a better understanding of the way people use
these tools and learn new ones, in order subsequently to find strategies on how best to educate
people to make the right choice of the right tools. This paper has suggested a classification
scheme for these tools based primarily on their data representation: e.g. spreadsheet, relational
database, and semantic web represented at the desktop level. At least one other dimension of
classification, that of functionality, has also been suggested as significant. Usability issues have
been highlighted. The paper has also suggested that a judicious mix of existing and emerging
techniques and tools will permit evolution or revolution in the management of individual and
shared information and knowledge. Establishing the truth of that suggestion is shaping our
future research agenda.
6. References
Allen, David 2001 'Getting Things Done: The Art of Stress-Free Productivity.' Penguin Books.
Beynon-Davies, P. & Owens, I. 2004 'Information Systems Evaluation and the Information Systems Development Process.' The Journal of Enterprise Information Management, Vol. 17, No. 4, 2004, pp. 276-282.
Boardman, R. & Sasse, M. 2004 'Stuff goes into the computer and doesn't come out: a cross-tool study of personal information management.' Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2004.
Bricklin, Dan & Frankston, B. 1978 'VisiCalc.' Technical report, Lotus Corp.
Burnett, Margaret & Curtis Cook & Omkar Pendse & Gregg Rothermel & Jay Summet & Chris Wallace 2003 'End-user software engineering with assertions in the spreadsheet paradigm.' Proceedings of the 25th International Conference on Software Engineering, Portland, Oregon, 2003, pp. 93-103.
Burnett, M. & Atwood, J. & Walpole Djang, R. & Reichwein, J. & Gottfried, H. & Yang, S. 2001 'Forms/3: A first-order visual language to explore the boundaries of the spreadsheet paradigm.' Journal of Functional Programming, Vol. 11, Issue 2, pp. 155-206, March 2001.
Buzan, Tony 1996 'The Mind Map Book: How to Use Radiant Thinking to Maximize Your Brain's Untapped Potential.' Plume, 1996.
Chandrasekaran B & JR Josephson & VR Benjamins 2008 'What Are Ontologies, and Why Do We Need Them?' Available at http://doi.ieeecs.org accessed 20-06-2008.
Churchman, C.W. 1968 'The Systems Approach.' New York: Dell, 1968.
Codd, E. 1970 'A Relational Model of Data for Large Shared Data Banks.' Communications of the ACM, Vol. 13, Issue 6, pp. 377-387, 1970.
Date, C.J. 1968 'SQL Structured Query Language: A guide to the SQL standard.' Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1968.
Date, Chris J. 2003 'An Introduction to Database Systems.' 8th ed. Addison-Wesley.
De Vorsey K., Elson C., Gregorev N., Hansen J. 2006 'The Development of a Local Thesaurus to Improve Access to the Anthropological Collections of the American Museum of Natural History.' D-Lib Magazine, Volume 12, Number 4, April 2006. Found online at http://www.dlib.org/dlib/april06/devorsey/04devorsey.html viewed most recently 25-04-2007.
DERI 2008 is described at http://www.deri.ie/ accessed 20-06-2008.
Dijkstra, Edsger W. 1982 'On the role of scientific thought.' In: Selected writings on computing: a personal perspective. Springer Verlag, 1982.
Ecco 1997 can be found at http://supportweb.netmanage.com/ts_viewnow/downloads/patchesUnsupported/ecco.asp
EndNote 2008 is described at http://www.endnote.com/ accessed 03-07-2008
Expresso 2008 'How to Share Excel Spreadsheets - Free Webinar.' http://www.expressocorp.com/ accessed 11-07-2008
Fichter, D. 2004 'Tools for Finding Things Again.' Online, Sep/Oct 2004, Vol. 28, Issue 5, pp. 52-56.
Freyberg, C.A. 1996 FIND!!!
Gartner Group 2007 'Market Share: Enterprise E-Mail and Calendaring Software, Worldwide, 2004-2006.' 27 July 2007.
Gemmell, Jim & Gordon Bell & Roger Lueder 2006 'MyLifeBits: a personal database for everything.' Communications of the ACM, Volume 49, Number 1, 2006, pp. 88-95.
Gnowsis 2008 is described at http://www.gnowsis.org/ accessed 20-06-2008.
Gregory M.R. & Norbis M. 2008 'Towards a Systematic Evaluation of Personal and Small Group Information and Knowledge Management.' Paper presented to the 5th International Conference on Cybernetics and Information Technologies, Systems and Applications: CITSA 2008, July 2008.
Gruber, Tom 2007 'Ontology of Folksonomy: A Mash-Up of Apples and Oranges.' International Journal on Semantic Web and Information Systems, 2007.
Haystack 2006 is to be found at http://simile.mit.edu/hayloft/index.html checked 28/07/2008
Ideaspace 2008 is to be found at http://www.ideaspace.com/
IFLANET 1998 'International Federation of Library Associations and Institutions: Functional Requirements for Bibliographic Records. Final Report, 1998.' Since frequently updated. Found at http://www.ifla.org/VII/s13/frbr/frbr1.htm accessed 11/07/2008.
Info Select 2007 Micro Logic Corp., South Hackensack, NJ; 201-342-6518; www.miclog.com.
KDE 2008 is described at http://pim.kde.org/ accessed 20-06-2008.
Kelly, D. 2006 'Evaluating Personal Information Management Behaviours and Tools.' Communications of the ACM, Jan 2006, Vol. 49, Issue 1, pp. 84-86.
Lotus Domino 2008 is described at http://www-306.ibm.com/software/lotus/products/domino/ accessed 02-07-2008.
Lotus Symphony 2008 is described at http://www-142.ibm.com/software/sw-lotus/lotus/general.nsf/wdocs/lotusprods accessed 20-06-2008.
Panko, Raymond R. 1998 'What We Know About Spreadsheet Errors.' Journal of End User Computing's Special issue on Scaling Up End User Development, Volume 10, No 2, Spring 1998, pp. 15-21. Available on the web in an expanded form at http://www.opssys.com/instantkb/attachments/What_We_Know_About_Spreadsheet_Errors_Whitepaper-GUID9b35763e2d504ddab36b9e26a4eee631.pdf accessed 11-07-2008.
PIM 2008 'The disappearing desktop: PIM 2008.' Conference on Human Factors in Computing Systems, CHI '08 extended abstracts on Human factors in computing systems.
RefWorks 2008 is described at http://www.refworks.com/ accessed 03-07-2008
Sauermann, Leo & Ansgar Bernardi & Andreas Dengel 2005 'Overview and Outlook on the Semantic Desktop.' Proc. of Semantic Desktop Workshop at the ISWC, 2005.
SGML 2008 is described at http://www.w3.org/MarkUp/SGML/ accessed 20-06-2008.
Simile 2008 is described at http://simile.mit.edu/hayloft/index.html accessed 20-06-2008.
SQLNotes 2008 is described at http://sqlnotes.wikispaces.com/ accessed 20-06-2008.
Teevan, Jaime & William Jones & Benjamin B. Bederson 2006 'Personal information management: Introduction.' Communications of the ACM, Volume 49, Number 1, 2006, pp. 40-43.
Teevan, Jaime & William Jones 2008 'The disappearing desktop: pim 2008.' Conference on Human Factors in Computing Systems CHI '08, extended abstracts on Human factors in computing systems. Available at http://portal.acm.org.libezproxy.open.ac.uk/citation.cfm?id=1358628.1358956&coll=Portal&dl=GUIDE&CFID=76138045&CFTOKEN=51276375 accessed 01-07-2008.
Twine 2008 is described at http://www.radarnetworks.com/ accessed 28-04-2008
Ventana Research 2007 'Requirements for 21st Century Spreadsheets: Uses and misuses of a critical business technology: Executive Summary.' San Mateo, CA: Ventana Research, 2007. Found at http://www.ventanaresearch.com/uploadedFiles/Ventana_Research_Requirements_for_21st_Century_Spreadsheets_Executive_Summary_FINAL.pdf accessed 11-07-2008.
Visimap 2008 is developed by CoCo Systems and is described at http://www.coco.co.uk/ accessed 23-07-2008.
W3C 2004 'Architecture of the World Wide Web, Volume One. W3C Recommendation 15 December 2004.' Found at: http://www.w3.org/TR/2004/REC-webarch-20041215/
Weir, D. & Hutchings, K. 2005 'Cultural embeddedness and contextual constraints: knowledge sharing in Chinese and Arab cultures.' doi.wiley.com
Whittaker, Steve & Victoria Bellotti & Jacek Gwizdka 2006 'Email in personal information management.' Communications of the ACM, Volume 49, Issue 1 (January 2006).
Wikipedia - Knowledge representation (2006) Knowledge representation. Permanent link: http://en.wikipedia.org/w/index.php?title=Knowledge_representation&oldid=35105964 Page Version ID: 35105964
Wikipedia - Mind map (2008) Mind map. Permanent link: http://en.wikipedia.org/w/index.php?title=Mind_map&oldid=224639553 accessed 11-06-2008.
Wikipedia - OPML (2008) OPML. Found at http://en.wikipedia.org/wiki/OPML accessed 19/07/2008
Wikipedia - Personal information management (2008) 'Personal information management' found at http://en.wikipedia.org/wiki/Personal_information_management accessed 18/08/2008.
Wikipedia - Semantic Wiki (2008) 'Semantic Wiki' found at FIND THIS
Wikipedia - Web Ontology Language (2006) Web Ontology Language. Permanent link: http://en.wikipedia.org/w/index.php?title=Web_Ontology_Language&oldid=34218020 Page Version ID: 34218020
XML (2008) is described at http://www.w3.org/XML/ accessed 20-06-2008.