Experiences with implementing EMu at the Australian Museum - Lance Wilkie (EMu Unit Manager)


Post on 15-Jan-2016






  • Experiences with implementing EMu at the Australian Museum - Lance Wilkie (EMu Unit Manager)

  • The Australian Museum

  • Collection databasing 1877 to mid 1990s

    Card files in use for over 20 years (still referred to infrequently)

    Data sheets still used in some collections

    Hardcopy Era: Earliest Museum register in 1877. Traditional registers used till 2000 (or later) in some collections.

    Databasing at the AM falls into two eras: hard copy and digital.

  • Collection databasing 1970s to 2004

    Digital Era:
    1) Mid 1970s: batch loading of data into CSIRO Cyber 76 computer started
    2) 1987 to 2001: several old servers used
    3) 2001: first EMu server purchased for an integrated collection management system
    4) 2002: Collections database integration project commenced
    5) 2004: SUN SunFire 280R server, 1.4 TB SAN, and tape robot purchased (~$400,000)

  • Australian Museum Texpress Databases

    1987 - data into KE Titan 2.0
    1995 - data into KE Texpress 5.0.39

    Became the major repositories of Collections data.

  • 2002 - ?: Australian Museum Texpress Databases to EMu

    Discipline      Original Database        EMu Database  Current Status                                   Approx. # Catalogue records
    Arachnology     Texpress Spiders         AMlive        Merged and live                                  93,000
    Entomology      Texpress Entomology      AMlive        Merged and live                                  122,000
    Herpetology     Texpress Herpetology     AMlive        Merged and live                                  164,000
    Marine Inverts  Texpress Marine Inverts  AMlive        Merged and live                                  167,000
    Mammals         Texpress Mammalogy       AMlive        Merged and live                                  43,000
    Mineralogy      Texpress Mineralogy      AMlive        Merged and live                                  2,000
    Ornithology     Texpress Ornithology     AMlive        Merged and live                                  86,500
    Paleontology    Texpress Paleontology    PALlive       Live, yet to be merged with AMlive               25,000
    Ichthyology     Texpress Icthyology      FISHlive      Live, yet to be merged with AMlive               175,000
    Malacology      Texpress Malacology      MALtest       Test phase                                       452,000
    Anthropology    ?                        AMANTHlive    Live, not merged with Natural History databases  74,000

    Total: 1,403,500

    Still To Do:

    Discipline                  Original Database                                                     Current Status               Approx. # Catalogue records
    Mineralogy                  Non-texpress: Dbase, Excel                                            Yet to be mapped and loaded  ??
    Evolutionary Biology Unit   Texpress EBU                                                          Yet to be mapped and loaded  ??
    Collections Integrity Unit  AREV                                                                  Yet to be mapped and loaded  ??
    Various disciplines         Non-texpress data formats, e.g. Access databases, Excel spreadsheets  Yet to be mapped and loaded  ??

  • Problem #1 - Customisation: Who is going to change, and how much?

  • Cetacea tab, Mammals tab, Ornithology 1 tab, Measurements tab

  • Problem #2 - User Level Permissions: Who can do what?

  • Past Present

  • The Implications of Object-Based Databases: Parties, Catalogue

  • Parties, Catalogue, Collection Events, Sites, Taxonomy

    Field 1, Field 2, Field 3, Field 4, Field 5

  • Record Security Feature

  • Discipline Tagging and Insert/Query Defaults

  • Registry Renovation

  • Problem #3 - Impacts on Collection Management Work Practices: Who will do it if I can't?

  • Finally... Some things we'd like to see:

    The capability to copy values from one field to another using the global edit facility.

    More connectivity between Sites, Collection Events and Catalogue modules, particularly an Objects field in Query mode for the former two?

    A review of the Cerberus issue management facility: improved issue tracking.
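    The first wish-list item, copying values from one field to another, can be illustrated with a minimal Python sketch. It does not use any EMu or Texpress interface: records are modelled as plain dictionaries, and the function name `move_field` and the field names `Remarks` and `Depth` are hypothetical, chosen only for illustration.

```python
def move_field(records, source, target, overwrite=False):
    """Move values from the `source` field to the `target` field.

    Records with an empty source value are skipped, as are records whose
    target is already populated (unless overwrite=True).
    Returns the number of records changed.
    """
    changed = 0
    for record in records:
        value = record.get(source)
        if not value:
            continue  # nothing to move
        if record.get(target) and not overwrite:
            continue  # do not clobber existing target data
        record[target] = value
        record[source] = ""
        changed += 1
    return changed


# Hypothetical legacy records in which depth was entered into a remarks field.
records = [
    {"id": 1, "Remarks": "5-10 m", "Depth": ""},
    {"id": 2, "Remarks": "", "Depth": "12 m"},
    {"id": 3, "Remarks": "20 m", "Depth": "18 m"},
]
changed = move_field(records, "Remarks", "Depth")
print(changed, records)
```

    Skipping records whose target field is already populated is a deliberate choice in this sketch: a conflicting value is left for manual review rather than silently overwritten.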

    The Australian Museum is Australia's oldest museum. It commenced housing collections in 1827. It is characterised by still having large and active research and collection branches in a wide variety of fields of natural history and anthropology. Material continues to flow into the collection at a prodigious rate.

    Collections had been accumulating at the Museum for over 100 years before digital databasing was even a possible option.

    The technology and hardware employed has steadily grown in sophistication. However, digital collection management within sections of the Museum evolved independently, and largely on an ad hoc basis.

    The most widely utilised platform for databasing in the Museum prior to the transfer to EMu was KE Titan, followed by KE Texpress. However, Texpress databases were customised individually for each user discipline, and each collections database was entirely separate from any other discipline. When databasing started, it was initially intended to be for vertebrate groups only. It was only later that it was decided to incorporate other disciplines.

    The different collection databases vary significantly in terms of degree of sophistication. Some are essentially just flat files; others have varying degrees of relational complexity. Many are utilised to perform daily collection management tasks, such as producing loans transaction forms and specimen identification labels. They are often utilised by scientific researchers, who require bulk data dumps. They provide a data source for many web services, such as OZCAM, FaunaNet etc.

    KE provide the option for individual customisation to an institution's needs. A balance needs to be struck between user desires and the resulting impacts in terms of time delays in the migration process, financial cost, and unnecessary complication of the database design. This is particularly difficult when transferring users from a flat file design.

    An example of a complicated but necessary modification during the data migration process. It is necessary because locational data is critically important in natural history collections. Data is collected in different formats by different departments. Accuracy and precision are lost when transferring from one co-ordinate system to another. We need both a common co-ordinate system for mapping in GIS, and to preserve the original co-ordinates in their original format.
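    The requirement above, one common co-ordinate system for GIS alongside the untouched original values, can be sketched in Python as follows. This is an illustrative sketch only, not the Museum's or KE's implementation: the field names (`lat_original`, `lat_decimal` and so on), the space-separated degrees/minutes/seconds input format, and the sample values are all assumptions.

```python
import re


def dms_to_decimal(text):
    """Convert a string such as '31 33 00 S' to signed decimal degrees."""
    match = re.fullmatch(
        r"\s*(\d+)\s+(\d+)\s+(\d+(?:\.\d+)?)\s*([NSEW])\s*", text
    )
    if match is None:
        raise ValueError(f"unrecognised co-ordinate: {text!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by the usual decimal-degree convention.
    return -value if hemisphere in "SW" else value


def add_common_coordinates(site):
    """Return a copy of `site` with decimal fields added.

    The original strings are kept unchanged, so nothing is lost if the
    conversion later turns out to have been too coarse.
    """
    out = dict(site)
    out["lat_decimal"] = round(dms_to_decimal(site["lat_original"]), 5)
    out["long_decimal"] = round(dms_to_decimal(site["long_original"]), 5)
    return out


# Hypothetical site record with co-ordinates as originally recorded.
site = {"lat_original": "31 33 00 S", "long_original": "159 05 00 E"}
converted = add_common_coordinates(site)
print(converted)
```

    Storing the derived values in separate fields, rather than overwriting the originals, is what allows the conversion to be repeated or corrected later.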

    Other examples of essential customisations are the Preparations tabs, which allow the recording of several objects (with different data field requirements) under the same registration number, and the Mixed Taxa tab which allows several objects with differing taxonomy to be recorded under the same registration number.

    These are examples of design modifications that in hindsight were probably less important, possibly even unnecessary. This is because they have a lot of repetition of essentially the same fields. In addition, most of the data in these fields could effectively have been housed on the Measurements tab. Often, these sorts of design changes are requested by users coming from a flat file system, where data is typically stored in this manner. It also arises because users want to have all their own data entry fields in the one place rather than having to switch between tabs or modules.

    A major challenge of requesting client customisation therefore lies in balancing essential user-specific design changes against those which either offer limited benefit or can even be seen as detrimental in terms of over-complicating the overall design, and then selling the result back to the users.

    The registry allows a high degree of customisation of permissions for both users and user groups. However, this is still limited to a certain degree by the fact that privileges tend to be all or nothing. For example, you can either insert a record or you can't. You can either bulk edit records, edit one record at a time, or not edit records at all.

    Historically, the Museum databases grew in isolation from each other. Databasing was purely part of collection management practices, and the data for each discipline was the responsibility of each collection division. Each division therefore had a number of users, commonly all utilising database tools that under the EMu system should require quite senior administrative privileges. We are therefore moving from a set of isolated systems with little record security to a shared database system employing tiered record security (higher level permissions granted to successively fewer people).

    Allow one-to-many relationships between two modules in both directions.

    Allow one field in one module to be linked to multiple fields in another module, or multiple other modules.

    Editing without due consideration of potential implications for other attached records can have disastrous consequences, especially once data sets are shared between disciplines.

    The record security feature is one of the more obvious solutions for ensuring record security and protection from accidental corruption by another user, but is it perfect for our needs? For one thing, we do not necessarily want to prevent other users from utilising records: field collections are often multi-disciplinary, so there are many instances where different departments may have unique Collection Event data, but share Site data. The Bibliography module also presents a good example of where records could quite likely be shared between disciplines.

    There is a considerable amount of legacy data already in the system, and some has already been linked inter-departmentally. Record security settings can stop people from being able to access a record completely (it is not even viewable), which means that if it is already attached to one of their own records they are locked out from even fixing it.

    On a more general level, it requires registry permissions to be set for a user in order for them to utilise this facility, as well as edit permissions for that particular record. If daSecurity is not set for all users, who sets the record security for new records? If it is set for all users, there is an increased risk of records being locked when they perhaps shouldn't be.

    One thing we have tried to do is to add fields to modules allowing records to be tagged with a Discipline jurisdiction, or ownership. This allows any user thinking of utilising this record as an attachment to one of their own records to see which department was responsible for the original data entry.
    If it was from another department, they should avoid editing the record if it doesn't quite suit their purpose. Data security can be aided by adding query and insert defaults to these Discipline fields to ensure users are restricted to certain records via registry entries; however, this is not always desirable as we sometimes want to share records. Query defaults are easily removed if you know how, and insert defaults can be overwritten, so this isn't really an effective security measure.

    We have also embarked on a major re-design of our registry groups, partially as a means to address the issue of record security in a shared environment, and partially to address the other major criticism of EMu, that it appears too complex and intimidating to new users. We are designing registry groups for each discipline in which all redundant modules and tabs (i.e. ones they don't use) are hidden, and insert tab orders are set to maximise efficiency of data entry. Within each discipline, we are also expanding the number of categories of user group, and allowing a less abrupt difference in permissions between category levels. This is still very much at the testing and refining phase. Other measures we are implementing to ensure data security and quality include AM-specific task-based user protocols (e.g. how to process a loans record).

    Possibly the greatest impacts on AM collection management practices to date have been the lack of a bulk import facility, and the difficulties in downloading large data sets. There is a variety of situations in which a large number of data records needs to be entered into the database in a short time. Entry of site and collection event data on return from a field trip is a good example, as this information is often needed to create locality labels to attach to specimens. We have only just received and begun testing V3.2 with the new import facility, so it is too early to say how great an advantage to us it will be. One potential issue of concern is that this again is, or should be, a senior administrative privilege, so there is a balance that needs to be struck between a large potential demand for this service, and the requirement to make it available to only a very limited number of people.

    Similarly, there are some problems with the everyday need to provide data for scientific research purposes or web servicing. For moderately large numbers of records and a large number of fields, a spreadsheet is the preferred form of data delivery. However, the Excel report facility in EMu is severely limited in this regard. Currently we are investigating XSLT, but this is a new feature for us and we have limited available experience. Web provision requires very large data sets to be dumped at regular intervals; we are in the process of having backend scripts developed to do this for us.

    The global edit facility currently only allows you to replace a value or set of values in a field with another value in that same field. It is not an uncommon editing problem to want to move a set of values from one field to another, e.g. wrongly mapped legacy data, or data entry into a wrong field that was not detected for some time.
    There is no easy way to do this currently other than through the back end.

    The current multi-module method of conducting searches is extremely slow. A search such as "all the specimens of a particular genus collected in the vicinity of Lord Howe Island between a depth of 5 and 10 m" requires utilising the Catalogue, Sites and Collection Events modules, and must be conducted from the Catalogue module. The ability to query for Catalogue objects from the Collection Events module, for example, could potentially make these sorts of queries faster and easier to run.
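    The Lord Howe Island search above can be sketched in Python to show what querying "from the Collection Events side" would mean. This is a toy model under stated assumptions, not EMu's query engine: the three modules are represented as plain dictionaries and lists, and every field name, registration number and record shown is hypothetical.

```python
from collections import defaultdict

# Hypothetical, heavily simplified module contents.
sites = {
    1: {"locality": "Lord Howe Island"},
    2: {"locality": "Sydney Harbour"},
}
events = {
    10: {"site": 1, "depth_m": 7},
    11: {"site": 1, "depth_m": 30},
    12: {"site": 2, "depth_m": 6},
}
catalogue = [
    {"reg": "A.1", "genus": "Conus", "event": 10},
    {"reg": "A.2", "genus": "Conus", "event": 11},
    {"reg": "A.3", "genus": "Cypraea", "event": 10},
    {"reg": "A.4", "genus": "Conus", "event": 12},
]


def objects_by_event(catalogue):
    """Build a reverse index: collection event id -> attached objects."""
    index = defaultdict(list)
    for obj in catalogue:
        index[obj["event"]].append(obj)
    return index


def search(genus, locality, min_depth, max_depth):
    """Filter events by site and depth first, then follow the reverse
    index to the attached Catalogue objects and filter by genus."""
    index = objects_by_event(catalogue)
    hits = []
    for event_id, event in events.items():
        if sites[event["site"]]["locality"] != locality:
            continue
        if not (min_depth <= event["depth_m"] <= max_depth):
            continue
        hits.extend(o["reg"] for o in index[event_id] if o["genus"] == genus)
    return hits


result = search("Conus", "Lord Howe Island", 5, 10)
print(result)
```

    The reverse index plays the role of the requested Objects field: it lets the search start from the few matching collection events instead of scanning every Catalogue record.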