Sharing of Accession-based Botanical Information
Reduction of Costs in Herbarium Data-entry in Australia Using HISPID3

Barry J. Conn
Royal Botanic Gardens, Mrs Macquaries Road, Sydney NSW 2000, Australia
barry.conn@rbgsyd.nsw.gov.au

1998

SUMMARY
INTERCHANGE AGREEMENT (1996)
RATIONALE INTERCHANGE OF ELECTRONIC HERBARIUM DATA
AIMS OF ELECTRONIC EXCHANGE PROGRAM
ORGANISATIONAL CONTEXT
CURRENT SITUATION
COST OF DATA ENTRY
HOW TO REDUCE DATA-ENTRY COSTS NATIONALLY
OPTIMUM NUMBER OF REPLICATES
MONITORING OF IMPLEMENTATION PLAN
CONCLUSION


Summary

Botanic gardens, together with their frequently associated herbaria, maintain large amounts of botanical information. Increasingly, this information is held in a computerised format. However, these data are rarely shared between similar botanical organisations; rather, they are often inadequately duplicated throughout the botanical community. This presentation discusses the costs of not sharing this information, both to the institutions involved and to the general community.

Since October 1996, the National Herbarium of New South Wales, as part of the Royal Botanic Gardens, Sydney, has been in the process of incorporating Total Quality Management (TQM) principles and philosophies into its organisational structure. Senior management and staff are currently reviewing the organisation’s corporate plans and strategies according to TQM. At this stage, facilitators have been selected, but are as yet untrained, and no quality improvement implementation plans have been developed. The management of all other Australian herbaria is based on more traditional management principles. Therefore, all members of CHAH are largely unfamiliar with quality management processes.

HISPID3 is an Australian data-transfer standard that defines how herbarium botanical information should be arranged for electronic interchange. In October 1996, this standard was submitted to the Taxonomic Databases Working Group (TDWG), meeting in Toronto, Canada, for ratification as an international interchange standard for all herbaria. The latter Group is a member of the Biological Union of the IUCN, Genève.

Note: all amounts quoted in this article are in Australian dollars.

Conn, B.J. (ed.) (1996). HISPID3 - Herbarium Information Standards and Protocols for Interchange of Data, Version 3. Royal Botanic Gardens, Sydney. [Internet URL - http://www.rbgsyd.gov.au/HISCOM].

INTERCHANGE AGREEMENT (1996)

The agreement by the Council of Heads of Australian Herbaria (CHAH), at their general meeting in Darwin (1996), to freely interchange electronic herbarium label data between all major Australian herbaria represents a significant national commitment to the sharing of botanical data. The curatorial policies of all major Australian herbaria accept, and expect, that electronic data will be provided to other Australian institutions as part of the specimen exchange program. This study reviews the first five months of the application of HISPID3 (Herbarium Information Standards and Protocols for Interchange of Data, Version 3, 1996), the standard format used for the interchange of electronic herbarium label data.

RATIONALE INTERCHANGE OF ELECTRONIC HERBARIUM DATA

The rationale for sharing electronic herbarium data between Australian herbaria is that data entry consumes considerable personnel and financial resources, as well as time. The need to share electronic data rests on the assumption that herbaria want a ‘representative’ collection of specimens for a region and/or taxon.

AIMS OF ELECTRONIC EXCHANGE PROGRAM

Overall Aims
The overall aims of the implementation of an electronic exchange program, using HISPID3, are:

Specific Aim
The specific challenge (aim) for senior management of all Australian herbaria is

ORGANISATIONAL CONTEXT

There are eight major herbaria in Australia, one in each capital city. The managers of these institutions are members of the Council of Heads of Australian Herbaria (CHAH). The ‘Darwin 1996’ commitment by CHAH requires a paradigm shift from the frequently political, State-defined philosophies and strategies of the past to a more collective and cooperative national approach. Like all organisations in the public and private sectors, herbaria are being asked to ‘do more with less’. Therefore, it is opportune for them to minimise demands on shrinking resources while providing an expanded level of accession-based botanical information to their stakeholders. The interchange of herbarium data, using HISPID3, represents an important first step in the implementation of quality management principles and philosophy across the Australian herbarium community.

CURRENT SITUATION

The National Herbarium of New South Wales
The specific details presented here are based on statistics gathered from the National Herbarium of New South Wales (NSW), Royal Botanic Gardens, Sydney.

Current Specifications

Current Statistics

Figure 1. Average number of replicates collected by NSW staff as a percentage of the total number of collections.

Analysis Of Data

Figure 2. Total number of samples (collections plus replicates) per number of replicates in each collection.


COST OF DATA ENTRY

The average cost of data-entering a specimen, taken in this article to be $5.00, is based on the considerable data-entry projects funded through the Environmental Resources Information Network (ERIN) during the 1980s.

This figure allows for the many daily activities, besides data-processing, that data-entry staff have to deal with.

Therefore, the collections databased by NSW staff from 1992 until the end of 1996 (180,943 collections) represent a financial commitment of over $180,000 per year for this period. However, 41,705 of these collections had been received as donations, many of which had already been databased at the originating herbarium. Therefore, of the previous amount, NSW committed approximately $41,000 per year to duplicating data held at the originating or other herbaria.
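The yearly figures above can be reproduced with a short calculation. The following is a minimal sketch only, assuming the average data-entry cost of $5.00 per specimen quoted elsewhere in this article and a five-year period (1992 to 1996 inclusive); the function and constant names are illustrative and are not part of HISPID3.

```python
# Figures quoted in the text (Australian dollars).
COST_PER_SPECIMEN = 5.00         # assumed average direct data-entry cost
COLLECTIONS_DATABASED = 180_943  # NSW collections databased, 1992 to end of 1996
DONATED_COLLECTIONS = 41_705     # of these, received as donations
YEARS = 5                        # 1992, 1993, 1994, 1995 and 1996


def annual_cost(collections: int,
                unit_cost: float = COST_PER_SPECIMEN,
                years: int = YEARS) -> float:
    """Average yearly data-entry cost for a number of collections."""
    return collections * unit_cost / years


total_per_year = annual_cost(COLLECTIONS_DATABASED)
duplicated_per_year = annual_cost(DONATED_COLLECTIONS)

print(f"All collections:     ${total_per_year:,.0f} per year")
print(f"Donated collections: ${duplicated_per_year:,.0f} per year")
```

Because only some of the donated collections had already been databased elsewhere, the second amount is best read as an upper estimate of the yearly cost of duplication.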

Figure 3. The factors contributing to the costs of data-processing the herbarium label information.

Factors Contributing to Data Entry Costs

There are several factors (causes) that contribute to the final cost of data-processing herbarium label information (refer to Fig. 3). A significant investment of resources is required to establish an appropriate computer system and database. Associated with these expenses, Information Technology personnel, either as staff or consultants, are required to develop and maintain the computer system. Resources are directly required for IT staff training or indirectly in the form of equivalent consultant expertise. Likewise, resources are required to ensure data-entry staff have adequate skills. The completeness, and hence the standard, of the curation of herbarium specimens also affects the amount of time spent completing or adding herbarium label information (e.g. adding latitude and longitude geocodes for spatial data, or deciphering handwriting or abbreviations on the label).

However, on a national or global scale, the ‘Cost of Data Entry’ is only one of the factors contributing to the ‘Overall Cost of Herbarium Label Data Capture’ for any institution. The herbarium label information can either be entered directly by institutional data-processors (as assumed in Figure 3, above) or be provided electronically by donating institutions (Fig. 4).

HOW TO REDUCE DATA-ENTRY COSTS NATIONALLY

Aim
The specific aim of this proposal is:

to reduce the cost of data entry by donating electronic herbarium label data between all major Australian herbaria, as part of the normal herbarium exchange program, with implementation of the methodology by the end of 1997.


Assessment of Current Practice
Approximately 300,000 collections have been databased at NSW, representing a cost of about $1.5 million just for data entry, that is, approximately $150,000 per annum over 10 years. Although many of these collections are replicates received from other institutions, a significant amount of the herbarium label information of this material is not held in an electronic form elsewhere. However, modern Australian material exchanged between herbaria is almost always held in an electronic form by the donating institution. Therefore, this form of the data represents an important opportunity for Australian herbaria to reduce the national costs of electronic data capture.

Figure 4. The factors contributing to the ‘Overall Cost of Herbarium Label Data Capture’.

Cost of Incorporating Electronic Data into an Herbarium Database

Apart from hardware and software expenses specific to the database, there is an initial expense in developing software to manipulate the information held in a database so that it is compatible with the HISPID3 interchange standard. Likewise, software is required to transfer incoming HISPID3-formatted data ‘back’ into a form that can be incorporated into the institution’s accession database. Putting NSW in a position to exchange electronic data, more or less according to the HISPID3 format, has cost approximately $4,500, representing approximately one month of development time by institutional IT staff.

In recent years, each Australian herbarium has received approximately 3,500 replicate collections per annum and donated a similar number, as part of the herbarium exchange program. Although the cost of direct data entry was known, the actual cost of incorporating similar, but electronic, data transferred from one institution to another had not been calculated. Herbarium data transferred from the National Herbarium of Victoria (MEL) (in HISPID3 format), and from the Australian National Herbarium (CANB) and the Northern Territory Herbarium (DNA) (both converted by NSW to HISPID3 format), established an average cost of $0.60 per specimen.

Reducing the National Cost of Data-Capturing Herbarium Label Data

Figure 5 illustrates that directly data-processing the original collection plus each of its replicates (in different institutions) is significantly more costly than data-entering the information once and then electronically interchanging it with the replicate material. For example, the average cost of data-processing a unicate collection is $5.00, whereas a collection with one replicate costs $10.00 (compared to $5.60, that is $5.00 + $0.60, when the electronic data are shared with the recipient herbarium); a collection with two replicates costs $15.00 (compared to $6.20); and so on.
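The comparison in Figure 5 follows a simple linear pattern, which can be sketched as below. This is an illustrative calculation only, using the $5.00 direct-entry cost and the $0.60 electronic-transfer cost quoted in the text; the function names are hypothetical.

```python
DIRECT_ENTRY_COST = 5.00  # AUD, keying one specimen's label data by hand
TRANSFER_COST = 0.60      # AUD, incorporating one electronically received record


def direct_cost(replicates: int) -> float:
    """Every institution keys the label data itself (original plus replicates)."""
    return DIRECT_ENTRY_COST * (1 + replicates)


def shared_cost(replicates: int) -> float:
    """Data are keyed once, then transferred electronically to each recipient."""
    return DIRECT_ENTRY_COST + TRANSFER_COST * replicates


# Collections with none to five replicates, as in Figure 5.
for n in range(6):
    print(f"{n} replicate(s): direct ${direct_cost(n):.2f}, "
          f"shared ${shared_cost(n):.2f}")
```

The direct cost grows by $5.00 for each additional replicate, whereas the shared cost grows by only $0.60, so the gap widens with every replicate distributed.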

Cost-Benefit Analysis

If it is assumed that an organisation is committed to electronically storing the herbarium label data of its botanical accessions, then it is relatively easy to estimate the more significant financial costs of developing and maintaining an electronic interchange system. As part of this analysis, it is assumed that the computer hardware, software and IT staff are already such that the interchange of data is possible. Therefore, these costs are excluded.

The Estimated Costs are as follows:

Initial (One-Off) Costs

Development of software to manipulate the exchange data into HISPID3 format and to translate it from HISPID3 format for storage in the database: $4,500.00

Recurring Costs

Based on the following four assumptions:

1. Cost of selecting replicate electronic records from database: $5.00

[Note: this assumption is based on the selection of records consisting of 3 replicates, at an hourly rate of $20.00. Since it is assumed that it takes approximately 1 hour to select a standard-size exchange set of records, it is effectively cheaper to select records with more replicates than those with fewer ones.]

2. Cost of semi-automatic conversion of data (from database) into HISPID3 format (including verification of standard of resulting HISPID3 files): $10.00

[Note: it is here assumed that it takes half an hour to convert each exchange file, including random checks of each to ensure it satisfies the HISPID3 standard. It is also assumed that each file is interchanged using email.]

3. Cost of semi-automatic conversion of data (from exchange HISPID3 file) into format for inclusion into NSW database: $10.00

[Note: the assumptions of point 2 are also relevant here.]

4. Each exchange file consists of an average of 50 specimens. [Based on costs of assumptions 1 and 2 (or 3) per 50 specimens.]

Therefore:
Cost per electronically transferred record (by email): $0.30
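The per-record figure follows from the assumptions listed above. A minimal sketch of the arithmetic is given here, assuming (as assumption 4 indicates) that the selection cost and one conversion cost are spread over an exchange file of 50 specimens; the variable names are illustrative only.

```python
HOURLY_RATE = 20.00      # AUD per hour, from the note to assumption 1
SELECTION_COST = 5.00    # AUD per exchange file, assumption 1 (taken as given)
CONVERSION_HOURS = 0.5   # half an hour per exchange file, assumptions 2 and 3
SPECIMENS_PER_FILE = 50  # assumption 4

# Half an hour at the hourly rate reproduces the $10.00 conversion cost.
conversion_cost = CONVERSION_HOURS * HOURLY_RATE

# Selection plus one conversion (to or from HISPID3), spread over one file.
cost_per_record = (SELECTION_COST + conversion_cost) / SPECIMENS_PER_FILE

print(f"Conversion cost per file: ${conversion_cost:.2f}")
print(f"Cost per transferred record: ${cost_per_record:.2f}")
```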

Figure 5: The average cost of directly data-processing the herbarium label data of one collection with none, 1, 2, 3, 4 or 5 replicates, compared to the cost of capturing that data by electronically sharing this data.

[Note: this is based on the assumption that it costs approximately $0.60 to capture electronically transferred data into a database, by the data-processor plus $0.30 to convert the HISPID3 file, for each record, into a suitable format for inclusion in the institutional database.]

OPTIMUM NUMBER OF REPLICATES

Although it is clear that fewer resources are required when the electronic herbarium label data of replicates are shared, is there an optimum number of replicates (per collection) that minimises the costs without placing undue demands on other aspects of the exchange program? The difference between the cost of data-processing unicate collections and those with replicates, where the latter information is transferred electronically, is illustrated in Figure 5 (above). However, a large number of replicates adds to the costs of processing (including pressing, drying, pest-control and dispatch). In Figure 6, the cost of capturing the information of replicates by electronic transfer is expressed as a percentage of the cost of directly data-processing each replicate. It can be seen that one replicate reduces the cost of capturing the data of both the original (primary) specimen and its replicate to just over half (56%) of the direct cost. If two replicates are collected (three samples in total), then the cost is reduced to about 40%, whereas three replicates reduce the overall cost to 34%. To make further significant savings, it is necessary to collect many more replicates. For example, to reduce the costs to a quarter, it is necessary to collect 6 replicates (25.1%).

Therefore:

Figure 6: The % cost of capturing the information of replicates by electronic interchange, compared to the costs of directly data-processing each replicate.
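The percentages plotted in Figure 6 can be approximated by dividing the cost of shared data capture by the cost of direct data entry. The sketch below assumes the $5.00 and $0.60 unit costs quoted earlier in this article and shows the first three cases discussed in the text; the function name is hypothetical.

```python
DIRECT_ENTRY_COST = 5.00  # AUD per specimen keyed by hand
TRANSFER_COST = 0.60      # AUD per electronically received record


def relative_cost(replicates: int) -> float:
    """Shared-capture cost as a percentage of the direct-entry cost."""
    shared = DIRECT_ENTRY_COST + TRANSFER_COST * replicates
    direct = DIRECT_ENTRY_COST * (1 + replicates)
    return 100.0 * shared / direct


# One, two and three replicates, as discussed in the text.
for n in (1, 2, 3):
    print(f"{n} replicate(s): {relative_cost(n):.1f}% of the direct cost")
```

Each additional replicate lowers the percentage by a smaller amount than the one before it, which is why many more replicates are needed for further significant savings.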

Although the mean number of replicates can be increased by each plant collector including more replicates in each collection, an excessive increase in the amount of additional material would significantly add to herbarium processing costs. Collectors already collect a range of replicates. Therefore, a significant increase in this number would be required before a noticeable change in the mean occurred. A more effective method of increasing the overall mean number of replicates is to collect fewer unicates, rather than just to collect more replicates.

MONITORING OF IMPLEMENTATION PLAN

Assessing the effectiveness of an electronic exchange program requires the ‘Collection-Replicate’ profile of the organisation to be monitored, that is, the proportion of collections having each number of replicates (similar to that illustrated in Fig. 1). Furthermore, a similar chart of the profile of each individual collector is necessary to assess the ‘progress’ (quality) of their collection-replicate ratio.
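Such a profile could be tabulated from accession records along the following lines. This is a sketch only: the sample data are hypothetical and do not come from the NSW statistics, and the function names are illustrative rather than part of any herbarium database.

```python
from collections import Counter


def replicate_profile(replicates_per_collection):
    """Percentage of collections having each number of replicates."""
    if not replicates_per_collection:
        raise ValueError("at least one collection is required")
    counts = Counter(replicates_per_collection)
    total = len(replicates_per_collection)
    return {n: 100.0 * counts[n] / total for n in sorted(counts)}


def mean_replicates(replicates_per_collection):
    """Mean number of replicates per collection."""
    return sum(replicates_per_collection) / len(replicates_per_collection)


# Hypothetical collector: ten collections, four of them unicates (0 replicates).
sample = [0, 0, 0, 0, 1, 1, 2, 2, 3, 5]

profile = replicate_profile(sample)
for n, percent in profile.items():
    print(f"{n} replicate(s): {percent:.0f}% of collections")
print(f"Mean replicates per collection: {mean_replicates(sample):.1f}")
```

The same two functions can be applied to the records of the whole organisation or to those of a single collector, so that the profiles can be compared over time.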

CONCLUSION

There are several challenges remaining before continuous improvement of quality in the current herbarium specimen exchange program will be realised.

On average, 3,500 specimens are exchanged by each Australian herbarium per annum. Therefore, the electronic sharing of herbarium label data represents a saving (benefit) of approximately $15,000 per institution per annum, when compared with current practices.
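The saving quoted above is consistent with the unit costs used throughout this article. As a rough check, assuming $5.00 for direct data entry and $0.60 for incorporating an electronically transferred record (the function name is illustrative):

```python
DIRECT_ENTRY_COST = 5.00            # AUD per specimen keyed by hand
TRANSFER_COST = 0.60                # AUD per electronically received record
SPECIMENS_EXCHANGED_PER_YEAR = 3_500


def annual_saving(specimens: int,
                  direct: float = DIRECT_ENTRY_COST,
                  transfer: float = TRANSFER_COST) -> float:
    """Yearly saving when received records are transferred instead of re-keyed."""
    return specimens * (direct - transfer)


saving = annual_saving(SPECIMENS_EXCHANGED_PER_YEAR)
print(f"Estimated saving per institution: ${saving:,.0f} per year")
```

This gives $15,400 per institution per year, which the text rounds to approximately $15,000.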

A national reduction in the ‘Overall Cost of Herbarium Label Data Capture’ can be achieved by: