Sharing of Accession-based Botanical Information
Reduction of Costs in Herbarium Data-entry in Australia Using HISPID3
Barry J. Conn
Royal Botanic Gardens, Mrs Macquaries Road, Sydney NSW 2000, Australia
barry.conn@rbgsyd.nsw.gov.au
1998
Botanic Gardens, together with their frequently associated herbaria, maintain large amounts of botanical information. Increasingly, this information is held in a computerised format. However, these data are rarely shared between similar botanical organisations; rather, they are often inadequately duplicated throughout the botanical community. This presentation discusses the costs of not sharing this information, both to the institutions involved and to the general community.
Since October 1996, the National Herbarium of New South Wales, as part of the Royal Botanic Gardens, Sydney, has begun incorporating Total Quality Management (TQM) principles and philosophies into its organisational structure. Senior management and staff are currently reviewing the organisation's corporate plans and strategies according to TQM. At this stage, facilitators have been selected but are as yet untrained, and no quality improvement implementation plans have been developed. The management of all other Australian herbaria is based on more traditional management principles. Therefore, all members of the Council of Heads of Australian Herbaria (CHAH) are largely unfamiliar with quality management processes.
HISPID3 is an Australian data-transfer standard that defines how herbarium botanical information should be arranged for electronic interchange. In October 1996, this standard was submitted in Toronto, Canada, to the Taxonomic Database Working Group (TDWG) for ratification as an international interchange standard for all herbaria. The latter Group is a member of the Biological Union of the IUCN, Genève.
Note: the currency quoted in this article is in Australian dollars.
Conn, B.J. (ed.) (1996). HISPID3 - Herbarium Information Standards and Protocols for Interchange of Data, Version 3. Royal Botanic Gardens, Sydney. [Internet URL - http://www.rbgsyd.gov.au/HISCOM].
The agreement by the Council of Heads of Australian Herbaria (CHAH), at their general meeting in Darwin (1996), to freely interchange electronic herbarium label data between all major Australian herbaria represents a significant national commitment to the sharing of botanical data. The curatorial policies of all major Australian herbaria accept and expect electronic data to be provided to other Australian institutions as part of the specimen exchange program. This study reviews the first five months of the application of HISPID3 (Herbarium Information Standards and Protocols for Interchange of Data, Version 3, 1996), the standard format used for the interchange of electronic herbarium label data.
The rationale for sharing electronic herbarium data between Australian herbaria is that
data entry consumes considerable personnel and financial resources, as well as time. The
relevance of the need for sharing electronic data is based on the assumption that herbaria
want a representative collection of specimens for a region and/or taxa.
Overall Aims
The overall aims of the implementation of an electronic exchange program, using HISPID3, are:
Specific Aim
The specific challenge (aim) for senior management of all Australian herbaria is
There are eight major herbaria in Australia, one in each capital city. The managers of these institutions are members of the Council of Heads of Australian Herbaria (CHAH). The Darwin 1996 commitment by CHAH requires a paradigm shift from the State-defined, and frequently political, philosophies and strategies of the past to a more collective and cooperative national approach. Like all organisations in the public and private sectors, herbaria are being asked to do more with less. Therefore, it is opportune for them to minimise demands on diminishing resources while providing an expanded level of accession-based botanical information to their stakeholders. The interchange of herbarium data, using HISPID3, represents an important first step in the implementation of quality management principles and philosophy across the Australian herbarium community.
The National Herbarium of New South Wales
The specific details presented here are based on statistics gathered from the National Herbarium of New South Wales (NSW), Royal Botanic Gardens, Sydney.
Current Specifications
Current Statistics

Figure 1. Average number of replicates collected by NSW staff as a percentage of the total number of collections.
Analysis Of Data

Figure 2. Total number of samples (collections plus replicates) per number of replicates in each collection.
The average cost of data-entering a specimen is based on the considerable data entry projects funded through the Environment Resources Inventory Network (ERIN) during the 1980s.
This figure allows for the many daily activities, besides data-processing, that data entry staff have to deal with.
Therefore, the collections databased by NSW staff from 1992 until the end of 1996 (180,943 collections) represent a financial commitment of over $180,000 per year for this period. However, 41,705 of these collections had been received as donations, many of which had already been databased at the originating herbarium. Therefore, of the previous amount, NSW committed approximately $41,000 per year to duplicating data held at the originating or other herbaria.
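The annual figures above can be checked with a short calculation. This is a minimal sketch, assuming the average data-entry cost of $5.00 per specimen quoted later in this paper and treating 1992 to the end of 1996 as five years; the function name `annual_cost` is illustrative only.

```python
# Average cost of keying one specimen label (AUD), as quoted later in this paper.
COST_PER_SPECIMEN = 5.00


def annual_cost(collections: int, years: int,
                unit_cost: float = COST_PER_SPECIMEN) -> float:
    """Average yearly data-entry cost for `collections` records keyed over `years` years."""
    return collections * unit_cost / years


# Figures from the text; 1992 to the end of 1996 is taken here as five years.
total_per_year = annual_cost(180_943, 5)
donated_per_year = annual_cost(41_705, 5)

print(f"All collections:     ${total_per_year:,.0f} per year")
print(f"Donated collections: ${donated_per_year:,.0f} per year")
```

Under these assumptions the donated material accounts for a little under a quarter of the yearly data-entry commitment.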

Figure 3. The factors contributing to the costs of data-processing the herbarium label information.
Factors Contributing to Data Entry Costs
There are several factors (causes) that contribute to the final cost of data-processing herbarium label information (see Fig. 3). A significant investment of resources is required to establish an appropriate computer system and database. Associated with these expenses, Information Technology personnel, either as staff or consultants, are required to develop and maintain the computer system. Resources are required directly for IT staff training or indirectly in the form of equivalent consultant expertise. Likewise, resources are required to ensure data-entry staff have adequate skills. The completeness, and hence standard, of the curation of herbarium specimens also affects the amount of time spent completing or adding herbarium label information (e.g. adding latitude and longitude geocodes for spatial data, or deciphering handwriting or abbreviations on the label).
However, on a national or global scale, the Cost of Data Entry is only one of the factors contributing to the Overall Costs of Herbarium Label Data Capture for any institution. The herbarium label information can either be entered directly by institutional data-processors (as assumed in Figure 3, above) or be provided electronically by donating institutions (Fig. 4).
Aim
The specific aim of this proposal is:
To reduce the cost of data entry by donating electronic herbarium label data between all major Australian herbaria, as part of the normal herbarium exchange program, with implementation of the methodology by the end of 1997.
Assessment of Current Practice
Approximately 300,000 collections have been databased at NSW, representing a cost of about $1.5 million just for data entry, that is, approximately $150,000 per annum over 10 years. Although many of these collections are replicates received from other institutions, the herbarium label information for a significant amount of this material is not held in an electronic form elsewhere. However, modern Australian material exchanged between herbaria is almost always held in an electronic form by the donating institution. Therefore, these data represent an important opportunity for Australian herbaria to reduce the national costs of electronic data capture.

Figure 4. The factors contributing to the Overall Cost of Herbarium Label Data Capture.
Cost of Incorporating Electronic Data into an Herbarium Database
Initial Costs
Apart from hardware and software expenses specific to the database, there is an initial expense in developing software to manipulate the information held in a database so that it is compatible with the HISPID3 interchange standard. Likewise, software is required to transfer incoming HISPID3-formatted data back into a form that can be incorporated into the institution's accession database. Placing NSW in a position to exchange electronic data, more or less according to HISPID3 format, has cost approximately $4,500, representing approximately one month of development time by institutional IT staff.
In recent years, each Australian herbarium has received approximately 3,500 replicate collections per annum, and donated a similar number, as part of the herbarium exchange program. Although the cost of direct data entry was known, the actual cost of incorporating similar, but electronic, data transferred from one institution to another had not been calculated. Herbarium data transferred from the National Herbarium of Victoria (MEL) (in HISPID3 format), and from the Australian National Herbarium (CANB) and the Northern Territory Herbarium (DNA) (both converted by NSW to HISPID3 format), established an average cost of $0.60 per specimen.
Reducing the National Cost of Data-Capturing Herbarium Label Data
Figure 5 illustrates that directly data-processing the original collection plus each of the replicates (in different institutions) is significantly more costly than data-entering the information once and then electronically interchanging this information with the replicate material. For example, the average cost of data-processing a unicate collection is $5.00, whereas a collection with one replicate costs $10.00, compared to $5.60 ($5.00 + $0.60) when the electronic data are shared with the recipient herbarium. One collection with two replicates costs $15.00 (compared to $6.20), and so on.
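The comparison described above can be written as two simple formulas. The following is a minimal sketch using the $5.00 and $0.60 figures quoted in the text; the function names `direct_cost` and `shared_cost` are illustrative and are not part of HISPID3.

```python
DIRECT_COST = 5.00    # AUD to key one specimen label by hand (figure from the text)
TRANSFER_COST = 0.60  # AUD to load one electronically received record (figure from the text)


def direct_cost(replicates: int) -> float:
    """Every herbarium keys its own copy: the original plus each replicate."""
    return DIRECT_COST * (1 + replicates)


def shared_cost(replicates: int) -> float:
    """The label is keyed once; each replicate's record is transferred electronically."""
    return DIRECT_COST + TRANSFER_COST * replicates


# Compare the two approaches for a collection with 0 to 5 replicates.
for n in range(6):
    print(f"{n} replicates: direct ${direct_cost(n):.2f}, shared ${shared_cost(n):.2f}")
```

The direct cost grows by $5.00 for each additional replicate, while the shared cost grows by only $0.60, so the gap widens with every replicate distributed.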
Cost-Benefit Analysis
If it is assumed that an organisation is committed to electronically storing the herbarium label data of its botanical accessions, then it is relatively easy to estimate the more significant financial costs of developing and maintaining an electronic interchange system. As part of this analysis, it is assumed that the existing computer hardware, software and IT staff are already capable of supporting the interchange of data. Therefore, these costs are excluded.
The Estimated Costs are as follows:
Initial (One-Off) Costs -
| Development of software to manipulate the exchange data into HISPID3 format and to translate it from HISPID3 format for storage in the database | $4,500.00 |
Recurring Costs -
Based on the following four assumptions:
| 1. | Cost of selecting replicate electronic records from database | $ 5.00 |
[Note: this assumption is based on the selection of records consisting of 3 replicates, at an hourly rate of $20.00. Since it is assumed that it takes approximately 1 hour to select a standard-size exchange set of records, it is effectively cheaper to select records with more replicates than those with fewer ones.]
| 2. | Cost of semi-automatic conversion of data (from database) into HISPID3 format (including verification of standard of resulting HISPID3 files) | $ 10.00 |
[Note: it is here assumed that it takes half an hour to convert each exchange file, including random checks of each to ensure it satisfies the HISPID3 standard. It also assumes that each file is interchanged using email.]
| 3. | Cost of semi-automatic conversion of data (from exchange HISPID3 file) into format for inclusion into NSW database | $ 10.00 |
[Note: the assumptions of point 2 are also relevant here.]
| 4. | Each exchange file consists of an average of 50 specimens. [Based on costs of assumptions 1 and 2 (or 3) per 50 specimens.] Therefore: | $ 0.30 |

Figure 5: The average cost of directly data-processing the herbarium label data of one collection with none, 1, 2, 3, 4 or 5 replicates, compared to the cost of capturing that data by electronically sharing this data.
| Cost of each electronically received record (by email) | $ 0.90 |
[Note: this is based on the assumption that it costs approximately $0.60 to capture
electronically transferred data into a database, by the data-processor plus $0.30 to
convert the HISPID3 file, for each record, into a suitable format for inclusion in the
institutional database.]
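The arithmetic behind the $0.30 and $0.90 figures can be reproduced as follows. This is a minimal sketch that simply combines the four assumptions listed above with the measured $0.60 loading cost; the variable names are illustrative.

```python
SELECT_COST = 5.00        # assumption 1: selecting the records for one exchange file (AUD)
CONVERT_COST = 10.00      # assumption 2 (or 3): converting one exchange file (AUD)
SPECIMENS_PER_FILE = 50   # assumption 4: average number of specimens per exchange file
LOAD_COST = 0.60          # measured cost of loading one received record (AUD)

# File-level costs are spread evenly across the records in the file.
handling_per_record = (SELECT_COST + CONVERT_COST) / SPECIMENS_PER_FILE

# Total cost of each electronically received record.
received_record_cost = LOAD_COST + handling_per_record

print(f"Handling per record:    ${handling_per_record:.2f}")
print(f"Cost per received record: ${received_record_cost:.2f}")
```

Because the file-level costs are fixed, larger exchange files would lower the per-record handling cost under the same assumptions.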
Optimum Number of Replicates
Although it is clear that it requires fewer resources to share electronic herbarium label data of replicates, is there an optimum number of replicates (per collection) that minimises the costs without placing undue demands on other aspects of the exchange program? The difference between the cost of data-processing unicate collections and those with replicates, where the latter information is transferred electronically, is illustrated in Figure 5 (above). However, a large number of replicates adds to the costs of processing (including pressing, drying, pest-control and dispatch). In Figure 6, the cost of capturing the information of replicates by electronic transfer is shown as a percentage of the cost of directly data-processing each replicate. It can be seen that one replicate reduces the cost of capturing the data of both the original (primary) specimen and its replicate by almost half (to 56% of the direct data-entry cost). If two replicates are collected (three samples in total), the cost is reduced to about 40%, whereas three replicates reduce the overall cost to 34%. To make further significant savings, it is necessary to collect many more replicates. For example, to reduce the costs to a quarter, it is necessary to collect 6 replicates (25.1%).
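The diminishing return from additional replicates can be explored with a short calculation. This is a minimal sketch, assuming the $5.00 direct cost and the $0.60 per-record transfer cost quoted earlier; the exact percentages depend on the per-record figure assumed, so the output may differ slightly from the values quoted above. The function names are illustrative.

```python
from typing import Optional


def relative_cost(replicates: int, direct: float = 5.00, transfer: float = 0.60) -> float:
    """Shared-data cost as a percentage of keying every sample directly."""
    shared = direct + transfer * replicates   # key once, transfer the rest
    keyed = direct * (1 + replicates)         # key the original and every replicate
    return 100 * shared / keyed


def replicates_needed(target_percent: float, max_replicates: int = 50) -> Optional[int]:
    """Smallest number of replicates whose relative cost is at or below the target."""
    for n in range(1, max_replicates + 1):
        if relative_cost(n) <= target_percent:
            return n
    return None  # target not reachable within max_replicates


for n in range(1, 7):
    print(f"{n} replicates: {relative_cost(n):.1f}% of direct cost")
print("Replicates needed to reach 25%:", replicates_needed(25))
```

Under these assumptions the percentage can never fall below the ratio of the transfer cost to the direct cost (0.60 / 5.00, or 12%), which is why each further replicate yields a smaller saving.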
Therefore:

Figure 6: The % cost of capturing the information of replicates by electronic interchange, compared to the costs of directly data-processing each replicate.
Although the mean number of replicates can be increased by each plant collector
including more replicates in each collection, an excessive increase in the amount of
additional material would significantly add to herbarium processing costs. Collectors
already collect a range of replicates. Therefore, a significant increase in this number
would be required before a noticeable change in the mean occurred. A more effective method
of increasing the overall mean number of replicates is to collect fewer unicates rather
than by just collecting more replicates.
Monitoring of Implementation Plan
Assessing the effectiveness of an electronic exchange program requires the Collection-Replicate profile of the organisation to be monitored, that is, the distribution of collections according to the number of replicates in each collection (similar to that illustrated in Fig. 1). Furthermore, a similar chart of the profile of each individual collector is necessary to assess the progress (quality) of their collection-replicate ratio.
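One way such a profile might be computed from accession records is sketched below. The sample records, collector labels and function names are entirely hypothetical and are used only to show the calculation; they are not NSW data.

```python
from collections import Counter, defaultdict

# Hypothetical sample: (collector, number of replicates in one collection).
records = [("A", 0), ("A", 0), ("A", 2), ("B", 1), ("B", 3), ("B", 3)]


def replicate_profile(counts):
    """Percentage of collections having 0, 1, 2, ... replicates."""
    tally = Counter(counts)
    total = sum(tally.values())
    return {n: 100 * tally[n] / total for n in sorted(tally)}


def mean_replicates(counts):
    """Mean number of replicates per collection."""
    counts = list(counts)
    return sum(counts) / len(counts)


# Organisation-wide profile.
all_counts = [n for _, n in records]
print("Organisation:", replicate_profile(all_counts), "mean", mean_replicates(all_counts))

# Profile of each individual collector.
by_collector = defaultdict(list)
for collector, n in records:
    by_collector[collector].append(n)
for collector, counts in sorted(by_collector.items()):
    print(collector, replicate_profile(counts), "mean", round(mean_replicates(counts), 2))
```

In this hypothetical sample, collector A's mean is pulled down by unicates (collections with no replicates), which is the pattern the monitoring is intended to reveal.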
There are several challenges remaining before continuous improvement of quality in the current herbarium specimen exchange program will be realised.
On average, each Australian herbarium exchanges 3,500 specimens per annum. Therefore, the electronic sharing of herbarium label data represents a saving (benefit) of approximately $15,000 per institution per annum, when compared with current practices.
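The saving can be bracketed using the two per-record figures given earlier. This is a minimal sketch, assuming 3,500 exchanged specimens per year, the $5.00 direct data-entry cost, and either the $0.90 total cost per received record or the $0.60 loading cost alone; the function name `annual_saving` is illustrative.

```python
EXCHANGED_PER_YEAR = 3_500  # specimens exchanged per herbarium per annum (from the text)
DIRECT_COST = 5.00          # AUD to key one specimen label by hand


def annual_saving(received_cost: float,
                  specimens: int = EXCHANGED_PER_YEAR,
                  direct: float = DIRECT_COST) -> float:
    """Yearly saving when received records are loaded electronically instead of keyed."""
    return specimens * (direct - received_cost)


print(f"Saving at $0.90 per record: ${annual_saving(0.90):,.0f}")
print(f"Saving at $0.60 per record: ${annual_saving(0.60):,.0f}")
```

Both results are of the same order as the approximate saving quoted above.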
A National Reduction in the Overall Cost of Herbarium Label Data Capture can be achieved by: