Biodiversity Data Journal 9: e65023
doi: 10.3897/BDJ.9.e65023

The Atlas of Living Australia: History, current state and future directions

Lee Belbin‡, Elycia Wallis§, Donald Hobern‡, Andre Zerger‡

‡ Atlas of Living Australia, CSIRO, Canberra, Australia
§ Atlas of Living Australia, CSIRO, Melbourne, Australia

Corresponding author: Andre Zerger (andre.zerger@csiro.au)
Academic editor: Lyubomir Penev
Received: 25 Feb 2021 | Accepted: 29 Mar 2021 | Published: 21 Apr 2021
Citation: Belbin L, Wallis E, Hobern D, Zerger A (2021) The Atlas of Living Australia: History, current state and future directions. Biodiversity Data Journal 9: e65023. https://doi.org/10.3897/BDJ.9.e65023

Abstract

The Atlas of Living Australia (ALA) is Australia's national biodiversity database, delivering data and related services to more than 80,000 Australian and international users annually. Established under the Australian Government's National Collaborative Research Infrastructure Strategy to provide trusted biodiversity data to support the research sector, its utility now extends to government, higher education, non-government organisations and community groups. These partners provide data to the ALA and leverage its data and related services. The ALA has also played an important leadership role internationally in the biodiversity informatics and infrastructure space, both through its partnership with the Global Biodiversity Information Facility and through support for the international Living Atlases programmes, which have now delivered 24 instances of ALA software to provide sovereign biodiversity data capability around the world. This paper begins with a historical overview of the genesis of the ALA from the collections, museums and herbaria community in Australia. It details the biodiversity and related data and services delivered to users, with a primary focus on species occurrence records, which represent the ALA's primary data type.
Finally, the paper explores the ALA's future directions by referencing results from a recently completed national consultation process.

© Belbin L et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Keywords

Atlas of Living Australia, biodiversity, research data infrastructure, informatics

Introduction

The Atlas of Living Australia (ALA) was established in 2010 by the Australian Government's National Collaborative Research Infrastructure Strategy (NCRIS) to support the needs of the Australian and international research community for comprehensive and timely access to Australian biodiversity data. The ALA now delivers data and related services to more than 80,000 users a year across research, industry, governments and the public. It supports programmes in taxonomy, biodiversity, genomics and ecosystem science, contributes to major natural resource management programmes and supports the international community both as the Australian node of the Global Biodiversity Information Facility (GBIF) and as the code base for the successful international Living Atlases community.

The ALA was established on open-access principles, with data publishers by default using Creative Commons licences and with an open-source code base. This approach has encouraged re-use and maximised the value of data, especially for data that have been funded, produced or collected by public institutions in Australia. As of February 2021, the ALA holds almost 95 million records associated with more than 111,000 species, predominantly from the Australian region.
As a complement to its species data, the ALA also manages a wide range of other categories of data, including information on natural history collections themselves, spatial layers, indigenous ecological knowledge, taxonomic profiles, biodiversity literature, data on biodiversity projects and animal tracking data.

Investment in the ALA and in its partner capabilities (including GBIF and the Living Atlases) has radically enhanced ease of access to biodiversity data. Fundamental to the ALA's business has also been the development of tools and platforms to enable different stakeholders to collect, manage and deliver open biodiversity data. Examples include BioCollect for field-based data collection and management, DigiVol to engage volunteers in the digitisation of analogue data, the Australian node of the Biodiversity Heritage Library, and the ALA's spatial portal and species lists tools. Most recently, the ALA has partnered with the global iNaturalist platform to support citizen scientists in the acquisition and identification of biodiversity observations. Collectively, this portfolio of capability has been fundamental in improving how the ALA captures and utilises biodiversity data.

This paper provides a history of the ALA, including its origins and key drivers, and a description of the data and services it delivers. It concludes with a summary of the findings from recent stakeholder consultations that will inform the ALA's future directions. Although these consultations focused on the ALA, the results offer insights of importance to other national and international biodiversity data infrastructures regarding future trends, stakeholder expectations and limitations around current approaches to delivering biodiversity data and services.

Biodiversity in Australia

As a result of its isolation for around 100 million years and its distinctive environment, Australia's fauna and flora are rich and unique, exhibiting high degrees of endemism. Human influence has led to significant loss and/or transformation of this biodiversity (Woinarski et al. 2019). Environmental challenges and human consumption place unprecedented and increasing stresses on the environment and species. At the global scale, the Intergovernmental Science-Policy Platform on Biodiversity and Ecosystem Services (IPBES 2019) reports rapid losses in biodiversity and ecosystem health, and states that insufficient information exists to monitor and respond to these trends.

Biodiversity researchers and managers commonly face the challenge of delivering and interpreting disparate information to answer the greatest environmental questions facing society. For example:

• Signs point to massive losses in insect numbers and diversity, but what is the actual scale of these losses and what are the implications for ecosystem health? (Braby 2018)
• Australia is home to a number of global biodiversity hotspots, which have preserved unique evolutionary lines through many climatic changes. How are current and expected pressures on these ecosystems likely to modify these areas?
• How can we design effective ecosystem restoration programmes and respond to ecological changes following major disturbances, such as the 2019-2020 bushfire season?

The Australian Government's recent Independent Review of the Environment Protection and Biodiversity Conservation Act Interim Report (Samuel 2020) identified that a ‘quantum shift in the quality of information, accessible data and information available to decision-makers’ is necessary to support future regulatory environmental management programmes.
The ALA has an important role to play in supporting these emerging regulatory policy needs of government.

Historical information on species distributions and population abundances is central to ecology, conservation and to all areas of environmental planning and sustainability. The ALA is highly regarded for the progress it has made over the last 10 years in significantly improving open access to integrated information about species from previously diverse and isolated sources. This impact has been valued by both national and global research communities.

Establishment of the ALA

Australian herbaria and museums have a long history of tackling the issues of data sharing, standards and collaboration for natural history specimens. Standards development for this work in Australia took place in an international context through the involvement of Australian biologists and data scientists in Biodiversity Information Standards (known by the acronym TDWG) from its inception in the mid-1980s (http://old.tdwg.org/about-tdwg/history/). The botanical community first published the HISPID data standard for herbarium specimens in 1989 (http://www.anbg.gov.au/projects/hispid/hispid3.html). In the late 1990s, the peak body for herbaria in the region, the Council of Heads of Australasian Herbaria, formed a consortium with the Australian Biological Resources Study (ABRS, https://www.environment.gov.au/science/abrs), which held responsibility for coordination of taxonomic research. This consortium sought funding to digitise herbarium specimens in all the state and federal herbarium collections and received AUD$10 million for this purpose from the Australian Government, states and territories and private sources. Its focus was on the capture of herbarium specimen data into electronic databases, with the eventual goal of producing an online resource. The result was Australia's Virtual Herbarium (now Australasian Virtual Herbarium, AVH, established in 2001).
AVH currently provides access to eight million records for specimens of plants, bryophytes and fungi across Australia and New Zealand, both directly and through its connections to the ALA (https://avh.ala.org.au).

Internationally, interest in the establishment of biodiversity data infrastructures had been growing for more than a decade before the ALA was established. The Biodiversity in World Science Report, published by UNESCO in 1996, identified biodiversity as ‘our most precious “unknown”’ and made the case for developing better understanding of genes, species and ecosystems (di Castri 1996). In Australia, the Environmental Resources Information Network (ERIN, https://www.environment.gov.au/about-us/environmental-information-data/erin) was launched in 1992 through the former Australian Government Department of the Environment with the first national remit to draw together, supplement and make publicly available biodiversity data from the environmental departments of Australian states and territories.

By 1999, the OECD had developed a focus on research infrastructure and emphasised the need for international collaboration. One result was the recommendation by the OECD Megascience Forum Working Group on Biological Informatics in 1999 (OECD Megascience Forum 1999) to establish the Global Biodiversity Information Facility (GBIF, https://gbif.org) to make biodiversity data and information accessible worldwide. GBIF was conceived as an online network funded by membership fees from participating countries, which would contribute data to GBIF through their national ‘nodes’. The Working Group's recommendations were shaped by the experiences of a handful of key countries in developing systems, based especially on digitised botanical specimens, including Australia through ERIN.

In the early 2000s, the zoological community in Australian museums formed a peak body called the Council of Heads of Australian Faunal Collections (CHAFC).
This group was also interested in sharing data through a public website. The zoological community had to engage in extensive discussions to overcome philosophical hurdles around open data sharing. In particular, the community was concerned that providing precise locality data for threatened and rare species would encourage poaching and illegal collecting. There were also concerns about protecting the privacy of collectors and donors. Technology fixes were proposed for data sensitivity issues, including denaturing locality data through gridding and excluding some data elements from sharing arrangements. The result was OZCAM, the Online Zoological Collections of Australian Museums, now a portal in the ALA (https://ozcam.ala.org.au).

The establishment of the ALA owes much to the prior existence of AVH and OZCAM. The Australian Government commissioned a review of national research infrastructure with the intention of funding new initiatives. The resulting National Collaborative Research Infrastructure Strategy Roadmap in 2006 (Department of Education 2006) outlined 16 areas of priority infrastructure, including Integrated Biological Systems. The Government sought a proposal from the collections community to continue their existing efforts to database animal, plant and microbial collections and aggregate the results into a single online platform. A group of approximately 25 representatives from Australia's biological collections, with representatives from ABRS, AVH and OZCAM, met at the Australian Museum in Sydney in May 2006. This meeting identified many benefits from an aggregated database based on existing standards, and settled on the concept of the ‘Atlas of Living Australia’. The collections developed a strong case that the data held by collecting institutions should be considered as significant research infrastructure.
A submission from the major museums and herbaria was successful and CSIRO was appointed as the contracting agency.

The original scope of the ALA

The ALA was approved for Australian Commonwealth funding as part of the NCRIS programme starting in 2007. NCRIS established a new generation of Australian national research infrastructures (NRIs) to promote and support world-class research across multiple domains. The rationale for the ALA funding was to enhance access to Australia's biological collections as an ‘important supporting infrastructure for research relating to models of disease, biosecurity and biodiversity, and supporting quarantine, environmental remediation and management.’ Accordingly, the ALA was established as a partnership between CSIRO (which curates national collections for multiple taxonomic groups), major state and territory museums and herbaria (and the associated national Councils), key university collections, the Australian Biological Resources Study (ABRS, responsible for the national species lists and funding for taxonomic projects) and the pest collections of the Department of Agriculture, Forestry and Fisheries and the Victoria Department of Primary Industries.

The announcement in 2007 also created two other NRIs with relevance to biodiversity and the wider work of the ALA: the Integrated Marine Observing System (IMOS: http://imos.org.au/) and the Terrestrial Ecosystem Research Network (TERN: https://www.tern.org.au/). IMOS and TERN were established to support environmental research and data management in the marine and terrestrial spaces. In both cases, the scope included data collection and processing activities that relate to biodiversity composition across space and time. The original NCRIS strategy did not specify how linkages would be formed between ecological datasets and the largely collection-based data of the ALA.
The ALA itself was positioned in an Integrated Biological Systems cluster with the Australian Plant Phenomics Facility (APPF: https://www.plantphenomics.org.au/) and the Australian Phenomics Network (APN: http://australianphenomics.org.au/). This association was based on all three NRIs delivering integrated data related to target species, and the ALA was initially given responsibility to provide informatics support for APPF and APN. The Australian Biosecurity Intelligence Network (ABIN) was also funded as an NRI to address requirements around all aspects of biosecurity and overlapped with the ALA in the area of observations and collections of pest species. Other investments with scope relevant to the ALA included Bioplatforms Australia (BPA: https://bioplatforms.com/) for "-omics" technologies and the Australian Urban Research Infrastructure Network (AURIN: https://aurin.org.au/), which addresses issues around urban environments.

Simultaneously with the establishment of these NRIs, a set of cross-domain research data and computing infrastructures was created as the NCRIS Platforms for Collaboration cluster. These included the Australian National Data Service (ANDS) to address issues around data management and storage, and the Australian Research Collaboration Service (ARCS) to support collaborative activities. These two facilities have now merged and evolved into the Australian Research Data Commons (ARDC: https://ardc.edu.au/).

All NRIs, apart from ABIN, are still active and have developed in parallel with the ALA. The Australian Government's approach through NCRIS has been transformational in encouraging collaborative effort throughout the Australian research sector and in delivering a wide range of datasets and tools for use by researchers and the wider community.

Some downsides resulted from the simultaneous establishment of all NCRIS infrastructures.
The ALA needed to solve many issues in research data management long before Platforms for Collaboration could offer stable and standardised models. This forced the ALA to develop its own approaches to metadata standards, vocabulary services, repository services and GIS functionality. Some of these elements are currently addressed in more standardised ways by ARDC and other infrastructure partners. Similarly, the ALA, TERN and IMOS all had to address their own needs around biodiversity data management before reaching the necessary maturity to interconnect services. The overlap in responsibilities between the ALA and ABIN limited opportunities for the ALA to fully address the issues associated with biosecurity collections. Linkages with APPF and APN absorbed some ALA resources at an early stage, but the three infrastructures shared no use cases and these relationships have weakened over time. Had these initiatives been in a more mature state when the ALA was brought into existence, decisions about the ALA's priorities and investments would have been made quite differently.

NCRIS funding (AUD$8.5 million over the period 2006-2011) gave the ALA the stability to address the following initial scope, using digital content from the partner institutions:

• names and nomenclatural data
• specimen and observational data
• descriptions and descriptive data
• DNA and genetic data
• multimedia

The role of the ALA, as established under NCRIS, was to build the data integration infrastructure required to support research use of the natural history collections and associated digital assets. It was not funded for the generation of new digital content.

In May 2009, as part of a national response to the Global Financial Crisis, the Australian Government announced the Super Science Initiative, a $989 million initiative to build and create research infrastructure, funded through the Education Investment Fund (EIF). As part of the Super Science Initiative, the ALA received an additional AUD$30 million to build on and enhance its work through the period 2009-2011. The scale of the funding and the short time period justified a significant expansion of the scope of the ALA's work across five major areas:

• Collection Data Management: tools and services to optimise the data supply chain through Australia's natural history collections, from field collection through accession, digitisation and web publication. This included support to reinforce tools around key national collection platforms, including AVH and OZCAM.
• Rich Data Stores: shared infrastructure to manage and maintain biodiversity datasets on behalf of Australian institutions and projects, including mirrors or local nodes for the Biodiversity Heritage Library (BHL: https://www.biodiversitylibrary.org/), Morphbank (https://www.morphbank.net/) and the Barcode of Life Database (BOLD: http://www.boldsystems.org/) and upgrades to the DELTA software for taxonomic identification keys (https://www.delta-intkey.com/).
• Australian National Species Lists: tools, services and expert curation to bring together and complete the Australian National Species Lists, including the Australian Plant Name Index (APNI: https://www.anbg.gov.au/apni/), the Australian Plant Census (APC: https://www.anbg.gov.au/chah/apc/index.html) and the Australian Faunal Directory (AFD: https://biodiversity.org.au/afd/home), together forming the taxonomic framework for Australian biodiversity data.
• Spatial Data Management: shared models, tools and services to ensure interoperability of all spatial data accessed through the ALA and compatibility with data shared through related NCRIS capabilities (particularly TERN and IMOS). This activity led to the development of the ALA's Spatial Portal (Belbin 2011, https://spatial.ala.org.au).
• Data Dissemination: web portals and applications to improve delivery of biodiversity data to end-user communities, including conservation and biosecurity stakeholders and citizen scientists.

Some of these areas had limited long-term impact, but the EIF funding established the scope still delivered by current ALA data and services. Most significantly, the work on the Australian National Species Lists represented a recognition within NCRIS that core datasets such as these can themselves be regarded as significant national infrastructure. This recognition allowed funding to be directed to taxonomists for contributions to expand the coverage or quality of sections of the national species lists.

The ALA was conscious from its inception of the significant role that citizen science could play in contributing valuable information on Australia's biodiversity. This undertaking has evolved into ALA projects and capabilities, such as DigiVol and BioCollect, covered below.

ALA data and services

As of January 2021, the ALA contains nearly 95 million occurrence records of over 111,000 species from a total of over 195,000 species listed in the Australian National Species Lists (https://www.rbg.vic.gov.au/science/projects/taxonomy/atlas-of-living-australia-national-species-lists-project). A total of 84.4% of the records are terrestrial and relate to areas of Australian jurisdiction, 8.8% are marine, while the remaining 6.8% of the records are spread across 270 other countries and dependencies.
The ALA contains data from observations of Australian species outside the Australian region and also data on introduced species that can be found in the Australian region. Of the total records, around 72 million are field observations (categorised as human observations in the Darwin Core standard: Wieczorek et al. 2012) and 13.5 million are of preserved specimens. There are 1.7 million machine observation records and we anticipate these will rise as a proportion of total records. The earliest Australian record is from the late 1600s and, on average, thousands of occurrence records are being added daily. Fig. 1 provides an overview of summary metrics describing the various dimensions of the ALA.

Figure 1. Summary metrics describing dimensions of the ALA. Real-time data regarding selected metrics is also available at https://dashboard.ala.org.au/.

Field observations of species range from single ad hoc sighting records to hundreds of data collections from over 500 institutions that provide data to the ALA. The largest single data provider is Birdlife Australia (https://birdlife.org.au/) with over 15 million records. As noted elsewhere in this paper, the ALA also manages bio-related terrestrial and marine environmental layers, species lists, images, sounds, ecology-related projects and over 500,000 location definitions in its gazetteer.

Over 800,000 specimen labels and 124,000 pages of field notes have been transcribed by over 6,000 public volunteers using the ALA's DigiVol volunteer portal, hosted by the Australian Museum (http://digivol.ala.org.au/).
Collecting institutions around the world have an incredible backlog of specimens, images, field notes and archives that are inaccessible because they have not yet been digitised. DigiVol provides a way to harness the power and passion of volunteers to help in the digitisation effort to make more information available to science. A further area that has recently begun to benefit from volunteer input is automated camera trap data, where the task is to identify and tag animals in photographs taken by cameras mounted in the environment. Along with its success at attracting volunteers, DigiVol is an excellent example of making infrastructure meet many different objectives.

The wide range of data types requires the ALA to maintain an equally wide range of services to accept, process and expose the data to meet the needs of diverse communities. The landing page of the ALA (Fig. 2, https://ala.org.au/) provides a simple search for species, datasets and most of the information in the ALA. It is possible to explore species-level information and drill down to any of the occurrence records. A suite of application programming interfaces (APIs) is provided, as well as CSV downloads and an ALA4R environment, to support further research that uses data from the ALA outside of the ALA website.

The ALA also maintains a suite of portals that support specific communities or specialised data and services, each with a web address ending in .ala.org.au. Examples include https://dashboard.ala.org.au/, a data dashboard listing dimensions of ALA data holdings and usage; https://biocollect.ala.org.au/ for ecological project management and data collection; https://spatial.ala.org.au/, a map- and analysis-focused portal for the research community; and https://lists.ala.org.au/ for lists that group species for any purpose, including threat categories, presence in defined areas or common traits.
A more complete list of portals and services can be found at https://www.ala.org.au/sites-and-services/.

Figure 2. ALA landing page at https://ala.org.au/

The ALA's Biocache (https://biocache.ala.org.au) is a tool that provides an organised view combining and linking specimen data, genetic information, field observations, sampling events, animal tracking data and media collected by diverse stakeholders and arranges these data for search, access, analysis and download through a standardised record structure. The Biocache includes records from museum and herbarium specimens, citizen science observations, field surveys, eDNA studies, literature, remote sensing, electronic tags and machine observations and any other biological research activity that records the occurrence of species in time and space.

The ALA is active across three of the tiers for biodiversity informatics identified in the Global Biodiversity Informatics Outlook (GBIO, Hobern et al. 2012): promoting common and consistent approaches around FAIR (findable, accessible, interoperable and reusable) and open access to biodiversity data (Culture tier), supporting biodiversity stakeholders in turning their assets, observations and measurements into digital formats (Data tier) and developing the integrated views required by users to make use of these data (Evidence tier). The ALA supports work in the fourth tier (Understanding tier) by making all these views and services accessible for researchers and decision-makers to apply in their work. FAIR data and software code underpin all ALA products and services.

The following sections provide a summary of ALA activities by data type.
Each section provides an outline of the nature of the data, how they are processed and how they are exposed publicly through various portals and tools. Fig. 3 provides a schematic overview of the relationship between data partners, data systems and applications to support ALA users, and Table 1 provides URLs for key data described under Sections 4.2 to 4.8.