Better maps! More bibliographic detail! - Posted: 4/25/2008 Following some excellent suggestions gathered at a recent Encyclopedia of Life meeting, we've made changes to our Google Maps browse interface. To recap, we take Library of Congress Subject Headings and geocode and map them using the Google Maps API ( details here). Now that we're managing nearly 10,000 volumes the standard Google Maps interface was getting cluttered and clunky, so we've refined the interface to show smaller points, weight the results using color, and display links to the titles for a given subject heading within the map itself (as demonstrated for "Africa" above). To view the map in full, visit http://www.biodiversitylibrary.org/browse/map. We made another change based on requests to view full bibliographic details for a scanned title. When we harvest scans from the Internet Archive, we copy the MARCXML for the title to our servers and siphon off just enough of the metadata to facilitate our browse & search capabilities - to pull in the contents of the entire MARCXML would unnecessarily bloat our database with info we don't expect to search across or expose via browse. But, it's important data to have in the display, so we've skinned the MARCXML using XSLTs provided by the Library of Congress. To view in action, click the "Brief|Detailed|MARC" links at http://www.biodiversitylibrary.org/bibliography/1583, or for any title in our collection. Finally, we've enhanced the display for our Discovered Bibliographies to return results in a more performant way, providing more visual feedback to the user that processes are at work. To view the refined interface, visit the result for Pomatomus saltatrix at http://www.biodiversitylibrary.org/name/Pomatomus_saltatrix. |
|
BHL Portal Updates! - Posted: 4/15/2008
The BHL portal (http://www.biodiversitylibrary.org) has been updated with the following changes:
- A new option has been added to filter results by the language in which items are published. For example, Titles published in English, or Authors with works published in German. This option complements the pre-existing option to filter results by contributing institution.
- An advanced search page (http://www.biodiversitylibrary.org/advancedsearch.aspx) has been added. This page allows a user to search on any combination of search categories (Titles, Authors, Names, or Subjects), instead of just one or all of the categories. It also allows search results to be limited by the publishing language.
- OCLC numbers associated with each title have been cleaned up. This means that the “Find in a local library” link on each title’s bibliography page should now work correctly.
- Sorting of individual items within a single title has been improved. See the right side of this page (http://www.biodiversitylibrary.org/bibliography/702) for an example: the volumes are now listed in order, v1 to v92. Prior to the correction, the volumes sorted as follows: v1, v10, v11, v12, … v2, v20, v21, v22, … v3, v30, v31, v32.
- Call numbers, when available, should now display correctly on bibliography pages.
- Some minor updates have been made to the page that displays the discovered bibliography for a name. An example is http://www.biodiversitylibrary.org/name/Poa_annua. Changes have been made to retrieve data as needed, instead of all at once, which has improved performance greatly. However, there remains a lengthy delay in retrieving large data sets, so we know that more work is needed here. At a minimum, we know that we need to improve the feedback given to the user while large data sets are being retrieved.
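The volume-ordering fix described in the list above (v1, v2, v3 rather than v1, v10, v11) is an instance of what is often called natural sorting. The following is a minimal Python sketch of the idea, not the portal's actual implementation; the function name `natural_key` and the sample labels are illustrative assumptions.

```python
import re

def natural_key(label: str):
    """Split a label into text and digit chunks so digit runs compare as integers.

    re.split with a capturing group always alternates text, digits, text, ...
    so chunks at the same position in two keys always have the same type.
    """
    parts = re.split(r"(\d+)", label)
    return [int(p) if p.isdigit() else p.lower() for p in parts]

# Hypothetical volume labels, similar to those mentioned in the post.
volumes = ["v1", "v10", "v11", "v2", "v20", "v3"]

plain = sorted(volumes)                      # character-by-character comparison
natural = sorted(volumes, key=natural_key)   # numeric comparison of digit runs

print(plain)
print(natural)
```

A plain string sort compares character by character, which is why "v10" lands before "v2"; the key function avoids that by comparing the numeric parts as integers.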
Also of note:
Some inconsistencies with title information have been identified. We had believed that the MARC leader assigned to an item would be sufficient to uniquely identify a title. This has turned out to not be the case (affecting about one-half of one percent of the titles we’ve ingested from Internet Archive), so we’ve had to adjust how we identify which items belong to which titles. The cleanup of this data is ongoing. - Mike Lichtenberg |
|
Harvesting Process from Internet Archive - Posted: 3/14/2008
NOTE: Internet Archive has changed their query interface and these instructions are no longer valid. We will update this information once their query interface has stabilized.
Overview
The following steps are taken to download data from Internet Archive and host it on the Biodiversity Heritage Library. Diagrams of the process are available in PDF.
- Get item identifiers from Internet Archive for items in the "biodiversity" collection that have been recently added/updated.
- For each item identifier:
  - Get the list of files (XML and images) that are available for download.
  - Download the XML and image files.
  - Download the scan data if it is not included with the other downloaded files.
  - Extract the item metadata from the XML files and store it in the import database.
  - Extract the OCR text from the XML files and store it on the file system (one file per page).
- For each "approved" item, clean up and transform the metadata into an "importable" format and store the results in the import database.
- Read all data that is ready for import and insert/update the appropriate data in the production database.
Internet Archive Metadata Files
The following table lists the key XML files containing metadata for items hosted by Internet Archive. It is possible that one or more of these files may not exist for an item. However, most items that have been "approved" (i.e. marked as "complete" by Internet Archive) do include each of these files.
| Filename | Description |
| _files.xml | List of files that exist for the given identifier |
| _dc.xml | Dublin Core metadata. In many cases the data included here overlaps with the data in the _meta.xml file. |
| _meta.xml | Dublin Core metadata, as well as metadata specific to the item on IA (scan date, scanning equipment, creation date, update date, status of the item, etc.) |
| _metasource.xml | Identifies the source of the item… not much meaningful data here |
| _marc.xml | MARC data for the item. |
| _djvu.xml | The OCR for the item, formatted as XML. |
| _scandata.xml | Raw data about the scanned pages. In combination with the OCR text (_djvu.xml), the page numbers and page types can be inferred from this data. This file may not exist, though in most cases it does. For the most part, only materials added to IA prior to late summer 2007 are likely to be missing this file. |
| scandata.xml | Raw data about the scanned pages. If there is no _scandata file for an item, we look in scandata.zip (via an IA API) for this file, which contains the same information. |
Internet Archive Services
Search for Items
Internet Archive items belong to one or more collections. To search a particular Internet Archive collection for items that have been updated between two dates, use the following query:
http://www.archive.org/services/search.php?query={0}+AND+updatedate:[{1}+TO+{2}]&submit=submit
where
{0} = name of the Internet Archive collection; in our case, "collection:biodiversity"
{1} = start date of range of items to retrieve
{2} = end date of range of items to retrieve
To limit the item search to a particular contributing institution, modify the query as follows:
http://www.archive.org/services/search.php?query={0}+AND+updatedate:[{1}+TO+{2}]+AND+contributor:(MBLWHOI Library)&submit=submit
To limit the results of the query to a particular number of items, modify the query as follows:
http://www.archive.org/services/search.php?query={0}+AND+updatedate:[{1}+TO+{2}]&limit=1000&submit=submit
To search for one particular item, use:
http://www.archive.org/services/search.php?query={0}&submit=submit
where
{0} = an Internet Archive item identifier
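The search query templates above can be assembled programmatically. Below is a minimal Python sketch that only builds the URL strings; it makes no network calls, since the note at the top of this post says the query interface has changed. The function names and the date format used in the example are assumptions (the post does not state which date format the service expected), and the contributor value is inserted verbatim as it appears in the template, so real use might require URL encoding.

```python
SEARCH_URL = "http://www.archive.org/services/search.php"

def build_search_url(collection, start_date, end_date, contributor=None, limit=None):
    """Build the collection search URL following the templates in the post.

    collection  -- e.g. "collection:biodiversity"
    start_date, end_date -- update date range (format is an assumption)
    contributor -- optional contributing institution
    limit       -- optional maximum number of items to return
    """
    query = f"{collection}+AND+updatedate:[{start_date}+TO+{end_date}]"
    if contributor is not None:
        query += f"+AND+contributor:({contributor})"
    url = f"{SEARCH_URL}?query={query}"
    if limit is not None:
        url += f"&limit={limit}"
    return url + "&submit=submit"

def build_item_search_url(identifier):
    """Build the URL that searches for one particular item identifier."""
    return f"{SEARCH_URL}?query={identifier}&submit=submit"

# Example with an assumed date format and the limit shown in the post.
example = build_search_url("collection:biodiversity", "2008-03-01", "2008-03-14", limit=1000)
print(example)
```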
Download Files
To download a particular file for an Internet Archive item, use the following query:
http://www.archive.org/download/{0}/{1}
where
{0} = an Internet Archive item identifier
{1} = the name of the file to be downloaded
Downloading Files Contained In ZIP Archives
In some cases, a file cannot be downloaded directly, and may instead need to be extracted from a ZIP archive located at Internet Archive. One example of this is the scandata.xml file, which in some cases must be extracted from the scandata.zip file. To do this, two queries must be made. First invoke this query to get the physical file locations (on IA servers) for the given item:
http://www.archive.org/services/find_file.php?file={0}&loconly=1
where
{0} = an Internet Archive item identifier
Then, invoke the second query to extract the scandata.xml file from the scandata.zip file (using the physical file locations returned by the previous query):
http://{0}/zipview.php?zip={1}/scandata.zip&file=scandata.xml
where
{0} = host address for the file
{1} = directory location for the file
Note that the second query can be generalized to extract the contents of other zip files hosted at Internet Archive. The format for the query is:
http://{0}/zipview.php?zip={1}/{2}&file={3}.jpg
where
{0} = host address for the file
{1} = directory location for the file
{2} = name of the zip archive from which to extract a file
{3} = the name of the file to extract from the zip archive
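The download and ZIP-extraction templates above can be expressed as small helper functions. This Python sketch only formats the URLs; it does not call the services or parse the find_file.php response, because the post does not describe that response format. All function names, the identifier, the host, and the directory in the example are hypothetical. Note one deliberate difference: the generalized template appends ".jpg" to {3}, whereas this sketch takes the full member name, extension included, so that it also covers the scandata.xml case.

```python
def download_url(identifier, filename):
    """URL for downloading one file of an Internet Archive item."""
    return f"http://www.archive.org/download/{identifier}/{filename}"

def find_file_url(identifier):
    """URL of the first query, which reports the physical file locations."""
    return f"http://www.archive.org/services/find_file.php?file={identifier}&loconly=1"

def zipview_url(host, directory, zip_name, member):
    """URL of the second query, which extracts `member` from a hosted ZIP archive.

    host and directory are the values returned by the first query.
    """
    return f"http://{host}/zipview.php?zip={directory}/{zip_name}&file={member}"

# Hypothetical values, for illustration only.
item = "exampleitem00"
host = "host.example.org"
directory = "/0/items/exampleitem00"

print(download_url(item, item + "_marc.xml"))
print(find_file_url(item))
print(zipview_url(host, directory, "scandata.zip", "scandata.xml"))
```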
Documentation written by Mike Lichtenberg. |
|
On Name Finding in the BHL - Posted: 3/4/2008
An important feature of the Biodiversity Heritage Library that sets it apart from other mass digitization projects is our incorporation of algorithms and services to mine taxonomically relevant data from the 2.9 million (as of the date of this posting) pages digitized through our partnership with the Internet Archive. These services, including TaxonFinder, developed by partners at uBio.org, allow BHL to identify words in digitized literature that match the characteristics of Latin-based scientific names, then verify that the word or words are a scientific name by comparing them to NameBank, uBio.org's repository of more than 10.7 million recorded scientific names and their variants. The resulting index of names found throughout these historic texts is an incredibly valuable dataset whose richness and use have only begun to be developed.
The massive index and interfaces to it are new (from development to production within 8 weeks), so the BHL Development Team has been gathering feedback from users, evaluating usage statistics, and working with both librarians and scientists to determine what is working with the interface and what needs refinement. The following issues have been identified:
1. Volume and scalability
BHL currently manages 2.9 million pages in its database, with each page equating to an image & its derivatives stored on a filesystem at the Internet Archive. Using uBio's services, we've located a total of 14.7 million name strings across texts, with 10.4 million of those verified to an entry in NameBank. Scalability quickly becomes an issue as BHL expects to digitize 60 million pages within 5 years. Faced with hundreds of millions of name occurrences, the challenge becomes how to efficiently store and query this dataset. BHL data are currently stored in SQL Server 2005, which can scale to expected volumes and contains tools for load balancing and clustering.
Ultimately, though, these issues of volume and scalability are resolvable as the dataset is not excessively complicated in structure. With enterprise-level hardware, optimized code and data access layers, and intelligent caching (all of which are currently in use), BHL can efficiently store and provide access to the vast index of scientific names identified through algorithmic means.
2. OCR
Commercial Optical Character Recognition (OCR) programs, such as ABBYY FineReader or PrimeOCR, work very well for texts printed after the advent of industrialized and standardized printing techniques (loosely since the late 1800's). Unfortunately the OCR programs are considerably less accurate on texts that match the characteristics of much of what BHL is scanning, including texts printed with irregular typeface and typesetting, and texts printed in multiple languages, including Latin. The impact here is that if the texts are not accurately recognized, the names contained within can't be identified. The accuracy of the OCRed text is therefore incredibly important, and unfortunately nearly impossible to improve through automated means as OCR technology has not really changed much since the mid-1980's. Alternatives such as offshore rekeying or volunteer text conversion through the Distributed Proofreaders or other crowdsourcing projects are either prohibitively expensive or would require enormous effort above and beyond what could be volunteered given BHL's estimated page count. BHL is not alone in facing this problem; every initiative that OCRs historic texts has encountered this unfortunate gap in accuracy. If you are aware of any new efforts to improve OCR, please use the comment form below.
3. False positives
As BHL was indexing botanical texts, repeated occurrences of "Ovarium" were being located; an unusual result, as Ovarium is both an echinoderm (marine invertebrate) and a term used in botany to describe the lower part of the pistil or female organ of the flower.
After reviewing the page occurrences it became clear that the TaxonFinder algorithm was accurately identifying a word and making a match to an entry in NameBank, but in this case the context was off. In nearly every entry, the word "ovarium" was not used to describe the marine invertebrate, but rather to describe the form of a flower in a taxonomic description. Similar false positives exist, such as Capsula and Fructus. Upon further review the problem is most prevalent with names used at higher classification levels; results for "Genus species", such as Carcharodon carcharias (Great white shark), are much less likely to be false positives. Clearly more evaluation is needed to understand the true magnitude of the problem, hopefully resulting in refinement of the TaxonFinder algorithm.
4. Usability
Gregory Crane of Tufts University asked, in an oft-cited paper, "What Do You Do With a Million Books?" The challenge facing BHL Developers (and users) is more along the lines of "What do you do with 19,000 pages containing Hymenoptera?" Because the BHL names index is growing rapidly, viewing and filtering results in a meaningful way becomes challenging. It's clear that a user isn't going to manually sift through and review every one of those pages. We can facilitate downloading the results in standard forms for reference management software, such as Zotero or EndNote, but how does BHL introduce relevancy rankings or other metrics for refining results - what exactly defines relevancy for occurrences of a name throughout scientific literature?
5. Accuracy and completeness
And now for a reality check. BHL text will never be 100% accurate, and our names index will never be 100% complete. We're using automated software and services to process the millions of pages in the BHL collection because to do anything but an automated analysis simply won't scale.
The names index and the services that support its creation and display are modular - should radically new character or word recognition software come along, the scanned images can be reprocessed and reindexed using TaxonFinder. And should a better taxonomic name finding algorithm emerge, it can replace TaxonFinder in our application. As technologies emerge to improve text transcription and indexing, BHL will evaluate them and deploy them with our app if they prove effective.
Future work
It's clear that we've identified enhancements needed in TaxonFinder to reduce the number of false positives. How best to implement those enhancements is yet to be determined, but at least we have data to guide us. We also plan to enhance the interface used for the discovered bibliographies, as the current implementation is not performant for large result sets. Further, we expect to facilitate downloading of the results in a standard format, such as BibTeX.
In closing, BHL is currently employing emerging technologies to transcribe and index a large collection of digitized scientific literature, and providing innovative interfaces into the data mined from it. These interfaces are rapidly evolving to meet user needs, based on user feedback, so if you have a suggestion for improvement please provide it via our Feedback form or in the comments below. - |
|
A Leap for All Life: BHL & EOL - Posted: 3/2/2008The Biodiversity Heritage Library and the Encyclopedia of Life shared a table at the Congressional Family Night held at the Smithsonian's National Museum of Natural History. The event (March 1, 2008) showcased a wide range of scientific endeavors engaged in by Smithsonian staff and was attended by members of Congress, their staff, and families. Here, Cristián Samper, Acting Secretary of the Smithsonian and EOL steering committee member looks on as Gil Taylor (Smithsonian Institution Libraries) and Dawn Mason (EOL) demonstrate the recently launched EOL species pages. |
|
Major updates to BHL Portal released - Posted: 2/25/2008
BHL developers have released several significant updates to the BHL portal today. These updates include:
- Display of materials scanned by Internet Archive. BHL now manages more than 2.8 million pages from 7,500 digitized scientific texts. To stay updated on new titles, view our Recent Additions and subscribe to our feeds.
- Filtering by Contributing Library. When users select "Browse By:" functions, they can filter results using the "For:" dropdown to view, for example, Authors from the New York Botanical Garden, or a Map of titles scanned by Smithsonian Institution Libraries, or Titles from All Contributors.
- Feedback tracking. Users can submit feedback or comments on records using the Feedback link at the top of the portal.
For a complete list of bugs and enhancements included in this release, visit our issue tracking web site. |
|
Happy Birthday Mr. Darwin! - Posted: 2/11/2008
“The cultivation of natural science cannot be efficiently carried on without reference to an extensive library.” (1) - Charles Darwin, et al (1847)
Today, February 12, 2008, we celebrate the 199th anniversary of the birth of Charles Darwin. Last year we honored the 300th anniversary of the birth of Carl Linné, and next year will bring the double celebration of Darwin's bicentenary and the sesquicentennial (mark your calendars now for November 24th!) of the publication of On the Origin of Species. 2008 is thus a good year for those of us involved with the Biodiversity Heritage Library (BHL) to pause for a moment between these landmark anniversary years of 2007 and 2009.
Those working in systematics and taxonomy are heavily dependent on the historic literature – to a greater extent than perhaps most of the sciences. This importance of the literature, as well as the ongoing importance of publication (and library deposit) to validate taxonomic concepts, contributes to the mission and continues to inform the day-to-day development of the BHL. Darwin himself acknowledged the importance of library materials to the study of natural history in the passage quoted above (in a document signed by Darwin and over 30 other notables including Charles Lyell, W.J. Hooker, and Richard Owen), which was part of an appeal for support of natural history research at the British Museum. - Martin Kalfatovic
Portrait of Charles Darwin by Ernest Edwards. From Scientific Identity: Portraits from the Dibner Library of the History of Science and Technology. Smithsonian Institution Libraries
(1) Darwin, C. R. et al. 1847. Copy of Memorial to the First Lord of the Treasury [Lord John Russell], respecting the Management of the British Museum. Parliamentary Papers, Accounts and Papers 1847, paper number (268), volume XXXIV.253 (13 April): 1-3. [Complete Works of Charles Darwin Online] |
|
BHL part of the "Biological Moon Shot" - Posted: 2/4/2008
Thomas Garnett of the Smithsonian's National Museum of Natural History heads a scanning and digitization group of encyclopedia workers. They are cooperating with the Biodiversity Heritage Library, a project through which 10 major libraries are scanning and placing on the Web pages from volumes that describe species. Some 80 million pages come from publications old enough to be in the public domain, and the scanners are starting with those.
The Feb. 2, 2008 issue of Science News includes an article by Susan Milius ("Biological Moon Shot") on the Encyclopedia of Life and the Biodiversity Heritage Library. BHL member staff Tom Garnett and Martin Kalfatovic are quoted in the article.
In talking about the vital business of opening library resources to far-flung scientists, Garnett rolls his eyes at the mention of a specialized source for historians of science that has become one of the library's most popular downloads—the 1904 treatise Ants and Some Other Insects: An Inquiry Into the Psychic Powers of These Animals. |
|
BHL presentation at the National Agriculture Library - Posted: 1/30/2008
Smithsonian Institution Libraries staff members Martin Kalfatovic and Suzanne Pilsk gave a presentation on BHL to staff from the National Agriculture Library, the USDA Agriculture Research Service, the NASA Goddard Space Flight Center, and others.
- The Biodiversity Heritage Library. Martin R. Kalfatovic and Suzanne C. Pilsk. National Agriculture Library: Issues and Answers Seminar. January 30, 2008. Beltsville, MD.
 |
|
"How's THAT for a tag cloud?!" - Posted: 1/24/2008The NC State Insect Museum blog gave the Biodiversity Heritage Library Finally got around to perusing my December, 2007 issue of Systematic Biology and saw this article by Godfray et al. about taxonomy and the Web. The authors provide nice summaries of emerging, alternative strategies for tackling the biodiversity and bioinformatics crises: CATE, uBio, DiGIR (to be replaced by TAPIR soon?), GBIF, Biodiversity Heritage Library initiative (how's THAT for a tag cloud?!), ZooBank, TDWG, iSpecies, and Wikispecies (my least favorite; at least I am not yet totally convinced that this is a good model for taxonomy). I find it curious that the Encyclopedia of Life was barely mentioned (and never by name) in that article, especially given its high profile and funding level. I'll have to remember to link some of these projects to our museum page, as we will undoubtedly be exploiting these resources and techniques to expose the data housed within our cabinets. - Insect Museum blog
|
|
Senior Programmer needed to assist BHL development - Posted: 1/4/2008
The Missouri Botanical Garden (MOBOT), located in St. Louis, MO, is seeking to hire a Senior Programmer Analyst to work on several large biodiversity informatics projects, including the Biodiversity Heritage Library (BHL) online at www.biodiversitylibrary.org. Primary responsibilities for this position include leading the development effort for MOBOT's LAMP-based applications, complementing the existing .Net team. Up first on the development schedule is the instantiation of Fedora (www.fedora-commons.org) at MOBOT as a repository layer in our multi-platform, SOA-based infrastructure, then refactoring applications and building new ones to utilize Fedora. Future projects include enhancement of the BHL GUI and development of tools for managing digital library content.
Qualifications include a BS in Computer Science or related field, 5 years experience developing enterprise-level applications, and 2 years experience leading a development team. Experience managing data and applications in an open source environment (LAMP and its variants) required. Experience managing biodiversity and/or library datasets preferred, but not required.
To apply online, please visit: http://www.mobot.org/jobs/mbgjobs.asp#H005 |
|
Biodiversity Heritage Library - Europe - Posted: 1/2/2008
The BHL (http://www.biodiversitylibrary.org/About.aspx) currently consists of English language collections from the USA and UK (although we have huge amounts of material in over 40 other languages). I am working with European colleagues to develop a programme of activity in Europe to cover the other European languages. German and Netherlands colleagues are already working on bids and trial scanning. We are preparing a bid to the EU eContentplus programme for money to manage these activities across Europe (unfortunately, the EU will not fund scanning directly) and this will be led by the Museum für Naturkunde, Berlin.
We are currently looking for partners to join the eContentplus bid - in particular, we are looking for institutions with substantial collections of biodiversity literature, experts in scanning and digitisation, and researchers interested in OCR (optical character recognition) technologies. If you are interested in joining us, please contact me on g.higley@nhm.ac.uk.
Graham Higley
Wednesday, January 2nd, 2008 |
|
Eggplant Leafroller Moth reared on Potatoes - Posted: 12/18/2007
The following page from Biologia Centrali-Americana, Insecta Lepidoptera-Heterocera v. 4 shows an interesting example of a proximity search we'd like to support with BHL Name Services - "find species x within n characters/words of species y."
http://www.biodiversitylibrary.org/page/593637
Halfway through the entry for Lineodes integra you'll see a character that looks like a crosshair, followed by "Solanum spp. 4-5,8, S. radula4-5, S. jasminifolium4-5, S. tuberosum (=Potato) 8." According to Wolfram Mey, the leading lepidopterist of the Museum of Natural History (MfN), Berlin:
The symbol means that the species has been reared from/on the particular plant. The symbol has been in use particularly by the old British authors, particularly Lord Walsingham, and is also used on the labels attached to the specimens. (translation by Dr. Michael Ohl)
What this tells us is that Lineodes integra (Eggplant Leafroller Moth) is reared on a variety of Solanum species, including Solanum tuberosum (Potato). This example was uncovered during a Name search for Solanum tuberosum; the resulting bibliography included a link to this volume on insects from the Biologia Centrali-Americana, which seemed unusual given the search was for a plant species. This demonstrates why we'd want to facilitate proximity searches, so that users could find pages where both Lineodes integra and Solanum tuberosum occurred to aid in the discovery of predator-prey, plant-pollinator, or other coevolutionary relationships.
This example also suggests that our OCR algorithms are woefully inadequate to infer these kinds of relationships through automated means; the crosshair symbol was identified as ©. -Chris Freeland |
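The proximity search described above ("find species x within n words of species y") can be illustrated with a short Python sketch. This is not a BHL Name Services feature or implementation; the function names and the sample sentence are hypothetical, and the matching is deliberately naive. In particular, it would not link an abbreviated form such as "S. tuberosum" to "Solanum tuberosum", which is exactly the form used on the page discussed in the post.

```python
import re

def find_positions(words, phrase):
    """Return the word indices at which the words of `phrase` occur consecutively."""
    target = phrase.lower().split()
    if not target:
        return []
    n = len(target)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]

def within_n_words(text, name_x, name_y, n):
    """True if some occurrence of name_x starts within n words of some occurrence of name_y."""
    words = re.findall(r"[a-z]+", text.lower())
    xs = find_positions(words, name_x)
    ys = find_positions(words, name_y)
    return any(abs(x - y) <= n for x in xs for y in ys)

# Hypothetical OCR text, for illustration only.
page_text = "Lineodes integra has been reared on Solanum tuberosum in the field."

print(within_n_words(page_text, "Lineodes integra", "Solanum tuberosum", 10))
print(within_n_words(page_text, "Lineodes integra", "Solanum tuberosum", 3))
```

In the sample sentence the two names start six words apart, so the first call reports a match and the second does not.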
|
| "An amazing service ... " - Posted: 12/6/2007 |
|
BHL Name Services v.1.0 released - Posted: 12/5/2007
Name Services
Last updated: January 28, 2008
Mike Lichtenberg
Overview
The name services are XML-based web services that can be invoked via SOAP or HTTP GET/POST requests. Responses can be received in one of three formats: XML wrapped in a SOAP envelope, XML, or JSON.
If you want to use SOAP to invoke the service methods, you can navigate to http://www.biodiversitylibrary.org/services/name/NameService.asmx to view the available methods. From that page, you can view the WSDL document for the web service, or click on each method to see detailed information about invoking the method and about the data that is returned.
If you are using HTTP to invoke the methods, the services are located at http://www.biodiversitylibrary.org/services/name/NameService.ashx. Note the difference in the extension on the service URL: ASHX for HTTP vs. ASMX for SOAP.
Descriptions of each service, as well as more details on invoking the methods via HTTP, follow.
Methods
NameCount
Returns the number of unique confirmed names in the BHL database. If the optional start and end dates are specified, then only names added or updated between the dates are counted.
Requests
SOAP:
NameCount()
NameCountBetweenDates("01/01/2008", "01/31/2008")
HTTP returning XML:
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameCount&format=xml
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameCount&startDate=01/01/2008&endDate=01/31/2008&format=xml
HTTP returning JSON (with and without a user-specified callback):
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameCount&format=json
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameCount&startDate=01/01/2008&endDate=01/31/2008&format=json
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameCount&format=json&callback=MyCallback
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameCount&startDate=01/01/2008&endDate=01/31/2008&format=json&callback=MyCallback
Responses
XML:
<?xml version="1.0" encoding="utf-8" ?>
<NameResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Status>ok</Status>
  <NameResult>436445</NameResult>
</NameResponse>
JSON:
{
  "Status":"ok",
  "ErrorMessage":null,
  "NameResult":436445
}
These responses show that there are 436445 unique names.
NameList
Returns a list of unique names from the BHL database. There are two required parameters. “startRow” identifies the first name to return, and “batchSize” indicates how many names to return. The maximum allowed “batchSize” is 1000. Optionally, “startDate” and “endDate” parameters can also be specified. If the dates are specified, then only names added or updated between the dates are returned. Each of the following request and response examples assumes a startRow value of 1 and a batchSize value of 5.
Requests
SOAP:
NameList("1", "5")
NameListBetweenDates("1", "5", "01/01/2008", "01/31/2008")
HTTP returning XML:
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameList&startRow=1&batchSize=5&format=xml
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameList&startRow=1&batchSize=5&startDate=01/01/2008&endDate=01/31/2008&format=xml
HTTP returning JSON (with and without a user-specified callback):
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameList&startRow=1&batchSize=5&format=json
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameList&startRow=1&batchSize=5&startDate=01/01/2008&endDate=01/31/2008&format=json
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameList&startRow=1&batchSize=5&format=json&callback=MyCallback
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameList&startRow=1&batchSize=5&startDate=01/01/2008&endDate=01/31/2008&format=json&callback=MyCallback
Responses
XML:
<?xml version="1.0" encoding="utf-8" ?>
<NameResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Status>ok</Status>
  <NameResult>
    <Name>
      <NameBankID>3456919</NameBankID>
      <NameConfirmed>Aalius</NameConfirmed>
    </Name>
    <Name>
      <NameBankID>8498321</NameBankID>
      <NameConfirmed>Aamia</NameConfirmed>
    </Name>
    <Name>
      <NameBankID>1803753</NameBankID>
      <NameConfirmed>Aaronsohnia</NameConfirmed>
    </Name>
    <Name>
      <NameBankID>4053043</NameBankID>
      <NameConfirmed>Ababactus</NameConfirmed>
    </Name>
    <Name>
      <NameBankID>240834</NameBankID>
      <NameConfirmed>Abacina</NameConfirmed>
    </Name>
  </NameResult>
</NameResponse>
JSON:
{
  "Status":"ok",
  "ErrorMessage":null,
  "NameResult":[
    { "NameBankID":3456919, "NameConfirmed":"Aalius", "Titles":null },
    { "NameBankID":8498321, "NameConfirmed":"Aamia", "Titles":null },
    { "NameBankID":1803753, "NameConfirmed":"Aaronsohnia", "Titles":null },
    { "NameBankID":4053043, "NameConfirmed":"Ababactus", "Titles":null },
    { "NameBankID":240834, "NameConfirmed":"Abacina", "Titles":null }
  ]
}
Calling this method repeatedly, you can parse the entire list of names. Here is an example of how that might be accomplished:
x = 1;
numberOfNames = BHLService.NameCount();
while (x <= numberOfNames)
{
  // Get the next 1000 names
  Names = BHLService.NameList(x, 1000);
  // … do something with Names …
  x += 1000;
}
In this example, “BHLService.NameCount()” and “BHLService.NameList()” represent calls to the Name Service methods. Implementation details for these will vary depending on the toolset (PHP, Java, .NET or other) and method (SOAP or HTTP) used to interact with the web service.
NameSearch
Returns a list of names that match exactly or start with the specified name. The required “name” parameter identifies the name for which to search. Results are limited to the first 100 matches. Each of the following request and response examples assumes a name search for “zea mays”.
Requests
SOAP:
NameSearch("zea mays")
HTTP returning XML:
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameSearch&name=zea+mays&format=xml
HTTP returning JSON (with and without a user-specified callback):
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameSearch&name=zea+mays&format=json
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameSearch&name=zea+mays&format=json&callback=MyCallback
Responses
XML:
<?xml version="1.0" encoding="utf-8"?>
<NameResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Status>ok</Status>
  <NameResult>
    <Name>
      <NameBankID>3875305</NameBankID>
      <NameConfirmed>Zea mays</NameConfirmed>
    </Name>
    <Name>
      <NameBankID>5416258</NameBankID>
      <NameConfirmed>Zea mays ceratina</NameConfirmed>
    </Name>
    <Name>
      <NameBankID>5416273</NameBankID>
      <NameConfirmed>Zea mays convar. ceratina</NameConfirmed>
    </Name>
    <Name>
      <NameBankID>5416702</NameBankID>
      <NameConfirmed>Zea mays convar. mays</NameConfirmed>
    </Name>
    <Name>
      <NameBankID>5416216</NameBankID>
      <NameConfirmed>Zea mays subsp mays</NameConfirmed>
    </Name>
    <Name>
      <NameBankID>5416216</NameBankID>
      <NameConfirmed>Zea mays subsp. mays</NameConfirmed>
    </Name>
  </NameResult>
</NameResponse>
JSON:
{
  "Status":"ok",
  "ErrorMessage":null,
  "NameResult":[
    { "NameBankID":3875305, "NameConfirmed":"Zea mays", "Titles":null },
    { "NameBankID":5416258, "NameConfirmed":"Zea mays ceratina", "Titles":null },
    { "NameBankID":5416273, "NameConfirmed":"Zea mays convar. ceratina", "Titles":null },
    { "NameBankID":5416216, "NameConfirmed":"Zea mays subsp mays", "Titles":null },
    { "NameBankID":5416216, "NameConfirmed":"Zea mays subsp. mays", "Titles":null },
    { "NameBankID":5416232, "NameConfirmed":"Zea mays tunicata", "Titles":null }
  ]
}
NameGetDetail
Returns the publication details for the specified NameBankID. The required “nameBankID” parameter identifies the NameBankID for which to retrieve publication details.
Requests
SOAP:
NameGetDetail("4906323")
HTTP returning XML:
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameGetDetail&nameBankID=4906323&format=xml
HTTP returning JSON (with and without a user-specified callback):
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameGetDetail&nameBankID=4906323&format=json
http://www.biodiversitylibrary.org/services/name/NameService.ashx?op=NameGetDetail&nameBankID=4906323&format=json&callback=MyCallback
Responses
XML:
<?xml version="1.0" encoding="utf-8"?>
<NameResponse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <Status>ok</Status>
  <NameResult>
    <NameBankID>4906323</NameBankID>
    <NameConfirmed>Ternatea</NameConfirmed>
    <Titles>
      <Title>
        <TitleID>340</TitleID>
        <MarcBibID>b11931073</MarcBibID>
        <PublicationTitle>Bulletin of the Torrey Botanical Club.</PublicationTitle>
        <PublicationDetails>New York : Torrey Botanical Club, 1870-</PublicationDetails>
        <BPH>284.15</BPH>
        <TL2>8194</TL2>
        <Abbreviation>Bull. Torrey Bot. Club</Abbreviation>
        <TitleUrl>http://www.biodiversitylibrary.org/title/b11931073</TitleUrl>
        <Items>
          <Item>
            <ItemID>8004</ItemID>
            <BarCode>21753000029560</BarCode>
            <MarcItemID>i12323901</MarcItemID>
            <CallNumber>QK1 .B9673</CallNumber>
            <VolumeInfo>1899 v. 26</VolumeInfo>
            <ItemUrl>http://www.biodiversitylibrary.org/item/21753000029560</ItemUrl>
            <Pages>
              <Page>
                <PageID>710633</PageID>
                <Year>1899</Year>
                <Volume>26</Volume>
                <Issue>12</Issue>
                <Prefix>Page</Prefix>
                <Number>658</Number>
                <PageUrl>http://www.biodiversitylibrary.org/page/710633</PageUrl>
                <ThumbnailUrl>http://images.mobot.org/viewer/viewerthumbnail.asp?cat=botanicus7&client=b11931073/21753000029560/jp2&image=21753000029560_0774.jp2</ThumbnailUrl>
                <ImageUrl>http://images.mobot.org/viewer/vieweronly.asp?cat=botanicus7&client=b11931073/21753000029560/jp2&image=21753000029560_0774.jp2</ImageUrl>
                <PageTypes>
                  <PageType>
                    <PageTypeName>Text</PageTypeName>
                  </PageType>
                  <PageType>
                    <PageTypeName>Index</PageTypeName>
                  </PageType>
                </PageTypes>
              </Page>
            </Pages>
          </Item>
        </Items>
      </Title>
    </Titles>
  </NameResult>
</NameResponse>
JSON:
{
  "Status":"ok",
  "ErrorMessage":null,
  "NameResult": {
    "NameBankID":4906323,
    "NameConfirmed":"Ternatea",
    "Titles":[
      {
        "TitleID":340,
        "MarcBibID":"b11931073",
        "PublicationTitle":"Bulletin of the Torrey Botanical Club.",
        "PublicationDetails":"New York : Torrey Botanical Club, 1870-",
        "Author":null,
        "BPH":"284.15",
        "TL2":null,
        "Abbreviation":"Bull. Torrey Bot. Club",
        "TitleUrl":"http://www.biodiversitylibrary.org/title/b11931073",
        "Items":[
          {
            "ItemID":7997,
            "BarCode":"31753002261557",
            "MarcItemID":"i12323834",
            "CallNumber":"QK1 .B9673",
            "VolumeInfo":"1892 v.
19", "ItemUrl":" http://www.biodiversitylibrary.org/item/31753002261557", "Pages":[ { "PageID":653636, "Year":"1892", "Volume":"19", "Issue":"2", "Prefix":"Page", "Number":"56", "PageUrl":" http://www.biodiversitylibrary.org/page/653636", "ThumbnailUrl":" http://images.mobot.org/viewer/viewerthumbnail.asp?cat=botanicus6&client=b11931073/31753002261557/jp2&image=31753002261557_0083.jp2", "ImageUrl":" http://images.mobot.org/viewer/vieweronly.asp?cat=botanicus6&client=b11931073/31753002261557/jp2&image=31753002261557_0083.jp2", "PageTypes":[ { "PageTypeName":"Text" } ] } ] }, { "ItemID":8004, "BarCode":"21753000029560", "MarcItemID":"i12323901", "CallNumber":"QK1 .B9673", "VolumeInfo":"1899 v. 26", "ItemUrl":" http://www.biodiversitylibrary.org/item/21753000029560", "Pages":[ { "PageID":710633, "Year":"1899", "Volume":"26", "Issue":"12", "Prefix":"Page", "Number":"658", "PageUrl":" http://www.biodiversitylibrary.org/page/710633", "ThumbnailUrl":" http://images.mobot.org/viewer/viewerthumbnail.asp?cat=botanicus7&client=b11931073/21753000029560/jp2&image=21753000029560_0774.jp2", "ImageUrl":" http://images.mobot.org/viewer/vieweronly.asp?cat=botanicus7&client=b11931073/21753000029560/jp2&image=21753000029560_0774.jp2", "PageTypes":[ { "PageTypeName":"Text" }, { "PageTypeName":"Index" } ] } ] } ] } ] } } |
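The paging loop described for NameList can be sketched in Python against the HTTP/JSON interface. This is only a minimal sketch under stated assumptions: `build_url`, `fetch_json` and `iter_names` are hypothetical helper names (not part of the Name Service), the total is assumed to come from NameCount as in the pseudocode, and the endpoint is assumed to still respond in the JSON shape shown in the examples.

```python
import json
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint copied from the example URLs; whether it is still live is an assumption.
BASE_URL = "http://www.biodiversitylibrary.org/services/name/NameService.ashx"


def build_url(op: str, **params: object) -> str:
    """Build a request URL for one operation, always asking for JSON."""
    query = {"op": op, **params, "format": "json"}
    return BASE_URL + "?" + urlencode(query)


def fetch_json(url: str) -> dict:
    """Hypothetical helper: GET the URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_names(
    total: int,
    batch_size: int = 1000,
    fetch: Callable[[str], dict] = fetch_json,
) -> Iterator[dict]:
    """Yield every name record by calling NameList one batch at a time.

    `total` is assumed to be the value returned by NameCount.
    Rows are assumed to be numbered from 1, as in the examples.
    """
    start_row = 1
    while start_row <= total:
        reply = fetch(build_url("NameList", startRow=start_row, batchSize=batch_size))
        if reply.get("Status") != "ok":
            raise RuntimeError(reply.get("ErrorMessage") or "NameList request failed")
        # "NameResult" is a list of records with NameBankID and NameConfirmed.
        yield from reply.get("NameResult") or []
        start_row += batch_size
```

Passing the fetch function as a parameter keeps the paging logic separate from the network call, so the loop can be exercised with canned responses. The same `build_url` helper covers the other operations, for example `build_url("NameSearch", name="zea mays")` or `build_url("NameGetDetail", nameBankID=4906323)`.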
|
'Discovered Bibliographies' through Natural Language Processing algorithms - Posted: 11/21/2007

"Names, especially those ascribed to organisms, serve as a primary entry point into the scientific, medical, and technical literature..." - Garrity, Lyons, 2003, Future Proofing Biological Nomenclature

A characteristic of the Biodiversity Heritage Library (BHL) that distinguishes it from other mass digitization projects is the incorporation of service-based algorithms to identify scientific name strings throughout digitized content. These 'taxonomically intelligent' services, powered by uBio.org's TaxonFinder and NameBank, have been incorporated into the BHL Portal to provide names-based interfaces into taxonomic literature. To begin a search, visit http://www.biodiversitylibrary.org/NameSearch.aspx, or view an example 'discovered bibliography' for Tapirus bairdi (Baird's Tapir), including an illustration, at http://www.biodiversitylibrary.org/name/Tapirus_bairdi. The ability to generate these 'discovered bibliographies' for taxa will enable users to data mine taxonomic literature for references and resources in ways not previously possible.

How it works

Each digitized page image in BHL has an accompanying OCR text file. As users navigate to a page, the uncorrected OCR file is sent to uBio's TaxonFinder, which identifies text strings that match the characteristics of Latin binomials. Those potential name strings are then compared to the more than 10.7 million names in uBio's NameBank, and the results, both matched and unmatched, are stored in the BHL database. Because NameBank is a growing repository, BHL also runs automated processes to reindex pages at regular intervals.

What we've found

As of 20 Nov 2007, more than 6.8 million potential name strings have been identified throughout the BHL corpus, with more than 3.8 million matched to a corresponding NameBank identifier. There are more than 431,000 unique names within that 3.8 million set. Of those, more than 156,000 are known by a single occurrence. These results will be evaluated more thoroughly in the coming months to identify potential errors, such as false positives, and to work out how to refine the TaxonFinder algorithm to reduce them.

Caveat: These results are generated from uncorrected OCR, which ranges in quality from pretty good (contemporary publications, such as modern issues of Rhodora) to downright terrible (18th century Latin texts, such as Species Plantarum). Again, further evaluation is required to determine the full scope of this problem.

Where we're headed

To see a simple example of how this can be used from external sites, check out the 'External Links' at the bottom of the Wikipedia article for Mimosa pudica L., the sensitive plant: http://en.wikipedia.org/wiki/Mimosa_pudica

Up next is development of a service layer on top of the names index so that other application providers can query & display 'discovered bibliographies' within their own applications. This service will be deployed in early 2008. Update: these services are now available for use.

Chris Freeland
chris.freeland (at) mobot (dot) org |
|
Three Hundred Years of Linnaean Taxonomy - Posted: 11/13/2007 The Smithsonian's National Museum of Natural History hosted a day-long symposium to celebrate 300 years of Linnaean taxonomy. In addition to the symposium, the museum featured an exhibition of a first edition of Linnaeus' Systema Naturae. The exhibition, "A Tribute to Carl Linnaeus, 1707-1778" (November 13-14), featured the author's own copy of Systema Naturae (courtesy of the Swedish Embassy), with illustrations by Georg Dionysius Ehret. At the evening reception, the Biodiversity Heritage Library displayed the online version of the 1758 edition of Systema (from the Missouri Botanical Garden Library), and there was also an appearance by Linnaeus [as envisioned by Hans Odöö]. The Linnaean Systema was previously on display at the LuEsther T. Mertz Library of the New York Botanical Garden (November 8-10). |
|
Flora, Fauna, and Fine Books - Posted: 11/9/2007 The latest issue of Fine Books & Collections (November/December 2007) includes an excellent article on the Biodiversity Heritage Library. "Flora and Fauna: Creating a Global Library of Life, One Digital Page at a Time" by Rebecca Rego Barry and Scott Brown gives a thorough overview of the BHL project. It includes interviews with Doug Holland (Director, Missouri Botanical Garden Library), Graham Higley (Chair, BHL board and Head of Library and Information Services, Natural History Museum, London), and Tom Garnett (BHL Director). Bibliophiles will enjoy the illustration from Herbarius (1484) from the Missouri Botanical Garden (MBG) Library collections. Doug Holland also steps out from behind his desk to pose with a portion of the MBG rare book collection. - Martin Kalfatovic |
|
BHL Portal, an early review - Posted: 11/7/2007
This site displays an elegantly designed simplicity that the web developer in me finds irresistible. It’s a marketing nightmare, but a researcher’s dream - a system to quickly and easily find the information you want with minimal distractions. Is this any way to run an archive? You bet it is!
The above quote is from the "Family Matters" blog, a site dedicated to family historians. And no, it's not discussing the latest and greatest genealogy site, but rather the Biodiversity Heritage Library portal. This is an encouraging notice, not only for the outstanding usability and functionality that Chris Freeland and his team at the Missouri Botanical Garden are building into the site, but also because it suggests the BHL literature will have uses beyond the taxonomic community, in areas we can't even think of at this time! - Martin Kalfatovic |
|
More BHL material online - Posted: 10/27/2007 The Biodiversity Heritage Library is currently scanning material at five locations around the world. As materials are scanned, they are deposited directly into the Internet Archive repository. The links below will provide you with feed information on materials from the individual scanning centers as it becomes available: Internet Archive scanned materials will eventually be processed for delivery through the BHL portal site. |
|
BHL Presentation at LITA National Forum - Posted: 10/27/2007 Suzanne C. Pilsk and Martin R. Kalfatovic (Smithsonian Institution Libraries) made an hour-long presentation, "The Biodiversity Heritage Library Mass Digitizing Project: A Grandeur in this View of Digital Libraries", on the BHL at the LITA National Forum held in Denver, Colorado, October 6, 2007. |
|
BHL members attend Open Content Alliance in San Francisco - Posted: 10/27/2007 Eight BHL member staff attended the 3rd Open Content Alliance meeting held in San Francisco on October 17, 2007. In addition to the main meeting, BHL member staff took time to arrange a number of technical meetings with Internet Archive staff, the development team of the OpenLibrary.org project, and others in the Bay Area. The BHL was also featured in the Open Content Alliance article that appeared "above the fold" on the front page of the New York Times (October 22, 2007): "Libraries Shun Deals to Place Books on the Web" |
|
New name finding functionality - Posted: 10/16/2007 We've released new functionality in BHL to allow users to search across all the scientific names we've indexed throughout our digital library and view a bibliography of occurrences - what we're tentatively calling a "discovered bibliography". To view in action, begin here: http://www.biodiversitylibrary.org/NameSearch.aspx
You can search by any taxonomic name, such as Poa annua, or Poaceae, to return results. Next steps will be to allow users to search for a taxonomic name & return results for it and its synonyms, or taxa below. But before implementing that advanced functionality, we'd like to make sure this works well for a single taxonomic name. Please give it a try and leave comments below. --Chris Freeland |
|
2 Million pages! - Posted: 9/13/2007 BHL member libraries have approached the 2 million page mark on digitized taxonomic literature. In addition to the 717,000 pages hosted on the BHL portal and approximately 800,000 pages currently hosted on member library websites, there are now nearly 400,000 pages available through the Internet Archive. Work is underway to integrate all pages within the BHL portal with applied taxonomic intelligence. Internet Archive scanning operations are underway at four locations:
- Natural History Museum (London): 247,357 pages
- Boston Library Consortium (Boston Public Library): 135,482 pages
- University of Illinois, Urbana-Champaign (Fieldiana and others): 12,638 pages
- Smithsonian Institution Libraries: 12,514 pages
|
|
BHL members gather at Missouri Botanical Garden - Posted: 9/13/2007 September 12-13, 2007. St. Louis. Twenty-six BHL member staff gathered at the Missouri Botanical Garden for technical and organizational meetings. BHL members were joined by Jim Edwards (Encyclopedia of Life), Betsy Kruger (University of Illinois, Urbana-Champaign), and Robert Miller (Internet Archive). Right: Dr. Peter Raven (Director, Missouri Botanical Garden) welcomes the BHL to the Garden. |
|