What is an LSID?
The Life Sciences Identifier (LSID) is an I3C and OMG Life Sciences Research (LSR) Uniform Resource Name (URN) specification in progress.
The LSID concept introduces a straightforward approach to naming and
identifying data resources stored in multiple, distributed data stores
in a manner that overcomes the limitations of naming schemes in use
today. Almost every public, internal, or department-level data store
today has its own way of naming individual data resources, making
integration between different data sources a tedious, never-ending
chore for informatics developers and researchers.
By defining a simple, common way to identify and access biologically
significant data, whether that data is stored in files, relational
databases, in applications, or in internal or public data sources, LSID
provides a naming standard that underpins wide-area science and
interoperability.
A detailed LSID URN naming specification is available at the OMG LSR.
What does an LSID look like?
An LSID conforms to the URN standards defined by the IETF.
Every LSID consists of up to five parts: the Namespace Identifier
(NID); the root DNS name of the issuing authority; the namespace chosen
by the issuing authority; the object id unique in that namespace; and
finally an optional revision id for storing versioning information.
Each part is separated by a colon to make LSIDs easy to parse.
Here are a few examples:
urn:lsid:pdb.org:1AFT:1
This is the first version of the 1AFT protein in the Protein Data Bank.
urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434
This references a PubMed article.
urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2
This refers to the second version of an entry in GenBank.
These LSIDs name and refer to one unchanging data object each.
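Because the parts are colon-separated, splitting an LSID takes only a few lines of code. The following is a minimal Python sketch, not part of any official LSID toolkit: the names LSID and parse_lsid are hypothetical, and the checks assume the layout described above (urn, then lsid, then authority, namespace, object id, and an optional revision id).

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    """Hypothetical container for the parts of an LSID."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split an LSID string on colons and return its parts.

    Assumes the layout urn:lsid:authority:namespace:object[:revision].
    """
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")

    scheme, nid, authority, namespace, object_id = parts[:5]
    # The "urn" and "lsid" prefixes are compared case-insensitively here.
    if scheme.lower() != "urn" or nid.lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {text!r}")

    # The revision id is optional; None means no revision was given.
    revision = parts[5] if len(parts) == 6 else None
    return LSID(authority, namespace, object_id, revision)


if __name__ == "__main__":
    for example in (
        "urn:lsid:ncbi.nlm.nih.gov:pubmed:12571434",
        "urn:lsid:ncbi.nlm.nih.gov:GenBank:T48601:2",
    ):
        print(parse_lsid(example))
```

The sketch only checks the shape of the string. It does not verify that the authority is a real DNS name or that the named object exists.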
Unlike the familiar URLs of the World Wide Web, LSIDs are location
independent. This means that a program or a user can be certain that
what they are dealing with is exactly the same data if the LSID of any
object is the same as the LSID of another copy of the object obtained
elsewhere.
The problem with URLs is that they always point to a particular web
server (which may not always be in service) and, worse, that the
contents referred to by a URL often change; think about your favorite
news URL. For researchers and legal authorities the requirement to be
able to exactly reproduce any observations and experiments based on a
data object means that it is essential that data be uniquely named and
available from many cached sources.
What is a Resolver?
An LSID Resolver is a software system that implements an agreed LSID
resolution protocol so that higher-level software can locate and
access the data uniquely named by any LSID URN.
At a minimum, this software system usually comprises two parts
that communicate over a network. The first part is server software
operated by any party that wishes to make data available and that has
assigned LSID names to this data. This party is also known as the LSID
issuing authority. The second part is software that usually executes on
a client that can communicate over a network using an agreed protocol
with the LSID authority server in order to retrieve the data or
metadata associated with a particular LSID instance. A schematic of the
client and server network interaction can be found at the I3C.
The amount of data generated in the Life Sciences field is estimated
to be doubling every month. The general adoption of the LSID naming
specification and an agreed resolution protocol will provide a standard
method for locating and accessing these resources for the industry.
Online Resolver
A basic LSID resolver can be accessed
on this web site. Append the LSID to http://www.lsid.info/,
for example http://www.lsid.info/urn:lsid:marinespecies.org:taxname:138474.
The format of data returned depends on what the issuing authority
supports, and what is requested by your client. Many authorities
only provide machine-readable data. In this case, the detailed
resolver may be useful.
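As an illustration of the URL pattern above, here is a small Python sketch that builds the resolver URL for an LSID and optionally fetches it. The function names resolver_url and fetch_lsid are hypothetical. Sending an Accept header to request a particular format, and the application/rdf+xml media type used as its default, are assumptions rather than documented behavior of this resolver.

```python
import urllib.request

# Base URL of the online resolver, as given in the text above.
RESOLVER_BASE = "http://www.lsid.info/"


def resolver_url(lsid: str) -> str:
    """Return the resolver URL formed by appending the LSID to the base URL."""
    if not lsid.lower().startswith("urn:lsid:"):
        raise ValueError(f"not an LSID URN: {lsid!r}")
    return RESOLVER_BASE + lsid


def fetch_lsid(lsid: str, accept: str = "application/rdf+xml",
               timeout: float = 10.0) -> bytes:
    """Fetch whatever the resolver returns for the given LSID.

    The Accept header is an assumption about how a client asks for a
    particular format; the issuing authority may ignore it.
    """
    request = urllib.request.Request(
        resolver_url(lsid), headers={"Accept": accept}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(resolver_url("urn:lsid:marinespecies.org:taxname:138474"))
```

Only resolver_url is exercised when the script is run directly. fetch_lsid needs network access, and what it returns depends on the issuing authority.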
The source code for the resolver is available on GitHub.