One of the critical insights we came up with in the PILIN persistent identifier project is this: if you want the identifier to persist, it's not enough to just keep updating the URLs that the identifier resolves to. You also want to record, somewhere, a piece of metadata that tells you what the thing identified is, independent of the URLs. That piece of metadata will itself be persistent: it will not be affected by any changes in the service endpoints of your identifier. And it doesn't have to be machine-readable: it can be a description in prose.
- Having that piece of information helps you in disaster recovery. If all your URLs go out the window, you can still use the description to reconstruct how the identifier should resolve (and reformulate the URLs). And you can't really claim persistence if you don't have some kind of disaster recovery.
- Having that piece of information is also critical for archival use of identifiers, after the services they resolved to are no longer accessible. (And persistent identifiers should persist longer than the services they resolve to.)
- Getting to that piece of metadata in itself involves a service, and in itself is a resolution. (That means it can integrate into the current XRDS as a service endpoint.)
- But if you entrust that piece of metadata to a service outside your identifier management system, you are putting persistence at risk.
Let me first illustrate this principle with the technology we used in PILIN, Handle.
info:hdl:102.100.272/0N8J991QH
resolves to the Handle record:
URL: https://www.pilin.net.au
EMAIL: opoudjis@gmail.com
HS_ADMIN: [admin bit masks]
I can update my URLs and email addresses as things change, but on its own that's pretty poor information management. If I disappear and the DNS registration expires, nobody will be able to reconstruct what the identifier resolved to. If someone's found the Handle
102.100.272/0N8J991QH
on a printout at some point in the distant future (like, say, 5 years), and they find a Handle resolver which gives the information above, they too are none the wiser about what the Handle was supposed to identify. Because the Handle was supposed to be persistent, it has failed.
But Handle also provides a DESCription field, which allows you to say what is being identified:
URL: https://www.pilin.net.au
EMAIL: opoudjis@gmail.com
HS_ADMIN: [admin bit masks]
DESC: Website for the PILIN project (Persistent Linking Infrastructure),
funded by the Australian Government to investigate policy and technology
for digital identifier persistence.
That description is at least a fallback if the URL does not get maintained. I'd argue further that the description is the real resolution of the identifier (as PILIN defined resolution this year: information distinctive to the thing identified, differentiating it from all other things). The description actually tells you what is being identified, and it stays the same even if the URL location of the website does not. It gives a persistent resolution of the Handle, which is not constrained by a particular service or protocol.
Moreover, if the description is part of the Handle record, then it will persist so long as the Handle record itself persists. It does not depend on an external agent to guarantee it sticks around. Which is what you want for the metadata that will guarantee the persistence of the Handle.
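To make the fallback concrete, here is a minimal sketch in Python. It is not the Handle System's API: `HandleRecord`, `resolve` and `is_reachable` are names made up for illustration, and the record is modelled as a plain dictionary of typed values. The only point is that when no URL is live any more, the description stored in the record itself is still there to say what was identified.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class HandleRecord:
    """Toy stand-in for a Handle record: values keyed by type name."""
    handle: str
    values: Dict[str, str] = field(default_factory=dict)


def resolve(
    record: HandleRecord,
    is_reachable: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return the URL while it is live, else fall back to the description."""
    url = record.values.get("URL")
    if url is not None and is_reachable(url):
        return ("URL", url)
    desc = record.values.get("DESC")
    if desc is not None:
        # The description lives in the record itself, so it survives
        # the loss of every service endpoint.
        return ("DESC", desc)
    return None  # nothing left to say what the Handle identified


record = HandleRecord(
    handle="102.100.272/0N8J991QH",
    values={
        "URL": "https://www.pilin.net.au",
        "EMAIL": "opoudjis@gmail.com",
        "DESC": "Website for the PILIN project (Persistent Linking Infrastructure)",
    },
)

# Simulate the disaster case: every URL is treated as dead.
kind, value = resolve(record, is_reachable=lambda url: False)
print(kind, value)
```

The reachability check is passed in as a function so the sketch needs no network access. A record without a `DESC` value returns `None` in the disaster case, which is exactly the failure described above.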
If on the other hand I put my descriptions in an external service, like http://description-of.org/hdl/102.100.272/0N8J991QH , then I will lose my persistent descriptions if http://description-of.org goes down: I am dependent on http://description-of.org for the long-term persistence of my identifiers. And I should not be dependent: persisting my 102.100.272/0N8J991QH Handle is my responsibility (for which I am accountable), and it's what I set up my identifier management system to do.
In the next post, we run that notion against XRI.