2008-11-19

XRI, Handle, and persistent descriptors, Pt 2

(Back to Pt 1)

Let's now look at our favourite XRI, =drummond. If I retrieve the XRDS for =drummond, through the resolution service http://xri.net/=drummond?_xrd_r=application/xrds+xml, I get (at current writing!):

  • A canonical (and persistent!) i-number corresponding to the i-name =drummond, =!F83.62B1.44F.2813
  • A Skype call service endpoint
  • A Skype chat service endpoint
  • A contact webpage service endpoint
  • A forwarding webpage service endpoint
  • An OpenID signon endpoint


The XRDS does not anywhere say what =drummond is identifying; just some services associated with =drummond contingently. I could infer Drummond's full name from the Skype username being drummondreed, but that's hardly failsafe. What I would like is access to some text like...

VP Infrastructure at Parity Communications (www.parity.inc), Chief Architect to Cordance Corporation (www.cordance.net), co-chair of the OASIS XRI and XDI Technical Committees (www.oasis-open.org), board member of the OpenID Foundation (www.openid.net) and the Information Card Foundation (www.informationcard.net), ...


Oh, as in the contact webpage that http://xri.net/=drummond resolves to, http://2idi.com/contact/=drummond . Well, yes, but I did not know ahead of time that the contact webpage would have the information I wanted, with enough bio information to differentiate Drummond from other candidates: it's a contact page, not a bio page. (Drummond providing bio info is a lagniappe, which simply proves he knows about identity issues.)

What I want is some consistent way of getting from =drummond to a description of what =drummond identifies. XRDS is a descriptor already, which is why =drummond resolves to it: it describes the service interfaces that get to =drummond. But it's a descriptor of service endpoints and synonyms; it still doesn't persistently describe Drummond, the way the DESC field does in Handle. (Or would, if anyone ever used DESC).

Now, the technology-independent description of what is being described is needed for persistent identifiers; it's not as important for reassignable identifiers. So even if =drummond doesn't take me directly to a persistent description, persistence is still satisfied if =drummond takes me to =!F83.62B1.44F.2813, and =!F83.62B1.44F.2813 takes me to a persistent description. XRI allows =drummond and =!F83.62B1.44F.2813 to have different XRDS (because they can have different services attached)—though typically when an i-name is registered against an i-broker, the XRDS is the same. The requirement would be for the persistent description to be accessed through the i-number's XRDS, which may not be the same as the i-name's.

The easy way of adding a persistent description to an XRDS is treating it as yet another service endpoint on the identifier: I give you an identifier, I get back a persistent description. Drummond's contact page already accidentally provides the description. What I'd like is some canonical class of service for getting to the persistent description. It could be something as simple as an +i-service*(+description)*($v*1.0) service type, to match the xri://+i-service*(+contact)*($v*1.0) type which gave me Drummond's contact page.
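As a rough sketch of what a client could do with such a service type, the Python below pulls the canonical i-number and the endpoints for each service type out of an XRDS document. The XRDS here is a hand-written sample, not the live response for =drummond; the +description service type is only the proposal made above, and the example.org URI and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# XRD 2.0 namespace, as used in XRDS documents.
XRD_NS = "xri://$xrd*($v*2.0)"
# Hypothetical service type proposed in the text; not a registered type.
DESCRIPTION_TYPE = "xri://+i-service*(+description)*($v*1.0)"
CONTACT_TYPE = "xri://+i-service*(+contact)*($v*1.0)"

# Hand-written sample for illustration only, not a real resolver response.
SAMPLE_XRDS = """
<xrds:XRDS xmlns:xrds="xri://$xrds" xmlns="xri://$xrd*($v*2.0)">
  <XRD>
    <CanonicalID>=!F83.62B1.44F.2813</CanonicalID>
    <Service>
      <Type>xri://+i-service*(+contact)*($v*1.0)</Type>
      <URI>http://2idi.com/contact/=drummond</URI>
    </Service>
    <Service>
      <Type>xri://+i-service*(+description)*($v*1.0)</Type>
      <URI>http://example.org/description/</URI>
    </Service>
  </XRD>
</xrds:XRDS>
"""


def q(tag):
    """Qualify a tag name with the XRD namespace."""
    return "{%s}%s" % (XRD_NS, tag)


def children_text(elem, tag):
    """Text of every direct child of elem with the given XRD tag."""
    return [c.text.strip() for c in elem if c.tag == q(tag) and c.text]


def parse_xrd(xrds_text):
    """Return (canonical_id, {service_type: [uris]}) for the last XRD."""
    root = ET.fromstring(xrds_text)
    xrd = [e for e in root if e.tag == q("XRD")][-1]
    canonical = children_text(xrd, "CanonicalID")
    services = {}
    for svc in xrd:
        if svc.tag != q("Service"):
            continue
        uris = children_text(svc, "URI")
        for service_type in children_text(svc, "Type"):
            services.setdefault(service_type, []).extend(uris)
    return (canonical[0] if canonical else None), services


canonical_id, services = parse_xrd(SAMPLE_XRDS)
# A client after "what does this identify?" asks only for the description type.
description_uris = services.get(DESCRIPTION_TYPE, [])
```

The point of the canonical service type is in the last line: the client does not have to guess which endpoint happens to carry a description, it asks for that type by name.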

This description service is actually the reverse of David Booth's http://thing-described-by.org/. David starts with the URL for a description as a web page, http://dbooth.org/2005/dbooth/, and creates an abstract identifier http://thing-described-by.org?http://dbooth.org/2005/dbooth/ for the entity described by the web page. XRI starts with @xri*david.booth (I can't see David actually registering his own XRI), which is already an inherently abstract identifier, unlike HTTP URIs.

Getting from there back to the description http://dbooth.org/2005/dbooth/ is a resolution; we could access it through http://is-description-of.org/?@xri*david.booth . (We would likely access it through normal HXRI proxy http://xri.net/@xri*david.booth too; the point is, we're constraining the HTTP resolution to a specific kind of representation. David Is Not His Homepage.)

I'll note that David's description is worth emulating: "The URI http://thing-described-by.org?http://dbooth.org/2005/dbooth/ hereby acts as a globally unique name for the natural person named David Booth with email address dbooth@hp.com (as of 1-Jan-2005)."


The catch with that approach is, we're now relying on an external service to guarantee the persistent metadata for our persistent identifier. And as I argued in the previous post, you don't want to do that: your system for persistence should be self-contained, since you are accountable for it. It is easier for the description to persist if it sits inside the i-number's XRDS than outside it.

Even that does not give much of a guarantee of archival-level persistence. It is a feature and not a bug of XRI that users manage their own XRDS for personal i-names: the i-broker refers resolution queries back out to the user's XRDS, and promises only not to reassign the i-number. i-brokers do not commit to registering their own persistent metadata against the i-number. But once the user's XRDS goes offline, no one is able to resolve the i-name or the i-number. The trick with persistence in identifiers is, it's always persistence of something. Once the service endpoints for your identifier go away, you lose persistence of actionability. Not reassigning the i-number maintains persistence of reference (the i-number can't start referring to something else). But without a description accessible down the road, it does not maintain persistence of resolution (a user finding out what it referred to, even if no service endpoints are available).

Maybe that's OK: XRIs are addressing a particular issue, namely digital identity across multiple services. If the user is trusted to maintain their digital identity, then XRI is not geared to address long-term archival needs. In the same way, the user-centered practice of self-archiving has nothing to do with long-term archives (as Stevan Harnad has to keep repeating, with only himself to blame for introducing the term in the first place.)

Oh, can't resist: Wikipedia entry on self-archiving:

Bwahah. And don't get me started on an "archivangelism" with its emphasis on "arch"...

XRI, Handle, and persistent descriptors, Pt 1

This post is to suggest that XRDS (or equivalent) includes not just service endpoints, but also persistent descriptions—potentially as a distinct service endpoint. It takes a while to build up the argument, so I'm splitting it in parts.

One of the critical insights we came up with in the PILIN persistent identifier project is: if you want the identifier to persist, it's not enough to just keep updating URLs that the identifier resolves to. You want to record somewhere a piece of metadata, that tells you what the thing identified is—independent of the URLs. That piece of metadata will itself be persistent: it will not be affected by any changes in the service endpoints of your identifier. But it doesn't have to be machine-readable: it can be a description in prose.


  • Having that piece of information helps you in disaster recovery. If all your URLs go out the window, you can still use the description to reconstruct how the identifier should resolve (and reformulate the URLs). And you can't really claim persistence if you don't have some kind of disaster recovery.
  • Having that piece of information is also critical for archival use of identifiers—after the services resolved to are no longer accessible. (And persistent identifiers should persist longer than the services they had resolved to.)
  • Getting to that piece of metadata in itself involves a service, and in itself is a resolution. (That means it can integrate into the current XRDS as a service endpoint.)
  • But if you entrust that piece of metadata to a service outside your identifier management system, you are putting persistence at risk.


Let me first illustrate this principle with the technology we used in PILIN, Handle.

info:hdl:102.100.272/0N8J991QH 


resolves to the Handle record:


URL: https://www.pilin.net.au
EMAIL: opoudjis@gmail.com
HS_ADMIN: [admin bit masks]


I can update my URLs and Emails as things change, but that's pretty poor information management. If I disappear, and the DNS registration expires, I'm not allowing anyone to reconstruct what the identifier resolved to. If someone's found the Handle 102.100.272/0N8J991QH on a printout at some point in the distant future (like, say, 5 years), and they find a Handle resolver which gives the information above, they too are none the wiser about what the Handle was supposed to identify. Because the Handle was supposed to be persistent, it has failed.

But Handle also provides a DESCription field, which allows you to say what is being identified:


URL: https://www.pilin.net.au
EMAIL: opoudjis@gmail.com
HS_ADMIN: [admin bit masks]
DESC: Website for the PILIN project (Persistent Linking Infrastructure),
funded by the Australian Government to investigate policy and technology
for digital identifier persistence.


That description is at least a fallback if the URL does not get maintained. I'd argue further that the description is the real resolution of the identifier (as PILIN defined resolution this year: information distinctive to the thing identified, differentiating it from all other things). The description actually tells you what is being identified, and it stays the same even if the URL location of the website does not. It gives a persistent resolution of the Handle, which is not constrained by a particular service or protocol.

Moreover, if the description is part of the Handle record, then it will persist so long as the Handle record itself persists. It does not depend on an external agent to guarantee it sticks around. Which is what you want for the metadata that will guarantee the persistence of the Handle.
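To make the fallback concrete, here is a minimal Python sketch. It models the Handle record above as a plain dict and falls back to DESC when the URL is no longer reachable. The resolve function and the is_reachable callback are hypothetical names for illustration; this is not the Handle System API.

```python
# The Handle record from the example above, modelled as a plain dict.
# HS_ADMIN is left out because it plays no part in the fallback.
RECORD = {
    "URL": "https://www.pilin.net.au",
    "EMAIL": "opoudjis@gmail.com",
    "DESC": (
        "Website for the PILIN project (Persistent Linking Infrastructure), "
        "funded by the Australian Government to investigate policy and "
        "technology for digital identifier persistence."
    ),
}


def resolve(record, is_reachable):
    """Return ("URL", url) while the URL is live, else ("DESC", description).

    is_reachable is a caller-supplied check (hypothetical), so the sketch
    needs no network access.
    """
    url = record.get("URL")
    if url and is_reachable(url):
        return ("URL", url)
    if "DESC" in record:
        return ("DESC", record["DESC"])
    raise LookupError("record has neither a live URL nor a description")


# Five years on, the DNS registration has lapsed:
kind, value = resolve(RECORD, is_reachable=lambda url: False)
```

With the DESC field present, the second branch still tells the reader what the Handle identified; without it, the record has nothing left to offer once the URL dies.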

If on the other hand I put my descriptions in an external service, like http://description-of.org/hdl/102.100.272/0N8J991QH , then I will lose my persistent descriptions if http://description-of.org goes down: I am dependent on http://description-of.org for the long-term persistence of my identifiers. And I should not be dependent: persisting my 102.100.272/0N8J991QH Handle is my responsibility (for which I am accountable), and it's what I set up my identifier management system to do.

In the next post, we run that notion against XRI.

Introduction to XRI

Yet another introduction to XRI, which I presented at the !DEA 2008 workshop.

Introduction to XRI

2008-11-11

Using UML Sequence diagrams to derive e-Framework Service Usage Models

The e-Framework is a documentation standard for service-oriented system development, that I've been involved in. It has a registry of abstract services (service genres) and services profiled to communities and standards (service expressions). It also has service usage models (SUM), which present the services needed to realise a system, by lining up the services and data sources that each business process uses in the system. Like this:



That's just the SUM diagram; there is a whole document that goes with it, explaining how the business processes map to services via system functions, the usage scenarios, what situations the SUM is applicable to, design considerations, and so on. But the SUM diagram already gives an overview of the wherewithal for putting such a system together. And so long as the services and data sources are kept reasonably abstract, the diagram can be used to compare different systems from different domains, and work out their common infrastructure requirements.

In a project I've been working on recently for Link Affiliates, I had to come up with a range of implementation options for the solution I was describing, and use the e-Framework to do so. I had been describing the solutions with UML Sequence diagrams. The following is a way of mapping from UML Sequence diagrams to e-Framework SUMs that I came up with; it's pretty obvious, but I thought it might be of interest anyway.

I am assuming you're already familiar with UML Sequence diagrams.



UML Sequence diagrams are a good match for service usage models, because both are concerned with how a system interacts with the outside world. The interactions are drawn explicitly in the UML; in the SUM, interactions happen through the services that the system exposes. (That's why it's a service usage model: it's how external users interact with the system.)

We make the following assumptions:


  1. Any sequence of interactions initiated by a human actor corresponds to a business process meaningful to a human. Some sequences initiated by computer agents are also potentially meaningful business processes.
  2. All interactions between objects are through services. (We are taking a service-oriented view of the interactions, after all.)
  3. All objects sending or receiving data through messages are potentially data sources and data sinks. (The two are not differentiated in the e-Framework.)


Given the first assumption, we can break up a large sequence of interactions into several business processes, depending on how actors intervene:



Of course, this step is cheating: you probably already have an idea of what business processes you want to see. Anyway.
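The splitting step can be sketched in Python, under a deliberately crude reading of the first assumption: every message sent by a human actor starts a new business process. The message list is made up for illustration (only the "Part Name" and "Ordering" messages and the swimlane names come from the example in this post), and the function name is hypothetical.

```python
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    receiver: str
    label: str


# Illustrative sequence; "Place order" and "Confirm dispatch" are invented.
SEQUENCE = [
    Message("Customer", "Orders system", "Place order"),
    Message("Orders system", "Warehouse Management system", "Part Name"),
    Message("Orders system", "Warehouse Manager", "Ordering"),
    Message("Warehouse Manager", "Warehouse Management system", "Confirm dispatch"),
]
HUMAN_ACTORS = {"Customer", "Warehouse Manager"}


def split_into_processes(messages, human_actors):
    """Start a new business process at each message sent by a human actor."""
    processes = []
    for msg in messages:
        if msg.sender in human_actors or not processes:
            processes.append([])
        processes[-1].append(msg)
    return processes


processes = split_into_processes(SEQUENCE, HUMAN_ACTORS)
```

Here the customer's order and everything it triggers form one process, and the warehouse manager's intervention starts a second. Real diagrams will need judgement that this rule does not capture, such as a human sending several messages within what is really one process.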

Given the next assumption, if you want to know what services your application uses, just read off the messages from the UML diagram. Each of those messages should be communicated as a service, through a defined interface between systems. So the messages are all service calls.

Some provisos:

  • Like we said, the SUM is about how the system interacts with external users and systems. So any interactions within a system are out of scope: they aren't exposed as a service.
  • Some services will in fact involve several subsidiary service interactions. They would be described in a distinct service usage model, which can be modularised out of the current SUM.
  • Return messages are included in the definition of a service; so they do not need to be counted separately.
  • A message forwarding a request from one actor to another may be ignored, as it does not represent a new service instance.

    • For instance, we choose not to model the Ordering message from the Orders system to the Warehouse Manager; that message is really only forwarding the initial order made by the customer, and can instead be counted as service choreography.

  • The e-Framework consolidates services into a minimal-ish vocabulary (at the service genre level). So the messages should be mapped to established types of services wherever possible; the point of the exercise is to compare between systems, and that means the services have to make sense outside their particular business context.

    • So in the example below, "Disambiguate" will actually be done through a search; so that message is counted as an instance of Search.

  • Likewise, if a message is described only in terms of its payload, you will have to come up with a sensible service to match.

    • The message from the Orders system to the Warehouse Management system is described just as "Part Name". Because this is a retrieval of information based on the part name, we describe it explicitly as Search.






Likewise, the swimlanes acting as data sources and sinks are interpreted as e-Framework data sources.
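The provisos above can be sketched as a filter over the messages, again in Python. This is one possible reading and not part of the e-Framework itself: the is_return and is_forward flags, the label-to-genre table, and the rule that every non-human swimlane taking part in an in-scope message counts as a data source are all assumptions, and the "Place Order", "Order confirmation" and "Validate" messages are invented.

```python
from typing import NamedTuple


class Message(NamedTuple):
    sender: str
    receiver: str
    label: str
    is_return: bool = False   # return messages belong to the service already counted
    is_forward: bool = False  # forwarded requests count as choreography


# Assumed mapping from message labels to established service genres.
GENRE = {"Disambiguate": "Search", "Part Name": "Search"}


def read_off(messages, human_actors):
    """Return (services, data_sources) for the in-scope messages."""
    services, data_sources = [], set()
    for m in messages:
        if m.is_return or m.is_forward:
            continue
        if m.sender == m.receiver:
            continue  # internal to one system, so not exposed as a service
        genre = GENRE.get(m.label, m.label)
        if genre not in services:
            services.append(genre)
        for participant in (m.sender, m.receiver):
            if participant not in human_actors:
                data_sources.add(participant)
    return services, sorted(data_sources)


MESSAGES = [
    Message("Customer", "Orders system", "Place Order"),
    Message("Orders system", "Customer", "Order confirmation", is_return=True),
    Message("Orders system", "Orders system", "Validate"),
    Message("Orders system", "Warehouse Management system", "Part Name"),
    Message("Orders system", "Warehouse Manager", "Ordering", is_forward=True),
    Message("Customer", "Orders system", "Disambiguate"),
]
services, data_sources = read_off(MESSAGES, {"Customer", "Warehouse Manager"})
```

"Part Name" and "Disambiguate" both collapse into Search, which is the consolidation into a minimal-ish vocabulary at work: the output names service genres, not the business-specific message labels.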

Now that we know the business processes, the services, and the data sources from our UML Sequence diagram, we only have to line them up into the e-Framework SUM diagram: