2009-06-14

Using UML Component diagrams to embed e-Framework Service Usage Models

Given the background of what embedding SUMs in other SUMs can mean, I'm going to model what that embedding can look like from a systems POV, using UML component diagrams. The tool is somewhat awkward to the task, but I was rather taken with the ball-and-socket representation of interfaces in UML 2.0—even if I have to abandon that notation where it counts. I'm also using this as an opportunity to explore specifying the data sources for embedded SUMs—which may not be the same as the data sources for the embedding SUM.

The task I set myself here is to model, using embedded SUMs, functionality for searching for entries in a collection, annotating those entries, and syndicating the annotations (but not the collection entries themselves).

We can represent what needs to happen in an Activity diagram, which captures the fact that entries and annotations involve two different systems. (We'll model them as distinct data stores):

We can go from that Activity diagram to a simple SUM diagram, capturing the use of four services and two data sources:

But as indicated in the previous post, we want to capitalise on the existence of SUMs describing aspects of collection functionality, and modularise out service descriptions already given in those SUMs (along with the context those SUMs set). So:

where a "Searchable Collection" is a service usage model on searching and reading elements in a collection, and "Shareable Collection" is a service usage model on syndicating and harvesting elements in a collection—and all those services modelled may be part of the same system. We are making an important distinction here: the embedded searchable and shareable collection SUMs are generic, and can be used to expose any number of data sources. We nominate two distinct data sources, and align a different data source to each embedded SUM. So we are making the entries data source searchable, but the annotations data source shareable; and we are not relying on the embedded SUMs to tell us what data sources they talk to, when we do this orchestration.
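As a rough illustration of that orchestration, here is a minimal Python sketch. It assumes each embedded SUM can be stood in for by a class that is handed its data source by the embedding context. The class names, method names, and sample records are all hypothetical; the e-Framework does not prescribe any of them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataSource:
    """A named store of records, keyed by identifier (hypothetical)."""
    name: str
    records: dict[str, str] = field(default_factory=dict)


class SearchableCollection:
    """Stand-in for the embedded Searchable Collection SUM: search and read."""

    def __init__(self, source: DataSource) -> None:
        # The embedding SUM, not the embedded one, decides which source this is.
        self.source = source

    def search(self, term: str) -> list[str]:
        return sorted(
            key for key, text in self.source.records.items()
            if term.lower() in text.lower()
        )

    def read(self, key: str) -> str:
        return self.source.records[key]


class ShareableCollection:
    """Stand-in for the embedded Shareable Collection SUM: syndicate."""

    def __init__(self, source: DataSource) -> None:
        self.source = source

    def syndicate(self) -> list[tuple[str, str]]:
        return sorted(self.source.records.items())


# Two distinct data sources, each aligned to a different embedded SUM.
entries = DataSource("Entries", {"e1": "Service Usage Models",
                                 "e2": "UML component diagrams"})
annotations = DataSource("Annotations", {"a1": "note on e1"})

searchable_entries = SearchableCollection(entries)
shareable_annotations = ShareableCollection(annotations)
```

Neither class knows in advance which data source it will expose; that alignment is made only where the two are instantiated, which is the point being made about the embedding SUM.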

Which is all very well, but what does embedding a SUM actually look like from a running application? I'm going to try to answer that through ball-and-socket. The collection SUM models a software component, which exposes several services for other systems and users to invoke. That software component may be a standalone application, or it may be integrated with other components to build something greater; that flexibility is of course the point of Service Oriented Architecture (and Approaches). The software component exposes a number of services, which can be treated as ports into the component:

And an external component can interface through one or more of those exposed services, giving software integration:

Each service defines its own interface, and the interface to a port is modelled in UML as a realisation of a component (hollow arrowhead): it's the face of the component that the outside world sees:

And outside components that use a port depend on that interface (dashed arrow): the integration cannot happen without that dependency being resolved, so the component using our services depends on our interface:

Exposed services have their interfaces documented in the SUM: that is part of the point of a SUM. But a SUM may not document the interface of just one exposed service, but of several. By default, it documents all exposed services. But if we allow a SUM to model only part of a system's functionality, then we can have different SUMs capturing only subsets of the exposed functionality of a system. By setting up simple, searchable and shareable collections, we're doing just that.

Now a SUM is much more than just an interface definition. But if a single SUM includes the interface definitions for all of Add Read Replace Remove and Search, then we can conflate the interfaces for all those services into a single reference to the searchable collection SUM—where all the interfaces are detailed. We can also have both the simple and the searchable collection SUMs as alternate interfaces into our collection: one gives you search, the other doesn't. (Moreover, we could have two distinct protocols into the collection, so that the distinction may not just be theoretical.)

This is not a well-formed UML diagram, on purpose: the dependency arrows are left hanging, as a reminder that each interface (a SUM) defines several service endpoints into the component. The reason that's not quite right is that a UML interface is specific to a port, and each port has its own interface instance; so a more correct notation would have been to preserve the distinct interface boxes, and use meta-notation to bundle them together into SUMs. Still, the very act of embedding SUMs glosses over the details of which services are being consumed from the embed. So independently of the multiple incoming (and one outgoing) arrows per interface, this diagram is telling us the story we need to tell: a SUM defines bundles of interfaces into a system, and a system may have its interfaces bundled in more than one way.
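The bundling idea can be sketched in Python with structural interfaces. This is only an analogy, assuming each SUM's interface bundle can be approximated by a `typing.Protocol`; the names `SimpleCollectionSUM`, `SearchableCollectionSUM`, `Repository`, `archive`, and `find` are invented for the example.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class SimpleCollectionSUM(Protocol):
    """One bundle of interfaces: Add, Read, Replace, Remove."""
    def add(self, key: str, value: str) -> None: ...
    def read(self, key: str) -> str: ...
    def replace(self, key: str, value: str) -> None: ...
    def remove(self, key: str) -> None: ...


@runtime_checkable
class SearchableCollectionSUM(SimpleCollectionSUM, Protocol):
    """A larger bundle: everything in the simple bundle, plus Search."""
    def search(self, term: str) -> list[str]: ...


class Repository:
    """A single component whose services can be bundled either way."""

    def __init__(self) -> None:
        self._items: dict[str, str] = {}

    def add(self, key: str, value: str) -> None:
        self._items[key] = value

    def read(self, key: str) -> str:
        return self._items[key]

    def replace(self, key: str, value: str) -> None:
        self._items[key] = value

    def remove(self, key: str) -> None:
        del self._items[key]

    def search(self, term: str) -> list[str]:
        return sorted(k for k, v in self._items.items() if term in v)


def archive(collection: SimpleCollectionSUM, key: str, value: str) -> str:
    """A client that depends only on the simple bundle."""
    collection.add(key, value)
    return collection.read(key)


def find(collection: SearchableCollectionSUM, term: str) -> list[str]:
    """A client that depends on the searchable bundle."""
    return collection.search(term)


repo = Repository()
```

The same `Repository` satisfies both bundles; which one a client sees depends only on which bundle it declares a dependency on.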

Let's return to our initial task; we want to search for entries in a collection, annotate those entries, and syndicate the annotations. We can model this with component diagrams, ignoring for now the specifics of the interfaces: we want the functionality identified in the first SUM diagram, of search, read, annotate, and syndicate. In a component diagram, what we want looks like this:

The Entries component exposes search and read services; the Annotations component (however it ends up realised) consumes them. The Annotations component exposes an annotate service to end users, and a syndicate service to other components (wherever they may be).
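A minimal Python sketch of that wiring follows, assuming the required interface (the socket) can be expressed as a `Protocol` and the Entries component is passed in at construction time. All names, and the behaviour of `annotate`, are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Protocol


class EntriesPort(Protocol):
    """Required interface: what the Annotations component needs from Entries."""
    def search(self, term: str) -> list[str]: ...
    def read(self, key: str) -> str: ...


class EntriesComponent:
    """Exposes the search and read services over the entries."""

    def __init__(self, entries: dict[str, str]) -> None:
        self._entries = entries

    def search(self, term: str) -> list[str]:
        return sorted(
            key for key, text in self._entries.items()
            if term.lower() in text.lower()
        )

    def read(self, key: str) -> str:
        return self._entries[key]


class AnnotationsComponent:
    """Consumes search and read; exposes annotate and syndicate."""

    def __init__(self, entries: EntriesPort) -> None:
        self._entries = entries
        self._annotations: list[tuple[str, str]] = []

    def annotate(self, term: str, note: str) -> list[str]:
        """Attach a note to every entry matching the term."""
        hits = self._entries.search(term)
        for key in hits:
            self._entries.read(key)  # confirm the entry is readable first
            self._annotations.append((key, note))
        return hits

    def syndicate(self) -> list[tuple[str, str]]:
        """Share the annotations only, never the entries themselves."""
        return list(self._annotations)


entries_component = EntriesComponent({"e1": "Service Usage Models",
                                      "e2": "UML component diagrams"})
annotator = AnnotationsComponent(entries_component)
```

Calling `annotator.annotate("uml", "check notation")` records a note against `e2`, and `annotator.syndicate()` then yields that note without exposing the entry text.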

That's the functionality needed; but we already know that SUMs exist to describe that functionality, and we can use those SUMs to define the needed interfaces:

The Entries collection exposes search and read services through a Searchable Collections SUM, which targets the Entries data source. The Annotations collection exposes syndicate services through a Shareable Collections SUM, which targets the Annotations data source.

Now, in the original component diagram, Annotate was something you did on the metal, directly interfacing with the Entries component:

Expanding it out as we have, we're now saying that realising that Annotate port involves orchestration with a distinct Annotation data source, and consumes search and read services. So we map a port to a systems component realising the port:

Slotting the Annotate port onto a collection is equivalent to slotting that collection into the Search and Read service dependency of the Annotate system:

So we have modelled the dependency between the Entries and Annotate components. But with interfaces, the services they expose, and data sources as proxies for components, we have enough to map this component diagram back to a SUM, with the interface-bundling SUMs embedded:

The embedded SUMs bundle and modularise away functionality. Notice that they do not necessarily define functionality as being external, and so they do not only describe "other systems". The shareable SUM exposes the annotations, and the searchable SUM exposes the entries: their functionality could easily reside on the same repository, and we can't think of both the Entries and the Annotations as "external" data—if we did, we'd have no internal data left. The embedded SUMs are simply building blocks for system functionality—again, independently of where the functionality is provided from.

What does anchor the embedded SUMs and the services alike are the data sources they interact with. An Annotations data source can talk to a single Annotate service in the SUM as readily as it can to a Syndicate service modularised into Shareable Collection, because an embedded SUM can be anchored to one of "our" data sources, just as a standalone service can. That means that, if a SUM will be embedded within another SUM, it's important to know whether the embedded SUM's data sources are cordoned off, or are shared with the invoking context.

An authentication SUM will have its own data sources for users and credentials, and no other service should know about them except through the appropriate authorisation and authentication services. But a Shareable Collections SUM needs to know what data source it's syndicating—in this case, the same data source we're putting our annotations into. So the SUM diagram needs to identify the embedded SUM data source with its own Annotations data source. If data sources in a SUM can be accessed through external services, then embedding that SUM means working out the mapping between the embedding and embedded data sources—as the dashed "Entries" box shows, two diagrams up.
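One way to picture that mapping is the Python sketch below. It assumes an embedded SUM declares which of its data sources are private and which are open to being identified with a data source of the embedding SUM. The `EmbeddedSUM` class, the `embed` function, and the slot name "Collection" are all hypothetical; SUM descriptions carry no such machine-readable declaration that I know of.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddedSUM:
    """An embedded SUM and the data sources it declares (hypothetical)."""
    name: str
    private_sources: frozenset[str] = frozenset()  # cordoned off
    shared_sources: frozenset[str] = frozenset()   # must be mapped by the embedder


def embed(sum_: EmbeddedSUM, bindings: dict[str, str]) -> dict[str, str]:
    """Validate and return the embedded-to-embedding data source mapping."""
    bound = set(bindings)
    leaked = bound & sum_.private_sources
    if leaked:
        raise ValueError(f"{sum_.name}: private data sources bound: {sorted(leaked)}")
    unknown = bound - sum_.shared_sources
    if unknown:
        raise ValueError(f"{sum_.name}: unknown data sources: {sorted(unknown)}")
    missing = sum_.shared_sources - bound
    if missing:
        raise ValueError(f"{sum_.name}: unmapped data sources: {sorted(missing)}")
    return dict(bindings)


# The authentication SUM keeps its data sources to itself.
identity = EmbeddedSUM("Identity",
                       private_sources=frozenset({"Users", "Credentials"}))

# The Shareable Collection SUM needs to be told what it is syndicating.
shareable = EmbeddedSUM("Shareable Collection",
                        shared_sources=frozenset({"Collection"}))

mapping = embed(shareable, {"Collection": "Annotations"})
```

Embedding `identity` needs no bindings at all, whereas embedding `shareable` without saying which data source plays the part of its collection is rejected.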

SUM diagrams are very useful for sketching out a range of functionality, and modularisation helps keep things tractable, but eventually you will want to insert slot A into tab B; if you're using embedded SUMs, you will need to say where the tabs are.

Embedding e-Framework SUMs

I've already posted on using UML sequence diagrams to derive e-Framework Service Usage Models (SUMs). SUMs can be used to model applications in terms of their component services. That includes the business requirements, workflows, implementation constraints and policy decisions that are in place for an application, as well as the services themselves and their interfaces.

However, in strict Service Oriented Architecture, the application is not a well-bounded box, sitting on a single server: any number of different services from different domains can be brought together to realise some functionality: the only thing binding these services together is the particular business goal they are realising. We can go even further with this uncoupling of application from service: a service usage model, properly, is just that: a model for the usage of certain services for a particular goal. It need not describe just what a single application does; and it need not exhaustively describe what a single application does. If a business goal only requires some of the functionality of an application, the SUM will model only that much functionality. And since an application can be applied to multiple business problems, there can be multiple SUMs used to describe what a given application does (or will do).

This issue has come up in modelling work that Link Affiliates has been doing around Project Bamboo, and on core SUMs dealing with collections. The e-Framework has already defined a SUM for simple collections, with CRUD functionality, and a SUM for searchable collections, which offer CRUD functionality plus search. The searchable collection SUM includes all the functionality of the simple collection SUM, so the simple collection SUM is embedded in the searchable collection SUM:

The e-Framework already has notation for embedding one SUM within another:

And in fact, the embedded SUMs are already in the diagram for the searchable collection: they are the nested rectangles around "Provision {Collection}" and "Manage {Collection}".

Embedding a SUM means that the functionality required is not described in this SUM, but in another. There is a separate SUM intended for managing a collection. That does not mean that the embedded SUM functionality is sourced from another application: the functionality for adding content, searching for content, and managing the content may well be provided by a single system. Then again, it may not: because the SUM presents a service-oriented approach, the functionality is described primarily through services, and the systems they may be provided through are a matter of deployment. But that means that the simple collection SUM, the searchable collection SUM, and the manage collection SUM can all be describing different bundles of functionality of the same system.

Embedding SUMs has been allowed in the e-Framework for quite a while, and has been a handy device to modularise out functionality we don't want to detail, particularly when it is only of secondary importance. Authentication & Authorisation, for instance, are required for most processes in most SUMs; but because SUMs are typically used as thumbnail sketches of functionality, they are often outsourced to an "Identity" SUM.

That modularisation does not mean that the OpenURL SUM shares all its business requirements or design constraints with the Identity SUM. After all, the Identity functionality may reside on a completely different system on the bus. Nor does it mean that every service of the Identity SUM is used by the OpenURL SUM—not even every service exposed to external users. The Identity SUM may offer Authentication, Authorisation, Accounting, Auditing, and Credentials Update, but OpenURL may use only a subset of those exposed services. In fact, the point of embedding the SUM is not to go into the details of which services will be used how from the embedded SUM: embedding the SUM is declining to detail it further, at least in the SUM diagram.

On the other hand, embedding the Identity SUM, as opposed to merely adding individual authentication & authorisation services to the SUM, lets us appeal to the embedded SUM for specifics of data models, protocols, implementation, or orchestration, which can also be modularised out of the current SUM.