Monthly Archives: August 2011

Direct rdf mapping in Thea

The current implementation of Thea makes use of the SWI-Prolog semweb package as a means of parsing OWL encoded as RDF/XML, according to the OWL 2 Mapping to RDF Graphs specification. After the triples have been translated into owl2_model facts such as equivalentClasses/1, they are discarded. This is somewhat analogous to how the Java OWLAPI views RDF: merely as an exchange format.

This means it is possible to use Thea to convert an OWL ontology encoded in RDF to native prolog facts such as:


subClassOf(finger,someValuesFrom(partOf,hand)).
equivalentClasses([finger,intersectionOf([digit,someValuesFrom(partOf,hand)])]).

(real URIs have been replaced by names in the above)

You can then process the ontology with another prolog system that lacks the rich RDF libraries of SWI.

This also has lots of advantages when working in a purely OWL world, but has some disadvantages when working with mixed RDF and OWL views. From an engineering perspective it would be nice to be able to take more advantage of the useful features of the SWI semweb library (in particular, namespace support). Ideally the programmer could choose whether the OWL predicates were served from an RDF store or from native prolog facts.

This is now possible, to a certain extent, using the newrdf branch on GitHub. Note that the posh and pkb branches frequently merge in from this branch.

This branch includes a module owl2_rdf that serves owl2_model predicates directly from semweb/rdf_db.pl in SWI. Check the comments in the code for how DCGs are used for a very compact declarative coding of the mapping.
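The DCG code itself lives in owl2_rdf. As a rough, self-contained illustration of why DCGs suit this kind of mapping (this is not the module's actual code), here is a sketch that runs the mapping in the OWL-to-RDF direction: each rule emits the triples that the Mapping to RDF Graphs specification prescribes for a class expression. The names axiom_triples//1, class_triples//2, rdf_list//2 and fresh_bnode/1, and the t/3 triple representation, are all hypothetical; bNode names are minted with SWI-Prolog's gensym/2.

```prolog
% Hypothetical sketch (not owl2_rdf's code): a DCG that emits the RDF
% triples for owl2_model-style axioms, following the OWL 2 Mapping to
% RDF Graphs. Triples are t(Subject, Predicate, Object) terms and
% prefixed names such as owl:onProperty are ordinary Prolog terms.

% fresh_bnode(-Node): mint a new blank-node name ('_:b1', '_:b2', ...).
fresh_bnode(Node) :-
    gensym('_:b', Node).

% axiom_triples(+Axiom)// : triples for a subClassOf/2 axiom.
axiom_triples(subClassOf(Sub, Super)) -->
    class_triples(Sub, X),
    class_triples(Super, Y),
    [ t(X, rdfs:subClassOf, Y) ].

% class_triples(+ClassExpr, -Node)// : Node stands for ClassExpr in the
% graph; the body emits the triples that describe Node.
class_triples(C, C) -->
    { atom(C) }, !.                 % named class: no extra triples
class_triples(someValuesFrom(P, C), X) -->
    { fresh_bnode(X) },
    [ t(X, rdf:type, owl:'Restriction'),
      t(X, owl:onProperty, P),
      t(X, owl:someValuesFrom, Y) ],
    class_triples(C, Y).
class_triples(intersectionOf(Cs), X) -->
    { fresh_bnode(X) },
    [ t(X, rdf:type, owl:'Class'),
      t(X, owl:intersectionOf, L) ],
    rdf_list(Cs, L).

% rdf_list(+Classes, -Head)// : an rdf:first/rdf:rest list of classes.
rdf_list([], rdf:nil) --> [].
rdf_list([C|Cs], N) -->
    { fresh_bnode(N) },
    [ t(N, rdf:first, Y),
      t(N, rdf:rest, Rest) ],
    class_triples(C, Y),
    rdf_list(Cs, Rest).
```

For example, `phrase(axiom_triples(subClassOf(finger, someValuesFrom(partOf, hand))), Ts)` yields four triples: three describing the restriction bNode and one rdfs:subClassOf edge from finger to it. Each grammar rule reads almost like a line of the specification, which is the attraction of the DCG style.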

The idea is to allow seamless switching between backing stores. To use the direct RDF store, specify “rdf_direct” as the format. In prolog:


load_axioms('foo.owl',rdf_direct).

On the command line:


thea --format rdf_direct foo.owl --query select A where "axiom(A)"

For most purposes, there should be no noticeable difference. However, if you now wish to mix and match RDF, SPARQL and OWL in prolog, the picture is much better. E.g. try the following:


thea-poshj --format rdf_direct ceph.owl
?- use_module(library(semweb/rdf_db)).
?- rdf_has(X, rdfs:subClassOf, Y).
?- subClassOf(X,Y).

The direct RDF query returns the same axioms, with one difference: in the rdf_has/3 query, Y may bind to bNodes, whereas with subClassOf/2, Y binds to prolog terms corresponding to class expressions.
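To see where the difference comes from, consider a toy version of the problem. In the sketch below, triple/3 facts stand in for the rdf_db store (this is not the semweb API), and node_term/2 plays the part of the extra calls that turn a bNode back into a class-expression term. Only someValuesFrom/2 is handled; triple/3, is_bnode/1 and node_term/2 are hypothetical names.

```prolog
% Hypothetical sketch: triple/3 stands in for the rdf_db store.
% finger SubClassOf partOf some hand, with '_:r1' as the restriction bNode.
triple(finger, rdfs:subClassOf, '_:r1').
triple('_:r1', rdf:type, owl:'Restriction').
triple('_:r1', owl:onProperty, partOf).
triple('_:r1', owl:someValuesFrom, hand).

% is_bnode(+Node): blank nodes are written '_:...' in this toy store.
is_bnode(Node) :-
    sub_atom(Node, 0, _, _, '_:').

% node_term(+Node, -Term): translate an RDF node into a class-expression
% term; only someValuesFrom restrictions are handled here.
node_term(Node, someValuesFrom(P, C)) :-
    triple(Node, owl:onProperty, P),
    triple(Node, owl:someValuesFrom, Filler),
    !,
    node_term(Filler, C).
node_term(Node, Node) :-
    \+ is_bnode(Node).              % named classes map to themselves

% subClassOf/2 served from the triples: look up the RDF edge first,
% then translate each end (bNodes become nested terms).
subClassOf(Sub, Super) :-
    triple(SubNode, rdfs:subClassOf, SuperNode),
    node_term(SubNode, Sub),
    node_term(SuperNode, Super).
```

Here `triple(finger, rdfs:subClassOf, Y)` binds Y to the bNode '_:r1', while `subClassOf(finger, Y)` binds Y to `someValuesFrom(partOf, hand)`. Note that this naive subClassOf/2 always scans the rdfs:subClassOf triples first, even when Super is already bound to a term, so the order of the lookup and translation calls matters a great deal for speed.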

In fact, from within Posh, you can now type the command “clio.” to launch a ClioPatria semantic web server complete with custom OWL views.

There are still a few quirks that need to be ironed out before newrdf is merged into master:

  • Efficiency is a challenge. When mapping a query such as subClassOf/2 to rdf_has/3, additional calls have to be made to map bNodes to prolog terms. These calls have to be made in the correct order for efficiency. This has now been done for a few predicates, but others are noticeably slow on ontologies such as SNOMED CT.
  • This is particularly challenging for OWL axioms that take sets as arguments (e.g. equivalentClasses/1), which have to be mapped to pairwise RDF calls.
  • assert_axiom/2 still needs to be mapped.

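The point about set-valued axioms can be illustrated with a small sketch. RDF only holds binary owl:equivalentClass triples, so serving an n-ary equivalentClasses/1 means gathering each connected group of pairwise links into one set. Here equiv_triple/2 is a hypothetical stand-in for those triples, and the breadth-first closure is just one way of doing it, not necessarily what owl2_rdf does.

```prolog
:- use_module(library(lists)).      % append/3, member/2, memberchk/2

% Hypothetical stand-in for owl:equivalentClass triples (a = b, b = c, d = e).
equiv_triple(a, b).
equiv_triple(b, c).
equiv_triple(d, e).

% The triples are pairwise, and equivalence is symmetric.
linked(X, Y) :- equiv_triple(X, Y).
linked(X, Y) :- equiv_triple(Y, X).

% equivalence_set(+Start, -Set): sorted set of classes connected to Start,
% found by a breadth-first walk over linked/2.
equivalence_set(Start, Set) :-
    reach([Start], [Start], Seen),
    sort(Seen, Set).

% reach(+Queue, +Seen, -AllSeen)
reach([], Seen, Seen).
reach([X|Queue], Seen, AllSeen) :-
    findall(Y, (linked(X, Y), \+ memberchk(Y, Seen)), New0),
    sort(New0, New),                % drop duplicates
    append(Seen, New, Seen1),
    append(Queue, New, Queue1),
    reach(Queue1, Seen1, AllSeen).

% equivalentClasses(-Classes): one answer per connected group of triples.
equivalentClasses(Classes) :-
    setof(Set, X^Y^(equiv_triple(X, Y), equivalence_set(X, Set)), Sets),
    member(Classes, Sets).
```

With the facts above, equivalentClasses/1 yields `[a,b,c]` and then `[d,e]`. Even in this toy form, producing one answer means walking every link in a group, and the same group is rebuilt from each of its triples before setof/3 removes the duplicates, which hints at why such axioms are slow on a large store.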
However, for many purposes the current behavior should be fine.


Generating a variant ontology using POPL

Ontologies such as the FMA represent reference anatomical entities. Many actually existing anatomical entities would not be classified in reference anatomical ontologies, due to the widespread variation found in nature. This applies across multiple scales and modalities: genes and proteins are typically represented using some reference structure, and pathways are abstractions that conveniently ignore all the messy crosstalk and stochastic events ubiquitous in cells.

From a practical point of view it makes sense to ignore the majority of this variation and represent some possibly hypothetical reference model. This is what most bio-ontologies do. Sometimes, however, it can be useful to generate an ontology of variants, together with abstractions over the union of each variant and its reference. I call this a Reference-Variant-Abstraction (R-V-A) triad model, with a nod to SNOMED CT SEP triples.

Generation of a skeleton variation ontology can be automated using the following POPL script:


:- [idfixer].

% ========================================
% GENERATION OF VARIANT CLASSES
% ========================================
% we add both variant classes and abstract classes, in a R-V-A triad
add
(   class(CV),
    CV == variant and variantOf some C,
    label(CV,CVN),

    class(CA),
    CA == CV or C,
    label(CA,CAN)
)
where (
       class(C),
       labelAnnotation_value(C,CN),
       C\=variant,
       extend_iri(C,'variant_',CV),
       atom_concat('variant ',CN,CVN),
       extend_iri(C,'abstract_',CA),
       atom_concat('abstract ',CN,CAN)
      ).

% ========================================
% PROPERTY CHAINS FOR VARIANTS
% ========================================
% we make a property chain for each object property,
% traversing the variantOf property (it is assumed the ontology
% already has this). We also add the reflexive form
add
(   objectProperty(PV),
    variantOf*P @< PV,
    label(PV,PVN),
    variantOf*P @< PVR,
    subPropertyOf(P,PVR),
    label(PVR,PVRN)
)
where (
       objectProperty(P),
       P\=variantOf,
       labelAnnotation_value(P,PN),
       extend_iri(P,'variantOf_',PV),
       atom_concat('variantOf ',PN,PVN),
       extend_iri(P,'reflexive_variantOf_',PVR),
       atom_concat('reflexive variantOf ',PN,PVRN)       
       ).

This relies on an additional program, idfixer.pl, which defines extend_iri/3:


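% extend_iri(+Iri, +Suffix, -New): mint a new IRI by inserting Suffix.
% reverse/2 comes from library(lists) (autoloaded in SWI-Prolog).
extend_iri(Iri,Suffix,New) :-
        % HASH-style URIs: ...#tooth -> ...#variant_tooth
        atomic_list_concat([A,B],'#',Iri),
        !,      % commit, so ...#left_hand is not also split on '_' below
        atomic_list_concat([A,'#',Suffix,B],New).
extend_iri(Iri,Suffix,New) :-
        % OBO-style URIs: .../obo/PREFIX_NNNNNNN
        atomic_list_concat(Parts,'_',Iri),
        reverse(Parts,[A,B|Rest]),      % A = local id, B = part before it
        atom_concat(B,Suffix,B2),
        reverse([A,B2|Rest],Parts2),
        atomic_list_concat(Parts2,'_',New).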

If we save the popl file as rvs.popl, we can execute it like this:


thea --popl-file rvs.popl myont.owl --to owl

The resulting ontology will have roughly three times as many classes (a variant and an abstract class for each labelled class). Use a reasoner to classify it.
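For reference, here is roughly what the two rules add for a single class C and object property P, written as owl2_model-style terms like the finger example earlier. This is a hypothetical sketch: rva_axioms/2 and chain_axioms/2 are illustrative names, plain atoms such as variant_tooth stand in for the IRIs that extend_iri/3 would mint, labels are left out, and propertyChain/1 is assumed as the term for a property chain.

```prolog
% Hypothetical sketch: the axioms rvs.popl adds for one class C and one
% object property P, as owl2_model-style terms. Plain atoms stand in for
% the IRIs extend_iri/3 would mint; labels are omitted.

% First rule: the variant and abstract classes for C.
rva_axioms(C, [ equivalentClasses([CV, intersectionOf([variant, someValuesFrom(variantOf, C)])]),
                equivalentClasses([CA, unionOf([CV, C])]) ]) :-
    C \== variant,                  % mirrors C\=variant in the script
    atom_concat(variant_, C, CV),
    atom_concat(abstract_, C, CA).

% Second rule: the chain property and its reflexive form for P
% (propertyChain/1 is assumed as the term for a chain).
chain_axioms(P, [ subPropertyOf(propertyChain([variantOf, P]), PV),
                  subPropertyOf(propertyChain([variantOf, P]), PVR),
                  subPropertyOf(P, PVR) ]) :-
    P \== variantOf,                % mirrors P\=variantOf in the script
    atom_concat(variantOf_, P, PV),
    atom_concat(reflexive_variantOf_, P, PVR).
```

Each original class contributes two new ones (variant_C and abstract_C), which is where the roughly threefold increase comes from.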

If the original ontology contained “tooth”, “mouth” and “tooth SubClassOf partOf some mouth”, then the new ontology would include:


Class: 'variant tooth'
EquivalentTo: variant and variantOf some tooth

Class: 'abstract tooth'
EquivalentTo: 'variant tooth' or tooth

(annotations omitted)

You can try DL queries within Protégé.

A query such as:

partOf some mouth

will return the reference class for “tooth”, but not “abstract tooth” or “variant tooth”. This is because “variant tooth” encompasses ectopic teeth (e.g. a tooth may be part of a teratoma in the lung). At a stretch, we could also include shark dermal denticles as variants of human teeth. This is all well and good, but we might want to query for what is “typical”. In this case we can ask:


'reflexive variantOf partOf' some mouth

The property name is not very intuitive, but what we mean here is “anything that is part of a mouth, or is a variant of something that is part of a mouth”. The following should be equivalent:


('variantOf partOf' some mouth) or partOf some mouth

As should this:


(variantOf some (partOf some mouth)) or partOf some mouth

The named property chain just makes querying easier.
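The equivalence can be checked at the level of individuals. The sketch below mirrors the two generated properties as ordinary prolog rules over a few made-up facts (ectopic_tooth1, tooth1, tongue1 and mouth1 are hypothetical individuals). Like the OWL sub-property axioms, the rules only derive in one direction.

```prolog
% Hypothetical individuals, for illustration only.
variantOf(ectopic_tooth1, tooth1).
partOf(tooth1, mouth1).
partOf(tongue1, mouth1).

% 'variantOf partOf': the chain variantOf o partOf.
variantOf_partOf(X, Z) :-
    variantOf(X, Y),
    partOf(Y, Z).

% 'reflexive variantOf partOf': partOf itself, plus the chain.
reflexive_variantOf_partOf(X, Z) :-
    partOf(X, Z).
reflexive_variantOf_partOf(X, Z) :-
    variantOf_partOf(X, Z).
```

Here `findall(X, reflexive_variantOf_partOf(X, mouth1), Xs)` gives `Xs = [tooth1, tongue1, ectopic_tooth1]`: the first two are directly part of the mouth, and the ectopic tooth is reached only through the chain, even though it is not itself part of the mouth.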

In this case we get “abstract tooth” and “variant tooth” in the descendants. If we manually classify teratoma teeth or dermal denticles here, we will get these too.


Prolog OWL Shell at OWLED 2011

My slides and paper for POSH are now online:

Manuscript: owled2011_submission_15.pdf

Slides: POSH (slideshare)