Unfolding T-boxes in GO using POPL

“Unfolding a T-box” may sound like some quaint tea ceremony ritual, but in fact in the context of description logics it refers to the iterative replacement of classes by equivalent anonymous class expressions.

Many reasoners take advantage of T-box unfolding behind the scenes. But there may be reasons to unfold your T-box in a more public fashion.

For example, highly specific, wordy GO terms such as the following are frequently criticized:
GO:2001043 positive regulation of cytokinetic cell separation involved in cell cycle cytokinesis

Whilst not as unwieldy as some of the infamous ICD9 examples, some feel that this is taking pre-composition too far. In fact these detailed pre-composed terms are very useful for systems that aren’t capable of consuming anonymous class expressions. However, for some purposes it may be useful to replace this with a nested class expression.

This can be done in 3 lines with a POPL script:


class(X) ===> null where X==_.
annotationAssertion(_,X,_) ===> null where X==_.

X ===>Y where X==Y.

The first two lines remove class declarations and annotation assertions (e.g. label assignments) for any defined classes. The final line does all the work: it replaces every occurrence of a defined class with the equivalent class expression.
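
To make the idea of iterative replacement concrete, here is a minimal sketch in plain prolog, independent of POPL and Thea. The equivalent/2 facts, the short class names and the and/2 and some/2 functors are all made up for this illustration, and the sketch assumes the T-box is acyclic (a cyclic definition would make it loop).

```prolog
% Toy T-box (hypothetical names): equivalent(NamedClass, ClassExpression).
equivalent(pos_reg_of_sep,
           and(regulation, some(positively_regulates, sep_in_cytokinesis))).
equivalent(sep_in_cytokinesis,
           and(cell_separation, some(part_of, cytokinesis))).

% unfold(+Expr, -Unfolded)
% Replace every defined class in Expr by its definition, repeating until
% no defined class is left. Assumes the T-box is acyclic.
unfold(C, U) :-
    atom(C),
    equivalent(C, Def),
    !,
    unfold(Def, U).
unfold(T, U) :-
    compound(T),
    !,
    T =.. [F|Args],
    maplist(unfold, Args, UArgs),
    U =.. [F|UArgs].
unfold(T, T).
```

Calling unfold(some(capable_of, pos_reg_of_sep), E) binds E to the nested expression in which both defined classes have been replaced. Atoms with no definition (including property names) are left untouched.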

This means if we have the following gene association (e.g. from a GAF file):


Class: :Gene1234
Types: 'positive regulation of cytokinetic cell separation involved in cell cycle cytokinesis'

it will be translated to:

Class: :Gene1234
Types:
  capable_of some 
    ('biological regulation'
     and (positively_regulates some 
        ('cytokinetic cell separation'
         and (part_of some 
            (cytokinesis
             and (part_of some 'cell cycle'))))))

We can also choose to selectively unfold - e.g. unfold all regulation terms:


X ===>Y where (X==Y, Y='biological regulation' and _).

Resulting in:

Class: :Gene1234
Types:
  capable_of some
    ('biological regulation'
      and (positively_regulates some 'cytokinetic cell separation involved in cell cycle cytokinesis'))

The same thing could be done in java, but would require significantly more code and messing around with visitor classes. The results would be less declarative, and harder to customize.

What if we want to perform the reverse operation? This is similar to finding the most specific subsuming class, which is a standard reasoner operation. However, in this case we want to find a more specific class expression, which is slightly more difficult. This might be the topic of a future post.

Change in posh syntax

The syntax for posh has changed in the latest release on the “posh” branch.

Previously “q”, “l” and other shortcut commands were unary predicates. This caused a few problems when combining with other programs.

Now there is a single generic infix predicate “\” for all shortcut commands. The first argument is the command.

E.g.


q\'wing'.


l\'wing'.

init/1 remains the same:


init pellet.
q\{_<neuron}.

Direct rdf mapping in Thea

The current implementation of Thea makes use of the SWI-Prolog semweb package as a means of parsing OWL encoded as RDF/XML, according to the Mapping to RDF Graphs specification. After the triples have been translated into owl2_model facts such as equivalentClasses/1, they are discarded. This is somewhat analogous to how the java OWLAPI views RDF – merely as an exchange format.

This means it is possible to use Thea to convert an OWL ontology encoded in RDF to native prolog facts such as:


subClassOf(finger,someValuesFrom(partOf,hand)).
equivalentClasses([finger,intersectionOf([digit,someValuesFrom(partOf,hand)])]).

(real URIs have been replaced by names in the above)

And then use another prolog system that lacks the rich RDF libraries of SWI to process the ontology.

This also has lots of advantages when working in a purely OWL world, but has some disadvantages when working with mixed RDF and OWL views. From an engineering perspective it would be nice to be able to take more advantage of the useful features of the SWI semweb library (in particular, namespace support). Ideally the programmer could choose whether the OWL predicates were served from an RDF store or from native prolog facts.

This is now possible, to a certain extent, using the newrdf branch in github. Note that the posh and pkb branches frequently merge in from this branch.

This branch includes a module owl2_rdf that serves owl2_model predicates directly from semweb/rdf_db.pl in SWI. Check the comments in the code for how DCGs are used for a very compact declarative coding of the mapping.

The idea is to allow seamless switching between backing stores. To use the direct RDF store, specify "rdf_direct" as the format. In prolog:


load_axiom('foo.owl',rdf_direct).

On the command line:


thea --format rdf_direct foo.owl --query select A where "axiom(A)"

For most purposes, there should be no noticeable difference. However, if you now wish to mix and match RDF, SPARQL and OWL in prolog the picture is much better. E.g. try the following:


thea-poshj --format rdf_direct ceph.owl
?- use_module(library(semweb/rdf_db)).
?- rdf_has(X, rdfs:subClassOf, Y).
?- subClassOf(X,Y).

The direct rdf query returns the same axioms, with the difference that in the rdf query Y may bind to bNodes, whereas with subClassOf/2, Y binds to prolog terms corresponding to class expressions.
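
To make the bNode difference concrete, here is a minimal sketch in plain prolog. It does not use rdf_db or owl2_rdf: the triple/3 facts, the shortened property names and the class_expr/2 and sub_class_of/2 predicates are all invented for illustration, and only someValuesFrom restrictions are handled.

```prolog
% Toy triple store standing in for an RDF database (illustrative names).
triple(finger, subClassOf, '_:b1').
triple('_:b1', type, restriction).
triple('_:b1', onProperty, partOf).
triple('_:b1', someValuesFrom, hand).
triple(finger, subClassOf, digit).

% class_expr(+Node, -Expr)
% A restriction bNode becomes a someValuesFrom/2 term; any other node
% is taken to be a named class and returned as-is.
class_expr(Node, someValuesFrom(P, Filler)) :-
    triple(Node, type, restriction),
    !,
    triple(Node, onProperty, P),
    triple(Node, someValuesFrom, FillerNode),
    class_expr(FillerNode, Filler).
class_expr(Node, Node).

% sub_class_of(?Sub, -Super): the OWL-level view over the triples.
sub_class_of(Sub, Super) :-
    triple(Sub, subClassOf, SuperNode),
    class_expr(SuperNode, Super).
```

Here triple(finger, subClassOf, Y) binds Y to the bNode '_:b1' and then to digit, whereas sub_class_of(finger, Y) binds Y to someValuesFrom(partOf, hand) and then to digit.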

In fact, from within Posh, you can now type the command "clio." to launch a ClioPatria semantic web server complete with custom OWL views.

There are still a few quirks that need to be ironed out before newrdf is merged into master:

  • Efficiency is a challenge. When mapping a query such as subClassOf/2 to rdf_has/3, additional calls have to be made to map bNodes to prolog terms. This has to be done in the correct order for efficiency. This has now been done for a few predicates, but others are noticeably slow on ontologies such as SNOMED.
  • This is particularly challenging for OWL axioms that take sets as arguments (e.g. equivalentClasses/1) and have to be mapped to pairwise RDF calls.
  • assert_axiom/2 needs to be mapped.

However, for many purposes the current behavior should be fine.

Generating a variant ontology using POPL

Ontologies such as the FMA represent reference anatomical entities. Many actual existing anatomical entities would not be classified in reference anatomical ontologies, due to widespread variation found in nature. This applies across multiple scales and modalities: genes and proteins are typically represented using some reference structure, and pathways are abstractions that conveniently ignore all the messy crosstalk and stochastic events ubiquitous in cells.

From a practical point of view it makes sense to ignore the majority of this variation and represent some possibly hypothetical reference model. This is what most bio-ontologies do. Sometimes it can be useful to generate an ontology of variants, together with abstractions over the union of the variant and the reference. I call this here a Reference-Variant-Abstraction triad model, with a nod to SNOMED-CT SEP triples.

Generation of a skeleton variation ontology can be automated using the following POPL script:


:- [idfixer].

% ========================================
% GENERATION OF VARIANT CLASSES
% ========================================
% we add both variant classes and abstract classes, in a R-V-A triad
add
(   class(CV),
    CV == variant and variantOf some C,
    label(CV,CVN),

    class(CA),
    CA == CV or C,
    label(CA,CAN)
)
where (
       class(C),
       labelAnnotation_value(C,CN),
       C\=variant,
       extend_iri(C,'variant_',CV),
       atom_concat('variant ',CN,CVN),
       extend_iri(C,'abstract_',CA),
       atom_concat('abstract ',CN,CAN)
      ).

% ========================================
% PROPERTY CHAINS FOR VARIANTS
% ========================================
% we make a property chain for each object property,
% traversing the variantOf property (it is assumed the ontology
% already has this). We also add the reflexive form
add
(   objectProperty(PV),
    variantOf*P @< PV,
    label(PV,PVN),
    variantOf*P @< PVR,
    subPropertyOf(P,PVR),
    label(PVR,PVRN)
)
where (
       objectProperty(P),
       P\=variantOf,
       labelAnnotation_value(P,PN),
       extend_iri(P,'variantOf_',PV),
       atom_concat('variantOf ',PN,PVN),
       extend_iri(P,'reflexive_variantOf_',PVR),
       atom_concat('reflexive variantOf ',PN,PVRN)       
       ).

This relies on an additional program called idfixer.pl:


extend_iri(Iri,Suffix,New) :-
        % HASH-style URIs
        atomic_list_concat([A,B],'#',Iri),
        atomic_list_concat([A,'#',Suffix,B],New).
extend_iri(Iri,Suffix,New) :-
        % OBO-style URIs
        atomic_list_concat(Parts,'_',Iri),
        reverse(Parts,[A,B|Rest]),
        atom_concat(B,Suffix,B2),
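        % rebuild the parts in their original order; calling reverse/2
        % with a bound first argument avoids looping on backtracking
        reverse([A,B2|Rest],Parts2),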
        atomic_list_concat(Parts2,'_',New).

If we save the popl file as rvs.popl, we can execute it like this:


thea --popl-file rvs.popl myont.owl --to owl

The resulting ontology will have 3x the number of classes. Use a reasoner to classify this.

If the original ontology contained "tooth", "mouth" and "tooth SubClassOf partOf some mouth", then the new ontology would include:


Class: 'variant tooth'
EquivalentTo: variantOf some tooth

Class: 'abstract tooth'
EquivalentTo: 'variant tooth' or tooth

(annotations omitted)

You can try DL queries within Protege.

A query such as:

partOf some mouth

Will return the reference class for "tooth", but not the abstract tooth or a variant tooth. This is because "variant tooth" encompasses ectopic teeth (e.g. a tooth may be part of a teratoma in the lung). At a stretch, we would also include shark dermal denticles as variants of human teeth. This is all well and good, but we might want to query for what is "typical". In this case we can ask:


'reflexive variantOf partOf' some mouth

The property name is not very intuitive, but what we mean here is "anything that is a variant of something that is part of a mouth, or that is itself part of a mouth". The following should be equivalent:


('variantOf partOf' some mouth) or partOf some mouth

As should this:


(variantOf some (partOf some mouth)) or partOf some mouth

The named property chain just makes querying easier.

In this case we get "abstract tooth" and "variant tooth" in the descendants. If we manually classify teratoma teeth or dermal denticles here, we will get these too.

Prolog OWL Shell at OWLED 2011

My slides and paper for POSH are now online:

Manuscript: owled2011_submission_15.pdf

Slides: POSH (slideshare)

Posh — the Prolog OWL Shell

Posh (Prolog OWL Shell) is a command line utility
that wraps the Thea OWL library to allow for advanced querying and
processing of ontologies, combining the power of prolog and OWL
reasoning.

Installation

Install SWI

Download and install SWI-Prolog (http://swi-prolog.org). This is a
simple point and click procedure for most platforms. If you want to use
reasoners such as Pellet, make sure you have JPL installed (see
troubleshooting section).

Posh Git

Get the latest version of Thea from github. Git clone is
recommended, but you can also use the github download link.

Assuming you placed the project in your toplevel directory, set your path:


export PATH="$PATH:$HOME/thea"

You're now set to use Posh.

Getting Started

First we start thea using the --shell option. We also start it in JPL
mode, as we will be making use of the OWLAPI to interface with
reasoners. We'll also load the OWL translation of the fly anatomy
ontology, from the OBO library.


thea-jpl --jvm-opt -Xmx2048M  http://purl.obolibrary.org/obo/fbbt.owl --shell 

This starts us up in an enhanced prolog shell with the fly anatomy
loaded. You can make arbitrary prolog queries, as in any prolog shell,
for example:


?- member(X,[a,b,c]).
X = a ;
X = b ;
X = c.

If you don't know prolog, you should still be able to get by. The
crucial syntax to remember is that variables commence with an
uppercase letter (or '_') and each line should be terminated by a
'.'. Also, if you get stuck, type 'help.' for a list of commands.

Let's start by checking our list of ontologies:


?- ls.
*http://purl.obolibrary.org/obo/fbbt.owl
true.

The first thing we'll do is set some display options. fbbt uses labels
for all classes, so we will make sure these are displayed.


 ?- set display + labels.
true.

We will also choose to display all class expressions as 'plsyn'. This
looks like a mixture of DL syntax and manchester syntax. Why another
syntax? Because plsyn is pure prolog, which means it can be used
directly in prolog queries and operations. It's not the most
user-friendly syntax, but it's worthwhile getting to know if you
intend to be doing any advanced operations from within this shell.


 ?- set display + plsyn.
true.

These settings are automatically saved in your ~/.thearc file.

Type "settings." to see the full list. To clear the display settings:


 ?- unset display.
true.

You should stick with labels and plsyn for this tutorial. Other useful
values are "tabular" and "combined".

Initial exploration

We can use the l/1 command to find all axioms associated with a class (in this case 'wing'):


?- l wing.
% Axiom Type: annotationAssertion
annotationAssertion('http://purl.obolibrary.org/obo/IAO_0000115',wing,'A flight organ of the adult external thorax that is derived from a dorsal mesothoracic disc.').
annotationAssertion('http://purl.obolibrary.org/obo/IAO_id',wing,'FBbt:00004729').
annotationAssertion('http://purl.obolibrary.org/obo/IAO_subset',wing,'FB_gloss').
annotationAssertion('http://purl.obolibrary.org/obo/IAO_subset',wing,cur).
annotationAssertion('http://purl.obolibrary.org/obo/IAO_xref',annotation(wing,'http://purl.obolibrary.org/obo/IAO_0000115','A flight organ of the adult external thorax that is derived from a dorsal mesothoracic disc.'),'FBC:gg').
annotationAssertion(label,wing,wing).
% Axiom Type: class
class wing.
% Axiom Type: subClassOf
'chordotonal organ of wing'<part_of some wing.
'wing hair'<part_of some wing.
wing<appendage.
wing<develops_from some 'wing disc'.
wing<part_of some 'adult mesothoracic segment'.
wing<part_of some 'adult external thorax'.
tegula<part_of some wing.
'wing hinge'<part_of some wing.
'wing septum'<part_of some wing.
'wing margin'<part_of some wing.
'wing blade'<part_of some wing.
'dorsal wing blade'<part_of some wing.
'ventral wing blade'<part_of some wing.
true.

If the term of interest starts with an uppercase character, or includes a space, you need to quote the label:


?- l 'wing disc'.
...

Quotes must be escaped:


?- l 'Wheeler\'s organ'.
...

You can list all the axioms in the current ontology by typing
"lsa.". This might produce quite a long list. You can query more
precisely using prolog, but if you're not ready for that, you can rely
on your unix wizardry skills:


?- lsa -- 'grep wing'.
...

The "--" predicate will pipe the results of the posh command to any
unix command or script.

Trees and graphs

We can draw ascii trees showing the denormalized subclass hierarchy tree using t/1:


?- t wing.
anatomical entity
.material anatomical entity
..anatomical structure
...organism subdivision
....appendage
.....wing
true.

Note that this shows just the asserted subclass axioms. So far
these are just queries on the ontology structure; we will get to
entailed facts later on.

Use the v/1 command to visualize an object (graphviz required):


?- v 'wing disc'.

By default, this will show the closure over subclass axioms, as well
as positive restrictions. The default behavior can be controlled, and
the graphviz display is highly configurable, but this is out of scope
for this tutorial.

Prolog query shorthand

You can use this shell to query any of the prolog facts in the owl
model database. For example, to find all development axioms:


?- labelAnnotation_value(DF,develops_from),
   subClassOf(X,someValuesFrom(DF,Y)).
DF = 'http://purl.obolibrary.org/obo/TODO_develops_from',
X = 'http://purl.obolibrary.org/obo/FBbt_00000000',
Y = 'http://purl.obolibrary.org/obo/FBbt_00000110' ;
DF = 'http://purl.obolibrary.org/obo/TODO_develops_from',
X = 'http://purl.obolibrary.org/obo/FBbt_00000090',
Y = 'http://purl.obolibrary.org/obo/FBbt_00016018' ;
DF = 'http://purl.obolibrary.org/obo/TODO_develops_from',
X = 'http://purl.obolibrary.org/obo/FBbt_00000091',
Y = 'http://purl.obolibrary.org/obo/FBbt_00004891' ;
DF = 'http://purl.obolibrary.org/obo/TODO_develops_from',
X = 'http://purl.obolibrary.org/obo/FBbt_00000095',
Y = 'http://purl.obolibrary.org/obo/FBbt_00017000' .

This isn't very meaningful - you can use labelAnnotation_value/2 to map results back to labels:


?- labelAnnotation_value(DF,develops_from),
  subClassOf(X,someValuesFrom(DF,Y)),
  labelAnnotation_value(Y,YN).

But this is a bit tedious. Posh has the convenience command q/1,
which launches a query and takes care of all label/IRI mapping for you:


?- q X < develops_from some Y.
'germ layer derivative'<develops_from some 'germ layer'.
'dorsal ridge'<develops_from some 'dorsal ridge primordium'.
'pole bud'<develops_from some 'pole plasm'.
amnioserosa<develops_from some 'amnioserosa primordium'.
ectoderm<develops_from some 'ectoderm anlage'.
'dorsal ectoderm'<develops_from some 'dorsal ectoderm anlage'.
'anterior ectoderm'<develops_from some 'anterior ectoderm anlage'.
'posterior ectoderm'<develops_from some 'posterior ectoderm anlage'.
endoderm<develops_from some 'endoderm anlage'.
mesoderm<develops_from some 'mesoderm anlage'.
...

Note we're also using the infix operator '<' as a shorthand for the
prolog subClassOf/2 predicate, and the infix operator 'some' as
shorthand for someValuesFrom terms. The result is a bastard hybrid of
prolog and DL syntax, by way of Manchester. Unfortunately the default
output is quite stingy with whitespace, and is overly compact.
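
If you are curious how this kind of shorthand can work at all, the following is a rough sketch in plain prolog. It is not Thea's actual code: the operator priorities and the plsyn_to_term/2 predicate are assumptions made for this example, and it skips the label/IRI mapping entirely.

```prolog
% Operator priorities are assumptions chosen for this sketch;
% Thea's real declarations may differ.
:- op(600, xfy, and).
:- op(500, xfy, some).

% plsyn_to_term(+Shorthand, -Functional)
% Rewrite shorthand operators into functional-style terms.
% Variables and plain names pass through unchanged.
plsyn_to_term(V, V) :-
    var(V), !.
plsyn_to_term(A < B, subClassOf(A1, B1)) :- !,
    plsyn_to_term(A, A1),
    plsyn_to_term(B, B1).
plsyn_to_term(P some C, someValuesFrom(P, C1)) :- !,
    plsyn_to_term(C, C1).
plsyn_to_term(A and B, intersectionOf([A1, B1])) :- !,
    plsyn_to_term(A, A1),
    plsyn_to_term(B, B1).
plsyn_to_term(T, T).
```

With these declarations, "X < develops_from some Y" is read by prolog as an ordinary term, and plsyn_to_term/2 turns it into subClassOf(X, someValuesFrom(develops_from, Y)). Chained 'and' expressions come out as nested binary intersections in this sketch rather than a single flat list.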

The following shows all subclass axioms:


?- q _<_.

The list of axioms is quite large. We can filter this set using any
unix command, such as perl or grep using '--'.


?- q X < develops_from some Y -- 'grep muscle'.
'embryonic/larval somatic muscle'<develops_from some 'somatic muscle primordium'.
'longitudinal muscle'<develops_from some 'longitudinal visceral muscle primordium'.
'prothoracic pharyngeal muscle'<develops_from some 'dorsal pharyngeal muscle primordium'.
'abdominal ventral acute muscle 1'<develops_from some 'abdominal ventral acute muscle 1 founder cell'.
'abdominal ventral acute muscle 2'<develops_from some 'abdominal ventral acute muscle 2 founder cell'.
'abdominal ventral acute muscle 3'<develops_from some 'abdominal ventral acute muscle 3 founder cell'.
'abdominal dorsal oblique muscle 3'<develops_from some 'abdominal dorsal oblique muscle 3 founder cell'.
'abdominal lateral oblique muscle 1'<develops_from some 'abdominal lateral oblique muscle 1 founder cell'.
'abdominal dorsal transverse muscle 1'<develops_from some 'abdominal dorsal transverse muscle 1 founder cell'.
'abdominal ventral transverse muscle 1'<develops_from some 'abdominal ventral transverse muscle 1 founder cell'.
'adult myoblast'<develops_from some 'adult muscle precursor primordium'.
'circular visceral muscle fiber'<develops_from some 'circular visceral muscle primordium'.
'dorsal pharyngeal muscle primordium'<develops_from some 'dorsal pharyngeal muscle anlage'.
'dorsal pharyngeal muscle primordium'<develops_from some 'head mesoderm'.
'adult muscle precursor primordium'<develops_from some 'trunk mesoderm'.
'somatic muscle primordium'<develops_from some 'somatic mesoderm'.
'visceral muscle primordium'<develops_from some 'visceral mesoderm'.
'larval muscle system'<develops_from some 'embryonic muscle system'.
'embryonic gonadal sheath muscle'<develops_from some 'gonadal sheath proper primordium'.
'larval gonadal sheath muscle'<develops_from some 'embryonic gonadal sheath muscle'.
'hindgut visceral muscle fiber'<develops_from some 'hindgut visceral muscle primordium'.
'foregut visceral muscle fiber'<develops_from some 'foregut visceral muscle primordium'.
'embryonic/larval midgut longitudinal visceral muscle'<develops_from some 'midgut longitudinal visceral muscle primordium'.
'circular visceral muscle primordium'<develops_from some 'visceral mesoderm'.
'esophageal visceral muscle primordium'<develops_from some 'head mesoderm'.
'esophageal visceral muscle'<develops_from some 'esophageal visceral muscle primordium'.
true.

In Posh, '==' translates to equivalent_to/2, and 'and' translates
to intersectionOf, so we can ask for all definitions that
directly use the neuron class:


?- q X == neuron and A.
'abdominal neuron'==neuron and part_of some abdomen.
'A8 neuron'==neuron and part_of some 'abdominal segment 8'.
'prothoracic anterior fascicle neuron'==neuron and fasciculates_with some 'prothoracic intersegmental nerve'.
'prothoracic posterior fascicle neuron'==neuron and fasciculates_with some 'prothoracic segmental nerve'.
'mesothoracic anterior fascicle neuron'==neuron and fasciculates_with some 'mesothoracic intersegmental nerve'.
'mesothoracic posterior fascicle neuron'==neuron and fasciculates_with some 'mesothoracic segmental nerve'.
'metathoracic anterior fascicle neuron'==neuron and fasciculates_with some 'metathoracic intersegmental nerve'.
'metathoracic posterior fascicle neuron'==neuron and fasciculates_with some 'metathoracic segmental nerve'.
'abdominal anterior fascicle neuron'==neuron and fasciculates_with some 'abdominal intersegmental nerve'.
'abdominal posterior fascicle neuron'==neuron and fasciculates_with some 'abdominal segmental nerve'.
'peptidergic neuron'==neuron and releases_neurotransmitter some peptide.
'sensory neuron'==neuron and has_function_in some 'detection of stimulus involved in sensory perception'.
...

Editing the ontology

The commands add/1 and rm/1 will assert and retract to the current
ontology. When you're done you can use save_axioms/2 to persist your
results to a file.


?- add brain < has_part some neuron.
% Asserting brain<has_part some neuron.
 into http://purl.obolibrary.org/obo/fbbt.owl
true.

You can always undo:


?- undo.
% Undo: Asserting brain<has_part some neuron.
 into http://purl.obolibrary.org/obo/fbbt.owl
true.

If you have asserted multiple facts, you will keep getting prompted
until you hit return or have undone all facts you added. If you change
your mind again, just type "redo.".

Using a reasoner

Thea comes pre-packaged with both java reasoners and a prolog
rule-based reasoner. We'll assume Pellet, a java reasoner.

Start the reasoner like this:


?- init pellet.
% library(thea2/owl2_java_owlapi) compiled into owl2_reasoner 0.01 sec, 13,608 bytes
% initializing: reasoner 0.76
% completed: reasoner 21.82 time: 21.06
true.

Behind the scenes, Posh has established contact with the java
OWLAPI. If you're having java issues, or want to use a boutique
reasoner the OWLAPI can't talk to, you can always talk to a reasoner
via OWLLink - but that's the subject of another tutorial.

The qi/1 predicate is similar to q/1, but the query is passed to the
reasoner. Here's how we find all neurons:


?- qi X < neuron.
'5-2Ica1'<neuron.
'lamina receptor cell R1'<neuron.
'abdominal 1 desC neuron'<neuron.
'5-2I'<neuron.
'abdominal 2 desC neuron'<neuron.
'lamina receptor cell R3'<neuron.
'5-1I'<neuron.
...

The results are all entailed subclass axioms. If you just want the
classes you can use the SELECT .. WHERE .. idiom:


?- qi X where X < neuron.
'5-2Ica1'.
'lamina receptor cell R1'.
'abdominal 1 desC neuron'.
'5-2I'.
'abdominal 2 desC neuron'.
'lamina receptor cell R3'.
'5-1I'.
'lamina receptor cell R2'.
'4-4I'.
'5-2Icp'.
...

Again the list is quite large. The unix 'head' and 'tail' commands are
quite convenient here:


?- qi X < neuron -- head.

You can use any DL expression, for example someValuesFrom restrictions:


?- qi X < overlaps some neuron.
'photoreceptor cell R2 pigment granule'<overlaps some neuron.
'photoreceptor cell R5 pigment granule'<overlaps some neuron.
neurite<overlaps some neuron.
'photoreceptor cell R6 pigment granule'<overlaps some neuron.
synapse<overlaps some neuron.
'photoreceptor cell R3 pigment granule'<overlaps some neuron.
'photoreceptor cell R7 pigment granule'<overlaps some neuron.
'end plate'<overlaps some neuron.

Or intersections:


?- qi X < neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN5'<neuron and overlaps some 'muscle cell'.
'b2 motor neuron'<neuron and overlaps some 'muscle cell'.
'cibarial pump muscle neuron'<neuron and overlaps some 'muscle cell'.
'III1 motor neuron'<neuron and overlaps some 'muscle cell'.
'Nothing'<neuron and overlaps some 'muscle cell'.
'I1 motor neuron'<neuron and overlaps some 'muscle cell'.
'direct flight muscle motor neuron'<neuron and overlaps some 'muscle cell'.
'III3 motor neuron'<neuron and overlaps some 'muscle cell'.
'b1 motor neuron'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN3'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN4'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN1'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN2'<neuron and overlaps some 'muscle cell'.
'tergal depressor of trochanter muscle motor neuron'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron'<neuron and overlaps some 'muscle cell'.
'dorsal tp motor neuron'<neuron and overlaps some 'muscle cell'.
'ventral tp motor neuron'<neuron and overlaps some 'muscle cell'.
true.
?- qi X < neuron and overlaps some 'muscle cell' -- wc.
      17     160    1202
true.

Note that nothing is asserted to overlap in fbbt. Where is the
reasoner getting these inferences from? Let's have a look at the property:


 ?- l overlaps.
* Axiom Type: annotationAssertion
annotationAssertion('http://purl.obolibrary.org/obo/IAO_id',overlaps,overlaps).
annotationAssertion(label,overlaps,overlaps).
* Axiom Type: objectProperty
objectProperty(overlaps).
* Axiom Type: subPropertyOf
part_of@<overlaps.
partially_overlaps@<overlaps.
true.

overlaps holds whenever part_of holds (the "@<" is plsyn for
subPropertyOf/2), so in fact our query above gives the same results as
querying for parts of a muscle cell.
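
The direction of a sub-property axiom is easy to get backwards, so here is a toy illustration in plain prolog. The facts are invented for this example (they are not taken from fbbt) and the holds/3 predicate is only a sketch of the inference, not how the reasoner works; it assumes the sub-property hierarchy has no cycles.

```prolog
% Toy data with hypothetical assertions.
sub_property_of(part_of, overlaps).
sub_property_of(partially_overlaps, overlaps).

asserted(part_of, neurite, neuron).
asserted(partially_overlaps, synapse, neuron).

% holds(?P, ?X, ?Y)
% P relates X to Y if that is asserted directly, or if some
% sub-property Q of P relates X to Y.
holds(P, X, Y) :-
    asserted(P, X, Y).
holds(P, X, Y) :-
    sub_property_of(Q, P),
    holds(Q, X, Y).
```

Querying holds(overlaps, X, neuron) finds both neurite and synapse even though nothing is asserted with overlaps directly, while holds(part_of, synapse, neuron) fails: the inference only runs from the sub-property up to the super-property.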

In fact, overlaps should hold whenever we have a chain of has_part and
part_of. Let's add this axiom. The plsyn for subproperties and property
chains is a bit abstruse and non-obvious:


?- add has_part*part_of @< overlaps.

You can always type the full prolog functional syntax if you prefer:


?- add subPropertyOf(propertyChain([has_part,part_of]),overlaps).

(the add/1 command takes care of translating your labels to IRIs).

With the current version of fbbt we have to also add this:


?- add has_part inverseOf part_of.

Unfortunately, this has the negative consequence of slowing Pellet
down a lot. Let's backtrack.


?- undo.
% Undo: Asserting has_part inverseOf part_of.
 into http://purl.obolibrary.org/obo/fbbt.owl
true ;
% Undo: Asserting propertyChain([has_part,part_of])@<overlaps.
 into http://purl.obolibrary.org/obo/fbbt.owl
true.

Advanced Ontology Processing

Let's say we want to start asserting spatial non-overlaps in our
ontology - for example, a leg and a wing have no parts in common:


?- add wing < not has_part some (part_of some leg).

but it would be very tedious to do this for all possible
partitions. What if instead we start from the basis that the current
ontology is correct, and assert non-overlap for all sibling parts that
cannot be proved to overlap?

First let's do some exploration - let's try and query for
part_of-siblings directly asserted to be parts of the leg:


?- q row(P1,P2) where P1 < part_of some leg, P2 < part_of some leg.
row(coxa,coxa).
row(coxa,tibia).
row(coxa,trochanter).
row(coxa,femur).
...

Note this includes reflexive pairs. We can exclude reflexive pairs by
doing an inequality test. We might try to do it this way:


?- q row(P1,P2) where P1 < part_of some leg, P2 < part_of some leg, P1\=P2.
true.

But this returns no results. Why? We can check using the tr/2
predicate to translate our shorthand above to the actual prolog goal:


?- tr((P1 < part_of some leg, P2 < part_of some leg, P1\=P2),X).
X = (subClassOf(P1, someValuesFrom('http://purl.obolibrary.org/obo/TODO_part_of', 
                                    'http://purl.obolibrary.org/obo/FBbt_00004640')), 
     subClassOf(P2, someValuesFrom('http://purl.obolibrary.org/obo/TODO_part_of', 
                                    'http://purl.obolibrary.org/obo/FBbt_00004640')), 
     differentIndividuals([[P1, P2]])).

Here "\=" is being translated to differentIndividuals/1. Now that we
are doing more advanced hybrid queries we have to exercise a little
more control by being explicit about which parts are shorthand. We can
use pq/1 to execute a query without implicit translation and
explicitly translate using g/1:


?- pq row(P1,P2) where g((P1 < part_of some leg, P2 < part_of some leg)), P1\=P2.
row(coxa,tibia).
row(coxa,trochanter).
row(coxa,femur).
row(coxa,joint).
row(coxa,'tarsal segment').
row(coxa,pretarsus).
row(tibia,coxa).
row(tibia,trochanter).
row(tibia,femur).
row(tibia,joint).
row(tibia,'tarsal segment').
row(tibia,pretarsus).
row(trochanter,coxa).
...

In the above, only the sections inside the "g()" term are translated
from the shorthand syntax.

We're still fetching symmetric pairs. We can avoid this using the
prolog @< comparison operator (which again has a different meaning in
our shorthand):


?- pq row(P1,P2) where g((P1 < part_of some leg, P2 < part_of some leg)), P1@<P2.
...
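
If the difference between these comparison operators is new to you, the following plain prolog sketch may help. It runs outside Posh, with toy leg_part/1 facts standing in for the "part_of some leg" query; all predicate names are made up for the illustration.

```prolog
% Toy facts standing in for "P < part_of some leg" (illustrative only).
leg_part(coxa).
leg_part(tibia).
leg_part(femur).

% All ordered pairs, including reflexive ones such as (coxa,coxa).
pair(P1, P2) :-
    leg_part(P1),
    leg_part(P2).

% \== drops the reflexive pairs but keeps both (a,b) and (b,a).
distinct_pair(P1, P2) :-
    pair(P1, P2),
    P1 \== P2.

% @< compares in the standard order of terms, so each unordered
% pair is reported exactly once.
unordered_pair(P1, P2) :-
    pair(P1, P2),
    P1 @< P2.
```

With three parts this gives 9 pairs, 6 distinct pairs and 3 unordered pairs. In each case the comparison comes after both variables have been bound, which matters in plain prolog: a test such as P1\=P2 placed before the variables are bound would simply fail.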

We can go ahead and assert non-overlap for all direct parts of the leg:


?- add P1 < not has_part some part_of some P2
    where g(P1 < part_of some leg), g(P2 < part_of some leg), P1\==P2.
% Asserting coxa<not has_part some part_of some tibia.
 into http://purl.obolibrary.org/obo/fbbt.owl
% Asserting coxa<not has_part some part_of some trochanter.
 into http://purl.obolibrary.org/obo/fbbt.owl
% Asserting coxa<not has_part some part_of some femur.
 into http://purl.obolibrary.org/obo/fbbt.owl
...

But we probably shouldn't do this - our assumption is too
strong. There's no reason to believe that all parts of the leg should
be spatially disconnected.

To see a counter-example, try:


?- pq row(X,P1,P2) where 
    W='embryonic head',g(P1 < part_of some W), g(P2 < part_of some W), P1@<P2, 
    gi(X < part_of some P1 and part_of some P2).
row('deutero/tritocerebral embryonic fiber tract founder cluster','embryonic antennal segment','embryonic intercalary segment').
row('A subperineurial glial cell (subesophageal)','embryonic mandibular segment','embryonic maxillary segment').
row('A subperineurial glial cell (subesophageal)','embryonic mandibular segment','embryonic labial segment').
row('A subperineurial glial cell (subesophageal)','embryonic maxillary segment','embryonic labial segment').
true.

Here we are querying the asserted database (via g/1) for part-siblings
and the inferred database (via gi/1) to see if there are parts shared
by those part-siblings. We can see there are some parts shared by some
pairs - if we were to assert that these were spatially disconnected
then we would get unsatisfiable classes next time we classified.

We can then write a command that asserts spatial disconnection only in
those cases where it can't be proved there is an existing overlap:


?- add P1 < not has_part some part_of some P2 where W='embryonic head',g(P1 < part_of some W), g(P2 < part_of some W), P1\=P2, \+ gi(X < part_of some P1 and part_of some P2).
% Asserting 'embryonic ocular segment'<not has_part some part_of some 'embryonic procephalic segment'.
 into http://purl.obolibrary.org/obo/fbbt.owl
% Asserting 'embryonic ocular segment'<not has_part some part_of some 'embryonic labral segment'.
 into http://purl.obolibrary.org/obo/fbbt.owl
% Asserting 'embryonic ocular segment'<not has_part some part_of some 'embryonic antennal segment'.
 into http://purl.obolibrary.org/obo/fbbt.owl
...

This is probably still too eager, as we are making a closed-world
assumption here (the "\+" is Prolog negation, which means "cannot be
proved"). There may well be shared parts not yet in our
ontology. Nevertheless, it can be useful to make our initial
constraints too strong and progressively weaken them.
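
To see why "\+" amounts to a closed-world assumption, here is a minimal plain-Prolog sketch that does not use the shell or FBbt. The part_of/2 facts and the predicate names shares_part/2 and no_known_overlap/2 are invented for illustration:

```prolog
% Hypothetical toy facts, not taken from FBbt.
part_of(a1, seg1).
part_of(a1, seg2).
part_of(b1, seg3).

% shares_part(+P1, +P2): some X is a known part of both P1 and P2.
shares_part(P1, P2) :-
    part_of(X, P1),
    part_of(X, P2).

% no_known_overlap(+P1, +P2): succeeds when no shared part can be proved.
% This is negation as failure: absence of a proof, not proof of absence.
no_known_overlap(P1, P2) :-
    P1 \= P2,
    \+ shares_part(P1, P2).
```

Here no_known_overlap(seg1, seg3) succeeds only because no shared part has been recorded. Adding a fact such as part_of(b1, seg1) would make the same query fail, which is exactly the risk when the ontology is incomplete.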

We can write a prolog clause that will execute this update for any
value of W. Use ctrl-z to suspend the shell:


cat > assert_non_overlap.pl
assert_non_overlap(W) :-
  add 
   P1 < not has_part some part_of some P2 
  where 
   g(P1 < part_of some W), g(P2 < part_of some W), 
   P1\=P2, 
   \+ gi(X < part_of some P1 and part_of some P2).

In future sessions just type "consult(assert_non_overlap)." to load
this program.

Template-based search and replace

We can perform even more powerful translations on the ontology using
POPL. To illustrate, let's add some 'overlaps' axioms:


 ?- add 'nervous system' < overlaps some head.
% Asserting 'nervous system'<overlaps some head.
 into http://purl.obolibrary.org/obo/fbbt.owl
true.

We can then translate these axioms using the following POPL expression:


 ?- overlaps some Y ===> has_part some part_of some Y.

We can check the results:


 ?- l 'nervous system' -- 'grep head'.
'nervous system'<has_part some part_of some head.
true.

You can use a where clause for selective replacement:


?- overlaps some Y ===> has_part some part_of some Y 
    where gi(Y < 'organism subdivision').
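
The idea of pattern-based replacement can be sketched in plain Prolog. This is not how POPL is implemented; it is only an illustration of the technique, using the someValuesFrom/2 term form that Thea prints for "some" expressions. The predicate name rewrite/3 and the example axiom are assumptions made for this sketch, and the input term is assumed to be ground:

```prolog
:- use_module(library(apply)).   % maplist/3

% rewrite(+From-To, +Term, -New)
% Replace each subterm of Term that matches From with the matching To.
% copy_term/2 gives every match a fresh copy of the rule's variables.
rewrite(_, Term, Term) :-
    var(Term),
    !.
rewrite(From-To, Term, New) :-
    copy_term(From-To, Term-New0),   % succeeds only if Term matches From
    !,
    New = New0.
rewrite(Rule, Term, New) :-
    Term =.. [F|Args],               % otherwise descend into the arguments
    maplist(rewrite(Rule), Args, NewArgs),
    New =.. [F|NewArgs].

% Example rule, mirroring "overlaps some Y ===> has_part some part_of some Y".
overlaps_rule(someValuesFrom(overlaps, Y)
              - someValuesFrom(has_part, someValuesFrom(part_of, Y))).
```

A call such as overlaps_rule(R), rewrite(R, subClassOf(ns, someValuesFrom(overlaps, head)), New) binds New to subClassOf(ns, someValuesFrom(has_part, someValuesFrom(part_of, head))). Note that this sketch does not rewrite inside a replacement once a match has been made.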

Ontology editing using templates

Create a file called "nc.pl" with the following contents:


edit_template(nc(X),
              [iri(obo('FBbt',8)),
               class X,
               N-annotationAssertion(label,X,literal(N)),
               multi(Y-subClassOf(X,Y)),
               multi(W-subClassOf(X,part_of some W)),
               multi(Pre-subClassOf(X,develops_from some Pre))
              ]).

You can load this from within a shell session by calling
"consult(nc)". Execute it using "+" like this:


?- +nc.

A new IRI is generated, and you will be prompted for template values. The first one is the class label:


class'http://purl.obolibrary.org/obo/FBbt_10005990'.

% Template: annotationAssertion(label,http://purl.obolibrary.org/obo/FBbt_10005990,literal(_G83))
% Enter value: _G83 >> 

Type a value and hit enter. Fill in values for other fields. If it's a
multi-valued field, keep adding new entries and then hit "enter" when
done:


% Enter value: _G83 >> neuron ABC
% Val=neuron ABC
annotationAssertion(label,'http://purl.obolibrary.org/obo/FBbt_10005990','neuron ABC').

% Template: subClassOf(http://purl.obolibrary.org/obo/FBbt_10005990,_G97)
% Enter value: _G97 >> neuron
% Val=neuron
'http://purl.obolibrary.org/obo/FBbt_10005990'> 

% Template: subClassOf(http://purl.obolibrary.org/obo/FBbt_10005990,part_of some _G108)
% Enter value: _G108 >> mushroom body
% Val=mushroom body
'http://purl.obolibrary.org/obo/FBbt_10005990'> 

% Template: subClassOf(http://purl.obolibrary.org/obo/FBbt_10005990,develops_from some _G122)
% Enter value: _G122 >> 

The database hasn't changed yet. The axioms to be added are
summarized, then you're prompted:


% ------------------
% AXIOMS TO ADD:
% ------------------
% class'http://purl.obolibrary.org/obo/FBbt_10005990'.
class'http://purl.obolibrary.org/obo/FBbt_10005990'.
% annotationAssertion(label,'http://purl.obolibrary.org/obo/FBbt_10005990','neuron ABC').
annotationAssertion('http://www.w3.org/2000/01/rdf-schema#label','http://purl.obolibrary.org/obo/FBbt_10005990',literal('neuron ABC')).
% 'http://purl.obolibrary.org/obo/FBbt_10005990'<neuron.
subClassOf('http://purl.obolibrary.org/obo/FBbt_10005990','http://purl.obolibrary.org/obo/FBbt_00005106').
% 'http://purl.obolibrary.org/obo/FBbt_10005990'<part_of some 'mushroom body'.
subClassOf('http://purl.obolibrary.org/obo/FBbt_10005990',someValuesFrom('http://purl.obolibrary.org/obo/TODO_part_of','http://purl.obolibrary.org/obo/FBbt_00005801')).
% OK? [enter for yes, any other char for no]

% Set ontology to http://purl.obolibrary.org/obo/fbbt.owl
% Asserting class'http://purl.obolibrary.org/obo/FBbt_10005990'.
 into http://purl.obolibrary.org/obo/fbbt.owl
% Asserting annotationAssertion(label,'neuron ABC','neuron ABC').
 into http://purl.obolibrary.org/obo/fbbt.owl
% Asserting 'neuron ABC'<neuron.
 into http://purl.obolibrary.org/obo/fbbt.owl
% Asserting 'neuron ABC'<part_of some 'mushroom body'.
 into http://purl.obolibrary.org/obo/fbbt.owl
% AXIOMS ADDED. Type "undo." to retract.
true.

Web interface

It's also possible to start up bomshell with an embedded ClioPatria
web server. This allows simultaneous querying and ontology
manipulation through both the command line and a web browser pointing
to localhost. Configuring the webserver is outside the scope of this
tutorial.

Finding out more

The shell interface is a quick and dirty wrapper around other Thea
modules. You can figure out exactly what a shell command does by using
the prolog "listing" command. E.g. "listing(q).".

Consult the Thea OWL Lib website for more information.

Troubleshooting

JPL installation

The current SWI dmg file for Snow Leopard doesn't appear to install
JPL correctly. This may be temporary. You may have to obtain the SWI source and do this:


cd packages/jpl
./configure
sudo make install

Support for OWLAPIv3 in Thea2

Thea2 now wraps OWLAPI v3. Support is provided for pellet and hermit out of the box.

There is now also additional command line support.

  • download and install SWI-Prolog
  • download Thea2 (for now you have to git clone and get the owlapi3 branch)
  • Add Thea2 to your path:

(assuming you install in ~/thea2)

export PATH="$PATH:$HOME/thea2"

You can then use the thea-jpl script, which takes care of JPL setup, adding the owlapiv3 jars to your classpath etc:

e.g.

thea-jpl testfiles/pizza.owl --reasoner pellet --reasoner-ask-all

This shows all inferred axioms.

For more examples, see Cookbook.txt

improving on SWI indexing on large databases of facts

SWI, like most Prologs, provides fast first-argument indexing. Accessing a large database of facts via other arguments can be very slow, as a sequential scan is used. SWI provides index/1, but it doesn’t appear to be very effective.

The index_util module provides faster indexing by rewriting fact clauses to provide multiple entry points.

From the documentation:

This is designed to be a swap-in replacement for index/1. Indexing a
fact with M arguments on N of those arguments will generate N sets of
facts with arguments reordered to take advantage of first-argument
indexing. The original fact will be rewritten.

For example, calling:


materialize_index(my_fact(1,0,1)).

will retract all my_fact/3 facts and generate the following clauses in their place:


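% Argument 1 bound: use the index keyed on argument 1.
my_fact(A,B,C) :-
    nonvar(A),
    !,
    my_fact__ix_1(A,B,C).
% Argument 3 bound: use the index keyed on argument 3.
my_fact(A,B,C) :-
    nonvar(C),
    !,
    my_fact__ix_3(C,A,B).
% Neither bound: fall back to a scan.
my_fact(A,B,C) :-
    my_fact__ix_1(A,B,C).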

Here my_fact__ix_1 and my_fact__ix_3 contain the same data as the original my_fact/3 clauses. In the second case, the arguments have been reordered so that the indexed argument comes first.
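
To make this concrete, here is a hand-written sketch of what the database might look like after indexing a two-row my_fact/3 table on arguments 1 and 3. The rows are invented for illustration; a real run of materialize_index/1 generates the tables for you:

```prolog
% Hypothetical data, standing in for a large my_fact/3 table.
my_fact__ix_1(a, 1, x).
my_fact__ix_1(b, 2, y).

% The same rows with argument 3 moved to the front.
my_fact__ix_3(x, a, 1).
my_fact__ix_3(y, b, 2).

% Dispatch on whichever indexed argument is bound.
my_fact(A, B, C) :- nonvar(A), !, my_fact__ix_1(A, B, C).
my_fact(A, B, C) :- nonvar(C), !, my_fact__ix_3(C, A, B).
my_fact(A, B, C) :- my_fact__ix_1(A, B, C).
```

A query such as my_fact(A, B, y) is routed to my_fact__ix_3(y, A, B), so first-argument indexing applies even though the caller bound the third argument.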

Speed Improvements

Some users have reported performance gains of 1000x; see, for example, this post.

Limitations

  • Single key indexing only. Could be extended for multikeys.
  • Reindexing is not a good idea. It could be smarter about this.
  • Should not be used on dynamic databases.

It does not have to be used with facts (unit clauses), but the clauses should be enumerable.

graphviz and blip ontol

blip includes a generic grammar/writer for the graphviz language ‘dot’.

dot is actually quite powerful, and allows for specification of boxes inside boxes. For example, the following blip command line call:


blip -r fma ontol-subset -n Heart -cr subclass -to display

will generate and display a PNG such as this:
[Image: Heart]

The ontol/conf directory specifies a number of configuration modules for the ontol library. These can be specified with the "-u" option on the command line. They allow things such as color-coding by ontology. For example, "ontol_config_uberon" allows generation of diagrams such as:
[Image: phalanx]

memoization+persistence

The mis-named “tabling” module (should really be called “memoize”) now allows for persisting memoized calls to a file. See:

http://github.com/cmungall/blipkit/blob/master/packages/blipcore/tabling.pro

(docs up on the pldoc server soon).

To see how this works, consider the transitive closure of the subclass/2 fact, as defined in the ontol_db schema:


subclassT(X,Y):- subclass(X,Y).
subclassT(X,Y):- subclass(X,Z),subclassT(Z,Y).

If you end up using a predicate such as this frequently in one session, you can do this at the start of the session:


:- use_module(bio(tabling)).

init :- table_pred(ontol_db:subclassT/2).

This rewrites subclassT/2 behind the scenes. See the code for details.

After loading some subclass/2 facts (e.g. from GO), you then call:


forall(subclassT('GO:0006915',X), writeln(X)). % all ancestors of apoptosis

The first time you call this, the original code is called. The second time you call this, it checks to see if it knows the answer for 'GO:0006915' - it does - it then returns the previously calculated results, which have been asserted to memory.
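
The behaviour described here can be sketched by hand in plain Prolog. This is not the code of the tabling module, only an illustration of memoization for the case where the first argument is bound. The subclass/2 facts are toy data, and memo_subclassT/2 and cached_ancestors/2 are names invented for this sketch:

```prolog
:- use_module(library(lists)).   % member/2
:- dynamic subclass/2, cached_ancestors/2.

% Hypothetical toy hierarchy: c is a subclass of b, b of a.
subclass(c, b).
subclass(b, a).

subclassT(X, Y) :- subclass(X, Y).
subclassT(X, Y) :- subclass(X, Z), subclassT(Z, Y).

% memo_subclassT(+X, -Y): like subclassT/2 for a bound X, but the full
% answer list for X is computed once and then served from memory.
memo_subclassT(X, Y) :-
    (   cached_ancestors(X, Ys)
    ->  true
    ;   findall(Y0, subclassT(X, Y0), Ys),
        assertz(cached_ancestors(X, Ys))
    ),
    member(Y, Ys).
```

The first call for a given X runs the original recursive definition and stores the answers; later calls for the same X only enumerate the stored list.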

However, this caching is lost when the prolog db is destroyed at the end of the session. Now you can persist this:


persistent_table_pred(ontol_db:subclassT/2, 'my_cache.pl').

The first time this is called, my_cache.pl is created. All results of subclassT/2 calls are saved there.

After the session the file is retained. In future sessions, if this is called again, the cache is loaded into memory and appended with the results of future calls.
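
As a rough illustration of the save-and-reload cycle, the following sketch writes a cache table to a file and reads it back. It is not the module's implementation. The cache predicate cached_ancestors/2 and the helper names save_cache/1, load_cache/1 and read_cache_terms/1 are assumptions made for this example:

```prolog
:- use_module(library(listing)).   % portray_clause/2
:- dynamic cached_ancestors/2.

% save_cache(+File): write every cached fact to File as a readable clause.
save_cache(File) :-
    setup_call_cleanup(
        open(File, write, Out),
        forall(cached_ancestors(X, Ys),
               portray_clause(Out, cached_ancestors(X, Ys))),
        close(Out)).

% load_cache(+File): re-assert cached facts from File, if it exists.
load_cache(File) :-
    exists_file(File),
    !,
    setup_call_cleanup(
        open(File, read, In),
        read_cache_terms(In),
        close(In)).
load_cache(_).

read_cache_terms(In) :-
    read_term(In, Term, []),
    (   Term == end_of_file
    ->  true
    ;   (   Term = cached_ancestors(_, _)
        ->  assertz(Term)
        ;   true                   % ignore anything unexpected
        ),
        read_cache_terms(In)
    ).
```

Calling load_cache/1 at the start of a session and save_cache/1 at the end gives the same effect in miniature: answers computed in one session are available in the next without being recomputed.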

In future, the module may also be extended to be made 'hookable', with hooks provided for caching to a relational database.
