Category Archives: ontologies and modeling

Unfolding T-boxes in GO using POPL

“Unfolding a T-box” may sound like some quaint tea ceremony ritual, but in fact in the context of description logics it refers to the iterative replacement of classes by equivalent anonymous class expressions.

Many reasoners take advantage of T-box unfolding behind the scenes. But there may be reasons to unfold your T-box in a more public fashion.

For example, highly specific, wordy GO terms are frequently criticized, such as:
GO:2001043 positive regulation of cytokinetic cell separation involved in cell cycle cytokinesis

Whilst not as unwieldy as some of the infamous ICD9 examples, some feel that this is taking pre-composition too far. In fact these detailed pre-composed terms are very useful for systems that aren’t capable of consuming anonymous class expressions. However, for some purposes it may be useful to replace this with a nested class expression.

This can be done in 3 lines with a POPL script:

class(X) ===> null where X==_.
annotationAssertion(_,X,_) ===> null where X==_.

X ===>Y where X==Y.

The first two lines remove class declarations and annotation assertions (e.g. label assignments) for any defined classes. The final line does all the work: it replaces every occurrence of a defined class with the equivalent class expression.

This means if we have the following gene association (e.g. from a GAF file):

Class: :Gene1234
Types: 'positive regulation of cytokinetic cell separation involved in cell cycle cytokinesis'

it will be translated to:

Class: :Gene1234
  capable_of some 
    ('biological regulation'
     and (positively_regulates some 
        ('cytokinetic cell separation'
         and (part_of some 
             and (part_of some 'cell cycle'))))))
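To make the mechanics a little more concrete, here is a minimal Python sketch of unfolding. It is not how POPL or Thea implement it: the nested-tuple encoding of class expressions, the `unfold` function and the toy definitions are all assumptions made for this illustration, and it assumes the definitions are acyclic.

```python
# Class expressions as nested tuples (an encoding assumed for this sketch):
#   "name"                      a named class
#   ("and", e1, e2, ...)        intersection
#   ("some", property, filler)  existential restriction

def unfold(expr, definitions):
    """Recursively replace defined class names by their definitions.

    `definitions` maps a class name to its equivalent class expression.
    Assumes the definitions are acyclic, otherwise this would not terminate.
    """
    if isinstance(expr, str):
        if expr in definitions:
            return unfold(definitions[expr], definitions)
        return expr
    op = expr[0]
    if op == "and":
        return ("and",) + tuple(unfold(e, definitions) for e in expr[1:])
    if op == "some":
        return ("some", expr[1], unfold(expr[2], definitions))
    raise ValueError("unsupported expression: %r" % (expr,))

# Toy definitions, hypothetical and much simpler than GO
definitions = {
    "regulation of X in Y": ("and", "regulation", ("some", "regulates", "X in Y")),
    "X in Y": ("and", "X", ("some", "part_of", "Y")),
}

annotation = ("some", "capable_of", "regulation of X in Y")
unfolded = unfold(annotation, definitions)
print(unfolded)
```

Note that the nested definition of "X in Y" is unfolded too, which is the iterative part of the replacement.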

We can also choose to selectively unfold – e.g. unfold all regulation terms:

X ===>Y where (X==Y, Y='biological regulation' and _).

Resulting in:

Class: :Gene1234
  capable_of some
    ('biological regulation'
      and (positively_regulates some 'cytokinetic cell separation involved in cell cycle cytokinesis'))

The same thing could be done in Java, but it would require significantly more code and some messing around with visitor classes. The result would be less declarative and harder to customize.

What if we want to perform the reverse operation? This is similar to finding the most specific subsuming class, which is a standard reasoner operation. However, in this case we want to find a more specific class expression, which is slightly more difficult. This might be the topic of a future post.


Prolog OWL Shell at OWLED 2011

My slides and paper for POSH are now online:

Manuscript: owled2011_submission_15.pdf

Slides: POSH (slideshare)

Posh — the Prolog OWL Shell

Posh (Prolog OWL Shell) is a command line utility
that wraps the Thea OWL library to allow for advanced querying and
processing of ontologies, combining the power of prolog and OWL.


Install SWI

Download and install SWI-Prolog. This is a
simple point and click procedure for most platforms. If you want to use
reasoners such as Pellet, make sure you have JPL installed (see
troubleshooting section).

Posh Git

Get the latest version of Thea from github. Git clone is
recommended, but you can also use the github download link.

Assuming you placed the project in your toplevel directory, set your path:

export PATH="$PATH:$HOME/thea"

You’re now set to use Posh.

Getting Started

First we start thea using the --shell option. We also start it in JPL
mode, as we will be making use of the OWLAPI to interface with
reasoners. We’ll also load the OWL translation of the fly anatomy
ontology, from the OBO library.

thea-jpl --jvm-opt -Xmx2048M --shell 

This starts us up in an enhanced prolog shell with the fly anatomy
loaded. You can make arbitrary prolog queries, as in any prolog shell,
for example:

?- member(X,[a,b,c]).
X = a ;
X = b ;
X = c.

If you don’t know prolog, you should still be able to get by. The
crucial syntax to remember is that variables commence with an
uppercase letter (or ‘_’) and each line should be terminated by a
‘.’. Also, if you get stuck, type ‘help.’ for a list of commands.

Let’s start by checking our list of ontologies:

?- ls.

The first thing we’ll do is set some display options. fbbt uses labels
for all classes, so we will make sure these are displayed.

 ?- set display + labels.

We will also choose to display all class expressions as ‘plsyn’. This
looks like a mixture of DL syntax and manchester syntax. Why another
syntax? Because plsyn is pure prolog, which means it can be used
directly in prolog queries and operations. It’s not the most
user-friendly syntax, but it’s worthwhile getting to know if you
intend to be doing any advanced operations from within this shell.

 ?- set display + plsyn.

These settings are automatically saved in your ~/.thearc file.

Type “settings.” to see the full list. To clear the display settings:

 ?- unset display.

You should stick with labels and plsyn for this tutorial. Other useful
values are “tabular” and “combined”.

Initial exploration

We can use the l/1 command to find all axioms associated with a class (in this case ‘wing’):

?- l wing.
% Axiom Type: annotationAssertion
annotationAssertion('',wing,'A flight organ of the adult external thorax that is derived from a dorsal mesothoracic disc.').
annotationAssertion('',annotation(wing,'','A flight organ of the adult external thorax that is derived from a dorsal mesothoracic disc.'),'FBC:gg').
% Axiom Type: class
class wing.
% Axiom Type: subClassOf
'chordotonal organ of wing'<part_of some wing.
'wing hair'<part_of some wing.
wing<develops_from some 'wing disc'.
wing<part_of some 'adult mesothoracic segment'.
wing<part_of some 'adult external thorax'.
tegula<part_of some wing.
'wing hinge'<part_of some wing.
'wing septum'<part_of some wing.
'wing margin'<part_of some wing.
'wing blade'<part_of some wing.
'dorsal wing blade'<part_of some wing.
'ventral wing blade'<part_of some wing.

If the term of interest starts with an uppercase character, or includes a space, you need to quote the label:

?- l 'wing disc'.

Quotes must be escaped:

?- l 'Wheeler\'s organ'.

You can list all the axioms in the current ontology by typing
“lsa.”. This might produce quite a long list. You can query more
precisely using prolog, but if you’re not ready for that, you can rely
on your unix wizardry skills:

?- lsa -- 'grep wing'.

The “--” predicate will pipe the results of the posh command to any
unix command or script.

Trees and graphs

We can draw ascii trees showing the denormalized subclass hierarchy tree using t/1:

?- t wing.
anatomical entity
.material anatomical entity
..anatomical structure
...organism subdivision

Note that this is just the asserted developmental axioms – so
far these are just queries on the ontology structure – we will get to
entailed facts later on.

Use the v/1 command to visualize an object (graphviz required):

?- v 'wing disc'.

By default, this will show the closure over subclass axioms, as well
as positive restrictions. The default behavior can be controlled, and
the graphviz display is highly configurable, but this is out of scope
for this tutorial.

Prolog query shorthand

You can use this shell to query any of the prolog facts in the owl
model database. For example, to find all development axioms:

?- labelAnnotation_value(DF,develops_from),
DF = '',
X = '',
Y = '' ;
DF = '',
X = '',
Y = '' ;
DF = '',
X = '',
Y = '' ;
DF = '',
X = '',
Y = '' .

This isn’t very meaningful – you can use labelAnnotation/2 to map results back to labels:

?- labelAnnotation_value(DF,develops_from),

But this is a bit tedious. Posh has the convenience command q/1,
which launches a query and takes care of all label/IRI mapping for you:

?- q X < develops_from some Y.
'germ layer derivative'<develops_from some 'germ layer'.
'dorsal ridge'<develops_from some 'dorsal ridge primordium'.
'pole bud'<develops_from some 'pole plasm'.
amnioserosa<develops_from some 'amnioserosa primordium'.
ectoderm<develops_from some 'ectoderm anlage'.
'dorsal ectoderm'<develops_from some 'dorsal ectoderm anlage'.
'anterior ectoderm'<develops_from some 'anterior ectoderm anlage'.
'posterior ectoderm'<develops_from some 'posterior ectoderm anlage'.
endoderm<develops_from some 'endoderm anlage'.
mesoderm<develops_from some 'mesoderm anlage'.

Note we’re also using the infix operator ‘<’ as a shorthand for the
prolog subClassOf/2 predicate, and the infix operator ‘some’ as
shorthand for someValuesFrom terms. The result is a bastard hybrid of
prolog and DL syntax, by way of Manchester. Unfortunately the default
output is quite stingy with whitespace, and is overly compact.

The following shows all subclass axioms:

?- q _<_.

The list of axioms is quite large. We can filter this set using any
unix command, such as perl or grep, using ‘--’.

?- q X < develops_from some Y -- 'grep muscle'.
'embryonic/larval somatic muscle'<develops_from some 'somatic muscle primordium'.
'longitudinal muscle'<develops_from some 'longitudinal visceral muscle primordium'.
'prothoracic pharyngeal muscle'<develops_from some 'dorsal pharyngeal muscle primordium'.
'abdominal ventral acute muscle 1'<develops_from some 'abdominal ventral acute muscle 1 founder cell'.
'abdominal ventral acute muscle 2'<develops_from some 'abdominal ventral acute muscle 2 founder cell'.
'abdominal ventral acute muscle 3'<develops_from some 'abdominal ventral acute muscle 3 founder cell'.
'abdominal dorsal oblique muscle 3'<develops_from some 'abdominal dorsal oblique muscle 3 founder cell'.
'abdominal lateral oblique muscle 1'<develops_from some 'abdominal lateral oblique muscle 1 founder cell'.
'abdominal dorsal transverse muscle 1'<develops_from some 'abdominal dorsal transverse muscle 1 founder cell'.
'abdominal ventral transverse muscle 1'<develops_from some 'abdominal ventral transverse muscle 1 founder cell'.
'adult myoblast'<develops_from some 'adult muscle precursor primordium'.
'circular visceral muscle fiber'<develops_from some 'circular visceral muscle primordium'.
'dorsal pharyngeal muscle primordium'<develops_from some 'dorsal pharyngeal muscle anlage'.
'dorsal pharyngeal muscle primordium'<develops_from some 'head mesoderm'.
'adult muscle precursor primordium'<develops_from some 'trunk mesoderm'.
'somatic muscle primordium'<develops_from some 'somatic mesoderm'.
'visceral muscle primordium'<develops_from some 'visceral mesoderm'.
'larval muscle system'<develops_from some 'embryonic muscle system'.
'embryonic gonadal sheath muscle'<develops_from some 'gonadal sheath proper primordium'.
'larval gonadal sheath muscle'<develops_from some 'embryonic gonadal sheath muscle'.
'hindgut visceral muscle fiber'<develops_from some 'hindgut visceral muscle primordium'.
'foregut visceral muscle fiber'<develops_from some 'foregut visceral muscle primordium'.
'embryonic/larval midgut longitudinal visceral muscle'<develops_from some 'midgut longitudinal visceral muscle primordium'.
'circular visceral muscle primordium'<develops_from some 'visceral mesoderm'.
'esophageal visceral muscle primordium'<develops_from some 'head mesoderm'.
'esophageal visceral muscle'<develops_from some 'esophageal visceral muscle primordium'.

In Posh, ‘==’ translates to equivalent_to/2, and ‘and’ translates
to intersectionOf, so we can ask for all definitions that
directly use the neuron class:

?- q X == neuron and A.
'abdominal neuron'==neuron and part_of some abdomen.
'A8 neuron'==neuron and part_of some 'abdominal segment 8'.
'prothoracic anterior fascicle neuron'==neuron and fasciculates_with some 'prothoracic intersegmental nerve'.
'prothoracic posterior fascicle neuron'==neuron and fasciculates_with some 'prothoracic segmental nerve'.
'mesothoracic anterior fascicle neuron'==neuron and fasciculates_with some 'mesothoracic intersegmental nerve'.
'mesothoracic posterior fascicle neuron'==neuron and fasciculates_with some 'mesothoracic segmental nerve'.
'metathoracic anterior fascicle neuron'==neuron and fasciculates_with some 'metathoracic intersegmental nerve'.
'metathoracic posterior fascicle neuron'==neuron and fasciculates_with some 'metathoracic segmental nerve'.
'abdominal anterior fascicle neuron'==neuron and fasciculates_with some 'abdominal intersegmental nerve'.
'abdominal posterior fascicle neuron'==neuron and fasciculates_with some 'abdominal segmental nerve'.
'peptidergic neuron'==neuron and releases_neurotransmitter some peptide.
'sensory neuron'==neuron and has_function_in some 'detection of stimulus involved in sensory perception'.

Editing the ontology

The commands add/1 and rm/1 will assert and retract to the current
ontology. When you’re done you can use save_axioms/2 to persist your
results to a file.

?- add brain < has_part some neuron.
% Asserting brain<has_part some neuron.

You can always undo:

?- undo.
% Undo: Asserting brain<has_part some neuron.

If you have asserted multiple facts, you will keep getting prompted
until you hit return or have undone all facts you added. If you change
your mind again, just type “redo.”.

Using a reasoner

Thea comes pre-packaged with both java reasoners and a prolog
rule-based reasoner. We’ll assume Pellet, a java reasoner.

Start the reasoner like this:

?- init pellet.
% library(thea2/owl2_java_owlapi) compiled into owl2_reasoner 0.01 sec, 13,608 bytes
% initializing: reasoner 0.76
% completed: reasoner 21.82 time: 21.06

Behind the scenes, Posh has established contact with the java
OWLAPI. If you’re having java issues, or want to use a boutique
reasoner the OWLAPI can’t talk to, you can always talk to a reasoner
via OWLLink – but that’s the subject of another tutorial.

The qi/1 predicate is similar to q/1, but the query is passed to the
reasoner. Here’s how we find all neurons:

?- qi X < neuron.
'lamina receptor cell R1'<neuron.
'abdominal 1 desC neuron'<neuron.
'abdominal 2 desC neuron'<neuron.
'lamina receptor cell R3'<neuron.

The results are all entailed subclass axioms. If you just want the
classes you can use the SELECT .. WHERE .. idiom:

?- qi X where X < neuron.
'lamina receptor cell R1'.
'abdominal 1 desC neuron'.
'abdominal 2 desC neuron'.
'lamina receptor cell R3'.
'lamina receptor cell R2'.

Again the list is quite large. The unix ‘head’ and ‘tail’ commands are
quite convenient here:

?- qi X < neuron -- head.

You can use any DL expression, for example someValuesFrom restrictions:

?- qi X < overlaps some neuron.
'photoreceptor cell R2 pigment granule'<overlaps some neuron.
'photoreceptor cell R5 pigment granule'<overlaps some neuron.
neurite<overlaps some neuron.
'photoreceptor cell R6 pigment granule'<overlaps some neuron.
synapse<overlaps some neuron.
'photoreceptor cell R3 pigment granule'<overlaps some neuron.
'photoreceptor cell R7 pigment granule'<overlaps some neuron.
'end plate'<overlaps some neuron.

Or intersections:

?- qi X < neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN5'<neuron and overlaps some 'muscle cell'.
'b2 motor neuron'<neuron and overlaps some 'muscle cell'.
'cibarial pump muscle neuron'<neuron and overlaps some 'muscle cell'.
'III1 motor neuron'<neuron and overlaps some 'muscle cell'.
'Nothing'<neuron and overlaps some 'muscle cell'.
'I1 motor neuron'<neuron and overlaps some 'muscle cell'.
'direct flight muscle motor neuron'<neuron and overlaps some 'muscle cell'.
'III3 motor neuron'<neuron and overlaps some 'muscle cell'.
'b1 motor neuron'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN3'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN4'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN1'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron MN2'<neuron and overlaps some 'muscle cell'.
'tergal depressor of trochanter muscle motor neuron'<neuron and overlaps some 'muscle cell'.
'indirect flight muscle motor neuron'<neuron and overlaps some 'muscle cell'.
'dorsal tp motor neuron'<neuron and overlaps some 'muscle cell'.
'ventral tp motor neuron'<neuron and overlaps some 'muscle cell'.
?- qi X < neuron and overlaps some 'muscle cell' -- wc.
      17     160    1202

Note that nothing is asserted to overlap in fbbt. Where is the
reasoner getting these inferences from? Let’s have a look at the property:

 ?- l overlaps.
* Axiom Type: annotationAssertion
* Axiom Type: objectProperty
* Axiom Type: subPropertyOf

overlaps holds whenever part_of holds (“@<” is plsyn for
subPropertyOf/2), so in fact our query above gives the same results as
querying for parts of a muscle cell.

In fact, overlaps should hold whenever we have a chain of has_part and
part_of. Let's add this axiom. The plsyn for subproperties and property
chains is a bit abstruse and non-obvious:

?- add has_part*part_of @< overlaps.

You can always type the full prolog functional syntax if you prefer:

?- add subPropertyOf(propertyChain([has_part,part_of]),overlaps).

(the add/1 command takes care of translating your labels to IRIs).

With the current version of fbbt we have to also add this:

?- add has_part inverseOf part_of.

Unfortunately, this has the negative consequence of slowing Pellet
down a lot. Let’s backtrack.

?- undo.
% Undo: Asserting has_part inverseOf part_of.
true ;
% Undo: Asserting propertyChain([has_part,part_of])@<overlaps.
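For readers who want to see what the chain axiom licenses, here is a small Python sketch: whenever x has_part y and y part_of z, we conclude x overlaps z. This is plain relational composition over toy facts I made up for the example; the `compose` helper is hypothetical and has nothing to do with how Pellet actually reasons.

```python
def compose(r1, r2):
    """Relational composition: (x, z) whenever (x, y) is in r1 and (y, z) is in r2."""
    by_first = {}
    for y, z in r2:
        by_first.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r1 for z in by_first.get(y, ())}

# Toy part_of facts, invented for this example
part_of = {("synapse", "neuron"), ("synapse", "muscle cell")}

# Mirrors the inverseOf axiom: has_part is part_of read backwards
has_part = {(whole, part) for part, whole in part_of}

# The chain has_part followed by part_of yields overlaps
overlaps = compose(has_part, part_of)
print(sorted(overlaps))
```

The sketch ignores the transitivity of part_of and everything else a DL reasoner would take into account, which is part of why the real thing is so much more expensive.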

Advanced Ontology Processing

Let’s say we want to start asserting spatial non-overlaps in our
ontology – for example, a leg and a wing have no parts in common:

?- add wing < not has_part some (part_of some leg).

but it would be very tedious to do this for all possible
partitions. What if instead we start from the basis that the current
ontology is correct, and assert non-overlap for all sibling parts that
cannot be proved to overlap?

First let’s do some exploration – let’s try and query for
part_of-siblings directly asserted to be parts of the leg:

?- q row(P1,P2) where P1 < part_of some leg, P2 < part_of some leg.

Note this includes reflexive pairs. We can exclude these with an
inequality test. We might try it this way:

?- q row(P1,P2) where P1 < part_of some leg, P2 < part_of some leg, P1\=P2.

But this returns no results. Why? We can check using the tr/2
predicate to translate our shorthand above to the actual prolog goal:

?- tr((P1 < part_of some leg, P2 < part_of some leg, P1\=P2),X).
X = (subClassOf(P1, someValuesFrom('', 
     subClassOf(P2, someValuesFrom('', 
     differentIndividuals([[P1, P2]])).

Here “\=” is being translated to differentIndividuals/2. Now that we
are doing more advanced hybrid queries we have to exercise a little
more control by being explicit about which parts are shorthand. We can
use pq/1 to execute a query without implicit translation and
explicitly translate using g/1:

?- pq row(P1,P2) where g((P1 < part_of some leg, P2 < part_of some leg)), P1\=P2.
row(coxa,'tarsal segment').
row(tibia,'tarsal segment').

In the above, only the sections inside the “g()” term are translated
from the shorthand syntax.

We’re still fetching symmetric pairs. We can avoid this using the
prolog @< comparison operator (which again has a different meaning in
our shorthand):

?- pq row(P1,P2) where g((P1 < part_of some leg, P2 < part_of some leg)), P1@<P2.

We can go ahead and assert non-overlap for all direct parts of the leg:

?- add P1 < not has_part some part_of some P2
    where g(P1 < part_of some leg), g(P2 < part_of some leg), P1\==P2.
% Asserting coxa<not has_part some part_of some tibia.
% Asserting coxa<not has_part some part_of some trochanter.
% Asserting coxa<not has_part some part_of some femur.

But we probably shouldn’t do this – our assumption is too
strong. There’s no reason to believe that all parts of the leg should
be spatially disconnected.

To see a counter-example, try:

?- pq row(X,P1,P2) where 
    W='embryonic head',g(P1 < part_of some W), g(P2 < part_of some W), P1@<P2, 
    gi(X < part_of some P1 and part_of some P2).
row('deutero/tritocerebral embryonic fiber tract founder cluster','embryonic antennal segment','embryonic intercalary segment').
row('A subperineurial glial cell (subesophageal)','embryonic mandibular segment','embryonic maxillary segment').
row('A subperineurial glial cell (subesophageal)','embryonic mandibular segment','embryonic labial segment').
row('A subperineurial glial cell (subesophageal)','embryonic maxillary segment','embryonic labial segment').

Here we are querying the asserted database (via g/1) for part-siblings
and the inferred database (via gi/1) to see if there are parts shared
by those part-siblings. We can see there are some parts shared by some
pairs – if we were to assert that these were spatially disconnected
then we would get unsatisfiable classes next time we classified.

We can then write a command that asserts spatial disconnection only in
those cases where it can’t be proved there is an existing overlap:

?- add P1 < not has_part some part_of some P2 where W='embryonic head',g(P1 < part_of some W), g(P2 < part_of some W), P1\=P2, \+ gi(X < part_of some P1 and part_of some P2).
% Asserting 'embryonic ocular segment'<not has_part some part_of some 'embryonic procephalic segment'.
% Asserting 'embryonic ocular segment'<not has_part some part_of some 'embryonic labral segment'.
% Asserting 'embryonic ocular segment'<not has_part some part_of some 'embryonic antennal segment'.

This is probably still too eager, as we are making a closed world
assumption here (the “\+” is a prolog not, which means “cannot be
proved”). There may well be shared parts not yet in our
ontology. Nevertheless, it can be useful to make our initial
constraints too strong and progressively weaken them.
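The same closed-world idea can be sketched outside the shell. The Python below is only an illustration under my own assumptions: the toy part_of facts are invented, a naive transitive closure stands in for the reasoner, and `non_overlap_candidates` is a hypothetical name. It reports each unordered sibling pair once, rather than both directions.

```python
from itertools import combinations

def transitive_closure(pairs):
    """Naive transitive closure of a set of (part, whole) pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for a, b in closure for c, d in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

def non_overlap_candidates(part_of, whole):
    """Sibling parts of `whole` for which no shared part can be derived."""
    inferred = transitive_closure(part_of)
    siblings = sorted(p for p, w in part_of if w == whole)
    result = []
    for p1, p2 in combinations(siblings, 2):
        parts1 = {x for x, w in inferred if w == p1}
        parts2 = {x for x, w in inferred if w == p2}
        # "cannot be proved to share a part" is the closed-world step
        if not (parts1 & parts2):
            result.append((p1, p2))
    return result

# Toy facts, invented for this example
part_of = {
    ("segment A", "head"), ("segment B", "head"), ("segment C", "head"),
    ("glial cell", "segment A"), ("glial cell", "segment B"),
}
candidates = non_overlap_candidates(part_of, "head")
print(candidates)
```

Segments A and B share the glial cell, so only the pairs involving segment C come back as candidates.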

We can write a prolog clause that will execute this update for any
value of W. Use ctrl-z to suspend the shell:

cat >
assert_non_overlap(W) :-
   ( add P1 < not has_part some part_of some P2 where
     g(P1 < part_of some W), g(P2 < part_of some W), P1\=P2,
     \+ gi(X < part_of some P1 and part_of some P2) ).

In future sessions just type “consult(assert_non_overlap).” to load
this program.

Template-based search and replace

We can perform even more powerful translations on the ontology using
POPL. To illustrate, let’s add some ‘overlaps’ axioms:

 ?- add 'nervous system' < overlaps some head.
% Asserting 'nervous system'<overlaps some head.

We can then translate these axioms using the following POPL expression:

 ?- overlaps some Y ===> has_part some part_of some Y.

We can check the results:

 ?- l 'nervous system' -- 'grep head'.
'nervous system'<has_part some part_of some head.

You can use a where clause for selective replacement:

?- overlaps some Y ===> has_part some part_of some Y 
    where gi(Y < 'organism subdivision').
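If it helps to see the rewrite spelled out procedurally, here is a rough Python equivalent of the unconditional replacement. The nested-tuple encoding of axioms and the `rewrite_overlaps` function are assumptions of this sketch, not anything Thea provides.

```python
def rewrite_overlaps(expr):
    """Rewrite ("some", "overlaps", Y) to has_part some (part_of some Y), everywhere in expr."""
    if isinstance(expr, str):
        return expr
    if expr[0] == "some":
        prop, filler = expr[1], rewrite_overlaps(expr[2])
        if prop == "overlaps":
            return ("some", "has_part", ("some", "part_of", filler))
        return ("some", prop, filler)
    # Any other operator: keep it and rewrite its arguments
    return (expr[0],) + tuple(rewrite_overlaps(e) for e in expr[1:])

axiom = ("subClassOf", "nervous system", ("some", "overlaps", "head"))
rewritten = rewrite_overlaps(axiom)
print(rewritten)
```

The POPL version is one line because the pattern matching and traversal come for free with prolog unification; here they have to be written out by hand.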

Ontology editing using templates

Create a file called “” with the following contents:

               class X,
               multi(W-subClassOf(X,part_of some W)),
               multi(Pre-subClassOf(X,develops_from some Pre))

You can load this from within a shell session by calling
“consult(nc)”. Execute it using “+” like this:

?- +nc.

A new IRI is generated, and you will be prompted for template values. The first one is the class label:


% Template: annotationAssertion(label,,literal(_G83))
% Enter value: _G83 >> 

Type a value and hit enter. Fill in values for other fields. If it’s a
multi-valued field, keep adding new entries and then hit “enter” when
you are done.
% Enter value: _G83 >> neuron ABC
% Val=neuron ABC
annotationAssertion(label,'','neuron ABC').

% Template: subClassOf(,_G97)
% Enter value: _G97 >> neuron
% Val=neuron

% Template: subClassOf(,part_of some _G108)
% Enter value: _G108 >> mushroom body
% Val=mushroom body

% Template: subClassOf(,develops_from some _G122)
% Enter value: _G122 >> 

The database hasn’t changed yet. The axioms to be added are
summarized, then you’re prompted:

% ------------------
% ------------------
% class''.
% annotationAssertion(label,'','neuron ABC').
annotationAssertion('','',literal('neuron ABC')).
% ''<neuron.
% ''<part_of some 'mushroom body'.
% OK? [enter for yes, any other char for no]

% Set ontology to
% Asserting class''.
% Asserting annotationAssertion(label,'neuron ABC','neuron ABC').
% Asserting 'neuron ABC'<neuron.
% Asserting 'neuron ABC'<part_of some 'mushroom body'.
% AXIOMS ADDED. Type "undo." to retract.

Web interface

It’s also possible to start up bomshell with an embedded ClioPatria
web server. This allows simultaneous querying and ontology
manipulation through both the command line and a web browser pointing
to localhost. Configuring the webserver is outside the scope of this
tutorial.
Finding out more

The shell interface is a quick and dirty wrapper around other Thea
modules. You can figure out exactly what a shell command does by using
the prolog “listing” command. E.g. “listing(q).”.

Consult the Thea OWL Lib
for more information.


JPL installation

The current SWI dmg file for Snow Leopard doesn’t appear to install
JPL correctly. This may be temporary. You may have to obtain the SWI source and do this:

cd packages/jpl
sudo make install

Support for OWLAPIv3 in Thea2

Thea2 now wraps OWLAPI v3. Support is provided for pellet and hermit out of the box.

There is now also additional command line support.

  • download and install SWI-Prolog
  • download Thea2 (for now you have to git clone and get the owlapi3 branch)
  • Add Thea2 to your path:

(assuming you install in ~/thea2)

export PATH="$PATH:$HOME/thea2"

You can then use the thea-jpl script, which takes care of JPL setup, adding the owlapiv3 jars to your classpath etc:


thea-jpl testfiles/pizza.owl --reasoner pellet --reasoner-ask-all

This shows all inferred axioms.

For more examples, see Cookbook.txt

Thea paper accepted for OWLED2009

Provisional version of the paper up on:

(new) Thea website

(pending revisions)

ICLP 2009

My invited talk from ICLP-2009 is available from slideshare:


Unfortunately I didn’t get to go into detail in some sections, particularly Thea.

Overall I had an excellent time, good to meet many LP luminaries and users.

Towards portability with Thea2

The Thea OWL package is currently SWI-specific. It would be nice to use this with other prologs, particularly to take advantage of tabling in combination with DLP programs generated from OWL.

I’m impressed by the prolog-commons effort, particularly the convergence we are seeing between Yap and SWI-Prolog. Currently the core parts of Thea work with Yap, although there are some annoyances (Yap is unfamiliar with the useful debug/3). Unfortunately the excellent semweb package is still SWI-specific, so you will need to convert your ontology to axioms in prolog syntax first. OWL-XML should in principle be possible, as Yap-6 includes the SWI sgml package, although this appears not to be working yet.

For other prologs the lack of a standard module system is the main hindrance. I have added a simple translator to the Thea2 makefile that will strip module declarations, generating mostly ISO-conformant prolog that can be used with GNU Prolog and XSB. Again, this is just for the core parts. XSB does include the sgml package so parsing OWL-XML is possible with some difficulty, although there are some annoyances such as incompatibilities in the load_structure/3 predicate.

I’m encouraged to hear that these 4 open source prologs are converging on a standard module system, so we should have better compatibility in the future. Converging on a FLI may be too much to ask, so it would be useful to have prolog implementations of xml and rdf parsing to use as fallbacks if the C libs are not present or usable.

Translating between logic programs and OWL/SWRL

Using the Thea2 library, it’s possible to translate certain OWL-DL ontologies into logic programs, and then query over them using LP systems such as Yap or XSB. Only the DLP subset is translated, and care must be taken to avoid common pitfalls.

It’s now also possible to do the reverse translation for a similar subset of LP programs. For example, if we have a file, with contents:
uncle_of(U,C) :- brother_of(U,P),parent_of(P,C).

We can do a simple syntactic translation to SWRL from the command line as follows:
swipl -g "[owl2_io],convert_axioms('',pl_swrl, 'uncle.swrl',owl, [])"

This yields the rather voluminous SWRL ontology:

<swrl:argument1 rdf:resource="#v1"/>
<swrl:argument2 rdf:resource="#v3"/>
<swrl:propertyPredicate rdf:resource=""/>
<swrl:argument1 rdf:resource="#v3"/>
<swrl:argument2 rdf:resource="#v2"/>
<swrl:propertyPredicate rdf:resource=""/>
<rdf:rest rdf:resource="&rdf;nil"/>
<swrl:argument1 rdf:resource="#v1"/>
<swrl:argument2 rdf:resource="#v2"/>
<swrl:propertyPredicate rdf:resource=""/>
<rdf:rest rdf:resource="&rdf;nil"/>

Personally I prefer authoring rules in prolog syntax in emacs and then converting via Thea, but YMMV.

We can also translate the SWRL into a property chain axiom subPropertyOf(propertyChain([brother_of,parent_of]),uncle_of), eliminating the rule altogether:

swipl -g "[owl2_io],convert_axioms('',pl_swrl_owl, 'uncle.owl',owl, [])"

If you look at the results in Protege4 you will see:

brother_of o parent_of ➞ uncle_of

Other patterns also recognized include domain/ranges, transitivity, inverseProperties, etc.
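To give a feel for what recognizing such a pattern involves, here is a minimal Python sketch for the chain case only. It is not Thea's translator: the tuple encoding of atoms and the `rule_to_property_chain` function are hypothetical, and only rules of the exact shape h(X,Z) :- p(X,Y), q(Y,Z) over binary predicates are handled.

```python
def rule_to_property_chain(head, body):
    """Return (chain, super_property) if the rule is a simple two-step chain, else None.

    Atoms are (predicate, arg1, arg2) tuples; variables are plain strings.
    Only the pattern h(X,Z) :- p(X,Y), q(Y,Z) with distinct X, Y, Z is recognised.
    """
    if len(body) != 2:
        return None
    h, hx, hz = head
    (p, px, py), (q, qy, qz) = body
    if hx == px and py == qy and qz == hz and len({hx, py, hz}) == 3:
        return ([p, q], h)
    return None

# uncle_of(U,C) :- brother_of(U,P), parent_of(P,C).
uncle_rule = (("uncle_of", "U", "C"),
              [("brother_of", "U", "P"), ("parent_of", "P", "C")])
chain = rule_to_property_chain(*uncle_rule)
print(chain)
```

Anything that does not match the pattern is left alone by returning None; a real translator would try the other patterns (or fall back to SWRL) at that point.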

These kinds of syntactic translations are useful for interoperability. The benefits of writing ontology fragments as rules become more apparent if we consider the following program:

% if part_of holds for a process, the specific
% relation is processual_part_of
processual_part_of(P,W) :- part_of(P,W),process(W).
% if part_of holds for an object, the specific
% relation is static_part_of
static_part_of(P,W) :- part_of(P,W),object(W).

The two rules in this program distinguish two relations based on the type of the whole (the second argument of part_of). For example, we can ask:

?- processual_part_of(X,Y).

And get back the answers p1-p2, but not ob1-ob2.

This is a fairly trivial program. Yet it’s not immediately obvious how to code this in OWL. There is a way but it is rather ingenious/baroque, involving the creation of two fake properties and some self-restrictions and property chains.

Fortunately, Thea2 will help us with this too.

swipl -g "[owl2_io],convert_axioms('testfiles/',pl_swrl_owl,'ppo.owl',owl,[])"

Here I show the results in Thea/owlpl syntax:

subClassOf('_d:process', hasSelf('_d:process_p')).
subClassOf('_d:object', hasSelf('_d:object_p')).
subPropertyOf(propertyChain(['_d:part_of', '_d:process_p']), '_d:processual_part_of').
subPropertyOf(propertyChain(['_d:part_of', '_d:object_p']), '_d:static_part_of').
propertyAssertion('_d:part_of', p1, p2).
propertyAssertion('_d:part_of', ob1, ob2).
classAssertion('_d:process', p1).
classAssertion('_d:process', p2).
classAssertion('_d:object', ob1).
classAssertion('_d:object', ob2).

Here all the nasty ‘fake property’ stuff is taken care of for us behind the scenes.
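If the trick looks like magic, this small Python sketch may help. It treats each hasSelf axiom as a set of self-loops and each property chain as relational composition, using the same individuals as the axioms shown here. The `compose` helper and the variable names are my own for this illustration; a real OWL reasoner does far more than this.

```python
def compose(r1, r2):
    """Relational composition: (x, z) whenever (x, y) is in r1 and (y, z) is in r2."""
    by_first = {}
    for y, z in r2:
        by_first.setdefault(y, set()).add(z)
    return {(x, z) for x, y in r1 for z in by_first.get(y, ())}

# Facts taken from the axioms in the post
part_of = {("p1", "p2"), ("ob1", "ob2")}
process = {"p1", "p2"}
obj = {"ob1", "ob2"}

# subClassOf(process, hasSelf(process_p)): every process is related to itself
process_p = {(x, x) for x in process}
object_p = {(x, x) for x in obj}

# part_of followed by the self-loop only survives when the whole has the right type
processual_part_of = compose(part_of, process_p)
static_part_of = compose(part_of, object_p)
print(processual_part_of, static_part_of)
```

The self-loop acts as a type check on the second argument, which is exactly what process(W) and object(W) do in the prolog rules.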

We can demonstrate it works with the following sparql query via Pellet2:

SELECT * WHERE {?x <; ?y}

which yields:

Query Results (1 answers):
x | y
p1 | p2

Of course, there are many prolog programs that cannot be translated to OWL, and many OWL ontologies for which logic programs cannot be created. Each paradigm has its own strengths and weaknesses. Greater interoperability between the two can only help.

Can Hibernate do this?

The impedance mismatch problem is a well known one in mapping between an object-oriented language and a relational database. Tools such as Hibernate for Java make this easier for the ‘simple’ cases but can be awkward to use where one wants to take advantage of complex query processing within the DBMS. In contrast, prolog is a natural extension to the relational model (for the most part), which makes shifting between the two paradigms much easier.

Draxler’s sql_compiler can be used to easily map prolog goals to SQL queries. We can also use some neat tricks to dynamically map portions of prolog programs (specifically those in the pure prolog subset with no recursion) to on-the-fly SQL.

This blog post covers some of the methods that can be used.

Finding the least common ancestor of two classes in an ontology

The predicate subclass/2 is used to relate a class to its superclass.


This is an extensional predicate, i.e. one whose clauses have only a head and no body (facts), designed to be asserted at run-time or compiled from a prolog database.

subclassT/2 is the transitive version of this predicate, and subclassRT/2 is the reflexive transitive version (so that, for example, subclassRT(human,human) holds). These are intensional predicates: they are expressed as rules.
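To make the distinction concrete, here is a minimal standalone sketch of the three predicates. It does not load blip; the class names are toy atoms invented for illustration, and the rule bodies are one conventional way to write the closures rather than the library's own source.

```prolog
% Extensional: toy subclass/2 facts (hypothetical names, not real taxonomy IDs).
subclass(mouse, rodent).
subclass(rodent, euarchontoglires).
subclass(human, primate).
subclass(primate, euarchontoglires).
subclass(euarchontoglires, mammal).

% Intensional: transitive closure, one or more subclass/2 steps.
subclassT(X, Y) :- subclass(X, Y).
subclassT(X, Y) :- subclass(X, Z), subclassT(Z, Y).

% Intensional: reflexive transitive closure, zero or more steps.
subclassRT(X, X).
subclassRT(X, Y) :- subclassT(X, Y).
```

With these loaded, subclassT(mouse, mammal) succeeds, subclassT(human, human) fails, and subclassRT(human, human) succeeds.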

The class_pair_subclass_lca/3 definition looks like this:

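class_pair_subclass_lca(X,Y,LCA) :-
      subclassRT(X,LCA),
      subclassRT(Y,LCA),
      \+ (( subclassRT(X,CA),
            subclassRT(Y,CA),
            subclassRT(CA,LCA),
            CA\=LCA )).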

i.e. LCA is the lowest common ancestor of X and Y by subclass if:

  • subclassRT/2 holds between X and LCA (i.e. LCA is an ancestor of or identical to X)
  • subclassRT/2 holds between Y and LCA (i.e. LCA is an ancestor of or identical to Y)
  • there is no common ancestor of X and Y that is more ‘recent’ than LCA
    (We could avoid some repetition by defining CA first, and then LCA from there)
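The refactoring hinted at in the last point, defining the common ancestor first and deriving the LCA from it, might look like the sketch below. The names common_ancestor/3 and lca/3 and the toy facts are assumptions made for illustration; they are not part of blip.

```prolog
% Toy subclass/2 facts (hypothetical names, not real taxonomy IDs).
subclass(mouse, rodent).
subclass(rodent, euarchontoglires).
subclass(human, primate).
subclass(primate, euarchontoglires).
subclass(euarchontoglires, mammal).

% Reflexive transitive closure of subclass/2.
subclassRT(X, X).
subclassRT(X, Y) :- subclass(X, Z), subclassRT(Z, Y).

% CA is an ancestor of (or identical to) both X and Y.
common_ancestor(X, Y, CA) :-
    subclassRT(X, CA),
    subclassRT(Y, CA).

% LCA is a common ancestor with no other common ancestor beneath it.
lca(X, Y, LCA) :-
    common_ancestor(X, Y, LCA),
    \+ ( common_ancestor(X, Y, CA),
         CA \= LCA,
         subclassRT(CA, LCA) ).
```

On the toy facts, the query lca(mouse, human, A) has the single answer A = euarchontoglires; mammal is rejected because euarchontoglires is a common ancestor beneath it.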

The above predicate will work given a prolog database of subclass/2 facts. One example is the NCBI Taxonomy (warning: large file).

If you have installed blip you can issue the following on the command line:

blip -i -f ontol_db:pro -u ontol_db\
 findall "(class(X,'Mus musculus'),class(Y,'Homo sapiens'),class_pair_subclass_lca(X,Y,A))" -select A -label

This is equivalent to the prolog program:

:- use_module(bio(ontol_db)).
:- use_module(bio(io)).
  class(X,'Mus musculus'),
  class(Y,'Homo sapiens'),

(This may take a while for the initial download, but the ontology is cached for future use.)

After a short wait this will yield A=NCBITaxon:314146, which has the scientific name ‘Euarchontoglires’ (the -label option uses entity_label/2 to find the labels for class IDs). We can speed this up by using memoization/tabling of the recursive predicates, but that isn’t covered here.

Mapping to SQL

So far so good. This clause will also work with relational databases. Here we demonstrate this using an instance of the OBD database in which the transitive closure of the relations is stored as extensional predicates in a table called ‘link’.

The first way to do this is to bind all predicates in the model to the database:

blip-obd -debug sql -r obd/pkb2 -sqlbind ontol_db:all -sqlbind metadata_db:all\
 findall "(class(X,'Mouse'),class(Y,'Human'),class_pair_subclass_lca(X,Y,A))" -select A -label

blip-obd is an alias for blip -u ontol_db -u blipkit_sql -u ontol_sqlmap_obd

ontol_sqlmap_obd contains mappings between ontol_db model facts and the obd schema.

It contains mapping rules such as the following:

ontol_db:subclassT(X,Y) <-
  node(XI,X,_,_),link(XI,RI,YI,_),node(YI,Y,_,_),node(RI,'OBO_REL:is_a',_,_),\+ (XI=YI).

The tables ‘node’ and ‘link’ are used for classes and transitive reflexive relationships in OBD. We have to specify extra joins here because, like many relational databases, OBD uses internal integer artificial/surrogate keys. We don’t want to expose these at the ontol_db model level, so they are hidden in the mapping.
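To see what the mapping hides, the rule can be emulated in plain prolog over a handful of in-memory facts standing in for the two tables. This is only a sketch: the rows are invented, the argument positions of node/4 and link/4 are read off the mapping rule above, and the columns that the rule ignores are filled with a placeholder.

```prolog
% node(SurrogateKey, PublicId, _, _): toy rows, last two columns unused here.
node(1, 'EX:mouse',     -, -).
node(2, 'EX:rodent',    -, -).
node(3, 'EX:mammal',    -, -).
node(9, 'OBO_REL:is_a', -, -).

% link(SubjectKey, PredicateKey, ObjectKey, _): reflexive transitive closure,
% stored row by row as it would be in the database.
link(1, 9, 1, -).
link(1, 9, 2, -).
link(1, 9, 3, -).
link(2, 9, 2, -).
link(2, 9, 3, -).
link(3, 9, 3, -).

% Same shape as the mapping rule: join through the surrogate keys and
% drop the reflexive rows, so callers only ever see public IDs.
subclassT(X, Y) :-
    node(XI, X, _, _),
    link(XI, RI, YI, _),
    node(YI, Y, _, _),
    node(RI, 'OBO_REL:is_a', _, _),
    \+ (XI = YI).
```

A caller asking subclassT('EX:mouse', Y) gets 'EX:rodent' and 'EX:mammal' back and never sees the integer keys.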

We use sqlbind/2 to use all mappings in the sqlmap module.

Note that this results in class_pair_subclass_lca/3 goals being dynamically rewritten to a goal that executes a SQL query. This is done entirely automatically by examining the prolog rule – there is no human-specified mapping for this rule. The resulting SQL query is quite large and uses a SELECT .. WHERE NOT EXISTS … pattern with lots of joins. Here it is, in gory detail:

SELECT
 'NIF_Organism:birnlex_167' ,
 'NIF_Organism:birnlex_516' ,
 node_5.uid
FROM
 node node_3 ,
 link link_1 ,
 node node_4 ,
 link link_2 ,
 node node_5 ,
 node node_6
WHERE
 node_3.uid = 'NIF_Organism:birnlex_167' AND
 node_3.is_obsolete = 'f' AND
 link_1.node_id = node_3.node_id AND
 link_1.combinator = '' AND
 node_4.uid = 'NIF_Organism:birnlex_516' AND
 node_4.is_obsolete = 'f' AND
 link_2.node_id = node_4.node_id AND
 link_2.predicate_id = link_1.predicate_id AND
 link_2.object_id = link_1.object_id AND
 link_2.combinator = '' AND
 node_5.node_id = link_1.object_id AND
 node_5.is_obsolete = 'f' AND
 node_6.node_id = link_1.predicate_id AND
 node_6.uid = 'OBO_REL:is_a' AND
 node_6.is_obsolete = 'f' AND
 NOT EXISTS (
 SELECT *
 FROM
 node node_7 ,
 link link_3 ,
 node node_8 ,
 link link_4 ,
 node node_9 ,
 link link_5 ,
 node node_10 ,
 node node_11
 WHERE
 node_7.uid = 'NIF_Organism:birnlex_167' AND
 node_7.is_obsolete = 'f' AND
 link_3.node_id = node_7.node_id AND
 link_3.combinator = '' AND
 node_8.uid = 'NIF_Organism:birnlex_516' AND
 node_8.is_obsolete = 'f' AND
 link_4.node_id = node_8.node_id AND
 link_4.predicate_id = link_3.predicate_id AND
 link_4.object_id = link_3.object_id AND
 link_4.combinator = '' AND
 node_9.node_id = link_3.object_id AND
 node_9.is_obsolete = 'f' AND
 link_5.node_id = link_3.object_id AND
 link_5.predicate_id = link_3.predicate_id AND
 link_5.combinator = '' AND
 node_10.node_id = link_5.object_id AND
 node_10.uid = node_5.uid AND
 node_10.is_obsolete = 'f' AND
 node_11.node_id = link_3.predicate_id AND
 node_11.uid = 'OBO_REL:is_a' AND
 node_11.is_obsolete = 'f' AND
 link_3.object_id != link_5.object_id);


In theory the RDBMS should be able to optimize this, but in practice the optimizer struggles as more joins are added. Thankfully we have the option of choosing which clauses are rewritten to SQL and which are executed according to the normal prolog WAM model.

Mixed prolog-SQL

The following only rewrites the transitive subclass predicates. class_pair_subclass_lca/3 is executed by the prolog engine:

blip-obd -debug sql -r obd/pkb2 -sqlbind ontol_db:class/1 \
   -sqlbind ontol_db:subclassT/2 -sqlbind ontol_db:subclassRT/2 -sqlbind metadata_db:all\
   findall "(class(X,'Mouse'),class(Y,'Human'),class_pair_subclass_lca(X,Y,A))" -select A -label

This results in multiple individual subclassT/2 and subclassRT/2 queries being executed. All CAs are checked until one that is the LCA is found. This more procedural approach may be less efficient, but it is less dependent on database join optimization, which makes its performance more predictable.

This kind of approach is only possible because a subset of prolog is declarative. In specifying what needs to be achieved rather than how, it’s possible to relegate the details to a configuration step.

A prolog library for OWL2 and SWRL

Ontologies are vital for the life sciences. The Web Ontology Language (OWL) offers decidability of reasoning and, now with OWL2 and SWRL, reasonably high levels of expressivity.

Vangelis Vassiliadis and I are writing Thea2, based on his original Thea library. The redesign introduces prolog predicates for every OWL2 axiom, and prolog terms for OWL class and property expressions. We use the SWI-Prolog semweb library for reading/writing to RDF. There is also an (optional) JPL bridge wrapping the Manchester OWLAPI.

There are a number of different reasoning strategies, including:

  • simple but limited backward chaining reasoning
  • using Grosof’s translation to DLP in conjunction with systems such as Yap, XSB or DLV
  • using standard OWL reasoners via JPL (the DIG interface from Thea1 still needs to be ported)

Source: github
Documentation: pldoc