Monthly Archives: June 2010

improving on SWI indexing on large databases of facts

SWI, like most Prologs, provides fast first-argument indexing. Accessing a large database of facts via other arguments can be very slow, as a sequential scan is used. SWI provides index/1, but it doesn’t appear to be very effective.

The index_util module provides faster indexing by rewriting fact clauses to provide multiple entry points.

From the documentation:

This is designed to be a swap-in replacement for index/1. Indexing a
fact with M arguments on N of those arguments will generate N sets of
facts with arguments reordered to take advantage of first-argument
indexing. The original fact will be rewritten.

For example, calling:


materialize_index(my_fact(1,0,1)).

will retract all my_fact/3 facts and generate the following clauses in their place:


my_fact(A,B,C) :-
    nonvar(A),
    !,
    my_fact__ix_1(A,B,C).
my_fact(A,B,C) :-
    nonvar(C),
    !,
    my_fact__ix_3(C,A,B).
my_fact(A,B,C) :-
    my_fact__ix_1(A,B,C).

Here my_fact__ix_1 and my_fact__ix_3 contain the same data as the original my_fact/3 clauses. In the second case, the arguments have been reordered so that the indexed argument comes first.
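To make the rewrite concrete, here is a minimal hand-written sketch of the same idea in plain SWI-Prolog. It does not use index_util and is not its implementation; raw_fact/3, build_indexes/0 and the sample data are hypothetical names made up for illustration.

```prolog
:- dynamic my_fact__ix_1/3, my_fact__ix_3/3.

% Hypothetical sample data standing in for the original my_fact/3 facts.
raw_fact(a, 1, x).
raw_fact(b, 2, y).
raw_fact(c, 3, x).

% Store every fact twice: once in the original order (A,B,C) and once
% as (C,A,B), so that the third argument sits in first position and can
% benefit from first-argument indexing.
build_indexes :-
    retractall(my_fact__ix_1(_, _, _)),
    retractall(my_fact__ix_3(_, _, _)),
    forall(raw_fact(A, B, C),
           (   assertz(my_fact__ix_1(A, B, C)),
               assertz(my_fact__ix_3(C, A, B))
           )).

% Dispatcher: pick the copy whose first argument is bound.
my_fact(A, B, C) :-
    nonvar(A),
    !,
    my_fact__ix_1(A, B, C).
my_fact(A, B, C) :-
    nonvar(C),
    !,
    my_fact__ix_3(C, A, B).
my_fact(A, B, C) :-
    my_fact__ix_1(A, B, C).

:- build_indexes.
```

With this loaded, a query such as my_fact(A, B, x) is answered from my_fact__ix_3, where x is the first argument, rather than by scanning every fact.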

Speed Improvements

Some users have reported performance gains of 1000x. For example, see this post.

Limitations

  • Single key indexing only. Could be extended for multikeys.
  • Reindexing is not a good idea. It could be smarter about this.
  • Should not be used on dynamic databases.

Does not have to be used with facts (unit clauses), but the clauses should be enumerable.

graphviz and blip ontol

blip includes a generic grammar/writer for the graphviz language ‘dot’.

dot is actually quite powerful, and allows for specification of boxes inside boxes. For example, the following blip command line call:


blip -r fma ontol-subset -n Heart -cr subclass -to display

will generate and display a PNG such as this:
[image: Heart]
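In dot, a box inside a box is written as a subgraph whose name starts with cluster_, nested inside another such subgraph. The following is a small sketch of a writer for that structure. It is not blip's dot grammar; the box/3 and node/1 terms, write_dot/1 and the example names are hypothetical.

```prolog
% write_item(+Item): Item is node(Id) or box(Id, Label, Children),
% where Children is a list of further items. Both terms are
% hypothetical and exist only for this sketch.
write_item(node(Id)) :-
    format("~w;~n", [Id]).
write_item(box(Id, Label, Children)) :-
    format("subgraph cluster_~w {~n", [Id]),
    format("label=\"~w\";~n", [Label]),
    forall(member(Child, Children), write_item(Child)),
    format("}~n").

% write_dot(+Box): wrap the nested boxes in a digraph.
write_dot(Box) :-
    format("digraph G {~n"),
    write_item(Box),
    format("}~n").
```

Calling write_dot(box(heart, 'Heart', [box(ventricle, 'Ventricle', [node(left_ventricle)])])) prints a dot graph in which the Ventricle box is drawn inside the Heart box.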

The ontol/conf directory specifies a number of configuration modules for the ontol library. These can be specified with the “-u” option on the command line. They allow things such as color-coding by ontology. For example, “ontol_config_uberon” allows generation of diagrams such as:
[image: phalanx]

memoization+persistence

The mis-named “tabling” module (should really be called “memoize”) now allows for persisting memoized calls to a file. See:

http://github.com/cmungall/blipkit/blob/master/packages/blipcore/tabling.pro

(docs up on the pldoc server soon).

To see how this works, consider the transitive closure of the subclass/2 fact, as defined in the ontol_db schema:


subclassT(X,Y):- subclass(X,Y).
subclassT(X,Y):- subclass(X,Z),subclassT(Z,Y).

If you end up using a predicate such as this frequently in one session, you can do this at the start of the session:


:- use_module(bio(tabling)).

init :- table_pred(ontol_db:subclassT/2).

This rewrites subclassT/2 behind the scenes. See the code for details.

After loading some subclass/2 facts (e.g. from GO), you then call:


forall(subclassT('GO:0006915',X), writeln(X)). % all ancestors of apoptosis

The first time you call this, the original code is called. The second time, it checks whether it already knows the answer for ‘GO:0006915’. It does, so it returns the previously calculated results, which have been asserted to memory.
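The behaviour described above can be sketched in a few lines of plain Prolog. This is only an illustration of the idea, not the code in the tabling module: memo_subclassT/2, memo_answer/2, memo_done/1 and the toy subclass/2 facts are all hypothetical, and the sketch only handles calls where the first argument is bound.

```prolog
:- dynamic memo_answer/2, memo_done/1.

% Hypothetical toy data standing in for subclass/2 facts from an ontology.
subclass(c, b).
subclass(b, a).

subclassT(X, Y) :- subclass(X, Y).
subclassT(X, Y) :- subclass(X, Z), subclassT(Z, Y).

% memo_subclassT(+X, ?Y)
% If X has been seen before, answer from the stored results. Otherwise
% compute all answers once, assert them, and then answer from the store.
% Fails if X is unbound (this sketch does not memoize open calls).
memo_subclassT(X, Y) :-
    nonvar(X),
    (   memo_done(X)
    ->  true
    ;   forall(subclassT(X, Y0), assertz(memo_answer(X, Y0))),
        assertz(memo_done(X))
    ),
    memo_answer(X, Y).
```

The memo_done/1 marker is kept separately from the answers so that a call with no results is also remembered and not recomputed.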

However, this caching is lost when the Prolog db is destroyed at the end of the session. Now you can persist this:


persistent_table_pred(ontol_db:subclassT/2, 'my_cache.pl').

The first time this is called, my_cache.pl is created. All results of subclassT/2 calls are saved there.

After the session, the file is retained. In future sessions, if this is called again, the cache is loaded into memory and extended with the results of future calls.
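As a rough illustration of what persisting such a cache involves, here is a sketch that writes cached facts to a file and reads them back. It is not how the tabling module does it (which, as described, adds results as calls are made); cached/2, save_cache/1 and load_cache/1 are hypothetical names, and this version writes the whole cache in one go.

```prolog
:- dynamic cached/2.

% save_cache(+File): write every cached/2 fact to File as a Prolog clause.
save_cache(File) :-
    setup_call_cleanup(
        open(File, write, Out),
        forall(cached(K, V), portray_clause(Out, cached(K, V))),
        close(Out)).

% load_cache(+File): read cached/2 facts back into memory.
% Succeeds without doing anything if File does not exist yet.
load_cache(File) :-
    (   exists_file(File)
    ->  setup_call_cleanup(
            open(File, read, In),
            read_facts(In),
            close(In))
    ;   true
    ).

% read_facts(+In): assert each cached/2 term found on the stream,
% skipping any other term.
read_facts(In) :-
    read_term(In, Term, []),
    (   Term == end_of_file
    ->  true
    ;   Term = cached(K, V)
    ->  assertz(cached(K, V)),
        read_facts(In)
    ;   read_facts(In)
    ).
```

A session would call load_cache/1 at start-up and save_cache/1 before exiting. Reading terms one by one, rather than consulting the file, means nothing other than cached/2 facts is ever loaded from it.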

In future, the module may also be extended to be made ‘hookable’, with hooks provided for caching to a relational database.

blipkit directory reorganization

Previously the layout was:

blipkit/
    packages/
        blip/
            [LIBRARY]

This has been simplified to:

blipkit/
    packages/
        [LIBRARY]/

Some URLs in some of the previous posts should be modified accordingly.

In addition, each sub-library now has a standard organization:

[LIBRARY]/
    conf/ -- configuration modules
    t/ -- plunit tests (.plt files)
        data/ -- data for tests
    examples/ -- example code
    doc/ -- documentation