
HermiT: Reasoning with Large Ontologies

15th August 2008 to 14th February 2012
Ontologies are formal vocabularies of terms, often shared by a community of users. One of the most prominent application areas of ontologies is medicine and the life sciences. For example, the Systematised Nomenclature of Medicine Clinical Terms (SNOMED CT) is a clinical ontology which is being used in the UK National Health Service's (NHS) National Programme for Information Technology (NPfIT). Other examples include GALEN, the Foundational Model of Anatomy (FMA), the National Cancer Institute (NCI) Thesaurus, and the OBO Foundry, a repository containing about 80 biomedical ontologies.

These ontologies are gradually superseding existing medical classifications and will provide the future platforms for gathering and sharing medical knowledge. Capturing medical records using ontologies will reduce the possibility for data misinterpretation, and will enable information exchange between different applications and institutions.

Medical ontologies are strongly related to description logics (DLs), which provide the formal basis for many ontology languages, most notably the W3C-standardised Web Ontology Language (OWL). All the above-mentioned ontologies are now available in OWL and, therefore, in a description logic. The developers of medical ontologies have recognised the numerous benefits of using DLs, such as the clear and unambiguous semantics for different modelling constructs, the well-understood tradeoffs between expressivity and computational complexity, and the availability of provably correct reasoners and tools.

The development and application of ontologies crucially depend on reasoning. Ontology classification, i.e., organising classes into a specialisation/generalisation hierarchy, is a reasoning task that plays a major role during ontology development: it provides for the detection of potential modelling errors such as inconsistent class descriptions and missing sub-class relationships. For example, about 180 missing sub-class relationships were detected when the version of SNOMED CT used by the NHS was classified using the DL reasoner FaCT++. Query answering is another reasoning task that is mainly used during ontology-based information retrieval; e.g., in clinical applications query answering might be used to retrieve "all patients that suffer from nut allergies".
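The two reasoning tasks described above can be illustrated with a minimal sketch. It assumes a toy ontology consisting only of atomic sub-class axioms, which is far simpler than the DL constructs that real reasoners handle; the function names (`classify`, `instances_of`) and the example class and patient names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def classify(told_subclass_axioms):
    """Compute, for every class, the set of all its superclasses.

    Input is a list of (subclass, superclass) pairs. The result includes
    superclasses that are only implied by transitivity, i.e. sub-class
    relationships that were never stated explicitly.
    """
    supers = defaultdict(set)
    classes = set()
    for sub, sup in told_subclass_axioms:
        supers[sub].add(sup)
        classes.update((sub, sup))

    # Repeat until no new superclass can be derived (a fixpoint).
    changed = True
    while changed:
        changed = False
        for c in classes:
            inferred = set()
            for d in supers[c]:
                inferred |= supers[d]
            if not inferred <= supers[c]:
                supers[c] |= inferred
                changed = True
    return {c: set(supers[c]) for c in classes}


def instances_of(cls, hierarchy, assertions):
    """Answer the query 'all individuals that belong to cls'.

    An individual qualifies if it is asserted to belong to cls itself
    or to any class that the hierarchy places below cls.
    """
    return {
        individual
        for individual, asserted in assertions
        if asserted == cls or cls in hierarchy.get(asserted, set())
    }


# Hypothetical example data, not taken from any real ontology.
axioms = [
    ("PeanutAllergy", "NutAllergy"),
    ("NutAllergy", "FoodAllergy"),
    ("FoodAllergy", "Allergy"),
]
assertions = [("patient1", "PeanutAllergy"), ("patient2", "FoodAllergy")]

hierarchy = classify(axioms)
nut_allergy_patients = instances_of("NutAllergy", hierarchy, assertions)
```

Here `hierarchy["PeanutAllergy"]` contains `FoodAllergy` and `Allergy` even though neither relationship was stated directly, and the query for `NutAllergy` returns `patient1` although that patient was only recorded as having a peanut allergy. The difficulty addressed by this project lies in performing the same tasks when class descriptions are expressive and the ontology and data are very large.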

Despite the impressive state of the art, modern medical ontologies pose significant challenges to both the theory and practice of DL-based languages. Existing reasoners can efficiently deal with some large ontologies, such as NCI, but many important ontologies are still beyond the reach of available tools. For example, none of the existing reasoners can successfully classify either GALEN or FMA.

Applications currently need to work around these limitations, e.g., by using subsets of ontologies that can be successfully processed. For example, the version of GALEN typically used in practice contains only about 20% of the axioms of the full version; this reduces the interaction between concepts and thus makes the ontology "processable". This is, however, highly undesirable in practice, because it reduces coverage, weakens the conceptualisation of the domain and may prevent the detection of modelling errors.

Furthermore, the amount of data used with ontologies can be orders of magnitude larger than the ontology itself. For example, the annotation of patients' medical records in a single hospital can easily produce data consisting of hundreds of millions of facts, and aggregation at a national level might produce billions of facts. Existing reasoners cannot cope with such data volumes, especially not if ontologies such as GALEN and FMA are used as schemata.

The goal of this project is to develop scalable reasoning algorithms and a prototypical implementation that can efficiently deal with large and complex ontologies and large data sets. Developing such a reasoner will be critical to the success of many ontology-based applications.

People

Markus Krötzsch
Giorgos Stoilos
