Domain Specific Information Extraction for Semantic Annotation : A Master Thesis from a Joint European Master Program in Language and Communication Technologies (LCT)

Book by Zeeshan Ahmed

The main problem with semantic annotation is the availability of an ontology for the domain. An ontology comprises concepts and relationships. In an ontology, a concept may be atomic or defined by a set of properties. This set of properties classifies the concept relative to the other concepts in the ontology. In this thesis, we present an approach to semantic annotation that uses the properties of concepts rather than the simple instance-matching techniques currently available. In this approach, the document is analyzed with the help of the ontology to identify these properties. If the properties found in the document match the properties of a concept in the ontology, the document is annotated with that concept. In this way, documents are indexed according to these properties. The main goal of this thesis is to present approaches for extracting these properties from documents, both for semantic annotation and for ontology building. To achieve this goal, two different approaches to information extraction for semantic annotation are presented, namely "Rule Based" and "Dependency Based".
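The property-matching idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not taken from the thesis: the toy ontology, the concept and property names, and the helper functions `extract_properties` and `annotate` are hypothetical, and the naive word lookup merely stands in for the "Rule Based" and "Dependency Based" extraction approaches. The sketch also assumes that a concept matches only when all of its properties occur in the document, which is one possible reading of "match".

```python
import re

# Hypothetical toy ontology (not from the thesis): each concept is
# defined by a set of property names.
ONTOLOGY = {
    "Laptop": {"screen", "battery", "keyboard"},
    "Camera": {"lens", "battery", "sensor"},
}


def extract_properties(text, ontology):
    """Return the ontology properties mentioned in the text.

    Naive stand-in for real information extraction: a property counts
    as found if its name appears as a lowercase word in the text.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    known = set().union(*ontology.values())
    return words & known


def annotate(text, ontology):
    """Return, in sorted order, the concepts whose whole property set occurs in the text."""
    found = extract_properties(text, ontology)
    return sorted(c for c, props in ontology.items() if props <= found)


if __name__ == "__main__":
    doc = "The lens and sensor are sharp, and the battery lasts long."
    print(annotate(doc, ONTOLOGY))
```

Under these assumptions the sample document is annotated with "Camera" only: "battery" is shared by both concepts, but the remaining "Laptop" properties do not occur in the text. A real system would likely need a softer matching criterion and linguistic normalization, which this sketch omits.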