Our primary goal is to design and develop effective kernels for chemical molecules. Here effectiveness refers to the predictive power as well as the runtime efficiency of the kernels. So far we focussed mainly on making use of the 2D structure of the molecules, in future work we will strive to exploit the 3D structure of the molecules more.
Predictive Graph Mining
Graphs are a major tool for modelling objects with complex data structure. Devising learning algorithms that are able to handle graph representations is thus a core issue in knowledge discovery with complex data. Two separate challenges that we tackle are (i) estimation of a function on the set of all graphs and (ii) estimation of a function on the set of vertices of a graph.
Semi-Supervised Regression and Ranking
While transductive and semi-supervised classification is part of the state-of-the-art in machine learning, transductive and semi-supervised ranking and regression are largely understudied. This is in contrast to the need for regression and ranking algorithms in real world problems in which obtaining labelled data is much more expensive and time consuming than obtaining unlabelled data.
Statistical Relational Learning
We study and develop probabilistic machine learning and data mining techniques for structured domains, i.e., domains which are best represented using probabilistic models with a variable number of objects and relations among them. Example domains include bioinformatics, transportation systems, communication networks, social network analysis, robotics, among others. The structures encountered can be as simple as sequences and trees (such as those arising in protein secondary structure prediction and natural language parsing) or as complex as citation graphs, the World Wide Web, and even relational data bases.
Design and analysis of learning algorithms for predicting combinatorial structures.
There are estimates that more than 80% of the information in organizations and enterprises is unstructured text. Text mining is devoted to extracting information, relations and knowledge from unstructured text. It uses automatic data mining and machine learning methods in conjunction with linguistic and other techniques to capture the "meaning" of text.
Transduction on Massive Extensional Databases
One common problem of many transductive and semi-supervised learning approaches is that they scale badly with the amount of unlabelled data, which prohibits the use of massive sets of unlabelled data. We will thus develop transductive and semi-supervised learning algorithms that scale only linearly with the amount of unlabelled data.
More to come ...
This list is still incomplete - entries coming up on research in spatial mining, multi media mining, etc