Small-Variance Asymptotics
The design and use of graphical models for rich probabilistic modeling have proven to be among the most important research themes in machine learning. For complex models, particularly those involving continuous variables or distributions such as Bayesian nonparametric processes, scalability remains a concern: sampling and variational inference methods simply do not work at scale for many real-world analysis problems.
We take as a starting point the connection between k-means and mixtures of Gaussians (a similar connection exists between PCA and probabilistic PCA): as the covariances of the clusters in a Gaussian mixture model tend to zero, the complete-data log likelihood approaches the k-means objective and the EM algorithm approaches the k-means algorithm. Recently, we have been extending this idea in two complementary directions: i) to larger graphical models (e.g., hierarchical models and sequential data models), and ii) to models involving Bayesian nonparametric processes. In both cases, one can apply small-variance asymptotics analogous to those connecting EM and k-means to these more complex models; the resulting models maintain key properties of the original probabilistic models. Crucially, this yields a number of scalable algorithms for problems where the rich probabilistic models cannot be applied directly.
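The limiting argument can be seen in a few lines: in a mixture of Gaussians with shared covariance sigma^2 * I, the E-step responsibilities harden into the nearest-centroid assignments of k-means as sigma^2 goes to zero. A minimal NumPy sketch (the data points and centers below are made up for illustration):

```python
import numpy as np

def soft_assignments(X, centers, sigma2):
    # E-step responsibilities in a Gaussian mixture with shared
    # covariance sigma2 * I and uniform mixing weights.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logp = -d2 / (2.0 * sigma2)
    logp -= logp.max(axis=1, keepdims=True)   # numerical stabilization
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Toy data: two tight groups and two candidate cluster centers.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
centers = np.array([[0.0, 0.0], [5.0, 5.0]])

soft = soft_assignments(X, centers, sigma2=10.0)   # genuinely fractional
hard = soft_assignments(X, centers, sigma2=0.01)   # nearly 0/1: k-means
```

As sigma2 shrinks, each row of the responsibility matrix collapses onto the closest center, which is exactly the hard assignment step of k-means; the M-step likewise reduces to recomputing cluster means.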
Some representative publications:
Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
Ke Jiang, Brian Kulis, & Michael I. Jordan
In Neural Information Processing Systems (NIPS), 2012.
[pdf]
Revisiting k-means: New Algorithms via Bayesian Nonparametrics
Brian Kulis & Michael I. Jordan
In Proc. 29th International Conference on Machine Learning (ICML), 2012.
[pdf]
Visual Domain Adaptation
In many applications, the distribution of the data used for training differs from that of the data seen at test time. In computer vision, for example, we may train an object classifier on images taken from the Internet, but at test time classify images taken by a low-resolution robot camera. We have mainly focused on the case where we have a large amount of labeled training data (from the source domain) and a small amount of labeled data (from the target domain). Using this data, we attempt to learn mappings from source to target, and vice versa. Our approach is based on extensions of metric learning: instead of learning a single global transformation applied to all the data, we learn a transformation that maps data from one domain to the other.
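As a toy illustration of learning a cross-domain map from paired examples, the sketch below fits a linear transformation by ridge regression on synthetic features. This is only meant to convey the idea of a learned source-to-target mapping; the data, the true shift `W_true`, and the least-squares formulation are all made up here, and our actual CVPR 2011 method learns an asymmetric kernel transform instead.

```python
import numpy as np

rng = np.random.default_rng(0)
W_true = np.array([[2.0, 0.0], [0.0, 0.5]])            # unknown domain shift
Xs = rng.normal(size=(50, 2))                           # labeled source features
Xt = Xs @ W_true.T + 0.01 * rng.normal(size=(50, 2))    # paired target features

# Ridge regression for the map W with Xt ~ Xs @ W.T:
lam = 1e-3
W = np.linalg.solve(Xs.T @ Xs + lam * np.eye(2), Xs.T @ Xt).T
```

Given `W`, source-domain training data can be transformed into the target domain before training a classifier there.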
Some representative publications:
What You Saw is Not What You Get: Domain Adaptation Using Asymmetric Kernel Transforms
Brian Kulis, Kate Saenko & Trevor Darrell
In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
Oral Presentation: 3.5% Acceptance Rate
[pdf]
Adapting Visual Category Models to New Domains
Kate Saenko, Brian Kulis, Mario Fritz & Trevor Darrell
In Proc. 11th European Conference on Computer Vision (ECCV), 2010.
[pdf]
Online Learning
There has been considerable interest in the paradigm of online machine learning algorithms, which observe each data point only once and do not store the entire data set. Such algorithms are desirable from both theoretical and practical points of view: bounds on worst-case performance can be proven, and these algorithms have shown success on many real-world problems.
My work has examined various online learning problems. At ICML 2010, we focused on the analysis of general convex online learning problems and proved bounds for a class of learning algorithms based on implicit update rules. The resulting algorithms show improved performance over the state of the art on a variety of problems. Earlier, I worked on proving bounds for the online metric learning problem, with a focus on the theoretical properties of online learning algorithms involving the LogDet divergence. We also considered how to integrate online learning and hashing for large-scale similarity search.
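For squared loss the implicit update has a simple closed form, which makes the contrast with ordinary online gradient descent concrete. This is a minimal sketch for that one loss; the ICML 2010 analysis covers general convex losses.

```python
import numpy as np

def implicit_update(w, x, y, eta):
    # Implicit (proximal) online update for squared loss:
    #   w_next = argmin_v 0.5*||v - w||^2 + eta * 0.5*(v @ x - y)^2
    # Setting the gradient at v = w_next to zero gives the closed form
    # below; the loss gradient is evaluated at the *new* iterate, unlike
    # plain online gradient descent, which evaluates it at w.
    residual = w @ x - y
    return w - eta * residual / (1.0 + eta * x @ x) * x

w = np.zeros(2)
for x, y in [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), 2.0)]:
    w = implicit_update(w, x, y, eta=10.0)
```

Because the step size ends up in the denominator, the update never overshoots the per-example minimizer even for large eta; this robustness to step-size choice is one appeal of implicit updates.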
Some representative publications:
Implicit Online Learning
Brian Kulis & Peter Bartlett
In Proc. 27th International Conference on Machine Learning (ICML), 2010.
[pdf]
Online Metric Learning and Fast Similarity Search
Prateek Jain, Brian Kulis, Inderjit Dhillon, & Kristen Grauman
In Neural Information Processing Systems (NIPS), 2008.
Oral Presentation: 2.7% Acceptance Rate
[pdf], Longer version: [pdf]
Hashing Algorithms for Image Search
A common problem in large-scale data analysis is that of quickly extracting nearest neighbors to a query from a large database. In computer vision, for example, this problem arises in content-based image retrieval, 3D image reconstruction, human body pose estimation, and object recognition.
My work has developed algorithms for quickly and accurately performing large-scale image search using hashing techniques. We developed algorithms that incorporate hashing with learned metrics (CVPR 2008) and that perform locality-sensitive hashing over arbitrary kernel functions (ICCV 2009), two prominent scenarios arising in modern computer vision applications. We have applied our algorithms to several large-scale data sets, including the 80 million images of the Tiny Images data set.
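The flavor of these methods is easiest to see in the classical random-hyperplane scheme for cosine similarity, which is the scheme that kernelized LSH generalizes to arbitrary kernels. The sketch below uses made-up vectors and is not the learned-metric or kernelized variant itself: each bit is the sign of a projection onto a random direction, so nearby vectors agree on most bits and short binary codes can stand in for the vectors during search.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bits, dim = 64, 10
R = rng.normal(size=(n_bits, dim))     # one random hyperplane per bit

def hash_code(x):
    # Sign of each random projection gives one bit of the code.
    return (R @ x > 0).astype(np.uint8)

def hamming(a, b):
    return int(np.sum(a != b))

x = rng.normal(size=dim)
close = x + 0.05 * rng.normal(size=dim)   # a near neighbor of x
far = rng.normal(size=dim)                # an unrelated vector
```

Retrieval then reduces to finding database codes within a small Hamming radius of the query's code, which is far cheaper than exhaustive comparison of the original vectors.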
Some representative publications:
Kernelized Locality-Sensitive Hashing for Scalable Image Search
Brian Kulis & Kristen Grauman
In Proc. 12th International Conference on Computer Vision (ICCV), 2009.
[pdf] [webpage and code]
Fast Image Search for Learned Metrics
Prateek Jain, Brian Kulis, & Kristen Grauman
In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.
CVPR 2008 Best Student Paper Award
[pdf]
Metric Learning
Metric learning is the problem of learning an appropriate distance function for a given task. For example, similarity searches over large-scale data rely on the underlying metric, which may change from task to task or over time. A majority of existing methods for metric learning learn a Mahalanobis distance, but are limited to low-dimensional and/or small data sets, making them impractical for many real-world problems.
In my recent work, I have shown how to learn Mahalanobis metrics in a variety of settings. These include: learning metrics for high-dimensional data and nonlinear transformations via kernelization (ICML 2007), learning metrics in online scenarios (NIPS 2008), and learning metrics for large-scale data (CVPR 2008). This work has been applied in conjunction with hashing techniques to a variety of problems in computer vision.
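Concretely, a Mahalanobis metric is parameterized by a positive semidefinite matrix M, and learning M is equivalent to learning a linear map of the input space. A small sketch with arbitrary numbers:

```python
import numpy as np

# Any PSD matrix factors as M = L^T L, so the Mahalanobis distance
#   d_M(x, y) = (x - y)^T M (x - y)
# equals the squared Euclidean distance between Lx and Ly.
L = np.array([[2.0, 0.0], [1.0, 1.0]])   # an arbitrary linear map
M = L.T @ L                               # the induced PSD matrix

def d_M(x, y):
    diff = x - y
    return float(diff @ M @ diff)

x, y = np.array([1.0, 0.0]), np.array([0.0, 1.0])
```

This equivalence is why a learned metric can be plugged into any Euclidean-distance pipeline, including the hashing methods above, simply by transforming the data by L first.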
Some of our metric learning code is available here.
Some representative publications:
Low-Rank Kernel Learning with Bregman Matrix Divergences
Brian Kulis, Matyas Sustik, & Inderjit Dhillon
Journal of Machine Learning Research, 10(Feb):341–376, 2009.
[pdf]
Online Metric Learning and Fast Similarity Search
Prateek Jain, Brian Kulis, Inderjit Dhillon, & Kristen Grauman
In Neural Information Processing Systems (NIPS), 2008.
Oral Presentation: 2.7% Acceptance Rate
[pdf], Longer version: [pdf]
Information-Theoretic Metric Learning
Jason Davis, Brian Kulis, Prateek Jain, Suvrit Sra, & Inderjit Dhillon
In Proc. 24th International Conference on Machine Learning (ICML), 2007.
ICML 2007 Best Student Paper Award
[pdf] [code]
Scalable Semidefinite Programming
Semidefinite programming arises in a number of machine learning problems, including recommender systems, graph cut problems, metric learning, and others. One of the main disadvantages of semidefinite programming is its lack of scalability to large problems.
I have studied algorithms for scaling semidefinite programming to large data sets, with a focus on semidefinite programming problems arising in practical machine learning applications. With colleagues at Microsoft Research, I developed algorithms for low-rank semidefinite programming (SDPs with an additional low-rank constraint) for embedding and clustering problems (AISTATS 2007), and with colleagues at UT Austin and the Max Planck Institute, I developed a general, scalable first-order SDP algorithm by analyzing convex perturbations for SDPs (AISTATS 2009).
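The core trick behind low-rank semidefinite programming can be sketched on an unconstrained toy problem: write the PSD variable as X = Y Yᵀ and optimize over the thin factor Y directly, so positive semidefiniteness is automatic and no full eigendecomposition is needed. This is only an illustration under made-up data; real SDPs add linear constraints, which the AISTATS algorithms handle and this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 6, 2
Y_true = rng.normal(size=(n, r))
A = Y_true @ Y_true.T                  # a rank-r PSD target matrix

# Factorized variable: X = Y Y^T is PSD by construction, and Y has only
# n*r entries instead of n*n.  Plain gradient descent on the factor:
Y = rng.normal(size=(n, r))
for _ in range(20000):
    grad = 4.0 * (Y @ Y.T - A) @ Y     # gradient of ||Y Y^T - A||_F^2 in Y
    Y -= 0.002 * grad

X = Y @ Y.T                            # recovered PSD approximation of A
```

The payoff is in the variable count: for a rank-r solution the factorized problem has n·r unknowns rather than n², which is what makes first-order methods practical at scale.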
Some representative publications:
Convex Perturbations for Scalable Semidefinite Programming
Brian Kulis, Suvrit Sra, & Inderjit Dhillon
In Proc. 12th Intl. AISTATS Conference, 2009.
[pdf]
Fast Low-Rank Semidefinite Programming for Embedding and Clustering
Brian Kulis, Arun Surendran, & John Platt
In Proc. 11th Intl. AISTATS Conference, 2007.
[pdf]
Graph Clustering and Image Segmentation
Spectral clustering partitions the nodes of a graph by minimizing one of several graph cut objectives; it has received a significant amount of attention recently and is used in numerous applications. I worked on Graclus, the first algorithm for optimizing spectral clustering objectives that does not use any eigenvector computation. This is possible due to a mathematical connection between spectral clustering and weighted kernel k-means. Our resulting multilevel algorithm outperforms the best existing spectral methods in terms of quality, speed, and memory usage. Applications include fast image segmentation, speech segmentation, and gene network clustering.
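The key observation is that the assignment step needs only kernel values, never eigenvectors: the distance from a point to a cluster mean in feature space expands entirely in terms of the kernel matrix. Below is a minimal unweighted kernel k-means on toy data; the graph-cut equivalence in our papers uses the weighted version with a kernel derived from the graph, which this sketch does not construct.

```python
import numpy as np

def kernel_kmeans(K, k, n_iters=20, seed=0):
    # Cluster using only the kernel matrix K.  The squared feature-space
    # distance ||phi(x_i) - m_c||^2 expands via the kernel trick as
    #   K_ii - (2/|c|) sum_{j in c} K_ij + (1/|c|^2) sum_{j,l in c} K_jl.
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)
    for _ in range(n_iters):
        dist = np.zeros((n, k))
        for c in range(k):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                dist[:, c] = np.inf   # leave empty clusters unattractive
                continue
            dist[:, c] = (K.diagonal()
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        labels = dist.argmin(axis=1)
    return labels

# Two well-separated blobs, clustered with a linear kernel K = X X^T.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.2, (10, 2)), rng.normal(3, 0.2, (10, 2))])
labels = kernel_kmeans(X @ X.T, 2)
```

Swapping in a nonlinear kernel changes the implicit feature space without touching the algorithm, and avoiding eigenvectors is what lets this approach scale to very large graphs.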
Graclus is available for download here.
Some representative publications:
Weighted Graph Cuts without Eigenvectors: A Multilevel Approach
Inderjit Dhillon, Yuqiang Guan, & Brian Kulis
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 11, pp. 1944–1957, 2007.
[pdf] [code]
Semi-Supervised Graph Clustering: A Kernel Approach
Brian Kulis, Sugato Basu, Inderjit Dhillon, & Raymond Mooney
In Proc. 22nd International Conference on Machine Learning (ICML), 2005.
ICML 2005 Best Student Paper Award
[pdf]
Kernel k-means, Spectral Clustering, and Normalized Cuts
Inderjit Dhillon, Yuqiang Guan, & Brian Kulis
In Proc. 10th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2004.
[pdf]
Topic Tracking
As an undergraduate, I worked on tracking topics in networked data over time. Our goal was to understand how research topics in computer science evolve by analyzing the citation structure of computer science publication networks. We developed an algorithm based on finding natural communities: clusters that are insensitive to small perturbations of the network. We applied our techniques to the CiteSeer network.
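The stability idea can be sketched on a tiny made-up graph: perturb the edges at random, recluster, and see which groups keep landing together. For simplicity this sketch uses connected components as the "clustering"; the PNAS work uses agglomerative clustering on real citation graphs, so treat this only as an illustration of the perturbation test.

```python
import random
from itertools import combinations

def components(n, edges):
    # Union-find connected components of an n-node graph.
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a
    for u, v in edges:
        parent[find(u)] = find(v)
    return [find(i) for i in range(n)]

random.seed(0)
clique = list(combinations(range(5), 2))   # dense group: nodes 0-4
edges = clique + [(4, 5)]                  # node 5 hangs on by one edge

stable = 0
for _ in range(100):
    kept = [e for e in edges if random.random() > 0.2]  # drop ~20% of edges
    comp = components(6, kept)
    if len({comp[i] for i in range(5)}) == 1:  # did the clique stay together?
        stable += 1
```

The tightly knit five-node group survives nearly every perturbation, so it counts as a natural community, while node 5's membership flips whenever its single link is dropped.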
Some representative publications:
Tracking Evolving Communities in Large Linked Networks
John Hopcroft, Omar Khan, Brian Kulis, & Bart Selman
Proc. of the National Academy of Sciences, vol. 101, pp. 5249–5253, April 2004.
[pdf]
Natural Communities in Large Linked Networks
John Hopcroft, Omar Khan, Brian Kulis, & Bart Selman
In Proc. 9th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2003.
[pdf]
