In this dissertation, we investigate two fundamental problems in large-scale graph data mining and graph database management. Scalable graph mining and graph database management tools are becoming increasingly crucial to applications with complex data in domains ranging from software engineering to computational biology.

Customer lifetime value modeling and cross-selling pattern mining are two important areas of data mining applications in marketing sciences. To address the problem of modeling complex correlations in classification and clustering of time series, we propose the functional subspace clustering framework, which assumes that the time series lie on several subspaces with possible deformations.

In this work, we propose a probabilistic online inference framework to iteratively rate user skills in a crowdsourcing environment.
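The abstract does not specify the inference model. As a minimal sketch of what "probabilistic online inference" of skills can look like, the code below swaps in a simple Beta-Bernoulli model: each worker's skill is the probability of answering correctly, and the posterior is updated one graded answer at a time. The class and function names (`SkillBelief`, `rate_stream`), the uniform prior, and the assumption that each answer's correctness is observed are all hypothetical and are not taken from the work itself.

```python
from dataclasses import dataclass


@dataclass
class SkillBelief:
    # Beta(alpha, beta) posterior over a worker's probability of answering correctly.
    alpha: float = 1.0  # prior pseudo-count of correct answers (assumed uniform prior)
    beta: float = 1.0   # prior pseudo-count of incorrect answers

    def update(self, correct: bool) -> None:
        # Conjugate Bayesian update after observing one graded answer.
        if correct:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Posterior mean: the current point estimate of the worker's skill.
        return self.alpha / (self.alpha + self.beta)


def rate_stream(events):
    """Consume (worker_id, correct) events one at a time, updating beliefs online."""
    beliefs = {}
    for worker, correct in events:
        beliefs.setdefault(worker, SkillBelief()).update(correct)
    return beliefs


if __name__ == "__main__":
    stream = [("w1", True), ("w2", False), ("w1", True), ("w2", True), ("w1", False)]
    for worker, belief in sorted(rate_stream(stream).items()):
        print(worker, round(belief.mean, 3))  # w1 0.6, then w2 0.5
```

Each event touches only one worker's two counters, so the estimate can be refreshed after every answer without revisiting past data, which is the property an iterative, online rating scheme needs.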

The award will support grad. Xiaosong is currently a professor at North Carolina State University with a joint appointment at Oak Ridge National Laboratory, working on management of scientific data.

Analysis of large-scale time series data collected from diverse applications has created new multi-faceted challenges and opportunities. As is widely recognized, the big data era has produced large-scale, diverse information sources, such as structured knowledge bases (KBs), unstructured texts, and semi-structured tables.

We propose a novel framework, called gIceberg, which performs aggregation over personalized PageRank vectors. Given a graph data set, what are the hidden structural patterns, and how can we find them?
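To make "aggregation over personalized PageRank vectors" concrete, the sketch below assumes one plausible reading: for each vertex, compute its personalized PageRank (PPR) vector by power iteration, sum the PPR mass that lands on vertices carrying a query attribute `q`, and report vertices whose aggregate reaches a threshold `theta`. The function names, the restart probability of 0.15, and the handling of dangling vertices are illustrative assumptions, not gIceberg's actual algorithm.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for the PPR vector of `source`.

    adj: dict mapping each vertex to a list of out-neighbors.
    alpha: restart probability (assumed value).
    """
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        for v, mass in p.items():
            nbrs = adj[v]
            if nbrs:
                share = (1 - alpha) * mass / len(nbrs)
                for u in nbrs:
                    nxt[u] += share
            else:
                # Dangling vertex: return its non-restart mass to the source.
                nxt[source] += (1 - alpha) * mass
        nxt[source] += alpha  # restart step; total mass stays 1
        p = nxt
    return p


def iceberg_vertices(adj, attrs, q, theta, alpha=0.15):
    """Vertices whose PPR mass aggregated over q-labeled vertices is >= theta."""
    result = {}
    for v in adj:
        ppr = personalized_pagerank(adj, v, alpha)
        score = sum(s for u, s in ppr.items() if q in attrs.get(u, ()))
        if score >= theta:
            result[v] = score
    return result


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    labels = {"c": {"x"}}
    for v, s in sorted(iceberg_vertices(graph, labels, "x", theta=0.3).items()):
        print(v, round(s, 3))
```

This brute-force version computes a full PPR vector for every vertex, which is precisely the cost a scalable framework would need to avoid; it only pins down the definition being aggregated.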

It also explores several critical applications in bioinformatics, computer systems, and software engineering, including gene relevance network analysis for functional annotation, and program flow analysis for automated software bug isolation.

The developed concepts, theories, and systems hence increase our understanding of data mining principles in structural pattern discovery, interpretation and search.

Earth science, or geoscience, comprises a wide range of sub-topics that aim to analyze the factors, notions, and theories concerning our planet. Starting with the challenges and issues existing in knowledge graph query processing, I will discuss our efforts in addressing these issues, including schemaless graph querying, user feedback, factoid question benchmark, natural language questions, and query routing in collaborative networks.

First, we tackle the challenge of modeling high-dimensional multi-modal correlations in the spatio-temporal data, as accurate modeling of correlations is the key to accurate predictive analysis.

The award will fund her work on storage. The formulation of a general graph information system through this study could provide fundamental support to graph-intensive applications. Graph mining and graph data management are themselves expensive computational problems, since subgraph isomorphism is NP-complete.
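To illustrate why subgraph isomorphism makes graph mining and querying expensive, here is a naive backtracking matcher for the non-induced variant on undirected graphs. It tries every candidate target vertex for each pattern vertex and prunes only on degree and already-mapped edges, so its worst-case running time grows exponentially with pattern size. The function name and graph representation are hypothetical; this is a textbook-style sketch, not the method developed in the study.

```python
def subgraph_isomorphisms(pattern, target):
    """Enumerate injective, edge-preserving maps from pattern into target.

    Both graphs are undirected, given as dict vertex -> set of neighbors.
    Non-induced matching: extra target edges among mapped vertices are allowed.
    """
    # Placing high-degree pattern vertices first tends to prune earlier.
    order = sorted(pattern, key=lambda v: -len(pattern[v]))
    results = []

    def extend(mapping, used):
        if len(mapping) == len(order):
            results.append(dict(mapping))
            return
        v = order[len(mapping)]
        for w in target:
            if w in used or len(target[w]) < len(pattern[v]):
                continue
            # Every already-mapped neighbor of v must map to a neighbor of w.
            if all(mapping[u] in target[w] for u in pattern[v] if u in mapping):
                mapping[v] = w
                used.add(w)
                extend(mapping, used)
                del mapping[v]
                used.discard(w)

    extend({}, set())
    return results


if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
    print(len(subgraph_isomorphisms(triangle, k4)))  # 24 = 4 * 3 * 2
```

Even this tiny example enumerates every injective assignment of pattern vertices to compatible target vertices, which is why practical systems rely on indexing and pruning rather than raw search.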

Rule- and motif-based anomaly detection in massive moving object data sets, by X. Moreover, we empirically verify that tables supply rich knowledge that might not exist in existing KBs or may be difficult to identify there. The intelligence possessed by current machines is still limited in many respects.

We also argue that graph similarity is a subproblem in many applications involving multiple graphs, and we contribute methods for network alignment and similarity.

Huan Sun (student) and Xifeng Yan (advisor) at University of California, Santa Barbara. Abstract: Today’s paradigm of information search is in the midst of a significant transformation. Question answering (QA) systems that can precisely answer user questions are increasingly desired, in contrast to traditional search engines, which only retrieve lengthy web pages.

