Efficient Keyword-Based Search for Top-K Cells in Text Cube

Previous studies on supporting free-form keyword queries over RDBMSs provide users with linked structures (e.g., a set of joined tuples) that are relevant to a given keyword query. Most of them focus on ranking individual tuples from one table, or joins of multiple tables, that contain a set of keywords. In this paper, we study the problem of keyword search in a data cube with text-rich dimension(s) (a so-called text cube). The text cube is built on a multidimensional text database, where each row is associated with some text data (a document) and other structural dimensions (attributes). A cell in the text cube aggregates a set of documents with matching attribute values in a subset of dimensions. We define a keyword-based query language and an IR-style relevance model for scoring/ranking cells in the text cube. Given a keyword query, our goal is to find the top-k most relevant cells. We propose four approaches: inverted-index one-scan, document sorted-scan, bottom-up dynamic programming, and search-space ordering. The search-space ordering algorithm explores only a small portion of the text cube to find the top-k answers, and enables early termination. Extensive experimental studies are conducted to verify the effectiveness and efficiency of the proposed approaches.

Citation: B. Ding, B. Zhao, C. X. Lin, J. Han, C. Zhai, A. N. Srivastava, and N. C. Oza, “Efficient Keyword-Based Search for Top-K Cells in Text Cube,” IEEE Transactions on Knowledge and Data Engineering, 2011.
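
The notions of a cell and of a top-k cell query described in the abstract can be illustrated with a small brute-force sketch. This is not one of the four algorithms from the paper, and the relevance function here (average keyword count per document) is only an assumed stand-in for the paper's IR-style model. The toy rows, the dimension names `model` and `phase`, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import heapq

# Hypothetical toy database: each row has structural attributes plus a document.
ROWS = [
    ({"model": "A320", "phase": "takeoff"}, "engine fire warning during takeoff"),
    ({"model": "A320", "phase": "landing"}, "hard landing gear warning"),
    ({"model": "B737", "phase": "takeoff"}, "engine vibration and engine fire"),
    ({"model": "B737", "phase": "landing"}, "smooth landing no issues"),
]
DIMS = ("model", "phase")


def build_cells(rows, dims):
    """Map each cell to the documents it aggregates.

    A cell fixes values on a subset of dimensions; '*' marks a dimension
    that is aggregated away.
    """
    cells = defaultdict(list)
    for attrs, doc in rows:
        for r in range(len(dims) + 1):
            for kept in combinations(dims, r):
                key = tuple(attrs[d] if d in kept else "*" for d in dims)
                cells[key].append(doc)
    return cells


def doc_score(doc, keywords):
    """Assumed per-document relevance: total keyword occurrences."""
    terms = doc.lower().split()
    return sum(terms.count(k) for k in keywords)


def top_k_cells(rows, dims, query, k):
    """Brute force: score every cell, then keep the k best."""
    keywords = query.lower().split()
    scored = []
    for key, docs in build_cells(rows, dims).items():
        # Assumed cell relevance: average document score within the cell.
        score = sum(doc_score(d, keywords) for d in docs) / len(docs)
        scored.append((score, key))
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    for score, cell in top_k_cells(ROWS, DIMS, "engine fire", 3):
        print(cell, score)
```

Enumerating every cell, as this sketch does, grows exponentially with the number of dimensions, which is the cost the abstract's search-space ordering approach aims to avoid by exploring only a small portion of the cube.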

Data and Resources

tkde11topcells.pdf (PDF)

Field	Value
accessLevel	public
accrualPeriodicity	irregular
bureauCode	{026:00}
catalog_@context	https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_@id	https://data.nasa.gov/data.json
catalog_conformsTo	https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy	https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier	DASHLINK_515
issued	2012-01-27
landingPage	https://c3.nasa.gov/dashlink/resources/515/
modified	2020-01-29
programCode	{026:029}
publisher	Dashlink
resource-type	Dataset
source_datajson_identifier	true
source_hash	e8da65da1abc24e989adfebd82ef638951822814
source_schema_version	1.1
Groups	AmeriGEOSS National Provider North America
Tags	amerigeo amerigeoss ames ckan dashlink geo geoss nasa national north-america united-states
isopen	False
license_id	notspecified
license_title	License not specified
maintainer	Ashok Srivastava
maintainer_email	ashok.n.srivastava@gmail.com
metadata_created	2025-11-22T06:05:10.646608
metadata_modified	2025-11-22T06:05:10.646612
num_resources	1
num_tags	11
title	Efficient Keyword-Based Search for Top-K Cells in Text Cube