Model-based cluster analysis of microarray gene-expression data

Background
Microarray technologies are emerging as a promising tool for genomic studies. The challenge now is how to analyze the large amounts of data they produce. Clustering techniques have been widely applied to microarray gene-expression data. However, cluster analysis based on normal mixture models has not been widely used for such data, although it has a solid probabilistic foundation. Here, we introduce and illustrate its use in detecting differentially expressed genes. In particular, we cluster not the gene-expression patterns themselves but a summary statistic, the t-statistic.

Results
The method is applied to a data set containing expression levels of 1,176 genes of rats with and without pneumococcal middle-ear infection. Three clusters were found: two of them contain more than 95% of the genes and show almost no altered gene-expression levels, whereas the third contains 30 genes showing some degree of differential expression.

Conclusions
Our results indicate that model-based clustering of t-statistics (and possibly other summary statistics) can be a useful statistical tool for identifying differential gene expression in microarray data.
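
As an illustration of the approach summarized above, the following Python sketch computes a two-sample t-statistic per gene and fits univariate normal mixtures to those statistics by EM. It is a minimal sketch under stated assumptions, not the authors' implementation. The expression values are simulated rather than taken from the rat data set. The Welch form of the t-statistic, the EM initialization, and the use of BIC to choose the number of components are all choices made for this example. Every function name (t_statistic, fit_mixture, bic) is hypothetical.

```python
import math
import random
from statistics import mean, pstdev, variance

MIN_SD = 1e-3  # floor that keeps a component from collapsing onto one point


def t_statistic(x, y):
    """Two-sample t-statistic with unequal variances (Welch form, an assumption)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


def normal_pdf(z, mu, sd):
    return math.exp(-0.5 * ((z - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def e_step(data, weights, means, sds):
    """Posterior component probabilities per point, plus the log-likelihood."""
    resp, loglik = [], 0.0
    for z in data:
        dens = [w * normal_pdf(z, m, s) for w, m, s in zip(weights, means, sds)]
        total = sum(dens) + 1e-300  # guard against log(0) after underflow
        loglik += math.log(total)
        resp.append([d / total for d in dens])
    return resp, loglik


def fit_mixture(data, k, iters=200):
    """Fit a k-component univariate normal mixture by EM."""
    n = len(data)
    srt = sorted(data)
    # Initialize from k equal-sized chunks of the sorted data (an assumption).
    chunks = [srt[j * n // k:(j + 1) * n // k] for j in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [mean(c) for c in chunks]
    sds = [max(pstdev(c), MIN_SD) for c in chunks]
    for _ in range(iters):
        resp, _ll = e_step(data, weights, means, sds)
        # M-step: responsibility-weighted proportion, mean and sd per component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * z for r, z in zip(resp, data)) / nj
            var = sum(r[j] * (z - means[j]) ** 2 for r, z in zip(resp, data)) / nj
            sds[j] = max(math.sqrt(var), MIN_SD)
    resp, loglik = e_step(data, weights, means, sds)
    return weights, means, sds, loglik, resp


def bic(loglik, k, n):
    """BIC with 3k - 1 free parameters; smaller is better in this convention."""
    return -2.0 * loglik + (3 * k - 1) * math.log(n)


# Simulated data (NOT the rat data set): 400 genes, 6 arrays per condition,
# with the first 20 genes given a hypothetical mean shift.
rng = random.Random(42)
t_stats = []
for g in range(400):
    shift = 3.0 if g < 20 else 0.0
    control = [rng.gauss(0.0, 1.0) for _ in range(6)]
    treated = [rng.gauss(shift, 1.0) for _ in range(6)]
    t_stats.append(t_statistic(treated, control))

# Fit 1 to 3 components and keep the fit with the smallest BIC.
fits = {k: fit_mixture(t_stats, k) for k in (1, 2, 3)}
best_k = min(fits, key=lambda k: bic(fits[k][3], k, len(t_stats)))

# Assign each gene to the component with the highest posterior probability.
labels = [max(range(best_k), key=lambda j: r[j]) for r in fits[best_k][4]]
print("components chosen:", best_k)
print("cluster sizes:", [labels.count(j) for j in range(best_k)])
print("component means:", [round(m, 2) for m in fits[best_k][1]])
```

In a sketch like this, a small component whose mean lies far from zero would be the candidate set of differentially expressed genes, while components centered near zero would hold genes with little or no change.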

Data and Resources

Field Value
accessLevel public
bureauCode {009:25}
catalog_@context https://project-open-data.cio.gov/v1.1/schema/catalog.jsonld
catalog_@id https://healthdata.gov/data.json
catalog_conformsTo https://project-open-data.cio.gov/v1.1/schema
catalog_describedBy https://project-open-data.cio.gov/v1.1/schema/catalog.json
identifier https://healthdata.gov/api/views/yh42-xkaf
issued 2025-07-14
landingPage https://healthdata.gov/d/yh42-xkaf
modified 2025-09-06
programCode {009:033}
publisher National Institutes of Health
resource-type Dataset
source_datajson_identifier true
source_hash 17250ca1f9bd48c49cddaa9a7a90d3922f6ec7d8d046ebf009b284473a7ba349
source_schema_version 1.1
theme {NIH}
Groups
  • AmeriGEOSS
  • National Provider
  • North America
Tags
  • AmeriGEO
  • AmeriGEOSS
  • CKAN
  • GEO
  • GEOSS
  • National
  • North America
  • United States
  • cluster-analysis
  • gene-expression
  • microarray-data
  • nih
  • pneumococcal-infection
isopen False
license_id notspecified
license_title License not specified
maintainer NIH
maintainer_email info@nih.gov
metadata_created 2025-09-23T19:27:22.318982
metadata_modified 2025-09-23T19:27:22.318987
num_resources 1
num_tags 13