Communications in Biometry and Crop Science


REGULAR ARTICLE
Combining partially ranked data in plant breeding and biology: II. Analysis with Rasch model

Ivan Simko, John M. Linacre


Communications in Biometry and Crop Science (2010) 5 (1), 56-65.
 

ABSTRACT
Many years of breeding experiments, germplasm screening, and molecular biology experiments have generated volumes of sequence, genotype, and phenotype information that have been stored in public data repositories. These resources give genetic and genomic researchers the opportunity to handle and analyze raw data from multiple laboratories and study groups whose research interests revolve around a common or closely related trait. However, although such data sets are widely available for secondary analysis, their heterogeneous nature often precludes their direct combination and joint exploration. Integrating phenotype information across multiple studies and databases is challenging because investigators differ in the measurement instruments, endpoint classifications, and biological material they employ. In the present work, we demonstrate how the Rasch measurement model can surmount these problems. The model can incorporate data sets with partially overlapping variables, large numbers of missing data points, and dissimilar ratings of phenotypic endpoints. It also quantifies the extent of heterogeneity between data sets. Biologists can use the model in a data-mining process to obtain combined ratings from various databases and other sources. Subsequently, these ratings can be used for selecting desirable material or (in combination with genotypic information) for mapping genes involved in the particular trait. The model is not limited to genetics and breeding; it can be applied in many other areas of biology and agriculture.
 

Key Words: aggregated ranking; Bradley-Terry model; combining data; rank-order.
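
As a rough illustration of the idea summarized in the abstract, the sketch below combines partial rankings from several studies into a single scale. It is not the authors' procedure: it assumes the paired-comparison form of the Rasch model, in which the log-odds that accession i outranks accession j equals the difference of their measures (equivalent to the Bradley-Terry model named in the key words), and it fits that form with a simple minorization-maximization update. The study rankings, the accession labels A to E, and the function names `pairwise_wins` and `fit_measures` are all hypothetical.

```python
import math
from itertools import combinations

# Hypothetical data: each study ranks only a subset of accessions, best first.
# Pairs such as A-D, B-D and B-E are never compared directly (missing data).
studies = [
    ["A", "B", "C"],
    ["C", "D", "E"],
    ["E", "A"],  # this study disagrees with the overall trend
]

def pairwise_wins(rankings):
    """Decompose each partial ranking into 'better beats worse' counts."""
    wins = {}
    for ranking in rankings:
        for better, worse in combinations(ranking, 2):
            wins[(better, worse)] = wins.get((better, worse), 0) + 1
    return wins

def fit_measures(rankings, iterations=500):
    """Estimate one measure (in logits) per accession.

    Assumes every accession wins and loses at least once and that the
    comparisons link all accessions; otherwise estimates are not finite.
    """
    wins = pairwise_wins(rankings)
    items = sorted({item for pair in wins for item in pair})
    strength = {item: 1.0 for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            total_wins = sum(n for (a, _), n in wins.items() if a == i)
            denom = 0.0
            for j in items:
                if j == i:
                    continue
                # Number of times i and j were compared, in either direction.
                n_ij = wins.get((i, j), 0) + wins.get((j, i), 0)
                if n_ij:
                    denom += n_ij / (strength[i] + strength[j])
            updated[i] = total_wins / denom
        # Rescale so the measures are centered at zero logits.
        log_mean = sum(math.log(v) for v in updated.values()) / len(items)
        strength = {k: v / math.exp(log_mean) for k, v in updated.items()}
    return {k: math.log(v) for k, v in strength.items()}

measures = fit_measures(studies)
for name, value in sorted(measures.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.2f} logits")
```

Accessions that were never rated in the same study still receive comparable measures, because they are linked indirectly through accessions they share with other studies. A dedicated Rasch analysis would additionally report fit statistics for judging heterogeneity between data sets, which this sketch omits.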