Aggregation consistency and frequency of Chinese words and characters

Clément Arsenault (École de bibliothéconomie et des sciences de l'information, Université de Montréal, Québec, Canada)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 1 September 2006

Abstract

Purpose

Aims to measure the consistency of syllable aggregation in Romanized Chinese data in the title fields of bibliographic records. Also aims to verify whether the term frequency distributions satisfy conventional bibliometric laws.

Design/methodology/approach

Uses Cooper's interindexer formula to evaluate aggregation consistency within and between two sets of Chinese bibliographic data. Compares the term frequency distributions of polysyllabic words and monosyllabic characters (for vernacular and Romanized data) with the Lotka and the generalised Zipf theoretical distributions. The fits are tested with the Kolmogorov‐Smirnov test.
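To make the idea of aggregation consistency concrete, here is a minimal sketch. The abstract does not reproduce Cooper's formula, so the code below swaps in a plain overlap ratio (aggregated units shared by both records divided by all distinct units used by either) as a stand-in; it should not be read as the measure used in the study. The function names `spans` and `overlap_consistency` and the sample title are hypothetical.

```python
def spans(segmentation):
    """Turn a segmentation (list of syllable groups) into a set of
    (start, end) syllable positions, one span per aggregated unit."""
    result = set()
    position = 0
    for group in segmentation:
        result.add((position, position + len(group)))
        position += len(group)
    return result


def overlap_consistency(seg_a, seg_b):
    """Overlap ratio between two aggregations of the same syllable string.
    ASSUMPTION: a generic shared/(shared + unshared) ratio, used here as a
    stand-in because the abstract does not give Cooper's formula."""
    a, b = spans(seg_a), spans(seg_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical example: the same Romanized title aggregated two ways.
record_1 = [["zhong", "guo"], ["wen", "xue"], ["shi"]]   # Zhongguo wenxue shi
record_2 = [["zhong", "guo"], ["wen", "xue", "shi"]]     # Zhongguo wenxueshi
score = overlap_consistency(record_1, record_2)
```

Comparing positions rather than the strings themselves keeps two identical syllables at different places in a title from being counted as agreement.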

Findings

Finds high internal aggregation consistency within each data set but some aggregation discrepancy between sets. Shows that word (polysyllabic) distributions satisfy Lotka's law but that character (monosyllabic) distributions do not abide by the law.
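The kind of goodness-of-fit check involved can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the study: the exponent `alpha=2.0` is the classical Lotka value (the abstract does not report a fitted exponent), the theoretical distribution is truncated at the largest observed frequency, and the critical value uses the asymptotic 5% Kolmogorov-Smirnov approximation 1.36/sqrt(N) (the significance level used in the study is not stated). The function name `lotka_ks` and the sample counts are hypothetical.

```python
from collections import Counter
import math


def lotka_ks(term_counts, alpha=2.0):
    """Compare observed frequency-of-frequency data with a truncated Lotka
    distribution f(n) proportional to n**-alpha.
    Returns (D, critical value), where D is the maximum absolute difference
    between the observed and theoretical cumulative distributions.
    ASSUMPTIONS: alpha is supplied rather than estimated, and the critical
    value is the asymptotic 5% approximation 1.36 / sqrt(N)."""
    freq_of_freq = Counter(term_counts)      # n -> number of terms seen n times
    total = sum(freq_of_freq.values())       # N, the number of distinct terms
    n_max = max(freq_of_freq)
    norm = sum(n ** -alpha for n in range(1, n_max + 1))
    d = cum_obs = cum_theo = 0.0
    for n in range(1, n_max + 1):
        cum_obs += freq_of_freq.get(n, 0) / total
        cum_theo += (n ** -alpha) / norm
        d = max(d, abs(cum_obs - cum_theo))
    return d, 1.36 / math.sqrt(total)


# Hypothetical data: 36 terms occur once, 9 twice, 4 three times.
counts = [1] * 36 + [2] * 9 + [3] * 4
d_stat, critical = lotka_ks(counts)
fits = d_stat < critical   # the fit is not rejected when D stays below the critical value
```

In practice the exponent would be estimated from the data before testing, which this sketch leaves out.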

Research limitations/implications

The findings are limited to two sets of bibliographic data for the aggregation consistency analysis and to one set for the frequency distribution analysis. Only two bibliometric distributions are tested. Internal consistency within each database remains fairly high, so the main argument against syllable aggregation does not appear to hold. The analysis shows that Chinese words and characters behave differently in terms of frequency distribution, but that there is no noticeable difference between vernacular and Romanized data. Romanized characters show the poorest fit to both Lotka's and Zipf's laws, which suggests that aggregated Romanized data are the preferable option.

Originality/value

Provides empirical data on consistency and distribution of Romanized Chinese titles in bibliographic records.

Citation

Arsenault, C. (2006), "Aggregation consistency and frequency of Chinese words and characters", Journal of Documentation, Vol. 62 No. 5, pp. 606-633. https://doi.org/10.1108/00220410610688750

Publisher

Emerald Group Publishing Limited
