Python: collections.Counter, Common Functions and Applications

Published: 2026/7/4 21:48:41

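Text analysis, data statistics, and natural language processing often call for frequency counting. For example:

• counting how many times each word appears in a piece of text
• counting how often a certain type of event appears in a log
• counting repeated elements in a list

The Python standard library module collections provides a data structure built for this purpose: Counter. A Counter is a counter object that behaves like a multiset, recording how many times each element occurs. For example:

    from collections import Counter

    words = ["apple", "banana", "apple", "orange", "banana", "apple"]
    counter = Counter(words)
    print(counter)

Sample output:

    Counter({'apple': 3, 'banana': 2, 'orange': 1})

• The key is the element being counted.
• The value is the number of times that element occurs (its count).

Counter is a subclass of dict and is backed by a hash table, so updating or looking up the count of a single element typically takes O(1) time on average.

1. Creating a Counter object

A Counter can be created in several ways.

Counter()

Creates a Counter object.

    Counter([iterable_or_mapping], **kwargs)

Parameters:

• iterable_or_mapping: an iterable (such as a list or a string) or a mapping (a dict) used to initialize the counts
• **kwargs: element counts given as keyword arguments

Returns: a Counter object. With no arguments, an empty Counter is created.

Example 1: from a list

    from collections import Counter

    data = ["a", "b", "a", "c", "b", "a"]
    counter = Counter(data)
    print(counter)

Output:

    Counter({'a': 3, 'b': 2, 'c': 1})

Example 2: from a string

A string is also an iterable, so character frequencies can be counted directly.

    text = "hello world"
    counter = Counter(text)
    print(counter)

Sample output:

    Counter({'l': 3, 'o': 2, 'h': 1, 'e': 1, ' ': 1, 'w': 1, 'r': 1, 'd': 1})

Example 3: from a dict

    counter = Counter({"apple": 3, "banana": 2})
    print(counter)

Output:

    Counter({'apple': 3, 'banana': 2})

Example 4: from keyword arguments

    counter = Counter(a=3, b=2, c=1)
    print(counter)

Output:

    Counter({'a': 3, 'b': 2, 'c': 1})

Note: the keyword-argument form is equivalent to creating the Counter from a dict, Counter({"a": 3, "b": 2, "c": 1}).

2. Accessing counts

A Counter behaves like a dict, so an element's count can be read by key.

counter[element]

Gets the count of an element.

Example:

    counter = Counter(["apple", "banana", "apple"])
    print(counter["apple"])  # 2

If the element is not present, Counter returns 0 instead of raising an exception:

    print(counter["orange"])  # 0

This differs from a plain dict, which raises KeyError for a missing key. Looking up a missing element does not add it to the Counter.

elements()

Returns an iterator in which each element is repeated as many times as its count.

    counter.elements()

Example:

    counter = Counter({"a": 3, "b": 2})
    print(list(counter.elements()))

Output:

    ['a', 'a', 'a', 'b', 'b']

Note: elements() yields only elements whose count is greater than 0, in the order the elements are stored in the Counter.

3. Updating and modifying counts

+counter

The unary plus operator drops elements whose count is less than or equal to 0. +counter returns a new Counter that keeps only the elements with a count greater than 0; the original object is not modified.

Example:

    counter = Counter({"a": 2, "b": 0, "c": -1})
    print(+counter)

Output:

    Counter({'a': 2})

update()

Increases element counts.

    counter.update([iterable_or_mapping], **kwargs)

Parameters:

• iterable_or_mapping: an iterable (such as a list or a string) or a mapping (a dict)
• **kwargs: count increments given as keyword arguments

Example:

    counter = Counter("apple")
    counter.update("apple")
    print(counter)

Output:

    Counter({'p': 4, 'a': 2, 'l': 2, 'e': 2})

Note: a string argument is treated as a sequence of characters. Unlike dict.update(), counts are added to the existing values rather than replacing them.

subtract()

Decreases element counts.

    counter.subtract([iterable_or_mapping], **kwargs)

Parameters:

• iterable_or_mapping: an iterable (such as a list or a string) or a mapping (a dict)
• **kwargs: count decrements given as keyword arguments

Example 1:

    counter = Counter("apple")
    counter.subtract("ap")
    print(counter)

Output:

    Counter({'p': 1, 'l': 1, 'e': 1, 'a': 0})

Note: subtract() can leave zero or negative counts; such keys stay in the Counter instead of being removed.

Example 2:

    c = Counter(a=4, b=2)
    c.subtract(a=1, b=2)
    print(c)

Output:

    Counter({'a': 3, 'b': 0})

4. Common statistical functions

Counter provides several functions for statistical analysis.

most_common()

Returns the most frequent elements.

    counter.most_common(n=None)

Parameters:

• n: return the top n elements; if n is None, all elements are returned

Returns: a list whose items are (element, count) tuples.

Example:

    text = "banana"
    counter = Counter(text)
    print(counter.most_common(2))

Output:

    [('a', 3), ('n', 2)]

Note: results are sorted by count from highest to lowest; elements with equal counts keep the order in which they were first encountered.

total()

Returns the sum of all counts in the Counter.

    counter.total()

Example:

    counter = Counter("banana")
    print(counter.total())  # 6

Note: added in Python 3.10.

5. Mathematical operations on Counters

Counter supports multiset operations. The examples below share these two objects:

    c1 = Counter("apple")
    c2 = Counter("pear")

1) Addition: counter1 + counter2

Adds the counts together.

    print(c1 + c2)  # Counter({'p': 3, 'a': 2, 'e': 2, 'l': 1, 'r': 1})

2) Subtraction: counter1 - counter2

Keeps only the items whose resulting count is greater than 0.

    print(c1 - c2)  # Counter({'p': 1, 'l': 1})

3) Intersection: counter1 & counter2

Takes the minimum count of each element.

    print(c1 & c2)  # Counter({'a': 1, 'p': 1, 'e': 1})

4) Union: counter1 | counter2

Takes the maximum count of each element.

    print(c1 | c2)  # Counter({'p': 2, 'a': 1, 'l': 1, 'e': 1, 'r': 1})

6. Example: word-frequency counting for English text

In natural language processing (NLP) tasks, Counter is often used to count word frequencies. A typical workflow is:

1) Text cleaning
2) Tokenization
3) Stop-word filtering
4) Word-frequency counting

Sample code (it covers cleaning, tokenization, and counting; stop-word filtering is not performed here):

    from collections import Counter
    import string

    text = """Artificial intelligence is transforming the world.
    Machine learning and deep learning are key technologies in AI.
    Natural language processing enables computers to understand human language."""

    # Convert to lowercase
    text = text.lower()

    # Common punctuation characters
    punctuation = string.punctuation

    # Replace each punctuation character with a space
    table = str.maketrans(punctuation, " " * len(punctuation))
    text = text.translate(table)

    # Tokenize on whitespace
    words = text.split()

    # Count word frequencies with Counter
    freq = Counter(words)

    # Print the 10 most frequent words, one per line
    for word, count in freq.most_common(10):
        print(word, count)

Summary

Counter is the counter object provided by Python's collections module. It is a subclass of dict that records how many times each element occurs. Methods such as Counter(), update(), subtract(), most_common(), and elements() make frequency counting and distribution analysis straightforward. Counter also supports multiset operations: addition, subtraction, intersection, and union. In text analysis and NLP tasks, Counter is usually combined with text cleaning and tokenization to build word-frequency statistics quickly.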
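As a supplement to the word-frequency workflow (cleaning, tokenization, stop-word filtering, counting), here is a minimal sketch of how the stop-word filtering step might be added. The names STOP_WORDS and word_frequencies are hypothetical and introduced only for illustration, and the stop-word list is a deliberately tiny assumption; a real project would likely use a fuller list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# real projects usually load a fuller list from a file or an NLP library.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "the", "to"})


def word_frequencies(text, stop_words=STOP_WORDS):
    """Return a Counter of lowercase word tokens, skipping stop words."""
    # Cleaning + tokenization: lowercase, then keep runs of ASCII letters
    # and apostrophes, which drops other punctuation and digits.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Stop-word filtering and counting in a single pass.
    return Counter(token for token in tokens if token not in stop_words)


sample = (
    "Machine learning and deep learning are key technologies in AI. "
    "Natural language processing enables computers to understand human language."
)
freq = word_frequencies(sample)

# Show the two most frequent remaining words.
for word, count in freq.most_common(2):
    print(word, count)
```

Passing a generator expression to Counter keeps filtering and counting in one step. Because filtered words are never inserted, freq["the"] simply returns 0 here, consistent with how Counter treats missing keys.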
