Python 统计字数的思路详解

脚本专栏 2026/3/28 佚名

3 2 1

问题描述：

用 Python 实现函数 count_words()，该函数输入字符串 s 和数字 n，返回 s 中 n 个出现频率最高的单词。返回值是一个元组列表，包含出现次数最高的 n 个单词及其次数,即 [(<单词1>, <次数1>), (<单词2>, <次数2>), ... ]，按出现次数降序排列。

您可以假设所有输入都是小写形式，并且不含标点符号或其他字符（只包含字母和单个空格）。如果出现次数相同，则按字母顺序排列。

例如：

print count_words("betty bought a bit of butter but the butter was bitter",3)

输出：

[('butter', 2), ('a', 1), ('betty', 1)]

解决问题的思路：

1. 将字符串s进行空白符分割得到所有的单词列表split_s，如：['betty', 'bought', 'a', 'bit', 'of', 'butter', 'but', 'the', 'butter', 'was', 'bitter']

2. 建立maplist,将split_s转化为元素为元组的列表形式，如：[('betty', 1), ('bought', 1), ('a', 1), ('bit', 1), ('of', 1), ('butter', 1), ('but', 1), ('the', 1), ('butter', 1), ('was', 1), ('bitter', 1)]

3. 合并maplist中元素，元组的第一个索引值相同，则将其第二个索引值相加。

// 备注：准备采用defaultdict。得到的数据如下：{'betty': 1, 'bought': 1, 'a': 1, 'bit': 1, 'of': 1, 'butter': 2, 'but': 1, 'the': 1, 'was': 1, 'bitter': 1}

4. 进行排序，按照key进行字母排序，得到如下：[('a', 1), ('betty', 1), ('bit', 1), ('bitter', 1), ('bought', 1), ('but', 1), ('butter', 2), ('of', 1), ('the', 1), ('was', 1)]

5. 进行二次排序, 按照value进行排序，得到如下：[('butter', 2), ('a', 1), ('betty', 1), ('bit', 1), ('bitter', 1), ('bought', 1), ('but', 1), ('of', 1), ('the', 1), ('was', 1)]

6. 使用切片取出频率较高的*组数据

总结：在python3上不进行defaultdict进行排序结果也是正确的，python2上不正确。defaultdict本身是没有顺序的，要区分列表，所以必须进行排序。

也可尝试自己写，不借助第三方模块

解决方案1（使用defaultdict）：

from collections import defaultdict
"""Count words."""
def count_words(s, n):
  """Return the n most frequently occuring words in s."""
  split_s = s.split()
  map_list = [(k,1) for k in split_s]
  output = defaultdict(int)
  for d in map_list:
    output[d[0]] += d[1]
  output1 = dict(output)
  top_n = sorted(output1.items(), key=lambda pair:pair[0], reverse=False)
  top_n = sorted(top_n, key=lambda pair:pair[1], reverse=True)
  return top_n[:n]
def test_run():
  """Test count_words() with some inputs."""
  print(count_words("cat bat mat cat bat cat", 3))
  print(count_words("betty bought a bit of butter but the butter was bitter", 4))
if __name__ == '__main__':
  test_run()

解决方案2（使用Counter）

from collections import Counter
"""Count words."""
def count_words(s, n):
  """Return the n most frequently occuring words in s."""
  split_s = s.split()
  split_s = Counter(name for name in split_s)
  print(split_s)
  top_n = sorted(split_s.items(), key=lambda pair:pair[0], reverse=False)
  print(top_n)
  top_n = sorted(top_n, key=lambda pair:pair[1], reverse=True)
  print(top_n)
  return top_n[:n]
def test_run():
  """Test count_words() with some inputs."""
  print(count_words("cat bat mat cat bat cat", 3))
  print(count_words("betty bought a bit of butter but the butter was bitter", 4))
if __name__ == '__main__':
  test_run()

总结

以上所述是小编给大家介绍的Python 统计字数的思路详解，希望对大家有所帮助，如果大家有任何疑问请给我留言，小编会及时回复大家的。在此也非常感谢大家对网站的支持！

python,统计字数

标签：

python,统计字数

免责声明：本站文章均来自网站采集或用户投稿，网站不提供任何软件下载或自行开发的软件！如有用户或公司发现本站内容信息存在侵权行为，请邮件告知！ 858582#qq.com

评论“Python 统计字数的思路详解”

Python 统计字数的思路详解

暂无“Python 统计字数的思路详解”评论...

www.imxmx.com 杰晶网络

8,675无损音乐

1,324高清电影

213破解软件

120,141站长资源

最新文章

群星《奔赴！万人现场第2期》[FLAC/分轨][5

2026/3/28

群星《奇妙浪一夏 (上海迪士尼度假区音乐)》

2026/3/28

群星《奇妙浪一夏 (上海迪士尼度假区音乐)》

2026/3/28

【古典音乐】詹姆斯·高威《季节》1993[WAV+

2026/3/28

贝拉芳蒂《卡里普索之王》SACD[WAV+CUE]

2026/3/28

一句话新闻

苹果官宣WWDC 2024！预计会有大批AI功能 - 2026/3/28

3月27日消息，苹果宣布2024年全球开发者大会（WWDC）将于6月10日至6月14日举行，巧合的是，这次大会与端午假期重合。

苹果官方表示：

在线参加 Apple 每年规模最大的开发者盛会。亲眼见证 Apple 最新平台、技术和工具的发布。了解如何创建和改进你的 App 和游戏。与 Apple 设计师和工程师互动交流，与全球开发者社区建立联系。以上活动均免费在线举行。

探索各种新的工具、框架和功能，助力你打造出理想的 App 和游戏。通过视频讲座学习新技能，与 Apple 专家进行一对一会面，以推进你的项目，完善你的构思。

Swift Student Challenge 旨在支持和鼓舞下一代开发者、创作者和企业家。太平洋时间 3 月 28 日，我们将公布今年的获奖者名单。获奖者将有资格参加在 Apple Park 举办的特别活动。我们还会选出 50 名杰出获胜者，他们将受邀前往库比提诺，获得为期三天的非凡体验，包括参加 Apple Park 的特别活动。

Python 统计字数的思路详解

python,统计字数

对Python中gensim库word2vec的使用详解

用python处理MS Word的实例讲解

评论“Python 统计字数的思路详解”

RTX 5090要首发性能要翻倍！三星展示GDDR7显存

更新动态

友情链接

Python 统计字数的思路详解

python,统计字数

对Python中gensim库word2vec的使用详解

用python处理MS Word的实例讲解

评论“Python 统计字数的思路详解”

RTX 5090要首发 性能要翻倍！三星展示GDDR7显存

更新动态

友情链接

RTX 5090要首发性能要翻倍！三星展示GDDR7显存