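1. Dataset Overview

Media: Podcast
Source: https://podcasts.apple.com/
Date range: 2005-12-10 ~ 2023-03-07
Number of podcast IDs: 303911
Number of reviews: 5607021
Fields: podcast_id, title, content, rating, author_id, created_at, category, and others

The dataset is large and rich in fields, which makes it suitable for researchers in sociology, journalism and communication, linguistics, economics, management, and related disciplines.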



2. Loading the Data

Each file is read with pandas.read_json().
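The calls in this section pass lines=True, which tells pandas to treat the input as JSON Lines (one JSON object per line). The following minimal sketch shows the behavior on two made-up records held in memory; the sample values and the name demo_df are illustrative assumptions, not part of the dataset.

```python
import io
import pandas as pd

# Two made-up records in JSON Lines form: one JSON object per line.
sample = io.StringIO(
    '{"podcast_id": "a1", "title": "Demo Show"}\n'
    '{"podcast_id": "b2", "title": "Another Show"}\n'
)

# lines=True makes pandas parse every line as a separate record (row).
demo_df = pd.read_json(sample, lines=True)
print(demo_df.shape)  # (2, 2)
```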

2.1 podcasts.json

import pandas as pd

pdf = pd.read_json('podcasts.json', lines=True)

# View the fields of podcasts.json
print(pdf.columns)
pdf

Run

Index(['podcast_id', 'itunes_id', 'slug', 'itunes_url', 'title', 'author',
       'description', 'average_rating', 'ratings_count', 'scraped_at'],
      dtype='object')


2.2 categories.json

cdf = pd.read_json('categories.json', lines=True)

# Fields of categories.json
print(cdf.columns)
cdf

Run

Index(['podcast_id', 'itunes_id', 'category'], dtype='object')
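Because podcasts.json and categories.json both carry podcast_id, the two tables can be joined on that key. The sketch below uses tiny made-up stand-ins for pdf and cdf; the category values shown are hypothetical and may not match the labels actually used in the dataset.

```python
import pandas as pd

# Made-up stand-ins for pdf (podcasts.json) and cdf (categories.json).
pdf_demo = pd.DataFrame({
    "podcast_id": ["a1", "b2"],
    "title": ["China Arts Demo", "Cooking Demo"],
})
cdf_demo = pd.DataFrame({
    "podcast_id": ["a1", "a1", "b2"],
    "category": ["arts", "society-culture", "food"],  # hypothetical labels
})

# A left join keeps every podcast; a podcast with several categories
# appears once per category.
merged = pdf_demo.merge(cdf_demo, on="podcast_id", how="left")

# Number of podcasts per category.
category_counts = merged["category"].value_counts()
print(merged)
```

With the real data, the same call would be pdf.merge(cdf, on='podcast_id', how='left'), assuming the key values are consistent across the two files.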


2.3 reviews.json

rdf = pd.read_json('reviews.json', lines=True)

# Fields of reviews.json
print(rdf.columns)
rdf

Run

Index(['podcast_id', 'title', 'content', 'rating', 'author_id', 'created_at'],
      dtype='object')
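With more than five million reviews, loading reviews.json in one call may need a lot of memory. One possible alternative, sketched here on three made-up in-memory records, is to pass chunksize to pandas.read_json() so that the file is processed piece by piece and only the matching rows are kept. The sample records, the chunk size of 2, and the name china_chunks_df are assumptions for illustration.

```python
import io
import pandas as pd

# Three made-up review records in JSON Lines form.
sample = io.StringIO(
    '{"title": "Listening from afar", "content": "Great show", "rating": 5}\n'
    '{"title": "Too many ads", "content": "Skipped it", "rating": 2}\n'
    '{"title": "Nice", "content": "I learned a lot about China", "rating": 4}\n'
)

matches = []
# With lines=True and chunksize set, read_json returns an iterator of DataFrames.
for chunk in pd.read_json(sample, lines=True, chunksize=2):
    mask = chunk["content"].fillna("").str.contains("China")
    if mask.any():
        matches.append(chunk[mask])

# Combine the matching rows from all chunks into a single DataFrame.
china_chunks_df = pd.concat(matches, ignore_index=True)
print(china_chunks_df)
```

For the real file, a much larger chunk size (for example 100000) would be a more sensible starting point, and the case where no chunk matches (an empty matches list) would need separate handling.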



3. Experiments

3.1 Filtering podcast titles by keyword

Select the records in podcasts.json whose title contains China (the match is case-sensitive).

china_podcast_df = pdf[pdf['title'].fillna('').str.contains('China')]
china_podcast_df


# View the titles of these 86 podcasts
print(china_podcast_df.title.values)

Run

['China Arts Podcast'
 'Made in China Podcast: International Business | Crowdfunding | Entrepreneurship'
 'Chinasource Recently Added Resources' 'TIC China Network' 'UNDP China'
 'Wellness in China' 'Party In China' 'Tails From China' 'Focus on China'
 'CEIBS China Knowledge' 'Bottled in China' 'Environment China'
 'China Money Podcast - Audio Episodes'
 'China Money Podcast - Video Episodes'
 'China Jedi Podcast: Expat Life | Chinese Culture | Business | Travel | China'
 'China Digital Marketing Podcast' 'Goodbye China Podcast'
 'History and Story of China' 'Made in China'
 'China Voices: The AmCham Shanghai Podcast'
......
 "China Now's Podcast" 'China: As History Is My Witness'
 'Safeguarding Dunhuang for China and the World' 'Biz China'
 'Chinaman Talks Sports' 'China in the World' 'The History of China'
 "Forbidden City: Inside the Court of China's Emperors"
 'NAFTA at Twenty: Trade, Transformation and the China Factor'
 'NAFTA at Twenty: Trade, Transformation and the China Factor (Audio Only)'
 'China and the Chinese by Herbert Allen Giles' 'China Doing Sweden'
 'China MSG' 'Yellow Star: China News' 'Made in China']
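Note that str.contains() is case-sensitive by default and matches substrings, which is why titles such as 'Chinasource Recently Added Resources' appear above while an all-lowercase title would not. The sketch below, on a made-up Series of titles, compares the filter used above with a case-insensitive variant; na=False is an alternative to fillna('') for treating missing titles as non-matches.

```python
import pandas as pd

# Made-up titles, including a lowercase variant and a missing value.
titles = pd.Series(["China Arts Demo", "made in china demo", None, "Cooking Demo"])

# Same approach as above: case-sensitive substring match.
strict = titles.fillna("").str.contains("China")

# Case-insensitive match; na=False maps missing titles to False.
relaxed = titles.str.contains("china", case=False, na=False)

print(strict.tolist())   # [True, False, False, False]
print(relaxed.tolist())  # [True, True, False, False]
```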

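3.2 Filtering review titles by keyword

Select the reviews whose title contains China or 中国. Note that title in podcasts.json is the fixed name of a podcast, whereas title in reviews.json is the title of an individual review and differs from record to record.

# Select the records in reviews.json whose title contains China or 中国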
china_title_df = rdf[rdf['title'].fillna('').str.contains('China|中国')]
china_title_df


print(china_title_df.title.values)

Run

["What's a China?" 'Thanks Justin - from China'
 'American Working in China Coffee Industry' 'Babybee in China'
 'Listening From China!!' 'Right on China.' 'Excellent China Series!'
 'China Trade War episode was fantastic'
 'Really enjoyed the China / Tariff discussion' 'China Review'
 'Beautiful videos of China!' 'Learn about The Real China business'
 'Doing business in China? Listen to this!' 'China'
 "Insightful look into China's growing influence"
 'Great smart brevity on China' 'Great insights about China'
 'Best tech podcast for China'
 'Great introduction to China’s history'
......
 'Jump into the rabbit hole of China Tech 🕳' '你好 from China!'
 'Blong in China'
 'Informational but the misconception of Gaokao in China is awkward (gatteca'
 'Listening from China' 'Not available in China' 'With Love from China'
 'Great talent from China.' 'First time to listen to dj music from China'
 'Emergency China podcast was unreal' 'China Episode' 'China'
 '矮大紧老师的确是现代中国文化圈里面的高山晓辉里的奇松' 'Love the China rant' '中国好'
 'Powerful rant on China much needed' 'NBA and China'
 'Life in China is Awesome!' 'Worthy China Podcast'
 'Learn More About China Now' 'Michael from China'
 'Best Survey of China Lecture in iTunes U' 'China' 'Band in China'
 'Band in China' '关于中国生活有趣的观点' 'Deep and personal angle to look at China'
 'A must-listen podcast for understanding the current and future China'
 'Stop crying about China' 'New podcast from a great China program'
 'Saying hi from China' '终于有一档中国记者做的播客' 'China’s’  Detention Camps'
......
 'Required listening to keep up with contemporary China'
 'Most antiChina guests and content' 'Fantastic China-centric podcast'
 'Great, well rounded look at China' 'Great info and insights on China'
 'The best Podcast on China-related topics' 'Big trouble in little China'
 '中国最好的游戏广播。' '中国第一家做游戏广播的!!' 'The best game radio in China!'
 'Best Podcast on China’s History'
 'Great China Insights and interview topics'
 'Howard Whiteson’s China based interviews are Short Concise well- easy'
 'Excellent source for politics in China' 'Good honest reporting on China'
 "GOD'S Warning About China" 'Hilarious English Pod in China!'
 'Bursting with China Healthcare Insights' 'China oh China'
 'The Real China Story'
 'China’s ambitions and their impact: Insightfully and compellingkt, weaves the micro and the macro'
 'Sets the bar for China and international reporting'
 "Amazingly balanced and detailed account of China's growing influence around the world"
 'On China’s New Silk Road' 'China’s plan for the future'
 'Great new Content on China and Sede Vacante' '没有中国特色'
 '“China Joe need we say more”'
 'Interesting and informative podcast on China'
 'SCTV from the South China Sea' 'China and Omicron' 'Strangers in China'
 'China seems very scary' 'China Lockdown'
 'I travel to China regularly just to listen'
 'Best American News I Can Find in China!!!!']

3.3 Filtering review content by keyword

# Select the records in reviews.json whose content contains China or 中国
china_reviews_df = rdf[rdf['content'].fillna('').str.contains('China|中国')]
china_reviews_df
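Once the matching reviews are selected, a natural next step is to summarize them, for example by counting reviews and averaging rating per year. The sketch below does this on a small made-up table; the sample rows and the plain date strings in created_at are assumptions, since the exact timestamp format of the real column is not shown here.

```python
import pandas as pd

# Made-up stand-in for rdf (reviews.json).
reviews_demo = pd.DataFrame({
    "content": ["Love the China episodes", "中国好", "Not relevant", "China again"],
    "rating": [5, 4, 1, 3],
    "created_at": ["2019-03-01", "2019-08-15", "2020-01-01", "2021-05-05"],
})

# Same keyword filter as above.
mask = reviews_demo["content"].fillna("").str.contains("China|中国")
hits = reviews_demo[mask].copy()

# Derive the year from the timestamp, then aggregate per year.
hits["year"] = pd.to_datetime(hits["created_at"]).dt.year
summary = hits.groupby("year")["rating"].agg(["count", "mean"])
print(summary)
```

On the real data, hits would correspond to china_reviews_df. pandas.read_json() may already have parsed created_at into datetimes, in which case the pd.to_datetime() call is harmless.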


4. How to Obtain the Data

The price is 200 RMB. Add WeChat 372335839 with the note 【姓名-学校-专业-博客】 (name, school, major, blog).


