{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "59396417-f73c-47a2-a3f5-9830e5e45870", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "231306\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "/var/folders/y0/4gqxky0s2t94x1c1qhlwr6100000gn/T/ipykernel_64822/616359690.py:3: DtypeWarning: Columns (28,34) have mixed types. Specify dtype option on import or set low_memory=False.\n", " df = pd.read_csv('5112万专利申请全量数据1985-2025年/内蒙古自治区.csv.gz', compression='gzip')\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 专利名称 专利类型 申请人 申请人类型 申请人地址 申请人国家 申请人省份 申请人城市 \\\n", "0 彩色液位显示计 实用新型 张明辰; 鲍培林 个人 内蒙古自治区集宁市桥东四马路三十号 中国 内蒙古自治区 乌兰察布市 \n", "\n", " 申请人区县 申请号 ... 专利权人类型 统一社会信用代码 引证次数 被引证次数 自引次数 他引次数 被自引次数 \\\n", "0 集宁区 CN87215507.2 ... NaN NaN NaN NaN NaN NaN NaN \n", "\n", " 被他引次数 家族引证次数 家族被引证次数 \n", "0 NaN NaN NaN \n", "\n", "[1 rows x 35 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "\n", "df = pd.read_csv('5112万专利申请全量数据1985-2025年/内蒙古自治区.csv.gz', compression='gzip')\n", "print(len(df))\n", "df.head(1)" ] }, { "cell_type": "code", "execution_count": 3, "id": "23143c74-0965-4d26-8bcf-68600e8a529b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 1、一种彩色液位显示计,包括汽连管法兰(1),水连管法兰(6),汽、水开关旋塞(2)、(7)...\n", "1 1、一种由机座1、机架2、轧轮座3、轧轮4组成的滚压式冷拔成型胎具,其特征是胎具的3~6个压...\n", "2 1、一种由座件1、模芯2和凸筋3组成的生产小型螺纹钢的挤压式冷拔成型胎具,其特征是3~8个凸...\n", "3 1、一种家庭或单位用的自动供水开关装置,它通过一杠杆机构(9)将阀门(2)与悬浮体(10)相...\n", "4 1、本发明对自行车传动系统进行改进,也可以用于其他人力驱动机械,用钢丝绳传动代替传统的链条传...\n", " ... 
\n", "231301 1.一种立面太阳能板(1)安装装置,其特征在于,包括:\\n本体部件,所述本体部件包括太阳能板...\n", "231302 1.一种窗帘滑动装置,其特征在于,包括:导轨(10)、若干个挂接件(20)和若干个卷簧(30...\n", "231303 1.一种8-羟基喹啉制备的原料除杂装置,包括工作釜(1),其特征在于,所述工作釜(1)的下端...\n", "231304 1.一种防堵塞的筛分机,包括有机肥筛分箱(1),其特征在于:所述有机肥筛分箱(1)的内部活动...\n", "231305 1.一种圆弧变壁厚药形罩同步旋压装置,包括滑杆(1)和活动设置在滑杆(1)上的滑套(2),其...\n", "Name: 主权项内容, Length: 231306, dtype: object" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['主权项内容']" ] }, { "cell_type": "code", "execution_count": 2, "id": "0978cf8f-6f9d-46d4-9477-20729e3f490d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['专利名称', '专利类型', '申请人', '申请人类型', '申请人地址', '申请人国家', '申请人省份', '申请人城市',\n", " '申请人区县', '申请号', '申请日', '申请年份', '公开公告号', '公开公告日', '公开公告年份', '授权公告号',\n", " '授权公告日', '授权公告年份', 'IPC分类号', 'IPC主分类号', '发明人', '摘要文本', '主权项内容', '当前权利人',\n", " '当前专利权人地址', '专利权人类型', '统一社会信用代码', '引证次数', '被引证次数', '自引次数', '他引次数',\n", " '被自引次数', '被他引次数', '家族引证次数', '家族被引证次数'],\n", " dtype='object')" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Columns available in the dataset\n", "df.columns" ] }, { "cell_type": "code", "execution_count": 9, "id": "17f55b42-b3ac-482a-93cb-5756b9fe4e8d", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 专利名称 \\\n", "39189 一种基于固态存储介质的RAID系统及方法 \n", "39197 一种用于星载AIS信号接收系统的时隙同步方法 \n", "52863 基于人工智能的路面可行度指示器 \n", "53415 面向税务咨询业务的智能问答系统 \n", "61925 人工智能处理器及使用处理器执行向量乘加指令的方法 \n", "... ... \n", "224036 基于人工智能的边缘网关资源优化管理方法 \n", "224142 一种基于AI驱动的网络边界威胁检测系统 \n", "224221 基于人工智能的数字人脸生成方法、装置、设备及介质 \n", "227491 一种人工智能系统数据存储装置 \n", "229259 一种AI智能设备马达用轴承 \n", "\n", " 摘要文本 \\\n", "39189 本发明提供一种基于固态存储介质的RAID系统及方法,系统包括:多个固态存储装置,其中每个固态... \n", "39197 本发明公开了一种用于星载AIS信号接收系统的时隙同步方法,包括如下步骤:在一个AIS时隙内,... \n", "52863 本发明涉及一种基于人工智能的路面可行度指示器,由微型摄像头、微处理器、触觉发生器组成,微处理... \n", "53415 本发明属于人工智能技术领域,具体为一种面向税务咨询业务的智能问答系统。该系统包括:一台安装A... \n", "61925 本发明提供一种人工智能处理器及人工智能处理器执行向量乘加指令方法,所述处理器设置于计算装置内... \n", "... ... \n", "224036 本发明涉及电通信技术领域,具体是基于人工智能的边缘网关资源优化管理方法,包括如下:S1:根据... \n", "224142 本发明涉及用于监测或测试数据交换网络的技术领域,尤其涉及一种基于AI驱动的网络边界威胁检测系... \n", "224221 本申请属于人工智能领域与金融科技领域,涉及一种基于人工智能的数字人脸生成方法、装置、计算机设... \n", "227491 本实用新型涉及存储装置领域,具体涉及一种人工智能系统数据存储装置,包括箱体与若干存储设备,若... \n", "229259 本实用新型涉及智能设备技术领域,具体为一种AI智能设备马达用轴承,包括轴承主体,所述轴承主体... \n", "\n", " 主权项内容 申请日 专利类型 \n", "39189 1.一种基于固态存储介质的RAID系统,其特征在于,包括:\\n多个固态存储装置,其中每个固态... 2014-11-20 发明授权 \n", "39197 一种用于星载AIS信号接收系统的时隙同步方法,其特征在于,包括如下步骤:初始化步骤,在一个A... 2014-05-28 发明申请 \n", "52863 一种基于人工智能的路面可行度指示器,由微型摄像头、微处理器、触觉发生器组成,其特征在于:所述... 2016-12-08 发明申请 \n", "53415 1.一种面向税务咨询业务的智能问答系统,其特征在于,包括:\\n一台安装Android操作系统... 2016-11-10 发明授权 \n", "61925 1.一种人工智能处理器,其特征在于,所述处理器设置于计算装置内,所述计算装置用于执行向量乘加... 2017-10-30 发明申请 \n", "... ... ... ... \n", "224036 1.基于人工智能的边缘网关资源优化管理方法,其特征在于,所述边缘网关资源优化管理方法包括如下... 2024-07-30 发明申请 \n", "224142 1.一种基于AI驱动的网络边界威胁检测系统,其特征在于,所述检测系统包括数据采集模块、数据预... 2024-08-09 发明申请 \n", "224221 1.一种基于人工智能的数字人脸生成方法,其特征在于,包括下述步骤:\\n获取用户输入的目标情感... 2024-08-15 发明申请 \n", "227491 1.一种人工智能系统数据存储装置,包括箱体(1)与若干存储设备(2),其特征在于:若干所述存... 2024-01-11 实用新型 \n", "229259 1.一种AI智能设备马达用轴承,包括轴承主体(1),其特征在于:所述轴承主体(1)内部的中心... 
2024-04-28 实用新型 \n", "\n", "[430 rows x 5 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Regex of keywords that mark a patent as AI-related.\n", "# Note: 'AI' is matched as a plain substring, so it also hits tokens such as 'RAID' or 'AIS'.\n", "AI_rela_words = '人工智能|机器学习|AI|NLP|智能问答|神经机器翻译|NLU|增量学习'\n", "\n", "# Keep patents whose title, abstract and main claim all contain a keyword\n", "mask1 = df['专利名称'].fillna('').str.contains(AI_rela_words)\n", "mask2 = df['摘要文本'].fillna('').str.contains(AI_rela_words)\n", "mask3 = df['主权项内容'].fillna('').str.contains(AI_rela_words)\n", "\n", "# The table has too many columns to display, so keep only the fields we need\n", "selected_fields = ['专利名称', '摘要文本', '主权项内容', '申请日', '专利类型']\n", "# AI-related patents; .copy() lets later cells add columns without a SettingWithCopyWarning\n", "ai_df = df[mask1 & mask2 & mask3][selected_fields].copy()\n", "ai_df" ] }, { "cell_type": "code", "execution_count": 10, "id": "975de18a-bd9e-4735-a72b-2efea70afdfc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "2014 2\n", "2016 2\n", "2017 13\n", "2018 25\n", "2019 38\n", "2020 65\n", "2021 83\n", "2022 123\n", "2023 4\n", "2024 75\n" ] } ], "source": [ "# Application year is the first four characters of the application date\n", "ai_df['year'] = ai_df[\"申请日\"].apply(lambda d: d[:4])\n", "\n", "# Number of AI-related patents per application year\n", "for year, ai_year_df in ai_df.groupby('year'):\n", "    print(year, len(ai_year_df))" ] }, { "cell_type": "code", "execution_count": 11, "id": "02dd8155-38ab-4559-aa1b-f2c2b1164fc8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'年度': '2014', '实用新型': 0, '发明公开': 0, '外观设计': 0, '发明授权': 1, '省份': '内蒙古自治区'}\n", "{'年度': '2016', '实用新型': 0, '发明公开': 0, '外观设计': 0, '发明授权': 1, '省份': '内蒙古自治区'}\n", "{'年度': '2017', '实用新型': 1, '发明公开': 0, '外观设计': 0, '发明授权': 3, '省份': '内蒙古自治区'}\n", "{'年度': '2018', '实用新型': 1, '发明公开': 0, '外观设计': 0, '发明授权': 6, '省份': '内蒙古自治区'}\n", "{'年度': '2019', '实用新型': 5, '发明公开': 0, '外观设计': 0, '发明授权': 9, '省份': '内蒙古自治区'}\n", "{'年度': '2020', '实用新型': 12, '发明公开': 0, '外观设计': 0, '发明授权': 14, '省份': '内蒙古自治区'}\n", "{'年度': '2021', '实用新型': 7, '发明公开': 0, '外观设计': 0, '发明授权': 11, '省份': '内蒙古自治区'}\n", "{'年度': '2022', '实用新型': 14, '发明公开': 0, '外观设计': 0, '发明授权': 14, '省份': '内蒙古自治区'}\n", "{'年度': '2023', '实用新型': 0, '发明公开': 0, '外观设计': 0, '发明授权': 0, '省份': '内蒙古自治区'}\n", "{'年度': '2024', '实用新型': 2, '发明公开': 0, '外观设计': 
0, '发明授权': 1, '省份': '内蒙古自治区'}\n" ] } ], "source": [ "for year, ai_year_df in ai_df.groupby('year'):\n", " data = dict()\n", " data['年度'] = year\n", " data['实用新型'] = (ai_year_df['专利类型']=='实用新型').sum()\n", " data['发明公开'] = (ai_year_df['专利类型']=='发明公开').sum()\n", " data['外观设计'] = (ai_year_df['专利类型']=='外观设计').sum()\n", " data['发明授权'] = (ai_year_df['专利类型']=='发明授权').sum()\n", " data['省份'] = '内蒙古自治区'\n", " print(data)" ] }, { "cell_type": "code", "execution_count": 13, "id": "43425476-9478-48a4-b784-a9f75f2bc337", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['5112万专利申请全量数据1985-2025年/北京市.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/广西壮族自治区.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/河北省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/海南省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/天津市.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/青海省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/重庆市.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/云南省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/香港特别行政区.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/其他国家.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/湖南省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/内蒙古自治区.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/陕西省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/澳门特别行政区.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/甘肃省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/江苏省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/台湾省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/吉林省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/江西省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/安徽省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/上海市.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/福建省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/黑龙江省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/贵州省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/湖北省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/四川省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/浙江省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/山西省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/广东省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/西藏自治区.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/山东省.csv.gz',\n", " 
'5112万专利申请全量数据1985-2025年/新疆维吾尔自治区.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/辽宁省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/河南省.csv.gz',\n", " '5112万专利申请全量数据1985-2025年/宁夏回族自治区.csv.gz']" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import glob\n", "\n", "files = glob.glob('5112万专利申请全量数据1985-2025年/*.csv.gz')\n", "files" ] }, { "cell_type": "code", "execution_count": 15, "id": "b4bd7c2d-1910-45ec-8580-23e3dfdb6446", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5112万专利申请全量数据1985-2025年/北京市.csv.gz\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "<timed exec>:20: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "5112万专利申请全量数据1985-2025年/广西壮族自治区.csv.gz\n", "5112万专利申请全量数据1985-2025年/河北省.csv.gz\n", "5112万专利申请全量数据1985-2025年/海南省.csv.gz\n", "5112万专利申请全量数据1985-2025年/天津市.csv.gz\n", "5112万专利申请全量数据1985-2025年/青海省.csv.gz\n", "5112万专利申请全量数据1985-2025年/重庆市.csv.gz\n", "5112万专利申请全量数据1985-2025年/云南省.csv.gz\n", "5112万专利申请全量数据1985-2025年/香港特别行政区.csv.gz\n", "5112万专利申请全量数据1985-2025年/其他国家.csv.gz\n", "5112万专利申请全量数据1985-2025年/湖南省.csv.gz\n", "5112万专利申请全量数据1985-2025年/内蒙古自治区.csv.gz\n", "5112万专利申请全量数据1985-2025年/陕西省.csv.gz\n", "5112万专利申请全量数据1985-2025年/澳门特别行政区.csv.gz\n", "5112万专利申请全量数据1985-2025年/甘肃省.csv.gz\n", "5112万专利申请全量数据1985-2025年/江苏省.csv.gz\n", "5112万专利申请全量数据1985-2025年/台湾省.csv.gz\n", "5112万专利申请全量数据1985-2025年/吉林省.csv.gz\n", "5112万专利申请全量数据1985-2025年/江西省.csv.gz\n", "5112万专利申请全量数据1985-2025年/安徽省.csv.gz\n", "5112万专利申请全量数据1985-2025年/上海市.csv.gz\n", "5112万专利申请全量数据1985-2025年/福建省.csv.gz\n", "5112万专利申请全量数据1985-2025年/黑龙江省.csv.gz\n", "5112万专利申请全量数据1985-2025年/贵州省.csv.gz\n", "5112万专利申请全量数据1985-2025年/湖北省.csv.gz\n", "5112万专利申请全量数据1985-2025年/四川省.csv.gz\n", "5112万专利申请全量数据1985-2025年/浙江省.csv.gz\n", "5112万专利申请全量数据1985-2025年/山西省.csv.gz\n", "5112万专利申请全量数据1985-2025年/广东省.csv.gz\n", "5112万专利申请全量数据1985-2025年/西藏自治区.csv.gz\n", "5112万专利申请全量数据1985-2025年/山东省.csv.gz\n", "5112万专利申请全量数据1985-2025年/新疆维吾尔自治区.csv.gz\n", "5112万专利申请全量数据1985-2025年/辽宁省.csv.gz\n", "5112万专利申请全量数据1985-2025年/河南省.csv.gz\n", "5112万专利申请全量数据1985-2025年/宁夏回族自治区.csv.gz\n", "记录数: 523\n", "CPU times: user 15min 38s, sys: 46.4 s, total: 16min 24s\n", "Wall time: 16min 27s\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 年度 实用新型 发明公开 外观设计 发明授权 省份\n", "0 1998 1 0 0 0 北京市\n", "1 2000 0 0 0 1 北京市" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%time\n", "import os\n", "\n", "from tqdm import tqdm\n", "\n", "# Regex of keywords that mark a patent as AI-related.\n", "# Note: 'AI' is matched as a plain substring, so it also hits tokens such as 'RAID' or 'AIS'.\n", "AI_rela_words = '人工智能|机器学习|AI|NLP|智能问答|神经机器翻译|NLU|增量学习'\n", "AI_Relatives_Patents = []\n", "details_path = 'AI_details.csv'\n", "\n", "\n", "for file in tqdm(files):\n", "    print(file)\n", "    # Province name is the file name without the '.csv.gz' suffix\n", "    prov = file.split('/')[-1].replace('.csv.gz', '')\n", "\n", "    df = pd.read_csv(file,\n", "                     compression='gzip',\n", "                     usecols=['专利名称', '摘要文本', '主权项内容', '申请日', '专利类型']\n", "                     )\n", "\n", "    # Keep patents whose title, abstract and main claim all contain a keyword\n", "    mask1 = df['专利名称'].fillna('').str.contains(AI_rela_words)\n", "    mask2 = df['摘要文本'].fillna('').str.contains(AI_rela_words)\n", "    mask3 = df['主权项内容'].fillna('').str.contains(AI_rela_words)\n", "\n", "    # .copy() avoids SettingWithCopyWarning when the year column is added\n", "    ai_df = df[mask1 & mask2 & mask3].copy()\n", "    ai_df['year'] = ai_df[\"申请日\"].apply(lambda d: d[:4])\n", "\n", "    # Append this file's AI patent details to the nationwide CSV, writing the header only once\n", "    ai_df.to_csv(details_path, mode='a', index=False,\n", "                 header=not os.path.exists(details_path))\n", "\n", "    # One panel row per (province, application year) with counts by patent type.\n", "    # Note: rows typed '发明申请' are not counted by any of the four columns below.\n", "    for year, ai_year_df in ai_df.groupby('year'):\n", "        data = dict()\n", "        data['年度'] = year\n", "        data['实用新型'] = (ai_year_df['专利类型']=='实用新型').sum()\n", "        data['发明公开'] = (ai_year_df['专利类型']=='发明公开').sum()\n", "        data['外观设计'] = (ai_year_df['专利类型']=='外观设计').sum()\n", "        data['发明授权'] = (ai_year_df['专利类型']=='发明授权').sum()\n", "        data['省份'] = prov\n", "        AI_Relatives_Patents.append(data)\n", "\n", "\n", "ai_panel_df = pd.DataFrame(AI_Relatives_Patents)\n", "ai_panel_df.to_excel('AI_panel.xlsx', index=False)\n", "print('记录数:', len(ai_panel_df))\n", "ai_panel_df.head(2)" ] }, { "cell_type": "code", "execution_count": 17, "id": "fc6431b7-d0b0-4550-8df0-fcb9a62b98e4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " 年度 实用新型 发明公开 外观设计 发明授权 省份\n", "0 1998 1 0 0 0 北京市\n", "1 2000 0 0 0 1 北京市\n", "2 2001 1 0 0 1 北京市\n", "3 2002 0 0 0 1 北京市\n", "4 2003 0 0 0 0 北京市\n", ".. ... ... ... ... ... ...\n", "518 2020 6 0 0 6 宁夏回族自治区\n", "519 2021 5 0 0 6 宁夏回族自治区\n", "520 2022 2 0 0 5 宁夏回族自治区\n", "521 2023 0 0 0 9 宁夏回族自治区\n", "522 2024 1 0 0 0 宁夏回族自治区\n", "\n", "[523 rows x 6 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ai_panel_df2 = pd.read_excel('AI_panel.xlsx')\n", "ai_panel_df2" ] }, { "cell_type": "code", "execution_count": null, "id": "0dc2fc86-a2ce-44e5-8c00-3984f0d08069", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 5 }