Text Similarity | Lazy Prices: Changes in Company Annual Report Content Signal Major Risks

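A company filing is made up of several distinct sections, and each one has to be identified separately. Regular expressions are used here for fast data cleaning and extraction. Once the text has been converted to vectors, similarity can be computed,...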

2019-12-31 · 2 min · 大邓
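
The workflow summarized in the entry above (split a filing into sections with regular expressions, turn each section into a vector, then compare) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the post's actual code: the `Item N.` heading pattern, the helper names, and the two sample reports are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Hypothetical heading pattern: lines such as "Item 1." or "Item 1A."
# Real filings vary, so this regex is an assumption for illustration only.
SECTION_RE = re.compile(r"^Item\s+(\d+[A-Z]?)\.", re.MULTILINE | re.IGNORECASE)


def split_sections(report: str) -> dict:
    """Split a report into {section id: section text} at each heading match."""
    matches = list(SECTION_RE.finditer(report))
    sections = {}
    for i, m in enumerate(matches):
        # A section runs until the next heading, or to the end of the report.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(report)
        sections[m.group(1).upper()] = report[m.end():end].strip()
    return sections


def to_vector(text: str) -> Counter:
    """Bag-of-words term counts over lowercase alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample reports for two consecutive years.
last_year = (
    "Item 1. Business\nWe sell widgets.\n"
    "Item 1A. Risk Factors\nCompetition may reduce our sales."
)
this_year = (
    "Item 1. Business\nWe sell widgets.\n"
    "Item 1A. Risk Factors\nLitigation and competition may reduce our sales."
)

old, new = split_sections(last_year), split_sections(this_year)
for section_id in sorted(old.keys() & new.keys()):
    score = cosine_similarity(to_vector(old[section_id]), to_vector(new[section_id]))
    print(section_id, round(score, 3))
```

A section whose score drops well below 1.0 from one year to the next is one whose wording changed, which is the kind of signal the post's title refers to. Term counts with cosine similarity are just one simple choice; other vectorizations and similarity measures would slot into the same structure.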

When cnsenti Meets streamlit

streamlit is a web package and cnsenti is a text analysis package; combining the two lets you build an online text analysis site...

2018-06-07 · 1 min · 大邓

Hierarchical Clustering Analysis with scipy

Hierarchical clustering analysis with scipy...

2018-05-18 · 3 min · FamouseGuys

Video Course | Building Empirical Indicators and Text Analysis with Python

In scientific research, acquiring data and analyzing it are the two most important steps, and also the two hardest. Before the big data era, researchers generally relied on experiments, questionnaires, interviews, or secondary data, organized the results into structured tables, and then analyzed those tables with econometric methods. In the big data era, web data has become a potential treasure that scholars in every field are eager to mine: large amounts of business and social information are stored across vast numbers of web pages as text and other unstructured, heterogeneous data. For researchers in the humanities and social sciences, economics and management in particular, Python helps with the two problems that arise when using web data for research. Web crawling addresses how to collect data from the web efficiently. Text analysis addresses how to extract empirical indicators (sentiment, attitudes, stereotypes, and so on) from messy text data...

4 min · 大邓