
Data scientists spend over 80% of their time preparing data, mostly on tasks such as data cleaning and data labeling. With the rise of Large Language Models (LLMs) like GPT-4, we now have the tools to streamline this process significantly. In this article, we'll explore how to use LLMs for data labeling to enhance the accuracy, efficiency, and scalability of text annotations and ultimately drive better outcomes for ML projects....

2024-08-04 · 2 min · Yuliia Kniazieva

arXiv2024 | Automating Grounded Theory Development in Qualitative Research with Large Language Models

Qualitative research is valued in academia for uncovering the reasons and logic behind phenomena, but analyzing qualitative data is often time-consuming and costly. With the arrival of large language models such as ChatGPT, that may be about to change. AcademiaOS is an open-source platform and a first attempt to automate grounded theory development in qualitative research with large language models (LLMs). Using recent large language models' language understanding, generation, and reasoning capabilities, AcademiaOS codes curated qualitative raw data such as interview transcripts and develops themes and dimensions to further develop a grounded theoretical model, affording novel insights. A user study (n=19) suggests that the system finds acceptance in the academic community and exhibits the potential to augment humans in qualitative research. AcademiaOS has been made open-source for others to build upon and adapt to their use cases....

2024-08-02 · 2 min · Übellacker Thomas

Dataset | CSR Wire news dataset on corporate social responsibility at US-listed companies (1999-2024)


2024-07-19 · 2 min · 陈世强

Dataset (English) | CBS News news dataset (1998 ~ 2024)

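News datasets have substantial research value. You can extract a rich set of indicators from them, including but not limited to the Economic Policy Uncertainty (EPU) index, media attention indices, text similarity, and sentiment analysis measures. You can also use them to train word embeddings, build new dictionaries, and develop new indicators and indices. They are useful in natural language processing, economics, management, journalism and communication, public administration, and other fields....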

2024-07-13 · 2 min · 大邓

Dataset | ChinaDaily news dataset (2008 ~ 2024)

2024-07-12 · 2 min · 大邓