I. Word Embeddings

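A few days ago I shared this paper:

Expanding Social Science Research Methods in the Big Data Era: Applications of Text Analysis Based on Word Embedding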

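When people write letters or post on online forums, they leave behind more than words: they also leave traces of their subjective cognition, such as biases and attitudes.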

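Word embedding is a word-vector model that can recover from text the implicit contextual information, attitudes, and biases it carries. By measuring distances between word vectors, we can indirectly measure how different groups feel about a given concept (an organization, a social group, a brand, a region, and so on). Word embeddings strike me as very useful, so I recently collected the relevant papers from PNAS, Nature, and Science. Incidentally, a good share of the PNAS papers on word embeddings provide their raw data and code.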
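To make the idea of "measuring distances between word vectors" concrete, here is a minimal sketch. It is not taken from any of the papers listed below. The 3-dimensional vectors are hypothetical toy values invented for illustration, and the `association` helper is an assumed, simplified score (mean cosine similarity to one word group minus the mean to another). In real work the vectors would come from an embedding model trained on the corpus under study.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(target, group_a, group_b, vectors):
    """Mean similarity of `target` to group A minus mean similarity to group B.

    Positive: the target word sits closer to group A.
    Negative: the target word sits closer to group B.
    """
    sim_a = np.mean([cosine_similarity(vectors[target], vectors[w]) for w in group_a])
    sim_b = np.mean([cosine_similarity(vectors[target], vectors[w]) for w in group_b])
    return float(sim_a - sim_b)

# Hypothetical toy vectors, made up for illustration only.
toy_vectors = {
    "he":       [0.9, 0.1, 0.2],
    "man":      [0.8, 0.2, 0.1],
    "she":      [0.1, 0.9, 0.2],
    "woman":    [0.2, 0.8, 0.1],
    "engineer": [0.7, 0.3, 0.5],
    "nurse":    [0.2, 0.7, 0.5],
}

male_words = ["he", "man"]
female_words = ["she", "woman"]

for word in ["engineer", "nurse"]:
    score = association(word, male_words, female_words, toy_vectors)
    print(f"{word}: {score:+.3f}")
```

With these toy values, "engineer" gets a positive score (closer to the male words) and "nurse" a negative one. The sign and size of such a score is the kind of quantity that can be read as an indirect measure of bias in the text the vectors were trained on.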

Some Python libraries can already use word-embedding models to surface human cognitive biases, for example:



II. Related Literature

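  • Ran Yaxuan, Li Zhiqiang, Liu Jiani, Zhang Yishi. Expanding social science research methods in the big data era: Applications of text analysis based on word embedding (大数据时代下社会科学研究方法的拓展——基于词嵌入技术的文本分析的应用) [J/OL]. Nankai Business Review (南开管理评论): 1-27 [2022-04-08]. http://kns.cnki.net/kcms/detail/12.1288.F.20210905.1337.002.html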

  • Kozlowski, A.C., Taddy, M. and Evans, J.A., 2019. The geometry of culture: Analyzing the meanings of class through word embeddings. American Sociological Review, 84(5), pp.905-949.

  • Toubia, O., Berger, J. and Eliashberg, J., 2021. How quantifying the shape of stories predicts their success. Proceedings of the National Academy of Sciences, 118(26).

  • Caliskan, A., Bryson, J.J. and Narayanan, A., 2017. Semantics derived automatically from language corpora contain human-like biases. Science, 356, pp.183-186.

  • Garg, N., Schiebinger, L., Jurafsky, D. and Zou, J., 2018. Word embeddings quantify 100 years of gender and ethnic stereotypes. Proceedings of the National Academy of Sciences, 115(16), pp.E3635-E3644. doi:10.1073/pnas.1720347115

  • Peng, H., Ke, Q., Budak, C., Romero, D.M. and Ahn, Y.Y., 2021. Neural embeddings of scholarly periodicals reveal complex disciplinary organizations. Science Advances, 7(17), p.eabb9004.

  • Waller, I. and Anderson, A., 2021. Quantifying social organization and political polarization in online platforms. Nature, 600(7888), pp.264-268.

  • Arseniev-Koehler, A., Cochran, S.D., Mays, V.M., Chang, K.W. and Foster, J.G., 2022. Integrating topic modeling and word embedding to characterize violent deaths. Proceedings of the National Academy of Sciences, 119(10), p.e2108801119.

  • Bollen, J., Ten Thij, M., Breithaupt, F., Barron, A.T., Rutter, L.A., Lorenzo-Luaces, L. and Scheffer, M., 2021. Historical language records reveal a surge of cognitive distortions in recent decades. Proceedings of the National Academy of Sciences, 118(30).

  • Kim, L., Smith, D.S., Hofstra, B. and McFarland, D.A., 2022. Gendered knowledge in fields and academic careers. Research Policy, 51(1), p.104411.

  • Lawson, M.A., Martin, A.E., Huda, I. and Matz, S.C., 2022. Hiring women into senior leadership positions is associated with a reduction in gender stereotypes in organizational language. Proceedings of the National Academy of Sciences, 119(9), p.e2026443119.

  • Brady, W.J., McLoughlin, K., Doan, T.N. and Crockett, M.J., 2021. How social learning amplifies moral outrage expression in online social networks. Science Advances, 7(33), p.eabe5641.

  • Bailey, A.H., Williams, A. and Cimpian, A., 2022. Based on billions of words on the internet, people = men. Science Advances, 8(13), p.eabm2463.

  • Lewis, M. and Lupyan, G., 2020. Gender stereotypes are reflected in the distributional structure of 25 languages. Nature Human Behaviour, 4(10), pp.1021-1028.

  • Schramowski, P., Turan, C., Andersen, N., Rothkopf, C.A. and Kersting, K., 2022. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence, 4(3), pp.258-268.

  • Costa-jussà, M.R., 2019. An analysis of gender bias studies in natural language processing. Nature Machine Intelligence, 1(11), pp.495-496.

  • Rodman, E., 2020. A timely intervention: Tracking the changing meanings of political concepts with word vectors. Political Analysis, 28(1), pp.87-111.

  • Bhatia, S., 2017. Associative judgment and vector space semantics. Psychological Review, 124(1), p.1.

  • Kurdi, B., Mann, T.C., Charlesworth, T.E. and Banaji, M.R., 2019. The relationship between implicit intergroup attitudes and beliefs. Proceedings of the National Academy of Sciences, 116(13), pp.5862-5871.

  • Charlesworth, T.E., Yang, V., Mann, T.C., Kurdi, B. and Banaji, M.R., 2021. Gender stereotypes in natural language: Word embeddings show robust consistency across child and adult language corpora of more than 65 million words. Psychological Science, 32(2), pp.218-240.

  • Bhatia, S., 2019. Predicting risk perception: New insights from data science. Management Science, 65(8), pp.3800-3823.

  • Rheault, L. and Cochrane, C., 2020. Word embeddings for the analysis of ideological placement in parliamentary corpora. Political Analysis, 28(1), pp.112-133.

  • Yang, K., Lau, R.Y. and Abbasi, A., 2022. Getting Personal: A Deep Learning Artifact for Text-Based Measurement of Personality. Information Systems Research.

  • Margulis, E.H., Wong, P.C., Turnbull, C., Kubit, B.M. and McAuley, J.D., 2022. Narratives imagined in response to instrumental music reveal culture-bounded intersubjectivity. Proceedings of the National Academy of Sciences, 119(4).

  • Thompson, B., Roberts, S.G. and Lupyan, G., 2020. Cultural influences on word meanings revealed through large-scale semantic alignment. Nature Human Behaviour, 4(10), pp.1029-1038.



III. Related Code


