TechWeekly-08 | Interesting and useful tech finds, shared weekly

OCRmyPDF

https://github.com/ocrmypdf/OCRmyPDF

OCRmyPDF adds an OCR text layer to scanned PDF files so that they can be searched and their text can be copied and pasted.

matplotx

https://github.com/nschloe/matplotx

An extension library for Matplotlib that provides additional styles and simplifies style configuration.

download

https://github.com/choldgraf/download

A module for downloading files from the web, with a progress bar shown by default.

from download import download

# url: address of the file to fetch; path: local destination for the download
path = download(url, path, progressbar=True)

birdseye

https://github.com/alexmojaki/birdseye

birdseye is a Python debugger that records the values of expressions during a function call and lets you inspect them easily after the function has exited.
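
To make the idea of "record values during a call, inspect them afterwards" concrete, here is a rough standard-library sketch. It does not use birdseye's API and is not meant to reflect its implementation: it relies on sys.settrace and only snapshots local variables at each executed line, whereas birdseye records expression values. The names record_locals and snapshots are made up for this illustration.

```python
import functools
import sys


def record_locals(func):
    """Decorator (hypothetical): snapshot func's local variables line by line."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        snapshots = []

        def tracer(frame, event, arg):
            # Ignore frames that belong to other functions.
            if frame.f_code is not func.__code__:
                return None
            if event in ("line", "return"):
                # Copy the locals so later changes do not alter the record.
                snapshots.append((frame.f_lineno, dict(frame.f_locals)))
            return tracer

        previous = sys.gettrace()
        sys.settrace(tracer)
        try:
            result = func(*args, **kwargs)
        finally:
            sys.settrace(previous)
        # Keep the record around so it can be inspected after the call.
        wrapper.snapshots = snapshots
        return result

    wrapper.snapshots = []
    return wrapper


@record_locals
def add(a, b):
    total = a + b
    return total


add(2, 3)
for lineno, values in add.snapshots:
    print(lineno, values)
```

After the call returns, add.snapshots holds one entry per executed line plus one for the return, which is the "look at it afterwards" workflow in miniature.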

python-pinyin

https://github.com/mozillazg/python-pinyin

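A Python library that converts Chinese characters to pinyin. It can be used for phonetic annotation, sorting, and retrieval of Chinese text.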

>>> from pypinyin import pinyin, lazy_pinyin, Style
>>> pinyin('中心')
[['zhōng'], ['xīn']]
>>> pinyin('中心', heteronym=True)  # enable heteronym (multiple readings) mode
[['zhōng', 'zhòng'], ['xīn']]
>>> pinyin('中心', style=Style.TONE3, heteronym=True)
[['zhong1', 'zhong4'], ['xin1']]

textnets

https://github.com/jboynyc/textnets

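Text analysis using networks. For background, see the technical article "PNAS | 文本网络分析&文化桥梁Python代码实现" (text network analysis and cultural bridges, with a Python implementation).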

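Although network analysis is usually used to describe relationships between people, especially in the social sciences, it can also be applied to relationships between words. For example, ties can be created from the co-occurrence of individual words within documents, or ties between documents can be created by projecting a two-mode network.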

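Network-based automated text analysis has the following advantages:

As with social groups, the meaning of groups of words can be measured more accurately through triadic closure: the principle that the relationship between any two words or terms is understood more accurately when they are placed in the context of a third word;
Text networks can be applied to documents of any length, unlike topic models, which usually need a large number of words to work well. This is a significant advantage now that short social media texts have become common.
Finally, the approach benefits from recent advances in the interdisciplinary literature on community detection, which arguably offers more accurate ways of grouping words because it draws on the clustering observed within the network rather than on a bag-of-words model.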
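
The two constructions mentioned above, word co-occurrence ties and a two-mode projection onto documents, can be sketched in plain Python. This is only an illustration under simple assumptions (whitespace tokenization, raw counts as weights); it does not use the textnets API, and the function names and sample documents are invented here.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split on whitespace into a set of words."""
    return set(text.lower().split())


def word_cooccurrence(docs):
    """Count, for each pair of words, how many documents contain both."""
    counts = Counter()
    for text in docs.values():
        for pair in combinations(sorted(tokenize(text)), 2):
            counts[pair] += 1
    return counts


def project_documents(docs):
    """Project the document-word network onto documents.

    Two documents are linked with a weight equal to the number of words they share.
    """
    tokens = {name: tokenize(text) for name, text in docs.items()}
    edges = {}
    for a, b in combinations(sorted(tokens), 2):
        shared = tokens[a] & tokens[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


# Hypothetical sample corpus.
docs = {
    "d1": "network analysis of text",
    "d2": "text network projection",
    "d3": "topic models need long documents",
}

print(project_documents(docs))
print(word_cooccurrence(docs).most_common(3))
```

Even very short documents produce usable ties here, which is the point made above about text length.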

whoogle-search

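whoogle-search is a self-hosted search engine tool that fetches Google search results without ads or tracking, protecting your privacy.

whoogle-search can be installed and deployed in many simple ways, including Docker, Heroku, pip, and manual setup.

After installation, configure the IP address and port to start the whoogle-search service.

The steps below use the pip installation as an example.

Install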

pip install whoogle-search

Start the service

whoogle-search --host <your ip> --port <your port>

poetry

https://github.com/python-poetry/poetry

Similar to pip, it manages packages for Python projects, covering both dependency management and packaging.

futurecoder

https://github.com/alexmojaki/futurecoder

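An interactive way to learn Python, aimed at people teaching themselves Python programming, especially complete beginners. It is carefully designed to reduce frustration and guide the user while making sure they learn how to solve problems. The goal is to get as many people as possible learning to program.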

attrs

https://github.com/python-attrs/attrs

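Python classes without boilerplate. The main goal is to help you write concise and correct software without slowing down your code. The example is taken from:

Author: 小明 Link: https://zhuanlan.zhihu.com/p/34963159 Source: Zhihu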

class Product(object):
    def __init__(self, id, author_id, category_id, brand_id, spu_id, 
                 title, item_id, n_comments): 
        self.id = id
        self.author_id = author_id
        self.category_id = category_id
        self.brand_id = brand_id
        self.spu_id = spu_id
        self.title = title
        self.item_id = item_id
        self.n_comments = n_comments

With attrs, the code becomes more concise:

import attr

@attr.s(hash=True)
class Product(object):
    id = attr.ib()
    author_id = attr.ib()
    category_id = attr.ib()
    brand_id = attr.ib()
    spu_id = attr.ib()
    title = attr.ib(repr=False, cmp=False, hash=False)
    item_id = attr.ib(repr=False, cmp=False, hash=False)
    n_comments = attr.ib(repr=False, cmp=False, hash=False)
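
To show what kind of boilerplate is being saved, here is a hand-written stand-in for the sort of methods attrs generates. It is a sketch rather than the code attrs actually produces: the class name ManualProduct is hypothetical, it is trimmed to three fields for brevity, and ordering methods are left out.

```python
class ManualProduct:
    """Hand-written stand-in for an attrs class, trimmed to three fields."""

    def __init__(self, id, author_id, title):
        self.id = id
        self.author_id = author_id
        self.title = title

    def _key(self):
        # Only id and author_id take part in comparison and hashing,
        # mirroring cmp=False, hash=False on title.
        return (self.id, self.author_id)

    def __repr__(self):
        # title is left out, mirroring repr=False.
        return f"ManualProduct(id={self.id!r}, author_id={self.author_id!r})"

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        return hash(self._key())


a = ManualProduct(1, 2, "first title")
b = ManualProduct(1, 2, "second title")
print(a, a == b, len({a, b}))
```

Every field added to the class means touching each of these methods by hand, which is the maintenance cost the declarative version avoids.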

backtrader

https://github.com/mementum/backtrader

A Python backtesting framework for investment strategies.

from datetime import datetime
import backtrader as bt

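class SmaCross(bt.SignalStrategy):
    def __init__(self):
        # Fast (10-period) and slow (30-period) simple moving averages
        sma1, sma2 = bt.ind.SMA(period=10), bt.ind.SMA(period=30)
        # Crossover indicator of the fast SMA against the slow SMA
        crossover = bt.ind.CrossOver(sma1, sma2)
        # Register the crossover as a long signal
        self.signal_add(bt.SIGNAL_LONG, crossover)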

cerebro = bt.Cerebro()
cerebro.addstrategy(SmaCross)

data0 = bt.feeds.YahooFinanceData(dataname='MSFT', fromdate=datetime(2011, 1, 1),
                                  todate=datetime(2012, 12, 31))
cerebro.adddata(data0)

cerebro.run()
cerebro.plot()
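
The strategy above trades on a moving-average crossover. As a rough illustration of that signal, independent of backtrader, the following pure-Python sketch computes two simple moving averages and marks where the fast one crosses the slow one. The helper names, the short periods, and the price list are assumptions chosen for the example, and the handling of exact ties may differ from backtrader's CrossOver indicator.

```python
def sma(values, period):
    """Simple moving average; None until `period` values are available."""
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            out.append(None)
        else:
            window = values[i + 1 - period : i + 1]
            out.append(sum(window) / period)
    return out


def crossover(fast, slow):
    """Return +1 where fast crosses above slow, -1 where it crosses below, else 0."""
    signals = [0] * len(fast)
    for i in range(1, len(fast)):
        # Skip positions where either average is not yet defined.
        if None in (fast[i - 1], slow[i - 1], fast[i], slow[i]):
            continue
        prev_diff = fast[i - 1] - slow[i - 1]
        diff = fast[i] - slow[i]
        if prev_diff <= 0 < diff:
            signals[i] = 1
        elif prev_diff >= 0 > diff:
            signals[i] = -1
    return signals


# Made-up price series: falls, rises, then falls again.
prices = [5, 4, 3, 2, 3, 4, 5, 6, 5, 4, 3, 2]
fast = sma(prices, 2)
slow = sma(prices, 4)
signals = crossover(fast, slow)
print(signals)
```

A +1 corresponds to the point where a long position would be opened under this kind of rule, and a -1 to the point where it would be closed.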

autopep8

https://github.com/hhatto/autopep8

A tool that automatically formats Python code to conform to the PEP 8 style guide.
