txtai: An AI-Powered Search Engine Built on Transformers (Part 3)
The field of natural language processing is evolving rapidly.

```python
from txtai.embeddings import Embeddings
from txtai.extractor import Extractor

# Create embeddings model, backed by sentence-transformers & transformers
embeddings = Embeddings({"method": "transformers", "path": "sentence-transformers/bert-base-nli-mean-tokens"})

# Create extractor instance
extractor = Extractor(embeddings, "distilbert-base-cased-distilled-squad")
```

The next step is to load a set of results to ask questions of. The example below contains short text snippets reporting the scores of a series of games:
```python
sections = ["Giants hit 3 HRs to down Dodgers",
            "Giants 5 Dodgers 4 final",
            "Dodgers drop Game 2 against the Giants, 5-4",
            "Blue Jays 2 Red Sox 1 final",
            "Red Sox lost to the Blue Jays, 2-1",
            "Blue Jays at Red Sox is over. Score: 2-1",
            "Phillies win over the Braves, 5-0",
            "Phillies 5 Braves 0 final",
            "Final: Braves lose to the Phillies in the series opener, 5-0",
            "Final score: Flyers 4 Lightning 1",
            "Flyers 4 Lightning 1 final",
            "Flyers win 4-1"]

# Add unique id to each section to assist with qa extraction
sections = [(uid, section) for uid, section in enumerate(sections)]

questions = ["What team won the game?", "What was score?"]

def execute(query):
    # Each queue entry is (name, query, question, snippet): the query selects
    # the matching sections and the question is asked against them
    return extractor(sections, [(question, query, question, False) for question in questions])

for query in ["Red Sox - Blue Jays", "Phillies - Braves", "Dodgers - Giants", "Flyers - Lightning"]:
    print("----", query, "----")
    for answer in execute(query):
        print(answer)
    print()

# Ad-hoc questions
question = "What hockey team won?"

print("----", question, "----")
print(extractor(sections, [(question, question, question, False)]))
```

Running the example prints the extracted answer to each question for every query, followed by the answer to the ad-hoc hockey question.
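To make the "understands the context" step less of a black box, here is a toy sketch of the first stage as we understand the Extractor to work: it uses the embeddings model to rank the sections by similarity to each query and keeps the best few as context for the QA model. This is an illustration under stated assumptions, not txtai's code: the bag-of-words `embed` function stands in for the sentence-transformers model, and `top_sections`, `qa_context`, and `topn=3` are hypothetical names and values.

```python
import math
import re
from collections import Counter


def embed(text):
    # Toy stand-in (assumption) for a transformer sentence embedding:
    # counts of lowercase words and numbers
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_sections(query, sections, topn=3):
    # Rank (uid, text) sections by similarity to the query, keep the best topn
    q = embed(query)
    ranked = sorted(sections, key=lambda section: cosine(q, embed(section[1])), reverse=True)
    return ranked[:topn]


sections = list(enumerate([
    "Giants hit 3 HRs to down Dodgers",
    "Giants 5 Dodgers 4 final",
    "Dodgers drop Game 2 against the Giants, 5-4",
    "Blue Jays 2 Red Sox 1 final",
    "Red Sox lost to the Blue Jays, 2-1",
    "Blue Jays at Red Sox is over. Score: 2-1",
    "Phillies win over the Braves, 5-0",
    "Phillies 5 Braves 0 final",
    "Final: Braves lose to the Phillies in the series opener, 5-0",
    "Final score: Flyers 4 Lightning 1",
    "Flyers 4 Lightning 1 final",
    "Flyers win 4-1",
]))

context = top_sections("Red Sox - Blue Jays", sections)
print([uid for uid, _ in context])  # [3, 4, 5]

# Second stage (not shown): the selected texts are joined into one context
# string that a QA model would read while answering the question
qa_context = " ".join(text for _, text in context)
```

Only the three Red Sox/Blue Jays snippets share words with the query, so they are the ones kept; a real embedding model would also match paraphrases that share no words at all.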
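We can see that the Extractor understands the context of the sections above and can answer questions about them. The Extractor component can work with a txtai Embeddings index as well as with external data stores. This modularity lets us choose which txtai features to use when building a search system that understands natural language.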
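Because the Extractor in the example works on plain (id, text) tuples, the sections can just as well come from an external store. Below is a minimal sketch assuming an Elasticsearch-style search response (the `hits.hits[*]._id` / `_source` shape); the response contents, the `text` field name, and the `to_sections` helper are hypothetical. The Part 4 notebook listed below covers the actual Elasticsearch integration.

```python
def to_sections(response, field="text"):
    # Number the hits 0..n-1 like the enumerate() call in the example above,
    # and keep a map from each number back to the document id in the store
    hits = response["hits"]["hits"]
    sections = [(uid, hit["_source"][field]) for uid, hit in enumerate(hits)]
    ids = {uid: hit["_id"] for uid, hit in enumerate(hits)}
    return sections, ids


# Hypothetical search response; "text" is an assumed field name
response = {
    "hits": {
        "hits": [
            {"_id": "game-10", "_source": {"text": "Flyers 4 Lightning 1 final"}},
            {"_id": "game-11", "_source": {"text": "Flyers win 4-1"}},
        ]
    }
}

sections, ids = to_sections(response)
print(sections)  # [(0, 'Flyers 4 Lightning 1 final'), (1, 'Flyers win 4-1')]
print(ids)       # {0: 'game-10', 1: 'game-11'}
```

The resulting `sections` could then be passed to `extractor(sections, queue)` in place of the enumerated list; keeping integer ids mirrors the original example, and `ids` lets answers be traced back to the source documents.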
Further reading: more detailed txtai examples and use cases can be found in the following notebooks.

Google Colaboratory Part 1: Introducing txtai
Google Colaboratory Part 2: Extractive QA with txtai
Google Colaboratory Part 3: Build an Embeddings index from a data source
Google Colaboratory Part 4: Extractive QA with Elasticsearch

Conclusion

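Natural language processing is advancing at a rapid pace: things that were impossible a year ago are now possible. This article introduced txtai, an AI-powered search engine that makes it quick to combine powerful models with a deep understanding of natural language. The Hugging Face model hub offers many base models and community-contributed models that can be used to customize search for almost any dataset. The possibilities are endless, and we are excited to see what people build on top of txtai!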
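About the author:
David Mezzetti, founder and CEO of NeuML, focuses on applying machine learning to solve everyday problems. He previously co-founded Data Works and built it into a successful IT services company.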