爱可可 AI Paper Recommendations (October 9) (Part 3)


5. [CL] WikiLingua: A New Benchmark Dataset for Cross-Lingual Abstractive Summarization
F Ladhak, E Durmus, C Cardie, K McKeown
[Columbia University & Cornell University]
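WikiLingua is a new benchmark dataset for cross-lingual and multilingual abstractive summarization. It extracts article-summary pairs in 18 languages from WikiHow, a high-quality collaborative resource of human-written how-to guides on a wide range of topics. Gold-standard article-summary alignments across languages are created by aligning the images used to illustrate each how-to step in an article.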
We introduce WikiLingua, a large-scale, multilingual dataset for the evaluation of crosslingual abstractive summarization systems. We extract article and summary pairs in 18 languages from WikiHow, a high quality, collaborative resource of how-to guides on a diverse set of topics written by human authors. We create gold-standard article-summary alignments across languages by aligning the images that are used to describe each how-to step in an article. As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset. We further propose a method for direct crosslingual summarization (i.e., without requiring translation at inference time) by leveraging synthetic data and Neural Machine Translation as a pre-training step. Our method significantly outperforms the baseline approaches, while being more cost efficient during inference.
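The image-based alignment described above can be illustrated with a minimal sketch. It is not the authors' code: the `Step` record, the `image_id` field, and both helper functions are hypothetical names chosen for illustration, and the sketch assumes that each how-to step in every language version carries an identifier for its illustration, so that steps sharing an image can be paired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    # Hypothetical record layout; the real dataset format may differ.
    image_id: str  # identifier of the image that illustrates this step
    summary: str   # one-line summary of the step
    text: str      # detailed paragraph describing the step


def align_steps(source_steps, target_steps):
    """Pair steps from two language versions that share the same image."""
    target_by_image = {step.image_id: step for step in target_steps}
    return [
        (src, target_by_image[src.image_id])
        for src in source_steps
        if src.image_id in target_by_image
    ]


def cross_lingual_pair(source_steps, target_steps):
    """Build a (source-language article, target-language summary) pair
    from the aligned steps only; unaligned steps are dropped."""
    aligned = align_steps(source_steps, target_steps)
    article = " ".join(src.text for src, _ in aligned)
    summary = " ".join(tgt.summary for _, tgt in aligned)
    return article, summary


# Made-up example data: a Spanish guide and its English counterpart.
spanish = [
    Step("img-001", "Hierve el agua.", "Llena una olla con agua y ponla a hervir."),
    Step("img-002", "Añade la pasta.", "Echa la pasta y remueve de vez en cuando."),
    Step("img-es-only", "Sirve.", "Sirve la pasta caliente."),
]
english = [
    Step("img-001", "Boil the water.", "Fill a pot with water and bring it to a boil."),
    Step("img-002", "Add the pasta.", "Add the pasta and stir occasionally."),
]

article, summary = cross_lingual_pair(spanish, english)
print(summary)  # Boil the water. Add the pasta.
```

In this sketch the Spanish step without a matching image is left out of the pair, so the Spanish article text and the English summary cover the same steps. Whether the actual dataset drops or keeps such steps is not stated in the abstract.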

