爱可可 AI Paper Recommendations (October 15)
LG - Machine Learning | CL - Computation and Language | AS - Audio and Speech | IR - Information Retrieval
1. [LG] Characterising Bias in Compressed Models
S Hooker, N Moorosi, G Clark, S Bengio, E Denton
[Google Research]
Model compression amplifies bias in deep networks. Techniques such as pruning and quantization achieve high levels of compression with almost no change in overall error, yet a subset of the data absorbs a disproportionately large share of that error; these examples are termed Compression Identified Exemplars (CIE). On CIE, compression amplifies existing algorithmic bias: pruning disproportionately degrades performance on underrepresented features, which often coincides with fairness concerns. The CIE set is small enough to be isolated and surfaced for human annotation.
The popularity and widespread use of pruning and quantization is driven by the severe resource constraints of deploying deep neural networks to environments with strict latency, memory and energy requirements. These techniques achieve high levels of compression with negligible impact on top-line metrics (top-1 and top-5 accuracy). However, overall accuracy hides disproportionately high errors on a small subset of examples; we call this subset Compression Identified Exemplars (CIE). We further establish that for CIE examples, compression amplifies existing algorithmic bias. Pruning disproportionately impacts performance on underrepresented features, which often coincides with considerations of fairness. Given that CIE is a relatively small subset but a great contributor of error in the model, we propose its use as a human-in-the-loop auditing tool to surface a tractable subset of the dataset for further inspection or annotation by a domain expert. We provide qualitative and quantitative support that CIE surfaces the most challenging examples in the data distribution for human-in-the-loop auditing.
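To make the CIE idea concrete, here is a minimal Python sketch, assuming class predictions are available from populations of independently trained uncompressed and compressed models; the function `find_cie`, the modal-label comparison, and the toy arrays are illustrative simplifications of the paper's selection procedure, not its exact implementation.

```python
import numpy as np

def find_cie(baseline_preds, compressed_preds):
    """Flag Compression Identified Exemplars (CIE): examples where the
    modal prediction of a population of compressed models diverges from
    the modal prediction of a population of uncompressed baselines.

    baseline_preds:   int array, shape (n_baseline_models, n_examples)
    compressed_preds: int array, shape (n_compressed_models, n_examples)
    Returns a boolean mask of shape (n_examples,).
    """
    def modal_label(preds):
        # Column-wise mode: the label most models agree on per example.
        return np.array([np.bincount(col).argmax() for col in preds.T])

    return modal_label(baseline_preds) != modal_label(compressed_preds)

# Toy usage: 3 baseline and 3 pruned models over 5 test examples.
baseline = np.array([[0, 1, 2, 1, 0],
                     [0, 1, 2, 1, 0],
                     [0, 1, 2, 0, 0]])
pruned   = np.array([[0, 1, 0, 1, 0],
                     [0, 1, 0, 1, 0],
                     [0, 2, 0, 1, 0]])
cie_mask = find_cie(baseline, pruned)
print(np.flatnonzero(cie_mask))  # -> [2]: the one example whose modal label flips
```

The flagged indices give the tractable subset to hand to a domain expert for human-in-the-loop auditing, as the abstract proposes.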
2. [CL] Tired of Topic Models? Clusters of Pretrained Word Embeddings Make for Fast and Good Topics too!
S Sia, A Dalmia, S J Mielke
[Johns Hopkins University]
Topic analysis via clustering of pretrained word embeddings: cluster the pretrained embeddings while incorporating document information for weighted clustering, then rerank each cluster's top words, yielding unsupervised topic analysis of text. Experiments show that pretrained word embeddings (contextualized or not), combined with TF-weighted K-Means and TF-based reranking, offer a viable alternative to classical topic models at lower computational complexity and runtime.
Topic models are a useful analysis tool to uncover the underlying themes within document collections. The dominant approach is to use probabilistic topic models that posit a generative story, but in this paper we propose an alternative way to obtain topics: clustering pre-trained word embeddings while incorporating document information for weighted clustering and reranking top words. We provide benchmarks for the combination of different word embeddings and clustering algorithms, and analyse their performance under dimensionality reduction with PCA. The best performing combination for our approach performs as well as classical topic models, but with lower runtime and computational complexity.
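A minimal Python sketch of the TF-weighted K-Means plus TF-reranking pipeline described above, assuming precomputed word vectors and corpus term frequencies are already in hand; the function name `embedding_topics` and its defaults are hypothetical conveniences, not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def embedding_topics(vocab, embeddings, term_freqs, n_topics=20, top_n=10):
    """Cluster pretrained word embeddings into topics with TF-weighted
    K-Means, then rerank each cluster's words by corpus term frequency.

    vocab:      list of n_words vocabulary strings
    embeddings: float array, shape (n_words, dim), pretrained word vectors
    term_freqs: float array, shape (n_words,), corpus term frequencies
    Returns a list of n_topics lists of top words.
    """
    km = KMeans(n_clusters=n_topics, n_init=10, random_state=0)
    # Weighting by term frequency pulls centroids toward frequent words,
    # which is how document information enters the clustering step.
    labels = km.fit_predict(embeddings, sample_weight=term_freqs)

    topics = []
    for k in range(n_topics):
        members = np.flatnonzero(labels == k)
        # TF-based reranking: the most frequent cluster members
        # serve as the topic's representative words.
        ranked = members[np.argsort(-term_freqs[members])][:top_n]
        topics.append([vocab[i] for i in ranked])
    return topics
```

Dimensionality reduction with PCA, which the abstract also benchmarks, could be applied to `embeddings` before clustering without changing the rest of the pipeline.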