羊驼系列大模型和ChatGPT差多少？详细测评后，我沉默了( 二 ) _大模型

╔═══════════╦══════════╦║Model║ Accuracy ║╠═══════════╬══════════╬║ ChatGPT║100%║║ Vicuna║85%║ ║ MPT║30%║╚═══════════╩══════════╩下面是一个 MPT 答错的例子：

文章插图
结论在这个非常简单的测试中，测试者使用相同的问题、相同的 prompt 得出的结论是：ChatGPT 在准确性方面远远超过了 Vicuna 和 MPT 。
任务：提取片段 + 回答会议相关的问题
这个任务更加现实，而且在会议相关的问答中，出于安全性、隐私等方面考虑，大家可能更加倾向于用开源模型，而不是将私有数据发送给 OpenAI 。
以下是一段会议记录（翻译结果来自 DeepL ，仅供参考）：

文章插图

文章插图

测试者给出的第一个测试问题是：「Steven 如何看待收购一事？」， prompt 如下：

qa_attempt1 = guidance ('''{{#system~}}{{llm.default_system_prompt}}{{~/system}}{{#user~}}You will read a meeting transcript, then extract the relevant segments to answer the following question:Question: {{query}}Here is a meeting transcript:----{{transcript}}----Please answer the following question:Question: {{query}}Extract from the transcript the most relevant segments for the answer, and then answer the question.{{/user}}{{#assistant~}}{{gen 'answer'}}{{~/assistant~}}''')

ChatGPT 给出了如下答案：

文章插图

虽然这个回答是合理的，但 ChatGPT 并没有提取任何对话片段作为答案的支撑（因此不符合测试者设定的规范）。测试者在 notebook 中迭代了 5 个不同的 prompt ，以下是一些例子：

qa_attempt3 = guidance ('''{{#system~}}{{llm.default_system_prompt}}{{~/system}}{{#user~}}You will read a meeting transcript, then extract the relevant segments to answer the following question:Question: {{query}}Here is a meeting transcript:----{{transcript}}----Based on the above, please answer the following question:Question: {{query}}Please extract from the transcript whichever conversation segments are most relevant for the answer, and then answer the question.Note that conversation segments can be of any length, e.g. including multiple conversation turns.Please extract at most 3 segments. If you need less than three segments, you can leave the rest blank.As an example of output format, here is a fictitious answer to a question about another meeting transcript.CONVERSATION SEGMENTS:Segment 1: Peter and John discuss the weather.Peter: John, how is the weather today?John: It's raining.Segment 2: Peter insults JohnPeter: John, you are a bad person.Segment 3: BlankANSWER: Peter and John discussed the weather and Peter insulted John.{{/user}}{{#assistant~}}{{gen 'answer'}}{{~/assistant~}}''')

在这个新的 prompt 中， ChatGPT 确实提取了相关的片段，但它没有遵循测试者规定的输出格式（它没有总结每个片段，也没有给出对话者的名字）。
不过，在构建出更复杂的 prompt 之后， ChatGPT 终于听懂了指示：

qa_attempt5 = guidance ('''{{#system~}}{{llm.default_system_prompt}}{{~/system}}{{#user~}}You will read a meeting transcript, then extract the relevant segments to answer the following question:Question: What were the main things that hAppened in the meeting?Here is a meeting transcript:----Peter: HeyJohn: HeyPeter: John, how is the weather today?John: It's raining.Peter: That's too bad. I was hoping to go for a walk later.John: Yeah, it's a shame.Peter: John, you are a bad person.----Based on the above, please answer the following question:Question: {{query}}Please extract from the transcript whichever conversation segments are most relevant for the answer, and then answer the question.Note that conversation segments can be of any length, e.g. including multiple conversation turns.Please extract at most 3 segments. If you need less than three segments, you can leave the rest blank.{{/user}}{{#assistant~}}CONVERSATION SEGMENTS:Segment 1: Peter and John discuss the weather.Peter: John, how is the weather today?John: It's raining.Segment 2: Peter insults JohnPeter: John, you are a bad person.Segment 3: BlankANSWER: Peter and John discussed the weather and Peter insulted John.{{~/assistant~}}{{#user~}}You will read a meeting transcript, then extract the relevant segments to answer the following question:Question: {{query}}Here is a meeting transcript:----{{transcript}}----Based on the above, please answer the following question:Question: {{query}}Please extract from the transcript whichever conversation segments are most relevant for the answer, and then answer the question.Note that conversation segments can be of any length, e.g. including multiple conversation turns.Please extract at most 3 segments. If you need less than three segments, you can leave the rest blank.{{~/user}}{{#assistant~}}{{gen 'answer'}}{{~/assistant~}}''')
上一页
1
2
3
4
5
下一页
		  	





























推荐阅读

           
                  
              
                  发生交通事故后应做什么，发生交通事故后第一时间怎么办 
                
                   
                
              
            

                  
              
                  皮面鞋和绒面哪个老气 绒面牛皮鞋怎么保洁 
                
                   
                
              
            

                  
              
                  湖南匡老五辣业 干辣椒种植产地调差大解析 
                
                   
                
              
            

                  
              
                  冰可诺|云顶金刚不坏蛇女 4盾4星4秘绝对防御 
                
                   
                
              
            

                  
              
                  痤疮大如鹌鹑蛋 (口服)中药1月看不见——烟台中西医肛肠医院中医科张伟先主任妙手回春 
                
                   
                
              
            

                  
              
                  苹果手机直角边框复古设计iPhone 12系列，iPhone8价格底线持续下降 
                
                   
                
              
            

                  
              
                  半月谈网|当网暴变成黑色“生意” 吃瓜谨防变“帮凶” 
                
                   
                
              
            

                  
              
                  #大头娃娃#郴州再现大头娃娃事件，家长得知真相后崩溃：我女儿喝了2年饮料 
                
                   
                
              
            

                  
              
                  新泰市行政审批服务局|窗口服务再提升——市行政审批服务局召开第二季度表扬会，创先争优勇担当 
                
                   
                
              
            

                  
              
                  布洛芬可以退烧吗 布洛芬缓释胶囊的作用与功效 
                
                   
                
              
            

                  
              
                  喝桂花茶的禁忌,桂花茶的采制技巧和保健功效 
                
                   
                
              
            

                  
              
                  打响湖南黑茶产业 借鉴福建走精品茶叶发展之路 
                
                   
                
              
            

                  
              
                  普通二本学校本科毕业生，是该留在本地的省会城市还是去上海找工作 ? 
                
                   
                
              
            

                  
              
                  汽车|净资产仅剩1亿！汽车行业洗牌，力帆历经“至暗时刻” 
                
                   
                
              
            

                  
              
                  智能电视两大系统UI对比，原来系统体验好和无广告才是重中之重 
                
                   
                
              
            

                  
              
                  周三|欧美股市全线上涨，黄金一举收复1960美元关口 
                
                   
                
              
            

                  
              
                  华为|14亿！华为或再次“血洗”全球？日媒：5G仅仅是个开始 
                
                   
                
              
            

                  
              
                  对公司|两连板东宝生物收函：上半年净利降六成，股价却13天涨92% 
                
                   
                
              
            

                  
              
                  钟表上 当分针和时针 重合和180度时 分别都是啥时刻 
                
                   
                
              
            

                  
              
                  电容单位的换算和表示方法 电容的单位是什么 
                
                   
                
              
            

          

一文读懂什么是AIGC、ChatGPT、大模型 

考研|军官职业发展系列谈之三——“考研” 

大模型赛道正“热”：卷场景、卷芯片、卷人才 

AI大模型的未来市场在中国 

“AI的商业化路线已经清晰” 2023京东“赶考”千亿级产业大模型 

鱼龙混杂大模型：谁在蹭热点？谁有真实力？ 

MathGPT来了！专攻数学大模型，解题讲题两手抓 

大模型“群雄逐鹿”，科大讯飞何以脱颖而出？ 

除了推出大模型，AI发展还应做什么 

欧莱雅护肤系列分别适用的年龄段;欧莱雅护肤品哪种好用?