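Which Deep-Reasoning Model Writes the Best Gaokao English Essay? Reporters Put Them to the Test, with Comments from English Teachers at Top Schools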

Published: 2025-06-09 09:19:00

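On the afternoon of June 8, the English exam of the 2025 gaokao (China's national college entrance examination) came to a close. In the essay question on this year's Beijing paper, Li Hua was once again the "protagonist."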

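This "big test of language ability" has also become a touchstone for large AI models. From reading the prompt accurately to writing idiomatically, from logical coherence to cultural fit, can a large language model stand in for "Li Hua" and meet the specific demands of the gaokao English essay? Now that every major model offers a "deep thinking" mode, how many points can an "AI candidate" earn? To find out, Beijing News Shell Finance (新京报贝壳财经) evaluated mainstream large language models from China and abroad.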

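The evaluation used the essay question from the 2025 gaokao English paper for Beijing. Six deep-reasoning model products were tested: DeepSeek R1, ChatGPT o3, Tongyi Qianwen Qwen3, Tencent Hunyuan T1, iFlytek Spark X1, and Baidu Wenxin X1. Han Xianchang, an English teacher at the First Branch of Beijing National Day School (北京市十一学校一分校), and Zhao Wenjia, an English teacher at Shenzhen Middle School, were invited to score the generated essays against the grading standards of previous years' gaokao and to comment on them.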

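In the interest of objectivity and fairness, and following a blind-review principle, Han and Zhao gave their professional scores. Ranked from highest to lowest, the six "AI candidates" were: iFlytek Spark X1, DeepSeek R1, Baidu Wenxin X1, Tongyi Qianwen Qwen3, ChatGPT o3, and Tencent Hunyuan T1.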

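Specifically, the essay question on the 2025 gaokao English paper for Beijing read:

Suppose you are Li Hua, a senior-three student at Hongxing Middle School. The United Nations is collecting proposals from young people around the world. Your foreign friend Jim plans to take part and has emailed you to ask for advice on what to propose. Write him a reply in English that includes:

1. your suggestion;

2. the reasons for your suggestion.

Vocabulary hint: 倡议 proposal

Notes:

1. Write about 100 words;

2. The opening and closing are given and do not count toward the word total.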

Dear Jim,

______________________________________________________________________

Yours,

Li Hua

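Below are the results the six products generated when the essay question above was entered as the prompt, with deep-thinking mode switched on and web search switched off, together with the scores and comments from Han and Zhao: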

DeepSeek R1

Dear Jim,

I'm thrilled to hear you're joining the UN initiative! For your proposal, I suggest focusing on promoting plastic-free campuses worldwide.

My advice is to encourage students to reduce single-use plastics by bringing reusable bottles and bags, recycling plastic waste, and organizing school-wide awareness campaigns.

I recommend this for two reasons. First, plastic pollution is a global crisis harming wildlife and ecosystems. Schools, as micro-communities, can drive real change by adopting eco-friendly habits early. Second, young people’s collective action can inspire broader societal shifts. Starting locally creates ripples of impact—proving our generation’s power to protect the planet.

This proposal not only tackles an urgent issue but also unites students across borders. I believe it’s practical, impactful, and truly aligns with the UN’s goals.

Go for it—I’m rooting for you!

Yours,

Li Hua

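Han Xianchang's score: content 7, language 8, structure 4 → total 19

Comment: The content is detailed. The "plastic-free campus" suggestion is specific, the reasons are comprehensive (the environmental crisis and youth action), and the essay is highly relevant to the topic. The language is precise, with varied sentence patterns (complex sentences) and no errors. The structure is tight, the logic flows, and the ending is strong. However, it clearly exceeds the word limit, so 1 point is deducted.

Zhao Wenjia's score: 19

Comment: The model covers all the required points and adds supporting detail. In terms of language, it makes skilled use of present participles as postpositive attributives and prepositional phrases as adverbials. The structure is tight and the logic fluent. However, the rendering of "UN initiatives" is not specific enough, and the essay is long enough that a student might not be able to finish it in a real exam.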

ChatGPT o3

Dear Jim,

I recommend you propose a project named “Zero-Plastic Lunchbox.” The idea is to encourage secondary schools to replace single-use lunch packaging with reusable stainless-steel containers. Participating students record their plastic reduction on a shared app and post creative waste-cutting tips each week.

This works for two reasons. First, plastic waste is a major threat to marine life, yet teenagers can curb it through a simple lunch-time habit. Second, the app’s quantifiable data will impress the UN panel by demonstrating clear environmental benefits and worldwide scalability.

Yours,

Li Hua

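Han Xianchang's score: content 6, language 5, structure 3 → total 14

Comment: The content covers the required points, but the reasons given for the "Zero-Plastic Lunchbox" are shallow (the global impact is not explored). The language has flaws (for example, "This works" is colloquial) and the sentence patterns are monotonous. The structure is basically clear but underdeveloped.

Zhao Wenjia's score: 15

Comment: The opening lacks the background, namely that Jim is taking part in the UN's call for proposals. The ending is abrupt, and structurally there is no closing such as good wishes or expectations. Some word choices are inaccurate and some expressions are incomplete. The reasons do not go deep enough.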

Tongyi Qianwen Qwen3

Dear Jim,

I'm glad to hear about your interest in the UN proposal campaign. I suggest focusing on "Promoting Digital Literacy for Rural Youth". Many young people in remote areas lack access to quality tech education, limiting their future opportunities.

Your proposal could call for global partnerships to provide affordable devices, online courses, and teacher training in underserved regions. This aligns with the UN’s Sustainable Development Goals and empowers youth to bridge the digital divide. As students, you can share personal stories to highlight how tech skills transform lives.

I’m sure this idea will resonate with the UN and showcase youth-driven solutions!

Yours,

Li Hua

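Han Xianchang's score: content 6, language 7, structure 4 → total 17

Comment: The content is basically complete and the "digital literacy for rural youth" suggestion is reasonable, but the reasons are somewhat brief (the impact is not fully developed). The language is accurate, the sentence patterns are fairly simple, and the register is appropriate. The structure is clear and the transitions are natural.

Zhao Wenjia's score: 17

Comment: Because it uses no connectives, its structure and logic are somewhat weaker by comparison.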

Tencent Hunyuan T1

Dear Jim,

Glad to hear you’re working on the UN youth proposal. Here’s my idea: Focus on "Youth-Led Green Campus Initiatives".

Teens are key to shaping a sustainable future, and schools are where we spend most of our time. Proposing eco-actions—like recycling competitions, energy-saving challenges, or plant-a-tree days—can turn small habits into big change. These activities are easy to organize, fun, and teach us responsibility. Plus, they show the UN how youth drive real-world solutions.

Hope this helps. Can’t wait to hear your final plan—go for it, Jim!

Yours,

Li Hua

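Han Xianchang's score: content 5, language 5, structure 2 → total 12

Comment: The content is basically relevant, but the reasons given for the "green campus" suggestion are generic (lacking concrete support). The language is colloquial (for example, "Glad to hear"), the sentences are simple, and the register is poorly suited to the task. The structure is loose and the ending is casual.

Zhao Wenjia's score: 14

Comment: The structure is not clear enough, connectives are lacking, and the reasons are neither rich nor deep enough.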

iFlytek Spark X1

Dear Jim,

I’m glad to help with your proposal! I suggest proposing a "Global Youth Climate Action Network". This initiative would connect teenagers worldwide to share eco-friendly projects, like school recycling programs or community tree-planting, and collaborate on solutions to climate issues.

The reason is twofold: First, it empowers young people to take practical steps against environmental problems while spreading awareness. Second, uniting diverse voices under one platform highlights youth’s role in shaping a sustainable future, which aligns perfectly with the UN’s goals.

Let me know if you need further details!

Yours,

Li Hua

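Han Xianchang's score: content 8, language 8, structure 4 → total 20

Comment: The content is complete, the "Global Youth Climate Action Network" suggestion is clear, and the reasons are ample and closely tied to the topic. The language is basically accurate, with varied sentence patterns (such as the "twofold" construction). The structure is clear and the paragraphs connect naturally.

Zhao Wenjia's score: 19

Comment: The prompt is understood accurately, and the core points are all covered and explained in some depth. The language uses a good range of varied syntactic structures. Structurally, the first and second paragraphs are about the same length. Generally, if a general-specific-general structure is adopted, length and content can be used to make the central body paragraph stand out.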

Baidu Wenxin X1

Dear Jim,

I’m glad to know you’re participating in the UN’s proposal collection. My suggestion is to focus on "Promoting Traditional Craftsmanship Through Digital Innovation".

Firstly, many ancient handicrafts are fading due to globalization. By integrating digital tools like 3D modeling or VR, teenagers can help artisans preserve techniques and reach wider audiences. Secondly, this proposal aligns with sustainability—reviving local crafts reduces industrial waste and supports eco-friendly production. Lastly, it fosters cultural exchange, as young people worldwide could share stories behind these crafts, enhancing mutual understanding.

I believe this idea balances creativity, cultural preservation, and tech engagement, which the UN values. Let me know if you need more details!

Yours,

Li Hua

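Han Xianchang's score: content 7, language 8, structure 4 → total 19

Comment: The content is innovative, the suggestion of preserving crafts through digital tools is detailed, and the reasons are ample (cultural sustainability). The language is excellent, with varied sentence patterns (such as "Firstly...Lastly") and no errors. The structure is sound and the transitions are smooth.

Zhao Wenjia's score: 18

Comment: The structure is clear and concise, the reasons are ample and specific, and the word choice and syntax are accurate and varied.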

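Based on the average of the two teachers' total scores, the six models rank as follows:

1. iFlytek Spark X1: 19.5

2. DeepSeek R1: 19

3. Baidu Wenxin X1: 18.5

4. Tongyi Qianwen Qwen3: 17

5. ChatGPT o3: 14.5

6. Tencent Hunyuan T1: 13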
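The averaging behind this ranking is simple arithmetic and can be reproduced from the totals quoted in the reviews above. The following is a minimal Python sketch, not the reporters' own method: the names `SCORES`, `HAN_BREAKDOWN`, and `rank_by_average` are illustrative and do not come from the article, and it assumes the ranking is a plain mean of the two teachers' totals with no weighting.

```python
# Total scores copied from the article: (Han Xianchang, Zhao Wenjia).
SCORES = {
    "DeepSeek R1": (19, 19),
    "ChatGPT o3": (14, 15),
    "Tongyi Qianwen Qwen3": (17, 17),
    "Tencent Hunyuan T1": (12, 14),
    "iFlytek Spark X1": (20, 19),
    "Baidu Wenxin X1": (19, 18),
}

# Han's sub-scores as quoted: (content, language, structure).
HAN_BREAKDOWN = {
    "DeepSeek R1": (7, 8, 4),
    "ChatGPT o3": (6, 5, 3),
    "Tongyi Qianwen Qwen3": (6, 7, 4),
    "Tencent Hunyuan T1": (5, 5, 2),
    "iFlytek Spark X1": (8, 8, 4),
    "Baidu Wenxin X1": (7, 8, 4),
}


def rank_by_average(scores):
    """Return (model, average) pairs sorted from highest to lowest average."""
    averages = {model: sum(pair) / len(pair) for model, pair in scores.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Sanity check: Han's three sub-scores should add up to his quoted total.
for model, parts in HAN_BREAKDOWN.items():
    assert sum(parts) == SCORES[model][0], model

RANKING = rank_by_average(SCORES)

if __name__ == "__main__":
    for place, (model, avg) in enumerate(RANKING, start=1):
        print(f"{place}. {model}: {avg:g}")
```

Running the script prints the six models in descending order of their average score, which can be compared with the order the article reports.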

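Comparing the models with one another, Han said that all of them hit the key points in terms of content and completed the writing task. They differed considerably, however, in the amount of detail, the strength of the logical links between details, and the precision and concision with which views were expressed.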

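Zhao said that some models' essays were precise in grammar, syntax, and word choice, with clear logic and well-developed reasons, while others were comparatively weak in logic and were not developed specifically or deeply enough.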

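Han told Beijing News Shell Finance that the AI-generated essays all took fairly original angles, because large AI models are better than exam candidates at retrieving and integrating relevant information in a short time. Real candidates, working under a time limit, rarely find a particularly original angle. They generally play it safe with a conservative one, carrying over topics they have practiced in regular training to the exam.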

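In Zhao's view, the AI-generated essays are marked by sophisticated and relatively accurate vocabulary, with different words used appropriately. Some grammar points, such as non-finite verbs and attributive clauses, are handled very skillfully, and the content is fairly deep and specific. The main differences from real candidates lie in word choice and in how deeply the content is developed. Compared with the very best student writers, however, the AI essays still lack variety in sentence patterns and rely on a fixed set: participles as postpositive attributives or adverbials, and attributive clauses. Top students also use various kinds of inversion, compound gerund constructions, and the absolute construction.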

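"In terms of specific content, the scoring criteria and writing direction built into the AI models differ greatly. Some essays are too simple, while others are too long and complex. This probably has a lot to do with each model's internal database," Han said.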

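Notably, all of the "AI candidates" above are reasoning models with deep-thinking capability. Since DeepSeek R1 shot to popularity during this year's Spring Festival, the major model companies have paid ever more attention to reasoning models. On May 28, DeepSeek released the updated DeepSeek-R1-0528; on May 21, Tencent iterated both its deep-thinking model Hunyuan T1 and its fast-thinking model Hunyuan Turbo S; on April 29, Alibaba open-sourced its new-generation Tongyi Qianwen model Qwen3, which supports both reasoning and non-reasoning tasks; on April 25, Baidu released the deep-thinking model X1 Turbo; and on April 20, iFlytek gave its deep-reasoning model iFlytek Spark X1 a full upgrade.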

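In other words, between April and May this year all of China's mainstream large models received reasoning-related updates, which has also improved their abilities in areas such as English writing. The next evolution of Chinese-made large models is something to look forward to.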

By Luo Yidan

Editor: Yang Juanjuan; Proofreader: Wang Xin
