April 18, Morning | Keynote Session | Scientific Research & Experiment Building, 1st Floor
Zhouchen Lin (林宙辰)
Abstract
Equivariance is an important property in many data processing tasks such as image recognition. Deep networks can save a lot of parameters and be more robust if equivariance is taken into account. However, existing equivariant deep networks focus mainly on rotational (and translational) equivariance, while affine and projective equivariance are hard to achieve due to the much higher degrees of freedom of the affine and projective groups. In this talk, I will present our recent work on using differential invariants to design practical affine and projective equivariant networks. Promising experimental results are also shown.
Speaker Bio
Zhouchen Lin is a Boya Distinguished Professor at Peking University and Deputy Dean of the School of Intelligence Science and Technology. His research areas are machine learning and computer vision. He has published over 360 papers in core AI journals and conferences and five monographs in Chinese and English, with over 43,000 Google Scholar citations. He has repeatedly served as Area Chair and Senior Area Chair for top conferences including CVPR, ICCV, ICML, NIPS/NeurIPS, AAAI, IJCAI, and ICLR; he was Program Co-Chair of ICPR 2022 and an Associate Editor of IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), and currently serves on the ICML Board and as an editorial board member of International Journal of Computer Vision, Optimization Methods and Software, and Acta Automatica Sinica. He received the First Prize in Natural Science of the Science and Technology Awards of the Chinese Association of Automation (2024), the Chinese Association for Artificial Intelligence (2023), and the China Computer Federation (2020), as well as the Annual Outstanding Achievement Award of the 2nd "Zu Chongzhi Award" (ranked first on all). He is Director of the Publication Working Committee and former Director of the Machine Vision Technical Committee of the China Society of Image and Graphics (CSIG), Deputy Director of the Pattern Recognition and Machine Intelligence Technical Committee of the Chinese Association of Automation, a Fellow of CCF, CSIG, AAIA, IAPR, and IEEE, a recipient of the National Science Fund for Distinguished Young Scholars, and principal investigator of a major project of the MOST Science and Technology Innovation 2030 "New Generation Artificial Intelligence" program.
09:45 - 10:05 Coffee Break
Bin Dong (董彬)
Abstract
Mathematical research has long faced bottlenecks that limit its efficiency, and the introduction of AI offers new possibilities for breaking through them. Against this backdrop, "AI for Mathematics" (AI4M) has emerged as a new interdisciplinary research field. This talk will first start from the challenges and needs of mathematical research itself, explaining why mathematics calls for deep empowerment by AI; it will then survey representative recent results in AI4M and compare the strengths and limitations of different technical routes. On this basis, the talk will argue that the key to substantially improving AI's mathematical reasoning ability lies in advancing the formalization of mathematical knowledge, i.e., the "digitization" of mathematics. Finally, the talk will introduce the overall research plan of the AI4M team at Peking University, present the team's initial results on formalization models and tool design, automated reasoning systems, and high-quality benchmark construction, and conclude with an outlook on the future of AI4M.
Speaker Bio
Bin Dong is a Boya Distinguished Professor at Peking University, affiliated with the Beijing International Center for Mathematical Research, and also serves as Deputy Director of the Center for Machine Learning Research at Peking University and Executive Vice Dean of Beijing Zhongguancun Academy. His main research areas are machine learning, scientific computing, and computational imaging. He received the Qiushi Outstanding Young Scholar Award in 2014, gave an invited 45-minute talk at the 2022 International Congress of Mathematicians (ICM), was selected for the New Cornerstone Investigator Program in 2023, received the Wang Xuan Outstanding Young Scholar Award the same year, and is an invited speaker at the 2027 International Congress on Industrial and Applied Mathematics (ICIAM).
Pinyan Lu (陆品燕)
Abstract
The capabilities of large models have advanced by leaps and bounds in recent years, repeatedly exceeding our expectations; in sharp contrast, our understanding of their underlying mechanisms remains far from adequate. My earlier research was in theoretical computer science, trying to understand the principles of computation and complexity. Over the past year I have mainly been studying large models, with particular attention to research on their mechanisms, and in this talk I will report what I have learned during this period.
Speaker Bio
Pinyan Lu is a "Changjiang Scholar" Distinguished Professor at Shanghai University of Finance and Economics (SUFE) and the founding dean of its School of Computing and Artificial Intelligence. After receiving his Ph.D. from the Department of Computer Science and Technology at Tsinghua University in January 2009, he joined Microsoft Research Asia, where he served successively as Associate Researcher, Researcher, and Lead Researcher in the Theory Group. In December 2015 he joined SUFE to found and lead the Institute for Theoretical Computer Science (ITCS). His main research area is theoretical computer science, and he has recently turned to the algorithms and mechanisms of large models. He has published over 30 papers in the three major theory conferences STOC/FOCS/SODA and over 20 papers in the two top computational economics conferences EC/WINE. He received best/outstanding paper awards at major international conferences including ICALP 2007, FAW 2010, ISAAC 2010, AAMAS 2024, and WINE 2025, and honors including ACM Distinguished Scientist (2019), the Silver Award of the ICCM Mathematics Prize (formerly the Morningside Medal) at the 8th International Congress of Chinese Mathematicians (2019), CCF Young Scientist (2014), Microsoft Fellowship (2008), and the Tsinghua University Special Scholarship (2007).
April 18, Afternoon
Yuqing Kong (孔雨晴)
Abstract
Villalobos et al. (2024) predict that publicly available text data will be exhausted within the next decade. It is therefore increasingly important to improve models without access to ground-truth labels. We propose a label-free post-processing framework that uses a weaker but better-calibrated reference model to improve a strong but poorly calibrated model. Our framework guarantees a strict improvement in worst-case performance without relying on labels. Our method builds on a characterization of when a strict improvement is achievable, namely when the strong model and the reference model are not mutually calibrated. We formalize this condition and connect it to "no-arbitrage" results in economics. We show that the problem admits a Bregman projection as its solution, which prior work (Mohri et al., 2025) applied to self-improvement of language models. We implement the Bregman projection as an efficient post-processing algorithm on the strong model's outputs. Experiments on representative large language models (LLMs) of different scales show that our label-free method substantially reduces proper losses and calibration error, matching supervised baselines.
Speaker Bio
Yuqing Kong is a tenured Associate Professor at the Center on Frontiers of Computing Studies, Peking University, a doctoral advisor, and a Peking University Boya Young Scholar. She received her Ph.D. in Computer Science (theory) from the University of Michigan, Ann Arbor, in August 2018, and her B.S. from the School of Mathematics at the University of Science and Technology of China in June 2013. Her research lies at the intersection of theoretical computer science and economics, including mechanism design, information elicitation, and the wisdom of crowds. She has published in venues such as J. ACM, ACM EC, WWW, WINE, ITCS, ACM TEAC, SODA, NeurIPS, ICML, ICLR, AAAI, IJCAI, and ECCV.
Kun Yuan (袁坤)
Abstract
Pre-training LLMs is extremely resource-intensive, making optimizer efficiency critical. Matrix-based optimizers such as Muon and SOAP improve over AdamW by leveraging curvature, but their updates can be overly isotropic—conservative in flat directions and aggressive in sharp ones. We develop a unified Riemannian ODE view showing that preconditioning defines the geometry while momentum acts as Riemannian damping. Building on this insight, we propose LITE, an acceleration strategy that increases effective damping and learning rates along flat trajectories to speed progress in anisotropic landscapes. Experiments across Dense/MoE models (130M–1.3B), datasets (C4, Pile), and schedules validate consistent speedups for Muon and SOAP, supported by theory showing faster convergence along flat directions.
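As a concrete reference point for the matrix-based optimizers discussed above, here is a minimal sketch of a Muon-style update: momentum accumulation followed by approximate orthogonalization of the momentum matrix. The cubic Newton-Schulz iteration and all hyperparameters below are illustrative assumptions, not the exact Muon, SOAP, or LITE recipe.

```python
import numpy as np

def orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor of g via the classic cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.  Normalizing by the
    Frobenius norm puts all singular values in (0, 1], where the
    iteration converges to 1."""
    x = g / np.linalg.norm(g)
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x

def muon_step(w, grad, m, lr=0.02, beta=0.95):
    """One Muon-style step (a sketch, not the published recipe):
    update the momentum buffer, then step along its orthogonalization."""
    m = beta * m + grad
    w = w - lr * orthogonalize(m)
    return w, m

rng = np.random.default_rng(0)
g = rng.standard_normal((4, 4))
q = orthogonalize(g)  # approximately orthogonal: q @ q.T ~ I
```

Orthogonalizing the update is what makes the step roughly uniform across flat and sharp directions, the isotropy that the abstract describes LITE as modulating via damping and learning rates.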
Speaker Bio
Kun Yuan is an Assistant Professor and Researcher at Peking University. His research focuses on large-scale optimization and the training of large models.
Mingyang Yi (易鸣洋)
Abstract
Reinforcement Learning from Human Feedback (RLHF) and its variants have become the dominant paradigm for aligning large language models with human intent. Despite strong empirical results, the generalization properties of these methods in high-dimensional regimes remain underexplored. To this end, we build a generalization theory for RLHF on large language models under a linear reward-model assumption, through the framework of algorithmic stability. Unlike existing analyses based on the consistency of maximum-likelihood estimation of the reward model, our analysis works within an end-to-end learning framework, which better matches practice. Concretely, we prove that under a key feature-coverage condition, the empirical optimum of the policy model admits a generalization error bound. This conclusion further extends to the parameters obtained by gradient-based algorithms (gradient ascent and stochastic gradient ascent). We therefore argue that these results provide new theoretical support for the empirical generalization ability of RLHF-trained large language models.
Speaker Bio
Dr. Mingyang Yi is an Assistant Professor at the School of Information, Renmin University of China, focusing on the foundations of artificial intelligence, generalization theory, and generative AI. He received his Ph.D. in 2022 from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, advised by the renowned mathematician and CAS academician Zhiming Ma. His honors include the CAS Outstanding Doctoral Dissertation Award (about 100 graduating doctoral students CAS-wide), the CAS President's Special Award, Renmin University's "Young Talent" program, and Huawei's "President's Team Award". He has published over twenty papers in leading international AI conferences and journals, including more than ten as first or co-first author at top venues such as ICML, NeurIPS, ICLR, CVPR, and UAI. He has led several projects, including an NSFC Young Scientists Fund project and a CCF-Tencent Rhino-Bird project.
Peng Zhao (赵鹏)
Abstract
Bandit models are a key framework for designing algorithms in interactive decision-making. While stochastic linear bandits are well studied, real-world complexities have led to important extensions such as generalized linear bandits (GLB) with nonlinear link functions, and heavy-tailed linear bandits (HvLB) that handle heavy-tailed noise. Although optimal regret bounds have been established, existing algorithms are computationally impractical, requiring full data storage and repeated passes over all historical data. In this talk, I will introduce a "one-pass" method based on the Online Mirror Descent framework, a textbook approach to regret minimization that we instead repurpose as a statistical estimator. This approach achieves O(1) per-round computational cost while preserving optimal regret for GLB and HvLB. I will then discuss extensions to online RL theory: (i) RL with multinomial logit function approximation, and (ii) RLHF with on-policy active data collection.
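The one-pass idea can be illustrated with Online Mirror Descent under the Euclidean mirror map (i.e., online gradient descent) used as a streaming estimator for a logistic-link generalized linear model. This is a generic sketch of the setting, not the talk's algorithm; the step-size schedule and averaging are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_pass_omd(stream, dim, lr=0.5):
    """One-pass OMD estimator: each sample (x, y) is seen exactly once and
    only the current iterate is stored, so per-round cost is O(dim) with
    no replay over historical data."""
    theta = np.zeros(dim)
    avg = np.zeros(dim)
    for t, (x, y) in enumerate(stream, start=1):
        grad = (sigmoid(x @ theta) - y) * x      # logistic-loss gradient
        theta = theta - lr / np.sqrt(t) * grad   # mirror-descent step
        avg += (theta - avg) / t                 # running average of iterates
    return avg

rng = np.random.default_rng(1)
theta_star = np.array([1.0, -1.0])
X = rng.standard_normal((5000, 2))
y = rng.binomial(1, sigmoid(X @ theta_star))
theta_hat = one_pass_omd(zip(X, y), dim=2)  # close to theta_star
```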
Speaker Bio
Peng Zhao is a tenure-track Associate Professor and doctoral advisor at the School of Artificial Intelligence, Nanjing University, and a member of the LAMDA group. His research focuses on the theoretical foundations of machine learning, including online learning, stochastic optimization, and reinforcement learning theory. He has published over 60 papers in top journals and conferences including JMLR, COLT, ICML, and NeurIPS. He serves as an Associate Editor of Machine Learning (Springer) and as an Area Chair for conferences such as ICML and NeurIPS. He was selected for the CCF Outstanding Doctoral Dissertation Incentive Program and has received Nanjing University's "Xiaomi Young Scholar Award for Technological Innovation" and a Baidu Scholarship.
Yuanyu Wan (宛袁玉)
Abstract
In this talk, we focus on decentralized online convex optimization (OCO) in changing environments, aiming to minimize adaptive regret and dynamic regret. In standard OCO, many algorithms with (nearly) optimal bounds on these two metrics are known; however, none of them has been extended to decentralized OCO, possibly due to the difficulty of handling their commonly used two-level structure. To fill this gap, we first provide novel reductions from minimizing these two metrics in decentralized OCO to minimizing them in OCO with delayed feedback. Furthermore, we revisit an existing black-box reduction from delayed OCO to standard OCO, and prove that it can also convert non-delayed algorithms for adaptive regret and dynamic regret into delayed ones. Finally, we demonstrate the power of our reductions by establishing nearly optimal bounds on the adaptive regret and dynamic regret of decentralized OCO.
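A minimal picture of the delayed-feedback OCO setting that the reductions target: online gradient descent on a bounded interval where the gradient of round t only becomes available `delay` rounds later. The linear loss sequence and step sizes below are illustrative assumptions, not the talk's construction.

```python
import numpy as np

def delayed_ogd(loss_grad, T, delay, lr=0.1, radius=1.0):
    """Online gradient descent on [-radius, radius] under delayed feedback:
    the gradient observed at round t is the one generated at round t - delay."""
    x, pending, plays = 0.0, [], []
    for t in range(T):
        plays.append(x)
        pending.append(loss_grad(t, x))  # revealed only `delay` rounds later
        if t >= delay:
            g = pending[t - delay]
            x = np.clip(x - lr / np.sqrt(t + 1) * g, -radius, radius)
    return plays

# linear losses f_t(x) = c_t * x with c_t > 0, so the best fixed point is -1
rng = np.random.default_rng(2)
coeffs = rng.uniform(0.5, 1.0, size=2000)
plays = delayed_ogd(lambda t, x: coeffs[t], T=2000, delay=5)
regret = sum(c * p for c, p in zip(coeffs, plays)) - sum(-c for c in coeffs)
```

Despite the 5-round delay, the cumulative regret stays small relative to the horizon, which is the qualitative behavior the black-box reduction preserves.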
Speaker Bio
Yuanyu Wan is a researcher at Zhejiang University under its Hundred Talents Program. His research interests include machine learning theory, online learning and optimization, and distributed optimization. As first or corresponding author, he has published 16 CCF-A papers in venues such as JMLR, TPAMI, ICML, and NeurIPS, and 3 papers at COLT, the top conference in theoretical machine learning. He has received the Outstanding Doctoral Dissertation Award of the Jiangsu Association of Artificial Intelligence and an ICML Best Reviewer award. He served as an Area Chair for NeurIPS 2025 and has repeatedly reviewed for conferences and journals including COLT, ICML, NeurIPS, ICLR, and TPAMI.
Tian Xu (许天)
Abstract
Imitation learning, which learns a policy from expert demonstrations, is a key underlying technique for large language models and embodied AI. However, over long decision horizons it often suffers from compounding errors and distribution shift. Notably, adversarial imitation learning (AIL) methods show excellent practical performance: with only a handful of expert trajectories, or even a single one, they can achieve near-expert performance on long-horizon sequential decision tasks such as robot locomotion control. This raises two long-standing open questions: why AIL is effective with so few samples, and why its performance does not degrade significantly as the horizon grows. This talk presents our theoretical analysis of an AIL method based on total variation distance (TV-AIL). For a class of MDPs abstracted from robot locomotion tasks, we prove that TV-AIL enjoys an imitation error bound of $\mathcal{O}(\min\{1, \sqrt{|S|/N}\})$, which is independent of the horizon and meaningful in both the small-sample and large-sample regimes. This result provides a theoretical explanation for AIL's strong practical performance and reveals the mechanism by which it mitigates distribution shift. In the analysis, we exploit TV-AIL's multi-stage policy optimization structure to develop a new stage-coupled analysis technique; the same technique also characterizes TV-AIL's worst-case behavior in general MDPs, clarifying its scope and potential limitations.
Speaker Bio
Tian Xu is an Assistant Researcher (Yuxiu Young Scholar) at the School of Artificial Intelligence, Nanjing University, working on the theoretical foundations of reinforcement learning and its applications to large models, with Prof. Yang Yu as collaborating advisor, and is a member of the LAMDA group led by Prof. Zhi-Hua Zhou. He leads an NSFC project for doctoral students and has published more than 10 papers in top journals and conferences including TPAMI, NeurIPS, ICML, and ICLR. He received the best paper runner-up award at the NeurIPS 2024 FITML workshop; his work has advanced the understanding of theoretical error in imitation learning and the development of methods for reducing it.
Yunwen Lei (雷云文)
Abstract
Recent developments in stochastic optimization often employ biased gradient estimators to improve robustness, communication efficiency, or computational speed. Representative biased stochastic gradient methods (BSGMs) include zeroth-order stochastic gradient descent (SGD), Clipped-SGD, and SGD with delayed gradients. In this talk, we present the first framework for studying the stability and generalization of BSGMs for convex and smooth problems. We apply our general result to develop the first stability bound for zeroth-order SGD with reasonable step-size sequences, and the first stability bound for Clipped-SGD. While our stability analysis covers general BSGMs, the resulting stability bounds for both zeroth-order SGD and Clipped-SGD match those of SGD under appropriate smoothing/clipping parameters.
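Two of the biased gradient estimators named above can be sketched in a few lines: a two-point zeroth-order estimator and a clipped gradient step. The smoothing parameter `mu`, clipping threshold, and quadratic test function are illustrative assumptions, not the talk's settings.

```python
import numpy as np

def zeroth_order_grad(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order estimator: ((f(x + mu*u) - f(x)) / mu) * u with
    a Gaussian direction u.  The finite difference makes it a *biased*
    gradient estimator, with bias controlled by the smoothing parameter mu."""
    if rng is None:
        rng = np.random.default_rng()
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x)) / mu * u

def clipped_step(x, grad, lr=0.1, clip=1.0):
    """Clipped-SGD step: rescale the gradient so its norm is at most `clip`."""
    norm = np.linalg.norm(grad)
    if norm > clip:
        grad = grad * (clip / norm)
    return x - lr * grad

# minimize f(x) = ||x||^2 / 2 using only function evaluations + clipping
f = lambda x: 0.5 * np.dot(x, x)
rng = np.random.default_rng(3)
x = np.ones(5)
for _ in range(3000):
    g = zeroth_order_grad(f, x, rng=rng)
    x = clipped_step(x, g, lr=0.01)
# x ends close to the minimizer 0
```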
Speaker Bio
Yunwen Lei received his Ph.D. from Wuhan University. He is currently an Assistant Professor in the Department of Mathematics, The University of Hong Kong. His research interests include machine learning, data science, learning theory, and stochastic optimization. He is an Associate Editor for Machine Learning, Transactions on Machine Learning Research, and IEEE Transactions on Neural Networks and Learning Systems, and an Area Chair for ICML, NeurIPS, ICLR, and AISTATS.
Haiyun He (何海韵)
Abstract
We study multi-bit watermarking for data generated by stochastic processes, where a hidden message is embedded during sampling and must be decodable by an authorized detector that possesses side information unavailable to unauthorized observers. In high-stakes deployments, a practical watermark must simultaneously control false alarms, preserve generation quality without distorting the output distribution, and support reliable multi-bit decoding. Satisfying all three goals at once inevitably creates fundamental trade-offs. We formulate watermark embedding as a distributional information-embedding problem and watermark detection as a multiple-hypothesis testing problem under distortion and rate constraints, leading to four fundamental metrics: false-alarm probability, detection error probability, distortion, and information rate. Within this information-theoretic framework, we derive matched converse and achievability bounds that characterize the optimal trade-offs and provide scheme-agnostic insights.
Speaker Bio
Haiyun He is a Tenure-Track Assistant Professor in the Internet of Things Thrust at the Hong Kong University of Science and Technology (Guangzhou). She is also a Cross-Campus Faculty Affiliate at HKUST. Prior to joining HKUST(GZ), she was a Postdoctoral Associate in the Center for Applied Mathematics at Cornell University, working with Prof. Ziv Goldfeld and Prof. Christina Lee Yu. She earned her Ph.D. in Electrical and Computer Engineering (ECE) from the National University of Singapore (NUS) in Sep. 2022, advised by Prof. Vincent Y. F. Tan. She received her M.S. in ECE from NUS in 2017 and her B.S. in Electronics and Information Engineering from Beihang University, China, in 2016. Her research lies at the intersection of information theory (IT), machine learning (ML) and statistical learning. She aims to uncover the fundamental principles behind learning models' behavior, with a particular focus on how to enhance model generalization, trustworthiness, efficiency, and interpretability. Her work has been published in top-tier IT and ML venues, including IEEE TIT, JMLR, NeurIPS, AISTATS, ISIT, etc. She has served as a reviewer for IEEE TIT, JMLR, IEEE S&P, IEEE TNNLS, etc. In 2022, she was recognized as an EECS Rising Star by UT Austin. Her personal website is: https://haiyun-he.github.io.
Yaoyu Zhang (张耀宇)
Abstract
Condensation (also known as quantization, clustering, or alignment) is a widely observed phenomenon in which neurons in the same layer tend to align with one another during the nonlinear training of deep neural networks (DNNs). It is a key characteristic of the feature learning process of neural networks. In recent years, to advance the mathematical understanding of condensation, we have uncovered structures in the dynamical regime, loss landscape, and generalization of deep neural networks, from which a novel theoretical framework emerges. This presentation will cover these findings in detail. First, I will present results on identifying the dynamical regime of condensation in the infinite-width limit, where small initialization is crucial. Then, I will discuss the mechanism of condensation in the initial training stage and the global loss landscape structure underlying condensation in later training stages, highlighting the prevalence of condensed critical points and global minima.
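Condensation can be quantified with a simple diagnostic: the average pairwise cosine alignment of same-layer neuron weight vectors. This is an illustrative metric for the phenomenon described above, not necessarily the exact definition used in the talk's papers.

```python
import numpy as np

def condensation_score(W):
    """Mean pairwise |cosine similarity| between the rows of W (one row per
    neuron's input weight vector).  A score near 1 means the neurons have
    condensed onto a small number of shared directions."""
    V = W / np.linalg.norm(W, axis=1, keepdims=True)
    C = np.abs(V @ V.T)
    n = len(W)
    return (C.sum() - n) / (n * (n - 1))  # average off-diagonal entry

rng = np.random.default_rng(4)
random_layer = rng.standard_normal((64, 32))        # wide layer, random init
direction = rng.standard_normal(32)
condensed_layer = np.outer(rng.standard_normal(64), direction)  # fully aligned
# condensation_score(condensed_layer) is ~1; the random layer scores much lower
```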
Speaker Bio
Yaoyu Zhang is a tenure-track Associate Professor at Shanghai Jiao Tong University. He received his B.S. in Applied Physics (with a second major in Applied Mathematics) from Zhiyuan College, SJTU, in 2012, and his Ph.D. in Mathematics from the School of Mathematical Sciences, SJTU, in 2016. From 2016 to 2020 he did postdoctoral research at NYU Abu Dhabi & the Courant Institute and then at the Institute for Advanced Study, Princeton, before joining SJTU in 2020. His research focuses on the theoretical foundations of deep learning; representative results include the condensation phenomenon and its theory in deep learning, and the embedding principle of loss landscapes.
16:00 - 17:30 Poster Session
April 19, Morning
09:00 - 10:00 Open Discussion
Tianyang Hu (胡天阳)
Abstract
While transformers are typically assumed to be "blank slates" at random initialization, we demonstrate that untrained models exhibit systematic structural biases, including extreme token preferences. We provide a mechanistic explanation by identifying a contraction of token representations along initialization-dependent directions, which is driven by the interaction of asymmetric MLP activations and self-attention value aggregation. We show these biases persist throughout training, forming a stable model identity that enables SeedPrint, a method to fingerprint LLMs by their random initialization. Finally, we establish that attention mechanism’s intra-sequence contraction is causally linked to the attention-sink phenomenon, providing a principled explanation for sink emergence and a pathway for systematic control.
Speaker Bio
Tianyang Hu is an Assistant Professor at the School of Data Science, The Chinese University of Hong Kong, Shenzhen. He received his B.S. in Mathematics from Tsinghua University, his M.S. in Statistics from the University of Chicago, and his Ph.D. in Statistics from Purdue University. Before joining CUHK-Shenzhen, he conducted AI research at Huawei Noah's Ark Lab and the National University of Singapore. His research focuses on the intersection of mathematical statistics and artificial intelligence, including statistical machine learning, trustworthy AI, representation learning, and deep generative models, aiming to reveal the underlying mechanisms of AI models and thereby provide theoretical guidance for designing more effective algorithms.
Zhiqin Xu (许志钦)
Abstract
Understanding deep learning's performance on real problems requires considering the characteristics of the model, of the data, and of the optimization algorithm connecting the two. This talk analyzes data characteristics from multiple angles and designs experiments to probe the characteristics of models and optimization, in order to understand the generalization ability of deep learning and the reasoning ability of language models, and to offer some guidance for practical model training. We find that small initialization biases a model toward explaining data by reasoning rather than by memorization, which is closely related to the condensation phenomenon under small initialization. In addition, certain key statistics of the data are the driving force behind the formation of embedding structure and affect the model's reasoning ability.
Speaker Bio
Zhiqin Xu is a Professor at the Institute of Natural Sciences / School of Mathematical Sciences, Shanghai Jiao Tong University. He graduated from Zhiyuan College, SJTU, in 2012 and received his Ph.D. in Applied Mathematics from SJTU in 2016. From 2016 to 2019 he was a postdoc at NYU Abu Dhabi and the Courant Institute. His research interests include the memorization and reasoning mechanisms of large models, the frequency principle, parameter condensation, the embedding principle of loss landscapes, and multiscale neural networks.
Jiacheng Sun (孙嘉城)
Abstract
While Masked Diffusion Language Models (MDLMs) relying on token masking and unmasking have shown promise in language modeling, their training efficiency, generation quality, and flexibility remain constrained by the masking paradigm. We propose Deletion-Insertion Diffusion language models (DID), which rigorously formulate token deletion and insertion as discrete diffusion processes, replacing the masking and unmasking processes in current MDLMs and improving training and inference efficiency by eliminating their computational overhead. Moreover, we propose Variational Autoencoding Discrete Diffusion (VADD), a novel framework that enhances discrete diffusion with latent variable modeling to implicitly capture correlations among dimensions and improve parallel decoding.
Speaker Bio
Jiacheng Sun is a senior researcher in Huawei's Foundation Model Department. He received his Ph.D. from the School of Mathematical Sciences, Peking University. Before that, he graduated from the School of Mathematics and Statistics, Xi'an Jiaotong University, and was a visiting scholar at the University of Oxford from October 2014 to October 2015. His research interests include generative models and deep learning theory. His work has been published in NeurIPS, ICML, ICLR, etc.
Shiyu Liang (梁诗宇)
Abstract
Low-precision training is essential for scaling transformer pretraining, but it can introduce delayed instability that is often invisible in the early phase of optimization. In this talk, I present an invariance-based view of this phenomenon. The key observation is that many transformer components contain shift-invariant directions: for example, adding a constant offset to all logits does not change the softmax loss, and LayerNorm is likewise insensitive to constant shifts. In exact arithmetic, gradients cancel along these directions. Under low-precision computation, however, this cancellation is imperfect, and small arithmetic errors can accumulate into harmful parameter drift. I will show how this perspective leads to a simple theoretical picture in which the unstable component behaves like a variance-accumulating stochastic process, and how that analysis suggests a lightweight energy-based signal for early detection. Experiments on GPT-style pretraining under BF16 and FP8 illustrate that this signal can distinguish stable and unstable runs before the training loss clearly separates. I will conclude by discussing the practical implications of this viewpoint, including how symmetry-aware corrections may offer low-cost routes to more stable low-precision pretraining.
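The shift-invariant direction described above is easy to verify numerically: adding a constant to all logits leaves the softmax cross-entropy unchanged, and in exact arithmetic the gradient components along the all-ones direction cancel. A minimal float64 sketch (under low precision, the cancellation in the last line would be imperfect, which is the drift mechanism the talk analyzes):

```python
import numpy as np

def softmax_xent(logits, target):
    """Softmax cross-entropy loss for class `target` (stabilized form)."""
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[target]

def grad_logits(logits, target):
    """Gradient of the loss w.r.t. the logits: softmax(logits) - one_hot(target)."""
    z = logits - logits.max()
    p = np.exp(z) / np.exp(z).sum()
    p[target] -= 1.0
    return p

logits = np.array([2.0, -1.0, 0.5, 3.0])
shifted = logits + 7.0                      # move along the invariant direction
loss_gap = abs(softmax_xent(logits, 3) - softmax_xent(shifted, 3))  # ~0
grad_sum = grad_logits(logits, 3).sum()     # components along 1-vector cancel
```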
Speaker Bio
Shiyu Liang is a tenure-track Associate Professor at Shanghai Jiao Tong University, received a Ph.D. from the University of Illinois Urbana-Champaign in 2021, and is a recipient of the NSFC Excellent Young Scientists Fund (Overseas). Liang's long-term research covers machine learning theory, analysis of the space and time efficiency of deep neural network models, and interdisciplinary applications of AI in ocean and Earth-system science, bringing frontier AI methods to the modeling of complex natural systems and the reconstruction of extreme-environment processes, with systematic contributions at both the methodological and application levels. This work comprises over 20 published papers (10 in CCF-A venues such as ICML, NeurIPS, ICLR, and ACL, one of which has been cited over 3,000 times). Liang currently leads an NSFC Excellent Young Scientists Fund (Overseas) project, a sub-project of a MOST key R&D program, an NSFC Young Scientists project, and a Shanghai overseas high-level talent program project.
Zhiyu Zhang (张智予)
Abstract
Stein's method is a classical "descriptive" framework in probability theory for proving quantitative central limit theorems (CLT). This talk introduces a somewhat surprising algorithmic application of this framework in adversarial online linear optimization (OLO). Focusing on one-dimensional fixed-time OLO on a bounded domain, we present an algorithm based on Stein's method which is capable of achieving various "additively sharp" performance guarantees, surpassing the conventional big-O optimality. In particular, instantiations of this algorithm improve upon the total loss upper bounds of classical baselines including OGD and MWU. Conceptually, our algorithm can be viewed as a "continuous" refinement of a seminal dynamic programming algorithm of T. Cover (1966), improving its computational complexity. Technically, our construction is inspired by the remarkably clean proof of a Wasserstein CLT due to A. Röllin (2018). This is a joint work with Aaditya Ramdas at CMU.
Speaker Bio
Zhiyu Zhang is a researcher under the Hundred Talents Program and a doctoral advisor at the College of Control Science and Engineering, Zhejiang University. He received his Ph.D. from Boston University and was a postdoctoral researcher at Harvard University and Carnegie Mellon University. His research focuses on online optimization, statistical learning, and algorithmic applications in AI and robotics. In particular, he emphasizes building novel, clean first principles for algorithm design in machine learning, so as to facilitate the translation from theory to application.
Cong Fang (方聪)
Abstract
In modern machine learning applications, data often arrive dynamically as a stream, requiring models to compute gradients from the current samples and update continuously in real time. This talk examines several typical streaming learning scenarios and systematically studies acceleration methods for improving training efficiency in them. Specifically: first, for general nonconvex objectives in unstructured learning problems, we discuss variance reduction techniques and show how they guarantee the fastest convergence rates for finding first- and second-order stationary points; second, for structured problems, namely the convex optimization problems arising from generalized linear regression, we analyze the acceleration achieved by momentum methods under broad conditions; finally, for hard nonconvex problems such as tensor decomposition, we study overparameterized model representations together with suitable gradient normalization, and show that they are key, effective strategies for narrowing the statistical-computational gap.
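The variance-reduction idea in the first part can be illustrated with an SVRG-style sketch: each epoch anchors a full gradient at a snapshot, and inner steps use the control-variate gradient g_i(w) - g_i(w_snap) + full_grad(w_snap). The least-squares test problem and all hyperparameters are illustrative assumptions, not the talk's exact algorithms.

```python
import numpy as np

def svrg(grad_i, w0, n, lr=0.02, epochs=30, inner=None, rng=None):
    """SVRG sketch: the control variate cancels most of the stochastic
    gradient's variance near the snapshot, enabling constant step sizes."""
    if rng is None:
        rng = np.random.default_rng()
    if inner is None:
        inner = n
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        full = np.mean([grad_i(i, w_snap) for i in range(n)], axis=0)
        for _ in range(inner):
            i = rng.integers(n)
            w = w - lr * (grad_i(i, w) - grad_i(i, w_snap) + full)
    return w

# noiseless least squares: f_i(w) = 0.5 * (x_i . w - y_i)^2
rng = np.random.default_rng(8)
n, d = 200, 5
X = rng.standard_normal((n, d))
w_star = rng.standard_normal(d)
y = X @ w_star
grad_i = lambda i, w: (X[i] @ w - y[i]) * X[i]
w_hat = svrg(grad_i, np.zeros(d), n, rng=rng)  # converges to w_star
```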
Speaker Bio
Cong Fang is an Assistant Professor (doctoral advisor) at the School of Intelligence Science and Technology, Peking University, a national-level young talent, and a Peking University Boya Young Scholar. He received his Ph.D. from Peking University in 2019 and did postdoctoral research at Princeton University and the University of Pennsylvania. His main research area is the theoretical foundations and algorithms of machine learning. He has published over 40 papers in top journals and conferences including PNAS, AoS, IEEE T.IT, JMLR, COLT, NeurIPS, and PIEEE, with over 2,000 Google Scholar citations, and serves as an Area Chair for the top machine learning conferences NeurIPS and ICML.
Ruijie Wang (王睿杰)
Abstract
For real-world open environments, graph-enhanced large models are shifting from a static, closed, end-to-end training paradigm toward a new paradigm of self-evolving intelligence that handles noisy data, sequential tasks, and interactive feedback, placing higher demands on robust representation, continual adaptation, and closed-loop decision making. Organized around three threads, data self-evolution, task self-evolution, and feedback self-evolution, this talk surveys our research agenda on dynamic robust representation, sustainable learning and tuning, and intelligent reasoning and decision making. It highlights recent progress on cross-modal alignment for sparse, noisy signals, continual learning and personalized adaptation for open, diverse scenarios, and using sparse feedback to close the loop of association discovery, reasoning and planning, and action generation.
Speaker Bio
Ruijie Wang is a Professor and doctoral advisor at Beihang University, a Beihang Outstanding Young Scholar, and a national-level young talent, with a Ph.D. from the University of Illinois Urbana-Champaign and a B.S. from the Department of Computer Science at Shanghai Jiao Tong University. Wang's research focuses on trustworthy and efficient AI in open, dynamic environments, spanning theory and techniques for natural language, graph, and time-series data, and aims to develop generalizable multimodal foundation models and systems that use structured knowledge for reliable perception, reasoning, and decision making, with applications in recommender systems, intelligent agents, and scientific computing. Wang has published over 40 papers (including 20+ CCF-A papers) at renowned international conferences and in journals such as NeurIPS, ACL, KDD, WWW, SIGIR, and INFOCOM, won the sole Best Paper Award at IEEE DCOSS, and has long served as area chair, program committee member, or journal reviewer for over 20 venues including TKDE, TOIS, TMC, NeurIPS, ICML, ICLR, KDD, and ACL Rolling Review.
Lianghao Xia (夏良昊)
Abstract
In the era of large language models, enabling AI systems to truly understand and reason over complex relational data is a pressing open problem. Despite the profound changes LLMs have brought to AI, they remain fundamentally limited in processing structured information and understanding complex knowledge dependencies. This work addresses that core challenge through graph foundation models. The talk first introduces our pioneering work on graph foundation models: through novel large-scale model construction techniques, our models effectively capture and reason over complex graph-structured information, achieving robust and generalizable graph learning that was previously out of reach. The proposed framework handles large-scale graph structures while preserving key topological information, overcoming the bottlenecks of traditional methods. The talk then shows how our graph learning framework injects graph-structured knowledge into LLMs through techniques such as retrieval-augmented generation and autonomous agents, substantially improving their reasoning. The deep synergy between graph foundation models and LLMs enables complex reasoning over structured data, with applications spanning recommender systems, bioinformatics, urban computing, and AI agents. This work opens a new direction for foundation models and is an important step toward more capable, more generalizable AI systems.
Speaker Bio
Lianghao Xia is a Professor at the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen). Xia received a Ph.D. from South China University of Technology in 2021 and was a postdoctoral researcher and then Research Assistant Professor at the University of Hong Kong. Xia's research covers graph learning, recommender systems, and large-model agents; 13 papers have been selected among the most influential papers of top international conferences such as KDD, SIGIR, and WWW, with over 9,000 Google Scholar citations. Honors include the ACM MM 2024 Best Paper Honorable Mention, a nomination for the "Rising Star" Yunfan Award at the 2025 World Artificial Intelligence Conference, and inclusion in Stanford University's global top 2% scientists lists for 2024 and 2025. Xia serves as a senior program committee member / area chair for international conferences including SIGIR, AAAI, and ARR, and as a reviewer for many top international conferences and journals.
Jun Xia (夏俊)
Abstract
Molecular identification and discovery are vital in bioanalysis, environmental governance, and customs inspection. While spectroscopic tools (e.g., MS, IR, NMR, X-ray) enable molecular probing, current analysis relies heavily on database matching and expert interpretation—limiting novelty detection, efficiency, accuracy, and scalability. To overcome this, we developed SpectraAI: a holistic framework integrating (1) high-quality spectral data curation, (2) spectroscopy-specific large foundation models, (3) AI Agents for automated analysis, (4) biochemically informed algorithms, and (5) real-world deployment in biomedicine and environmental monitoring. In this talk, I will present SpectraAI’s architecture, recent advances, and key future directions—including de novo molecule discovery and multimodal spectral reasoning.
Speaker Bio
Jun Xia is a joint assistant professor at The Hong Kong University of Science and Technology (Guangzhou) and The Hong Kong University of Science and Technology, leading the SpectraAI team with research focused on AI-based spectral data analysis for molecule identification and discovery. He received his Ph.D. degree from Zhejiang University and is a recipient of the KAUST Rising Star in AI honor, the DAAD AInet Fellowship, and other prestigious academic honors and awards. He has published over 50 papers in top journals and conferences such as Nature Methods, ICML, NeurIPS, and ICLR, including work recognized as the Most Influential Paper at WWW 2022 by PaperDigest and several oral/spotlight presentations at ICML, NeurIPS, CVPR, and AAAI. As an active contributor to the AI for Science community, Jun serves as an Area Chair or Senior Program Committee member for top venues including NeurIPS, ICLR, the KDD AI for Science Track, and IJCAI, and as a reviewer for Nature Communications. Jun's research is supported by funding from the NSFC, Tencent, Ant Group, TeleAI, ZhipuAI, DAAD, and other institutions.
April 19, Afternoon
Feng Zhou (周峰)
Abstract
Transformer models rely on attention mechanism to capture long-range dependencies but suffer from quadratic complexity, limiting their scalability to long sequences. Kernel-based linear attention reduces this complexity but typically relies on fixed or weakly learnable kernels, restricting expressiveness and performance. In this work, we propose Flexformer, a flexible linear Transformer that learns attention kernels in a fully data-driven manner. Flexformer builds on random Fourier feature-based linear attention and treats spectral frequencies as trainable parameters, enabling the model to learn a broad family of attention kernels. We develop both stationary and nonstationary variants, with the latter offering strictly greater expressiveness. Extensive experiments on language modeling and sequence classification demonstrate that Flexformer consistently outperforms baselines. Moreover, Flexformer can be effectively distilled from pretrained Transformers to recover softmax attention and exhibits strong kernel transferability across domains, achieving both high efficiency and competitive performance on long-sequence tasks.
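For orientation, here is a generic random Fourier feature (RFF) linear attention sketch with fixed Gaussian frequencies, which approximates a Gaussian attention kernel in O(n) per sequence; Flexformer instead treats the frequency matrix `W` as trainable, so this is background for the abstract rather than the proposed model.

```python
import numpy as np

def rff_features(X, W):
    """Random Fourier features for the Gaussian kernel:
    phi(x) = [cos(Wx); sin(Wx)] / sqrt(m), so that for W ~ N(0, I),
    phi(q) . phi(k) approximates exp(-||q - k||^2 / 2)."""
    m = W.shape[0]
    Z = X @ W.T
    return np.concatenate([np.cos(Z), np.sin(Z)], axis=1) / np.sqrt(m)

def linear_attention(Q, K, V, W):
    """Kernelized linear attention: compute phi(Q) @ (phi(K)^T V), which
    never materializes the n x n attention matrix."""
    phi_q, phi_k = rff_features(Q, W), rff_features(K, W)
    num = phi_q @ (phi_k.T @ V)          # O(n * m * d) instead of O(n^2 * d)
    den = phi_q @ phi_k.sum(axis=0)      # row-wise normalizer
    return num / den[:, None]

rng = np.random.default_rng(5)
d, m, n = 8, 4096, 16
Q = rng.standard_normal((n, d)) * 0.3
K = rng.standard_normal((n, d)) * 0.3
V = rng.standard_normal((n, d))
W = rng.standard_normal((m, d))          # fixed here; trainable in Flexformer
exact = np.exp(-((Q[:, None, :] - K[None, :, :]) ** 2).sum(-1) / 2)
approx = rff_features(Q, W) @ rff_features(K, W).T   # close to `exact`
```

Treating the rows of `W` as learned spectral frequencies is what turns this fixed-kernel scheme into a fully data-driven family of attention kernels.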
Speaker Bio
Feng Zhou is an Associate Professor at the School of Statistics, Renmin University of China, and a Renmin University "Outstanding Young Scholar". Zhou's main research areas include statistical machine learning, Bayesian methods, stochastic processes, and inference acceleration for large models. Zhou has led NSFC Young Scientists and General Program projects and published over 40 papers in international journals and conferences including JMLR, STCO, ICML, NeurIPS, ICLR, AAAI, and KDD; serves as an Area Chair for international conferences including NeurIPS, ICLR, IJCAI, and AISTATS, Associate Editor of Statistics and Computing, Action Editor of Transactions on Machine Learning Research, and editorial board member of the Journal of Machine Learning Research; and is Deputy Secretary-General of the AI branch of the China Association of Business Statistics, a council member of the Young Statisticians Association of the National Industrial Statistics Teaching and Research Association (2nd council), and an IEEE Senior Member.
Ben Dai (戴奔)
Abstract
Semantic segmentation labels each pixel in an image with its corresponding class, and is typically evaluated using the Intersection over Union (IoU) and Dice metrics, which quantify the overlap between predicted and ground-truth segmentation masks. Most existing methods estimate pixel-wise class probabilities and then apply argmax or thresholding to obtain the final prediction. Such rules have been shown to yield generally inconsistent or suboptimal results, as they do not directly maximize the segmentation metrics. To address this, a novel consistent segmentation framework, RankSEG, has been proposed, comprising RankDice and RankIoU, designed to optimize the Dice and IoU metrics respectively. Although RankSEG almost guarantees improved performance, it suffers from two major drawbacks. First is its computational expense: RankDice has O(d log d) complexity with a substantial constant factor (where d is the number of pixels), while RankIoU exhibits even higher complexity O(d^2), limiting its practical application; for instance, on LiTS, prediction with RankSEG takes 16.33 seconds compared to just 0.01 seconds with the argmax rule. Second, RankSEG is only applicable to overlapping segmentation settings, where multiple classes can occupy the same pixel, in contrast to standard benchmarks, which typically assume non-overlapping segmentation. In this work, we overcome both drawbacks via a reciprocal moment approximation (RMA) of RankSEG, with the following contributions: (i) we improve RankSEG using RMA, yielding RankSEG-RMA, which reduces the complexity of both algorithms to O(d) while maintaining comparable performance; (ii) inspired by RMA, we develop a pixel-wise score function that admits an efficient implementation in non-overlapping segmentation settings.
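Why thresholding is inconsistent for Dice can be seen on a five-pixel toy example: a rank-based prediction that keeps several moderately likely pixels attains a higher expected Dice than the 0.5-threshold rule. This Monte Carlo sketch only illustrates the inconsistency; it is not the RankDice or RankSEG-RMA algorithm, and the probabilities below are invented for illustration.

```python
import numpy as np

def dice(pred, truth):
    """Dice coefficient between two binary masks."""
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2 * inter / denom if denom else 1.0

def expected_dice(pred, probs, rng, n_samples=10000):
    """Monte Carlo estimate of E[Dice] when pixel i is foreground
    independently with probability probs[i]."""
    draws = rng.random((n_samples, len(probs))) < probs
    return np.mean([dice(pred, y) for y in draws])

probs = np.array([0.9, 0.45, 0.45, 0.45, 0.05])
rng = np.random.default_rng(6)
thresh_pred = probs >= 0.5                   # threshold rule keeps only pixel 0
rank_pred = np.zeros(5, dtype=bool)          # rank-style rule keeps the top 4
rank_pred[np.argsort(-probs)[:4]] = True
d_thresh = expected_dice(thresh_pred, probs, rng)
d_rank = expected_dice(rank_pred, probs, rng)   # clearly larger than d_thresh
```

The threshold rule ignores how the Dice denominator couples pixels, which is exactly the coupling that rank-based rules exploit.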
Speaker Bio
Ben Dai (戴奔) is an Assistant Professor in the Department of Statistics at The Chinese University of Hong Kong.
Yixuan Qiu (邱怡轩)
Abstract
Optimal transport (OT) has emerged as a fundamental tool in modern machine learning, yet its computational cost remains a significant bottleneck for large-scale applications. While harnessing the massive parallelism of modern GPU hardware is critical for efficiency, the de facto standard Sinkhorn algorithm, despite its ease of parallelization, often suffers from slow convergence in challenging problems. More recently, the sparse-plus-low-rank quasi-Newton method offers a balance between convergence rate and per-iteration complexity; however, its efficiency on GPUs is severely hindered by the serial nature of sparse matrix symbolic analysis and irregular memory access patterns. To bridge this gap, we present cuRegOT, a high-performance GPU solver tailored for entropic-regularized OT. We introduce a suite of algorithmic and architectural optimizations, including an amortized symbolic analysis strategy to mitigate CPU bottlenecks, an asynchronous Sinkhorn iterates generation mechanism, and a fused kernel for bandwidth-efficient gradient evaluation. These strategies are backed by rigorous theoretical guarantees ensuring algorithmic convergence. Extensive numerical experiments demonstrate that cuRegOT achieves significant speedups over state-of-the-art GPU-based solvers across a variety of benchmark tasks.
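For reference, the baseline Sinkhorn iteration that solvers like cuRegOT accelerate alternates row and column rescalings of the Gibbs kernel K = exp(-C / reg) until both marginals match. A minimal dense sketch (regularization strength and problem sizes are illustrative; production solvers work in the log domain and on GPU):

```python
import numpy as np

def sinkhorn(C, a, b, reg=0.5, n_iter=2000):
    """Entropic-regularized OT via Sinkhorn: find diagonal scalings u, v
    such that diag(u) K diag(v) has row sums a and column sums b."""
    K = np.exp(-C / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)          # match column marginals
        u = a / (K @ v)            # match row marginals
    return u[:, None] * K * v[None, :]   # transport plan

rng = np.random.default_rng(7)
n = 20
x, y = rng.random(n), rng.random(n)
C = (x[:, None] - y[None, :]) ** 2       # squared-distance cost on [0, 1]
a = np.full(n, 1.0 / n)
b = np.full(n, 1.0 / n)
P = sinkhorn(C, a, b)                    # rows sum to a, columns to b
```

Each iteration is two dense matrix-vector products, which is why Sinkhorn parallelizes so easily on GPUs even when its convergence is slow on hard instances.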
Speaker Bio
Yixuan Qiu is an Associate Professor at the School of Statistics and Data Science, Shanghai University of Finance and Economics. Qiu received a Ph.D. from the Department of Statistics at Purdue University and was then a postdoctoral researcher at Carnegie Mellon University. Research interests include deep learning, generative models, and large-scale statistical computing and optimization, with results published in leading international statistics journals and top machine learning conferences. Qiu is a long-time contributor to the statistics and data science community "Capital of Statistics" and the developer and maintainer of many open-source packages (e.g., Spectra, LBFGS++, ReHLine, and RegOT).
TBD
Abstract
TBD
Speaker Bio
TBD
Jingzhao Zhang (张景昭)
Abstract
LLM training now spans multiple stages (pretraining → mid-training → SFT → post-training RL), and dataset choice increasingly determines benchmark outcomes and safety. We focus on two data-selection problems. First, in mid-training, we study how models acquire knowledge from mixed sources and show phase-transition behavior: the optimal response flips between sources rather than interpolating, and the critical mixture ratio depends on model size, implying scale-dependent mixture recipes. We then present a practical framework that learns per-domain losses as a function of (model size, mixture) and optimizes mixtures for benchmark scores. Second, for post-training RLVR, we argue for training on hard problems while addressing sparse rewards by augmenting out-of-distribution tasks with hints/partial solutions to speed learning.
Speaker Bio
Jingzhao Zhang is an Assistant Professor at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University. He received his Ph.D. in computer science from MIT, advised by Prof. Suvrit Sra and Prof. Ali Jadbabaie, and his undergraduate degree from UC Berkeley, advised by Laura Waller. He has received a Berkeley graduate fellowship, the MIT Lim fellowship, an IIIS young scholar fellowship, MIT's best master's thesis and best doctoral thesis awards in AI & Decision Making, and a COLT best student paper award. His research covers optimization algorithms, neural network training, algorithmic complexity analysis, machine learning theory, and AI applications.
Ziqiao Wang (汪子乔)
Abstract
A phenomenon attracting much attention in current large-model alignment research is weak-to-strong generalization (W2SG): pseudo-labels produced by a weak teacher model supervise the training of a strong student model, and the student ends up surpassing the teacher on the target task. Although observed empirically, the theoretical mechanism behind W2SG remains insufficiently understood. This talk presents a theoretical analysis of W2SG. Its centerpiece is a generalized bias-variance decomposition under Bregman divergences that characterizes the risk gap between student and teacher, yielding, for the first time without the strong assumption of a convex hypothesis class, a W2SG inequality based on "prediction mismatch". We further prove that W2SG is more likely to emerge for student models of sufficiently large capacity. Regarding the choice of loss for W2SG, we theoretically compare standard cross-entropy with reverse cross-entropy and show that the latter is more robust to uncertainty in the teacher's predictions. We also validate these theoretical findings empirically, including the effect of student model capacity on W2SG and the benefit of averaging supervision from multiple teachers. Finally, we discuss how pretraining promotes weak-to-strong generalization.
Speaker Bio
Ziqiao Wang is an Assistant Professor and doctoral advisor at the College of Computer Science and Technology, Tongji University, selected for a national-level overseas high-level young talent program and the Shanghai Magnolia overseas high-level young talent program. Wang received a Ph.D. from the University of Ottawa, Canada; research areas include the theoretical foundations of machine learning, learning theory and algorithms for large models, and information theory. Wang leads an NSFC Young Scientists Fund project and has participated in several national and provincial projects. Recent results have appeared at top international conferences in AI, machine learning, and data mining, including NeurIPS, ICML, ICLR, UAI, AAAI, KDD, and WWW; the doctoral thesis was nominated for the 2025 Canadian Artificial Intelligence Association Best Dissertation Award and for the University of Ottawa Governor General's Academic Gold Medal and Pierre Laberge Thesis Prize. Wang serves as an Area Chair for the AI conferences ICLR and NeurIPS and was co-program chair of the 2024 IEEE North American School of Information Theory (NASIT).