Morning of April 18 | Keynote Session
TBD
Talk Abstract
Most classical regularization-based methods for multi-dimensional imaging data recovery can only represent multi-dimensional discrete data on a meshgrid, which hinders their applicability in many scenarios beyond the meshgrid. To break this barrier, we propose a series of continuous functional representation methods that can represent data beyond the meshgrid with powerful representation abilities. Specifically, the suggested continuous representation, which maps an arbitrary coordinate to its corresponding value, can represent data continuously over an infinite real space. This ameliorated representation regime yields better efficiency, higher accuracy, and a wider range of applicable domains (e.g., non-meshgrid data) for regularization-based methods. In this talk, we will introduce how to transform the conventional low-rank, total variation (TV), and non-local self-similarity regularization methods into their continuous counterparts: the Low-Rank Tensor Function Representation (LRTFR), neural-domain TV (NeurTV), and the Continuous Representation-based NonLocal method (CRNL), respectively. We will also present extensive multi-dimensional data recovery applications from image processing (e.g., image inpainting and denoising), machine learning (e.g., hyperparameter optimization), and computer graphics (e.g., point cloud upsampling) to validate the favorable performance of our methods for continuous representation.
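As background for the continuous-representation idea, the sketch below is an illustrative toy, not the LRTFR model itself: the rank and the factor functions (simple sinusoids rather than learned networks) are made-up assumptions. It shows how a low-rank tensor *function* can be queried at arbitrary real coordinates rather than only on a meshgrid:

```python
import numpy as np

# Instead of storing a discrete tensor T[i, j, k], store one scalar function
# per mode and rank component, and define
#   f(x, y, z) = sum_r g1_r(x) * g2_r(y) * g3_r(z).
# Any coordinate in R^3 can then be queried, including off-grid ones.

rng = np.random.default_rng(0)
R = 3                                        # CP rank (illustrative)
freqs = rng.uniform(0.5, 2.0, size=(3, R))   # one frequency per mode and rank

def factor(mode, r, t):
    """A smooth 1-D factor function for the given mode and rank component."""
    return np.sin(freqs[mode, r] * t) + 1.0

def f(x, y, z):
    """Continuous low-rank representation: defined for any real coordinates."""
    return sum(factor(0, r, x) * factor(1, r, y) * factor(2, r, z)
               for r in range(R))

# On-grid evaluation reproduces an ordinary discrete tensor ...
grid = np.linspace(0.0, 1.0, 8)
T = np.array([[[f(x, y, z) for z in grid] for y in grid] for x in grid])
print(T.shape)  # (8, 8, 8)

# ... but off-grid queries need no interpolation: just evaluate the function.
print(float(f(0.123, 0.456, 0.789)))
```

The low-rank structure is what keeps the continuous representation compact: storing 3R one-dimensional functions replaces storing the full third-order tensor.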
Speaker Bio
Deyu Meng is a Professor at the School of Mathematics and Statistics, Xi'an Jiaotong University. He has long been devoted to the foundational theory and algorithms of machine learning and, over the past five years, has published more than one hundred papers in journals and conferences in machine learning-related fields, being named a Clarivate and Elsevier highly cited researcher for multiple years. He is a Distinguished Professor of the Changjiang Scholars Program and was selected for the national Top Young Talents Program. He currently serves as Vice President of the China Society for Industrial and Applied Mathematics (CSIAM), Chair of the CSIAM Young Researchers Committee, and an editorial board member of seven domestic and international journals including TPAMI.
Bin Dong
Talk Abstract
Mathematical research has long faced bottlenecks that limit its efficiency, and the introduction of AI offers new possibilities for breaking through them. Against this backdrop, "AI for Mathematics" (AI4M) has emerged as a new interdisciplinary research area. This talk will first start from the challenges and needs of mathematical research itself, explaining why mathematics needs deep empowerment by AI; it will then survey representative recent results in AI4M and compare the strengths and limitations of different technical routes. Building on this, the talk will argue that the key to substantially improving AI's mathematical reasoning ability lies in advancing the formalization of mathematical knowledge, i.e., the "digitization" of mathematics. Finally, the talk will introduce the overall research plan of the AI4M team at Peking University, present the team's preliminary results in formal model and tool design, automated reasoning systems, and high-quality benchmark development, and offer an outlook on the future of AI4M.
Speaker Bio
Bin Dong is a Boya Distinguished Professor at Peking University, affiliated with the Beijing International Center for Mathematical Research, and also serves as Deputy Director of the International Machine Learning Research Center at Peking University and Executive Vice Dean of Beijing Zhongguancun Academy. His main research areas are machine learning, scientific computing, and computational imaging. He received the 2014 Qiushi Outstanding Young Scholar Award, was invited to give a 45-minute talk at the 2022 International Congress of Mathematicians (ICM), was selected for the 2023 New Cornerstone Investigator Program, received the Wang Xuan Outstanding Young Scholar Award in the same year, and has been invited to speak at the 2027 International Congress on Industrial and Applied Mathematics (ICIAM).
TBD
Talk Abstract
The capabilities of large models have advanced by leaps and bounds in recent years, repeatedly exceeding expectations; in stark contrast, our understanding of how they work remains far from adequate. My previous research was in theoretical computer science, seeking to understand the principles of computation and complexity. Over the past year I have mainly been studying large models, with a particular focus on research into their underlying mechanisms. In this talk I will report on what I have learned during this period.
Speaker Bio
Pinyan Lu is a Changjiang Scholar Distinguished Professor at Shanghai University of Finance and Economics and the founding dean of its School of Computer Science and Artificial Intelligence. After receiving his Ph.D. from the Department of Computer Science at Tsinghua University in January 2009, he joined Microsoft Research Asia, where he served successively as Associate Researcher, Researcher, and Lead Researcher in the Theory Group. In December 2015 he joined Shanghai University of Finance and Economics, where he founded and leads the Institute for Theoretical Computer Science (ITCS). His main research area is theoretical computer science, and he has recently turned to the algorithms and mechanisms of large models. He has published more than 30 papers in the three major theoretical computer science conferences (STOC/FOCS/SODA) and more than 20 papers in the two top computational economics conferences (EC/WINE). He has won best/distinguished paper awards at major international conferences including ICALP 2007, FAW 2010, ISAAC 2010, AAMAS 2024, and WINE 2025, as well as honors including ACM Distinguished Scientist (2019), the silver medal of the ICCM Mathematics Award (formerly the Morningside Medal) at the 8th International Congress of Chinese Mathematicians (2019), CCF Young Scientist (2014), Microsoft Fellow (2008), and the Tsinghua University Special Scholarship (2007).
Afternoon of April 18
Yuqing Kong
Talk Abstract
Villalobos et al. (2024) predict that publicly available text data will be exhausted within the next decade. It is therefore increasingly important to improve model performance without access to ground-truth labels. We propose a label-free post-processing framework that uses a weaker but better-calibrated reference model to improve a strong but poorly calibrated model. Our framework guarantees a strict improvement in worst-case performance without relying on labels. The method is based on a characterization of when a strict improvement is achievable, namely when the strong model and the reference model are not mutually calibrated. We formalize this condition and connect it to "no-arbitrage" results in economics. We show that the problem can be solved with a Bregman projection, which prior work (Mohri et al., 2025) applied to self-improvement of language models. We implement the Bregman projection as an efficient post-processing algorithm on the strong model's outputs. Experiments on representative large language models (LLMs) of different scales show that our label-free method substantially reduces proper losses and calibration errors, achieving performance comparable to supervised baselines.
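For context on the calibration-error metric the abstract mentions, here is a minimal sketch of expected calibration error (ECE) with equal-width confidence binning; the bin count and toy data are illustrative assumptions, not the paper's evaluation protocol:

```python
import numpy as np

def ece(probs, labels, n_bins=10):
    """Expected calibration error for binary predictions: the average gap
    |empirical accuracy - mean confidence| over equal-width confidence bins,
    weighted by the fraction of samples in each bin."""
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            err += mask.mean() * abs(labels[mask].mean() - probs[mask].mean())
    return err

probs = np.array([0.95, 0.85, 0.65, 0.25, 0.15, 0.35])   # model confidences
labels = np.array([1, 1, 0, 0, 0, 1])                    # observed outcomes
print(round(ece(probs, labels), 3))
```

A perfectly calibrated predictor has ECE zero: within each confidence bin, the empirical frequency of positives matches the stated confidence.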
Speaker Bio
Yuqing Kong is a tenured Associate Professor and doctoral advisor at the Center on Frontiers of Computing Studies, Peking University, and a Boya Young Scholar of Peking University. She received her Ph.D. in theoretical computer science from the Computer Science department of the University of Michigan, Ann Arbor in August 2018, and her bachelor's degree from the School of Mathematics at the University of Science and Technology of China in June 2013. Her main research interest is the intersection of theoretical computer science and economics, including mechanism design, information elicitation, and the wisdom of crowds. She has published papers in venues including J. ACM, ACM EC, WWW, WINE, ITCS, ACM TEAC, SODA, NeurIPS, ICML, ICLR, AAAI, IJCAI, and ECCV.
Kun Yuan
Talk Abstract
Pre-training LLMs is extremely resource-intensive, making optimizer efficiency critical. Matrix-based optimizers such as Muon and SOAP improve over AdamW by leveraging curvature, but their updates can be overly isotropic—conservative in flat directions and aggressive in sharp ones. We develop a unified Riemannian ODE view showing that preconditioning defines the geometry while momentum acts as Riemannian damping. Building on this insight, we propose LITE, an acceleration strategy that increases effective damping and learning rates along flat trajectories to speed progress in anisotropic landscapes. Experiments across Dense/MoE models (130M–1.3B), datasets (C4, Pile), and schedules validate consistent speedups for Muon and SOAP, supported by theory showing faster convergence along flat directions.
Speaker Bio
Kun Yuan is an Assistant Professor and researcher at Peking University. His research focuses on large-scale optimization and large-model training.
Mingyang Yi
Talk Abstract
Reinforcement Learning from Human Feedback (RLHF) and its variants have become the dominant paradigm for aligning large language models with human intent. Despite their strong empirical performance, the generalization properties of these methods in high-dimensional settings remain underexplored. To this end, we develop a generalization theory for RLHF on large language models under a linear reward model assumption, via the framework of algorithmic stability. Unlike existing analyses based on the consistency of maximum likelihood estimation for the reward model, our analysis works within an end-to-end learning framework, which better matches practice. Specifically, we prove that under a key feature-coverage condition, the empirical optimum of the policy model enjoys a generalization error bound. Furthermore, this conclusion extends to the parameter solutions obtained by gradient-based learning algorithms (i.e., gradient ascent and stochastic gradient ascent). We therefore argue that our results provide new theoretical support for the empirical generalization ability of RLHF-trained large language models.
Speaker Bio
Dr. Mingyang Yi is an Assistant Professor at the School of Information, Renmin University of China, focusing on foundational theory of artificial intelligence, generalization theory, and generative AI. He received his Ph.D. in 2022 from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, advised by the renowned mathematician and CAS academician Zhiming Ma. He has received honors including the CAS Outstanding Doctoral Dissertation award (100 graduating doctoral students CAS-wide), the CAS President's Special Award, the "Young Talent" title of Renmin University of China, and Huawei's "President Team Award". He has published more than twenty papers in top international AI conferences and journals, including more than ten as first or co-first author at venues such as ICML, NeurIPS, ICLR, CVPR, and UAI. He has led several projects, including a National Natural Science Foundation Youth Project and a CCF-Tencent Rhino-Bird project.
Peng Zhao
Talk Abstract
Bandit models are a key framework for designing algorithms in interactive decision-making. While stochastic linear bandits are well-studied, real-world complexities have led to important extensions like generalized linear bandits (GLB) with nonlinear link functions, and heavy-tailed linear bandits (HvLB) to handle heavy-tailed noise. Although optimal regret bounds have been established, existing algorithms are computationally impractical, requiring full data storage and repeated passes over all historical data. In this talk, I will introduce a "one-pass" method based on the Online Mirror Descent framework, a textbook approach for regret minimization that we instead use here as a statistical estimator. This approach achieves O(1) per-round computational cost while preserving optimal regret for GLB and HvLB. I will then discuss extensions to online RL theory: (i) RL with multinomial logit function approximation, and (ii) RLHF with on-policy active data collection.
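To make the "one-pass" idea concrete, here is a hedged sketch of a single online-mirror-descent step used as a streaming estimator for a logistic (generalized linear) model; with the Euclidean mirror map it reduces to a projected gradient step. The step size, projection radius, and toy data stream are illustrative choices, not the tuned algorithm from the talk:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def omd_step(theta, x, r, eta=0.5, radius=1.0):
    """One online-mirror-descent step on the current round's logistic loss.
    With the Euclidean mirror map this is a projected gradient step:
    O(d) work per round and no storage of past data (the one-pass property)."""
    grad = (sigmoid(theta @ x) - r) * x   # gradient of the logistic loss
    theta = theta - eta * grad
    n = np.linalg.norm(theta)             # project back onto the ball
    return theta if n <= radius else theta * (radius / n)

# A toy stream: contexts x_t with rewards r_t ~ Bernoulli(sigmoid(theta*^T x_t)).
rng = np.random.default_rng(1)
d, T = 5, 5000
theta_star = np.ones(d) / np.sqrt(d)
theta = np.zeros(d)
for _ in range(T):
    x = rng.normal(size=d) / np.sqrt(d)
    r = rng.binomial(1, sigmoid(theta_star @ x))
    theta = omd_step(theta, x, r)
print(np.linalg.norm(theta - theta_star))  # estimation error after one pass
```

The contrast is with maximum-likelihood estimation, which re-solves an optimization problem over all past rounds; here each data point is touched exactly once.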
Speaker Bio
Peng Zhao is a tenure-track Associate Professor and doctoral advisor at the School of Artificial Intelligence, Nanjing University, and a member of the LAMDA Group (Machine Learning and Data Mining Institute). His research focuses on the theoretical foundations of machine learning, including online learning, stochastic optimization, and reinforcement learning theory. He has published more than 60 papers in top journals and conferences including JMLR, COLT, ICML, and NeurIPS. He serves as an editorial board member of Machine Learning (Springer) and as an area chair for conferences including ICML and NeurIPS. He was selected for the CCF Outstanding Doctoral Dissertation Incentive Program and has received the Nanjing University "Xiaomi Young Scholar Award for Scientific Innovation" and a Baidu Scholarship.
Yuanyu Wan
Talk Abstract
In this talk, we focus on decentralized online convex optimization (OCO) in changing environments, and aim to minimize adaptive regret and dynamic regret. In standard OCO, many algorithms with (nearly) optimal bounds on these two metrics have been proposed. However, none of them has been extended to decentralized OCO, possibly due to the difficulty of handling their commonly used two-level structure. To fill this gap, we first provide novel reductions from minimizing these two metrics in decentralized OCO to minimizing them in OCO with delayed feedback. Furthermore, we revisit an existing black-box reduction from delayed OCO to standard OCO, and prove that it can also convert non-delayed algorithms for adaptive regret and dynamic regret into the delayed setting. Finally, we demonstrate the power of our reductions by establishing nearly optimal bounds on the adaptive regret and dynamic regret of decentralized OCO.
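As background for the delayed-feedback setting that the reductions target, the following sketch shows online gradient descent when each gradient arrives d rounds late; the fixed delay, step size, and unit-ball domain are illustrative simplifications:

```python
import numpy as np

def delayed_ogd(grads, d, eta=0.1, radius=1.0, dim=2):
    """Online gradient descent under a fixed feedback delay d:
    the gradient of round t only becomes available at round t + d.
    Returns the sequence of iterates x_1, ..., x_T."""
    x = np.zeros(dim)
    xs = []
    for t in range(len(grads)):
        xs.append(x.copy())
        if t >= d:                      # feedback from round t - d arrives now
            x = x - eta * grads[t - d]
            n = np.linalg.norm(x)
            if n > radius:              # project back onto the ball
                x *= radius / n
    return xs

# With a fixed linear loss f(x) = g^T x, the iterates still drift toward the
# boundary minimizer even though every update is delayed by d rounds.
g = np.array([1.0, 0.0])
xs = delayed_ogd([g] * 50, d=5)
print(xs[-1])  # close to [-1, 0]
```

The price of delay shows up as d rounds of "stale" iterates at the start; regret analyses for delayed OCO quantify exactly this effect.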
Speaker Bio
Yuanyu Wan is a ZJU100 Young Professor at Zhejiang University. His main research interests include machine learning theory, online learning and optimization, and distributed optimization. As first or corresponding author, he has published 16 CCF-A papers in venues such as JMLR, TPAMI, ICML, and NeurIPS, and 3 papers at COLT, the top conference in theoretical machine learning. He has received the Outstanding Doctoral Dissertation Award of the Jiangsu Association of Artificial Intelligence and an ICML Best Reviewer award. He served as an area chair for NeurIPS 2025 and has repeatedly served as a reviewer for conferences and journals including COLT, ICML, NeurIPS, ICLR, and TPAMI.
Tian Xu
Talk Abstract
Imitation learning learns a policy from expert demonstrations and is a key underlying technique for large language models and embodied AI. However, in long-horizon tasks, imitation learning often suffers from error accumulation and distribution shift. Notably, adversarial imitation learning methods have shown excellent practical performance: with only a handful of expert trajectories, or even a single one, they can achieve near-expert performance on long-horizon decision tasks such as robot locomotion control. This phenomenon raises two long-standing core questions: why adversarial imitation learning is effective with few samples, and why its performance does not degrade significantly as the horizon grows. This talk addresses these questions by presenting our theoretical analysis of an adversarial imitation learning method based on total variation distance (TV-AIL). For a class of MDPs abstracted from robot locomotion control tasks, we prove that TV-AIL enjoys an imitation error bound of $\mathcal{O}(\min\{1, \sqrt{|S|/N}\})$, which is independent of the horizon and is meaningful in both the small-sample and large-sample regimes. This result provides a theoretical explanation for the strong practical performance of adversarial imitation learning and reveals its intrinsic mechanism for mitigating distribution shift. In the analysis, we exploit TV-AIL's multi-stage policy optimization structure to develop a new stage-coupled analysis technique; the same technique also characterizes the worst-case behavior of TV-AIL on general MDPs, clarifying its scope of applicability and potential limitations.
Speaker Bio
Tian Xu is an Assistant Researcher (Yuxiu Young Scholar) at the School of Artificial Intelligence, Nanjing University, working on the theoretical foundations of reinforcement learning and its applications to large models. His faculty mentor is Prof. Yang Yu, and he is a member of the LAMDA group led by Prof. Zhi-Hua Zhou. He leads a National Natural Science Foundation of China project for doctoral students, has published more than 10 papers in top journals and conferences including TPAMI, NeurIPS, ICML, and ICLR, and won the runner-up best paper award at the NeurIPS 2024 FITML workshop. His work has advanced the understanding of theoretical error in imitation learning and the development of methods for reducing it.
Yunwen Lei
Talk Abstract
Recent developments in stochastic optimization often suggest biased gradient estimators to improve robustness, communication efficiency, or computational speed. Representative biased stochastic gradient methods (BSGMs) include zeroth-order stochastic gradient descent (SGD), Clipped-SGD, and SGD with delayed gradients. In this talk, we present the first framework to study the stability and generalization of BSGMs for convex and smooth problems. We apply our general result to develop the first stability bound for zeroth-order SGD with reasonable step-size sequences, and the first stability bound for Clipped-SGD. While our stability analysis is developed for general BSGMs, the resulting stability bounds for both zeroth-order SGD and Clipped-SGD match those of SGD under appropriate smoothing/clipping parameters.
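The two representative biased estimators can be sketched as follows; the clipping threshold, smoothing radius, and toy objective are illustrative assumptions:

```python
import numpy as np

def clip(g, tau):
    """Gradient clipping: a biased gradient estimator that bounds the update norm."""
    n = np.linalg.norm(g)
    return g if n <= tau else g * (tau / n)

def zeroth_order_grad(f, x, mu=1e-4, rng=None):
    """Two-point zeroth-order estimator: queries f only, and is biased by the
    smoothing radius mu (it estimates the gradient of a smoothed surrogate)."""
    rng = rng or np.random.default_rng()
    u = rng.normal(size=x.shape)
    return (f(x + mu * u) - f(x)) / mu * u

# Toy objective f(x) = 0.5 ||x||^2, whose true gradient at x is x itself.
f = lambda x: 0.5 * np.dot(x, x)
x = np.array([3.0, 4.0])            # true gradient norm is 5
g_clip = clip(x, tau=1.0)
print(np.linalg.norm(g_clip))       # clipped to norm 1.0

g_zo = zeroth_order_grad(f, x, rng=np.random.default_rng(0))
print(np.dot(g_zo, x) > 0)          # positively correlated with the true gradient
```

Both estimators deviate systematically from the true gradient, which is exactly why their stability analysis cannot simply reuse unbiased-SGD arguments.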
Speaker Bio
Yunwen Lei received his Ph.D. from Wuhan University. He is currently an Assistant Professor in the Department of Mathematics, The University of Hong Kong. His research interests include machine learning, data science, learning theory, and stochastic optimization. He is an associate editor for Machine Learning, Transactions on Machine Learning Research, and IEEE Transactions on Neural Networks and Learning Systems, and an area chair for ICML, NeurIPS, ICLR, and AISTATS.
Haiyun He
Talk Abstract
We study multi-bit watermarking for data generated by stochastic processes, where a hidden message is embedded during sampling and must be decodable by an authorized detector that possesses side information unavailable to unauthorized observers. In high-stakes deployments, a practical watermark must simultaneously control false alarms, preserve generation quality without distorting the output distribution, and support reliable multi-bit decoding. Satisfying all three goals at once inevitably creates fundamental trade-offs. We formulate watermark embedding as a distributional information-embedding problem and watermark detection as a multiple-hypothesis testing problem under distortion and rate constraints, leading to four fundamental metrics: false-alarm probability, detection error probability, distortion, and information rate. Within this information-theoretic framework, we derive matched converse and achievability bounds that characterize the optimal trade-offs and provide scheme-agnostic guidance.
Speaker Bio
Haiyun He is a Tenure-Track Assistant Professor in the Internet of Things Thrust at the Hong Kong University of Science and Technology (Guangzhou). She is also a Cross-Campus Faculty Affiliate at HKUST. Prior to joining HKUST(GZ), she was a Postdoctoral Associate in the Center for Applied Mathematics at Cornell University, working with Prof. Ziv Goldfeld and Prof. Christina Lee Yu. She earned her Ph.D. in Electrical and Computer Engineering (ECE) from the National University of Singapore (NUS) in Sep. 2022, advised by Prof. Vincent Y. F. Tan. She received her M.S. in ECE from NUS in 2017 and her B.S. in Electronics and Information Engineering from Beihang University, China, in 2016. Her research lies at the intersection of information theory (IT), machine learning (ML) and statistical learning. She aims to uncover the fundamental principles behind learning models' behavior, with a particular focus on how to enhance model generalization, trustworthiness, efficiency, and interpretability. Her work has been published in top-tier IT and ML venues, including IEEE TIT, JMLR, NeurIPS, AISTATS, ISIT, etc. She has served as a reviewer for IEEE TIT, JMLR, IEEE S&P, IEEE TNNLS, etc. In 2022, she was recognized as an EECS Rising Star by UT Austin. Her personal website is: https://haiyun-he.github.io.
TBD
Talk Abstract
TBD
Speaker Bio
TBD
Morning of April 19
09:00 - 10:00 Free Discussion
Tianyang Hu
Talk Abstract
While transformers are typically assumed to be "blank slates" at random initialization, we demonstrate that untrained models exhibit systematic structural biases, including extreme token preferences. We provide a mechanistic explanation by identifying a contraction of token representations along initialization-dependent directions, which is driven by the interaction of asymmetric MLP activations and self-attention value aggregation. We show these biases persist throughout training, forming a stable model identity that enables SeedPrint, a method to fingerprint LLMs by their random initialization. Finally, we establish that attention mechanism’s intra-sequence contraction is causally linked to the attention-sink phenomenon, providing a principled explanation for sink emergence and a pathway for systematic control.
Speaker Bio
Tianyang Hu is an Assistant Professor at the School of Data Science, The Chinese University of Hong Kong, Shenzhen. He received his bachelor's degree in mathematics from Tsinghua University, his master's degree in statistics from the University of Chicago, and his Ph.D. in statistics from Purdue University. Before joining CUHK-Shenzhen, he conducted AI research at Huawei Noah's Ark Lab and the National University of Singapore. His research focuses on the intersection of mathematical statistics and artificial intelligence, including statistical machine learning, trustworthy AI, feature representation learning, and deep generative models, aiming to uncover the underlying mechanisms of AI models and thereby provide theoretical guidance for designing more effective algorithms.
Zhi-Qin John Xu
Talk Abstract
Understanding the performance of deep learning on real problems requires considering the characteristics of the model, the data, and the optimization algorithm that connects them. This talk analyzes data characteristics from multiple perspectives and designs experiments to probe the characteristics of models and optimization, in order to understand the generalization ability of deep learning and the reasoning ability of language models, and to offer practical guidance for model training. We find that small initialization biases a model toward explaining data by reasoning rather than by memorization, which is closely related to the condensation phenomenon under small initialization. In addition, certain key statistics of the data drive the formation of embedding structure and affect the model's reasoning ability.
Speaker Bio
Zhi-Qin John Xu is a Professor at the Institute of Natural Sciences and the School of Mathematical Sciences, Shanghai Jiao Tong University. He received his bachelor's degree from the Zhiyuan College of Shanghai Jiao Tong University in 2012 and his Ph.D. in applied mathematics from Shanghai Jiao Tong University in 2016. From 2016 to 2019 he was a postdoctoral researcher at NYU Abu Dhabi and the Courant Institute. His research interests include the memorization and reasoning mechanisms of large models, the frequency principle, parameter condensation and the embedding principle of loss landscapes, and multi-scale neural networks.
Jiacheng Sun
Talk Abstract
While Masked Diffusion Language Models (MDLMs) relying on token masking and unmasking have shown promise in language modeling, their training efficiency, generation quality, and flexibility remain constrained by the masking paradigm. We propose Deletion-Insertion Diffusion language models (DID), which rigorously formulate token deletion and insertion as discrete diffusion processes, replacing the masking and unmasking processes in current MDLMs and improving training and inference efficiency by eliminating the computational overhead of MDLMs. Moreover, we propose Variational Autoencoding Discrete Diffusion (VADD), a novel framework that enhances discrete diffusion with latent-variable modeling to implicitly capture correlations among dimensions and thereby improve parallel decoding.
Speaker Bio
Jiacheng Sun is a senior researcher in Huawei's Foundation Model Department. He received his Ph.D. from the School of Mathematical Sciences, Peking University. Before that, he graduated from the School of Mathematics and Statistics, Xi'an Jiaotong University, and was a visiting scholar at the University of Oxford from October 2014 to October 2015. His research interests include generative models and deep learning theory. His work has been published in NeurIPS, ICML, ICLR, and other venues.
TBD
Talk Abstract
TBD
Speaker Bio
TBD
Zhiyu Zhang
Talk Abstract
Stein's method is a classical "descriptive" framework in probability theory for proving quantitative central limit theorems (CLT). This talk introduces a somewhat surprising algorithmic application of this framework in adversarial online linear optimization (OLO). Focusing on one-dimensional fixed-time OLO on a bounded domain, we present an algorithm based on Stein's method which is capable of achieving various "additively sharp" performance guarantees, surpassing the conventional big-O optimality. In particular, instantiations of this algorithm improve upon the total loss upper bounds of classical baselines including OGD and MWU. Conceptually, our algorithm can be viewed as a "continuous" refinement of a seminal dynamic programming algorithm of T. Cover (1966), improving its computational complexity. Technically, our construction is inspired by the remarkably clean proof of a Wasserstein CLT due to A. Röllin (2018). This is a joint work with Aaditya Ramdas at CMU.
Speaker Bio
Zhiyu Zhang is a ZJU100 Young Professor and doctoral advisor at the College of Control Science and Engineering, Zhejiang University. He previously received his Ph.D. from Boston University and was a postdoctoral researcher at Harvard University and Carnegie Mellon University. His research focuses on online optimization, statistical learning, and algorithmic applications in AI and robotics. In particular, his work emphasizes building novel, clean first principles for algorithm design in machine learning, thereby facilitating the translation of theory into practice.
Cong Fang
Talk Abstract
In modern machine learning applications, data often arrive dynamically as a stream, requiring models to compute gradients on the current sample and update continuously. This talk examines several typical streaming-learning scenarios and systematically studies acceleration methods for improving training efficiency in them. Specifically: first, for general nonconvex objectives in unstructured learning problems, we discuss variance-reduction techniques and show how they guarantee the fastest convergence rates for finding first- and second-order stationary points; second, for structured problems, in the convex optimization setting of generalized linear regression, we analyze the acceleration achieved by momentum methods under broad conditions; finally, for hard nonconvex problems such as tensor decomposition, we study over-parameterized model representations together with appropriate gradient normalization, showing that they are key strategies for narrowing the statistical-computational gap.
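One standard form of streaming variance reduction is the recursive-momentum (STORM-style) estimator; whether this matches the talk's exact method is an assumption, and the toy problem and hyperparameters below are illustrative:

```python
import numpy as np

def storm_update(d_prev, grad_now, grad_prev_at_now, a):
    """Recursive-momentum variance reduction for streaming data:
    d_t = grad f(x_t; xi_t) + (1 - a) * (d_{t-1} - grad f(x_{t-1}; xi_t)).
    Only the current sample xi_t is used at two iterates, so no full-batch
    gradient computations or stored checkpoints are needed."""
    return grad_now + (1.0 - a) * (d_prev - grad_prev_at_now)

# Toy stream: minimize E[0.5 (x - s)^2] with noisy samples s ~ N(1, 1),
# whose minimizer is x* = 1.
rng = np.random.default_rng(0)
x_prev, x, d = 0.0, 0.0, 0.0
eta, a = 0.05, 0.1
for t in range(4000):
    s = rng.normal(1.0, 1.0)
    g_now = x - s                 # stochastic gradient at the current iterate
    g_prev = x_prev - s           # same sample evaluated at the previous iterate
    d = storm_update(d, g_now, g_prev, a) if t else g_now
    x_prev, x = x, x - eta * d
print(x)  # near the minimizer x* = 1
```

The key point is that the correction term `d_prev - grad_prev_at_now` cancels much of the per-sample noise, so the tracked direction `d` has far lower variance than a single stochastic gradient.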
Speaker Bio
Cong Fang is an Assistant Professor (doctoral advisor) at the School of Intelligence Science and Technology, Peking University, a national-level young talent, and a Boya Young Scholar of Peking University. He received his Ph.D. from Peking University in 2019 and did postdoctoral research at Princeton University and the University of Pennsylvania. His main research interests are the foundational theory and algorithms of machine learning. He has published more than 40 papers in top journals and conferences including PNAS, AoS, IEEE T.IT, JMLR, COLT, NeurIPS, and PIEEE, with over 2,000 Google Scholar citations, and serves as an area chair for the top machine learning conferences NeurIPS and ICML.
Ruijie Wang
Talk Abstract
For real-world open environments, graph-enhanced large models are shifting from a static, closed, end-to-end training paradigm toward a new paradigm of self-evolving intelligence that handles noisy data, sequential tasks, and interactive feedback, placing higher demands on robust representation, continual adaptation, and closed-loop decision-making. This talk is organized around three threads: data self-evolution, task self-evolution, and feedback self-evolution. It summarizes our research agenda on dynamic robust representation, continual learning and tuning, and intelligent reasoning and decision-making, highlighting recent progress on cross-modal alignment for sparse and noisy signals, continual learning and personalized adaptation for open and diverse scenarios, and closing the loop of association discovery, reasoning and planning, and action generation using sparse feedback.
Speaker Bio
Ruijie Wang is a Professor and doctoral advisor at Beihang University, a Beihang Zhuoyue Young Scholar, and a national-level young talent. He received his Ph.D. from the University of Illinois Urbana-Champaign and his bachelor's degree from the Department of Computer Science at Shanghai Jiao Tong University. His main research area is trustworthy and efficient AI in open, dynamic environments, spanning theory and techniques for natural language, graph, and time-series data. His goal is to develop generalizable multimodal foundation models and systems that use structured knowledge to achieve reliable perception, reasoning, and decision-making, with applications in recommender systems, agents, and scientific computing. He has published more than 40 papers in renowned international conferences and journals including NeurIPS, ACL, KDD, WWW, SIGIR, and INFOCOM, more than 20 of them CCF-A, and has received awards including the sole Best Paper Award at IEEE DCOSS. He has long served as an area chair, program committee member, or reviewer for more than 20 venues, including TKDE, TOIS, TMC, NeurIPS, ICML, ICLR, KDD, and ACL Rolling Review.
TBD
Talk Abstract
TBD
Speaker Bio
TBD
TBD
Talk Abstract
TBD
Speaker Bio
TBD
Afternoon of April 19
Feng Zhou
Talk Abstract
Autoregressive language models suffer from high inference latency due to their sequential decoding nature. Speculative decoding (SD) mitigates this by employing a lightweight draft model to propose candidate tokens, which are selectively verified by a larger target model. While existing methods either adopt multi-draft strategies to increase acceptance rates or block verification techniques to jointly verify multiple tokens, they remain limited by treating these improvements in isolation. In this work, we propose SpecTr-GBV, a novel SD method that unifies multi-draft and greedy block verification (GBV) into a single framework. By formulating the verification step as an optimal transport problem over draft and target token blocks, SpecTr-GBV improves both theoretical efficiency and empirical performance. We theoretically prove that SpecTr-GBV achieves the optimal expected acceptance length physically attainable within the framework of i.i.d. draft generation, and this bound improves as the number of drafts increases. Empirically, we evaluate SpecTr-GBV across five datasets and four baselines. Our method achieves superior speedup and significantly higher block efficiency while preserving output quality. In addition, we perform comprehensive ablation studies to evaluate the impact of various hyperparameters in the model.
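For background, the standard single-draft speculative-sampling verification rule (which multi-draft and block-verification methods such as SpecTr-GBV generalize) can be sketched and checked empirically; the toy distributions below are illustrative:

```python
import numpy as np

def speculative_accept(token, p_target, q_draft, rng):
    """Standard speculative-sampling verification for one drafted token:
    accept with probability min(1, p(token)/q(token)); on rejection, resample
    from the residual max(p - q, 0), renormalized. The output is then
    distributed exactly as p, preserving the target model's distribution."""
    if rng.random() < min(1.0, p_target[token] / q_draft[token]):
        return token
    residual = np.maximum(p_target - q_draft, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p_target), p=residual)

# Empirical check that accepted/resampled tokens follow the target p.
rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])   # target model distribution
q = np.array([0.2, 0.5, 0.3])   # draft model distribution
samples = []
for _ in range(20000):
    t = rng.choice(3, p=q)                      # draft proposes a token
    samples.append(speculative_accept(t, p, q, rng))
freq = np.bincount(samples, minlength=3) / len(samples)
print(freq)  # ≈ [0.6, 0.3, 0.1]
```

The acceptance rate, and hence the speedup, grows as q approaches p; SpecTr-GBV's optimal-transport formulation generalizes this verification step to multiple drafts and whole token blocks.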
Speaker Bio
Feng Zhou is an Associate Professor at the School of Statistics, Renmin University of China, and a "Distinguished Young Scholar" of Renmin University. His main research areas include statistical machine learning, Bayesian methods, stochastic processes, and inference acceleration for large models. He leads Youth and General projects of the National Natural Science Foundation of China and has published more than 40 papers in international journals and conferences including JMLR, STCO, ICML, NeurIPS, ICLR, AAAI, and KDD. He serves as an area chair for international conferences including NeurIPS, ICLR, IJCAI, and AISTATS, as Associate Editor of Statistics and Computing, Action Editor of Transactions on Machine Learning Research, and editorial board member of the Journal of Machine Learning Research. He is Deputy Secretary-General of the Artificial Intelligence Branch of the China Association of Business Statistics, a member of the second council of the Young Statisticians Society of the national society for industrial statistics education and research, and an IEEE Senior Member.
Ben Dai
Talk Abstract
Semantic segmentation labels each pixel in an image with its corresponding class, and is typically evaluated using the Intersection over Union (IoU) and Dice metrics to quantify the overlap between predicted and ground-truth segmentation masks. In the literature, most existing methods estimate pixel-wise class probabilities, then apply argmax or thresholding to obtain the final prediction. These methods have been shown to generally lead to inconsistent or suboptimal results, as they do not directly maximize the segmentation metrics. To address this issue, a novel consistent segmentation framework, RankSEG, has been proposed, which includes RankDice and RankIoU, specifically designed to optimize the Dice and IoU metrics, respectively. Although RankSEG almost guarantees improved performance, it suffers from two major drawbacks. The first is its computational expense: RankDice has a complexity of O(d log d) with a substantial constant factor (where d is the number of pixels), while RankIoU exhibits even higher complexity O(d^2), limiting its practical application. For instance, on LiTS, prediction with RankSEG takes 16.33 seconds compared to just 0.01 seconds with the argmax rule. Second, RankSEG is only applicable to overlapping segmentation settings, where multiple classes can occupy the same pixel, in contrast to standard benchmarks that typically assume non-overlapping segmentation. In this work, we overcome both drawbacks via a reciprocal moment approximation (RMA) of RankSEG, with the following contributions: (i) we improve RankSEG using RMA, namely RankSEG-RMA, reducing the complexity of both algorithms to O(d) while maintaining comparable performance; (ii) inspired by RMA, we develop a pixel-wise score function that allows an efficient implementation for non-overlapping segmentation settings.
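For reference, the two metrics that RankSEG optimizes directly can be computed as follows for binary masks (the example masks are illustrative):

```python
import numpy as np

def iou(pred, gt):
    """Intersection over Union between two binary masks: |A∩B| / |A∪B|."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def dice(pred, gt):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * inter / total if total else 1.0

pred = np.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
gt   = np.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(iou(pred, gt), dice(pred, gt))  # 0.5 and 2/3
```

Because both metrics are set-overlap functionals rather than sums of per-pixel losses, a per-pixel argmax rule need not maximize them, which is the inconsistency RankSEG addresses.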
Speaker Bio
Ben Dai (戴奔) is an Assistant Professor in the Department of Statistics at The Chinese University of Hong Kong.
Yixuan Qiu
Talk Abstract
Optimal transport (OT) has emerged as a fundamental tool in modern machine learning, yet its computational cost remains a significant bottleneck for large-scale applications. While harnessing the massive parallelism of modern GPU hardware is critical for efficiency, the de facto standard Sinkhorn algorithm, despite its ease of parallelization, often suffers from slow convergence in challenging problems. More recently, the sparse-plus-low-rank quasi-Newton method offers a balance between convergence rate and per-iteration complexity; however, its efficiency on GPUs is severely hindered by the serial nature of sparse matrix symbolic analysis and irregular memory access patterns. To bridge this gap, we present cuRegOT, a high-performance GPU solver tailored for entropic-regularized OT. We introduce a suite of algorithmic and architectural optimizations, including an amortized symbolic analysis strategy to mitigate CPU bottlenecks, an asynchronous Sinkhorn iterates generation mechanism, and a fused kernel for bandwidth-efficient gradient evaluation. These strategies are backed by rigorous theoretical guarantees ensuring algorithmic convergence. Extensive numerical experiments demonstrate that cuRegOT achieves significant speedups over state-of-the-art GPU-based solvers across a variety of benchmark tasks.
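As background, the Sinkhorn iteration at the core of entropic-regularized OT alternates two matrix-vector scalings; this plain NumPy sketch shows only the algorithm, not the GPU-level optimizations of cuRegOT, and the regularization strength and problem sizes are illustrative:

```python
import numpy as np

def sinkhorn(C, a, b, reg=0.5, n_iter=500):
    """Entropic-regularized OT via Sinkhorn's alternating scalings.
    Each iteration is two matrix-vector products, which is what makes the
    method attractive on massively parallel hardware."""
    K = np.exp(-C / reg)                 # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    return u[:, None] * K * v[None, :]   # the transport plan

rng = np.random.default_rng(0)
n, m = 5, 6
C = rng.random((n, m))                   # cost matrix
a = np.full(n, 1.0 / n)                  # source marginal
b = np.full(m, 1.0 / m)                  # target marginal
P = sinkhorn(C, a, b)
print(np.allclose(P.sum(axis=1), a), np.allclose(P.sum(axis=0), b))
```

As the regularization shrinks, the kernel K becomes ill-conditioned and convergence slows, which is one reason quasi-Newton alternatives such as the sparse-plus-low-rank method mentioned above become attractive despite their costlier iterations.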
Speaker Bio
Yixuan Qiu is an Associate Professor at the School of Statistics and Data Science, Shanghai University of Finance and Economics. He received his Ph.D. from the Department of Statistics at Purdue University and was subsequently a postdoctoral researcher at Carnegie Mellon University. His main research interests include deep learning, generative models, and large-scale statistical computing and optimization, with work published in leading international statistics journals and top machine learning conferences. He is a long-time contributor to the statistics and data science community "Capital of Statistics" and the developer and maintainer of many open-source software packages, such as Spectra, LBFGS++, ReHLine, and RegOT.
TBD
Talk Abstract
TBD
Speaker Bio
TBD
TBD
Talk Abstract
TBD
Speaker Bio
TBD
TBD
Talk Abstract
TBD
Speaker Bio
TBD